# Android Integration Requirements

This document explains what the kima Android app must implement to consume the models in `models/`. All requirements are derived from working code in `reference_code/`; copy or adapt those files as needed.

## Critical: camera config + activity orientation

The recommended model `yolo26n_v11_rect_512x384.tflite` expects a **landscape 4:3** input (512 wide by 384 tall). If the camera delivers any other aspect ratio or orientation, the detector's letterbox step pads the difference with gray, so part of the input tensor carries no image data and effective resolution drops. The square models (`square_416`, `square_512`) accept any aspect ratio via letterboxing, but they're slower for the same effective resolution.

### 1. Pin CameraX `ImageAnalysis` to 4:3

In whichever composable or Activity sets up the CameraX use cases, the `ImageAnalysis.Builder` needs `setTargetAspectRatio(AspectRatio.RATIO_4_3)`:

```kotlin
import androidx.camera.core.AspectRatio
import androidx.camera.core.ImageAnalysis

val imageAnalysis = ImageAnalysis.Builder()
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
    .setTargetAspectRatio(AspectRatio.RATIO_4_3) // ← CRITICAL for the rect model
    .build()
```

`setTargetAspectRatio` is a *hint*, not a hard guarantee: CameraX picks the closest available 4:3 mode (typically 1280×960, 1440×1080, or 2048×1536 depending on the device). The actual W×H varies by device, but the *aspect* is 4:3 whenever the device offers such a mode. (CameraX 1.3 and later deprecate this setter in favor of `ResolutionSelector` with an `AspectRatioStrategy`; the intent is the same.)

If a device can't deliver 4:3 (rare), the existing letterbox math in the detector still copes: at worst, the model spends some input pixels on padding and quality drops slightly, but it doesn't crash.

See `reference_code/CameraView_snippet.kt` for the full surrounding context.
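
To put a number on the padding cost of a non-4:3 frame, the following sketch computes the share of the model input that ends up as gray padding. It assumes the same scale-to-fit rule the detector's letterbox uses; `paddedFraction` is a hypothetical helper for illustration, not part of `reference_code/`.

```kotlin
import kotlin.math.min
import kotlin.math.roundToInt

/** Fraction of the model input tensor that becomes gray padding (0.0 = none). */
fun paddedFraction(frameW: Int, frameH: Int, inputW: Int, inputH: Int): Double {
    // Scale-to-fit while preserving aspect ratio (assumed to match the detector).
    val scale = min(inputW.toFloat() / frameW, inputH.toFloat() / frameH)
    val scaledW = (frameW * scale).roundToInt().coerceIn(1, inputW)
    val scaledH = (frameH * scale).roundToInt().coerceIn(1, inputH)
    // Everything outside the scaled image is padding.
    return 1.0 - (scaledW.toDouble() * scaledH) / (inputW.toDouble() * inputH)
}
```

Under these assumptions, a 1280×960 frame fed to the 512×384 model gives 0.0, while a 1920×1080 frame gives 0.25, meaning a quarter of the input tensor is padding.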

### 2. Lock the activity to landscape

Without a landscape lock, the camera frame can arrive portrait-rotated, and the 512×384 (W×H) landscape model then gets a 384×512 portrait input: wrong aspect, large letterbox bars. Add to the main activity in `AndroidManifest.xml`:

```xml
<activity
    android:name=".MainActivity"
    android:screenOrientation="landscape"
    ...>
```

See `reference_code/AndroidManifest_snippet.xml` for an example.

**Alternative if landscape lock isn't acceptable** (e.g., kima needs portrait support too): inside the detector, rotate the bitmap by 90° before resizing if `imgH > imgW`. This adds minimal CPU cost. An implementation sketch is in `LESSONS_LEARNED.md`.
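
If the detector rotates the frame, boxes come out in the rotated frame and have to be mapped back before the rest of the app sees them. The sketch below shows that mapping for a 90° clockwise rotation, which is an assumption here (it is what Android's `Matrix.postRotate(90f)` produces); `Box` and `unrotateCw90` are hypothetical names, and the sketch in `LESSONS_LEARNED.md` may differ.

```kotlin
/** Axis-aligned box in pixel coordinates (hypothetical helper type). */
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

/**
 * Maps a box detected on a frame that was rotated 90° clockwise back into the
 * original portrait frame. The forward rotation sends (x, y) to (origH - y, x),
 * so the inverse sends (xr, yr) to (yr, origH - xr).
 *
 * @param origH height of the original (unrotated) frame, i.e. the rotated width
 */
fun unrotateCw90(box: Box, origH: Float): Box = Box(
    left = box.top,
    top = origH - box.right,
    right = box.bottom,
    bottom = origH - box.left,
)
```

A counter-clockwise rotation needs the mirrored formula, so whichever direction the detector uses should be recorded next to the bitmap.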

## Critical: letterbox-aware preprocessing in the detector

YOLO models are trained on **letterboxed** input: the source image is scaled to fit the model's input shape while preserving aspect ratio, then padded with gray (RGB 114,114,114) to fill any remaining space. **The Android detector MUST do the same**; otherwise the geometry the model learned in training no longer holds and detection quality drops sharply.

If the camera aspect already matches the model aspect (4:3 camera + 4:3 model), the letterbox math degenerates to a plain resize with `padX=0, padY=0`, but you still need the math in place for the cases where it doesn't match.

### Reference implementation

`reference_code/ImageDetector.android.kt` is the working letterbox-aware detector from this project. Key points:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import kotlin.math.min
import kotlin.math.roundToInt

// Letterbox: scale source to fit model input while preserving aspect ratio.
// imgW/imgH are the source bitmap size; inputW/inputH are the model input size.
val scale = min(
    inputW.toFloat() / imgW.toFloat(),
    inputH.toFloat() / imgH.toFloat()
)
val scaledW = (imgW * scale).roundToInt().coerceAtLeast(1)
val scaledH = (imgH * scale).roundToInt().coerceAtLeast(1)
val padX = (inputW - scaledW) / 2f
val padY = (inputH - scaledH) / 2f

val resized = Bitmap.createBitmap(inputW, inputH, Bitmap.Config.ARGB_8888)
val canvas = Canvas(resized)
canvas.drawColor(Color.rgb(114, 114, 114)) // ← gray fill matching YOLO training
canvas.drawBitmap(
    srcBitmap, null,
    RectF(padX, padY, padX + scaledW, padY + scaledH),
    Paint(Paint.FILTER_BITMAP_FLAG)
)
```

After inference, the output bboxes (which are normalized `[0, 1]` over the **letterboxed** input tensor) need to be **un-letterboxed** back to the original image's coordinate space:

```kotlin
import kotlin.math.max
import kotlin.math.min

// x1, y1, x2, y2 are one detection's normalized corners from the model output.
val x1pLb = min(x1, x2) * inputW // letterboxed pixel coords
val y1pLb = min(y1, y2) * inputH
val x2pLb = max(x1, x2) * inputW
val y2pLb = max(y1, y2) * inputH

// Subtract padding offset, divide by scale ratio → original image coords.
// imgWF / imgHF are not defined in this excerpt; presumably imgW / imgH as Float.
val left = ((x1pLb - padX) / scale).coerceIn(0f, imgWF)
val top = ((y1pLb - padY) / scale).coerceIn(0f, imgHF)
val right = ((x2pLb - padX) / scale).coerceIn(0f, imgWF)
val bottom = ((y2pLb - padY) / scale).coerceIn(0f, imgHF)
```

If you skip the un-letterbox step, bboxes stay in letterboxed-tensor coordinates: they are shifted by the padding and scaled by the wrong factor, most visibly at the edges of the frame.
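
One way to guard against mistakes in this step is a round-trip check that needs no Android classes. The sketch below restates the letterbox math as pure functions so it can run in a plain unit test; `Letterbox`, `letterboxFor`, `toTensorNorm`, and `toImagePx` are hypothetical names, not functions from `reference_code/`.

```kotlin
import kotlin.math.min
import kotlin.math.roundToInt

/** Letterbox parameters for one frame (hypothetical helper type). */
data class Letterbox(val scale: Float, val padX: Float, val padY: Float)

/** Same scale and padding rule as the detector excerpt above. */
fun letterboxFor(imgW: Int, imgH: Int, inputW: Int, inputH: Int): Letterbox {
    val scale = min(inputW.toFloat() / imgW, inputH.toFloat() / imgH)
    val scaledW = (imgW * scale).roundToInt().coerceAtLeast(1)
    val scaledH = (imgH * scale).roundToInt().coerceAtLeast(1)
    return Letterbox(scale, (inputW - scaledW) / 2f, (inputH - scaledH) / 2f)
}

/** Original-image pixel -> normalized [0, 1] coordinate on the letterboxed tensor. */
fun toTensorNorm(px: Float, pad: Float, scale: Float, inputSize: Int): Float =
    (px * scale + pad) / inputSize

/** Normalized letterboxed coordinate -> original-image pixel, clamped to the image. */
fun toImagePx(norm: Float, pad: Float, scale: Float, inputSize: Int, imgSize: Int): Float =
    ((norm * inputSize - pad) / scale).coerceIn(0f, imgSize.toFloat())
```

For a 1920×1080 frame and the 512×384 model this gives `padX = 0` and `padY = 48`, and mapping a pixel to tensor space and back should return the original value to within floating-point error.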

## TFLite interpreter setup

The reference `CustomObjectModel.android.kt` shows the full delegate fallback chain (GPU → NNAPI → CPU) with logging. Key snippet:

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate

var selectedDelegate = "CPU"
val (options, gpuDelegate) = runCatching {
    val delegate = GpuDelegate()
    val opts = Interpreter.Options().apply {
        addDelegate(delegate)
        setNumThreads(2)
    }
    selectedDelegate = "GPU"
    Logger.i { "TFLite: GPU delegate constructed" }
    opts to delegate
}.onFailure { t ->
    Logger.w(t) { "TFLite: GPU delegate not available; trying NNAPI" }
}.getOrElse {
    // fall back to NNAPI, then CPU 4-thread
    // ... see reference_code/CustomObjectModel.android.kt
    // The NNAPI step is elided in this excerpt; the line below is only the
    // final CPU fallback so that the snippet yields a value on this branch.
    selectedDelegate = "CPU"
    Interpreter.Options().apply { setNumThreads(4) } to null
}

// `model` and `path` are defined elsewhere in the reference file.
val interpreter = Interpreter(model, options)

// IMPORTANT: log the actual loaded shape so you can verify GPU is engaging
val inputShape = interpreter.getInputTensor(0)?.shape()
val outputShape = interpreter.getOutputTensor(0)?.shape()
Logger.i {
    "TFLite: model='$path' delegate=$selectedDelegate " +
        "inputShape=${inputShape?.toList()} outputShape=${outputShape?.toList()}"
}
```

**Verify after install**: after launching the app, `adb logcat -s TFLite:I` should show one line per model load. (This filter assumes the log tag is `TFLite`; adjust it if the project's `Logger` uses a different tag.) Confirm:
- `delegate=GPU` (not CPU, which would mean ~10× slower)
- `inputShape=[1, 384, 512, 3]` for `yolo26n_v11_rect_512x384.tflite` (or the matching shape for the chosen model)
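
The shape check can also be enforced in code rather than only read from the log. This is a minimal sketch, assuming an NHWC input with 3 channels; `requireInputShape` is a hypothetical helper, and `actual` would come from `interpreter.getInputTensor(0).shape()`.

```kotlin
/**
 * Fails fast if a loaded model's input tensor shape is not [1, expectedH, expectedW, 3].
 * Throws IllegalArgumentException with both shapes in the message.
 */
fun requireInputShape(actual: IntArray, expectedH: Int, expectedW: Int) {
    val expected = intArrayOf(1, expectedH, expectedW, 3)
    require(actual.contentEquals(expected)) {
        "Model input shape ${actual.toList()} != expected ${expected.toList()}"
    }
}
```

For the recommended model the call would be `requireInputShape(shape, expectedH = 384, expectedW = 512)`; a mislabeled file then fails at load time instead of producing poor detections.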

## Output parsing

All models emit 300 post-NMS detections with 6 fields each, normally as a `[1, 300, 6]` tensor:

```kotlin
import org.tensorflow.lite.DataType
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer

val outputShape = interpreter.getOutputTensor(0).shape() // [1, 300, 6]
val output = TensorBuffer.createFixedSize(outputShape, DataType.FLOAT32)
interpreter.run(tensorImage.buffer, output.buffer)
val array = output.floatArray // 1800 floats

// Layout: 300 rows of 6 columns each, sorted by confidence descending
// Row i: array[i*6 .. i*6+5] = [x1, y1, x2, y2, conf, cls]
for (i in 0 until 300) {
    val row = i * 6
    val conf = array[row + 4]
    if (conf < 0.25f) break // rows are sorted, so the rest are below threshold or empty/zero
    val x1 = array[row + 0]
    val y1 = array[row + 1]
    val x2 = array[row + 2]
    val y2 = array[row + 3]
    val cls = array[row + 5].toInt() // 0=basketball, 1=basketball_hoop
    // (then un-letterbox per the snippet above and emit a detection)
}
```

The reference `ImageDetector.android.kt` handles a subtle wrinkle: depending on how ultralytics happens to export, the output may be `[1, 300, 6]` (300 detections, 6 fields each) OR `[1, 6, 300]` (6 fields, 300 detections each). The reference code auto-detects the layout by checking whether the last dimension is 6: if so, each row is one detection (what it calls elements-first); otherwise each row is one field (channels-first). Worth keeping that flexibility.
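
A layout-agnostic accessor keeps the parsing loop identical for both export shapes. The following is a sketch of that idea under the two layouts described above, not the reference implementation; `field` is a hypothetical name.

```kotlin
/**
 * Reads field `f` (0..5 = x1, y1, x2, y2, conf, cls) of detection `i` from a flat
 * output buffer, for either a [1, N, 6] or a [1, 6, N] output shape.
 */
fun field(flat: FloatArray, shape: IntArray, i: Int, f: Int): Float {
    require(shape.size == 3 && shape[0] == 1) { "unexpected output shape: ${shape.toList()}" }
    return if (shape[2] == 6) {
        flat[i * 6 + f]          // [1, N, 6]: one row per detection
    } else {
        require(shape[1] == 6) { "no dimension of size 6 in ${shape.toList()}" }
        flat[f * shape[2] + i]   // [1, 6, N]: one row per field
    }
}
```

With this in place, the loop reads `field(array, outputShape, i, 4)` for the confidence instead of indexing `array[row + 4]` directly.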

## Permissions + dependencies

The kima app likely already has these, but for completeness:

```xml
<!-- AndroidManifest.xml -->
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" android:required="false" />
```

```kotlin
// build.gradle.kts dependencies
implementation("androidx.camera:camera-core:1.x.x")
implementation("androidx.camera:camera-camera2:1.x.x")
implementation("androidx.camera:camera-lifecycle:1.x.x")
implementation("androidx.camera:camera-view:1.x.x")

implementation("org.tensorflow:tensorflow-lite:2.x.x")
implementation("org.tensorflow:tensorflow-lite-gpu:2.x.x")
implementation("org.tensorflow:tensorflow-lite-support:0.x.x")
```

## Common gotchas

1. **Wrong delegate**: if the GPU delegate fails silently (e.g. on some emulators or low-end devices) and falls back to CPU, inference goes from ~7 Hz to ~1 Hz. The logging in `CustomObjectModel.android.kt` makes this visible.

2. **Wrong input dtype**: input is **float32** even for the fp16-internal models. The fp16 versions only store their weights as fp16; their I/O tensors are still float32. Don't try to feed int8/uint8 buffers without re-quantizing.

3. **Pixel normalization**: YOLO expects pixel values in `[0, 1]`, not `[0, 255]`. The reference uses `NormalizeOp(0f, 255f)`, which computes `(x - 0) / 255` and so maps `[0, 255]` to `[0, 1]`. Skipping this is a common cause of "model loads fine, detections all garbage".

4. **NMS double-application**: these models have NMS baked in. **Don't run another NMS pass on the output**; it can discard valid detections.

5. **Coordinate system after rotation**: if you rotate the bitmap to landscape inside the detector (the alternative described under "Lock the activity to landscape"), you also need to track that rotation when emitting bounding box coordinates back to the rest of the app; otherwise downstream consumers see boxes in the wrong reference frame.

6. **Filename probe before shipping**: see `LESSONS_LEARNED.md` and the `MODELS.md` shape table. After copying any `.tflite` file, load it once with `tf.lite.Interpreter` and verify that the input shape matches what you expect. Filenames have lied before in this project.
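
For gotchas 2 and 3, it can help to see the conversion written out by hand. The sketch below is a dependency-free equivalent of dividing by 255, not the reference's `NormalizeOp` path; `argbToNormalizedRgb` is a hypothetical helper that assumes packed ARGB pixels such as those returned by `Bitmap.getPixels`.

```kotlin
/**
 * Converts packed ARGB pixels into a float32 array scaled to [0, 1],
 * laid out as R, G, B per pixel. The alpha channel is dropped.
 */
fun argbToNormalizedRgb(pixels: IntArray): FloatArray {
    val out = FloatArray(pixels.size * 3)
    for ((i, p) in pixels.withIndex()) {
        out[i * 3 + 0] = ((p shr 16) and 0xFF) / 255f // red
        out[i * 3 + 1] = ((p shr 8) and 0xFF) / 255f  // green
        out[i * 3 + 2] = (p and 0xFF) / 255f          // blue
    }
    return out
}
```

A quick sanity check on the result: every value should lie in `[0, 1]`, and a letterbox-gray pixel (114, 114, 114) should come out as roughly 0.447 in all three channels.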