# Android Integration Requirements

This document explains what the kima Android app must implement to consume the models in `models/`. All requirements are derived from working code in `reference_code/`; copy or adapt those files as needed.

## Critical: camera config + activity orientation

The recommended model `yolo26n_v11_rect_512x384.tflite` expects a **landscape 4:3** input. If the camera delivers any other aspect ratio or orientation, the detector's letterbox math will pad with gray and waste capacity. The square models (`square_416`, `square_512`) tolerate any aspect ratio gracefully via letterboxing, but they're slower for the same effective resolution.

### 1. Pin CameraX `ImageAnalysis` to 4:3

In whichever Composable or Activity sets up the CameraX use cases, the `ImageAnalysis.Builder` needs `setTargetAspectRatio(AspectRatio.RATIO_4_3)`:

```kotlin
import androidx.camera.core.AspectRatio
import androidx.camera.core.ImageAnalysis

val imageAnalysis = ImageAnalysis.Builder()
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
    .setTargetAspectRatio(AspectRatio.RATIO_4_3) // ← CRITICAL for the rect model
    .build()
```

`setTargetAspectRatio` is a *hint*: CameraX picks the closest available 4:3 mode (typically 1280×960, 1440×1080, or 2048×1536 depending on the device). The actual W×H still varies by device, but the aspect is 4:3 whenever the device offers a 4:3 mode. On CameraX 1.3 and later this setter is deprecated in favor of a `ResolutionSelector` with an `AspectRatioStrategy`, which expresses the same preference.

If a device can't deliver 4:3 (rare), the existing letterbox math in the detector handles it gracefully. At worst, the model wastes some pixels on padding and quality drops slightly, but it doesn't crash.

See `reference_code/CameraView_snippet.kt` for the full surrounding context.

### 2. Lock the activity to landscape

Without the lock, the camera frame arrives portrait-rotated and the landscape model (512×384, W×H) receives a portrait frame: wrong aspect, large letterbox bars. Add to the main activity in `AndroidManifest.xml`:

```xml
<activity
    android:name=".MainActivity"
    android:screenOrientation="landscape"
    ...>
```

See `reference_code/AndroidManifest_snippet.xml` for an example.

**Alternative if landscape lock isn't acceptable** (e.g., kima needs portrait support too): inside the detector, rotate the bitmap by 90° before resizing if `imgH > imgW`. This adds minimal CPU cost. An implementation sketch is in `LESSONS_LEARNED.md`.

## Critical: letterbox-aware preprocessing in the detector

YOLO models are trained on **letterboxed** input: the source image is scaled to fit the model's input shape while preserving aspect ratio, then padded with gray (RGB 114,114,114) to fill any remaining space. **The Android detector MUST do the same**, otherwise the model's geometry priors are violated and detection quality plummets.

If the camera aspect already matches the model aspect (4:3 camera + 4:3 model), the letterbox math degenerates to a plain resize with `padX=0, padY=0`, but you still need the math in place for the cases where it doesn't match.

### Reference implementation

`reference_code/ImageDetector.android.kt` is the working letterbox-aware detector from this project. Key points:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import kotlin.math.min
import kotlin.math.roundToInt

// Letterbox: scale source to fit model input while preserving aspect ratio.
val scale = min(
    inputW.toFloat() / imgW.toFloat(),
    inputH.toFloat() / imgH.toFloat()
)
val scaledW = (imgW * scale).roundToInt().coerceAtLeast(1)
val scaledH = (imgH * scale).roundToInt().coerceAtLeast(1)
val padX = (inputW - scaledW) / 2f
val padY = (inputH - scaledH) / 2f

val resized = Bitmap.createBitmap(inputW, inputH, Bitmap.Config.ARGB_8888)
val canvas = Canvas(resized)
canvas.drawColor(Color.rgb(114, 114, 114)) // ← gray fill matching YOLO training
canvas.drawBitmap(
    srcBitmap, null,
    RectF(padX, padY, padX + scaledW, padY + scaledH),
    Paint(Paint.FILTER_BITMAP_FLAG)
)
```

After inference, the output bboxes (which are normalized `[0, 1]` over the **letterboxed** input tensor) need to be **un-letterboxed** back to the original image's coordinate space:

```kotlin
val x1pLb = min(x1, x2) * inputW // letterboxed pixel coords
val y1pLb = min(y1, y2) * inputH
val x2pLb = max(x1, x2) * inputW
val y2pLb = max(y1, y2) * inputH

// Subtract padding offset, divide by scale ratio → original image coords
val left = ((x1pLb - padX) / scale).coerceIn(0f, imgWF)
val top = ((y1pLb - padY) / scale).coerceIn(0f, imgHF)
val right = ((x2pLb - padX) / scale).coerceIn(0f, imgWF)
val bottom = ((y2pLb - padY) / scale).coerceIn(0f, imgHF)
```

If you skip the un-letterbox step, bboxes will be offset and scaled wrong, most visibly at the edges of the frame.

## TFLite interpreter setup

The reference `CustomObjectModel.android.kt` shows the full delegate fallback chain (GPU → NNAPI → CPU) with logging. Key snippet:

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate

var selectedDelegate = "CPU"
val (options, gpuDelegate) = runCatching {
    val delegate = GpuDelegate()
    val opts = Interpreter.Options().apply {
        addDelegate(delegate)
        setNumThreads(2)
    }
    selectedDelegate = "GPU"
    Logger.i { "TFLite: GPU delegate constructed" }
    opts to delegate
}.onFailure { t ->
    Logger.w(t) { "TFLite: GPU delegate not available; trying NNAPI" }
}.getOrElse {
    // fall back to NNAPI, then CPU 4-thread
    // ... see reference_code/CustomObjectModel.android.kt
}

val interpreter = Interpreter(model, options)

// IMPORTANT: log the selected delegate and the loaded shapes so both can be verified
val inputShape = interpreter.getInputTensor(0)?.shape()
val outputShape = interpreter.getOutputTensor(0)?.shape()
Logger.i {
    "TFLite: model='$path' delegate=$selectedDelegate " +
        "inputShape=${inputShape?.toList()} outputShape=${outputShape?.toList()}"
}
```

**Verify after install**: `adb logcat -s TFLite:I` after launching the app should show one line per model load. Confirm:

- `delegate=GPU` (not CPU, which would mean ~10× slower)
- `inputShape=[1, 384, 512, 3]` for `yolo26n_v11_rect_512x384.tflite` (or the shape matching the chosen model)

## Output parsing

All models output `[1, 300, 6]` post-NMS:

```kotlin
import org.tensorflow.lite.DataType
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer

val outputShape = interpreter.getOutputTensor(0).shape() // [1, 300, 6]
val output = TensorBuffer.createFixedSize(outputShape, DataType.FLOAT32)
interpreter.run(tensorImage.buffer, output.buffer)
val array = output.floatArray // 1800 floats

// Layout: 300 rows of 6 columns each, sorted by confidence descending
// Row i: array[i*6 .. i*6+5] = [x1, y1, x2, y2, conf, cls]
for (i in 0 until 300) {
    val row = i * 6
    val conf = array[row + 4]
    if (conf < 0.25f) break // rows are sorted, so the remaining rows are empty/zero
    val x1 = array[row + 0]
    val y1 = array[row + 1]
    val x2 = array[row + 2]
    val y2 = array[row + 3]
    val cls = array[row + 5].toInt() // 0=basketball, 1=basketball_hoop
    // (then un-letterbox per the snippet above and emit a detection)
}
```

The reference `ImageDetector.android.kt` handles a subtle wrinkle: depending on how ultralytics happens to export, the output may be `[1, 300, 6]` (300 detections, 6 fields each) OR `[1, 6, 300]` (6 fields, 300 detections each). The reference code auto-detects the layout via the `dim2 == 6 ? elements-first : channels-first` check, that is, by testing whether the last dimension is 6. The loop above assumes the `[1, 300, 6]` layout. It is worth keeping that flexibility.

## Permissions + dependencies

The kima app likely already has these, but for completeness:

```xml
<!-- AndroidManifest.xml -->
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" android:required="false" />
```

```kotlin
// build.gradle.kts dependencies
implementation("androidx.camera:camera-core:1.x.x")
implementation("androidx.camera:camera-camera2:1.x.x")
implementation("androidx.camera:camera-lifecycle:1.x.x")
implementation("androidx.camera:camera-view:1.x.x")

implementation("org.tensorflow:tensorflow-lite:2.x.x")
implementation("org.tensorflow:tensorflow-lite-gpu:2.x.x")
implementation("org.tensorflow:tensorflow-lite-support:0.x.x")
```

## Common gotchas

1. **Wrong delegate**: if the GPU delegate fails silently (e.g. on some emulators or low-end devices) and falls back to CPU, inference goes from ~7 Hz to ~1 Hz. The logging in `CustomObjectModel.android.kt` makes this visible.

2. **Wrong input dtype**: input is **float32** even for the fp16-internal models. The fp16 versions only have fp16 weights internally; their I/O tensors are still float32. Don't try to feed Int8/Uint8 buffers without re-quantizing.

3. **Pixel normalization**: YOLO expects pixel values in `[0, 1]`, not `[0, 255]`. The reference uses `NormalizeOp(0f, 255f)`, which performs `(x - 0) / 255 → [0, 1]`. Skipping this is a common cause of "model loads fine, detections all garbage".

4. **NMS double-application**: these models have NMS baked in. **Don't run another NMS pass on the output**; it will discard valid detections.

5. **Coordinate system after rotation**: if you rotate the bitmap to landscape inside the detector (the alternative described under "Lock the activity to landscape"), you also need to track that rotation when emitting bounding box coordinates back to the rest of the app, otherwise downstream consumers see boxes in the wrong reference frame.

6. **Filename probe before shipping**: see `LESSONS_LEARNED.md` and the `MODELS.md` shape table. Always run `tf.lite.Interpreter` against any `.tflite` file once after copying it to verify the input shape matches what you expect. Filenames have lied before in this project.
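Several of the problems above, the letterbox and un-letterbox arithmetic in particular, can be checked off-device in a plain JVM unit test. The following is a minimal sketch under stated assumptions, not the reference implementation: `Letterbox`, `Box`, `computeLetterbox`, and `unletterbox` are hypothetical names introduced here for illustration, and the arithmetic is meant to mirror the snippets quoted from `reference_code/ImageDetector.android.kt` without any Android dependencies.

```kotlin
import kotlin.math.max
import kotlin.math.min
import kotlin.math.roundToInt

/** Letterbox parameters for one (source size, model input size) pair. Hypothetical helper type. */
data class Letterbox(val scale: Float, val padX: Float, val padY: Float)

/** Axis-aligned box in source-image pixel coordinates. Hypothetical helper type. */
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

/** Fits the source inside the model input, preserving aspect ratio, and centers it. */
fun computeLetterbox(imgW: Int, imgH: Int, inputW: Int, inputH: Int): Letterbox {
    require(imgW > 0 && imgH > 0 && inputW > 0 && inputH > 0) { "sizes must be positive" }
    val scale = min(inputW.toFloat() / imgW, inputH.toFloat() / imgH)
    val scaledW = (imgW * scale).roundToInt().coerceAtLeast(1)
    val scaledH = (imgH * scale).roundToInt().coerceAtLeast(1)
    // Padding is split evenly between the two sides of the padded axis.
    return Letterbox(scale, (inputW - scaledW) / 2f, (inputH - scaledH) / 2f)
}

/** Maps a normalized [0, 1] box over the letterboxed tensor back to source-image pixels. */
fun unletterbox(
    x1: Float, y1: Float, x2: Float, y2: Float,
    lb: Letterbox, imgW: Int, imgH: Int, inputW: Int, inputH: Int
): Box {
    // Normalized → tensor pixels, remove the padding offset, undo the scale, clamp to the image.
    fun mapX(x: Float) = ((x * inputW - lb.padX) / lb.scale).coerceIn(0f, imgW.toFloat())
    fun mapY(y: Float) = ((y * inputH - lb.padY) / lb.scale).coerceIn(0f, imgH.toFloat())
    return Box(mapX(min(x1, x2)), mapY(min(y1, y2)), mapX(max(x1, x2)), mapY(max(y1, y2)))
}
```

For a 1920×1080 frame and the 512×384 model, `computeLetterbox(1920, 1080, 512, 384)` yields `padX = 0` and `padY = 48`, so a detection spanning the full height of the visible content (normalized `y` from 0.125 to 0.875) maps back to the full 1080-pixel image height. A 4:3 frame such as 1280×960 yields zero padding on both axes, which is the degenerate plain-resize case described earlier.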