# Android Integration Requirements

This document explains what the kima Android app must implement to consume the models in `models/`. All requirements are derived from working code in `reference_code/`; copy or adapt those files as needed.

## Critical: camera config + activity orientation

The recommended model `yolo26n_v11_rect_512x384.tflite` expects a **landscape 4:3** input. If the camera delivers any other aspect ratio or orientation, the detector's letterbox math pads the frame with gray, so part of the model's input is spent on padding instead of image content. The square models (`square_416`, `square_512`) tolerate any aspect ratio gracefully via letterboxing, but they're slower for the same effective resolution.

### 1. Pin CameraX `ImageAnalysis` to 4:3

In whichever Compose function or Activity sets up the CameraX use cases, the `ImageAnalysis.Builder` needs `setTargetAspectRatio(AspectRatio.RATIO_4_3)`:

```kotlin
import androidx.camera.core.AspectRatio
import androidx.camera.core.ImageAnalysis

val imageAnalysis = ImageAnalysis.Builder()
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
    .setTargetAspectRatio(AspectRatio.RATIO_4_3)   // ← CRITICAL for the rect model
    .build()
```

`setTargetAspectRatio` is a *hint*: CameraX picks the closest available 4:3 mode (typically 1280×960, 1440×1080, or 2048×1536 depending on the device). The actual W×H varies by device, and the *aspect* is 4:3 whenever the device offers a 4:3 mode.

If a device can't deliver 4:3 (rare), the existing letterbox math in the detector handles it gracefully: at worst, the model wastes some pixels on padding and quality drops slightly, but it doesn't crash.
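
Because the aspect ratio is only a request, it can be worth checking the size of the first delivered frame. The following is a minimal sketch in plain Kotlin, assuming the width and height are read from the frame handed to the analyzer; the helper name `isCloseToAspect` and the 2% tolerance are illustrative and not taken from the reference code.

```kotlin
import kotlin.math.abs

// Hypothetical helper: true when width:height is within `tolerance`
// (relative error) of the requested aspect ratio, 4:3 by default.
fun isCloseToAspect(
    width: Int,
    height: Int,
    aspectW: Int = 4,
    aspectH: Int = 3,
    tolerance: Float = 0.02f
): Boolean {
    require(width > 0 && height > 0) { "frame dimensions must be positive" }
    val actual = width.toFloat() / height.toFloat()
    val wanted = aspectW.toFloat() / aspectH.toFloat()
    return abs(actual - wanted) / wanted <= tolerance
}
```

One possible use is to log a single warning when the first frame fails this check, so a device that fell back to another aspect ratio shows up in the logs.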

See `reference_code/CameraView_snippet.kt` for the full surrounding context.

### 2. Lock the activity to landscape

Without a landscape lock, the camera frame arrives portrait-rotated and the 512×384 (W×H) landscape model receives a portrait 3:4 frame: wrong aspect, large letterbox bars. Add to the main activity in `AndroidManifest.xml`:

```xml
<activity
    android:name=".MainActivity"
    android:screenOrientation="landscape"
    ...>
```

See `reference_code/AndroidManifest_snippet.xml` for an example.

**Alternative if landscape lock isn't acceptable** (e.g., kima needs portrait support too): inside the detector, rotate the bitmap by 90° before resizing if `imgH > imgW`. This adds minimal CPU cost, but the detector must then map boxes back to the unrotated frame (see item 5 under Common gotchas). Implementation sketch in `LESSONS_LEARNED.md`.
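
When the bitmap is rotated inside the detector, boxes come out in the rotated frame and need to be mapped back before they leave the detector. Below is a minimal sketch of that mapping in plain Kotlin. It assumes the rotation is 90° clockwise (the direction is an assumption; the text above only says 90°) and uses a hypothetical `Box` type rather than anything from the reference code.

```kotlin
// Hypothetical box type in pixel coordinates (origin top-left, y pointing down).
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

// Maps a box from a frame that was rotated 90° clockwise back into the original
// (unrotated) frame. `origH` is the height of the original portrait image, which
// equals the width of the rotated image.
// The forward rotation sends (x, y) to (origH - y, x), so the inverse sends
// (x', y') to (y', origH - x').
fun unrotateBoxFrom90Cw(rotated: Box, origH: Float): Box = Box(
    left = rotated.top,
    top = origH - rotated.right,
    right = rotated.bottom,
    bottom = origH - rotated.left
)
```

For a 1080×1920 portrait source, a box found in the rotated 1920×1080 frame would be passed through `unrotateBoxFrom90Cw(box, 1920f)` after the un-letterbox step.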

## Critical: letterbox-aware preprocessing in the detector

YOLO models are trained on **letterboxed** input: the source image is scaled to fit the model's input shape while preserving aspect ratio, then padded with gray (RGB 114,114,114) to fill any remaining space. **The Android detector MUST do the same**, otherwise the model's geometry priors are violated and detection quality plummets.

If the camera aspect already matches the model aspect (4:3 camera + 4:3 model), the letterbox math degenerates to a plain resize with `padX=0, padY=0`, but you still need the math in place for the cases where it doesn't match.

### Reference implementation

`reference_code/ImageDetector.android.kt` is the working letterbox-aware detector from this project. Key points:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import kotlin.math.min
import kotlin.math.roundToInt

// Letterbox: scale source to fit model input while preserving aspect ratio.
// imgW/imgH = source bitmap size, inputW/inputH = model input size.
val scale = min(
    inputW.toFloat() / imgW.toFloat(),
    inputH.toFloat() / imgH.toFloat()
)
val scaledW = (imgW * scale).roundToInt().coerceAtLeast(1)
val scaledH = (imgH * scale).roundToInt().coerceAtLeast(1)
// Center the scaled image; the remaining border is padding.
val padX = (inputW - scaledW) / 2f
val padY = (inputH - scaledH) / 2f

val resized = Bitmap.createBitmap(inputW, inputH, Bitmap.Config.ARGB_8888)
val canvas = Canvas(resized)
canvas.drawColor(Color.rgb(114, 114, 114))   // ← gray fill matching YOLO training
canvas.drawBitmap(
    srcBitmap, null,
    RectF(padX, padY, padX + scaledW, padY + scaledH),
    Paint(Paint.FILTER_BITMAP_FLAG)
)
```
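
To see the numbers the letterbox math produces, the geometry can be pulled out into a function with no Android dependencies. This is a sketch under the assumption that the math matches the snippet above; the `Letterbox` type and `computeLetterbox` name are hypothetical.

```kotlin
import kotlin.math.min
import kotlin.math.roundToInt

// Hypothetical container for the letterbox geometry.
data class Letterbox(
    val scale: Float,
    val scaledW: Int,
    val scaledH: Int,
    val padX: Float,
    val padY: Float
)

// Same arithmetic as the reference snippet, without the bitmap drawing.
fun computeLetterbox(imgW: Int, imgH: Int, inputW: Int, inputH: Int): Letterbox {
    val scale = min(inputW.toFloat() / imgW, inputH.toFloat() / imgH)
    val scaledW = (imgW * scale).roundToInt().coerceAtLeast(1)
    val scaledH = (imgH * scale).roundToInt().coerceAtLeast(1)
    return Letterbox(
        scale = scale,
        scaledW = scaledW,
        scaledH = scaledH,
        padX = (inputW - scaledW) / 2f,
        padY = (inputH - scaledH) / 2f
    )
}
```

With a 512×384 model input, a 4:3 frame such as 1280×960 scales to 512×384 with no padding, while a 16:9 frame such as 1920×1080 scales to 512×288 and gets 48 rows of gray above and below.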

After inference, the output bboxes (which are normalized `[0, 1]` over the **letterboxed** input tensor) need to be **un-letterboxed** back to the original image's coordinate space:

```kotlin
import kotlin.math.max
import kotlin.math.min

val imgWF = imgW.toFloat()
val imgHF = imgH.toFloat()

val x1pLb = min(x1, x2) * inputW   // letterboxed pixel coords
val y1pLb = min(y1, y2) * inputH
val x2pLb = max(x1, x2) * inputW
val y2pLb = max(y1, y2) * inputH

// Subtract padding offset, divide by scale ratio → original image coords
val left = ((x1pLb - padX) / scale).coerceIn(0f, imgWF)
val top = ((y1pLb - padY) / scale).coerceIn(0f, imgHF)
val right = ((x2pLb - padX) / scale).coerceIn(0f, imgWF)
val bottom = ((y2pLb - padY) / scale).coerceIn(0f, imgHF)
```

If you skip the un-letterbox step, bboxes will be offset and scaled wrong whenever the padding is non-zero, most visibly at the edges of the frame.
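
The un-letterbox step can also be written as a pure function, which makes it easy to unit test off-device. The sketch below assumes the same `scale`, `padX` and `padY` that were used to build the letterbox; `PixelBox` and `unletterbox` are hypothetical names.

```kotlin
import kotlin.math.max
import kotlin.math.min

// Hypothetical result type: a box in original-image pixel coordinates.
data class PixelBox(val left: Float, val top: Float, val right: Float, val bottom: Float)

// x1..y2 are normalized [0, 1] over the letterboxed input tensor.
fun unletterbox(
    x1: Float, y1: Float, x2: Float, y2: Float,
    inputW: Int, inputH: Int,
    scale: Float, padX: Float, padY: Float,
    imgW: Int, imgH: Int
): PixelBox {
    // Normalized → letterboxed pixels → remove padding → undo scale → clamp to image.
    fun mapX(x: Float) = ((x * inputW - padX) / scale).coerceIn(0f, imgW.toFloat())
    fun mapY(y: Float) = ((y * inputH - padY) / scale).coerceIn(0f, imgH.toFloat())
    return PixelBox(
        left = mapX(min(x1, x2)),
        top = mapY(min(y1, y2)),
        right = mapX(max(x1, x2)),
        bottom = mapY(max(y1, y2))
    )
}
```

For a 1920×1080 source letterboxed into 512×384 (scale 512/1920, `padY = 48`), a box spanning the full image area of the tensor maps back to approximately the full 1920×1080 frame, and anything that falls in the gray bars is clamped to the image edge.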

## TFLite interpreter setup

The reference `CustomObjectModel.android.kt` shows the full delegate fallback chain (GPU → NNAPI → CPU) with logging. Key snippet:

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate

var selectedDelegate = "CPU"
val (options, gpuDelegate) = runCatching {
    val delegate = GpuDelegate()
    val opts = Interpreter.Options().apply {
        addDelegate(delegate)
        setNumThreads(2)
    }
    selectedDelegate = "GPU"
    Logger.i { "TFLite: GPU delegate constructed" }
    opts to delegate
}.onFailure { t ->
    Logger.w(t) { "TFLite: GPU delegate not available; trying NNAPI" }
}.getOrElse {
    // The reference falls back to NNAPI, then CPU 4-thread:
    // ... see reference_code/CustomObjectModel.android.kt
    // Simplified here to the final CPU step so the snippet compiles.
    Interpreter.Options().apply { setNumThreads(4) } to null
}

// `model` is the loaded model buffer and `path` its asset path (defined elsewhere).
val interpreter = Interpreter(model, options)

// IMPORTANT: log the delegate and the loaded shapes so you can verify that GPU
// is engaging and that the model file is the one you expect
val inputShape = interpreter.getInputTensor(0)?.shape()
val outputShape = interpreter.getOutputTensor(0)?.shape()
Logger.i {
    "TFLite: model='$path' delegate=$selectedDelegate " +
        "inputShape=${inputShape?.toList()} outputShape=${outputShape?.toList()}"
}
```
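
The fallback pattern itself does not depend on TFLite. As an illustration only, here is a minimal sketch in plain Kotlin where each candidate is a name plus a factory that may throw; `firstWorking` and the candidate names are hypothetical, and the factories stand in for building `Interpreter.Options` with a given delegate.

```kotlin
// Tries each named factory in order and returns the first one that succeeds,
// together with its name so the chosen backend can be logged.
fun <T> firstWorking(
    candidates: List<Pair<String, () -> T>>,
    onFailure: (name: String, error: Throwable) -> Unit = { _, _ -> }
): Pair<String, T> {
    for ((name, factory) in candidates) {
        try {
            return name to factory()
        } catch (t: Throwable) {
            // Catches Throwable, like runCatching, because a failing backend
            // may raise an Error rather than an Exception.
            onFailure(name, t)
        }
    }
    error("no candidate could be constructed")
}
```

Returning the name alongside the value keeps the "which backend actually loaded" information next to the object, instead of in a separate mutable variable.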

**Verify after install**: after launching the app, `adb logcat -s TFLite:I` should show one line per model load (this assumes the logger's tag is `TFLite`; if it is not, filter the log output for the `TFLite:` message prefix instead). Confirm:
- `delegate=GPU` (not CPU, which is several times slower; see item 1 under Common gotchas)
- `inputShape=[1, 384, 512, 3]` for `yolo26n_v11_rect_512x384.tflite` (or match the chosen model)

## Output parsing

All models output 300 post-NMS detections with 6 fields each, normally as `[1, 300, 6]`:

```kotlin
import org.tensorflow.lite.DataType
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer

val outputShape = interpreter.getOutputTensor(0).shape()  // [1, 300, 6]
val output = TensorBuffer.createFixedSize(outputShape, DataType.FLOAT32)
interpreter.run(tensorImage.buffer, output.buffer)
val array = output.floatArray   // 1800 floats

// Layout: 300 rows of 6 columns each, sorted by confidence descending
// Row i: array[i*6 .. i*6+5] = [x1, y1, x2, y2, conf, cls]
for (i in 0 until 300) {
    val row = i * 6
    val conf = array[row + 4]
    if (conf < 0.25f) break  // rows are sorted, so the remaining rows are empty/zero
    val x1 = array[row + 0]
    val y1 = array[row + 1]
    val x2 = array[row + 2]
    val y2 = array[row + 3]
    val cls = array[row + 5].toInt()  // 0=basketball, 1=basketball_hoop
    // (then un-letterbox per the snippet above and emit a detection)
}
```

The reference `ImageDetector.android.kt` handles a subtle wrinkle: depending on how ultralytics exports the model, the output may be `[1, 300, 6]` (300 detections, 6 fields each) or `[1, 6, 300]` (6 fields, 300 values each). The reference code auto-detects the layout by checking whether `dim2` (the last dimension) equals 6: if so, each row is one detection; otherwise each row is one field. Worth keeping that flexibility.
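
A layout-agnostic reader can be sketched in plain Kotlin as follows. This is not the reference implementation: `RawDetection` and `parseDetections` are hypothetical names, and the sketch filters every row by confidence instead of relying on the rows being sorted.

```kotlin
// Hypothetical detection record; coordinates are still normalized and letterboxed.
data class RawDetection(
    val x1: Float, val y1: Float, val x2: Float, val y2: Float,
    val conf: Float, val cls: Int
)

// Reads detections from a flat output buffer whose shape is either
// [1, N, 6] (one row per detection) or [1, 6, N] (one row per field).
fun parseDetections(
    flat: FloatArray,
    shape: IntArray,
    minConf: Float = 0.25f
): List<RawDetection> {
    require(shape.size == 3 && shape[0] == 1) { "expected a [1, a, b] output shape" }
    val rowPerDetection = shape[2] == 6
    require(rowPerDetection || shape[1] == 6) { "neither dimension has 6 fields" }
    val n = if (rowPerDetection) shape[1] else shape[2]
    require(flat.size == n * 6) { "buffer size does not match shape" }

    // Field f of detection i lives at i*6+f in [1, N, 6] and at f*n+i in [1, 6, N].
    fun field(i: Int, f: Int) = if (rowPerDetection) flat[i * 6 + f] else flat[f * n + i]

    val result = ArrayList<RawDetection>()
    for (i in 0 until n) {
        val conf = field(i, 4)
        if (conf < minConf) continue
        result.add(
            RawDetection(
                field(i, 0), field(i, 1), field(i, 2), field(i, 3),
                conf, field(i, 5).toInt()
            )
        )
    }
    return result
}
```

It would be called with `output.floatArray` and the shape reported by the interpreter, so the same code path serves both export layouts.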

## Permissions + dependencies

The kima app likely already has these, but for completeness:

```xml
<!-- AndroidManifest.xml -->
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" android:required="false" />
```

```kotlin
// build.gradle.kts dependencies
implementation("androidx.camera:camera-core:1.x.x")
implementation("androidx.camera:camera-camera2:1.x.x")
implementation("androidx.camera:camera-lifecycle:1.x.x")
implementation("androidx.camera:camera-view:1.x.x")

implementation("org.tensorflow:tensorflow-lite:2.x.x")
implementation("org.tensorflow:tensorflow-lite-gpu:2.x.x")
implementation("org.tensorflow:tensorflow-lite-support:0.x.x")
```

## Common gotchas

1. **Wrong delegate**: if the GPU delegate fails silently (e.g. on some emulators or low-end devices) and falls back to CPU, inference goes from ~7 Hz to ~1 Hz. The logging in `CustomObjectModel.android.kt` makes this visible.

2. **Wrong input dtype**: input is **float32** even for the fp16-internal models. The fp16 versions only have fp16 weights internally; their I/O tensors are still float32. Don't try to feed Int8/Uint8 buffers without re-quantizing.

3. **Pixel normalization**: YOLO expects pixel values in `[0, 1]`, not `[0, 255]`. The reference uses `NormalizeOp(0f, 255f)`, which performs `(x - 0) / 255 → [0, 1]`. Skipping this is a common cause of "model loads fine, detections all garbage".

4. **NMS double-application**: these models have NMS baked in. **Don't run another NMS pass on the output**; it'll discard valid detections.

5. **Coordinate system after rotation**: if you rotate the bitmap to landscape inside the detector (the alternative described under "Lock the activity to landscape" above), you also need to track that rotation when emitting bounding box coordinates back to the rest of the app, otherwise downstream consumers see boxes in the wrong reference frame.

6. **Filename probe before shipping**: see `LESSONS_LEARNED.md` and the `MODELS.md` shape table. After copying any `.tflite` file, always load it once with `tf.lite.Interpreter` and verify that the input shape matches what you expect. Filenames have lied before in this project.
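
Gotchas 2 and 3 together amount to turning RGBA bytes into float32 RGB values in `[0, 1]`. The sketch below shows that conversion in plain Kotlin. It assumes tightly packed RGBA_8888 bytes (no row padding) and an RGB channel order for the model input; the function name is hypothetical, and the reference code uses `NormalizeOp` rather than a hand-written loop.

```kotlin
// Converts tightly packed RGBA_8888 bytes into a float32 RGB array scaled to [0, 1].
fun rgbaToNormalizedRgb(rgba: ByteArray, width: Int, height: Int): FloatArray {
    require(rgba.size == width * height * 4) { "expected width*height*4 RGBA bytes" }
    val out = FloatArray(width * height * 3)
    var o = 0
    for (p in 0 until width * height) {
        val base = p * 4
        // Bytes are signed in Kotlin; mask with 0xFF to recover 0..255.
        out[o++] = (rgba[base].toInt() and 0xFF) / 255f      // R
        out[o++] = (rgba[base + 1].toInt() and 0xFF) / 255f  // G
        out[o++] = (rgba[base + 2].toInt() and 0xFF) / 255f  // B (alpha at base + 3 is dropped)
    }
    return out
}
```

A quick sanity check on-device is to confirm that the largest value fed to the interpreter never exceeds 1.0.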