handover/README.md at master · nateholland.bsky.social/PoseDetection

# Basketball Object Detection: Production Handover

This folder is the result of a multi-day R&D session optimizing YOLO-family models for basketball + basketball_hoop detection on Android (reference device: Samsung Galaxy A36 5G). It contains the production-ready tflite models, the Android-side requirements for consuming them, and a protocol for verifying quality in the kima app before shipping.

## ⚠️ Library version: bump to `posedetection-compose:4.11.0`

The R&D work shipped a new **`posedetection-compose:4.11.0`** library release containing the letterbox-aware detector, the camera config, and the delegate logging that the rect models depend on. The kima app currently uses an older version (probably 4.10.0). **Bump the dependency before integrating any of these models.**

The new version is available in two places:

### Option A: Maven Local (immediate, no remote publish needed)

The library is already published to Maven Local on the dev machine:
`~/.m2/repository/com/performancecoachlab/posedetection/posedetection-compose/4.11.0/`

To pull it from kima, the kima app's root `build.gradle.kts` (or the `dependencyResolutionManagement` block in `settings.gradle.kts`) must include `mavenLocal()` in its repositories list:

```kotlin
repositories {
    mavenLocal()        // ← add this line if not already present
    google()
    mavenCentral()
}
```

Then bump the dependency in whichever module consumes posedetection (typically the kima Android app's `build.gradle.kts`):

```kotlin
implementation("com.performancecoachlab.posedetection:posedetection-compose:4.11.0")
```

The library artifact then resolves from Maven Local, so no remote repository is needed for it.

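If kima declares its repositories centrally in `settings.gradle.kts`, the same change might look like the sketch below. The content filter that restricts `mavenLocal()` to the library's group is an optional addition, not something the R&D setup is known to use; it keeps stale artifacts in `~/.m2` from shadowing unrelated dependencies. The layout of kima's real settings file is assumed here.

```kotlin
// settings.gradle.kts (sketch; kima's actual file layout is an assumption)
dependencyResolutionManagement {
    repositories {
        mavenLocal {
            // Only consult ~/.m2 for the posedetection group.
            content { includeGroup("com.performancecoachlab.posedetection") }
        }
        google()
        mavenCentral()
    }
}
```
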
### Option B: Remote (after publishing)

The 4.11.0 library code lives on the **`release-4.11.0`** branch in the PoseDetection repo. Once it is merged and published to Maven Central via the existing `vanniktech.maven.publish` flow, kima can drop the `mavenLocal()` line and consume the library from Central. Until then, use Option A.

### What's actually in 4.11.0

The full commit message on `release-4.11.0` lists everything. The headlines:
- **`ImageDetector.android.kt`** (NEW class): letterbox-aware detector that fully un-letterboxes the output bboxes. This is the detector that all the rect models in `models/` were tested against.
- **`CameraView.android.kt`**: pins `ImageAnalysis` to `AspectRatio.RATIO_4_3` so the camera frame geometry matches what the rect models expect.
- **`CustomObjectModel.android.kt`**: explicit GPU/NNAPI/CPU delegate logging at INFO level. Surfaces silent CPU fallbacks via `adb logcat -s TFLite:I`.
- **No breaking API changes**: only additive. Existing kima usage of the library should keep working unchanged after the version bump.

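To make "letterbox-aware" concrete, here is a minimal sketch of the geometry involved. It is not the code in `ImageDetector.android.kt`; the names `Letterbox`, `Box`, `letterboxFor`, and `unLetterbox` are hypothetical, and it assumes the padding is centered (a detector that pads only on one side would need different offsets).

```kotlin
import kotlin.math.min

/** Letterbox geometry: uniform scale plus centered padding, in tensor pixels. */
data class Letterbox(val scale: Float, val padX: Float, val padY: Float)

/** Axis-aligned box in source-frame pixels. */
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

/** Computes how a srcW x srcH frame fits into a dstW x dstH tensor without stretching. */
fun letterboxFor(srcW: Int, srcH: Int, dstW: Int, dstH: Int): Letterbox {
    val scale = min(dstW.toFloat() / srcW, dstH.toFloat() / srcH)
    val padX = (dstW - srcW * scale) / 2f  // assumption: padding is centered
    val padY = (dstH - srcH * scale) / 2f
    return Letterbox(scale, padX, padY)
}

/**
 * Maps a box normalized to [0, 1] over the input tensor back to source-frame pixels:
 * scale to tensor pixels, subtract the padding, divide by the scale, then clamp.
 */
fun unLetterbox(
    x1: Float, y1: Float, x2: Float, y2: Float,
    lb: Letterbox, srcW: Int, srcH: Int, dstW: Int, dstH: Int
): Box {
    fun mapX(nx: Float) = ((nx * dstW - lb.padX) / lb.scale).coerceIn(0f, srcW.toFloat())
    fun mapY(ny: Float) = ((ny * dstH - lb.padY) / lb.scale).coerceIn(0f, srcH.toFloat())
    return Box(mapX(x1), mapY(y1), mapX(x2), mapY(y2))
}
```

For a 4:3 frame feeding a 512×384 tensor both paddings come out as zero, which is one way to see why pinning the camera to 4:3 matters: the frame fills the tensor and no input pixels are spent on black bars.
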
## TL;DR: Use this model

**`models/yolo26n_v11_rect_512x384.tflite`** is the recommended production default.

| Property | Value |
|---|---|
| Architecture | YOLO26n (2026 generation, ~5.2 GFLOPs, ~2.4M params after fusion) |
| Training | 50 epochs, AdamW, `imgsz=512`, `rect=True`, Roboflow `dataset_v11` (basketball court footage, letterboxed preprocessing) |
| Input tensor | `[1, 384, 512, 3]` float32 (NHWC), pixel values normalized to `[0, 1]` |
| Output tensor | `[1, 300, 6]` float32 = `[x1, y1, x2, y2, conf, cls]` per row, **NMS already applied**, coords normalized to `[0, 1]` over the input tensor |
| Classes | `0 = basketball`, `1 = basketball_hoop` |
| File size | 9.3 MB (fp32) |
| Measured speed | ~6.7 Hz on the A36 5G via TFLite GPU delegate |
| Measured quality | mean confidence ~0.78–0.83 across the 60s basketball test clip |
| val mAP50 | 0.7632 |

This model is **rectangular** (4:3 landscape) and was trained with `rect=True`, so its features are tuned to a 512×384 (width × height) input, stored as a 384×512 (height × width) tensor. It expects the camera to deliver landscape 4:3 frames, which the kima app must enforce (see `ANDROID_INTEGRATION.md`).

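Because NMS is already applied inside the model, consuming the output tensor reduces to a confidence cut and a class lookup. The sketch below illustrates that under the tensor layout from the table; `Detection`, `CLASS_NAMES`, `decodeDetections`, and the `0.25` threshold are illustrative assumptions, not the library's API or a tuned value.

```kotlin
import kotlin.math.roundToInt

/** One decoded detection; coordinates stay normalized to [0, 1] over the input tensor. */
data class Detection(
    val x1: Float, val y1: Float, val x2: Float, val y2: Float,
    val confidence: Float, val label: String
)

// Class ids as listed in the table: 0 = basketball, 1 = basketball_hoop.
val CLASS_NAMES = listOf("basketball", "basketball_hoop")

/**
 * Decodes rows of [x1, y1, x2, y2, conf, cls].
 * Rows below minConfidence or with an unknown class id are dropped.
 */
fun decodeDetections(rows: Array<FloatArray>, minConfidence: Float = 0.25f): List<Detection> =
    rows.asSequence()
        .filter { it.size >= 6 && it[4] >= minConfidence }
        .mapNotNull { row ->
            val label = CLASS_NAMES.getOrNull(row[5].roundToInt()) ?: return@mapNotNull null
            Detection(row[0], row[1], row[2], row[3], row[4], label)
        }
        .toList()
```

With a `[1, 300, 6]` output buffer allocated as `Array(1) { Array(300) { FloatArray(6) } }`, the call would be `decodeDetections(output[0])`. Unused slots among the 300 rows are expected to carry low confidence, so the threshold removes them.
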
## Why fp32 not fp16

Both were tested. fp16 cuts the file size by ~50%, but on the A36 5G's TFLite GPU delegate the two variants ran at the same speed and fp16 cost ~0.02 mean confidence. Since the ~4 MB saved is negligible next to the production app's overall size, **stick with fp32**: same speed, better quality.

## What's in this folder

```
handover/
├── README.md                    ← you are here
├── MODELS.md                    ← detailed comparison of all 6 candidates
├── ANDROID_INTEGRATION.md       ← what the kima app's Android side needs
├── TESTING_PROTOCOL.md          ← how to A/B test models in the kima app
├── LESSONS_LEARNED.md           ← key insights from the R&D session
├── models/
│   ├── yolo26n_v11_rect_512x384.tflite     ← RECOMMENDED PRODUCTION DEFAULT (rect=True)
│   ├── yolo26n_v11_rect_384x288.tflite     ← speed-critical alternative (>10 Hz)
│   ├── yolo26n_v11_square_512.tflite       ← square 512², slightly higher quality at lower speed
│   ├── yolo26n_v11_square_416.tflite       ← square 416² baseline reference
│   ├── yolo11n_dataset640.tflite           ← historical quality ceiling, KNOWN LOOSE BBOXES
│   └── yolo11n_noise_floor_baseline.tflite ← stable model for noise-floor sanity checks
└── reference_code/
    ├── ImageDetector.android.kt            ← working letterbox-aware detector
    ├── CustomObjectModel.android.kt        ← TFLite interpreter setup with delegate logging
    ├── CameraView_snippet.kt               ← CameraX 4:3 + landscape pin
    ├── AndroidManifest_snippet.xml         ← orientation lock
    └── compare_logs.py                     ← portable Python report generator for A/B tests
```

## Quick start

1. **Read `MODELS.md`** to understand the candidates and pick a starting model. The recommended default is `yolo26n_v11_rect_512x384.tflite`.
2. **Read `ANDROID_INTEGRATION.md`** to see what camera config and detector code the kima app needs. **Critical**: the model expects a 4:3 landscape camera frame; the kima app must pin CameraX accordingly.
3. **Drop the chosen tflite** into the kima app's assets folder.
4. **Implement the camera + detector** per `ANDROID_INTEGRATION.md`. The reference code in `reference_code/` is the working version from this project.
5. **Verify with `TESTING_PROTOCOL.md`** before shipping. Even a single 60-second comparison run against a known video can catch integration mistakes early.

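The comparison in step 5 rests on two numbers per model: detector rate in Hz and mean confidence. The following is a rough sketch of how those could be computed from logged detections. It does not reproduce `compare_logs.py` or the steps in `TESTING_PROTOCOL.md`; `Sample`, `RunSummary`, and `summarize` are hypothetical names, and the log format (a timestamp and a confidence per detection) is assumed.

```kotlin
/** One logged detection: when it was produced and how confident the model was. */
data class Sample(val timestampMs: Long, val confidence: Float)

/** Summary of one run: detector rate in Hz and mean confidence. */
data class RunSummary(val rateHz: Double, val meanConfidence: Double)

/** Returns null when there are fewer than two samples or no elapsed time. */
fun summarize(samples: List<Sample>): RunSummary? {
    if (samples.size < 2) return null
    val sorted = samples.sortedBy { it.timestampMs }
    val elapsedMs = sorted.last().timestampMs - sorted.first().timestampMs
    if (elapsedMs <= 0) return null
    // N samples span N - 1 intervals.
    val rateHz = (sorted.size - 1) * 1000.0 / elapsedMs
    val meanConfidence = sorted.map { it.confidence.toDouble() }.average()
    return RunSummary(rateHz, meanConfidence)
}
```

Running this over the same clip for two models gives directly comparable figures. A sharp drop in rate alongside otherwise normal confidence may hint at a silent CPU fallback, which the delegate logging should confirm.
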
## Source notes

- All models are trained on the `kima-rbjnn/dataset-3k5b6` Roboflow project, version 11 (preprocessing = "Fit (black padding)"). The v8 stretched dataset was the source of weeks of accuracy issues. **Never train on stretched data again**; see `LESSONS_LEARNED.md`.
- The pipeline is closed-loop: edit cell 5 of `collab/Model Training.ipynb`, run training in Colab, drive-sync the tflite to the Mac, sync it to assets via `tools/sync_models.py`, build, install, run a 60s back-to-back comparison via `tools/run_auto_experiment.sh`, then generate the report via `tools/compare_logs.py`. The full source pipeline lives in the parent project repo.
- **iOS is not covered here.** This handover is Android-only. iOS uses CoreML exports of the same `.pt` weights, and the user has confirmed that the iOS pipeline already works at 10+ Hz with high accuracy because CoreML bakes preprocessing into the model graph. If kima ships iOS too, the iOS side likely doesn't need any of this, only the same Roboflow dataset versions and a separate CoreML export.