# Closed-loop training & comparison tools

These scripts close the loop between Colab-trained TFLite object-detection
models and on-device measurement on a real Android phone, so we can
objectively compare model A vs. model B by running them against the same
video on the same hardware.

```
collab/Model Training.ipynb   (edit cells, retrain in Colab)
        │  trained .tflite via Drive sync
        ▼
collab/output/<run-id>/*.tflite
        │  tools/sync_models.py
        ▼
sample/composeApp/.../assets/<run-id>__<model>.tflite
        │  ./gradlew :sample:composeApp:assembleDebug
        │  adb install -r
        ▼
Phone (Experiment Mode)
        │  JSON log per detection
        ▼
/sdcard/Android/data/com.nate.posedetection.androidApp/files/experiment_logs/
        │  tools/run_experiment.sh   (manual orchestration)
        │  adb pull
        ▼
experiments/<run-id>/{log.json, manifest.json}
        │  tools/compare_logs.py
        ▼
report.html + summary.csv
```

The package id is `com.nate.posedetection.androidApp` (note the `.androidApp`
suffix; the Kotlin namespace is `com.nate.posedetection` without it). Every
adb command targets the suffixed name.

---

## Prerequisites

* **macOS** with QuickTime Player (preinstalled).
* **adb** at `/Users/virtualintern/Library/Android/sdk/platform-tools/adb`.
* **Phone** with USB debugging on, paired with this Mac, camera permission
  pre-granted to the sample app. Verify with `adb devices`.
* **Python 3** for the orchestration scripts. `compare_logs.py` optionally
  uses **matplotlib** for embedded timeline charts; install it with
  `uv pip install matplotlib` if you want them. The report still renders
  without it.

---

## tools/sync_models.py

Copies trained `.tflite` files from `collab/output/<run-id>/` into the
sample app's Android assets dir. Synced files are renamed
`<run-id>__<model-base>.tflite`. The double underscore is deliberate: the
picker in `DiscoverModels.android.kt` replaces single underscores with
spaces, so the double underscore renders as a clear visual break in the
dropdown.
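
The renaming rule can be illustrated with a minimal Python sketch. The helper names `synced_name` and `split_synced_name` are hypothetical and are not taken from `sync_models.py`; the sketch assumes only the `<run-id>__<model-base>.tflite` pattern described above.

```python
from pathlib import Path

SEPARATOR = "__"  # double underscore between run id and model base name


def synced_name(run_id: str, model_path: str) -> str:
    """Build the asset filename for a trained model (hypothetical helper)."""
    model_base = Path(model_path).stem  # filename without the .tflite suffix
    return f"{run_id}{SEPARATOR}{model_base}.tflite"


def split_synced_name(filename: str) -> tuple[str, str]:
    """Recover (run_id, model_base) by splitting on the first double underscore."""
    run_id, _, model_base = Path(filename).stem.partition(SEPARATOR)
    return run_id, model_base
```

The split is only unambiguous if the run id itself contains no double underscore, which is an assumption of this sketch rather than a documented guarantee.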

```bash
tools/sync_models.py                  # default: sync the latest run
tools/sync_models.py --list           # show available runs newest-first
tools/sync_models.py --run v0_smoke   # sync a specific run
tools/sync_models.py --clean          # remove synced models
```

`--clean` preserves the three baseline models that ship with the repo
(`yolo11n_dataset_dataset.tflite`, `yolo11n_su_416.tflite`,
`yolov10n_float16.tflite`).
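
One way the `--clean` selection could work is sketched below. This is an assumption about the behaviour, not the actual code in `sync_models.py`: it treats every `.tflite` file in the assets dir that is not one of the three baselines as a synced model. The function name `files_to_clean` is hypothetical.

```python
# Baseline models named in this README; these must survive --clean.
BASELINE_MODELS = frozenset({
    "yolo11n_dataset_dataset.tflite",
    "yolo11n_su_416.tflite",
    "yolov10n_float16.tflite",
})


def files_to_clean(asset_names):
    """Return the asset filenames a clean pass would remove (hypothetical).

    Assumption: any .tflite that is not a baseline counts as a synced model.
    Non-model assets are never touched.
    """
    return sorted(
        name for name in asset_names
        if name.endswith(".tflite") and name not in BASELINE_MODELS
    )
```

Returning the list instead of deleting directly makes it easy to print a dry-run preview before anything is removed.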

After syncing, rebuild and reinstall the app:

```bash
./gradlew :sample:composeApp:assembleDebug
adb install -r sample/composeApp/build/outputs/apk/debug/composeApp-debug.apk
```

If install fails with `INSTALL_FAILED_UPDATE_INCOMPATIBLE` (signature
mismatch), the on-device app was signed by a different debug key. Run
`adb uninstall com.nate.posedetection.androidApp` first; this wipes any
data the app stored locally.

---

## tools/run_auto_experiment.sh: fully unattended orchestration

Drives one experiment run end-to-end with no human at the phone. The
orchestrator launches the sample app via `am start` with intent extras that
tell the in-app auto-driver to select a model, wait until a wall-clock
target, run the experiment buffer for a fixed duration, write the JSON,
and finish the activity. The Mac side races to the same wall-clock target
and plays the video in QuickTime at the configured offset.

```bash
tools/run_auto_experiment.sh \
  --video ~/Downloads/clip.mp4 \
  --duration 10 \
  --model-tag yolo11n_su_416 \
  [--start-offset 10] \
  [--device <serial>]
```

Flow:

1. Snapshots existing experiment-log files on the device for the diff-pull.
2. **Pre-opens** the video in QuickTime, positions the playhead at
   `--start-offset` seconds, and pauses. This step is wrapped in a 600 s
   AppleEvent timeout so that opening very large 4K files does not time
   out. The slow indexing happens here, *before* any wall-clock timing
   matters.
3. Force-stops the sample app and clears Logcat.
4. Computes `start_at_wall_ms = now + 9 s` (configurable via the
   `APP_BOOT_PAD_SECONDS` env var). Launches `AppActivity` via `am start`
   with `--ez experiment_auto true --es model_name <X>
   --el start_at_wall_ms <Y> --el duration_ms <Z>`.
5. Sleeps locally until `start_at_wall_ms`, then issues
   `osascript ... play front document` (sub-second since the file is
   already loaded). `t0_wall_ms` is captured immediately before the play.
6. Sleeps `--duration` seconds, then pauses QuickTime.
7. Diff-polls `/sdcard/.../experiment_logs` for the new JSON file (15 s
   timeout) and pulls it into `experiments/<run-id>/`.
8. Captures the `ExperimentAuto` Logcat excerpt and writes
   `experiments/<run-id>/manifest.json` with `t0_wall_ms`,
   `start_at_wall_ms`, model tag, video offset, device label, and the
   captured log lines.
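
Step 4 of the flow above can be sketched in Python as follows. This is an illustration, not the shell script's actual code: the function name `build_launch_command` is hypothetical, and the activity component is passed in as a parameter because its exact form is not spelled out here. The `-n`, `--ez`, `--es` and `--el` options are standard `am start` flags.

```python
import os
import time


def build_launch_command(component, model_name, duration_s,
                         now_ms=None, pad_s=None, serial=None):
    """Return (start_at_wall_ms, adb argv) for an unattended run (sketch).

    component: "<package>/<activity>" string supplied by the caller.
    pad_s:     boot pad in seconds; falls back to APP_BOOT_PAD_SECONDS, then 9.
    """
    if pad_s is None:
        pad_s = float(os.environ.get("APP_BOOT_PAD_SECONDS", "9"))
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current wall clock in milliseconds
    start_at_wall_ms = now_ms + int(pad_s * 1000)

    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # target one device when several are attached
    cmd += [
        "shell", "am", "start", "-n", component,
        "--ez", "experiment_auto", "true",
        "--es", "model_name", model_name,
        "--el", "start_at_wall_ms", str(start_at_wall_ms),
        "--el", "duration_ms", str(int(duration_s * 1000)),
    ]
    return start_at_wall_ms, cmd
```

The returned argv could be handed to `subprocess.run(cmd, check=True)`, after which the caller sleeps until `start_at_wall_ms` before starting playback.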

The Compose auto-driver inside `CameraSample` reads
`LocalExperimentAutoSpec`, finds the model whose display name (or
underscore-equivalent) matches `model_name` case-insensitively, waits for
`customObjectFlow.drop(1).first()` (the pipeline's first real emission,
bounded by an 8 s timeout), waits until `start_at_wall_ms`, runs the
buffer for `duration_ms`, writes the JSON via the same `ExperimentLogger`
the manual button uses, logs the saved path, and finishes the activity.
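
The model lookup described above lives in Kotlin inside the app. The Python sketch below only illustrates the matching rule as this README states it, under the assumption that "underscore-equivalent" means underscores and spaces are interchangeable. `find_model` is a hypothetical name.

```python
def find_model(display_names, model_name):
    """Return the first display name matching model_name, or None (sketch).

    Both sides are normalized the same way: underscores become spaces and
    case is folded, so "YOLO11N_SU_416" matches "yolo11n su 416".
    """
    def normalize(text):
        return text.replace("_", " ").casefold()

    wanted = normalize(model_name)
    for name in display_names:
        if normalize(name) == wanted:
            return name
    return None
```

Returning `None` on a miss lets the caller decide whether to abort the run or fall back to a default model. What the real auto-driver does in that case is not specified here.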

The Logcat tag is `ExperimentAuto`. To watch a run live:

```bash
adb logcat -c && adb logcat -s ExperimentAuto:I
```

Tuning knobs (env vars):

* `APP_BOOT_PAD_SECONDS` (default 9): wall-clock budget for cold start +
  model load + camera spin-up. Bump if you see `start_at_wall_ms` firing
  before the phone is `ready`.
* `PULL_TIMEOUT_SECONDS` (default 15): how long to wait for the new JSON
  to appear on the phone after the buffer should have stopped.

---

## tools/run_experiment.sh: manual orchestration

Drives one experiment run when you can be at the phone to tap Start/Stop.

```bash
tools/run_experiment.sh \
  --video ~/Downloads/test.mp4 \
  --duration 10 \
  --model-tag yolo11n_su_416
```

Flow:

1. Snapshots the existing experiment-log files on the phone so it can
   diff-pull only the new ones afterwards.
2. Prompts you: *"Phone ready? Model picked, Experiment Mode on, Start
   tapped?"*
3. Records `t0_wall_ms` and plays the video in QuickTime (`osascript`).
4. Sleeps for `--duration` seconds.
5. Pauses QuickTime.
6. Prompts you to tap Stop on the phone.
7. Pulls only the newly-written log file(s) into
   `experiments/<run-id>/`.
8. Writes `experiments/<run-id>/manifest.json` with `t0_wall_ms`,
   model tag, device label, video path, duration.
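
The snapshot-then-diff idea behind steps 1 and 7 can be sketched as a set difference over two directory listings. The function names are hypothetical, and the sketch assumes each listing is plain text with one filename per line, for example the output of `adb shell ls` on the log directory.

```python
def parse_listing(ls_output: str) -> set[str]:
    """Turn one-filename-per-line text into a set, ignoring blank lines."""
    return {line.strip() for line in ls_output.splitlines() if line.strip()}


def new_log_files(before: str, after: str) -> list[str]:
    """Return filenames present after the run but absent from the snapshot."""
    return sorted(parse_listing(after) - parse_listing(before))
```

Each returned name can then be fetched with `adb pull`, so logs from earlier runs that are still on the phone are never copied again.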

`--device <serial>` is only needed if `adb devices` shows more than one
device (otherwise the script auto-selects the single connected one).
`--no-confirm` skips the interactive prompts for unattended dry-runs.

---

## tools/compare_logs.py: model comparison report

Reads every `experiments/<run-id>/manifest.json` + log files and produces
a self-contained HTML report and a CSV summary.

```bash
tools/compare_logs.py experiments/                 # report.html in cwd
tools/compare_logs.py experiments/ -o ./reports/   # write into reports/
tools/compare_logs.py experiments/ --bucket 250    # 250 ms buckets
```

The report contains:

* **Per-run scorecard**: detection rate, mean confidence, events/sec,
  classes seen, mean bbox area.
* **Per-class jitter table**: stddev of bbox-center deltas across
  consecutive frames. Lower = more stable. (Note: a moving subject also
  produces high jitter, so this metric is more useful for spotting
  *flapping* detections than for judging absolute model quality.)
* **Per-bucket comparison table**: for each time bucket (default 100 ms,
  configurable via `--bucket`), each model's frame count, top class, and
  max confidence. Aligned to each run's `t0_wall_ms` from the manifest so
  model A and model B share the same x-axis when run against the same
  video.
* **Per-class confidence timelines**: matplotlib PNG charts embedded as
  base64 data URIs (omitted with a warning banner if matplotlib is not
  installed).
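
The jitter metric from the list above can be sketched as follows. The exact definition used by `compare_logs.py` is not given here, so this sketch makes two assumptions: a "delta" is the Euclidean distance between consecutive bbox centers, and the spread is the population standard deviation. The real script may differ, for example by working per axis. `center_jitter` is a hypothetical name.

```python
import math
import statistics


def center_jitter(centers):
    """Stddev of distances between consecutive bbox centers (sketch).

    centers: list of (x, y) tuples for one class, in frame order.
    Returns 0.0 when there are too few frames to form a delta.
    """
    if len(centers) < 2:
        return 0.0
    # Distance moved by the box center between each pair of adjacent frames.
    deltas = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    return statistics.pstdev(deltas)
```

A detection that alternates between jumping and holding still yields uneven deltas and therefore a higher value under this definition.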

`summary.csv` has one row per run with the same scorecard fields, for
spreadsheet exploration.
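
The `t0_wall_ms` alignment behind the per-bucket comparison table can be sketched as below. The event tuple shape `(timestamp_ms, class_label, confidence)` and the name `bucketize` are assumptions for illustration, since the log schema is not documented here. "Top class" is taken to mean the most frequent class in the bucket, which is also an assumption.

```python
from collections import Counter, defaultdict


def bucketize(events, t0_wall_ms, bucket_ms=100):
    """Group detections into time buckets relative to the run's t0 (sketch).

    events: iterable of (timestamp_ms, class_label, confidence) tuples.
    Returns {bucket_index: {"frames", "top_class", "max_conf"}}.
    """
    buckets = defaultdict(list)
    for ts_ms, label, conf in events:
        # Subtracting t0 puts every run on the same video-relative axis.
        buckets[(ts_ms - t0_wall_ms) // bucket_ms].append((label, conf))

    summary = {}
    for index, items in sorted(buckets.items()):
        labels = Counter(label for label, _ in items)
        summary[index] = {
            "frames": len(items),
            "top_class": labels.most_common(1)[0][0],
            "max_conf": max(conf for _, conf in items),
        }
    return summary
```

Because each run subtracts its own `t0_wall_ms`, bucket 0 of model A and bucket 0 of model B cover the same stretch of the video even though the runs happened at different wall-clock times.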

---

## Typical comparison workflow

```bash
# 1. Train a new model in collab/Model Training.ipynb (via Colab MCP).
#    Output lands at collab/output/<run-id>/<name>.tflite via Drive sync.

# 2. Sync the trained model into the app.
tools/sync_models.py --run <run-id>

# 3. Build and install.
./gradlew :sample:composeApp:assembleDebug && \
  adb install -r sample/composeApp/build/outputs/apk/debug/composeApp-debug.apk

# 4. Run experiment A against the trained model.
tools/run_experiment.sh \
  --video ~/Movies/test_clip.mp4 \
  --duration 10 \
  --model-tag <run-id>__<name>

# 5. (At the phone) Switch to a baseline model in the picker, tap Start
#    Experiment, and re-run against the same video.
tools/run_experiment.sh \
  --video ~/Movies/test_clip.mp4 \
  --duration 10 \
  --model-tag yolo11n_su_416

# 6. Generate the comparison report.
tools/compare_logs.py experiments/ -o reports/

# 7. Open reports/report.html in a browser.
```

---

## Layout

```
tools/
├── README.md                 (this file)
├── sync_models.py            (Phase 2)
├── run_experiment.sh         (Phase 5, manual orchestration)
├── run_auto_experiment.sh    (unattended orchestration via intent extras)
└── compare_logs.py           (Phase 6)

experiments/                  (created by run_experiment.sh; gitignored)
└── <t0_wall_ms>_<model_tag>/
    ├── manifest.json
    └── <model>__<runId>.json

collab/
├── Model Training.ipynb      (Drive-synced; iterated via Colab MCP)
└── output/                   (Drive-synced; tflites land here)
    └── <run-id>/
        └── *.tflite
```