Closed-loop training & comparison tools#
These scripts close the loop between Colab-trained TFLite object-detection models and on-device measurement on a real Android phone, so we can objectively compare model A vs. model B by running them against the same video on the same hardware.
collab/Model Training.ipynb (edit cells, retrain in Colab)
│ trained .tflite via Drive sync
▼
collab/output/<run-id>/*.tflite
│ tools/sync_models.py
▼
sample/composeApp/.../assets/<run-id>__<model>.tflite
│ ./gradlew :sample:composeApp:assembleDebug
│ adb install -r
▼
Phone (Experiment Mode)
│ JSON log per detection
▼
/sdcard/Android/data/com.nate.posedetection.androidApp/files/experiment_logs/
│ tools/run_experiment.sh (manual orchestration)
│ adb pull
▼
experiments/<run-id>/{log.json, manifest.json}
│ tools/compare_logs.py
▼
report.html + summary.csv
The package id is com.nate.posedetection.androidApp (note the .androidApp
suffix — the Kotlin namespace is com.nate.posedetection without it). Every
adb command targets the suffixed name.
Prerequisites#
- macOS with QuickTime Player (preinstalled).
- adb at /Users/virtualintern/Library/Android/sdk/platform-tools/adb.
- Phone with USB debugging on, paired with this Mac, camera permission pre-granted to the sample app. Verify with adb devices.
- Python 3 for the orchestration scripts. compare_logs.py optionally uses matplotlib for embedded timeline charts. Install it with uv pip install matplotlib if you want them; the report still renders without it.
tools/sync_models.py#
Copies trained .tflite files from collab/output/<run-id>/ into the
sample app's Android assets dir. Synced files are renamed
<run-id>__<model-base>.tflite (double underscore deliberate — the picker
in DiscoverModels.android.kt replaces single underscores with spaces, so
the double underscore renders as a clear visual break in the dropdown).
tools/sync_models.py # default: sync the latest run
tools/sync_models.py --list # show available runs newest-first
tools/sync_models.py --run v0_smoke # sync a specific run
tools/sync_models.py --clean # remove synced models
--clean preserves the three baseline models that ship with the repo
(yolo11n_dataset_dataset.tflite, yolo11n_su_416.tflite,
yolov10n_float16.tflite).
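The renaming and --clean rules above can be illustrated with a minimal Python sketch. This is not the script's actual code: the helper names synced_name and is_removable are hypothetical, and the sketch assumes a file counts as "synced" when it carries the double-underscore separator and is not one of the three baselines.

```python
from pathlib import Path

# Baseline models named in this README; --clean must never delete them.
BASELINES = {
    "yolo11n_dataset_dataset.tflite",
    "yolo11n_su_416.tflite",
    "yolov10n_float16.tflite",
}


def synced_name(run_id: str, model_path: str) -> str:
    """Build the asset file name <run-id>__<model-base>.tflite."""
    return f"{run_id}__{Path(model_path).stem}.tflite"


def is_removable(asset_name: str) -> bool:
    """Assumption: a synced model is any non-baseline .tflite containing '__'."""
    return (
        asset_name.endswith(".tflite")
        and "__" in asset_name
        and asset_name not in BASELINES
    )
```

The baselines use only single underscores, so under this assumption the separator check and the explicit baseline set both protect them.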
After syncing, rebuild and reinstall the app:
./gradlew :sample:composeApp:assembleDebug
adb install -r sample/composeApp/build/outputs/apk/debug/composeApp-debug.apk
If install fails with INSTALL_FAILED_UPDATE_INCOMPATIBLE (signature
mismatch), the on-device app was signed by a different debug key. Run
adb uninstall com.nate.posedetection.androidApp first; this wipes any
data the app stored locally.
tools/run_auto_experiment.sh — fully unattended orchestration#
Drives one experiment run end-to-end with no human at the phone. The
orchestrator launches the sample app via am start with intent extras that
tell the in-app auto-driver to select a model, wait until a wall-clock
target, run the experiment buffer for a fixed duration, write the JSON,
and finish the activity. The Mac side races to the same wall-clock target
and plays the video in QuickTime at the configured offset.
tools/run_auto_experiment.sh \
--video ~/Downloads/clip.mp4 \
--duration 10 \
--model-tag yolo11n_su_416 \
[--start-offset 10] \
[--device <serial>]
Flow:
- Snapshots existing experiment-log files on the device for the diff-pull.
- Pre-opens the video in QuickTime, positions the cursor at --start-offset seconds, and pauses. This step is wrapped in a 600 s AppleEvent timeout so very large 4K files don't blow up. The slow indexing happens here, before any wall-clock timing matters.
- Force-stops the sample app and clears Logcat.
- Computes start_at_wall_ms = now + 9 s (configurable via the APP_BOOT_PAD_SECONDS env var). Launches AppActivity via am start with --ez experiment_auto true --es model_name <X> --el start_at_wall_ms <Y> --el duration_ms <Z>.
- Sleeps locally until start_at_wall_ms, then issues osascript ... play front document (sub-second since the file is already loaded). t0_wall_ms is captured immediately before the play.
- Sleeps --duration seconds, then pauses QuickTime.
- Diff-polls /sdcard/.../experiment_logs for the new JSON file (15 s timeout) and pulls it into experiments/<run-id>/.
- Captures the ExperimentAuto Logcat excerpt and writes experiments/<run-id>/manifest.json with t0_wall_ms, start_at_wall_ms, model tag, video offset, device label, and the captured log lines.
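The wall-clock target and the intent extras from the flow above can be sketched in Python as follows. The function name build_launch_extras is hypothetical, and the sketch assumes that --duration (seconds) is converted to duration_ms by multiplying by 1000. The activity component passed to am start is not spelled out in this README, so it is deliberately left out.

```python
import time


def build_launch_extras(model_name, duration_s, now_ms=None, boot_pad_s=9):
    """Return (start_at_wall_ms, extras) for the `am start` invocation.

    boot_pad_s mirrors APP_BOOT_PAD_SECONDS (default 9 per this README).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current wall clock in ms
    start_at_wall_ms = now_ms + int(boot_pad_s * 1000)
    extras = [
        "--ez", "experiment_auto", "true",
        "--es", "model_name", model_name,
        "--el", "start_at_wall_ms", str(start_at_wall_ms),
        # Assumption: seconds are converted to milliseconds here.
        "--el", "duration_ms", str(int(duration_s * 1000)),
    ]
    return start_at_wall_ms, extras
```

The returned list would be appended to the adb shell am start command; the same start_at_wall_ms value is what the Mac side sleeps until before issuing the QuickTime play.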
The Compose auto-driver inside CameraSample reads
LocalExperimentAutoSpec, finds the model whose display name (or
underscore-equivalent) matches model_name case-insensitively, waits for
customObjectFlow.drop(1).first() (the pipeline's first real emission,
bounded by an 8 s timeout), waits until start_at_wall_ms, runs the
buffer for duration_ms, writes the JSON via the same ExperimentLogger
the manual button uses, logs the saved path, and finishes the activity.
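The model lookup described above can be approximated with a short Python sketch. The real auto-driver is Compose/Kotlin code, so this only illustrates the matching rule; the names normalize and find_model are hypothetical, and treating underscores and spaces as interchangeable is an assumption about what "underscore-equivalent" means.

```python
def normalize(name: str) -> str:
    """Assumption: underscores and spaces are equivalent; case is ignored."""
    return name.replace("_", " ").casefold()


def find_model(display_names, model_name):
    """Return the first display name matching model_name, or None."""
    wanted = normalize(model_name)
    for name in display_names:
        if normalize(name) == wanted:
            return name
    return None
```

For example, find_model(["Yolo11n su 416"], "yolo11n_su_416") returns "Yolo11n su 416" under these assumptions.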
Logcat tag is ExperimentAuto. To watch a run live:
adb logcat -c && adb logcat -s ExperimentAuto:I
Tuning knobs (env vars):
- APP_BOOT_PAD_SECONDS (default 9): wall-clock budget for cold start + model load + camera spin-up. Bump if you see start_at_wall_ms firing before the phone is ready.
- PULL_TIMEOUT_SECONDS (default 15): how long to wait for the new JSON to appear on the phone after the buffer should have stopped.
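The diff-poll that PULL_TIMEOUT_SECONDS bounds can be sketched in Python like this. It is an illustration of the snapshot-then-poll pattern, not the shell script's code: poll_for_new_files and pull_timeout_seconds are hypothetical names, and list_files stands in for whatever lists the device's experiment_logs directory (for instance a wrapper around an adb directory listing).

```python
import os
import time


def pull_timeout_seconds(env=os.environ) -> float:
    """Read PULL_TIMEOUT_SECONDS, falling back to the documented default of 15."""
    return float(env.get("PULL_TIMEOUT_SECONDS", "15"))


def poll_for_new_files(list_files, before, timeout_s, interval_s=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Return files listed now but absent from the snapshot.

    Returns an empty set if nothing new appears before the timeout.
    """
    snapshot = set(before)
    deadline = clock() + timeout_s
    while True:
        new = set(list_files()) - snapshot
        if new:
            return new
        if clock() >= deadline:
            return set()
        sleep(interval_s)
```

Passing the clock and sleep functions as parameters keeps the loop testable without a device or real waiting.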
tools/run_experiment.sh — manual orchestration#
Drives one experiment run when you can be at the phone to tap Start/Stop.
tools/run_experiment.sh \
--video ~/Downloads/test.mp4 \
--duration 10 \
--model-tag yolo11n_su_416
Flow:
- Snapshots the existing experiment-log files on the phone so it can diff-pull only the new ones afterwards.
- Prompts you: "Phone ready? Model picked, Experiment Mode on, Start tapped?"
- Records t0_wall_ms and plays the video in QuickTime (osascript).
- Sleeps for --duration seconds.
- Pauses QuickTime.
- Prompts you to tap Stop on the phone.
- Pulls only the newly-written log file(s) into experiments/<run-id>/.
- Writes experiments/<run-id>/manifest.json with t0_wall_ms, model tag, device label, video path, duration.
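The final manifest step can be sketched in Python as below. Only the t0_wall_ms key and the `<t0_wall_ms>_<model_tag>` directory naming come from this README; every other JSON key name here (model_tag, device_label, video_path, duration_s) is an assumption, as are the helper names run_dir_name and write_manifest.

```python
import json
from pathlib import Path


def run_dir_name(t0_wall_ms: int, model_tag: str) -> str:
    """Directory name following the <t0_wall_ms>_<model_tag> layout."""
    return f"{t0_wall_ms}_{model_tag}"


def write_manifest(experiments_root, t0_wall_ms, model_tag,
                   device_label, video_path, duration_s):
    """Create the run directory and write manifest.json into it."""
    run_dir = Path(experiments_root) / run_dir_name(t0_wall_ms, model_tag)
    run_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "t0_wall_ms": t0_wall_ms,      # key named in this README
        # The key names below are hypothetical.
        "model_tag": model_tag,
        "device_label": device_label,
        "video_path": str(video_path),
        "duration_s": duration_s,
    }
    path = run_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path
```

Recording t0_wall_ms at write time is what later lets compare_logs.py align runs of different models on a shared time axis.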
--device <serial> is only needed if adb devices shows more than one
device (otherwise the script auto-selects the single connected one).
--no-confirm skips the interactive prompts for unattended dry-runs.
tools/compare_logs.py — model comparison report#
Reads every experiments/<run-id>/manifest.json + log files and produces
a self-contained HTML report and a CSV summary.
tools/compare_logs.py experiments/ # report.html in cwd
tools/compare_logs.py experiments/ -o ./reports/ # write into reports/
tools/compare_logs.py experiments/ --bucket 250 # 250ms buckets
The report contains:
- Per-run scorecard: detection rate, mean confidence, events/sec, classes seen, mean bbox area.
- Per-class jitter table: stddev of bbox-center deltas across consecutive frames. Lower = more stable. (Note: a moving subject also produces high jitter — useful for spotting flapping detections more than for absolute model quality.)
- Per-bucket comparison table: for each time bucket (default 100 ms, configurable via --bucket), each model's frame count, top class, and max confidence. Aligned to each run's t0_wall_ms from the manifest so model A and model B share the same x-axis when run against the same video.
- Per-class confidence timelines: matplotlib PNG charts embedded as base64 data URIs (omitted with a warning banner if matplotlib is not installed).
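Two of the metrics above can be made concrete with a minimal Python sketch. It is not taken from compare_logs.py: the function names are hypothetical, and the sketch assumes that a "delta" is the Euclidean distance between consecutive bbox centers and that the population standard deviation is used.

```python
import math
import statistics


def bucket_index(ts_wall_ms: int, t0_wall_ms: int, bucket_ms: int = 100) -> int:
    """Map a wall-clock timestamp to a bucket relative to the run's t0_wall_ms."""
    return (ts_wall_ms - t0_wall_ms) // bucket_ms


def center_jitter(centers) -> float:
    """Stddev of distances between consecutive bbox centers (lower = steadier).

    centers is a list of (x, y) tuples in frame order for one class.
    """
    deltas = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    if not deltas:
        return 0.0  # fewer than two frames: no movement to measure
    return statistics.pstdev(deltas)
```

Because bucket indices are relative to each run's own t0_wall_ms, an event 250 ms into run A and one 250 ms into run B land in the same bucket even though their absolute timestamps differ. A subject moving at constant speed yields equal deltas and therefore zero jitter under this definition, while a flapping detection yields uneven deltas.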
summary.csv has one row per run with the same scorecard fields, for
spreadsheet exploration.
Typical comparison workflow#
# 1. Train a new model in collab/Model Training.ipynb (via Colab MCP).
# Output lands at collab/output/<run-id>/<name>.tflite via Drive sync.
# 2. Sync the trained model into the app.
tools/sync_models.py --run <run-id>
# 3. Build and install.
./gradlew :sample:composeApp:assembleDebug && \
adb install -r sample/composeApp/build/outputs/apk/debug/composeApp-debug.apk
# 4. Run experiment A against the trained model.
tools/run_experiment.sh \
--video ~/Movies/test_clip.mp4 \
--duration 10 \
--model-tag <run-id>__<name>
# 5. (At the phone) Switch to a baseline model in the picker, tap Start
# Experiment, and re-run against the same video.
tools/run_experiment.sh \
--video ~/Movies/test_clip.mp4 \
--duration 10 \
--model-tag yolo11n_su_416
# 6. Generate the comparison report.
tools/compare_logs.py experiments/ -o reports/
# 7. Open reports/report.html in a browser.
Layout#
tools/
├── README.md (this file)
├── sync_models.py (Phase 2)
├── run_experiment.sh (Phase 5, manual orchestration)
├── run_auto_experiment.sh (unattended orchestration via intent extras)
└── compare_logs.py (Phase 6)
experiments/ (created by run_experiment.sh or run_auto_experiment.sh; gitignored)
└── <t0_wall_ms>_<model_tag>/
├── manifest.json
└── <model>__<runId>.json
collab/
├── Model Training.ipynb (Drive-synced; iterated via Colab MCP)
└── output/ (Drive-synced; tflites land here)
└── <run-id>/
└── *.tflite