Commits
Android CPU+XNNPACK default delegate + parallel pose/object detection.
Backwards-compatible — public API unchanged, Android inference behavior
changed significantly (semver minor bump).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Samsung SM-A366B the GPU delegate offloaded only 26/685 YOLO26n ops;
the CPU↔GPU roundtrip tax dominated, and in BOTH mode the GPU also contended
with the pose pipeline. Flipping the default to CPU+XNNPACK (setUseXNNPACK(true)
is required on tensorflow-lite-support 0.5.0, otherwise pure CPU is ~100× slower)
raised BOTH-mode FPS 8.8 → 9.9.
Removing the even/odd alternation in CameraView.android.kt activated the
already-existing `poseExecutor` parallel path in Utils.android.kt, so in BOTH
mode each camera frame now runs both detectors concurrently instead of every
other frame. Net: effective per-detector rate doubled.
numThreads dropped 4 → 3 for the object interpreter so concurrent pose + object
don't oversubscribe the 8-core CPU.
Runtime override preserved: `adb shell setprop debug.tflite.delegate GPU|NNAPI|CPU`.
Combined with tonight's INT8 baseline re-export, on-device BOTH-mode FPS went
8.77 → 16.02 (+83%) with effective object-detection rate rising from ~4.4 to ~16 FPS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
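As a quick check of the figures quoted in the commit above, the arithmetic can be reproduced with a short sketch. It assumes that even/odd alternation gives each detector exactly half the camera-frame rate, which matches the description but is not measured here; the function names are illustrative and not part of the library.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def per_detector_rate(frame_fps: float, alternating: bool) -> float:
    """Rate at which a single detector sees frames.

    With even/odd alternation each detector runs on every other frame,
    so it gets half the frame rate; on the concurrent path it runs on
    every frame.
    """
    return frame_fps / 2.0 if alternating else frame_fps


# BOTH-mode FPS figures quoted in the commit message.
before_fps, after_fps = 8.77, 16.02
gain = percent_gain(before_fps, after_fps)
old_rate = per_detector_rate(before_fps, alternating=True)
new_rate = per_detector_rate(after_fps, alternating=False)

print(f"BOTH-mode gain: {gain:.1f}%")
print(f"object-detection rate: {old_rate:.1f} -> {new_rate:.1f} FPS")
```

The gain rounds to the quoted +83%, and halving the old frame rate gives the quoted ~4.4 FPS per detector.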
# Conflicts:
# posedetection/build.gradle.kts
# sample/composeApp/src/commonMain/kotlin/com/nate/posedetection/App.kt
Brings the iOS multiarray-decode + parametrized model-input-dim work that
was lost from the v4.12.0 squash. Without this, ultralytics yolo26 end2end
CoreML exports (rect 512×384 / 640×480 / 960×736) silently produced zero
detections on iOS — Vision delivered VNCoreMLFeatureValueObservations that
the library's Vision-only filterIsInstance dropped on the floor.
Library
- FrameProcessor.analyseBufferForAll: when Vision returns a raw multiarray
(shape [1, 300, 6], yolo26n end2end output), decode it into AnalysisObjects
with class labels "basketball" / "basketball_hoop" and bbox coords mapped
back to oriented source pixel space.
- Coordinates from the end2end output are pixel-space over the model input,
not normalized — divide by modelInputW/H before scaling to source.
- modelInputW/H are read from the ObjectModel (set via
CustomObjectModel.ios.kt parsing the `_<W>x<H>` filename suffix), so
rect-640 and rect-960 work without further code changes.
- ImageDetector.ios.kt gets the same letterbox + multiarray decode path
for the standalone (non-AVCapture) entry point.
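The decode step in the Library notes above can be sketched roughly as follows. This is a minimal illustration, not the library's code: it assumes each of the 300 rows is laid out as [x1, y1, x2, y2, score, class], that class index 0 is "basketball" and 1 is "basketball_hoop", that the source frame shares the model input's aspect ratio, and that a 0.25 score cutoff is wanted. None of these are stated in the commit.

```python
from typing import List, NamedTuple, Sequence

# Assumed class order; the commit names the labels but not their indices.
LABELS = ("basketball", "basketball_hoop")


class Detection(NamedTuple):
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float


def decode_end2end(
    rows: Sequence[Sequence[float]],
    model_w: int,
    model_h: int,
    src_w: int,
    src_h: int,
    min_score: float = 0.25,  # hypothetical threshold
) -> List[Detection]:
    """Decode rows of [x1, y1, x2, y2, score, class] into source-pixel boxes.

    Coordinates arrive in model-input pixels, so they are divided by the
    model dimensions before being scaled to the source frame.
    """
    detections = []
    for x1, y1, x2, y2, score, cls in rows:
        idx = int(cls)
        if score < min_score or not 0 <= idx < len(LABELS):
            continue  # drop low-confidence rows and unknown classes
        detections.append(Detection(
            LABELS[idx],
            score,
            x1 / model_w * src_w,
            y1 / model_h * src_h,
            x2 / model_w * src_w,
            y2 / model_h * src_h,
        ))
    return detections
```

For example, a row `[64, 48, 128, 96, 0.9, 0]` from a 640×480 model maps to a "basketball" box spanning roughly (128, 96) to (256, 192) in a 1280×960 source frame.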
Sample app — iOS unattended test harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ (was a no-op stub).
- ExperimentAuto.ios.kt logs progress via NSLog and exits on finish so
back-to-back captures cold-start cleanly.
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for iOS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
# .gitignore
# sample/composeApp/src/commonMain/kotlin/com/nate/posedetection/App.kt
Adds a 4:3 rectangular detection path on iOS that mirrors the Android
v4.11.0 letterbox preprocessing — instead of feeding square frames to
Vision and letting it center-crop away the sides, the detector now
letterboxes the source frame into the model's native aspect ratio (e.g.
512×384, 640×480, or 960×736) and decodes the model output back to
original-image coordinates. The model's input dimensions are inferred
from a `_<W>x<H>` filename suffix on the bundled `.mlmodelc`, so a new
rect model can be dropped in with no code changes.
iOS detector
- ImageDetector.ios.kt + FrameProcessor.analyseBufferForAll: handles
both Vision-pipeline output (VNRecognizedObjectObservation, used by
classic yolo11 CoreML pipelines) and raw multiarray output
(VNCoreMLFeatureValueObservation with shape [1, 300, 6], used by
ultralytics' yolo26 end2end CoreML export). Coordinates from the
end2end output are in pixel space of the model input and are
normalized by the model dimensions before mapping to the oriented
source frame.
- CustomObjectModel.ios.kt: parses the input width/height from the
model's filename (`yolo26n_v11_rect_512x384` → 512×384). Models
without the suffix get (0, 0) and skip letterboxing — preserves
prior Vision-default behavior for square models.
- Sample app sets `imageCropAndScaleOption = ScaleFit` as a
belt-and-suspenders measure so Vision doesn't double-crop a frame
whose aspect already matches the model.
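The filename-suffix convention described for CustomObjectModel.ios.kt could be parsed along these lines. This is a sketch under the assumption that the suffix is the last `_<W>x<H>` token before an optional extension; the function name and regex are illustrative.

```python
import re
from typing import Tuple

# Trailing "_<W>x<H>", optionally followed by a file extension.
_DIM_SUFFIX = re.compile(r"_(\d+)x(\d+)(?:\.\w+)?$")


def parse_input_dims(model_name: str) -> Tuple[int, int]:
    """Return (width, height) from the name, or (0, 0) if no suffix is present.

    (0, 0) signals "skip letterboxing", mirroring the behavior described
    for square models without the suffix.
    """
    match = _DIM_SUFFIX.search(model_name)
    if match is None:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2)))
```

With this, `parse_input_dims("yolo26n_v11_rect_512x384")` yields `(512, 384)`, while a name without the suffix falls through to `(0, 0)`.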
Sample app + experiment harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec
CompositionLocal. Enables unattended back-to-back model captures via
`xcrun devicectl device process launch`.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ in the same schema
tools/compare_logs.py consumes; pull via `pymobiledevice3 apps afc`.
- ExperimentAuto.ios.kt logs progress via NSLog and exits the app on
finish (so back-to-back captures cold-start cleanly).
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for
iOS.
Bundled rect CoreML models for the sample app
- yolo26n_v11_rect_512x384.mlpackage (val mAP50 = 0.800)
- yolo26n_v11_rect_640x480.mlpackage (val mAP50 = 0.840)
- yolo26n_v11_rect_960x736.mlpackage (val mAP50 = 0.870)
Library version bump: 4.11.1 → 4.12.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Android detector pipeline previously stretched camera frames to the
model input shape, destroying aspect ratio and degrading YOLO accuracy on
non-square sources. This release replaces the stretch with a proper
letterbox: scale-to-fit + gray-114 pad, then un-letterbox the output
bounding boxes back to the original image's coordinate space.
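The letterbox and un-letterbox geometry can be sketched as below. This is an illustration only: it assumes the padding is split evenly so the scaled image is centered, and it covers the coordinate math rather than the pixel copy and gray-114 fill. The names are hypothetical.

```python
from typing import NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class Letterbox(NamedTuple):
    scale: float  # factor applied to the source to fit the model input
    pad_x: float  # gray padding on the left, in model-input pixels
    pad_y: float  # gray padding on the top, in model-input pixels


def compute_letterbox(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Letterbox:
    """Scale-to-fit geometry; the unused border would be filled with gray 114."""
    scale = min(dst_w / src_w, dst_h / src_h)
    # Assumption: padding is split evenly so the image is centered.
    pad_x = (dst_w - src_w * scale) / 2.0
    pad_y = (dst_h - src_h * scale) / 2.0
    return Letterbox(scale, pad_x, pad_y)


def unletterbox(box: Box, lb: Letterbox, src_w: int, src_h: int) -> Box:
    """Map a box from model-input pixels back to source-image pixels."""
    def to_src(value: float, pad: float, limit: int) -> float:
        # Remove the padding, undo the scale, then clamp to the image.
        return max(0.0, min(float(limit), (value - pad) / lb.scale))

    x1, y1, x2, y2 = box
    return (
        to_src(x1, lb.pad_x, src_w),
        to_src(y1, lb.pad_y, src_h),
        to_src(x2, lb.pad_x, src_w),
        to_src(y2, lb.pad_y, src_h),
    )
```

For a 1280×720 frame and a 640×640 model input, the scale is 0.5 with 140 pixels of padding above and below, so a model-space box (100, 240, 200, 340) maps back to (200, 200, 400, 400) in the source.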
Library changes:
- ImageDetector.android.kt (NEW) — letterbox-aware detector with full
un-letterbox pipeline. Output bboxes returned in source-image pixel
coordinates regardless of model input aspect ratio.
- ImageDetector.kt + ImageDetector.ios.kt — common interface + iOS
expect/actual stubs.
- CameraView.android.kt — pin ImageAnalysis to AspectRatio.RATIO_4_3 so
the camera frame distribution is consistent across devices and matches
the model's expected input geometry.
- CustomObjectModel.android.kt — track and log which TFLite delegate
(GPU/NNAPI/CPU) actually built the interpreter. Surfaces silent CPU
fallbacks at INFO level for `adb logcat -s TFLite:I`.
- build.gradle.kts — version 4.10.1 → 4.11.0 (minor: new public detector
class, no breaking API changes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Skeleton timestamps use sensor→epoch conversion for accurate alignment with video
- AnalysisObject now carries its own timestamp from the analysis frame
- Stagger pose and object detection on alternate frames in BOTH mode (~15fps each)
- Clear stale skeleton/object overlays when switching detection modes
- iOS VideoBuilder finalize() crash fix (NSThread.sleep polling)
- iOS extractFrame autoreleasepool to prevent CGImage accumulation
- Skeleton.lerp() for interpolation between keyframes
- Batch extractFrames Flow API for fast sequential frame decoding
- VideoBuilder: handle mismatched frame dimensions, YUV420 buffer overflow fix
- Downscale pose input to 256px for faster ML Kit processing
- Remove bundled movenet/posenet/YOLO pose model files from library assets
- Replace iOS VideoBuilder debug printlns with structured Logger calls
- Remove FPS debug logging from CameraView
- Remove DebugTestScreen from sample app
- Add *.hprof to .gitignore
- Version bump to 4.10.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
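The Skeleton.lerp() entry in the list above refers to interpolating between keyframes. A minimal sketch of that idea follows, assuming a skeleton can be represented as a mapping from joint name to (x, y) and that interpolation is linear per joint; the real Skeleton type and its fields are not shown in this log.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def lerp_fraction(ts_a: int, ts_b: int, ts: int) -> float:
    """Position of `ts` between two keyframe timestamps, clamped to [0, 1]."""
    if ts_b == ts_a:
        return 0.0  # identical keyframes: nothing to interpolate
    return max(0.0, min(1.0, (ts - ts_a) / (ts_b - ts_a)))


def lerp_skeleton(a: Dict[str, Point], b: Dict[str, Point], t: float) -> Dict[str, Point]:
    """Linearly interpolate every joint present in both keyframes."""
    result = {}
    for name, (ax, ay) in a.items():
        if name not in b:
            continue  # assumption: joints missing from either keyframe are skipped
        bx, by = b[name]
        result[name] = (ax + (bx - ax) * t, ay + (by - ay) * t)
    return result
```

A frame timestamped a quarter of the way between two keyframes would use `t = lerp_fraction(1000, 1400, 1100)`, which is 0.25.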
Use a cached MediaCodec decoder for frame extraction instead of
creating/destroying a MediaMetadataRetriever per frame. The decoder
opens the video once and decodes frames sequentially, leveraging
codec state across frames. Handles video rotation metadata.
Reduces offline video analysis time from ~70s to ~31s (2.3x faster)
on a 6-second test video.
Also adds build time display to the sample app's frame analysis view
for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NNAPI delegate as an intermediate fallback between GPU and CPU
for TFLite inference. On devices where GPU delegate is unavailable,
NNAPI can route inference to the device's DSP/NPU for better
performance than CPU-only execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
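The GPU → NNAPI → CPU fallback introduced by the commit above can be sketched as a loop over candidate builders. This is a language-neutral illustration rather than TFLite API usage: the `builders` mapping and its callables are hypothetical stand-ins for whatever constructs an interpreter with a given delegate.

```python
from typing import Callable, Dict, Sequence, Tuple

# Order at the time of this commit; a later commit makes CPU+XNNPACK the default.
DEFAULT_ORDER = ("GPU", "NNAPI", "CPU")


def build_with_fallback(
    builders: Dict[str, Callable[[], object]],
    order: Sequence[str] = DEFAULT_ORDER,
) -> Tuple[str, object]:
    """Try each delegate in order; return the name that worked and its interpreter."""
    errors = []
    for name in order:
        try:
            return name, builders[name]()
        except Exception as exc:  # a delegate may fail for device-specific reasons
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no delegate could build an interpreter: " + "; ".join(errors))
```

Returning the delegate name alongside the interpreter makes it straightforward to log which one was actually used, so silent fallbacks become visible.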
Run pose detection on a dedicated thread concurrently with object
detection so per-frame latency is max(pose, object) instead of the
sum. Also bump detection throttle from 20ms (~50 FPS) to 33ms (~30 FPS)
to reduce CPU contention with minimal perceptual difference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
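The two ideas in the commit above, running pose detection on its own thread and throttling to a minimum interval, can be sketched as follows. The detector callables and the `Throttle` class are hypothetical; the sketch only shows the control flow that makes per-frame latency roughly max(pose, object) rather than their sum.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Tuple


def analyse_frame(
    frame: object,
    detect_pose: Callable[[object], object],
    detect_objects: Callable[[object], object],
    pose_executor: ThreadPoolExecutor,
) -> Tuple[object, object]:
    """Run pose detection on a dedicated thread while objects run on the caller."""
    pose_future = pose_executor.submit(detect_pose, frame)
    objects = detect_objects(frame)  # overlaps with the pose task
    return pose_future.result(), objects  # wait for whichever finishes last


class Throttle:
    """Accept a frame only if at least `min_interval_ms` passed since the last one."""

    def __init__(self, min_interval_ms: int) -> None:
        self.min_interval_ms = min_interval_ms
        self._last_ms: Optional[int] = None

    def should_process(self, now_ms: int) -> bool:
        if self._last_ms is not None and now_ms - self._last_ms < self.min_interval_ms:
            return False
        self._last_ms = now_ms
        return True
```

With `Throttle(33)`, frames arriving at 0, 20, 33, 50 and 66 ms would be accepted at 0, 33 and 66 ms, which corresponds to the ~30 FPS ceiling mentioned in the commit.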
Use CameraX VideoCapture use case for hardware-accelerated recording
instead of per-frame ARGB→NV12 software conversion via MediaCodec.
This eliminates the CPU-intensive bitmap conversion and encoding that
was competing with pose/object detection on every frame.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>