Commits
Adds a 4:3 rectangular detection path on iOS that mirrors the Android
v4.11.0 letterbox preprocessing — instead of feeding square frames to
Vision and letting it center-crop away the sides, the detector now
letterboxes the source frame into the model's native aspect ratio (e.g.
512×384, 640×480, or 960×736) and decodes the model output back to
original-image coordinates. The model's input dimensions are inferred
from a `_<W>x<H>` filename suffix on the bundled `.mlmodelc`, so a new
rect model can be dropped in with no code changes.
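
The letterbox geometry described above can be sketched in a few lines. This is a minimal illustration, not the code in this commit: it assumes centred padding and ignores frame orientation, and the names `Letterbox`, `letterbox`, and `toSource` are hypothetical.

```kotlin
import kotlin.math.min

// Uniform scale plus per-axis padding that places the source frame inside
// the model input. Centred padding is an assumption of this sketch.
data class Letterbox(val scale: Double, val padX: Double, val padY: Double)

// Fit a srcW x srcH frame inside dstW x dstH without changing its aspect ratio.
fun letterbox(srcW: Int, srcH: Int, dstW: Int, dstH: Int): Letterbox {
    val scale = min(dstW.toDouble() / srcW, dstH.toDouble() / srcH)
    val padX = (dstW - srcW * scale) / 2.0
    val padY = (dstH - srcH * scale) / 2.0
    return Letterbox(scale, padX, padY)
}

// Map a point from model-input pixels back to source-frame pixels by
// removing the padding and undoing the scale.
fun toSource(x: Double, y: Double, lb: Letterbox): Pair<Double, Double> =
    Pair((x - lb.padX) / lb.scale, (y - lb.padY) / lb.scale)
```

For example, fitting a 1920×1080 frame into a 640×480 input gives a scale of 1/3 with 60 px of padding above and below, and the centre of the model input maps back to the centre of the source frame.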
iOS detector
- ImageDetector.ios.kt + FrameProcessor.analyseBufferForAll: handles
both Vision-pipeline output (VNRecognizedObjectObservation, used by
classic yolo11 CoreML pipelines) and raw multiarray output
(VNCoreMLFeatureValueObservation with shape [1, 300, 6], used by
ultralytics' yolo26 end2end CoreML export). Coordinates from the
end2end output are in pixel space of the model input and are
normalized by the model dimensions before mapping to the oriented
source frame.
- CustomObjectModel.ios.kt: parses the input width/height from the
model's filename (`yolo26n_v11_rect_512x384` → 512×384). Models
without the suffix get (0, 0) and skip letterboxing — preserves
prior Vision-default behavior for square models.
- Sample app picks up `imageCropAndScaleOption = ScaleFit` as a
belt-and-suspenders measure so Vision doesn't double-crop a frame whose
aspect ratio already matches the model.
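
A rough sketch of the two steps in the bullets above: reading the size from the filename suffix, and normalizing end2end rows by the model dimensions. The function and type names are hypothetical, the regex is one possible way to match the `_<W>x<H>` suffix, and the row layout `[x1, y1, x2, y2, score, class]` is an assumption about the `[1, 300, 6]` output rather than something this commit states.

```kotlin
// Matches a trailing `_<W>x<H>` such as `_512x384`.
val sizeSuffix = Regex("""_(\d{1,5})x(\d{1,5})$""")

// Returns (width, height), or (0, 0) when the name carries no size suffix.
fun parseInputSize(modelName: String): Pair<Int, Int> {
    val stem = modelName.removeSuffix(".mlmodelc").removeSuffix(".mlpackage")
    val match = sizeSuffix.find(stem) ?: return Pair(0, 0)
    val (w, h) = match.destructured
    return Pair(w.toInt(), h.toInt())
}

// One detection with corners normalized to 0..1 of the model input.
data class Detection(
    val x1: Float, val y1: Float, val x2: Float, val y2: Float,
    val score: Float, val classId: Int,
)

// Decode a flattened [N, 6] array whose coordinates are in model-input pixels.
// Assumed column order: x1, y1, x2, y2, score, class.
fun decodeRows(flat: FloatArray, modelW: Int, modelH: Int, minScore: Float): List<Detection> {
    require(flat.size % 6 == 0) { "expected 6 values per row" }
    return (0 until flat.size / 6).map { row ->
        val o = row * 6
        Detection(
            x1 = flat[o] / modelW, y1 = flat[o + 1] / modelH,
            x2 = flat[o + 2] / modelW, y2 = flat[o + 3] / modelH,
            score = flat[o + 4], classId = flat[o + 5].toInt(),
        )
    }.filter { it.score >= minScore }
}
```

The normalized corners would still need the letterbox padding removed and the orientation applied before they line up with the source frame.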
Sample app + experiment harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec
CompositionLocal. Enables unattended back-to-back model captures via
`xcrun devicectl device process launch`.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ in the same schema
tools/compare_logs.py consumes; pull via `pymobiledevice3 apps afc`.
- ExperimentAuto.ios.kt logs progress via NSLog and exits the app on
finish (so back-to-back captures cold-start cleanly).
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for
iOS.
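
The launch-argument handling in the first bullet could look roughly like this. The real parsing lives in iOSApp.swift; this Kotlin sketch only illustrates the idea. `AutoSpec` and `parseLaunchArgs` are hypothetical names, and it assumes every flag is followed by a value, with `-finish_on_stop` taking `1` or `true`.

```kotlin
// Hypothetical holder for the parsed launch arguments.
data class AutoSpec(
    val model: String?,
    val durationSec: Int?,
    val startAtWallMs: Long?,
    val finishOnStop: Boolean,
)

// Collect `-flag value` pairs, then pick out the four known flags.
// Unknown flags are ignored and malformed numbers become null.
fun parseLaunchArgs(args: List<String>): AutoSpec {
    val values = mutableMapOf<String, String>()
    var i = 0
    while (i < args.size - 1) {
        if (args[i].startsWith("-")) {
            values[args[i]] = args[i + 1]
            i += 2
        } else {
            i += 1
        }
    }
    val finish = values["-finish_on_stop"]
    return AutoSpec(
        model = values["-test_model"],
        durationSec = values["-test_duration_sec"]?.toIntOrNull(),
        startAtWallMs = values["-start_at_wall_ms"]?.toLongOrNull(),
        finishOnStop = finish == "1" || finish == "true",
    )
}
```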
Bundled rect CoreML models for the sample app
- yolo26n_v11_rect_512x384.mlpackage (val mAP50 = 0.800)
- yolo26n_v11_rect_640x480.mlpackage (val mAP50 = 0.840)
- yolo26n_v11_rect_960x736.mlpackage (val mAP50 = 0.870)
Library version bump: 4.11.1 → 4.12.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a cached MediaCodec decoder for frame extraction instead of
creating/destroying a MediaMetadataRetriever per frame. The decoder
opens the video once and decodes frames sequentially, leveraging
codec state across frames. Handles video rotation metadata.
Reduces offline video analysis time from ~70s to ~31s (2.3x faster)
on a 6-second test video.
Also adds build time display to the sample app's frame analysis view
for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NNAPI delegate as an intermediate fallback between GPU and CPU
for TFLite inference. On devices where GPU delegate is unavailable,
NNAPI can route inference to the device's DSP/NPU for better
performance than CPU-only execution.
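
The GPU, then NNAPI, then CPU ordering amounts to a first-success fallback chain. A minimal sketch, assuming each backend is wrapped in a factory lambda that throws when the backend is unavailable; `firstAvailable` is a hypothetical helper and the strings in the usage stand in for real delegate objects.

```kotlin
// Try each backend in priority order and return the first one whose
// factory succeeds, together with its name.
fun <T> firstAvailable(candidates: List<Pair<String, () -> T>>): Pair<String, T> {
    val failures = mutableListOf<String>()
    for ((name, create) in candidates) {
        try {
            return Pair(name, create())
        } catch (e: Exception) {
            // Remember why this backend was skipped, then try the next one.
            failures += "$name: ${e.message}"
        }
    }
    throw IllegalStateException("No backend available: $failures")
}
```

In the library the lambdas would construct the actual TFLite delegates; here a failing GPU factory simply falls through to the next entry:

```kotlin
val chosen = firstAvailable(listOf<Pair<String, () -> String>>(
    "GPU" to { throw UnsupportedOperationException("no GPU delegate") },
    "NNAPI" to { "nnapi-delegate" },
    "CPU" to { "cpu" },
))
```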
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run pose detection on a dedicated thread concurrently with object
detection so per-frame latency is max(pose, object) instead of the
sum. Also bump detection throttle from 20ms (~50 FPS) to 33ms (~30 FPS)
to reduce CPU contention with minimal perceptual difference.
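
A sketch of the concurrency pattern, assuming a JVM executor for the dedicated pose thread and a caller-supplied clock for the throttle. `analyseConcurrently` and `Throttle` are hypothetical names, not the library's API.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Submit pose detection to its own executor, run object detection on the
// calling thread, then wait for the pose result. Wall time is roughly the
// slower of the two rather than their sum.
fun <P, O> analyseConcurrently(
    poseExecutor: ExecutorService,
    detectPose: () -> P,
    detectObjects: () -> O,
): Pair<P, O> {
    val poseFuture = poseExecutor.submit(Callable { detectPose() })
    val objects = detectObjects()
    return Pair(poseFuture.get(), objects)
}

// Accept a frame only if at least minIntervalMs has passed since the last
// accepted frame (33 ms corresponds to about 30 FPS).
class Throttle(private val minIntervalMs: Long) {
    private var lastAcceptedMs: Long? = null

    fun tryAcquire(nowMs: Long): Boolean {
        val last = lastAcceptedMs
        if (last != null && nowMs - last < minIntervalMs) return false
        lastAcceptedMs = nowMs
        return true
    }
}
```

A single-thread executor such as `Executors.newSingleThreadExecutor()` keeps pose work on one dedicated thread and should be shut down when the analyzer is released.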
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use CameraX VideoCapture use case for hardware-accelerated recording
instead of per-frame ARGB→NV12 software conversion via MediaCodec.
This eliminates the CPU-intensive bitmap conversion and encoding that
was competing with pose/object detection on every frame.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>