Commits
Without opt-in signing, publishToMavenLocal fails because GPG signing is mandatory.
Local consumers (like the kima app) don't need signatures; CI publishing
to Maven Central can still opt in via -PsigningEnabled=true.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- iOS: switch from Vision VNDetectHumanBodyPoseRequest to MLKit Accurate
pose, embedded as cinterop static archives (no cocoapods propagation
to downstream consumers).
- iOS: fix coordinate-space bugs across all four device orientations.
Pose and object paths now have independent EXIF derivations +
independent preview-coord mapping (aspect-fit/fill for pose, original
pointForCaptureDevicePointOfInterest for objects).
- iOS: fix object-detection bounding boxes in non-landscape-right
orientations. Root cause was VNImageRequestHandler silently ignoring
the VNImageOptionCGImagePropertyOrientation options-dict key;
switched to the orientation-parameter constructor.
- Skeleton: add leftHeel/rightHeel, leftToe/rightToe (foot index on
Android), and leftIndex/rightIndex (finger tip). Wired through lerp,
mirror, rotate, bones (new foot + hand bones), joints, and both
MLKit pose builders.
- iOS overlay: replace BlendMode.Softlight/Color bones and radial
gradient joint dots with solid crisp strokes + circles at ~1/3
thickness. Matches Android.
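The aspect-fit/fill preview mapping used on the pose path can be sketched as below. This is a minimal, hypothetical helper (`mapNormalizedToView` and `Point2` are illustrative names, not the library's API). It assumes the landmark is already normalized to 0..1 in the display-oriented image, so the EXIF/orientation step has already happened.

```kotlin
// Hypothetical type for illustration; not the library's API.
data class Point2(val x: Double, val y: Double)

/**
 * Maps a landmark in normalized image coordinates (0..1 on each axis,
 * already rotated into display orientation) to preview-view coordinates.
 * fill = true  -> aspect-fill: image covers the view, overflow cropped equally.
 * fill = false -> aspect-fit: whole image visible, bars on one axis.
 */
fun mapNormalizedToView(
    p: Point2,
    imageW: Double,
    imageH: Double,
    viewW: Double,
    viewH: Double,
    fill: Boolean,
): Point2 {
    val sx = viewW / imageW
    val sy = viewH / imageH
    // Fill uses the larger scale factor, fit uses the smaller one.
    val scale = if (fill) maxOf(sx, sy) else minOf(sx, sy)
    val drawnW = imageW * scale
    val drawnH = imageH * scale
    // Center the drawn image; offsets are negative on a cropped (fill) axis.
    val offsetX = (viewW - drawnW) / 2.0
    val offsetY = (viewH - drawnH) / 2.0
    return Point2(p.x * drawnW + offsetX, p.y * drawnH + offsetY)
}
```

For a 480×640 image in a 300×800 view, aspect-fill scales by 1.25 and pushes the image 150 px off each side horizontally, so the image center still lands on the view center (150, 400).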
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Experiment Mode menu toggle and the bottom-center Start/Stop
overlay were dev tooling and only added noise for anyone exploring the sample app.
Drop both.
The underlying ExperimentLogger, ExperimentEvent, the two capture
LaunchedEffects, and the intent-driven auto-mode path all stay — so
orchestrator-driven comparison runs (MainActivity test_mode extras)
continue to work without any UI visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The experiment logger was originally object-only — every event wrote
"skeleton": null regardless of what the pose pipeline produced. That
made it impossible to measure pose A/Bs from the captured JSON.
ExperimentEvent now carries an optional Skeleton, the JSON writer
serialises the 12 landmarks (or null per-joint) plus frame dims, and
App.kt collects skeletonFlow alongside customObjectFlow into the same
event buffer.
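A minimal sketch of the skeleton part of such an event, written by hand without a serialization library. The key names (`frameWidth`, `frameHeight`, `landmarks`, `x`, `y`) and the `skeletonJson` function are hypothetical. The real schema is whatever the ExperimentLogger writer emits.

```kotlin
// Hypothetical landmark type for illustration.
data class LandmarkXY(val x: Float, val y: Float)

/**
 * Serializes an optional skeleton. A null skeleton becomes JSON null; a joint
 * missing from the map becomes a per-joint null, in a fixed joint order.
 * Joint names are assumed to be plain identifiers, so no string escaping is done.
 */
fun skeletonJson(
    jointOrder: List<String>,
    joints: Map<String, LandmarkXY>?,
    frameW: Int,
    frameH: Int,
): String {
    if (joints == null) return "null"
    val entries = jointOrder.joinToString(",") { name ->
        val p = joints[name]
        val value = if (p == null) "null" else "{\"x\":${p.x},\"y\":${p.y}}"
        "\"$name\":$value"
    }
    return "{\"frameWidth\":$frameW,\"frameHeight\":$frameH,\"landmarks\":{$entries}}"
}
```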
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Once a confident full skeleton is detected inside the static focus
area, FollowCropState remembers its bbox (with 50% pad) and the next
frame crops tightly around it — lifting effective input resolution on
the person above what the static half-frame crop can achieve.
Reverts to the static focus area when the skeleton is lost for
MISS_TOLERANCE=2 consecutive frames (hysteresis so one clipped frame
doesn't bounce back to wide), goes stale past 500ms, or the tight rect
degenerates. A MIN_NORMALIZED_SIDE floor of 0.25 keeps MLKit off
ultra-narrow aspect crops where recall collapses. Tight rect is
clamped to the user's static focusArea so the tracker can't drift
outside the intended zone.
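The follow-crop rules above can be sketched as a small state machine. `FollowCropSketch` and `NRect` are hypothetical names. The constants come from the description, but two points are assumptions: that "50% pad" means the box grows by 50% in total (25% per side), and that "degenerates" means the clamped rect has zero or negative width or height.

```kotlin
// Normalized rect in full-frame coordinates (0..1). Hypothetical type.
data class NRect(val left: Double, val top: Double, val right: Double, val bottom: Double) {
    val width: Double get() = right - left
    val height: Double get() = bottom - top
}

// Hypothetical sketch of the follow-crop behaviour described above.
class FollowCropSketch(private val focusArea: NRect) {
    companion object {
        const val MISS_TOLERANCE = 2
        const val STALE_MS = 500L
        const val MIN_NORMALIZED_SIDE = 0.25
        // Assumption: "50% pad" grows the box by 50% in total, 25% per side.
        const val PAD_FRACTION = 0.5
    }

    private var tight: NRect? = null
    private var misses = 0
    private var lastHitMs = 0L

    /** Feed each pose result; pass null when no confident full skeleton was found. */
    fun update(skeletonBox: NRect?, nowMs: Long) {
        if (skeletonBox == null) {
            misses++
            // Hysteresis: a single clipped frame keeps the tight crop.
            if (misses >= MISS_TOLERANCE) tight = null
            return
        }
        misses = 0
        lastHitMs = nowMs
        tight = buildTight(skeletonBox) // null if degenerate -> static area
    }

    /** Rect the next frame should be cropped to. */
    fun cropRect(nowMs: Long): NRect {
        val t = tight ?: return focusArea
        if (nowMs - lastHitMs > STALE_MS) {
            tight = null
            return focusArea
        }
        return t
    }

    private fun buildTight(box: NRect): NRect? {
        val cx = (box.left + box.right) / 2
        val cy = (box.top + box.bottom) / 2
        // Padded size, floored so MLKit never sees an ultra-narrow crop.
        val halfW = maxOf(box.width * (1 + PAD_FRACTION), MIN_NORMALIZED_SIDE) / 2
        val halfH = maxOf(box.height * (1 + PAD_FRACTION), MIN_NORMALIZED_SIDE) / 2
        // Clamp to the user's static focus area so the tracker can't drift out.
        val r = NRect(
            maxOf(cx - halfW, focusArea.left),
            maxOf(cy - halfH, focusArea.top),
            minOf(cx + halfW, focusArea.right),
            minOf(cy + halfH, focusArea.bottom),
        )
        return if (r.width <= 0.0 || r.height <= 0.0) null else r
    }
}
```

A narrow 0.1-wide skeleton box is widened to the 0.25 floor rather than to 0.15. The first missed frame still returns the tight rect, and the second reverts to the focus area.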
Sample app menu item now cycles Mask → Crop → Crop+follow → Mask.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three compounding pose-quality tweaks:
- CROP downscale target rises from 256 → 384 max side. The crop is
already smaller than the full frame so this lands more pixels on the
person without moving the mask-path downscale.
- Drop landmarks whose MLKit inFrameLikelihood is below 0.5 before
emitting. Removes the "phantom joint" problem where occluded limbs
get guessed with low confidence.
- Temporal smoothing on successive skeletons (α=0.6, 500ms gap reset)
via PoseSmoother. Cuts visible jitter during fast shot motion while
staying responsive to real position changes.
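The last two tweaks can be sketched as a likelihood filter plus a per-joint exponential moving average. `PoseSmootherSketch`, `dropPhantomJoints` and the string-keyed `Landmark` map are hypothetical stand-ins for PoseSmoother and the real skeleton type. The sketch also assumes α weights the new sample, which keeps the output responsive at 0.6.

```kotlin
// Hypothetical types; joint names are plain strings for brevity.
data class Landmark(val x: Double, val y: Double, val inFrameLikelihood: Double)

const val MIN_IN_FRAME_LIKELIHOOD = 0.5

/** Drop joints MLKit is unsure are in frame instead of emitting a guessed position. */
fun dropPhantomJoints(raw: Map<String, Landmark>): Map<String, Landmark> =
    raw.filterValues { it.inFrameLikelihood >= MIN_IN_FRAME_LIKELIHOOD }

/**
 * Per-joint exponential moving average. Assumption: alpha weights the NEW
 * sample. A gap longer than resetGapMs restarts smoothing from the raw pose.
 */
class PoseSmootherSketch(
    private val alpha: Double = 0.6,
    private val resetGapMs: Long = 500,
) {
    private var prev: Map<String, Landmark> = emptyMap()
    private var prevTimeMs: Long? = null

    fun smooth(current: Map<String, Landmark>, timeMs: Long): Map<String, Landmark> {
        val last = prevTimeMs
        val reset = last == null || timeMs - last > resetGapMs
        val out = current.mapValues { (name, lm) ->
            val p = prev[name]
            // After a reset, or for a joint absent last frame, start from the raw value.
            if (reset || p == null) lm
            else lm.copy(
                x = alpha * lm.x + (1 - alpha) * p.x,
                y = alpha * lm.y + (1 - alpha) * p.y,
            )
        }
        prev = out
        prevTimeMs = timeMs
        return out
    }
}
```

Typical use would be `smoother.smooth(dropPhantomJoints(raw), frameTimeMs)`, so dropped joints neither appear in the output nor contribute to the average.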
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new parameter on the CameraView composable so callers can choose
how the focus rectangle is applied to the pose input. MASK preserves
existing behaviour — black out non-focus region and downscale the full
frame. CROP geometrically restricts the pose input to just the focus
rectangle before downscaling, giving the MLKit model more effective
pixels on the subject at the same downscaled side length; landmarks are
returned in full-frame coordinates via an offset that flows from
buildMlKitPoseInput through skeletonFromPoseScaled.
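The CROP-mode coordinate round trip can be sketched like this. `CropTransform`, `cropTransform` and `toFullFrame` are hypothetical names for the offset and scale that the library threads from buildMlKitPoseInput through skeletonFromPoseScaled. The sketch assumes the crop is only ever downscaled so that its longest side fits `maxSide`.

```kotlin
// Hypothetical helper type for illustration.
data class CropTransform(val offsetX: Int, val offsetY: Int, val scale: Double)

/** Crop-then-downscale: the crop's longest side is shrunk to at most maxSide. */
fun cropTransform(cropLeft: Int, cropTop: Int, cropW: Int, cropH: Int, maxSide: Int): CropTransform {
    val longest = maxOf(cropW, cropH)
    // Only ever downscale; small crops pass through at 1:1.
    val scale = if (longest > maxSide) maxSide.toDouble() / longest else 1.0
    return CropTransform(cropLeft, cropTop, scale)
}

/** Maps a landmark from pose-input pixels back to full-frame pixels. */
fun CropTransform.toFullFrame(x: Double, y: Double): Pair<Double, Double> =
    Pair(offsetX + x / scale, offsetY + y / scale)
```

For the right half of a 1280×960 frame (crop at x = 640, 640×960) with a 384 max side, the scale is 0.4. A landmark at (128, 192) in the pose input therefore maps back to (960, 480) in the full frame. The crop offset is what keeps landmarks from collapsing into the left edge.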
Object detection is intentionally unaffected — YOLO always sees the
full, unmasked frame.
Sample app exposes a menu toggle (Mask ↔ Crop) and points the demo at
the left half of the frame to exercise both modes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same API, higher-quality detector. Swaps the gradle artifact from
pose-detection to pose-detection-accurate and updates both pose-client
call sites to use AccuratePoseDetectorOptions. Trades ~2× per-frame
latency for materially better recall — worthwhile given pose is not
the FPS bottleneck in BOTH mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Android CPU+XNNPACK default delegate + parallel pose/object detection.
Backwards-compatible — public API unchanged, Android inference behavior
changed significantly (semver minor bump).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Samsung SM-A366B the GPU delegate offloaded only 26/685 YOLO26n ops;
the CPU↔GPU roundtrip tax dominated, and in BOTH mode the GPU also contended
with the pose pipeline. Flipping the default to CPU+XNNPACK (setUseXNNPACK(true)
is required on tensorflow-lite-support 0.5.0, otherwise pure CPU is ~100× slower)
raised BOTH-mode FPS 8.8 → 9.9.
Removing the even/odd alternation in CameraView.android.kt activated the
already-existing `poseExecutor` parallel path in Utils.android.kt, so in BOTH
mode each camera frame now runs both detectors concurrently instead of every
other frame. Net: effective per-detector rate doubled.
numThreads dropped 4 → 3 for the object interpreter so concurrent pose + object
don't oversubscribe the 8-core CPU.
Runtime override preserved: `adb shell setprop debug.tflite.delegate GPU|NNAPI|CPU`.
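Delegate selection with the runtime override can be sketched as an ordered list of attempts. `Delegate`, `delegateOrder` and `buildWithFallback` are hypothetical, and the property value is passed in as a plain string rather than read from Android system properties. The sketch also assumes that a forced delegate still falls back along the GPU → NNAPI → CPU chain and that the unforced default is CPU (with XNNPACK enabled by the real builder). It returns the delegate that actually built the interpreter so that fallbacks can be logged.

```kotlin
enum class Delegate { GPU, NNAPI, CPU }

/** Order in which delegates are attempted for a given override value (or null). */
fun delegateOrder(overrideProp: String?): List<Delegate> {
    val chain = listOf(Delegate.GPU, Delegate.NNAPI, Delegate.CPU)
    val forced = chain.firstOrNull { it.name.equals(overrideProp?.trim(), ignoreCase = true) }
        ?: return listOf(Delegate.CPU) // default: CPU (+XNNPACK in the real builder)
    // A forced delegate falls back along the rest of the chain, ending at CPU.
    return chain.drop(chain.indexOf(forced))
}

/** Tries each delegate in order; returns the first that builds, for logging. */
fun <T : Any> buildWithFallback(order: List<Delegate>, tryBuild: (Delegate) -> T?): Pair<Delegate, T> {
    for (d in order) {
        // Any failure (missing delegate, unsupported ops) moves on to the next one.
        val result = runCatching { tryBuild(d) }.getOrNull()
        if (result != null) return d to result
    }
    error("No delegate could build the interpreter")
}
```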
Combined with tonight's INT8 baseline re-export, on-device BOTH-mode FPS went
8.77 → 16.02 (+83%) with effective object-detection rate rising from ~4.4 to ~16 FPS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
# posedetection/build.gradle.kts
# sample/composeApp/src/commonMain/kotlin/com/nate/posedetection/App.kt
Brings the iOS multiarray-decode + parametrized model-input-dim work that
was lost from the v4.12.0 squash. Without this, ultralytics yolo26 end2end
CoreML exports (rect 512×384 / 640×480 / 960×736) silently produced zero
detections on iOS — Vision delivered VNCoreMLFeatureValueObservations that
the library's Vision-only filterIsInstance dropped on the floor.
Library
- FrameProcessor.analyseBufferForAll: when Vision returns a raw multiarray
(shape [1, 300, 6], yolo26n end2end output), decode it into AnalysisObjects
with class labels "basketball" / "basketball_hoop" and bbox coords mapped
back to oriented source pixel space.
- Coordinates from the end2end output are pixel-space over the model input,
not normalized — divide by modelInputW/H before scaling to source.
- modelInputW/Height are read from the ObjectModel (set via
CustomObjectModel.ios.kt parsing the `_<W>x<H>` filename suffix), so
rect-640 and rect-960 work without further code changes.
- ImageDetector.ios.kt gets the same letterbox + multiarray decode path
for the standalone (non-AVCapture) entry point.
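The multiarray decode step can be sketched in Kotlin as shown. `decodeEnd2End`, `Detection` and the 0.25 score threshold are hypothetical. The sketch assumes each of the 300 rows is `[x1, y1, x2, y2, score, classId]` in model-input pixels, which is how Ultralytics' end2end export is understood to lay out its output; verify this against your model. It also assumes the source frame already has the model's aspect ratio, so normalizing and rescaling is enough and no letterbox padding needs to be removed.

```kotlin
data class Detection(
    val label: String,
    val score: Float,
    val left: Float,
    val top: Float,
    val right: Float,
    val bottom: Float,
)

/** Decodes a flattened [1, 300, 6] end2end output into source-pixel boxes. */
fun decodeEnd2End(
    output: FloatArray,
    modelInputW: Int,
    modelInputH: Int,
    sourceW: Int,
    sourceH: Int,
    labels: List<String> = listOf("basketball", "basketball_hoop"),
    minScore: Float = 0.25f,
): List<Detection> {
    val stride = 6
    require(output.size % stride == 0) { "expected rows of $stride floats" }
    // Model-input pixels -> normalized 0..1 -> source pixels.
    fun sx(v: Float) = v / modelInputW * sourceW
    fun sy(v: Float) = v / modelInputH * sourceH
    val detections = ArrayList<Detection>()
    for (row in 0 until output.size / stride) {
        val b = row * stride
        val score = output[b + 4]
        if (score < minScore) continue // also skips zero-filled unused rows
        val label = labels.getOrNull(output[b + 5].toInt()) ?: continue
        detections += Detection(label, score, sx(output[b]), sy(output[b + 1]), sx(output[b + 2]), sy(output[b + 3]))
    }
    return detections
}
```

Skipping the division by `modelInputW`/`modelInputH` is exactly the bug described above. The raw values would then be treated as already normalized and the boxes would land far outside the frame.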
Sample app — iOS unattended test harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ (was a no-op stub).
- ExperimentAuto.ios.kt logs progress via NSLog and exits on finish so
back-to-back captures cold-start cleanly.
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for iOS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
# .gitignore
# sample/composeApp/src/commonMain/kotlin/com/nate/posedetection/App.kt
Adds a 4:3 rectangular detection path on iOS that mirrors the Android
v4.11.0 letterbox preprocessing — instead of feeding square frames to
Vision and letting it center-crop away the sides, the detector now
letterboxes the source frame into the model's native aspect ratio (e.g.
512×384, 640×480, or 960×736) and decodes the model output back to
original-image coordinates. The model's input dimensions are inferred
from a `_<W>x<H>` filename suffix on the bundled `.mlmodelc`, so a new
rect model can be dropped in with no code changes.
iOS detector
- ImageDetector.ios.kt + FrameProcessor.analyseBufferForAll: handles
both Vision-pipeline output (VNRecognizedObjectObservation, used by
classic yolo11 CoreML pipelines) and raw multiarray output
(VNCoreMLFeatureValueObservation with shape [1, 300, 6], used by
ultralytics' yolo26 end2end CoreML export). Coordinates from the
end2end output are in pixel space of the model input and are
normalized by the model dimensions before mapping to the oriented
source frame.
- CustomObjectModel.ios.kt: parses the input width/height from the
model's filename (`yolo26n_v11_rect_512x384` → 512×384). Models
without the suffix get (0, 0) and skip letterboxing — preserves
prior Vision-default behavior for square models.
- Sample app picks up `imageCropAndScaleOption = ScaleFit` as a
belt-and-suspenders measure so Vision doesn't double-crop a frame whose
aspect already matches the model.
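The filename-suffix convention can be sketched with a single regex. `parseModelInputSize` is a hypothetical name, and stripping `.mlmodelc`/`.mlpackage` is an assumption about how the name arrives. The (0, 0) fallback matches the behaviour described for square models.

```kotlin
// Matches a trailing "_<W>x<H>" suffix, e.g. "_512x384".
private val INPUT_SIZE_SUFFIX = Regex("""_(\d+)x(\d+)$""")

/** Returns (width, height), or (0, 0) when the name has no size suffix. */
fun parseModelInputSize(modelName: String): Pair<Int, Int> {
    val base = modelName.removeSuffix(".mlmodelc").removeSuffix(".mlpackage")
    val match = INPUT_SIZE_SUFFIX.find(base) ?: return 0 to 0
    val (w, h) = match.destructured
    return w.toInt() to h.toInt()
}
```

Anchoring the regex at the end of the name means that version tags earlier in the name (such as `_v11`) cannot be mistaken for a size.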
Sample app + experiment harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec
CompositionLocal. Enables unattended back-to-back model captures via
`xcrun devicectl device process launch`.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ in the same schema
tools/compare_logs.py consumes; pull via `pymobiledevice3 apps afc`.
- ExperimentAuto.ios.kt logs progress via NSLog and exits the app on
finish (so back-to-back captures cold-start cleanly).
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for
iOS.
Bundled rect CoreML models for the sample app
- yolo26n_v11_rect_512x384.mlpackage (val mAP50 = 0.800)
- yolo26n_v11_rect_640x480.mlpackage (val mAP50 = 0.840)
- yolo26n_v11_rect_960x736.mlpackage (val mAP50 = 0.870)
Library version bump: 4.11.1 → 4.12.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Android detector pipeline previously stretched camera frames to the
model input shape, destroying aspect ratio and degrading YOLO accuracy on
non-square sources. This release replaces the stretch with a proper
letterbox: scale-to-fit + gray-114 pad, then un-letterbox the output
bounding boxes back to the original image's coordinate space.
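The letterbox geometry and its inverse can be sketched as follows. `letterbox` and `Letterbox.unmap` are hypothetical names; pixel copying and the gray-114 fill are left out. Only the scale and padding that the un-letterbox step needs are computed.

```kotlin
/** Scale and centered padding for a scale-to-fit letterbox (pad pixels are gray 114). */
data class Letterbox(val scale: Double, val padX: Double, val padY: Double) {
    /** Model-input pixel -> source-image pixel: remove the pad, then undo the scale. */
    fun unmap(x: Double, y: Double): Pair<Double, Double> =
        Pair((x - padX) / scale, (y - padY) / scale)
}

/** Fits srcW x srcH inside dstW x dstH without distortion, centered. */
fun letterbox(srcW: Int, srcH: Int, dstW: Int, dstH: Int): Letterbox {
    // The smaller ratio keeps the whole image inside the model input.
    val scale = minOf(dstW.toDouble() / srcW, dstH.toDouble() / srcH)
    val padX = (dstW - srcW * scale) / 2.0
    val padY = (dstH - srcH * scale) / 2.0
    return Letterbox(scale, padX, padY)
}
```

A 1280×720 frame fed to a 640×640 model is scaled by 0.5 and padded by 140 px at the top and bottom. The model-input point (320, 320) therefore maps back to the source center (640, 360). A plain stretch would instead have distorted the vertical axis by a factor of about 1.78.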
Library changes:
- ImageDetector.android.kt (NEW) — letterbox-aware detector with full
un-letterbox pipeline. Output bboxes returned in source-image pixel
coordinates regardless of model input aspect ratio.
- ImageDetector.kt + ImageDetector.ios.kt — common interface + iOS
expect/actual stubs.
- CameraView.android.kt — pin ImageAnalysis to AspectRatio.RATIO_4_3 so
the camera frame distribution is consistent across devices and matches
the model's expected input geometry.
- CustomObjectModel.android.kt — track and log which TFLite delegate
(GPU/NNAPI/CPU) actually built the interpreter. Surfaces silent CPU
fallbacks at INFO level for adb logcat -s TFLite:I.
- build.gradle.kts — version 4.10.1 → 4.11.0 (minor: new public detector
class, no breaking API changes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Skeleton timestamps use sensor→epoch conversion for accurate alignment with video
- AnalysisObject now carries its own timestamp from the analysis frame
- Stagger pose and object detection on alternate frames in BOTH mode (~15fps each)
- Clear stale skeleton/object overlays when switching detection modes
- iOS VideoBuilder finalize() crash fix (NSThread.sleep polling)
- iOS extractFrame autoreleasepool to prevent CGImage accumulation
- Skeleton.lerp() for interpolation between keyframes
- Batch extractFrames Flow API for fast sequential frame decoding
- VideoBuilder: handle mismatched frame dimensions, YUV420 buffer overflow fix
- Downscale pose input to 256px for faster ML Kit processing
- Remove bundled movenet/posenet/YOLO pose model files from library assets
- Replace iOS VideoBuilder debug printlns with structured Logger calls
- Remove FPS debug logging from CameraView
- Remove DebugTestScreen from sample app
- Add *.hprof to .gitignore
- Version bump to 4.10.0
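Keyframe interpolation in the spirit of Skeleton.lerp() can be sketched as below. `SkeletonSketch`, `Joint` and `skeletonAt` are hypothetical, and a string-keyed map stands in for the library's named joints. The assumption that only joints present in both keyframes are interpolated is the sketch's choice, not necessarily the library's.

```kotlin
// Hypothetical types for illustration.
data class Joint(val x: Double, val y: Double)

data class SkeletonSketch(val timestampMs: Long, val joints: Map<String, Joint>) {
    /** Linear interpolation toward other; t = 0 gives this, t = 1 gives other. */
    fun lerp(other: SkeletonSketch, t: Double): SkeletonSketch {
        // Only joints detected in both keyframes can be interpolated.
        val shared = joints.keys intersect other.joints.keys
        val mixed = shared.associateWith { name ->
            val a = joints.getValue(name)
            val b = other.joints.getValue(name)
            Joint(a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t)
        }
        val ts = timestampMs + ((other.timestampMs - timestampMs) * t).toLong()
        return SkeletonSketch(ts, mixed)
    }
}

/** Pose at timeMs between keyframes a and b (requires a.timestampMs < b.timestampMs). */
fun skeletonAt(a: SkeletonSketch, b: SkeletonSketch, timeMs: Long): SkeletonSketch {
    val span = (b.timestampMs - a.timestampMs).toDouble()
    val t = ((timeMs - a.timestampMs) / span).coerceIn(0.0, 1.0)
    return a.lerp(b, t)
}
```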
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a cached MediaCodec decoder for frame extraction instead of
creating/destroying a MediaMetadataRetriever per frame. The decoder
opens the video once and decodes frames sequentially, leveraging
codec state across frames. Handles video rotation metadata.
Reduces offline video analysis time from ~70s to ~31s (2.3x faster)
on a 6-second test video.
Also adds build time display to the sample app's frame analysis view
for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NNAPI delegate as an intermediate fallback between GPU and CPU
for TFLite inference. On devices where GPU delegate is unavailable,
NNAPI can route inference to the device's DSP/NPU for better
performance than CPU-only execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run pose detection on a dedicated thread concurrently with object
detection so per-frame latency is max(pose, object) instead of the
sum. Also bump detection throttle from 20ms (~50 FPS) to 33ms (~30 FPS)
to reduce CPU contention with minimal perceptual difference.
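The concurrency and throttle changes can be sketched with plain `java.util.concurrent`. `detectConcurrently` and `FrameThrottle` are hypothetical names, and the detectors are passed in as lambdas. Pose runs on its own executor while object detection runs on the calling analysis thread, so a frame costs roughly max(pose, object) instead of the sum.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService

/** Rejects frames that arrive less than minIntervalMs after the last accepted one. */
class FrameThrottle(private val minIntervalMs: Long = 33) {
    private var lastAcceptedMs: Long? = null

    fun accept(nowMs: Long): Boolean {
        val last = lastAcceptedMs
        if (last != null && nowMs - last < minIntervalMs) return false
        lastAcceptedMs = nowMs
        return true
    }
}

/** Runs pose on poseExecutor while objects run on the caller, then joins both. */
fun <P, O> detectConcurrently(
    poseExecutor: ExecutorService,
    detectPose: () -> P,
    detectObjects: () -> O,
): Pair<P, O> {
    val poseFuture = poseExecutor.submit(Callable { detectPose() })
    val objects = detectObjects()
    // Blocks until pose finishes; an exception in the pose task is rethrown here.
    return poseFuture.get() to objects
}
```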
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use CameraX VideoCapture use case for hardware-accelerated recording
instead of per-frame ARGB→NV12 software conversion via MediaCodec.
This eliminates the CPU-intensive bitmap conversion and encoding that
was competing with pose/object detection on every frame.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>