Commits
- Fix VideoRecordEvent.Finalize duration/start-timestamp (~165ms overshoot)
- Add setOnAnalyzerFrameCallback / setOnRecordingFirstFrameTsCallback / setCurrentRecordingId
- Release MediaMetadataRetriever; iOS camera engine alignment
- Publishable coordinates 4.16.0 (non-SNAPSHOT)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two iOS-side fixes for use-after-free crashes (kima downstream + sample
app on iPad/iPhone 17 Pro) and the resulting cpu_resource warnings.
* CameraController.captureOutput now CFRetains the CVImageBufferRef
synchronously on the camera output queue before dispatch_async, with
a matching CFRelease in the worker's finally block (and an outer
bufferEnqueued guard for the dispatch-failed path). Without this the
K/N closure dereferenced freed memory inside FrameProcessor's first
CFRetain because AVCaptureVideoDataOutput recycles sample buffers
the moment the delegate returns.
* The dispatch_async closure no longer captures any per-frame ObjC
reference. Orientation and mirrored flag are snapshotted into local
primitives, lastCaptureConnection is written directly on the camera
output queue (idempotent — connection is constant for the session),
and analyseBufferForAll is invoked with captureConnection = null
since both overrides are non-null.
* GC.collect() is now called every 10 frames (~3 Hz at 30 fps) instead
of every frame. K/N's auto-GC trigger is too lazy at this allocation
rate so a periodic synchronous collect is still required, but cutting
the frequency 10× clears the sustained CPU advisory iOS 26 was
raising on the camera queue.
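The throttling in the last bullet can be sketched as a plain frame counter. This is a minimal sketch, not the library's code: `PeriodicCollector` is a hypothetical name, and the `collect` lambda stands in for the K/N `GC.collect()` call so the snippet also compiles off-device.

```kotlin
// Hypothetical helper: invokes `collect` once every `period` frames.
class PeriodicCollector(private val period: Int, private val collect: () -> Unit) {
    private var frames = 0

    init {
        require(period > 0) { "period must be positive" }
    }

    // Call once per delivered camera frame.
    fun onFrame() {
        frames++
        if (frames >= period) {
            frames = 0
            collect()
        }
    }
}
```

With `period = 10` and a 30 fps camera, `collect` runs about three times per second, which matches the rate quoted above.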
v4.15 replaced the old pointForCaptureDevicePointOfInterest-based
mapSkeletonToPreview with direct aspect-fit/fill math and — as a side
effect — changed what Skeleton.width / Skeleton.height mean:
pre-4.15: size of the VISIBLE sensor region projected onto the preview
layer, in preview points.
(abs of pointForCaptureDevicePointOfInterest((0,0))
vs pointForCaptureDevicePointOfInterest((1,1))
= source dims × aspect-fit/fill scale)
v4.15.x: full preview layer bounds in points (too wide in FIT;
too narrow in FILL).
Downstream consumers that normalize via `joint.x / skeleton.width` saw
skeletons drawn at the wrong scale after the bump. Joint x/y values
themselves were unchanged across versions (both return preview points),
so the fix is localized: set width/height = oriW*scale, oriH*scale.
AnalysisObject.boundingBox still goes through the original
pointForCaptureDevicePointOfInterest path (preserved pre-4.15 values);
AnalysisObject.frameSize is unchanged (oriented sensor dims). No
other consumer-visible coord shape changed.
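The width/height rule above (oriW*scale, oriH*scale) can be illustrated with a short sketch. It assumes the usual definitions of aspect-fit (smaller of the two axis ratios) and aspect-fill (larger of the two); `Gravity`, `Size`, and `projectedSourceSize` are hypothetical names, not the library's API.

```kotlin
import kotlin.math.max
import kotlin.math.min

enum class Gravity { FIT, FILL }

data class Size(val width: Double, val height: Double)

// Size of the oriented source frame after scaling it into the preview layer.
// FIT uses the smaller axis ratio (whole frame visible, letterboxed);
// FILL uses the larger one (layer fully covered, frame cropped).
fun projectedSourceSize(oriW: Double, oriH: Double, preview: Size, gravity: Gravity): Size {
    val sx = preview.width / oriW
    val sy = preview.height / oriH
    val scale = if (gravity == Gravity.FIT) min(sx, sy) else max(sx, sy)
    return Size(oriW * scale, oriH * scale)
}
```

Under these definitions the FIT result never exceeds the preview bounds and the FILL result is never smaller than them, which is why using the raw layer bounds was too wide in FIT and too narrow in FILL.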
New Skeleton fields (leftHeel / rightHeel / leftToe / rightToe /
leftIndex / rightIndex) are additive — default to null for consumers
that don't reference them.
Version 4.15.6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous stubs (v4.15.3-4) covered the writeLog path. New crash hits
a different Clearcut subsystem: the periodic auto-uploader that flushes
counters on a timer independent of pose detection, matching the "ran for
a while before crashing" symptom.
Stack:
-[MLKITx_CCTClearcutMetaLogger logCounters:] +56
-[MLKITx_CCTClearcutAutoCounters flushCountersToLoggerInternal] +88
MLKITx__prm_dispatch_sync_named_cstr +56
-[MLKITx_CCTClearcutAutoCounters flushCountersToLogger] +84
-[MLKITx_CCTClearcutMetaLogger flushCountersToLogger] +32
-[MLKITx_CCTClearcutUploader finishUploadAndCallHandlers] +260
-[MLKITx_CCTClearcutUploader flushThenUploadWithCompletionHandler:isOnForeground:] +1324
Stubs added at three levels (each independently installed):
1. -[MLKITx_CCTClearcutUploader startAutoUpload]
No-op. Prevents the timer from starting at all; the cleanest
short-circuit, provided the stub is installed before the uploader starts.
2. -[MLKITx_CCTClearcutUploader
flushThenUploadWithCompletionHandler:isOnForeground:]
No-op + invoke completion handler with success — handles the
foreground-entry trigger that bypasses the timer.
3. -[MLKITx_CCTClearcutMetaLogger logCounters:]
Deepest fallback right at the crash site — drops counters silently.
All telemetry dropped; pose detection unaffected.
Version 4.15.5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v4.15.3's klib was byte-identical to v4.15.2 despite the .m-file edit:
the cinteropMlkitAccurate* Gradle tasks only watched the .def file, and
since the .def content stayed the same (same paths, just new .a content
inside them), the cinterop cache wasn't invalidated and the old klib
was republished verbatim.
Fix: declare mlkitArchivesDir + mlkitRedirectDir as explicit inputs on
the cinterop tasks, so any change to libMLKit*.a / libMLKitRedirect.a
forces cinterop to rebuild and Maven to republish.
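A build-script fragment along these lines would declare the extra inputs. It is a sketch under assumptions: the two directory paths are hypothetical (the message names the properties, not their locations), and the tasks are matched by the `cinteropMlkitAccurate` name prefix mentioned above.

```kotlin
// build.gradle.kts (fragment, not a complete build script)

// Hypothetical locations of the static archives.
val mlkitArchivesDir = layout.buildDirectory.dir("mlkit/archives")
val mlkitRedirectDir = layout.buildDirectory.dir("mlkit/redirect")

tasks.matching { it.name.startsWith("cinteropMlkitAccurate") }.configureEach {
    // Registering the directories as inputs makes the task out of date
    // whenever any archive inside them changes, even if the .def file does not.
    inputs.dir(mlkitArchivesDir).withPropertyName("mlkitArchives")
    inputs.dir(mlkitRedirectDir).withPropertyName("mlkitRedirect")
}
```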
v4.15.4 iosarm64 cinterop klib: 11612790 bytes (was 11611236 in 4.15.3) —
delta matches the three new CCT-telemetry stubs from 4.15.3's .m changes
that didn't land. Verified with `nm` on the published klib's embedded
libMLKitRedirect.a — __pd_mlkit_noop_writelog, __pd_mlkit_compute_url,
and __pd_install_clearcut_stub are all present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v4.15.2 only stubbed -[MLKITx_CCTClearcutLogger log:completion:]. Kima's
crash stack actually entered telemetry via
-[MLKITx_CCTLogWriter writeLog:pseudonymousID:logDirectory:clock:
logTransformers:completionQueue:completion:]
directly — bypassing the Logger entry point we'd stubbed. The crash then
propagates down to the missing
+[MLKITx_CCTClearcutFileUtility computeUrlForLogContextDir:context:bundleId:]
class method.
v4.15.3 adds two more defenses:
1. Stub -[MLKITx_CCTLogWriter writeLog:...] to a no-op that invokes
the completion handler (dispatching onto completionQueue if the
caller supplied one, to preserve semantics).
2. As a belt-and-braces fallback, if callers reach the FileUtility
anyway, install a class method on MLKITx_CCTClearcutFileUtility for
computeUrlForLogContextDir:context:bundleId: that returns the input
directory unchanged — satisfies the selector lookup so we don't
raise NSInvalidArgumentException even if the rest of the chain runs.
Each stub has its own installed-flag so partial success on the first
attempt still leaves the other stubs to retry on the second attempt
(+load vs mlkit_set_resource_dir).
Version 4.15.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kima crash: -[MLKITx_CCTClearcutLogger log:completion:] background
dispatch → ... → +[MLKITx_CCTClearcutFileUtility
computeUrlForLogContextDir:context:bundleId:] "unrecognized selector".
MLKit's telemetry subsystem is fire-and-forget on an internal dispatch
queue and doesn't surface the crash until deep in the call chain.
Telemetry isn't needed for pose detection. Swizzle the entry-point
`-[MLKITx_CCTClearcutLogger log:completion:]` to a no-op that signals
success via the completion handler, so the failing writeLog → file-IO
path is never reached.
Installation runs twice for robustness: once at +load (may be too early
if MLKit's classes haven't registered yet) and once from
mlkit_set_resource_dir (called from Kotlin after app init, when all
classes are guaranteed loaded). A flag short-circuits if the first
attempt succeeded.
Version bumped to 4.15.2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The root problem with v4.15.0's MLKit shipment was that downstream apps
had to: (1) add -ObjC to OTHER_LDFLAGS, (2) copy MLKit resource bundles
into the app bundle via an Xcode build phase, (3) force the KMP framework
to be dynamic. This PR eliminates all three consumer-side steps.
How:
1. `-ObjC` propagation — the cinterop mlkitAccurate.def already carries
`linkerOpts = -ObjC`, which Kotlin/Native applies at the consumer's
framework link step. For the KMP default (dynamic framework), this
preserves MLKit's ObjC class/category metadata inside the resulting
dylib; the consumer's app-link step just embeds the dylib, so it
doesn't need -ObjC of its own. No change needed in consumer
OTHER_LDFLAGS.
2. Resource bundles — MLKit's .tflite / .binarypb weights (~9 MB) are
now staged (Gradle `stageMlkitResources` task) into the library's
iosMain Compose Multiplatform resource set. Compose packages them
into the consumer's iOS app bundle under
`compose-resources/composeResources/...generated.resources/files/mlkit/`.
At first MlKitPose.detect() call, the new
`MlKitResourceBootstrap.ensureInitialized()` runs once — reads the
bundled bytes via `Res.readBytes`, writes them to
`NSCachesDirectory()/MLKitResources/<BundleName>.bundle/<file>`,
and registers that path with a small NSBundle swizzle.
3. NSBundle swizzle — `src/nativeInterop/mlkitRedirect/MLKitResourceRedirect.m`
swizzles `-[NSBundle URLForResource:withExtension:]` at +load time.
When MLKit's internal lookup for its three resource bundles
(MLKitPoseDetectionAccurateResources / MLKitPoseDetectionCommonResources /
MLKitXenoResources) falls through the main bundle, the swizzle
redirects to the Caches-directory copy the Kotlin bootstrap wrote.
Compiled to libMLKitRedirect.a via the new `compileMlkitRedirect`
Gradle task and folded into the MLKit cinterop .def's
staticLibraries list.
4. Dynamic framework — KMP's default is dynamic, so no consumer action
is required unless they explicitly set `isStatic = true`. The
sample iosApp's composeApp/build.gradle.kts is simplified to the
default dynamic config, and the Xcode iosApp target's OTHER_LDFLAGS
`-ObjC` entry + "Copy MLKit Resource Bundles" shell-script build
phase are removed — proving the library is self-contained.
Result: downstream consumers (kima) can bump from 4.14.0 → 4.15.1 with
zero other changes and MLKit pose detection works on iOS.
Version bumped to 4.15.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 2df79f72f8d71ed17dcafa64c0b69fe5334663a1.
v4.15.0 shipped with MLKit-on-iOS via cinterop static archives, but MLKit
needs its resource bundles (~9 MB of .tflite/.binarypb weights) inside the
consumer's app bundle AND needs -ObjC at the consumer's app link step AND
needs a dynamic KMP framework. There's no KMP distribution mechanism to
flow native framework resources through a klib into a downstream app
bundle without consumer-side Gradle/Xcode setup. Shipping as-is forced
every downstream consumer to add a Run Script build phase and an
OTHER_LDFLAGS entry and to set isStatic=false, violating the "just bump
the version" contract.
Fix: iosArm64's MlKitPose stubs isAvailable()=false (matching simulator
targets); FrameProcessor falls back to Apple Vision uniformly. Removed
the sync-mlkit.sh Gradle plumbing and the MLKit cinterop config.
All other v4.15.0 improvements are preserved:
- Orientation fixes (independent of pose backend)
- Object detection fixes (VNImageRequestHandler orientation-ctor bug)
- Crisp solid overlay rendering
- Split object/pose EXIF infrastructure
- Skeleton.leftHeel/leftToe/leftIndex fields on Android (MLKit provides
them); iOS Vision leaves those fields null
A future MLKit-on-iOS revival would need to ship as a pre-built XCFramework
distributed via SPM or a Gradle plugin that auto-embeds into the
consumer's iOS app — out of scope here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, publishToMavenLocal fails because GPG signing is mandatory.
Local consumers (like the kima app) don't need signatures. CI publishing
to Maven Central can still opt-in via -PsigningEnabled=true.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- iOS: switch from Vision VNDetectHumanBodyPoseRequest to MLKit Accurate
pose, embedded as cinterop static archives (no cocoapods propagation
to downstream consumers).
- iOS: fix coordinate-space bugs across all four device orientations.
Pose and object paths now have independent EXIF derivations +
independent preview-coord mapping (aspect-fit/fill for pose, original
pointForCaptureDevicePointOfInterest for objects).
- iOS: fix object-detection bounding boxes in non-landscape-right
orientations. Root cause was VNImageRequestHandler silently ignoring
the VNImageOptionCGImagePropertyOrientation options-dict key;
switched to the orientation-parameter constructor.
- Skeleton: add leftHeel/rightHeel, leftToe/rightToe (foot index on
Android), and leftIndex/rightIndex (finger tip). Wired through lerp,
mirror, rotate, bones (new foot + hand bones), joints, and both
MLKit pose builders.
- iOS overlay: replace BlendMode.Softlight/Color bones and radial
gradient joint dots with solid crisp strokes + circles at ~1/3
thickness. Matches Android.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Experiment Mode menu toggle and the bottom-center Start/Stop
overlay were dev tooling, noise for anyone poking at the sample app.
Drop both.
The underlying ExperimentLogger, ExperimentEvent, the two capture
LaunchedEffects, and the intent-driven auto-mode path all stay — so
orchestrator-driven comparison runs (MainActivity test_mode extras)
continue to work without any UI visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The experiment logger was originally object-only — every event wrote
"skeleton": null regardless of what the pose pipeline produced. That
made it impossible to measure pose A/Bs from the captured JSON.
ExperimentEvent now carries an optional Skeleton, the JSON writer
serialises the 12 landmarks (or null per-joint) plus frame dims, and
App.kt collects skeletonFlow alongside customObjectFlow into the same
event buffer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Once a confident full skeleton is detected inside the static focus
area, FollowCropState remembers its bbox (with 50% pad) and the next
frame crops tightly around it — lifting effective input resolution on
the person above what the static half-frame crop can achieve.
Reverts to the static focus area when the skeleton is lost for
MISS_TOLERANCE=2 consecutive frames (hysteresis so one clipped frame
doesn't bounce back to wide), goes stale past 500ms, or the tight rect
degenerates. A MIN_NORMALIZED_SIDE floor of 0.25 keeps MLKit off
ultra-narrow aspect crops where recall collapses. Tight rect is
clamped to the user's static focusArea so the tracker can't drift
outside the intended zone.
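The follow-crop rules above can be sketched as a small state machine over normalized coordinates. This is an illustration, not the shipped FollowCropState: it assumes the 50% pad grows the box's total width and height by half, and that the MIN_NORMALIZED_SIDE floor widens a too-small rect rather than rejecting it. `Rect`, `tightRect`, and `update` are hypothetical names.

```kotlin
import kotlin.math.max
import kotlin.math.min

data class Rect(val left: Double, val top: Double, val right: Double, val bottom: Double) {
    val width: Double get() = right - left
    val height: Double get() = bottom - top
}

const val PAD = 0.5                  // 50% pad
const val MIN_NORMALIZED_SIDE = 0.25 // floor on either side of the crop
const val MISS_TOLERANCE = 2         // consecutive misses before reverting
const val STALE_MS = 500L            // age after which the tight rect is dropped

// Pads the box, raises each side to the floor, then clamps to the focus area.
// Returns null when the clamped rect degenerates.
fun tightRect(box: Rect, focus: Rect): Rect? {
    val halfW = max(box.width * (1 + PAD), MIN_NORMALIZED_SIDE) / 2
    val halfH = max(box.height * (1 + PAD), MIN_NORMALIZED_SIDE) / 2
    val cx = (box.left + box.right) / 2
    val cy = (box.top + box.bottom) / 2
    val r = Rect(
        max(cx - halfW, focus.left), max(cy - halfH, focus.top),
        min(cx + halfW, focus.right), min(cy + halfH, focus.bottom)
    )
    return if (r.width > 0 && r.height > 0) r else null
}

class FollowCropState(private val focus: Rect) {
    private var tight: Rect? = null
    private var misses = 0
    private var lastHitMs = 0L

    // Feed this frame's confident skeleton box (or null); returns the crop for the next frame.
    fun update(skeletonBox: Rect?, nowMs: Long): Rect {
        if (skeletonBox != null) {
            tight = tightRect(skeletonBox, focus)
            misses = 0
            lastHitMs = nowMs
        } else if (++misses >= MISS_TOLERANCE) {
            tight = null
        }
        if (nowMs - lastHitMs > STALE_MS) tight = null
        return tight ?: focus
    }
}
```

A single missed frame keeps the tight rect; the second consecutive miss, or a gap longer than 500 ms since the last hit, falls back to the static focus area.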
Sample app menu item now cycles Mask → Crop → Crop+follow → Mask.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three compounding pose-quality tweaks:
- CROP downscale target rises from 256 → 384 max side. The crop is
already smaller than the full frame so this lands more pixels on the
person without moving the mask-path downscale.
- Drop landmarks whose MLKit inFrameLikelihood is below 0.5 before
emitting. Removes the "phantom joint" problem where occluded limbs
get guessed with low confidence.
- Temporal smoothing on successive skeletons (α=0.6, 500ms gap reset)
via PoseSmoother. Cuts visible jitter during fast shot motion while
staying responsive to real position changes.
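The temporal smoothing in the last bullet can be sketched as a per-joint exponential moving average. This is a minimal sketch, not the actual PoseSmoother: it assumes α weights the newest sample, and that a joint missing in either frame passes through unsmoothed. `Point` and `smooth` are hypothetical names.

```kotlin
data class Point(val x: Double, val y: Double)

class PoseSmoother(private val alpha: Double = 0.6, private val gapResetMs: Long = 500) {
    private var previous: List<Point?>? = null
    private var lastMs = 0L

    // Blends each joint with its previous smoothed value; starts over when the
    // gap since the last skeleton exceeds gapResetMs.
    fun smooth(joints: List<Point?>, nowMs: Long): List<Point?> {
        val prev = previous
        val out = if (prev == null || prev.size != joints.size || nowMs - lastMs > gapResetMs) {
            joints
        } else {
            joints.mapIndexed { i, cur ->
                val old = prev[i]
                if (cur == null || old == null) cur
                else Point(
                    alpha * cur.x + (1 - alpha) * old.x,
                    alpha * cur.y + (1 - alpha) * old.y
                )
            }
        }
        previous = out
        lastMs = nowMs
        return out
    }
}
```

A larger α follows real motion more closely at the cost of more visible jitter; the gap reset avoids blending a fresh detection with a stale one.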
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new parameter on the CameraView composable so callers can choose
how the focus rectangle is applied to the pose input. MASK preserves
existing behaviour — black out non-focus region and downscale the full
frame. CROP geometrically restricts the pose input to just the focus
rectangle before downscaling, giving the MLKit model more effective
pixels on the subject at the same downscaled side length; landmarks are
returned in full-frame coordinates via an offset that flows from
buildMlKitPoseInput through skeletonFromPoseScaled.
Object detection is intentionally unaffected — YOLO always sees the
full, unmasked frame.
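The CROP-mode offset can be illustrated as follows. This is a sketch of the coordinate bookkeeping only, assuming a uniform downscale to a maximum side length; `PoseInput`, `cropPoseInput`, and `toFullFrame` are hypothetical names and do not mirror the signatures of buildMlKitPoseInput or skeletonFromPoseScaled.

```kotlin
import kotlin.math.max
import kotlin.math.min

// Describes the image handed to the pose model: where the crop sat in the
// full frame and how much it was downscaled.
data class PoseInput(val offsetX: Int, val offsetY: Int, val scale: Double, val width: Int, val height: Int)

// Crops to the focus rectangle, then downscales so the longer side is at most maxSide.
fun cropPoseInput(left: Int, top: Int, cropW: Int, cropH: Int, maxSide: Int): PoseInput {
    val scale = min(1.0, maxSide.toDouble() / max(cropW, cropH))
    return PoseInput(left, top, scale, (cropW * scale).toInt(), (cropH * scale).toInt())
}

// Maps a landmark from model-input pixels back to full-frame pixels:
// undo the downscale first, then add the crop offset.
fun PoseInput.toFullFrame(x: Double, y: Double): Pair<Double, Double> =
    Pair(offsetX + x / scale, offsetY + y / scale)
```

Because the crop is smaller than the full frame, the same maximum side yields a larger scale factor, so more model-input pixels land on the subject.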
Sample app exposes a menu toggle (Mask ↔ Crop) and points the demo at
the left half of the frame to exercise both modes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same API, higher-quality detector. Swaps the gradle artifact from
pose-detection to pose-detection-accurate and updates both pose-client
call sites to use AccuratePoseDetectorOptions. Trades ~2× per-frame
latency for materially better recall — worthwhile given pose is not
the FPS bottleneck in BOTH mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Android CPU+XNNPACK default delegate + parallel pose/object detection.
Backwards-compatible — public API unchanged, Android inference behavior
changed significantly (semver minor bump).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Samsung SM-A366B the GPU delegate offloaded only 26/685 YOLO26n ops;
the CPU↔GPU roundtrip tax dominated, and in BOTH mode the GPU also contended
with the pose pipeline. Flipping the default to CPU+XNNPACK (setUseXNNPACK(true)
is required on tensorflow-lite-support 0.5.0, otherwise pure CPU is ~100× slower)
raised BOTH-mode FPS 8.8 → 9.9.
Removing the even/odd alternation in CameraView.android.kt activated the
already-existing `poseExecutor` parallel path in Utils.android.kt, so in BOTH
mode each camera frame now runs both detectors concurrently instead of every
other frame. Net: effective per-detector rate doubled.
numThreads dropped 4 → 3 for the object interpreter so concurrent pose + object
don't oversubscribe the 8-core CPU.
Runtime override preserved: `adb shell setprop debug.tflite.delegate GPU|NNAPI|CPU`.
Combined with tonight's INT8 baseline re-export, on-device BOTH-mode FPS went
8.77 → 16.02 (+83%) with effective object-detection rate rising from ~4.4 to ~16 FPS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
# posedetection/build.gradle.kts
# sample/composeApp/src/commonMain/kotlin/com/nate/posedetection/App.kt
Brings the iOS multiarray-decode + parametrized model-input-dim work that
was lost from the v4.12.0 squash. Without this, ultralytics yolo26 end2end
CoreML exports (rect 512×384 / 640×480 / 960×736) silently produced zero
detections on iOS — Vision delivered VNCoreMLFeatureValueObservations that
the library's Vision-only filterIsInstance dropped on the floor.
Library
- FrameProcessor.analyseBufferForAll: when Vision returns a raw multiarray
(shape [1, 300, 6], yolo26n end2end output), decode it into AnalysisObjects
with class labels "basketball" / "basketball_hoop" and bbox coords mapped
back to oriented source pixel space.
- Coordinates from the end2end output are pixel-space over the model input,
not normalized — divide by modelInputW/H before scaling to source.
- modelInputW/H are read from the ObjectModel (set via
CustomObjectModel.ios.kt parsing the `_<W>x<H>` filename suffix), so
rect-640 and rect-960 work without further code changes.
- ImageDetector.ios.kt gets the same letterbox + multiarray decode path
for the standalone (non-AVCapture) entry point.
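The multiarray decode described above can be sketched on a flat float buffer. Assumptions are stated loudly: each of the 300 rows is taken to be [x1, y1, x2, y2, score, classId] in model-input pixels, the label index order is a guess, the score threshold is illustrative, and the letterbox padding is ignored for brevity. `Detection` and `decodeEnd2End` are hypothetical names.

```kotlin
data class Detection(
    val label: String, val score: Float,
    val left: Float, val top: Float, val right: Float, val bottom: Float
)

// Index order is an assumption; only the two label strings come from the commit message.
val LABELS = listOf("basketball", "basketball_hoop")

// Decodes rows of 6 floats, normalizes by the model input size,
// and scales to the oriented source frame.
fun decodeEnd2End(
    output: FloatArray, modelW: Int, modelH: Int, srcW: Int, srcH: Int, minScore: Float = 0.25f
): List<Detection> {
    require(output.size % 6 == 0) { "expected rows of 6 floats" }
    val result = mutableListOf<Detection>()
    for (row in 0 until output.size / 6) {
        val o = row * 6
        val score = output[o + 4]
        if (score < minScore) continue
        val label = LABELS.getOrNull(output[o + 5].toInt()) ?: continue
        result += Detection(
            label, score,
            output[o] / modelW * srcW, output[o + 1] / modelH * srcH,
            output[o + 2] / modelW * srcW, output[o + 3] / modelH * srcH
        )
    }
    return result
}
```

Skipping the division by the model dimensions would treat pixel coordinates as normalized ones and place every box far outside the frame.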
Sample app — iOS unattended test harness
- iosApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ (was a no-op stub).
- ExperimentAuto.ios.kt logs progress via NSLog and exits on finish so
back-to-back captures cold-start cleanly.
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for iOS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
# .gitignore
# sample/composeApp/src/commonMain/kotlin/com/nate/posedetection/App.kt
Adds a 4:3 rectangular detection path on iOS that mirrors the Android
v4.11.0 letterbox preprocessing — instead of feeding square frames to
Vision and letting it center-crop away the sides, the detector now
letterboxes the source frame into the model's native aspect ratio (e.g.
512×384, 640×480, or 960×736) and decodes the model output back to
original-image coordinates. The model's input dimensions are inferred
from a `_<W>x<H>` filename suffix on the bundled `.mlmodelc`, so a new
rect model can be dropped in with no code changes.
iOS detector
- ImageDetector.ios.kt + FrameProcessor.analyseBufferForAll: handles
both Vision-pipeline output (VNRecognizedObjectObservation, used by
classic yolo11 CoreML pipelines) and raw multiarray output
(VNCoreMLFeatureValueObservation with shape [1, 300, 6], used by
ultralytics' yolo26 end2end CoreML export). Coordinates from the
end2end output are in pixel space of the model input and are
normalized by the model dimensions before mapping to the oriented
source frame.
- CustomObjectModel.ios.kt: parses the input width/height from the
model's filename (`yolo26n_v11_rect_512x384` → 512×384). Models
without the suffix get (0, 0) and skip letterboxing — preserves
prior Vision-default behavior for square models.
- Sample app picks up `imageCropAndScaleOption = ScaleFit` as a
belt-and-suspenders measure so Vision doesn't double-crop a frame whose
aspect already matches the model.
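The filename convention can be sketched with a regular expression. This is an illustration rather than the code in CustomObjectModel.ios.kt; `parseModelInputSize` is a hypothetical name, and stripping the `.mlmodelc` / `.mlpackage` extensions first is an assumption.

```kotlin
// Matches a trailing "_<W>x<H>" suffix, e.g. "_512x384".
private val DIM_SUFFIX = Regex("_(\\d+)x(\\d+)\$")

// Returns (width, height) parsed from the model name, or (0, 0) when the
// suffix is absent, which signals "skip letterboxing".
fun parseModelInputSize(modelName: String): Pair<Int, Int> {
    val base = modelName.removeSuffix(".mlmodelc").removeSuffix(".mlpackage")
    val match = DIM_SUFFIX.find(base) ?: return Pair(0, 0)
    val (w, h) = match.destructured
    return Pair(w.toInt(), h.toInt())
}
```

For example, `parseModelInputSize("yolo26n_v11_rect_512x384")` yields (512, 384), while a name without the suffix yields (0, 0).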
Sample app + experiment harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec
CompositionLocal. Enables unattended back-to-back model captures via
`xcrun devicectl device process launch`.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ in the same schema
tools/compare_logs.py consumes; pull via `pymobiledevice3 apps afc`.
- ExperimentAuto.ios.kt logs progress via NSLog and exits the app on
finish (so back-to-back captures cold-start cleanly).
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for
iOS.
Bundled rect CoreML models for the sample app
- yolo26n_v11_rect_512x384.mlpackage (val mAP50 = 0.800)
- yolo26n_v11_rect_640x480.mlpackage (val mAP50 = 0.840)
- yolo26n_v11_rect_960x736.mlpackage (val mAP50 = 0.870)
Library version bump: 4.11.1 → 4.12.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Android detector pipeline previously stretched camera frames to the
model input shape, destroying aspect ratio and degrading YOLO accuracy on
non-square sources. This release replaces the stretch with a proper
letterbox: scale-to-fit + gray-114 pad, then un-letterbox the output
bounding boxes back to the original image's coordinate space.
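The letterbox and its inverse can be sketched in a few lines. This is a minimal sketch of the geometry only, assuming the scaled image is centered inside the padded model input; `Letterbox`, `letterbox`, and `toSource` are hypothetical names, not the API of ImageDetector.android.kt.

```kotlin
import kotlin.math.min

// Scale and padding that place a source frame inside the model input.
data class Letterbox(val scale: Double, val padX: Double, val padY: Double)

// Scale-to-fit keeps the aspect ratio; the leftover area (filled with
// gray 114 in the pipeline described above) is split evenly on both sides.
fun letterbox(srcW: Int, srcH: Int, dstW: Int, dstH: Int): Letterbox {
    val scale = min(dstW.toDouble() / srcW, dstH.toDouble() / srcH)
    return Letterbox(scale, (dstW - srcW * scale) / 2, (dstH - srcH * scale) / 2)
}

// Un-letterbox: remove the padding, then undo the scale.
fun Letterbox.toSource(x: Double, y: Double): Pair<Double, Double> =
    Pair((x - padX) / scale, (y - padY) / scale)
```

Applying `toSource` to both corners of a predicted box returns it in source-image pixel coordinates regardless of the model's input aspect ratio.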
Library changes:
- ImageDetector.android.kt (NEW) — letterbox-aware detector with full
un-letterbox pipeline. Output bboxes returned in source-image pixel
coordinates regardless of model input aspect ratio.
- ImageDetector.kt + ImageDetector.ios.kt — common interface + iOS
expect/actual stubs.
- CameraView.android.kt — pin ImageAnalysis to AspectRatio.RATIO_4_3 so
the camera frame distribution is consistent across devices and matches
the model's expected input geometry.
- CustomObjectModel.android.kt — track and log which TFLite delegate
(GPU/NNAPI/CPU) actually built the interpreter. Surfaces silent CPU
fallbacks at INFO level for adb logcat -s TFLite:I.
- build.gradle.kts — version 4.10.1 → 4.11.0 (minor: new public detector
class, no breaking API changes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Skeleton timestamps use sensor→epoch conversion for accurate alignment with video
- AnalysisObject now carries its own timestamp from the analysis frame
- Stagger pose and object detection on alternate frames in BOTH mode (~15fps each)
- Clear stale skeleton/object overlays when switching detection modes
- iOS VideoBuilder finalize() crash fix (NSThread.sleep polling)
- iOS extractFrame autoreleasepool to prevent CGImage accumulation
- Skeleton.lerp() for interpolation between keyframes
- Batch extractFrames Flow API for fast sequential frame decoding
- VideoBuilder: handle mismatched frame dimensions, YUV420 buffer overflow fix
- Downscale pose input to 256px for faster ML Kit processing
- Remove bundled movenet/posenet/YOLO pose model files from library assets
- Replace iOS VideoBuilder debug printlns with structured Logger calls
- Remove FPS debug logging from CameraView
- Remove DebugTestScreen from sample app
- Add *.hprof to .gitignore
- Version bump to 4.10.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a cached MediaCodec decoder for frame extraction instead of
creating/destroying a MediaMetadataRetriever per frame. The decoder
opens the video once and decodes frames sequentially, leveraging
codec state across frames. Handles video rotation metadata.
Reduces offline video analysis time from ~70s to ~31s (2.3x faster)
on a 6-second test video.
Also adds build time display to the sample app's frame analysis view
for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NNAPI delegate as an intermediate fallback between GPU and CPU
for TFLite inference. On devices where GPU delegate is unavailable,
NNAPI can route inference to the device's DSP/NPU for better
performance than CPU-only execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run pose detection on a dedicated thread concurrently with object
detection so per-frame latency is max(pose, object) instead of the
sum. Also bump detection throttle from 20ms (~50 FPS) to 33ms (~30 FPS)
to reduce CPU contention with minimal perceptual difference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use CameraX VideoCapture use case for hardware-accelerated recording
instead of per-frame ARGB→NV12 software conversion via MediaCodec.
This eliminates the CPU-intensive bitmap conversion and encoding that
was competing with pose/object detection on every frame.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix VideoRecordEvent.Finalize duration/start-timestamp (~165ms overshoot)
- Add setOnAnalyzerFrameCallback / setOnRecordingFirstFrameTsCallback / setCurrentRecordingId
- Release MediaMetadataRetriever; iOS camera engine alignment
- Publishable coordinates 4.16.0 (non-SNAPSHOT)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two iOS-side fixes for use-after-free crashes (kima downstream + sample
app on iPad/iPhone 17 Pro) and the resulting cpu_resource warnings.
* CameraController.captureOutput now CFRetains the CVImageBufferRef
synchronously on the camera output queue before dispatch_async, with
a matching CFRelease in the worker's finally block (and an outer
bufferEnqueued guard for the dispatch-failed path). Without this the
K/N closure dereferenced freed memory inside FrameProcessor's first
CFRetain because AVCaptureVideoDataOutput recycles sample buffers
the moment the delegate returns.
* The dispatch_async closure no longer captures any per-frame ObjC
reference. Orientation and mirrored flag are snapshotted into local
primitives, lastCaptureConnection is written directly on the camera
output queue (idempotent — connection is constant for the session),
and analyseBufferForAll is invoked with captureConnection = null
since both overrides are non-null.
* GC.collect() is now called every 10 frames (~3 Hz at 30 fps) instead
of every frame. K/N's auto-GC trigger is too lazy at this allocation
rate so a periodic synchronous collect is still required, but cutting
the frequency 10× clears the sustained CPU advisory iOS 26 was
raising on the camera queue.
v4.15 replaced the old pointForCaptureDevicePointOfInterest-based
mapSkeletonToPreview with direct aspect-fit/fill math and — as a side
effect — changed what Skeleton.width / Skeleton.height mean:
pre-4.15: size of the VISIBLE sensor region projected onto the preview
layer, in preview points.
(abs of pointForCaptureDevicePointOfInterest((0,0))
vs pointForCaptureDevicePointOfInterest((1,1))
= source dims × aspect-fit/fill scale)
v4.15.x: full preview layer bounds in points (too wide in FIT;
too narrow in FILL).
Downstream consumers that normalize via `joint.x / skeleton.width` saw
skeletons drawn at the wrong scale after the bump. Joint x/y values
themselves were unchanged across versions (both return preview points),
so the fix is localized: set width/height = oriW*scale, oriH*scale.
AnalysisObject.boundingBox still goes through the original
pointForCaptureDevicePointOfInterest path (preserved pre-4.15 values);
AnalysisObject.frameSize is unchanged (oriented sensor dims). No
other consumer-visible coord shape changed.
New Skeleton fields (leftHeel / rightHeel / leftToe / rightToe /
leftIndex / rightIndex) are additive — default to null for consumers
that don't reference them.
Version 4.15.6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous stubs (v4.15.3-4) covered the writeLog path. New crash hits
a different Clearcut subsystem: the periodic auto-uploader that flushes
counters on a timer independent of pose detection, matching the "ran for
a while before crashing" symptom.
Stack:
-[MLKITx_CCTClearcutMetaLogger logCounters:] +56
-[MLKITx_CCTClearcutAutoCounters flushCountersToLoggerInternal] +88
MLKITx__prm_dispatch_sync_named_cstr +56
-[MLKITx_CCTClearcutAutoCounters flushCountersToLogger] +84
-[MLKITx_CCTClearcutMetaLogger flushCountersToLogger] +32
-[MLKITx_CCTClearcutUploader finishUploadAndCallHandlers] +260
-[MLKITx_CCTClearcutUploader flushThenUploadWithCompletionHandler:isOnForeground:] +1324
Stubs added at three levels (each independently installed):
1. -[MLKITx_CCTClearcutUploader startAutoUpload]
No-op. Prevents the timer from starting at all — the cleanest
short-circuit if it runs before the uploader has started.
2. -[MLKITx_CCTClearcutUploader
flushThenUploadWithCompletionHandler:isOnForeground:]
No-op + invoke completion handler with success — handles the
foreground-entry trigger that bypasses the timer.
3. -[MLKITx_CCTClearcutMetaLogger logCounters:]
Deepest fallback right at the crash site — drops counters silently.
All telemetry dropped; pose detection unaffected.
Version 4.15.5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v4.15.3's klib was byte-identical to v4.15.2 despite the .m-file edit:
the cinteropMlkitAccurate* Gradle tasks only watched the .def file, and
since the .def content stayed the same (same paths, just new .a content
inside them), the cinterop cache wasn't invalidated and the old klib
was republished verbatim.
Fix: declare mlkitArchivesDir + mlkitRedirectDir as explicit inputs on
the cinterop tasks, so any change to libMLKit*.a / libMLKitRedirect.a
forces cinterop to rebuild and Maven to republish.
v4.15.4 iosarm64 cinterop klib: 11612790 bytes (was 11611236 in 4.15.3) —
delta matches the three new CCT-telemetry stubs from 4.15.3's .m changes
that didn't land. Verified with `nm` on the published klib's embedded
libMLKitRedirect.a — __pd_mlkit_noop_writelog, __pd_mlkit_compute_url,
and __pd_install_clearcut_stub are all present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v4.15.2 only stubbed -[MLKITx_CCTClearcutLogger log:completion:]. Kima's
crash stack actually entered telemetry via
-[MLKITx_CCTLogWriter writeLog:pseudonymousID:logDirectory:clock:
logTransformers:completionQueue:completion:]
directly, bypassing the Logger entry point we'd stubbed. The crash then
propagated down to the missing
+[MLKITx_CCTClearcutFileUtility computeUrlForLogContextDir:context:bundleId:]
class method.
v4.15.3 adds two more defenses:
1. Stub -[MLKITx_CCTLogWriter writeLog:...] to a no-op that invokes
the completion handler (dispatching onto completionQueue if the
caller supplied one, to preserve semantics).
2. As a belt-and-braces fallback, if callers reach the FileUtility
anyway, install a class method on MLKITx_CCTClearcutFileUtility for
computeUrlForLogContextDir:context:bundleId: that returns the input
directory unchanged — satisfies the selector lookup so we don't
raise NSInvalidArgumentException even if the rest of the chain runs.
Each stub has its own installed-flag so partial success on the first
attempt still leaves the other stubs to retry on the second attempt
(+load vs mlkit_set_resource_dir).
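The per-stub flag scheme can be sketched as follows. This is a Python
illustration only (the real stubs are Objective-C); `Stub`, `install_all`,
and the simulated class registry are hypothetical names, and the scenario
where only the Logger class is registered at +load time is an assumption
made to show the retry.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stub:
    """One swizzle target with its own installed flag (hypothetical model)."""
    name: str
    try_install: Callable[[], bool]  # True once the target class can be patched
    installed: bool = False


def install_all(stubs: List[Stub]) -> List[str]:
    """Attempt every stub not yet installed; return the names installed this pass."""
    newly_installed = []
    for stub in stubs:
        if stub.installed:
            continue  # done on an earlier attempt, never swizzle twice
        if stub.try_install():
            stub.installed = True
            newly_installed.append(stub.name)
    return newly_installed


# Simulated runtime: which classes are registered right now (assumption).
registered = {"MLKITx_CCTClearcutLogger"}


def make_probe(class_name: str) -> Callable[[], bool]:
    # Looks the class up at call time, so later registrations are seen.
    return lambda: class_name in registered


stubs = [
    Stub("MLKITx_CCTClearcutLogger", make_probe("MLKITx_CCTClearcutLogger")),
    Stub("MLKITx_CCTLogWriter", make_probe("MLKITx_CCTLogWriter")),
    Stub("MLKITx_CCTClearcutFileUtility", make_probe("MLKITx_CCTClearcutFileUtility")),
]

first_pass = install_all(stubs)   # stands in for the +load attempt
registered.update({"MLKITx_CCTLogWriter", "MLKITx_CCTClearcutFileUtility"})
second_pass = install_all(stubs)  # stands in for mlkit_set_resource_dir
```

With a single shared flag, the partial success of the first pass would
have blocked the second pass from installing the remaining two stubs.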
Version 4.15.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kima crash: -[MLKITx_CCTClearcutLogger log:completion:] background
dispatch → ... → +[MLKITx_CCTClearcutFileUtility
computeUrlForLogContextDir:context:bundleId:] "unrecognized selector".
MLKit's telemetry subsystem is fire-and-forget on an internal dispatch
queue and doesn't surface the crash until deep in the call chain.
Telemetry isn't needed for pose detection. Swizzle the entry-point
`-[MLKITx_CCTClearcutLogger log:completion:]` to a no-op that signals
success via the completion handler, so the failing writeLog → file-IO
path is never reached.
Installation runs twice for robustness: once at +load (may be too early
if MLKit's classes haven't registered yet) and once from
mlkit_set_resource_dir (called from Kotlin after app init, when all
classes are guaranteed loaded). A flag short-circuits if the first
attempt succeeded.
Version bumped to 4.15.2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The root problem with v4.15.0's MLKit shipment was that downstream apps
had to: (1) add -ObjC to OTHER_LDFLAGS, (2) copy MLKit resource bundles
into the app bundle via an Xcode build phase, (3) force the KMP framework
to be dynamic. This PR eliminates all three consumer-side steps.
How:
1. `-ObjC` propagation — the cinterop mlkitAccurate.def already carries
`linkerOpts = -ObjC`, which Kotlin/Native applies at the consumer's
framework link step. For the KMP default (dynamic framework), this
preserves MLKit's ObjC class/category metadata inside the resulting
dylib; the consumer's app-link step just embeds the dylib, so it
doesn't need -ObjC of its own. No change needed in consumer
OTHER_LDFLAGS.
2. Resource bundles — MLKit's .tflite / .binarypb weights (~9 MB) are
now staged (Gradle `stageMlkitResources` task) into the library's
iosMain Compose Multiplatform resource set. Compose packages them
into the consumer's iOS app bundle under
`compose-resources/composeResources/...generated.resources/files/mlkit/`.
On the first MlKitPose.detect() call, the new
`MlKitResourceBootstrap.ensureInitialized()` runs once — reads the
bundled bytes via `Res.readBytes`, writes them to
`NSCachesDirectory()/MLKitResources/<BundleName>.bundle/<file>`,
and registers that path with a small NSBundle swizzle.
3. NSBundle swizzle — `src/nativeInterop/mlkitRedirect/MLKitResourceRedirect.m`
swizzles `-[NSBundle URLForResource:withExtension:]` at +load time.
When MLKit's internal lookup for its three resource bundles
(MLKitPoseDetectionAccurateResources / MLKitPoseDetectionCommonResources /
MLKitXenoResources) falls through the main bundle, the swizzle
redirects to the Caches-directory copy the Kotlin bootstrap wrote.
Compiled to libMLKitRedirect.a via the new `compileMlkitRedirect`
Gradle task and folded into the MLKit cinterop .def's
staticLibraries list.
4. Dynamic framework — KMP's default is dynamic, so no consumer action
is required unless they explicitly set `isStatic = true`. The
sample iosApp's composeApp/build.gradle.kts is simplified to the
default dynamic config, and the Xcode iosApp target's OTHER_LDFLAGS
`-ObjC` entry + "Copy MLKit Resource Bundles" shell-script build
phase are removed — proving the library is self-contained.
Result: downstream consumers (kima) can bump from 4.14.0 → 4.15.1 with
zero other changes and MLKit pose detection works on iOS.
Version bumped to 4.15.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v4.15.0 shipped with MLKit-on-iOS via cinterop static archives, but MLKit
needs its resource bundles (~9 MB of .tflite/.binarypb weights) inside the
consumer's app bundle AND needs -ObjC at the consumer's app link step AND
needs a dynamic KMP framework. There's no KMP distribution mechanism to
flow native framework resources through a klib into a downstream app
bundle without consumer-side Gradle/Xcode setup. Shipping as-is forced
every downstream consumer to add a Run Script build phase and an
OTHER_LDFLAGS entry and to set isStatic=false, violating the "just bump
the version" contract.
Fix: iosArm64's MlKitPose now stubs isAvailable() to return false (matching simulator
targets); FrameProcessor falls back to Apple Vision uniformly. Removed
the sync-mlkit.sh Gradle plumbing and the MLKit cinterop config.
All other v4.15.0 improvements are preserved:
- Orientation fixes (independent of pose backend)
- Object detection fixes (VNImageRequestHandler orientation-ctor bug)
- Crisp solid overlay rendering
- Split object/pose EXIF infrastructure
- Skeleton.leftHeel/leftToe/leftIndex fields on Android (MLKit provides
them); iOS Vision leaves those fields null
Future MLKit-on-iOS revival would need to ship as a pre-built XCFramework
distributed via SPM or a Gradle plugin that auto-embeds into the
consumer's iOS app — out of scope here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- iOS: switch from Vision VNDetectHumanBodyPoseRequest to MLKit Accurate
pose, embedded as cinterop static archives (no cocoapods propagation
to downstream consumers).
- iOS: fix coordinate-space bugs across all four device orientations.
Pose and object paths now have independent EXIF derivations +
independent preview-coord mapping (aspect-fit/fill for pose, original
pointForCaptureDevicePointOfInterest for objects).
- iOS: fix object-detection bounding boxes in non-landscape-right
orientations. Root cause was VNImageRequestHandler silently ignoring
the VNImageOptionCGImagePropertyOrientation options-dict key;
switched to the orientation-parameter constructor.
- Skeleton: add leftHeel/rightHeel, leftToe/rightToe (foot index on
Android), and leftIndex/rightIndex (finger tip). Wired through lerp,
mirror, rotate, bones (new foot + hand bones), joints, and both
MLKit pose builders.
- iOS overlay: replace BlendMode.Softlight/Color bones and radial
gradient joint dots with solid crisp strokes + circles at ~1/3
thickness. Matches Android.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Experiment Mode menu toggle and the bottom-center Start/Stop
overlay were dev tooling, noise for anyone poking at the sample app.
Drop both.
The underlying ExperimentLogger, ExperimentEvent, the two capture
LaunchedEffects, and the intent-driven auto-mode path all stay — so
orchestrator-driven comparison runs (MainActivity test_mode extras)
continue to work without any UI visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The experiment logger was originally object-only — every event wrote
"skeleton": null regardless of what the pose pipeline produced. That
made it impossible to measure pose A/Bs from the captured JSON.
ExperimentEvent now carries an optional Skeleton, the JSON writer
serialises the 12 landmarks (or null per-joint) plus frame dims, and
App.kt collects skeletonFlow alongside customObjectFlow into the same
event buffer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Once a confident full skeleton is detected inside the static focus
area, FollowCropState remembers its bbox (with 50% pad) and the next
frame crops tightly around it — lifting effective input resolution on
the person above what the static half-frame crop can achieve.
Reverts to the static focus area when the skeleton is lost for
MISS_TOLERANCE=2 consecutive frames (hysteresis so one clipped frame
doesn't bounce back to wide), goes stale past 500ms, or the tight rect
degenerates. A MIN_NORMALIZED_SIDE floor of 0.25 keeps MLKit off
ultra-narrow aspect crops where recall collapses. Tight rect is
clamped to the user's static focusArea so the tracker can't drift
outside the intended zone.
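A minimal sketch of the follow-crop state machine described above, in
Python for illustration (the library itself is Kotlin). The three
constants come from this message; everything else is an assumption:
rects are normalized (left, top, right, bottom), the "50% pad" is taken
to mean the bbox grows to 1.5x its size around its centre, and the
0.25 floor is applied by widening the rect rather than rejecting it.
Detecting a "confident full skeleton" is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), normalized 0..1

MISS_TOLERANCE = 2          # from the commit message
STALE_MS = 500              # from the commit message
MIN_NORMALIZED_SIDE = 0.25  # from the commit message
PAD_FRACTION = 0.5          # "50% pad"; exact interpretation is an assumption


def tight_rect(bbox: Rect, focus: Rect) -> Optional[Rect]:
    """Pad the skeleton bbox, apply the side floor, clamp to the focus area."""
    left, top, right, bottom = bbox
    cx, cy = (left + right) / 2, (top + bottom) / 2
    w = max((right - left) * (1 + PAD_FRACTION), MIN_NORMALIZED_SIDE)
    h = max((bottom - top) * (1 + PAD_FRACTION), MIN_NORMALIZED_SIDE)
    f_left, f_top, f_right, f_bottom = focus
    out = (max(cx - w / 2, f_left), max(cy - h / 2, f_top),
           min(cx + w / 2, f_right), min(cy + h / 2, f_bottom))
    if out[2] <= out[0] or out[3] <= out[1]:
        return None  # degenerate after clamping
    return out


@dataclass
class FollowCropSketch:
    focus: Rect
    tight: Optional[Rect] = None
    misses: int = 0
    last_hit_ms: int = 0

    def on_frame(self, now_ms: int, skeleton_bbox: Optional[Rect]) -> None:
        """Feed one frame's result; skeleton_bbox is None when no skeleton was found."""
        if skeleton_bbox is not None:
            self.tight = tight_rect(skeleton_bbox, self.focus)
            self.misses = 0
            self.last_hit_ms = now_ms
            return
        self.misses += 1
        if self.misses >= MISS_TOLERANCE or now_ms - self.last_hit_ms > STALE_MS:
            self.tight = None  # revert to the static focus area

    def current_rect(self) -> Rect:
        """Crop to use for the next frame."""
        return self.tight if self.tight is not None else self.focus
```

A single missed frame keeps the tight rect; the second consecutive miss,
or a gap longer than 500 ms, falls back to the static focus area.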
Sample app menu item now cycles Mask → Crop → Crop+follow → Mask.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three compounding pose-quality tweaks:
- CROP downscale target rises from 256 → 384 max side. The crop is
already smaller than the full frame so this lands more pixels on the
person without moving the mask-path downscale.
- Drop landmarks whose MLKit inFrameLikelihood is below 0.5 before
emitting. Removes the "phantom joint" problem where occluded limbs
get guessed with low confidence.
- Temporal smoothing on successive skeletons (α=0.6, 500ms gap reset)
via PoseSmoother. Cuts visible jitter during fast shot motion while
staying responsive to real position changes.
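The likelihood filter and the smoothing step can be sketched like this,
in Python for illustration. The 0.5 threshold, α=0.6, and the 500 ms
reset come from this message. That α weights the new sample (rather than
the previous one), the handling of missing joints, and the name
`PoseSmootherSketch` are assumptions, not the library's PoseSmoother.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

MIN_LIKELIHOOD = 0.5  # from the commit message
ALPHA = 0.6           # from the commit message; assumed to weight the NEW sample
GAP_RESET_MS = 500    # from the commit message


def drop_low_confidence(
    landmarks: Dict[str, Tuple[float, float, float]]
) -> Dict[str, Optional[Point]]:
    """Map (x, y, likelihood) to (x, y), or to None when likelihood is below 0.5."""
    return {
        name: ((x, y) if likelihood >= MIN_LIKELIHOOD else None)
        for name, (x, y, likelihood) in landmarks.items()
    }


class PoseSmootherSketch:
    """Exponential smoothing per joint, reset after a long gap between frames."""

    def __init__(self) -> None:
        self.prev: Optional[Dict[str, Optional[Point]]] = None
        self.prev_ms: Optional[int] = None

    def smooth(self, now_ms: int, joints: Dict[str, Optional[Point]]) -> Dict[str, Optional[Point]]:
        reset = self.prev is None or now_ms - self.prev_ms > GAP_RESET_MS
        out: Dict[str, Optional[Point]] = {}
        for name, p in joints.items():
            q = None if reset else self.prev.get(name)
            if p is None or q is None:
                out[name] = p  # nothing to blend with, pass through unchanged
            else:
                out[name] = (ALPHA * p[0] + (1 - ALPHA) * q[0],
                             ALPHA * p[1] + (1 - ALPHA) * q[1])
        self.prev, self.prev_ms = out, now_ms
        return out
```

With α=0.6 a joint that jumps from (0, 0) to (10, 20) on the next frame
is emitted at roughly (6, 12), then converges over the following frames.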
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new parameter on the CameraView composable so callers can choose
how the focus rectangle is applied to the pose input. MASK preserves
existing behaviour — black out non-focus region and downscale the full
frame. CROP geometrically restricts the pose input to just the focus
rectangle before downscaling, giving the MLKit model more effective
pixels on the subject at the same downscaled side length; landmarks are
returned in full-frame coordinates via an offset that flows from
buildMlKitPoseInput through skeletonFromPoseScaled.
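A small worked example of the coordinate bookkeeping, in Python for
illustration. The frame size, crop rectangle, and function names are
hypothetical; the 256 px target side is the pose downscale used at this
point in the history. The sketch assumes a uniform downscale and no
rotation.

```python
def downscale_factor(width: int, height: int, max_side: int) -> float:
    """Factor that brings the longer side down to max_side (never upscales)."""
    return min(1.0, max_side / max(width, height))


def crop_landmark_to_full_frame(x: float, y: float,
                                crop_left: int, crop_top: int,
                                scale: float) -> tuple:
    """Undo the downscale, then add the crop offset to get full-frame pixels."""
    return (crop_left + x / scale, crop_top + y / scale)


FRAME_W, FRAME_H = 1280, 960   # hypothetical camera frame
CROP = (320, 120, 640, 480)    # hypothetical focus rect: left, top, width, height
MAX_SIDE = 256

# MASK: the whole frame is downscaled, so the subject shares 256 px with the rest.
mask_scale = downscale_factor(FRAME_W, FRAME_H, MAX_SIDE)

# CROP: only the focus rect is downscaled, so the same 256 px cover less scene.
crop_scale = downscale_factor(CROP[2], CROP[3], MAX_SIDE)

# A landmark reported at (128, 96) in the downscaled crop, mapped back.
full_xy = crop_landmark_to_full_frame(128, 96, CROP[0], CROP[1], crop_scale)
```

Here the crop path keeps twice the linear resolution on the subject
(scale 0.4 versus 0.2), and the landmark maps back to (640, 360) in the
full frame.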
Object detection is intentionally unaffected — YOLO always sees the
full, unmasked frame.
Sample app exposes a menu toggle (Mask ↔ Crop) and points the demo at
the left half of the frame to exercise both modes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same API, higher-quality detector. Swaps the gradle artifact from
pose-detection to pose-detection-accurate and updates both pose-client
call sites to use AccuratePoseDetectorOptions. Trades ~2× per-frame
latency for materially better recall — worthwhile given pose is not
the FPS bottleneck in BOTH mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Android CPU+XNNPACK default delegate + parallel pose/object detection.
Backwards-compatible — public API unchanged, Android inference behavior
changed significantly (semver minor bump).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Samsung SM-A366B the GPU delegate offloaded only 26/685 YOLO26n ops;
the CPU↔GPU roundtrip tax dominated, and in BOTH mode the GPU also contended
with the pose pipeline. Flipping the default to CPU+XNNPACK (setUseXNNPACK(true)
is required on tensorflow-lite-support 0.5.0, otherwise pure CPU is ~100× slower)
raised BOTH-mode FPS 8.8 → 9.9.
Removing the even/odd alternation in CameraView.android.kt activated the
already-existing `poseExecutor` parallel path in Utils.android.kt, so in BOTH
mode each camera frame now runs both detectors concurrently instead of every
other frame. Net: effective per-detector rate doubled.
numThreads dropped 4 → 3 for the object interpreter so concurrent pose + object
don't oversubscribe the 8-core CPU.
Runtime override preserved: `adb shell setprop debug.tflite.delegate GPU|NNAPI|CPU`.
Combined with tonight's INT8 baseline re-export, on-device BOTH-mode FPS went
8.77 → 16.02 (+83%) with effective object-detection rate rising from ~4.4 to ~16 FPS.
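The arithmetic behind those figures, as a quick Python check. It assumes
that under the old even/odd alternation each detector ran on every other
frame, as described above.

```python
before_fps = 8.77   # BOTH-mode frame rate before the change
after_fps = 16.02   # BOTH-mode frame rate after the change

# Relative gain in frame rate.
gain_pct = (after_fps / before_fps - 1) * 100

# With alternation, the object detector saw only every other frame.
object_rate_before = before_fps / 2
# With the parallel path, it sees every frame.
object_rate_after = after_fps
```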
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the iOS multiarray-decode + parametrized model-input-dim work that
was lost from the v4.12.0 squash. Without this, ultralytics yolo26 end2end
CoreML exports (rect 512×384 / 640×480 / 960×736) silently produced zero
detections on iOS — Vision delivered VNCoreMLFeatureValueObservations that
the library's Vision-only filterIsInstance dropped on the floor.
Library
- FrameProcessor.analyseBufferForAll: when Vision returns a raw multiarray
(shape [1, 300, 6], yolo26n end2end output), decode it into AnalysisObjects
with class labels "basketball" / "basketball_hoop" and bbox coords mapped
back to oriented source pixel space.
- Coordinates from the end2end output are pixel-space over the model input,
not normalized — divide by modelInputW/H before scaling to source.
- modelInputW/H are read from the ObjectModel (set via
CustomObjectModel.ios.kt parsing the `_<W>x<H>` filename suffix), so
rect-640 and rect-960 work without further code changes.
- ImageDetector.ios.kt gets the same letterbox + multiarray decode path
for the standalone (non-AVCapture) entry point.
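A minimal sketch of the decode step, in Python for illustration. The
[1, 300, 6] shape and the two labels come from this message. The column
layout (x1, y1, x2, y2, confidence, class index), the label index order,
and the `min_conf` threshold are assumptions, and the sketch leaves out
letterbox removal and orientation handling.

```python
from typing import Iterable, List, Sequence, Tuple

# Labels from the commit message; which index maps to which label is an assumption.
LABELS = ["basketball", "basketball_hoop"]

Detection = Tuple[str, float, Tuple[float, float, float, float]]


def decode_end2end(rows: Iterable[Sequence[float]],
                   model_w: int, model_h: int,
                   src_w: int, src_h: int,
                   min_conf: float = 0.25) -> List[Detection]:
    """Turn rows of [x1, y1, x2, y2, conf, cls] into source-pixel detections.

    Coordinates arrive in model-input pixels, so they are divided by the
    model dimensions first and then scaled to the source frame.
    """
    detections: List[Detection] = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < min_conf:
            continue  # low-score rows are dropped (threshold is hypothetical)
        box = (x1 / model_w * src_w, y1 / model_h * src_h,
               x2 / model_w * src_w, y2 / model_h * src_h)
        detections.append((LABELS[int(cls)], conf, box))
    return detections


# Two of the 300 rows, for a 512x384 model over a 1024x768 source frame.
rows = [
    [64.0, 48.0, 128.0, 96.0, 0.9, 0.0],
    [0.0, 0.0, 10.0, 10.0, 0.1, 1.0],
]
detections = decode_end2end(rows, 512, 384, 1024, 768)
```

Skipping the division by the model dimensions would place every box far
outside the frame, which is the failure mode the second bullet guards
against.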
Sample app — iOS unattended test harness
- iosApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ (was a no-op stub).
- ExperimentAuto.ios.kt logs progress via NSLog and exits on finish so
back-to-back captures cold-start cleanly.
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for iOS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a 4:3 rectangular detection path on iOS that mirrors the Android
v4.11.0 letterbox preprocessing — instead of feeding square frames to
Vision and letting it center-crop away the sides, the detector now
letterboxes the source frame into the model's native aspect ratio (e.g.
512×384, 640×480, or 960×736) and decodes the model output back to
original-image coordinates. The model's input dimensions are inferred
from a `_<W>x<H>` filename suffix on the bundled `.mlmodelc`, so a new
rect model can be dropped in with no code changes.
iOS detector
- ImageDetector.ios.kt + FrameProcessor.analyseBufferForAll: handles
both Vision-pipeline output (VNRecognizedObjectObservation, used by
classic yolo11 CoreML pipelines) and raw multiarray output
(VNCoreMLFeatureValueObservation with shape [1, 300, 6], used by
ultralytics' yolo26 end2end CoreML export). Coordinates from the
end2end output are in pixel space of the model input and are
normalized by the model dimensions before mapping to the oriented
source frame.
- CustomObjectModel.ios.kt: parses the input width/height from the
model's filename (`yolo26n_v11_rect_512x384` → 512×384). Models
without the suffix get (0, 0) and skip letterboxing — preserves
prior Vision-default behavior for square models.
- Sample app picks up `imageCropAndScaleOption = ScaleFit` as a
belt-and-suspenders measure so Vision doesn't double-crop a frame whose
aspect already matches the model.
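The filename convention can be sketched as below, in Python for
illustration. The `_<W>x<H>` suffix, the (0, 0) fallback, and the
`yolo26n_v11_rect_512x384` example come from this message; the function
name and the exact extension handling are assumptions.

```python
import re
from typing import Tuple

# Matches a trailing "_<W>x<H>" on the model name, e.g. "_512x384".
_DIMS_SUFFIX = re.compile(r"_(\d+)x(\d+)$")


def model_input_dims(model_name: str) -> Tuple[int, int]:
    """Return (width, height) parsed from the name, or (0, 0) when absent."""
    stem = model_name
    for ext in (".mlmodelc", ".mlpackage"):
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
            break
    match = _DIMS_SUFFIX.search(stem)
    if match is None:
        return (0, 0)  # no suffix: caller skips letterboxing
    return (int(match.group(1)), int(match.group(2)))
```

For example, `model_input_dims("yolo26n_v11_rect_512x384")` yields
(512, 384), while a name without the suffix yields (0, 0).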
Sample app + experiment harness
- iOSApp.swift parses `-test_model`, `-test_duration_sec`,
`-start_at_wall_ms`, `-finish_on_stop` launch args and threads them
through MainViewControllerWithAutoSpec → LocalExperimentAutoSpec
CompositionLocal. Enables unattended back-to-back model captures via
`xcrun devicectl device process launch`.
- ExperimentLogger.ios.kt writes per-frame detection JSON to
NSDocumentDirectory/experiment_logs/ in the same schema
tools/compare_logs.py consumes; pull via `pymobiledevice3 apps afc`.
- ExperimentAuto.ios.kt logs progress via NSLog and exits the app on
finish (so back-to-back captures cold-start cleanly).
- App.kt: replace System.currentTimeMillis() with
Clock.System.now().toEpochMilliseconds() so commonMain compiles for
iOS.
Bundled rect CoreML models for the sample app
- yolo26n_v11_rect_512x384.mlpackage (val mAP50 = 0.800)
- yolo26n_v11_rect_640x480.mlpackage (val mAP50 = 0.840)
- yolo26n_v11_rect_960x736.mlpackage (val mAP50 = 0.870)
Library version bump: 4.11.1 → 4.12.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Android detector pipeline previously stretched camera frames to the
model input shape, destroying aspect ratio and degrading YOLO accuracy on
non-square sources. This release replaces the stretch with a proper
letterbox: scale-to-fit + gray-114 pad, then un-letterbox the output
bounding boxes back to the original image's coordinate space.
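The letterbox geometry and its inverse can be sketched as follows, in
Python for illustration. Scale-to-fit and the gray-114 pad come from
this message; centred padding, the rounding of the scaled size, and the
function names are assumptions. Only the coordinate math is shown, not
the pixel copy.

```python
from typing import Tuple

PAD_GRAY = 114  # fill value for the padded border, per the commit message


def letterbox_params(src_w: int, src_h: int,
                     dst_w: int, dst_h: int) -> Tuple[float, float, float]:
    """Return (scale, pad_x, pad_y) for fitting src inside dst without distortion."""
    scale = min(dst_w / src_w, dst_h / src_h)  # scale-to-fit keeps aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) / 2  # centred padding (assumption)
    pad_y = (dst_h - new_h) / 2
    return scale, pad_x, pad_y


def unletterbox_box(box: Tuple[float, float, float, float],
                    scale: float, pad_x: float, pad_y: float
                    ) -> Tuple[float, float, float, float]:
    """Map a model-space box (x1, y1, x2, y2) back to source-image pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


# A 1280x720 source into a 640x480 model input.
scale, pad_x, pad_y = letterbox_params(1280, 720, 640, 480)
source_box = unletterbox_box((100.0, 160.0, 200.0, 260.0), scale, pad_x, pad_y)
```

In this example the image is scaled by 0.5 to 640x360 and padded by 60
rows top and bottom, so a model-space box at (100, 160, 200, 260) maps
back to (200, 200, 400, 400) in the source.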
Library changes:
- ImageDetector.android.kt (NEW) — letterbox-aware detector with full
un-letterbox pipeline. Output bboxes returned in source-image pixel
coordinates regardless of model input aspect ratio.
- ImageDetector.kt + ImageDetector.ios.kt — common interface + iOS
expect/actual stubs.
- CameraView.android.kt — pin ImageAnalysis to AspectRatio.RATIO_4_3 so
the camera frame distribution is consistent across devices and matches
the model's expected input geometry.
- CustomObjectModel.android.kt — track and log which TFLite delegate
(GPU/NNAPI/CPU) actually built the interpreter. Surfaces silent CPU
fallbacks at INFO level for adb logcat -s TFLite:I.
- build.gradle.kts — version 4.10.1 → 4.11.0 (minor: new public detector
class, no breaking API changes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Skeleton timestamps use sensor→epoch conversion for accurate alignment with video
- AnalysisObject now carries its own timestamp from the analysis frame
- Stagger pose and object detection on alternate frames in BOTH mode (~15fps each)
- Clear stale skeleton/object overlays when switching detection modes
- iOS VideoBuilder finalize() crash fix (NSThread.sleep polling)
- iOS extractFrame autoreleasepool to prevent CGImage accumulation
- Skeleton.lerp() for interpolation between keyframes
- Batch extractFrames Flow API for fast sequential frame decoding
- VideoBuilder: handle mismatched frame dimensions, YUV420 buffer overflow fix
- Downscale pose input to 256px for faster ML Kit processing
- Remove bundled movenet/posenet/YOLO pose model files from library assets
- Replace iOS VideoBuilder debug printlns with structured Logger calls
- Remove FPS debug logging from CameraView
- Remove DebugTestScreen from sample app
- Add *.hprof to .gitignore
- Version bump to 4.10.0
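For the keyframe interpolation item above, a minimal sketch in Python
for illustration. It is not the library's Skeleton.lerp(): the dict
representation, the function names, and the rule that a joint missing in
either keyframe interpolates to None are all assumptions.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Joints = Dict[str, Optional[Point]]


def lerp_fraction(ts_ms: int, a_ts_ms: int, b_ts_ms: int) -> float:
    """Position of ts_ms between two keyframe timestamps, as a 0..1 fraction."""
    return (ts_ms - a_ts_ms) / (b_ts_ms - a_ts_ms)


def lerp_point(a: Optional[Point], b: Optional[Point], t: float) -> Optional[Point]:
    if a is None or b is None:
        return None  # assumption: no interpolation without both endpoints
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def lerp_skeleton(a: Joints, b: Joints, t: float) -> Joints:
    """Interpolate every joint that appears in either keyframe."""
    return {name: lerp_point(a.get(name), b.get(name), t)
            for name in a.keys() | b.keys()}


# Keyframes at 1000 ms and 1100 ms, video frame at 1025 ms.
t = lerp_fraction(1025, 1000, 1100)
mid = lerp_skeleton(
    {"nose": (0.0, 0.0), "leftWrist": None},
    {"nose": (10.0, 20.0), "leftWrist": (1.0, 1.0)},
    t,
)
```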
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a cached MediaCodec decoder for frame extraction instead of
creating/destroying a MediaMetadataRetriever per frame. The decoder
opens the video once and decodes frames sequentially, leveraging
codec state across frames. Handles video rotation metadata.
Reduces offline video analysis time from ~70s to ~31s (2.3x faster)
on a 6-second test video.
Also adds build time display to the sample app's frame analysis view
for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run pose detection on a dedicated thread concurrently with object
detection so per-frame latency is max(pose, object) instead of the
sum. Also bump detection throttle from 20ms (~50 FPS) to 33ms (~30 FPS)
to reduce CPU contention with minimal perceptual difference.
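The threading pattern can be sketched as below, in Python for
illustration. The detector functions are stand-ins that just sleep, and
the names are hypothetical; the point is only that the pose call is
submitted to its own single-thread executor while object detection runs
on the calling thread, so the frame waits for the slower of the two
rather than for both in sequence.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def detect_pose(frame):
    time.sleep(0.05)  # stand-in for pose inference
    return "skeleton"


def detect_objects(frame):
    time.sleep(0.08)  # stand-in for object inference
    return ["ball"]


# One dedicated worker thread for pose, reused across frames.
pose_executor = ThreadPoolExecutor(max_workers=1)


def analyse(frame):
    """Run both detectors for one frame; latency is roughly max(pose, object)."""
    pose_future = pose_executor.submit(detect_pose, frame)  # starts immediately
    objects = detect_objects(frame)                         # runs on this thread
    return pose_future.result(), objects


result = analyse("frame-0")
```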
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use CameraX VideoCapture use case for hardware-accelerated recording
instead of per-frame ARGB→NV12 software conversion via MediaCodec.
This eliminates the CPU-intensive bitmap conversion and encoding that
was competing with pose/object detection on every frame.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>