Commits
The Android detector pipeline previously stretched camera frames to the
model input shape, destroying aspect ratio and degrading YOLO accuracy on
non-square sources. This release replaces the stretch with a proper
letterbox: scale-to-fit + gray-114 pad, then un-letterbox the output
bounding boxes back to the original image's coordinate space.
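The letterbox geometry described above can be sketched in plain Kotlin. This is a minimal illustration, not the code in ImageDetector.android.kt: the `Letterbox` and `Box` types and both function names are hypothetical, and it assumes the model reports boxes in model-input pixel coordinates.

```kotlin
import kotlin.math.min
import kotlin.math.roundToInt

// Hypothetical types for illustration; not the library's actual API.
data class Letterbox(
    val scale: Float,
    val padX: Float,
    val padY: Float,
    val scaledW: Int,
    val scaledH: Int,
)

data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

// Scale-to-fit: one uniform scale so the whole source fits inside the model
// input. The leftover area is split evenly into padding on both sides
// (the padding would be filled with gray 114 when the input is drawn).
fun computeLetterbox(srcW: Int, srcH: Int, dstW: Int, dstH: Int): Letterbox {
    val scale = min(dstW.toFloat() / srcW, dstH.toFloat() / srcH)
    val scaledW = (srcW * scale).roundToInt()
    val scaledH = (srcH * scale).roundToInt()
    return Letterbox(scale, (dstW - scaledW) / 2f, (dstH - scaledH) / 2f, scaledW, scaledH)
}

// Un-letterbox: subtract the padding, divide by the scale, then clamp to the
// source image bounds so boxes that overlap the padding stay inside the image.
fun unLetterbox(box: Box, lb: Letterbox, srcW: Int, srcH: Int): Box {
    fun x(v: Float) = ((v - lb.padX) / lb.scale).coerceIn(0f, srcW.toFloat())
    fun y(v: Float) = ((v - lb.padY) / lb.scale).coerceIn(0f, srcH.toFloat())
    return Box(x(box.left), y(box.top), x(box.right), y(box.bottom))
}
```

For a 1280x960 frame and a 640x640 model input, the scale is 0.5 and the padding is 80 px above and below, so a model-space box covering the whole unpadded region maps back to the full source frame.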
Library changes:
- ImageDetector.android.kt (NEW) — letterbox-aware detector with full
un-letterbox pipeline. Output bboxes returned in source-image pixel
coordinates regardless of model input aspect ratio.
- ImageDetector.kt + ImageDetector.ios.kt — common interface + iOS
expect/actual stubs.
- CameraView.android.kt — pin ImageAnalysis to AspectRatio.RATIO_4_3 so
the camera frame distribution is consistent across devices and matches
the model's expected input geometry.
- CustomObjectModel.android.kt — track and log which TFLite delegate
(GPU/NNAPI/CPU) actually built the interpreter. Silent CPU fallbacks now
surface at INFO level; filter with adb logcat -s TFLite:I.
- build.gradle.kts — version 4.10.1 → 4.11.0 (minor: new public detector
class, no breaking API changes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Skeleton timestamps use sensor→epoch conversion for accurate alignment with video
- AnalysisObject now carries its own timestamp from the analysis frame
- Stagger pose and object detection on alternate frames in BOTH mode (~15fps each)
- Clear stale skeleton/object overlays when switching detection modes
- iOS VideoBuilder finalize() crash fix (NSThread.sleep polling)
- iOS extractFrame autoreleasepool to prevent CGImage accumulation
- Skeleton.lerp() for interpolation between keyframes
- Batch extractFrames Flow API for fast sequential frame decoding
- VideoBuilder: handle mismatched frame dimensions, YUV420 buffer overflow fix
- Downscale pose input to 256px for faster ML Kit processing
- Remove bundled movenet/posenet/YOLO pose model files from library assets
- Replace iOS VideoBuilder debug printlns with structured Logger calls
- Remove FPS debug logging from CameraView
- Remove DebugTestScreen from sample app
- Add *.hprof to .gitignore
- Version bump to 4.10.0
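Two items from the list above, the sensor→epoch timestamp conversion and Skeleton.lerp(), can be sketched as follows. Everything here is an assumption made for illustration: the commit does not say how the conversion is done, and `SketchSkeleton`, `Keypoint`, and `sensorToEpochMs` are hypothetical names rather than the library's types. The conversion assumes the sensor timestamp is in nanoseconds on the same monotonic clock as the `elapsedNowNs` reading; on Android those two readings would typically come from SystemClock.elapsedRealtimeNanos() and System.currentTimeMillis().

```kotlin
// Hypothetical helper: the clock readings are passed in so the function stays
// pure and testable off-device.
fun sensorToEpochMs(sensorTimestampNs: Long, elapsedNowNs: Long, epochNowMs: Long): Long {
    // How long ago the frame was captured, measured on the monotonic clock.
    val ageMs = (elapsedNowNs - sensorTimestampNs) / 1_000_000
    // Subtract that age from the current wall-clock time.
    return epochNowMs - ageMs
}

// Hypothetical stand-ins for the library's skeleton types.
data class Keypoint(val x: Float, val y: Float)

data class SketchSkeleton(val timestampMs: Long, val keypoints: List<Keypoint>) {
    // Linear interpolation toward `other`; t = 0 yields this, t = 1 yields other.
    fun lerp(other: SketchSkeleton, t: Float): SketchSkeleton {
        require(keypoints.size == other.keypoints.size) { "keypoint counts differ" }
        val f = t.coerceIn(0f, 1f)
        val points = keypoints.zip(other.keypoints) { p, q ->
            Keypoint(p.x + (q.x - p.x) * f, p.y + (q.y - p.y) * f)
        }
        val ts = timestampMs + ((other.timestampMs - timestampMs) * f).toLong()
        return SketchSkeleton(ts, points)
    }
}
```

Interpolating on epoch-aligned timestamps is what lets a skeleton be placed at an arbitrary video time between two analysis frames.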
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a cached MediaCodec decoder for frame extraction instead of
creating/destroying a MediaMetadataRetriever per frame. The decoder
opens the video once and decodes frames sequentially, leveraging
codec state across frames. Handles video rotation metadata.
Reduces offline video analysis time from ~70s to ~31s (2.3x faster)
on a 6-second test video.
Also adds build time display to the sample app's frame analysis view
for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NNAPI delegate as an intermediate fallback between GPU and CPU
for TFLite inference. On devices where GPU delegate is unavailable,
NNAPI can route inference to the device's DSP/NPU for better
performance than CPU-only execution.
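The GPU → NNAPI → CPU fallback order can be sketched as a generic "try each candidate in turn" loop. This is an illustration only: `DelegateKind`, `Built`, and `buildWithFallback` are hypothetical names, and the factory lambdas stand in for whatever code actually constructs a TFLite interpreter with a given delegate.

```kotlin
// Hypothetical names for illustration; not the library's actual API.
enum class DelegateKind { GPU, NNAPI, CPU }

data class Built<T>(val kind: DelegateKind, val interpreter: T)

// Try each factory in order and return the first interpreter that builds,
// together with the delegate kind that produced it.
fun <T> buildWithFallback(
    factories: List<Pair<DelegateKind, () -> T>>,
    log: (String) -> Unit = { println(it) },
): Built<T> {
    var lastError: Exception? = null
    for ((kind, factory) in factories) {
        try {
            val interpreter = factory()
            log("interpreter built with $kind")
            return Built(kind, interpreter)
        } catch (e: Exception) {
            // Record the failure and move on to the next, slower option.
            lastError = e
            log("$kind unavailable: ${e.message}")
        }
    }
    throw IllegalStateException("no delegate could build an interpreter", lastError)
}
```

Returning the delegate kind alongside the interpreter makes it possible to log which path was taken, so a drop to CPU does not go unnoticed.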
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run pose detection on a dedicated thread concurrently with object
detection so per-frame latency is max(pose, object) instead of the
sum. Also raise the detection throttle interval from 20 ms (~50 FPS) to
33 ms (~30 FPS) to reduce CPU contention with minimal perceptual difference.
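The max(pose, object) latency comes from overlapping the two detectors rather than running them back to back. A minimal sketch of that pattern using a single-thread executor from java.util.concurrent follows; `ParallelDetector` and its parameters are hypothetical, the detector lambdas stand in for the real pose and object models, and the commit does not state that this is how the library does it.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical class for illustration; F is the frame type, P and O the results.
class ParallelDetector<F, P, O>(
    private val detectPose: (F) -> P,
    private val detectObjects: (F) -> O,
    private val minIntervalMs: Long = 33, // throttle: roughly 30 analyses per second
) : AutoCloseable {
    // Dedicated thread for pose detection.
    private val poseExecutor: ExecutorService = Executors.newSingleThreadExecutor()
    private var lastRunMs = Long.MIN_VALUE

    // Returns null when the frame is skipped by the throttle.
    fun analyze(frame: F, nowMs: Long): Pair<P, O>? {
        if (lastRunMs != Long.MIN_VALUE && nowMs - lastRunMs < minIntervalMs) return null
        lastRunMs = nowMs
        // Start pose detection on its own thread...
        val poseFuture = poseExecutor.submit(Callable { detectPose(frame) })
        // ...run object detection on the calling thread in the meantime...
        val objects = detectObjects(frame)
        // ...then wait for whichever finishes last.
        return poseFuture.get() to objects
    }

    override fun close() {
        poseExecutor.shutdown()
    }
}
```

Because the caller blocks only on `poseFuture.get()` after its own work is done, the wall-clock cost per frame is the slower of the two detectors, not their sum.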
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use CameraX VideoCapture use case for hardware-accelerated recording
instead of per-frame ARGB→NV12 software conversion via MediaCodec.
This eliminates the CPU-intensive bitmap conversion and encoding that
was competing with pose/object detection on every frame.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>