Model Comparison Report: OpenPose vs MediaPipe Hands¶

Date: 2026-05-07 Status: Partial — OpenPose not available, MediaPipe baseline only. Refs: #37 (HAND-005)

Objective¶

Compare OpenPose hand keypoint detection and MediaPipe Hands (HandLandmarker) for the LABIS grasp recognition study. Evaluate detection rate, keypoint visibility, stability, and occlusion handling using the same input images.

Models Evaluated¶

MediaPipe Hands (HandLandmarker)¶

Version: 0.10.35 (Tasks API)
Model file: hand_landmarker.task (float16, 7.5MB)
Keypoints: 21 per hand (3D: x, y, z)
Handedness: Detected automatically (Left/Right)
Status: Installed and functional. Runs on CPU without GPU dependencies.

OpenPose Hand¶

Expected version: 1.7.0
Keypoints: 21 per hand (2D: x, y + confidence)
Handedness: Not natively detected; must infer from body pose
Status: NOT AVAILABLE in this environment. OpenPose requires Caffe/CUDA build, not installable via pip. Would need Docker or native build with GPU support.

Comparison Framework¶

A common output schema was implemented (normalize_hand_output.py) that normalizes both models' outputs to the same structure:

{
    "model": "mediapipe_hands" | "openpose_hand",
    "model_version": "x.y.z",
    "source_image": "path",
    "hands_detected": 1,
    "hands": [{
        "hand_index": 0,
        "handedness": "Right",
        "landmarks": [{"id": 0, "name": "WRIST", "x": 0.5, "y": 0.7, "z": 0.0, "confidence": 0.95}],
        "num_landmarks": 21,
        "detection_confidence": 0.95
    }]
}

This normalizer is tested with mock data for both models (22 tests passing).

Results: MediaPipe Hands (Baseline)¶

Synthetic Image Smoke Tests¶

Image	Hands Detected	Notes
Geometric open hand	0	Expected: geometric shapes too simple for detection
Geometric power grasp	0	Expected: no realistic hand texture
No hand (table)	0	Correct negative: no false positive

Conclusion: MediaPipe correctly rejects non-realistic synthetic images and produces no false positives. The pipeline is wired correctly and outputs valid schema.

Observations¶

MediaPipe requires realistic skin texture and proportions to detect hands.
Geometric/drawn hand shapes are not detected (correct behavior for a real-world model).
Detection confidence threshold of 0.3 is appropriate; higher thresholds would miss partially occluded grasps.
3D z-coordinate from MediaPipe provides depth information that OpenPose hand does not.

Results: OpenPose Hand¶

NOT AVAILABLE for direct comparison. OpenPose hand detection requires: 1. Full OpenPose build (Caffe + CUDA or CPU-only, but very slow) 2. Body pose detection first (hand region is estimated from body keypoints) 3. Separate hand model execution

The OpenPose normalizer (normalize_openpose()) is implemented and tested with mock data, ready for when OpenPose becomes available.

Feature Comparison (Based on Documentation)¶

Feature	MediaPipe Hands	OpenPose Hand
Keypoints	21 (3D)	21 (2D + confidence)
Handedness	Automatic	Requires body pose
Depth (z)	Yes	No
Installation	`pip install mediapipe`	Build from source (Caffe/CUDA)
CPU performance	~30ms per frame	~2-5s per frame (CPU)
GPU required	No	Strongly recommended
Occlusion handling	Moderate (presence confidence)	Limited
Multi-hand	Up to 2 hands	Depends on body detection
Maintenance	Active (Google)	Limited (CMU, last release 2020)
Python API	pip package, Tasks API	Custom build, no pip

Recommendation¶

Use MediaPipe Hands as the primary model for the LABIS grasp recognition study.

Rationale: 1. Accessibility: pip-installable, no GPU or custom builds needed. 2. 3D keypoints: z-coordinate provides depth useful for grasp classification. 3. Performance: ~30ms inference on CPU, suitable for real-time and batch processing. 4. Active maintenance: Google actively develops MediaPipe; OpenPose's last release was 2020. 5. Handedness detection: Built-in, critical for bilateral grasp studies.

If OpenPose becomes available (e.g., via Docker), the comparison framework is ready: - normalize_openpose() normalizer is implemented and tested - Common schema allows apples-to-apples comparison - Test framework expects both models' metrics

Limitations¶

No direct OpenPose comparison was performed due to unavailability.
Synthetic images only — no real hand detection metrics yet (see HAND-004).
Feature comparison based on documentation, not empirical measurement on our specific dataset.
MediaPipe's z-coordinate is relative, not absolute depth — may need calibration for grasp classification.
Occlusion handling was not tested empirically; this is critical for grasps where fingers wrap around objects.

Next Steps¶

When real dataset is available (HAND-004), re-run MediaPipe comparison with detection metrics.
If OpenPose is needed, create a Docker image with OpenPose + Python bindings.
Consider adding MMPose or HRNet as additional comparison candidates if needed.
Implement grasp classification rules based on normalized landmark positions.