Model Comparison Report: OpenPose vs MediaPipe Hands¶
Date: 2026-05-07 Status: Partial — OpenPose not available, MediaPipe baseline only. Refs: #37 (HAND-005)
Objective¶
Compare OpenPose hand keypoint detection and MediaPipe Hands (HandLandmarker) for the LABIS grasp recognition study. Evaluate detection rate, keypoint visibility, stability, and occlusion handling using the same input images.
Models Evaluated¶
MediaPipe Hands (HandLandmarker)¶
- Version: 0.10.35 (Tasks API)
- Model file: hand_landmarker.task (float16, 7.5MB)
- Keypoints: 21 per hand (3D: x, y, z)
- Handedness: Detected automatically (Left/Right)
- Status: Installed and functional. Runs on CPU without GPU dependencies.
OpenPose Hand¶
- Expected version: 1.7.0
- Keypoints: 21 per hand (2D: x, y + confidence)
- Handedness: Not natively detected; must infer from body pose
- Status: NOT AVAILABLE in this environment. OpenPose requires Caffe/CUDA build, not installable via pip. Would need Docker or native build with GPU support.
Comparison Framework¶
A common output schema was implemented (normalize_hand_output.py) that normalizes both models' outputs to the same structure:
{
"model": "mediapipe_hands" | "openpose_hand",
"model_version": "x.y.z",
"source_image": "path",
"hands_detected": 1,
"hands": [{
"hand_index": 0,
"handedness": "Right",
"landmarks": [{"id": 0, "name": "WRIST", "x": 0.5, "y": 0.7, "z": 0.0, "confidence": 0.95}],
"num_landmarks": 21,
"detection_confidence": 0.95
}]
}
This normalizer is tested with mock data for both models (22 tests passing).
Results: MediaPipe Hands (Baseline)¶
Synthetic Image Smoke Tests¶
| Image | Hands Detected | Notes |
|---|---|---|
| Geometric open hand | 0 | Expected: geometric shapes too simple for detection |
| Geometric power grasp | 0 | Expected: no realistic hand texture |
| No hand (table) | 0 | Correct negative: no false positive |
Conclusion: MediaPipe correctly rejects non-realistic synthetic images and produces no false positives. The pipeline is wired correctly and outputs valid schema.
Observations¶
- MediaPipe requires realistic skin texture and proportions to detect hands.
- Geometric/drawn hand shapes are not detected (correct behavior for a real-world model).
- Detection confidence threshold of 0.3 is appropriate; higher thresholds would miss partially occluded grasps.
- 3D z-coordinate from MediaPipe provides depth information that OpenPose hand does not.
Results: OpenPose Hand¶
NOT AVAILABLE for direct comparison. OpenPose hand detection requires: 1. Full OpenPose build (Caffe + CUDA or CPU-only, but very slow) 2. Body pose detection first (hand region is estimated from body keypoints) 3. Separate hand model execution
The OpenPose normalizer (normalize_openpose()) is implemented and tested with mock data, ready for when OpenPose becomes available.
Feature Comparison (Based on Documentation)¶
| Feature | MediaPipe Hands | OpenPose Hand |
|---|---|---|
| Keypoints | 21 (3D) | 21 (2D + confidence) |
| Handedness | Automatic | Requires body pose |
| Depth (z) | Yes | No |
| Installation | pip install mediapipe |
Build from source (Caffe/CUDA) |
| CPU performance | ~30ms per frame | ~2-5s per frame (CPU) |
| GPU required | No | Strongly recommended |
| Occlusion handling | Moderate (presence confidence) | Limited |
| Multi-hand | Up to 2 hands | Depends on body detection |
| Maintenance | Active (Google) | Limited (CMU, last release 2020) |
| Python API | pip package, Tasks API | Custom build, no pip |
Recommendation¶
Use MediaPipe Hands as the primary model for the LABIS grasp recognition study.
Rationale: 1. Accessibility: pip-installable, no GPU or custom builds needed. 2. 3D keypoints: z-coordinate provides depth useful for grasp classification. 3. Performance: ~30ms inference on CPU, suitable for real-time and batch processing. 4. Active maintenance: Google actively develops MediaPipe; OpenPose's last release was 2020. 5. Handedness detection: Built-in, critical for bilateral grasp studies.
If OpenPose becomes available (e.g., via Docker), the comparison framework is ready:
- normalize_openpose() normalizer is implemented and tested
- Common schema allows apples-to-apples comparison
- Test framework expects both models' metrics
Limitations¶
- No direct OpenPose comparison was performed due to unavailability.
- Synthetic images only — no real hand detection metrics yet (see HAND-004).
- Feature comparison based on documentation, not empirical measurement on our specific dataset.
- MediaPipe's z-coordinate is relative, not absolute depth — may need calibration for grasp classification.
- Occlusion handling was not tested empirically; this is critical for grasps where fingers wrap around objects.
Next Steps¶
- When real dataset is available (HAND-004), re-run MediaPipe comparison with detection metrics.
- If OpenPose is needed, create a Docker image with OpenPose + Python bindings.
- Consider adding MMPose or HRNet as additional comparison candidates if needed.
- Implement grasp classification rules based on normalized landmark positions.