
Ground Truth Labeling Guide

Date: 2026-05-07
Refs: #36 (HAND-004)

Purpose

This guide defines how to manually label hand grasp images/video for the LABIS dataset. Ground truth labels are essential for validating AI model outputs.

Who Labels

  • Primary annotator: trained researcher familiar with grasp taxonomy (see grasp-taxonomy.md).
  • If inter-annotator agreement is needed: two independent annotators label the same sample, and a third annotator resolves any disagreements.
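
The two-annotator workflow above can be sketched as a small merge step. This is a minimal illustration, not the project's tooling: the function name resolve_labels, the sample IDs, and the adjudicate callback (standing in for the third annotator) are all hypothetical, and it assumes that any difference in the full label string, including the confidence part, counts as a disagreement.

```python
from typing import Callable, Dict


def resolve_labels(
    labels_a: Dict[str, str],
    labels_b: Dict[str, str],
    adjudicate: Callable[[str, str, str], str],
) -> Dict[str, str]:
    """Merge two annotators' labels; a third annotator decides disagreements.

    Hypothetical helper: `adjudicate(sample_id, label_a, label_b)` stands in
    for the third annotator and returns the final label for that sample.
    """
    if labels_a.keys() != labels_b.keys():
        raise ValueError("both annotators must label the same samples")

    final: Dict[str, str] = {}
    for sample_id, label_a in labels_a.items():
        label_b = labels_b[sample_id]
        if label_a == label_b:
            # Agreement: keep the shared label as-is.
            final[sample_id] = label_a
        else:
            # Disagreement: defer to the third annotator.
            final[sample_id] = adjudicate(sample_id, label_a, label_b)
    return final
```

Only samples on which the two annotators differ reach the third annotator, so the adjudication workload scales with the number of disagreements rather than the dataset size.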

Label Schema

Each sample in the manifest requires a ground_truth_label with the following format:

{grasp_id}:{confidence}

Where:

  • grasp_id: one of the 6 taxonomy IDs (power_grasp, precision_pinch, spherical_grasp, lateral_pinch, tripod_grasp, hook_grasp) or no_grasp / ambiguous.
  • confidence: certain, probable, or uncertain.

Examples

  • power_grasp:certain — clearly a power grasp, no ambiguity.
  • precision_pinch:probable — looks like precision pinch but slight finger position uncertainty.
  • ambiguous:uncertain — cannot determine grasp type from this angle/frame.
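
A label string in this format can be checked mechanically before it enters the manifest. The sketch below is one possible validator; the names parse_label and GroundTruthLabel are hypothetical, while the allowed values are copied from the schema above.

```python
from typing import NamedTuple

# Allowed values as listed in the Label Schema section.
GRASP_IDS = {
    "power_grasp", "precision_pinch", "spherical_grasp",
    "lateral_pinch", "tripod_grasp", "hook_grasp",
    "no_grasp", "ambiguous",
}
CONFIDENCE_LEVELS = {"certain", "probable", "uncertain"}


class GroundTruthLabel(NamedTuple):
    grasp_id: str
    confidence: str


def parse_label(raw: str) -> GroundTruthLabel:
    """Parse and validate a '{grasp_id}:{confidence}' string (hypothetical helper)."""
    # Split on the first ':' only; anything after it must be a valid confidence.
    grasp_id, sep, confidence = raw.strip().partition(":")
    if not sep:
        raise ValueError(f"missing ':' separator in {raw!r}")
    if grasp_id not in GRASP_IDS:
        raise ValueError(f"unknown grasp_id {grasp_id!r}")
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence {confidence!r}")
    return GroundTruthLabel(grasp_id, confidence)
```

For example, parse_label("precision_pinch:probable") returns a GroundTruthLabel with grasp_id "precision_pinch" and confidence "probable", while a misspelled ID or a missing colon raises ValueError.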

Labeling Procedure

  1. Open image or video frame.
  2. Identify which grasp type is being performed using grasp-taxonomy.md definitions.
  3. Assess confidence based on visibility of key landmarks.
  4. Record the label in the ground_truth_label column of the manifest CSV.
  5. If the grasp does not match any taxonomy entry, use no_grasp or ambiguous.
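
The recording step can be sketched with Python's standard csv module. Only the ground_truth_label column name comes from this guide; the sample_id and file columns, and the function name record_label, are assumptions made for illustration, and the real manifest layout may differ.

```python
import csv
import io


def record_label(manifest_csv: str, sample_id: str, label: str) -> str:
    """Return the manifest text with `label` written for `sample_id`.

    Assumes a hypothetical `sample_id` column identifies each row.
    """
    reader = csv.DictReader(io.StringIO(manifest_csv))
    fieldnames = list(reader.fieldnames or [])
    if "ground_truth_label" not in fieldnames:
        # Add the label column if the manifest does not have it yet.
        fieldnames.append("ground_truth_label")

    rows = list(reader)
    matched = False
    for row in rows:
        if row["sample_id"] == sample_id:
            row["ground_truth_label"] = label
            matched = True
    if not matched:
        raise KeyError(f"sample {sample_id!r} not found in manifest")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # rows without a label are written with an empty cell
    return out.getvalue()


# Usage with a two-row in-memory manifest (hypothetical contents).
manifest = "sample_id,file,ground_truth_label\ns1,a.png,\ns2,b.png,\n"
updated = record_label(manifest, "s2", "hook_grasp:probable")
```

Working on the text in memory keeps the sketch self-contained; a real script would read the manifest from disk and write the result back.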

Quality Criteria

  • Label only frames where the hand is in a stable grasp position (not transitioning).
  • If occlusion prevents identification, label as ambiguous:uncertain.
  • For video: label the frame with the clearest grasp and note its timestamp.
  • Never guess based on the object alone; the hand posture must confirm the grasp type.
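
The first two criteria can be read as a small decision rule. The sketch below is only an illustration under assumptions: the function label_for_frame and its boolean inputs are hypothetical, the annotator still makes the stable and occluded judgments by eye, and checking stability before occlusion is one possible ordering that the guide does not prescribe.

```python
from typing import Optional


def label_for_frame(
    stable: bool,
    occluded: bool,
    grasp_id: str,
    confidence: str,
) -> Optional[str]:
    """Apply the quality criteria to one frame (hypothetical helper).

    Returns None when the frame should not be labeled at all.
    """
    if not stable:
        # Transitional frames are skipped rather than labeled.
        return None
    if occluded:
        # Occlusion that prevents identification always maps to this label.
        return "ambiguous:uncertain"
    return f"{grasp_id}:{confidence}"
```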

Common Mistakes

  • Labeling based on object identity rather than observed hand posture.
  • Confusing lateral_pinch (thumb-to-index-side) with precision_pinch (tip-to-tip).
  • Labeling transitional frames where the grasp is not yet formed.