Week 3: Training and Review

Day 15: Dataset Curation

Day 15 of 28 · 18 min

Goal

Understand dataset curation as careful selection and organization.

Learn

  • Curation means deciding what belongs in the dataset and what should be removed, repaired, or held for review.
  • Curation checks for duplicates, label consistency, signer coverage, lighting variety, license status, consent, and class balance.
  • More data is not automatically better. A smaller reviewed dataset can be more useful than a large folder of messy clips.
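Two of these checks, duplicates and class balance, are mechanical enough to automate before a human looks at anything. The sketch below makes a few assumptions: clips are available as raw bytes with a gloss label, and the `Clip` type and its field names are hypothetical. It flags exact duplicates by hashing file contents with SHA-256 and summarizes balance as per-gloss counts plus the ratio of the largest class to the smallest. Exact hashing only catches byte-identical copies. Re-encoded or trimmed duplicates still need a reviewer.

```python
import hashlib
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical record: a clip ID, its raw file bytes, and its gloss label.
    clip_id: str
    data: bytes
    gloss: str


def find_exact_duplicates(clips: list[Clip]) -> list[list[str]]:
    """Group clip IDs whose file contents are byte-identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for clip in clips:
        digest = hashlib.sha256(clip.data).hexdigest()
        groups[digest].append(clip.clip_id)
    # Only groups with more than one member are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


def class_balance(clips: list[Clip]) -> tuple[Counter, float]:
    """Return per-gloss counts and the largest-to-smallest class ratio."""
    counts = Counter(clip.gloss for clip in clips)
    if not counts:
        return counts, 0.0
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


if __name__ == "__main__":
    clips = [
        Clip("c1", b"frames-A", "HELLO"),
        Clip("c2", b"frames-A", "HELLO"),  # byte-identical copy of c1
        Clip("c3", b"frames-B", "HELLO"),
        Clip("c4", b"frames-C", "THANK-YOU"),
    ]
    print(find_exact_duplicates(clips))  # [['c1', 'c2']]
    counts, ratio = class_balance(clips)
    print(dict(counts), ratio)  # {'HELLO': 3, 'THANK-YOU': 1} 3.0
```

A ratio well above 1 does not automatically mean something is wrong. It tells you which glosses to collect more of, or which to review before training.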

Example

  • Keep: signer centered, both hands visible, correct gloss, consent present.
  • Review: one hand briefly hidden, uncertain gloss, low pose confidence.
  • Reject: no consent, face missing when facial grammar matters, wrong label, corrupted file.
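The Keep/Review/Reject examples can be written as an ordered rule check. This is a minimal sketch, and several parts are assumptions: each sample is described by boolean flags plus a pose-confidence score, and the field names, the `decide` function, and the 0.6 threshold are hypothetical. Reject rules run first, so a clip without consent never lands in the review pile. Treating an off-center signer as a review case is also an assumption, because the example only mentions centering under Keep.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    KEEP = "keep"
    REVIEW = "review"
    REJECT = "reject"


# Assumed threshold; the right value depends on the pose estimator used.
MIN_POSE_CONFIDENCE = 0.6


@dataclass
class Sample:
    # Hypothetical fields mirroring the example rules. Consent has no default,
    # so it must always be recorded explicitly; the other fields default to
    # the "clean clip" value and only problems need to be set.
    has_consent: bool
    file_ok: bool = True             # False if the file is corrupted
    label_correct: bool = True       # False if the gloss is known to be wrong
    gloss_certain: bool = True       # False if the gloss is unconfirmed
    needs_face: bool = False         # True when facial grammar matters
    face_visible: bool = True
    both_hands_visible: bool = True
    signer_centered: bool = True
    pose_confidence: float = 1.0     # 0.0 to 1.0, from a pose estimator


def decide(s: Sample) -> tuple[Decision, list[str]]:
    """Check reject rules first, then review rules; anything left is kept."""
    reject = []
    if not s.has_consent:
        reject.append("no consent")
    if not s.file_ok:
        reject.append("corrupted file")
    if not s.label_correct:
        reject.append("wrong label")
    if s.needs_face and not s.face_visible:
        reject.append("face missing when facial grammar matters")
    if reject:
        return Decision.REJECT, reject

    review = []
    if not s.both_hands_visible:
        review.append("hand hidden")
    if not s.gloss_certain:
        review.append("uncertain gloss")
    if s.pose_confidence < MIN_POSE_CONFIDENCE:
        review.append("low pose confidence")
    if not s.signer_centered:
        review.append("signer not centered")
    if review:
        return Decision.REVIEW, review

    return Decision.KEEP, []


if __name__ == "__main__":
    decision, reasons = decide(Sample(has_consent=True))
    print(decision.value, reasons)  # keep []
    decision, reasons = decide(
        Sample(has_consent=True, both_hands_visible=False, pose_confidence=0.4)
    )
    print(decision.value, reasons)  # review ['hand hidden', 'low pose confidence']
```

Returning the reasons alongside the decision lets a reviewer see why a clip was routed, not just where it went.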

Practice

  1. Create keep/review/reject rules for a small dataset of sign language clips.
  2. Apply those rules to three pretend samples.

Checkpoint

Before moving on

You can explain that datasets are built, not just collected.

Pipeline note

Write curation rules before reviewing clips so decisions stay consistent across reviewers.
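One way to keep rules fixed across reviewers is to store them as data and stamp every logged decision with a fingerprint of the rule set. This is a minimal sketch that assumes decisions are logged as dictionaries. `RULES`, `rules_fingerprint`, `record_decision`, and `rule_versions_in_use` are hypothetical names. If a batch of decisions carries more than one fingerprint, some clips were reviewed under different rules and should be re-checked.

```python
import hashlib
import json

# Hypothetical rule set, written down before any clip is reviewed.
RULES = {
    "version": 1,
    "reject_if": [
        "no consent",
        "corrupted file",
        "wrong label",
        "face missing when facial grammar matters",
    ],
    "review_if": ["hand hidden", "uncertain gloss", "low pose confidence"],
    "min_pose_confidence": 0.6,
}


def rules_fingerprint(rules: dict) -> str:
    """Short SHA-256 of the rules; dict key order does not affect the result."""
    canonical = json.dumps(rules, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record_decision(clip_id: str, reviewer: str, decision: str,
                    reasons: list[str], rules: dict) -> dict:
    """Log one decision together with the fingerprint of the rules used."""
    return {
        "clip_id": clip_id,
        "reviewer": reviewer,
        "decision": decision,
        "reasons": reasons,
        "rules": rules_fingerprint(rules),
    }


def rule_versions_in_use(records: list[dict]) -> set[str]:
    """More than one fingerprint means reviewers used different rules."""
    return {r["rules"] for r in records}


if __name__ == "__main__":
    log = [
        record_decision("c1", "reviewer_a", "keep", [], RULES),
        record_decision("c2", "reviewer_b", "review", ["hand hidden"], RULES),
    ]
    print(len(rule_versions_in_use(log)))  # 1
```

Serializing with `sort_keys=True` makes the fingerprint independent of how the dictionary was built. Changing any threshold or rule produces a new fingerprint, which makes silent rule drift visible in the log.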