Week 3: Training and Review
Day 15: Dataset Curation
Goal
Understand dataset curation as careful selection and organization.
Learn
- Curation means deciding what belongs in the dataset and what should be removed, repaired, or held for review.
- Curating a dataset means checking for duplicates, label consistency, signer coverage, lighting variety, license status, consent, and class balance.
- More data is not automatically better. A smaller reviewed dataset can be more useful than a large folder of messy clips.
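Two of the checks listed above, duplicates and class balance, are easy to automate. The following is a minimal sketch, assuming each clip is available as a file name, its raw bytes, and a gloss label. The clip records, `find_exact_duplicates`, and `class_counts` are hypothetical names made up for illustration, not part of any existing pipeline.

```python
import hashlib
from collections import Counter

# Hypothetical records: (file name, raw bytes, gloss label).
# Real clips would be read from disk; short byte strings stand in here.
clips = [
    ("clip_001.mp4", b"frames-A", "HELLO"),
    ("clip_002.mp4", b"frames-B", "THANK-YOU"),
    ("clip_003.mp4", b"frames-A", "HELLO"),  # same bytes as clip_001
    ("clip_004.mp4", b"frames-C", "HELLO"),
]


def find_exact_duplicates(records):
    """Group file names by the SHA-256 of their bytes; return groups larger than one."""
    by_hash = {}
    for name, data, _label in records:
        digest = hashlib.sha256(data).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return [names for names in by_hash.values() if len(names) > 1]


def class_counts(records):
    """Count how many clips carry each gloss label."""
    return Counter(label for _name, _data, label in records)


duplicates = find_exact_duplicates(clips)
counts = class_counts(clips)

print("Exact duplicates:", duplicates)
print("Clips per gloss:", dict(counts))
```

Hashing bytes only catches exact copies. A clip that was re-encoded or trimmed will hash differently, so near-duplicates still need a human look or a different comparison method.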
Example
- Keep: signer centered, both hands visible, correct gloss, consent present.
- Review: one hand briefly hidden, uncertain gloss, low pose confidence.
- Reject: no consent, face missing when facial grammar matters, wrong label, corrupted file.
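The keep/review/reject rules above can be written down as a small function so every reviewer applies them the same way. This is only a sketch under stated assumptions: the `Clip` fields, the `triage` function, and the `POSE_CONFIDENCE_MIN` cutoff of 0.6 are all hypothetical, and sending an off-center signer to review is an assumption, since the example does not say where that case belongs.

```python
from dataclasses import dataclass

# Assumed cutoff for "low pose confidence"; the lesson does not give a number.
POSE_CONFIDENCE_MIN = 0.6


@dataclass
class Clip:
    """Hypothetical per-clip review record; defaults describe a clean clip."""
    name: str
    consent: bool = True
    file_ok: bool = True
    label_correct: bool = True
    face_visible: bool = True
    facial_grammar_matters: bool = False
    signer_centered: bool = True
    hand_briefly_hidden: bool = False
    gloss_certain: bool = True
    pose_confidence: float = 1.0


def triage(clip):
    """Return (decision, reasons). Reject rules are checked before review rules."""
    reject = []
    if not clip.consent:
        reject.append("no consent")
    if not clip.file_ok:
        reject.append("corrupted file")
    if not clip.label_correct:
        reject.append("wrong label")
    if clip.facial_grammar_matters and not clip.face_visible:
        reject.append("face missing when facial grammar matters")
    if reject:
        return "reject", reject

    review = []
    if clip.hand_briefly_hidden:
        review.append("one hand briefly hidden")
    if not clip.gloss_certain:
        review.append("uncertain gloss")
    if clip.pose_confidence < POSE_CONFIDENCE_MIN:
        review.append("low pose confidence")
    if not clip.signer_centered:
        # Assumption: an off-center signer goes to review rather than reject.
        review.append("signer not centered")
    if review:
        return "review", review

    return "keep", []


# Three pretend samples, one for each outcome.
samples = [
    Clip("sample_a"),
    Clip("sample_b", hand_briefly_hidden=True, pose_confidence=0.4),
    Clip("sample_c", consent=False),
]

results = {clip.name: triage(clip) for clip in samples}
for name, (decision, reasons) in results.items():
    print(name, decision, reasons)
```

Returning the reasons along with the decision keeps a record of why each clip was held or dropped, which makes it easier to compare decisions between reviewers.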
Practice
- Create keep/review/reject rules for a small sign clip dataset.
- Apply those rules to three pretend samples.
Checkpoint
Before moving on
You can explain that datasets are built, not just collected.
Pipeline note
Write curation rules before reviewing clips so decisions stay consistent across reviewers.