Week 2: Data Pipeline

Day 13: NPZ Files

Day 13 of 28, 18 min

Goal

Understand why pose data may be saved as NPZ.

Learn

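  • NPZ is NumPy's archive format: a zip file that holds one .npy array per name. It is compressed only when written with np.savez_compressed (plain np.savez stores the arrays uncompressed). Because it can store multiple arrays in one file, it is useful for training pipelines.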
  • A SignLLM NPZ may store pose features, frame masks, confidence scores, frame count, labels, signer ID references, or normalized coordinates.
  • JSON is easier for humans to read. NPZ usually loads faster and takes less disk space, because arrays are stored as binary data rather than text, but it should be paired with metadata that humans can inspect.
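
The points above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual pipeline: the array sizes (64 frames, 33 keypoints, 3 channels) and the random values are assumptions, and an in-memory buffer stands in for a real file.

```python
import io
import json
import zipfile

import numpy as np

# Hypothetical pose clip: 64 frames, 33 keypoints, (x, y, z) per keypoint.
rng = np.random.default_rng(0)
keypoints = rng.random((64, 33, 3)).astype(np.float32)
mask = np.ones(64, dtype=bool)

# Write both arrays into one compressed NPZ archive (a buffer stands in for a file).
buffer = io.BytesIO()
np.savez_compressed(buffer, keypoints=keypoints, mask=mask)
npz_bytes = buffer.getvalue()

# An NPZ is a zip archive holding one .npy member per named array.
with zipfile.ZipFile(io.BytesIO(npz_bytes)) as archive:
    member_names = sorted(archive.namelist())

# Load it back; arrays are looked up by the names used when saving.
with np.load(io.BytesIO(npz_bytes)) as data:
    array_names = sorted(data.files)
    restored = data["keypoints"]

# The same numbers written as JSON text, for a size comparison.
json_bytes = json.dumps(
    {"keypoints": keypoints.tolist(), "mask": mask.tolist()}
).encode("utf-8")

print(member_names)                     # names of the .npy members in the zip
print(array_names)                      # names used to look arrays up
print(restored.shape, restored.dtype)   # shape and dtype survive the round trip
print(len(npz_bytes), len(json_bytes))  # binary archive vs. JSON text, in bytes
```

The round trip keeps shape and dtype exactly, which JSON does not: a JSON file would store every value as text and forget that the array was float32.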

Example

  • Example structure: keypoints shape [frames, points, channels], confidence shape [frames, points], mask shape [frames], label = THANK-YOU, frame_count = 64.
  • A related metadata row might say source_video: signer03_thankyou_0042.mp4, pose_file: signer03_thankyou_0042.npz, split: train, qa_status: approved.
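
As a rough sketch, the example structure and its metadata row could be written like this. The point count (33) and channel count (3) are assumed values, since the lesson does not fix them, and the metadata.csv file name is hypothetical. The arrays are filled with placeholder zeros and ones rather than real pose data.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np

FRAMES, POINTS, CHANNELS = 64, 33, 3  # POINTS and CHANNELS are assumed values

# Placeholder arrays with the shapes from the example structure.
arrays = {
    "keypoints": np.zeros((FRAMES, POINTS, CHANNELS), dtype=np.float32),
    "confidence": np.ones((FRAMES, POINTS), dtype=np.float32),
    "mask": np.ones(FRAMES, dtype=bool),
    "label": np.array("THANK-YOU"),   # stored as a zero-dimensional string array
    "frame_count": np.array(FRAMES),  # stored as a zero-dimensional integer array
}

# Human-readable row that points at the NPZ file.
metadata_row = {
    "source_video": "signer03_thankyou_0042.mp4",
    "pose_file": "signer03_thankyou_0042.npz",
    "split": "train",
    "qa_status": "approved",
}

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)

    # Save the arrays under the file name recorded in the metadata row.
    np.savez_compressed(out_dir / metadata_row["pose_file"], **arrays)

    # Save the metadata beside it as a one-row CSV (hypothetical file name).
    with open(out_dir / "metadata.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(metadata_row))
        writer.writeheader()
        writer.writerow(metadata_row)

    # Read both back to confirm what was stored.
    with np.load(out_dir / metadata_row["pose_file"]) as data:
        shapes = {name: data[name].shape for name in data.files}
        label = data["label"].item()
        frame_count = int(data["frame_count"].item())
    csv_text = (out_dir / "metadata.csv").read_text()

print(shapes)
print(label, frame_count)
print(csv_text)
```

Scalars such as the label and frame count come back as zero-dimensional arrays, so .item() is used to turn them into an ordinary Python string and integer.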

Practice

  1. Draw a pretend NPZ container with four arrays: keypoints, confidence, mask, and labels.
  2. Write what each array means in plain English.

Checkpoint

Before moving on

You can explain NPZ as a packed container of training numbers.

Pipeline note

Do not rely on NPZ alone for project understanding. Keep readable metadata beside it.
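
One possible way to keep readable metadata beside an NPZ is to generate a small JSON summary of its contents. The sketch below assumes a hypothetical helper named describe_npz and a stand-in archive with assumed shapes. It is an illustration of the idea, not a required project tool.

```python
import io
import json

import numpy as np


def describe_npz(source):
    """Return a JSON-friendly summary (shape and dtype) of each array in an NPZ."""
    with np.load(source) as data:
        return {
            name: {"shape": list(data[name].shape), "dtype": str(data[name].dtype)}
            for name in data.files
        }


# Small stand-in archive, kept in memory so the sketch needs no files on disk.
buffer = io.BytesIO()
np.savez_compressed(
    buffer,
    keypoints=np.zeros((64, 33, 3), dtype=np.float32),
    mask=np.ones(64, dtype=bool),
)
buffer.seek(0)

# The summary can be saved as a text file next to the NPZ for people to inspect.
summary = describe_npz(buffer)
sidecar_text = json.dumps(summary, indent=2, sort_keys=True)
print(sidecar_text)
```

A summary like this lets someone see array names, shapes, and types in any text editor, without loading the NPZ in Python.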