
Overview

1. General Information

This dataset follows the standard 🤗 Hugging Face datasets format and contains imitation learning demonstrations collected from the AI Worker via ROS 2 teleoperation using the lerobot framework.

2. Dataset Schema

Field                           Type            Description
action                          List[float32]   Leader state vector
observation.state               List[float32]   Follower state vector
observation.images.cam_head     Image           RGB image from the head-mounted camera
observation.images.cam_wrist_1  Image           RGB image from the first wrist camera
observation.images.cam_wrist_2  Image           RGB image from the second wrist camera
timestamp                       float32         Time (in seconds) when the step was recorded
frame_index                     int64           Index of the frame within an episode
episode_index                   int64           Index of the episode
index                           int64           Global index across the dataset
task_index                      int64           Task identifier
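The index fields in the schema are related. The shell sketch below shows one way to read them, under two assumptions the table does not state: frames are recorded at a fixed rate, so `timestamp` is nominally `frame_index / fps`, and episodes are stored back to back in `episode_index` order, so the global `index` is the number of frames in all earlier episodes plus `frame_index`. The helper names `frame_timestamp` and `global_index` are hypothetical.

```bash
# Sketch of how the per-frame index fields relate to each other.
# Assumptions (not stated in the schema): frames are recorded at a fixed
# rate, and episodes are stored back to back in episode_index order.

# frame_timestamp FRAME_INDEX FPS
# Prints the nominal time of a frame, in seconds from the start of its episode.
frame_timestamp() {
  awk -v f="$1" -v fps="$2" 'BEGIN { printf "%.4f\n", f / fps }'
}

# global_index EPISODE_INDEX FRAME_INDEX LEN_0 [LEN_1 ...]
# LEN_i is the number of frames in episode i (hypothetical input).
# The global index is the total number of frames in all earlier
# episodes plus the frame's own frame_index.
global_index() {
  local episode="$1" frame="$2" offset=0 i=0 len
  shift 2
  for len in "$@"; do
    # Stop once we reach the episode that contains the frame.
    [ "$i" -ge "$episode" ] && break
    offset=$((offset + len))
    i=$((i + 1))
  done
  echo $((offset + frame))
}
```

For example, at 30 fps with three 900-frame episodes, `global_index 2 10 900 900 900` prints `1810`, and `frame_timestamp 45 30` prints `1.5000`.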

Create Your Datasets

1. Authenticate with Hugging Face

To create a Hugging Face dataset, you first need to log in using a write access token, which can be generated from your Hugging Face settings:

bash
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential

Store your Hugging Face username in a variable:

bash
HF_USER=$(huggingface-cli whoami | head -n 1)
echo $HF_USER
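Because `HF_USER` is later used to build `--control.repo_id`, it can help to confirm that it actually holds a single username before recording. The hypothetical `check_hf_user` function below is a minimal sketch that assumes Hugging Face usernames consist only of letters, digits, `.`, `_` and `-`. It rejects an empty value or anything containing spaces, such as a message printed instead of a name.

```bash
# check_hf_user VALUE
# Returns 0 if VALUE looks like a single Hugging Face username, 1 otherwise.
# Assumption: usernames use only letters, digits, '.', '_' and '-'.
check_hf_user() {
  local user="$1"
  if [ -z "$user" ]; then
    echo "HF_USER is empty; log in with huggingface-cli first" >&2
    return 1
  fi
  case $user in
    # Any character outside the allowed set (including spaces) is rejected.
    *[!A-Za-z0-9._-]*)
      echo "HF_USER does not look like a username: '$user'" >&2
      return 1
      ;;
  esac
  return 0
}
```

Usage: `check_hf_user "$HF_USER" && echo "Dataset will be created as ${HF_USER}/ffw_test"`.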

2. Record Your Dataset

Launch the ROS 2 teleoperation node:

bash
container
bringup

Open a new terminal and navigate to the lerobot directory:

bash
container
cd /root/colcon_ws/src/physical_ai_tools/lerobot

Run the following command to start recording your Hugging Face dataset:

bash
python lerobot/scripts/control_robot.py \
  --robot.type=ffw \
  --control.type=record \
  --control.single_task="pick and place objects" \
  --control.fps=30 \
  --control.repo_id=${HF_USER}/ffw_test \
  --control.tags='["tutorial"]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=10 \
  --control.num_episodes=10 \
  --control.push_to_hub=true \
  --control.play_sounds=false

💡 ${HF_USER} expands to the username stored in the previous step. If you did not set it, replace ${HF_USER} with your Hugging Face username.

💡 To save the dataset locally without uploading to the Hugging Face Hub, set --control.push_to_hub=false.
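With `--control.push_to_hub=false`, the dataset exists only on disk. The sketch below prints where it is expected to be stored. It assumes this LeRobot version keeps datasets under `$HF_LEROBOT_HOME/<repo_id>`, with `HF_LEROBOT_HOME` falling back to `$HF_HOME/lerobot` and `HF_HOME` falling back to `~/.cache/huggingface`, as in upstream LeRobot and `huggingface_hub`. These defaults may differ in your checkout, and `local_dataset_dir` is a hypothetical helper.

```bash
# local_dataset_dir REPO_ID
# Prints the expected local directory of a recorded dataset.
# Assumption: upstream LeRobot/huggingface_hub cache defaults; verify
# against the constants in your lerobot checkout.
local_dataset_dir() {
  local repo_id="$1"
  # huggingface_hub default: $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface
  local hf_home="${HF_HOME:-${XDG_CACHE_HOME:-$HOME/.cache}/huggingface}"
  # Assumed LeRobot default: $HF_HOME/lerobot, overridable via HF_LEROBOT_HOME
  local lerobot_home="${HF_LEROBOT_HOME:-$hf_home/lerobot}"
  printf '%s\n' "$lerobot_home/$repo_id"
}
```

Usage: `ls "$(local_dataset_dir "${HF_USER}/ffw_test")"`.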

🔧 Key Parameters to Customize

To create your own dataset, here are some important parameters you may want to adjust:

  • --control.repo_id: The Hugging Face dataset repository ID, in the format <username>/<dataset_name>. It determines where the dataset is saved and, when --control.push_to_hub=true, the Hub repository it is uploaded to.

  • --control.single_task: A natural-language description of the task being performed (e.g., "pick and place objects").

  • --control.episode_time_s: Duration (in seconds) of each recorded episode.

  • --control.reset_time_s: Time (in seconds) allotted for resetting the environment between episodes.

  • --control.num_episodes: Total number of episodes to record.

The other parameters in the command, such as --control.fps and --control.warmup_time_s, can be adjusted in the same way to fit your setup.
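As a worked example of how these parameters interact, the sketch below estimates the wall-clock length of a recording session and the number of frames it produces. It assumes one warmup period at the start, a reset period after every episode except the last, and exactly fps * episode_time_s frames per episode. Re-recorded episodes, early stops, or dropped frames will change the real numbers. `estimate_session` is a hypothetical helper.

```bash
# estimate_session FPS WARMUP_S EPISODE_S RESET_S NUM_EPISODES
# Assumptions: one warmup at the start, a reset after every episode except
# the last, and exactly FPS * EPISODE_S frames per episode (integer seconds).
estimate_session() {
  local fps="$1" warmup="$2" episode="$3" reset="$4" episodes="$5" resets=0
  # No reset is assumed after the final episode.
  if [ "$episodes" -gt 1 ]; then
    resets=$((episodes - 1))
  fi
  echo "duration_s=$((warmup + episodes * episode + resets * reset)) frames=$((episodes * fps * episode))"
}
```

With the values from the record command (`estimate_session 30 5 30 10 10`), this prints `duration_s=395 frames=9000`: about six and a half minutes of recording for 9000 frames.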

Dataset Visualization

Once data collection is complete, you can preview and inspect your recorded dataset using the following command:

bash
python lerobot/scripts/visualize_dataset_html.py \
  --repo-id ${HF_USER}/ffw_test

You should see output similar to the following:

text
Fetching 4 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 3457.79it/s]
.gitattributes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.46k/2.46k [00:00<00:00, 45.9MB/s]
Fetching 126 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 126/126 [00:00<00:00, 266.66it/s]
Resolving data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 662258.53it/s]
INFO 2025-05-15 16:18:07 set_html.py:364 Output directory already exists. Loading from it: '/tmp/lerobot_visualize_dataset_uo6ddbb1'
 * Serving Flask app 'visualize_dataset_html'
 * Debug mode: off
INFO 2025-05-15 16:18:07 _internal.py:97 WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:9090
INFO 2025-05-15 16:18:07 _internal.py:97 Press CTRL+C to quit

💡 Once the server is running, open http://127.0.0.1:9090 in your browser to preview the dataset.
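The log shows the server listening on 127.0.0.1, so it only accepts connections from the machine it runs on. If you browse from a different computer, an SSH local port forward is one common workaround, assuming you have SSH access to the machine running the viewer. The hypothetical `viewer_tunnel_cmd` helper below only prints the command so you can inspect it before running it; the `user@host` argument is a placeholder.

```bash
# viewer_tunnel_cmd USER@HOST [PORT]
# Prints an SSH command that forwards local PORT to 127.0.0.1:PORT on the
# remote machine running the dataset viewer (PORT defaults to 9090).
viewer_tunnel_cmd() {
  local remote="$1" port="${2:-9090}"
  # -N: no remote command, only forwarding; -L: local port forward.
  echo "ssh -N -L ${port}:127.0.0.1:${port} ${remote}"
}
```

Run the printed command on your own computer, keep it open, and browse to http://127.0.0.1:9090 there.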

AI Worker released under the Apache-2.0 license.