This open-source project was created by Alex Vesel and is available on GitHub: https://github.com/alex-vesel/robotics/tree/main/teleop. Alex also provides a more detailed write-up on his blog: https://www.alexvesel.com/blog/imitation-learning/.
Hardware

● myCobot 280 M5

The myCobot 280 M5 is a compact, lightweight robotic arm, ideal for low-load application scenarios such as education, research, and light automation. With its simple, beginner-friendly Python API, it is especially suitable for users who are new to robotics development, and its straightforward setup makes it easy to get started with basic motion control and programming.
● Intel RealSense D435i

Alex selected the Intel RealSense D435i as the primary workspace camera for the project. It is a depth camera with a short minimum depth distance and an integrated IMU, making it especially well suited to 3D mapping and reconstruction work.
● Fisheye wrist camera

To solve the problem of the gripper obstructing the field of view, Alex mounted a wide-FOV fisheye wrist camera on the arm's gripper.
Before the neural network can be trained, data must be acquired: the six joints of the robot arm are controlled in real time while their values are recorded. Alex ultimately chose direct control via an Xbox wireless controller for its ease of setup.
This requires a custom teleoperation data acquisition framework that receives joystick commands, captures camera images, logs both the images and metadata, and finally executes the command on the robot.
Alex discovered that maintaining the correct order of these steps is critical to avoid violating the Markov assumption. In his early setup, the command was executed before image capture, which led to two major issues: blurry images due to robot motion and, more crucially, a causality violation, since the expert command was paired with a future state. This subtle error significantly hindered the model's learning, resulting in poor performance on fine manipulation tasks like picking up an earplug. After correcting the operation sequence, the model's performance improved noticeably.
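A minimal sketch of the corrected loop order is shown below; the callables are placeholders, not names from Alex's repository. The key point is that the robot only moves after the (state, action) pair has been captured and logged.

```python
import time

def run_teleop(get_command, capture_frame, get_joint_angles, execute, log, hz=20):
    """Teleop loop in the corrected order: observe the state first,
    pair it with the expert command, and only then move the robot."""
    period = 1.0 / hz
    while True:
        t0 = time.time()
        frame = capture_frame()      # 1. capture the CURRENT state (sharp image)
        angles = get_joint_angles()  # 2. proprioception at the same instant
        command = get_command()      # 3. expert action for that state
        log(frame, angles, command)  # 4. record the (state, action) pair
        execute(command)             # 5. only now move the robot
        time.sleep(max(0.0, period - (time.time() - t0)))
```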
● Building Neural Network Model

The model receives the aligned RGB and depth images from the workspace Intel RealSense camera. Each image is resized to 224x224, divided by the maximum value of its channels (255 for RGB, 4096 for depth), and then normalized to the range [-1, 1].
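A sketch of that preprocessing step, assuming the frames arrive as NumPy arrays (as they do from the RealSense SDK) and that OpenCV is used for resizing:

```python
import cv2
import numpy as np

def preprocess(rgb, depth):
    """Resize to 224x224, scale each channel by its max value,
    then map from [0, 1] to [-1, 1]."""
    rgb = cv2.resize(rgb, (224, 224)).astype(np.float32) / 255.0       # 8-bit RGB
    depth = cv2.resize(depth, (224, 224)).astype(np.float32) / 4096.0  # 12-bit depth
    rgbd = np.dstack([rgb, depth])   # stack into an H x W x 4 array
    return rgbd * 2.0 - 1.0          # normalize to [-1, 1]
```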
The preprocessed images are then passed into Alex's custom ResNet10 model, a typical ResNet with a single residual block in each of the four ResNet stages.
The arm angles are likewise processed by a small fully connected network.
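A PyTorch sketch of such a model follows; the layer widths, the fusion head, and the chunk size are illustrative assumptions rather than values from Alex's code. Using torchvision's ResNet class with one BasicBlock per stage yields the ResNet10 structure described above.

```python
import torch
import torch.nn as nn
from torchvision.models.resnet import ResNet, BasicBlock

class PolicyNet(nn.Module):
    def __init__(self, n_joints=6, chunk=8):
        super().__init__()
        self.n_joints, self.chunk = n_joints, chunk
        # ResNet-10: one BasicBlock per stage ([1, 1, 1, 1])
        self.backbone = ResNet(BasicBlock, [1, 1, 1, 1], num_classes=256)
        # accept 4-channel RGB-D input instead of 3-channel RGB
        self.backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                                        padding=3, bias=False)
        # small fully connected network for the six joint angles
        self.joint_mlp = nn.Sequential(
            nn.Linear(n_joints, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # fuse image and joint features; predict a chunk of joint-angle targets
        self.head = nn.Linear(256 + 64, n_joints * chunk)

    def forward(self, rgbd, joints):
        # rgbd: (B, 4, 224, 224), joints: (B, 6)
        z = torch.cat([self.backbone(rgbd), self.joint_mlp(joints)], dim=-1)
        return self.head(z).view(-1, self.chunk, self.n_joints)  # (B, k, 6)
```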
One challenge in behavior cloning is compounding errors, which can push the robot into states not covered by the expert demonstration distribution. For example, if demonstrations only show the robot approaching the earplug directly from the start, the model may fail to act appropriately in similar positions near the workspace but away from the earplug. To address this distribution mismatch, techniques like Dataset Aggregation (DAgger) have the expert label the states visited during model rollouts. A similar effect can be achieved by diversifying initial states during data acquisition so that the relevant state space is better covered.

Action chunking addresses compounding errors from another angle: the model is trained to predict the next k steps of expert actions, which reduces the effective horizon of any given task k-fold and leaves less room for errors to compound. It can be combined with temporal ensembling, whereby the overlapping predictions for a given timestep (up to k of them) are averaged to reduce prediction noise, as sketched below.
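A minimal sketch of temporal ensembling, assuming the model emits a chunk of k actions at every timestep. A uniform average is shown for simplicity; weighted schemes (e.g., the exponential weighting used in ACT) are also common.

```python
from collections import defaultdict
import numpy as np

class TemporalEnsembler:
    """Average the overlapping chunk predictions that target each timestep."""
    def __init__(self, k):
        self.k = k
        self.buffer = defaultdict(list)  # timestep -> list of predicted actions

    def step(self, t, chunk):
        """chunk: array of shape (k, action_dim) predicting steps t..t+k-1.
        Returns the ensembled action to execute at timestep t."""
        for i in range(self.k):
            self.buffer[t + i].append(chunk[i])
        preds = self.buffer.pop(t)       # up to k predictions made for step t
        return np.mean(preds, axis=0)    # ensemble by simple averaging
```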
Another method, Disturbances for Augmenting Robot Trajectories (DART), addresses compounding error more directly by injecting a small amount of noise into the expert's actions during data collection. In this case, Alex sampled a small amount of Gaussian noise and added it to the commands he sent to the robot through teleop. The method works by effectively letting the model recover from small errors in its predictions, because those small-error states are now in-distribution for the training set.
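A sketch of that injection step; the noise scale is an illustrative guess, and execute/log are placeholders. Whether the clean or the noisy command is recorded as the label is a design choice; the DART formulation logs the expert's noise-free action, as shown here.

```python
import numpy as np

rng = np.random.default_rng()

def dart_step(command, execute, log, sigma_deg=0.5):
    """DART-style collection step: perturb the expert's joint-angle
    command before execution so recovery states enter the dataset."""
    noisy = np.asarray(command, dtype=float) + rng.normal(0.0, sigma_deg, len(command))
    execute(noisy)   # robot visits slightly perturbed, recoverable states
    log(command)     # label remains the expert's intended action
```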
Elephant Robotics offers the pymycobot Python package for interfacing with the firmware on the myCobot 280. The package exposes simple Python functions that send serial commands to the robot and is easy to get started with.
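Getting started takes only a few lines; the serial port and baud rate below are typical values that vary by platform:

```python
from pymycobot.mycobot import MyCobot

# open the serial connection (adjust port and baud rate for your setup)
mc = MyCobot("/dev/ttyUSB0", 115200)

print(mc.get_angles())                  # read the six joint angles (degrees)
mc.send_angles([0, 0, 0, 0, 0, 0], 50)  # move all joints to zero at speed 50
```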
However, a few inefficiencies in the package's code significantly slowed the robot's control rate. In particular, after sending a command the package waits in a timeout loop for a serial response, even for commands the firmware never answers. Alex significantly increased the control rate by adding a list of no-response commands that break out of this timeout loop immediately after the first message is sent, and by shortening the timeout itself while still receiving responses reliably.
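The idea, sketched here in simplified form (this is not the actual pymycobot source, and the command names are illustrative), is to skip the read-wait entirely for commands the firmware never answers and to poll with a short deadline otherwise:

```python
import time

# commands known to return no reply over serial (illustrative subset)
NO_RESPONSE_COMMANDS = {"send_angles", "send_coords", "set_color"}

def send_command(serial_port, name, payload, timeout=0.05):
    """Write a framed command; only wait for a reply when one is expected."""
    serial_port.write(payload)
    if name in NO_RESPONSE_COMMANDS:
        return None                   # don't wait: the firmware sends no reply
    deadline = time.time() + timeout  # shortened timeout for real replies
    while time.time() < deadline:
        if serial_port.in_waiting:    # pyserial: bytes available to read
            return serial_port.read(serial_port.in_waiting)
        time.sleep(0.001)
    return None
```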
The following videos showcase the model's performance on the earplug manipulation task. The slight jerkiness in the robot's motion arises because the model outputs small angle deltas, and because running the model and sending commands takes longer than the collaborative robotic arm needs to reach each commanded position. The primary speed limiter is the myCobot firmware, not model inference.
We would like to extend our sincere gratitude to Alex for his creative project. By building a data acquisition system, designing the algorithms, and optimizing the control code, Alex successfully taught the robotic arm to pick up objects autonomously from demonstration data. We hope more people will apply this technology to everyday life, bringing more convenience to the world.
Developers are welcome to participate in our User Case Initiative and showcase their innovative projects: https://www.elephantrobotics.com/en/call-for-user-cases-en/.