HumanoidOlympics: Sports Environments for Physically Simulated Humanoids
Abstract
We present HumanoidOlympics, a collection of physically simulated sports environments designed for the animation and robotics communities to develop humanoid behaviors. Our suite includes individual sports such as golf, javelin throw, high jump, long jump, and hurdling, as well as competitive games like table tennis, tennis, fencing, boxing, soccer, and basketball. Because athletic activities are diverse and physically demanding, simulating a wide range of Olympic sports makes HumanoidOlympics a rich and standardized testing ground for developing and evaluating learning algorithms. Our suite supports both graphics-focused humanoids (SMPL and SMPL-X) and real-world humanoid robots. For each sport, we benchmark popular humanoid control methods and provide expert-designed rewards that lead to surprising simulation results. Our analysis shows that leveraging human demonstrations can significantly improve both the human-likeness and the task performance of the resulting policies. By providing a unified and competitive sports benchmark, HumanoidOlympics can help the animation and robotics communities develop human-like and performant controllers.
In this section, we provide a collage of the policies trained using our sports environments and the preliminary reward designs. Fencing and boxing results use our competitive self-play.
SMPL + SMPL-X
Table Tennis
Tennis
Boxing
Fencing
Penalty Kick
Free Throw
Soccer 1v1
Soccer 2v2
Javelin
Golf
High Jump
Long Jump
Hurdle
Unitree H1 + G1
All of the above sports environments support both the SMPL humanoids and real-world humanoid robots (Unitree H1 and G1). Here we provide samples of motion imitation for H1 and G1 using motion retargeted from AMASS, along with some results on humanoid sports.
Note that for the H1 and G1 simulations, we use the same simulation parameters as previous sim-to-real efforts (200 Hz simulation, 50 Hz control, joint limits, torque limits, weight, etc.), without adding any domain randomization.
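With a 200 Hz simulation and a 50 Hz controller, each policy action is held for four physics steps. The sketch below illustrates that decimation loop under stated assumptions: it assumes PD position control toward joint targets (common in humanoid RL setups, but not specified here), replaces the real simulator with toy unit-inertia dynamics, and uses hypothetical function names (`pd_torque`, `toy_physics_step`, `control_step`) that are not part of any HumanoidOlympics API.

```python
import numpy as np

SIM_HZ = 200                        # physics rate quoted in the text
CONTROL_HZ = 50                     # policy rate quoted in the text
DECIMATION = SIM_HZ // CONTROL_HZ   # physics steps per policy action (4)
SIM_DT = 1.0 / SIM_HZ


def pd_torque(q, qd, q_target, kp, kd, torque_limit):
    """Assumed PD law toward a joint-position target, clipped to the torque limit."""
    tau = kp * (q_target - q) - kd * qd
    return np.clip(tau, -torque_limit, torque_limit)


def toy_physics_step(q, qd, tau, dt):
    """Placeholder dynamics (unit-inertia joints, semi-implicit Euler), not a real simulator."""
    qd = qd + tau * dt
    q = q + qd * dt
    return q, qd


def control_step(q, qd, action, q_low, q_high, kp, kd, torque_limit):
    """Apply one 50 Hz policy action by holding it for DECIMATION 200 Hz physics steps."""
    q_target = np.clip(action, q_low, q_high)  # keep targets inside the joint limits
    for _ in range(DECIMATION):
        tau = pd_torque(q, qd, q_target, kp, kd, torque_limit)
        q, qd = toy_physics_step(q, qd, tau, SIM_DT)
    return q, qd


# Example: three joints starting at rest; the second target is clipped from -2.0 to -1.0.
q, qd = control_step(np.zeros(3), np.zeros(3), np.array([0.5, -2.0, 0.1]),
                     q_low=-1.0, q_high=1.0, kp=100.0, kd=5.0, torque_limit=50.0)
```

Keeping the torque and joint limits inside the inner loop mirrors the idea of reusing the robot's real limits rather than relaxing them for simulation.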
H1 Humanoid Motion Retargeting & Imitation
G1 Humanoid Motion Retargeting & Imitation
H1 Humanoid (Boxing: PULSE)
H1 Humanoid (High Jump + Hurdling: PPO)
G1 Humanoid (Hurdling: PPO)
G1 Humanoid (Penalty Kick: PPO)
Data from Videos
Our SMPL-based humanoid lets us directly use poses estimated from videos as human demonstration data.
Here, we provide sample visualizations of motion data extracted from videos using our pipeline of pose estimation followed by refinement in simulation. The extracted motion is physically plausible and captures the distinctive motion style of each sport.
Soccer
Tennis
Golf
Boxing
Ablations on Using Motion Imitation for Physically Plausible Refinement
In this section, we ablate the importance of using a motion imitator (PHC) for pose refinement when acquiring data from videos. We test two motion demonstration sequences, one with refinement and one without.
We train PULSE+AMP using each sequence as the motion prior and show that the quality of the demonstration data matters: our refinement step leads to a better motion prior.
W/o Refinement
W/ Refinement
Algorithm Comparisons
In this section, we provide visual comparisons of state-of-the-art humanoid control methods (PPO only, AMP, and PULSE). For sports with accompanying human demonstration data extracted from videos, we also include PULSE+AMP as a baseline.
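The difference between PPO and PULSE in the comparisons below comes down to the action space: PPO outputs joint targets directly, while PULSE's task policy selects a latent code that a frozen decoder, learned from AMASS motion, turns into joint actions. The sketch below shows only that hierarchical structure. Single random linear layers stand in for the real networks, and every name and dimension (`decoder`, `task_policy`, `act`, `LATENT_DIM`, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
PROPRIO_DIM, TASK_DIM, LATENT_DIM, ACTION_DIM = 10, 3, 4, 6  # hypothetical sizes

# Stand-in for a frozen low-level decoder learned from AMASS; never updated during task training.
W_DEC = 0.1 * rng.normal(size=(ACTION_DIM, PROPRIO_DIM + LATENT_DIM))

# Stand-in for the task policy: the only parameters that RL (e.g. PPO) would update.
W_TASK = 0.1 * rng.normal(size=(LATENT_DIM, PROPRIO_DIM + TASK_DIM))


def decoder(proprio, z):
    """Frozen controller: (proprioception, latent skill code) -> bounded joint targets."""
    return np.tanh(W_DEC @ np.concatenate([proprio, z]))


def task_policy(proprio, task_obs):
    """High-level policy: chooses a latent skill code instead of raw joint targets."""
    return W_TASK @ np.concatenate([proprio, task_obs])


def act(proprio, task_obs):
    z = task_policy(proprio, task_obs)
    return decoder(proprio, z)


action = act(np.zeros(PROPRIO_DIM), np.array([1.0, 0.0, 2.0]))
print(action.shape)  # (6,)
```

Because every action passes through the frozen decoder, the task policy can only recombine skills the decoder already encodes, which is one way to read why PULSE stays human-like on tasks where PPO does not.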
High Jump
For high jump, PPO without any motion prior yields an unnatural jumping motion. AMP, given the task difficulty and the absence of high-jump motion in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still.
Surprisingly, using motor skills learned from AMASS, PULSE discovers the Fosbury Flop.
PPO
AMP
PULSE
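AMP's "stand still" failure follows from how its reward is built. AMP trains the discriminator with least-squares targets (+1 for reference motion, -1 for policy motion) and converts its score d into a style reward max(0, 1 - 0.25(d - 1)^2), which is then mixed with the task reward. The sketch below uses equal weights and hypothetical per-step scores for two behaviors; the numbers are illustrative, not measured in HumanoidOlympics.

```python
def amp_style_reward(d: float) -> float:
    """AMP least-squares style reward for discriminator score d = D(s, s')."""
    return max(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def combined_reward(r_task: float, d: float, w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Weighted sum of task and style rewards; the weights here are illustrative."""
    return w_task * r_task + w_style * amp_style_reward(d)


# Hypothetical per-step values on a very hard task:
# standing still looks like AMASS data (d near 1) but earns no task reward,
# while a clumsy jump attempt earns a little task reward but looks unlike the data.
stand_still = combined_reward(r_task=0.0, d=1.0)   # 0.5
clumsy_jump = combined_reward(r_task=0.3, d=-0.5)  # ~0.369
print(stand_still > clumsy_jump)                    # True
```

When no demonstration resembles the target skill, any attempt at the task lowers the style term more than it raises the task term, so standing still is a stable optimum.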
Long Jump
For long jump, PPO without any motion prior leads to unnatural motion. AMP, given the task difficulty and the absence of long-jump motion in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still.
Using motor skills learned from AMASS, PULSE performs the long jump with human-like motion.
PPO
AMP
PULSE
Hurdling
For hurdling, PPO without any motion prior yields unnatural motion. AMP, given the task difficulty and the absence of hurdling motion in AMASS, ignores the task reward, optimizes only the discriminator reward, and stands still.
Using motor skills learned from AMASS, PULSE clears the hurdles with human-like motion.
PPO
AMP
PULSE
Javelin
For the javelin throw, PPO without any motion prior yields unnatural motion. AMP, despite having throwing motion extracted from videos, prioritizes the discriminator reward (reproducing the arm-swinging motion) and fails to throw.
Using motor skills learned from AMASS, PULSE throws with human-like motion and even learns to jump to gain momentum.
PPO
AMP
PULSE
Golf
For golf, PPO without any motion prior yields unnatural motion (hitting the ball with the pelvis). AMP, despite having golf motion from videos, optimizes only the task reward and ignores the discriminator reward, which is another possible failure mode.
PULSE can hit the golf ball using human-like motion, but PULSE+AMP produces more golf-like motion thanks to the style guidance from human demonstrations.
PPO
AMP
PULSE
PULSE+AMP
Tennis
For tennis, PPO without any motion prior yields an unnatural swinging motion. AMP uses tennis-like motion when not hitting the ball, but unnatural behavior surfaces when it tries to hit the ball. This is another symptom of the disagreement between the task and discriminator rewards. PULSE and PULSE+AMP both hit the ball using human-like motion.
PPO
AMP
PULSE
PULSE+AMP
Table Tennis
For table tennis, PPO without any motion prior results in an unnatural swinging motion. AMP uses table-tennis-like motion when not hitting the ball but falls back on unnatural behavior when trying to hit it. PULSE can hit the ball using human-like motion, but PULSE+AMP produces more table-tennis-like motion thanks to the style guidance from human demonstrations.
PPO
AMP
PULSE
PULSE+AMP
Free Throw
For the free throw, PPO without any motion prior and AMP both fail to learn a proper free-throw motion. PULSE and PULSE+AMP both achieve a high free-throw success rate using human-like motion.
PPO
AMP
PULSE
PULSE+AMP
Penalty Kick
For penalty kicks, PPO without any motion prior and AMP both fail to learn a proper kicking motion, owing to the difficulty of learning human-object interaction from scratch. The reward design also plays a role: PPO exploits the player-to-ball reward instead of learning to kick. PULSE and PULSE+AMP both learn to push the ball; however, while PULSE kicks with human-like motion, PULSE+AMP suffers from the conflict between the style and task rewards.
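The player-to-ball exploit can be reproduced with a toy 1D shaped reward. The sketch below is a minimal illustration that assumes a hypothetical reward with an exponential player-to-ball term and an exponential ball-to-goal term; the environment's actual reward terms and weights are not given here. Under equal weights, hovering next to the ball for the whole episode outscores actually kicking it, and the ranking flips only when the player-to-ball term is down-weighted.

```python
import math


def step_reward(player_x, ball_x, goal_x, w_player=1.0, w_goal=1.0):
    """Hypothetical dense shaping: stay close to the ball + move the ball toward the goal."""
    r_player_ball = math.exp(-abs(ball_x - player_x))
    r_ball_goal = math.exp(-abs(goal_x - ball_x))
    return w_player * r_player_ball + w_goal * r_ball_goal


def episode_return(player_xs, ball_xs, goal_x=10.0, **weights):
    """Undiscounted sum of per-step rewards over a trajectory."""
    return sum(step_reward(p, b, goal_x, **weights) for p, b in zip(player_xs, ball_xs))


T = 20
# "Hover": stand next to the ball at x=0 for the whole episode and never kick it.
hover_player, hover_ball = [0.0] * T, [0.0] * T
# "Kick": the player stays at x=0 while the ball travels 2 m per step and stops in the goal at x=10.
kick_player, kick_ball = [0.0] * T, [min(2.0 * t, 10.0) for t in range(T)]

# Equal weights: hovering earns about 20.0, kicking about 16.3.
print(episode_return(hover_player, hover_ball) > episode_return(kick_player, kick_ball))  # True
# Down-weighted player-to-ball term: kicking now wins.
print(episode_return(hover_player, hover_ball, w_player=0.1)
      < episode_return(kick_player, kick_ball, w_player=0.1))  # True
```

Down-weighting is only one possible remedy; gating the player-to-ball term after first contact would be another design choice with the same goal.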