3D Human Pose Estimation and Forecasting from the Robot’s Perspective

The HARPER Dataset

1 Department of Engineering for Innovation Medicine, University of Verona, Italy
2 University of Glasgow, UK

Abstract

We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot’s perspective, i.e., on the data captured by the robot’s sensors. These make 3D body pose analysis challenging because, being close to the ground, they capture humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and the users. The corpus contains not only the recordings of Spot’s built-in stereo cameras, but also those of a 6-camera OptiTrack motion-capture system (all recordings are synchronized). This yields ground-truth skeletal representations with sub-millimeter precision. In addition, the corpus includes reproducible benchmarks on 3D human pose estimation, human pose forecasting, and collision prediction, all based on publicly available baseline approaches. This enables future users of HARPER to rigorously compare their results with those we provide in this work.

The HARPER Dataset

Data

  • 15 different interactions
  • 17 participants
  • 10 actions involving physical contact between Spot and the users
  • Recordings from Spot's built-in stereo cameras
  • Recordings from a 6-camera OptiTrack system
  • Ground-truth skeletal representations with sub-millimeter precision

Sensors

  • Spot's stereo cameras (5 greyscale + depth, 1 RGB-D)
  • 6-camera OptiTrack system
  • External RGB Camera

Annotations

  • Human 21-joint 3D skeletal model
  • Spot 21-joint 3D skeletal model
  • 2D keypoints annotated in Spot's camera views
  • Per-keypoint visibility
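
For concreteness, one synchronized frame could be represented as sketched below. The field names and layout are illustrative assumptions on our part, not the dataset's actual file format; only the joint count and annotation types come from the list above.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_JOINTS = 21  # both the human and Spot skeletal models use 21 joints

@dataclass
class FrameAnnotation:
    """Hypothetical container for one synchronized frame (illustrative only)."""
    human_joints_3d: List[Tuple[float, float, float]]  # 21 x (x, y, z), OptiTrack ground truth
    spot_joints_3d: List[Tuple[float, float, float]]   # 21 x (x, y, z), robot skeleton
    keypoints_2d: List[Tuple[float, float]]            # 21 x (u, v) in one Spot camera view
    visibility: List[bool]                             # per-keypoint visibility flags

# Toy frame: all joints at placeholder positions, all visible.
frame = FrameAnnotation(
    human_joints_3d=[(0.0, 0.0, 1.0)] * NUM_JOINTS,
    spot_joints_3d=[(0.5, 0.0, 0.4)] * NUM_JOINTS,
    keypoints_2d=[(320.0, 240.0)] * NUM_JOINTS,
    visibility=[True] * NUM_JOINTS,
)
n_visible = sum(frame.visibility)
print(n_visible)  # → 21
```

The visibility flags matter from the robot's perspective: low-mounted cameras often see the user only partially, so many keypoints will be marked not visible in a given view.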

Benchmarks (from Spot's Perspective)

3D Human Pose Estimation

Estimate the 3D human pose from the robot's perspective, using its onboard cameras and depth sensors.
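
A common way to score this kind of benchmark is the Mean Per-Joint Position Error (MPJPE), the average Euclidean distance between predicted and ground-truth joints; this is the standard 3D pose metric, though the exact evaluation protocol used for HARPER may differ.

```python
import math

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance between
    corresponding predicted and ground-truth 3D joints (same units as input)."""
    assert len(pred) == len(gt)
    total = 0.0
    for (px, py, pz), (gx, gy, gz) in zip(pred, gt):
        total += math.sqrt((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2)
    return total / len(pred)

# Toy example: every one of 21 predicted joints is off by 10 mm along x.
gt = [(0.0, 0.0, 0.0)] * 21
pred = [(10.0, 0.0, 0.0)] * 21
print(mpjpe(pred, gt))  # → 10.0
```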

Human Pose Forecasting

Forecast future human poses (all keypoints) from the robot's perspective.
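
A minimal reference point for this task is a constant-velocity baseline, which extrapolates each joint linearly from the last two observed frames; this is a standard naive baseline in pose forecasting, not HARPER's own method.

```python
def constant_velocity_forecast(history, horizon):
    """Naive baseline: extrapolate each joint linearly from the last two
    observed frames. `history` is a list of frames, each a list of (x, y, z)
    joints; returns `horizon` forecasted frames."""
    prev, last = history[-2], history[-1]
    forecast = []
    for t in range(1, horizon + 1):
        frame = [
            (lx + t * (lx - px), ly + t * (ly - py), lz + t * (lz - pz))
            for (px, py, pz), (lx, ly, lz) in zip(prev, last)
        ]
        forecast.append(frame)
    return forecast

# One joint moving 0.5 m per frame along x, for brevity.
history = [
    [(0.0, 0.0, 1.0)],
    [(0.5, 0.0, 1.0)],
]
print(constant_velocity_forecast(history, 2))
# → [[(1.0, 0.0, 1.0)], [(1.5, 0.0, 1.0)]]
```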

Collision Prediction

Predict collisions between the robot and the human based on the forecasted poses.
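
As an illustration, a simple collision criterion (an assumption on our part, not necessarily the benchmark's exact definition) declares a collision whenever any forecasted human keypoint comes within a safety radius of any robot keypoint:

```python
import math

def predicts_collision(human_joints, robot_joints, radius=0.15):
    """Return True if any human joint lies within `radius` metres of any
    robot joint (illustrative threshold, not the benchmark's definition)."""
    for hx, hy, hz in human_joints:
        for rx, ry, rz in robot_joints:
            d = math.sqrt((hx - rx) ** 2 + (hy - ry) ** 2 + (hz - rz) ** 2)
            if d < radius:
                return True
    return False

human = [(1.0, 0.0, 1.5), (0.1, 0.0, 0.4)]  # forecasted human keypoints (metres)
robot = [(0.0, 0.0, 0.4)]                    # robot body keypoints (metres)
print(predicts_collision(human, robot))      # → True (second joint is 0.1 m away)
```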