Notes for Paper “MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild”


Rogez, Grégory, and Cordelia Schmid. “Mocap-guided data augmentation for 3d pose estimation in the wild.” Advances in Neural Information Processing Systems. 2016.


Dummy human pose for augmentation

  • Basics
    • Data augmentation for 3D pose estimation.
    • Input: 3D motion capture (MoCap) data.
    • Combines selected real images into a new synthetic image by stitching local image patches in a kinematically constrained manner.
    • Clusters the training data into a large number of pose classes, which turns 3D pose estimation into a K-way classification problem.
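The K-way classification idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's code. It assumes plain k-means on root-centered 3D poses, with joint 0 taken as the root. The helper names `kmeans_pose_classes` and `pose_from_class_scores` are hypothetical. Each training pose gets a class label for a classifier to predict, and a predicted class maps back to its cluster center as the 3D pose estimate.

```python
import numpy as np


def normalize_poses(poses, root=0):
    """Center each pose on its root joint (assumed to be joint 0). poses: (N, n_joints, 3)."""
    return poses - poses[:, root:root + 1, :]


def kmeans_pose_classes(poses, K, iters=50, seed=0):
    """Cluster 3D poses into K classes with Lloyd's k-means (an assumed choice).

    Returns per-pose class labels (N,) and class-center poses (K, n_joints, 3).
    """
    X = normalize_poses(poses).reshape(len(poses), -1)
    rng = np.random.default_rng(seed)
    # Initialize the centers with K distinct training poses.
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every pose to every center: (N, K).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = X[labels == k]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[k] = members.mean(axis=0)
    # Final assignment against the final centers.
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return labels, centers.reshape(K, -1, 3)


def pose_from_class_scores(scores, centers):
    """Map the output of a K-way classifier to the 3D pose of the top-scoring class."""
    return centers[int(np.argmax(scores))]
```

With a CNN classifier on top, `labels` would serve as the training targets, and `pose_from_class_scores` turns its K scores back into a 3D pose. Larger K gives finer pose resolution but fewer training images per class.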
  • Main methods
    • Cluster 3D poses into K pose classes, then generate “dummy” pose images: a synthetic image only needs an outline that looks like a human in the target pose, so it does not have to be photorealistic.
    • Input: two training sources, namely images with annotated 2D poses and 3D MoCap data.
    • Two processes
      • MoCap-guided mosaic construction: stitches image patches together.
        • Input: a 3D pose with n joints, and its 2D joint projections in one camera view.
        • Output: for each joint, a training image whose annotated 2D pose matches the projected pose around that joint.
        • Compute the transformation that aligns a candidate image's 2D pose to the projected pose at that joint, then measure similarity as the distance between the projected joints and the aligned candidate joints.
        • Give higher weight to joints neighboring the current joint in this distance.
        • Warp the selected image with this transformation and crop its patch around the joint; the patches together form the new image.
      • Pose-aware blending: improves image quality by erasing patch seams.
        • Handles the boundaries between regions taken from different images.
        • For each pixel, take a surrounding square region and evaluate how much each aligned image should contribute to that pixel; the final pixel value is the weighted sum over all aligned images.
    • CNN for full-body 3D pose estimation
      • Shows that training only on the synthetic data can still give good performance.
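The per-joint image selection in the mosaic step can be sketched as below. This is a hedged NumPy illustration under stated assumptions, not the paper's exact formulation. The alignment is a weighted 2D similarity transform (Umeyama-style least squares). The "neighboring joints weigh more" rule is approximated by a Gaussian of image-space distance to the current joint, with a hypothetical bandwidth `sigma`. All function names are hypothetical.

```python
import numpy as np


def weighted_similarity_align(src, dst, w):
    """Align 2D points src onto dst with the similarity transform (scale,
    rotation, translation) minimizing sum_k w_k * ||dst_k - T(src_k)||^2.
    src, dst: (n, 2); w: (n,) positive weights. Returns T(src)."""
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst
    sc, dc = src - mu_s, dst - mu_d
    cov = (dc * w[:, None]).T @ sc  # weighted 2x2 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[1, 1] = -1.0  # forbid reflections
    R = U @ S @ Vt
    scale = np.trace(np.diag(D) @ S) / np.sum(w * np.sum(sc ** 2, axis=1))
    t = mu_d - scale * R @ mu_s
    return scale * src @ R.T + t


def joint_weights(target, j, sigma):
    """Assumed weighting: joints close to joint j in the target pose count more."""
    d2 = np.sum((target - target[j]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def select_image_per_joint(target, candidates, sigma=50.0):
    """target: (n, 2) projected MoCap joints; candidates: (m, n, 2) annotated
    2D poses of m training images. Returns, for each joint, the index of the
    candidate whose aligned pose is closest to the target around that joint."""
    choice = np.empty(len(target), dtype=int)
    for j in range(len(target)):
        w = joint_weights(target, j, sigma)
        dists = [
            np.sum(w * np.linalg.norm(target - weighted_similarity_align(c, target, w), axis=1))
            for c in candidates
        ]
        choice[j] = int(np.argmin(dists))
    return choice
```

In a full pipeline, the transform itself (not only the aligned points returned here) would also be used to warp each selected image before cropping the patch around its joint.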
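Pose-aware blending, as summarized above, can be sketched like this. This is a minimal NumPy illustration that rests on an assumption of ours: each aligned image's contribution to a pixel is taken to be the fraction of pixels in the surrounding square window that the mosaic assigned to that image, which amounts to a box-filtered one-hot label map. The window half-size `r` and the function names are hypothetical.

```python
import numpy as np


def box_mean(a, r):
    """Mean of a 2D array over a (2r+1) x (2r+1) window around each pixel,
    with edge padding, computed from an integral image."""
    k = 2 * r + 1
    H, W = a.shape
    p = np.pad(a, r, mode="edge")
    # Integral image with a leading zero row and column.
    c = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    s = c[k:k + H, k:k + W] - c[:H, k:k + W] - c[k:k + H, :W] + c[:H, :W]
    return s / (k * k)


def pose_aware_blend(aligned, labels, r=5):
    """aligned: (M, H, W, C) images already warped into the target pose;
    labels: (H, W) index of the image each mosaic pixel was taken from.
    Returns the (H, W, C) weighted sum over all aligned images."""
    M = aligned.shape[0]
    # weights[m, y, x]: share of the window around (y, x) assigned to image m.
    weights = np.stack([box_mean((labels == m).astype(float), r) for m in range(M)])
    return np.einsum("mhw,mhwc->hwc", weights, aligned)
```

Because the one-hot masks sum to 1 at every pixel, the weights do too. Pixels farther than `r` from a seam therefore keep their original values, and only pixels near a seam are mixed.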
  • Take-home messages.
  • Other methods mentioned.
    • Data augmentation
      • Jittering
      • Complex affine transformations
    • 3D pose estimation
      • CNNs trained on 3D MoCap data captured in constrained environments.
      • Estimating 3D pose from 2D pose data
        • Use the output of a 2D pose detector.
        • Jointly learn 2D and 3D pose.
        • Dual-source approach: combines 2D pose estimation with 3D pose retrieval.
      • Synthetic pose data.
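For contrast, the standard augmentations listed above (jittering and affine transformations) transform an existing image and its keypoints as a whole. The sketch below samples a random rotation, scaling, and translation jitter and applies it to 2D keypoints. The parameter ranges and function names are hypothetical. In practice the same 2x3 matrix would also warp the image, for example with OpenCV's `cv2.warpAffine`, which accepts a 2x3 matrix.

```python
import numpy as np


def random_affine(rng, max_rot_deg=30.0, scale_range=(0.8, 1.2), max_shift=10.0, center=(0.0, 0.0)):
    """Sample a 2x3 affine matrix: rotation and scaling about `center`,
    followed by a random translation (jitter)."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    s = rng.uniform(*scale_range)
    shift = rng.uniform(-max_shift, max_shift, size=2)
    A = s * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta), np.cos(theta)]])
    c = np.asarray(center, dtype=float)
    # Keep `center` fixed under A, then add the jitter.
    t = c - A @ c + shift
    return np.hstack([A, t[:, None]])


def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to (n, 2) keypoints."""
    return pts @ M[:, :2].T + M[:, 2]
```

Because the joints only undergo one global 2D transform, the articulated pose itself does not change. Generating genuinely new poses is the gap the MoCap-guided mosaics target.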



