# Notes for Paper “Compositional human pose regression”

Paper:

Sun, Xiao, et al. “Compositional human pose regression.” The IEEE International Conference on Computer Vision (ICCV). Vol. 2. 2017.

Key: Structure-aware

• Performance:
• 48.3mm on H3.6M Protocol 1 (Avg joint error)
• 59.1mm on H3.6M Protocol 2 (Avg joint error)
• 86.4% PCKh@0.5 on MPII
• Evaluation
• Metrics:
• Absolute
• 3D: Procrustes Analysis + MPJPE
• 2D: PCK
• Relative:
• 2D: Mean per bone position error
• 3D pose: bone length standard deviation and percentage of illegal joint angles.
• Datasets: MPII, H3.6M
• Basics
• Structure-aware approach
• Use bones instead of joints as pose representation.
• Use joint connection structure to define a compositional loss function.
• Only re-parameterizes the pose representation, so it is compatible with other algorithm designs.
• Both 3D and 2D
• Main method
• Use the L1 norm for joint regression (instead of squared distance).
• Bone-based representation.
• Bones are easier to learn than joints, and they express constraints more easily.
• Many pose-driven applications only need local bones, not global joint locations.
• Use the L1 norm for the bone loss function.
• A bone is the vector between a joint and its parent joint in the skeleton tree. The relative position of any two joints is then the signed sum of the bones along the path connecting them.
• Network
• ResNet-50 pre-trained on ImageNet
• The last FC layer outputs 3D (or 2D) coordinates.
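
The bone representation and the compositional loss noted above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the 5-joint `PARENT` tree is hypothetical (it is not the H3.6M or MPII skeleton), the bone direction (child to parent) is this sketch's convention, and the choice of joint pairs is arbitrary. It is not the paper's Caffe implementation.

```python
# Hypothetical 5-joint skeleton tree: PARENT[k] is the parent of joint k.
# Joint 0 is the root and has no bone of its own.
PARENT = {1: 0, 2: 1, 3: 1, 4: 3}


def joints_to_bones(joints):
    """Bone k is the vector from joint k to its parent (this sketch's convention)."""
    return {
        k: tuple(pc - kc for pc, kc in zip(joints[p], joints[k]))
        for k, p in PARENT.items()
    }


def path_to_root(k):
    """Ids of the bones met while walking from joint k up to the root."""
    path = []
    while k in PARENT:
        path.append(k)
        k = PARENT[k]
    return path


def relative_position(bones, u, v):
    """J_u - J_v as a signed sum of the bones on the path between u and v."""
    dim = len(next(iter(bones.values())))
    delta = [0] * dim
    on_u, on_v = set(path_to_root(u)), set(path_to_root(v))
    # Bones shared by both root paths cancel, so only the rest contribute.
    for k in on_u - on_v:
        for i in range(dim):
            delta[i] -= bones[k][i]
    for k in on_v - on_u:
        for i in range(dim):
            delta[i] += bones[k][i]
    return tuple(delta)


def compositional_loss(pred_bones, gt_bones, pairs):
    """Sum of L1 errors of relative joint positions over the given joint pairs."""
    total = 0
    for u, v in pairs:
        pred = relative_position(pred_bones, u, v)
        gt = relative_position(gt_bones, u, v)
        total += sum(abs(p - g) for p, g in zip(pred, gt))
    return total


# Toy 2D pose with integer coordinates.
joints = {0: (0, 0), 1: (0, 2), 2: (-1, 3), 3: (1, 3), 4: (2, 5)}
bones = joints_to_bones(joints)
print(relative_position(bones, 4, 2))  # (3, 2), which equals joints[4] - joints[2]

# Perturb a single predicted bone and compare against the ground truth.
pred_bones = dict(bones)
pred_bones[3] = (-1, 0)  # ground truth is (-1, -1)
pairs = [(1, 0), (2, 1), (3, 1), (4, 3), (4, 0)]  # four bones plus one long-range pair
loss = compositional_loss(pred_bones, bones, pairs)
print(loss)  # 2
```

In this toy case the error in bone 3 is counted once for the pair (3, 1) and once more for the long-range pair (4, 0), whose path also passes through that bone. A plain per-bone loss would count it only once.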

• Other methods mentioned
• Detection based and regression based
• The heatmaps are usually noisy and multi-modal.
• Problem: these methods simply minimize the per-joint location errors independently and ignore the internal structure of the pose.
• 3D pose estimation
• Methods that do not use prior knowledge from a 3D model
• Use two separate steps: first predict the 2D joints, then reconstruct the 3D pose via optimization or search.
• [Sparseness Meets Deepness] combines uncertainty maps of the 2D joint locations and a sparsity-driven 3D geometric prior to infer the 3D joint locations via an EM (expectation-maximization) algorithm.
• Represent the 3D pose with an over-complete dictionary and use a high-dimensional latent pose representation.
• Extend Hourglass from 2D to 3D
• Methods that use prior knowledge from a 3D model
• Embed a kinematic model layer into deep neural networks and estimate model parameters instead of joints.
• The kinematic model parameterization is highly non-linear, so optimizing it in deep networks is hard.
• 2D pose estimation
• Pure graphical models and inference models.
• Pictorial structure (PS) model
• Graphical model with CNN
• Evaluation
• Datasets: H3.6M, MPII
• Metrics:
• 59.1 mm average joint error (H3.6M, Protocol 2)
• 86.4% PCKh@0.5 (MPII)
• Coding
• Caffe
• Two GPUs
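
The two headline metrics in these notes, average joint error (MPJPE) and PCKh, can be sketched as follows. This is a minimal illustration, not the official evaluation code: the function names, the `head_size` parameter, and the toy poses are assumptions. The Procrustes alignment that the notes list for the 3D metric is omitted, so `mpjpe` here measures the error of poses that are already in a common frame.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of joints whose error is at most alpha * head_size."""
    threshold = alpha * head_size
    hits = sum(math.dist(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


# Toy 3D pose in millimetres (hypothetical values).
gt_3d = [(0, 0, 0), (0, 0, 100), (0, 100, 0)]
pred_3d = [(3, 4, 0), (0, 0, 100), (0, 106, 8)]
print(mpjpe(pred_3d, gt_3d))  # 5.0, from per-joint errors 5, 0 and 10

# Toy 2D pose in pixels with an assumed head size of 20 px.
gt_2d = [(10, 10), (20, 10), (30, 10), (40, 10)]
pred_2d = [(10, 12), (20, 10), (30, 25), (43, 14)]
print(pckh(pred_2d, gt_2d, head_size=20.0))  # 0.75, three of four joints within 10 px
```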