Paper:
Newell, Alejandro, Zhiao Huang, and Jia Deng. “Associative embedding: End-to-end learning for joint detection and grouping.” Advances in Neural Information Processing Systems. 2017.
- Performance
- Basics
- Associative embedding
- Jointly perform detections and grouping using a single-stage deep network trained end-to-end
- For each detection, introduce a “tag” (a real number) that identifies which group the detection belongs to.
- Note: There are no ground-truth tags for the network to predict, because what matters is not the particular tag values, only the differences between them.
- Output: Two heatmaps
- A heatmap of per-pixel detection scores (a detection score at each pixel for each joint).
- A heatmap of per-pixel identity tags (a tag value at each pixel for each joint).
- For multi-person pose estimation, output a detection heatmap and a tagging heatmap for each body joint, then group body joints with similar tags into individual people.
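The grouping step above can be illustrated with a minimal sketch. It is a simplified greedy procedure, not necessarily the paper's exact decoding: the function name `group_by_tags`, the tuple layout of a detection, and the `tag_threshold` value are all assumptions made for illustration. It assumes detections have already been extracted from the heatmaps as peaks, each carrying its tag value.

```python
def group_by_tags(detections, tag_threshold=1.0):
    """Greedily group joint detections whose tags are close (illustrative only).

    detections: list of (joint_name, x, y, score, tag) tuples (hypothetical layout).
    Returns a list of people, each a dict: joint_name -> (x, y, score, tag).
    """
    people = []
    # Visit the most confident detections first so they seed the groups.
    for joint, x, y, score, tag in sorted(detections, key=lambda d: -d[3]):
        best, best_dist = None, tag_threshold
        for person in people:
            if joint in person:
                # A person has at most one instance of each joint.
                continue
            # Mean tag of the joints already assigned to this person.
            mean_tag = sum(j[3] for j in person.values()) / len(person)
            dist = abs(tag - mean_tag)
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:
            # No existing person has a close enough tag: start a new one.
            best = {}
            people.append(best)
        best[joint] = (x, y, score, tag)
    return people


if __name__ == "__main__":
    dets = [
        ("head", 10, 10, 0.90, 0.1), ("neck", 10, 20, 0.80, 0.2),
        ("head", 50, 10, 0.95, 5.0), ("neck", 50, 20, 0.70, 5.1),
    ]
    for person in group_by_tags(dets):
        print(person)
```

Because only differences between tags matter, the threshold is compared against the distance to each group's mean tag rather than against any absolute tag value.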
- Two loss functions together
- Detection loss: mean squared error (MSE) between each predicted detection heatmap and its ground-truth heatmap (a 2D Gaussian activation at each keypoint location).
- Grouping loss: Compare the tags within each person and across people. Tags within a person should be the same, while tags across people should be different.
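The two losses can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `sigma` defaults, the exact normalization, and the choice to sum the "push" term over distinct pairs of people only are assumptions. Tags are read from the predicted tag heatmap at each person's ground-truth joint locations, which is why no ground-truth tag values are needed.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Ground-truth heatmap: a 2D Gaussian centred on the keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def detection_loss(pred, target):
    """Mean squared error between two heatmaps of equal size."""
    n_pixels = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n_pixels


def grouping_loss(tags_per_person, sigma=1.0):
    """tags_per_person[n] holds the predicted tags at person n's true joints."""
    n_people = len(tags_per_person)
    # Reference tag of each person: the mean of that person's joint tags.
    means = [sum(tags) / len(tags) for tags in tags_per_person]
    # Pull term: tags within a person should match the person's mean tag.
    pull = sum((m - t) ** 2
               for m, tags in zip(means, tags_per_person)
               for t in tags) / n_people
    # Push term: mean tags of different people should be far apart.
    push = sum(math.exp(-(means[a] - means[b]) ** 2 / (2 * sigma ** 2))
               for a in range(n_people)
               for b in range(n_people) if a != b) / n_people ** 2
    return pull + push


if __name__ == "__main__":
    gt = gaussian_heatmap(5, 5, cx=2, cy=2)
    print(detection_loss(gt, gt))
    print(grouping_loss([[0.0, 0.0], [10.0, 10.0]]))  # well-separated people
    print(grouping_loss([[0.0, 0.0], [0.1, 0.1]]))    # people with similar tags
```

The grouping loss is small when each person's tags agree and different people's mean tags are well separated, and it grows when two people receive nearly identical tags.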
- Other methods mentioned.
- Vector embedding
- Perceptual organization: group pixels of an image into regions, parts and objects.
- Multi-person pose estimation
- Instance segmentation
- Evaluation
- Dataset
- MPII human multi-person http://human-pose.mpi-inf.mpg.de/
- 25K images containing over 40K people with annotated body joints. Covers 410 human activities.
- COCO 2016 keypoints challenge. http://cocodataset.org/#keypoints-challenge2017
- Evaluation metrics
- Average precision (AP)
- Question: How are the tags for the training data obtained?