Zia, M. Zeeshan, et al. “Detailed 3D representations for object recognition and modeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35.11 (2013): 2608-2623.
Li, Jun, Reinhard Klein, and Angela Yao. “Learning fine-scaled depth maps from single RGB images.” arXiv preprint arXiv:1607.00730 (2016).
Note by Lingyu.
Uchida, Yusuke. “Local Feature Detectors, Descriptors, and Image Representations: A Survey.” arXiv preprint arXiv:1607.08368 (2016).
Newell, Alejandro, Zhiao Huang, and Jia Deng. “Associative embedding: End-to-end learning for joint detection and grouping.” Advances in Neural Information Processing Systems. 2017.
- Associative embedding
- Jointly performs detection and grouping using a single-stage deep network trained end-to-end.
- For each detection, the network also outputs a “tag” (a number) that identifies which group the detection belongs to.
- Note: There are no ground-truth tags for the network to predict, because what matters is not the particular tag values, only the differences between them.
- Output: two heatmaps
- A detection heatmap: a detection score at each pixel for each joint.
- A tagging heatmap: an identity tag at each pixel for each joint.
- For multi-person pose estimation, output a detection heatmap and a tagging heatmap for each body joint, then group body joints with similar tags into individual people.
- Two loss functions are used together
- Detection loss: mean squared error (MSE) between each predicted detection heatmap and its ground-truth heatmap, which is a 2D Gaussian activation at each keypoint location.
- Grouping loss: compares the tags within each person and across people. Tags within a person should be the same, while tags across people should be different.
- Other methods mentioned.
- Vector embedding
- Perceptual organization: group pixels of an image into regions, parts and objects.
- Multi-person pose estimation
- Instance segmentation
- Question: How are the tags for the training data obtained?
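
The two losses and the grouping step in the notes above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The function names, the `sigma` default, the exponential form of the push term, and the greedy `group_by_tag` procedure with its `threshold` are all assumptions made for this example. The notes only state that tags within a person should match and tags across people should differ.

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma=2.0):
    """Ground-truth heatmap: a 2D Gaussian centred on a keypoint (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def detection_loss(pred_heatmaps, gt_heatmaps):
    """Mean squared error between predicted and ground-truth heatmaps."""
    return float(np.mean((pred_heatmaps - gt_heatmaps) ** 2))

def grouping_loss(tag_maps, people):
    """Pull tags of one person together, push different people apart.

    tag_maps: array of shape (num_joints, H, W) holding predicted tags.
    people:   non-empty list of people; each person is a list of
              (joint_index, x, y) ground-truth keypoints.
    Only tag differences enter the loss, so no target tag values are needed.
    """
    means = []
    pull = 0.0
    for joints in people:
        # Read the predicted tag at each ground-truth keypoint location.
        tags = np.array([tag_maps[j, y, x] for j, x, y in joints])
        mean = tags.mean()
        means.append(mean)
        pull += np.mean((tags - mean) ** 2)
    pull /= len(people)

    means = np.array(means)
    n = len(means)
    push = 0.0
    if n > 1:
        # Assumed penalty: large when two people's mean tags are close.
        diff = means[:, None] - means[None, :]
        penalty = np.exp(-0.5 * diff ** 2)
        push = (penalty.sum() - np.trace(penalty)) / (n * (n - 1))
    return float(pull + push)

def group_by_tag(detections, threshold=1.0):
    """Greedy grouping sketch: detections are (joint_index, x, y, tag)."""
    groups = []
    for det in detections:
        tag = det[3]
        best = None
        for group in groups:
            dist = abs(np.mean([d[3] for d in group]) - tag)
            if dist < threshold and (best is None or dist < best[0]):
                best = (dist, group)
        if best is None:
            groups.append([det])  # no group is close enough: new person
        else:
            best[1].append(det)
    return groups
```

Because `grouping_loss` reads tags only at ground-truth keypoint locations and compares them with each other, the training data needs keypoint annotations grouped by person, but no tag labels.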
Zhao, Ruiqi, Yan Wang, and Aleix M. Martinez. “A simple, fast and highly-accurate algorithm to recover 3D shape from 2D landmarks on a single image.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).