Zia, M. Zeeshan, et al. “Detailed 3D representations for object recognition and modeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35.11 (2013): 2608-2623.
Li, Jun, Reinhard Klein, and Angela Yao. “Learning fine-scaled depth maps from single RGB images.” arXiv preprint arXiv:1607.00730 (2016).
Note by Lingyu.
Uchida, Yusuke. “Local Feature Detectors, Descriptors, and Image Representations: A Survey.” arXiv preprint arXiv:1607.08368 (2016).
Newell, Alejandro, Zhiao Huang, and Jia Deng. “Associative embedding: End-to-end learning for joint detection and grouping.” Advances in Neural Information Processing Systems. 2017.
- Associative embedding
  - Jointly performs detection and grouping with a single-stage deep network trained end-to-end.
  - For each detection, the network also predicts a “tag” (a real number) that identifies which group the detection belongs to.
    - Note: there are no ground-truth tags for the network to predict, because only the differences between tags matter, not their particular values.
- Output: two heatmaps
  - A heatmap of per-pixel detection scores (the detection score at each pixel for each joint).
  - A heatmap of per-pixel identity tags (the tag value at each pixel for each joint).
  - For multi-person pose estimation, the network outputs a detection heatmap and a tagging heatmap for each body joint; body joints with similar tags are then grouped into individual people.
- Two loss functions, used together
  - Detection loss: mean squared error (MSE) between each predicted detection heatmap and its ground-truth heatmap (a 2D Gaussian activation at each keypoint location).
  - Grouping loss: compares the tags within each person and across people. Tags within a person should be the same, while tags across people should be different.
- Other methods mentioned
  - Vector embedding
  - Perceptual organization: grouping the pixels of an image into regions, parts, and objects.
  - Multi-person pose estimation
  - Instance segmentation
- Question: how are tags obtained for the training data?
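The two losses and the tag-based grouping summarized in these notes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `sigma` and `threshold` parameters, and the list-based data layout are all hypothetical. The grouping loss follows the pull/push form as I recall it from the paper (squared distance to each person's mean tag, plus a Gaussian penalty on pairs of mean tags). The tags it consumes are the network's own predictions read out at the ground-truth joint locations, so no ground-truth tag values are needed. The greedy grouping at the end is a deliberate simplification of the decoding step.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Ground-truth detection heatmap: an unnormalized 2D Gaussian peaked at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def detection_loss(pred, target):
    """Mean squared error between a predicted and a ground-truth heatmap."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

def grouping_loss(tags_per_person, sigma=1.0):
    """tags_per_person[n] holds the predicted tags sampled at person n's
    ground-truth joint locations (assumed layout)."""
    n_people = len(tags_per_person)
    means = [sum(tags) / len(tags) for tags in tags_per_person]
    # Pull term: tags of one person should match that person's mean tag.
    pull = sum((m - t) ** 2
               for m, tags in zip(means, tags_per_person)
               for t in tags) / n_people
    # Push term: mean tags of different people should be far apart.
    # Pairing a person with itself only adds a constant.
    push = sum(math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))
               for a in means for b in means) / n_people ** 2
    return pull + push

def group_by_tag(detections, threshold=1.0):
    """Greedy grouping (simplified): put each (joint, tag) detection into the
    existing group with the closest mean tag, or start a new group."""
    groups = []
    for joint, tag in detections:
        best, best_dist = None, threshold
        for group in groups:
            mean = sum(t for _, t in group) / len(group)
            dist = abs(tag - mean)
            if dist < best_dist:
                best, best_dist = group, dist
        if best is None:
            groups.append([(joint, tag)])
        else:
            best.append((joint, tag))
    return groups

if __name__ == "__main__":
    gt = gaussian_heatmap(5, 5, cx=2, cy=2)
    print(detection_loss(gt, gt))
    print(grouping_loss([[1.0, 1.0], [5.0, 5.0]]))  # consistent tags per person
    print(grouping_loss([[1.0, 5.0], [1.0, 5.0]]))  # tags mixed across people
    print(group_by_tag([("head", 1.0), ("head", 5.1), ("neck", 1.1), ("neck", 4.9)]))
```

With these toy inputs the consistent tagging yields a much lower grouping loss than the mixed one, and the four detections separate into two people. A real decoder would also need to ensure that each person receives at most one joint of each type, which this sketch does not do.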
Zhao, Ruiqi, Yan Wang, and Aleix M. Martinez. “A simple, fast and highly-accurate algorithm to recover 3D shape from 2D landmarks on a single image.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
Zhou, Xingyi, et al. “Deep kinematic pose regression.” European Conference on Computer Vision. Springer, Cham, 2016.
Tekin, Bugra, et al. “Structured prediction of 3D human pose with deep neural networks.” arXiv preprint arXiv:1605.05180 (2016).
Chen, Xianjie, and Alan Yuille. “Parsing occluded people by flexible compositions.” Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.