Notes for Paper “Detailed 3D Representations for Object Recognition and Modeling”
Paper:
Zia, M. Zeeshan, et al. “Detailed 3d representations for object recognition and modeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35.11 (2013): 2608-2623.
Notes for Paper “A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images”
Paper:
Li, Jun, Reinhard Klein, and Angela Yao. “Learning fine-scaled depth maps from single RGB images.” arXiv preprint arXiv:1607.00730 (2016).
Survey and Summary for Datasets
Note by Lingyu.
Survey and Summary for 2D Feature Descriptors
Source:
Uchida, Yusuke. “Local Feature Detectors, Descriptors, and Image Representations: A Survey.” arXiv preprint arXiv:1607.08368 (2016).
Notes for Paper “Associative Embedding: End-to-End Learning for Joint Detection and Grouping”
Paper:
Newell, Alejandro, Zhiao Huang, and Jia Deng. “Associative embedding: End-to-end learning for joint detection and grouping.” Advances in Neural Information Processing Systems. 2017.
- Performance
- Basics
- Associative embedding
- Jointly performs detection and grouping with a single-stage deep network trained end-to-end.
- For each detection, the network predicts a “tag” (a scalar value) that identifies which group the detection belongs to.
- Note: There are no ground-truth tags for the network to predict, because what matters is not the particular tag values, only the differences between them.
- Output: two heatmaps
- A heatmap of per-pixel detection scores (the detection score at each pixel for each joint).
- A heatmap of per-pixel identity tags (the tag value at each pixel for each joint).
- For multi-person pose estimation, output a detection heatmap and a tagging heatmap for each body joint, then group body joints with similar tags into individual people.
- Two loss functions, used together
- Detection loss: mean squared error (MSE) between each predicted detection heatmap and its ground-truth heatmap (a 2D Gaussian centered at each keypoint location).
- Grouping loss: compares the tags within each person and across people. Tags within a person should be the same, while tags across people should be different.
- Other methods mentioned.
- Vector embedding
- Perceptual organization: group pixels of an image into regions, parts and objects.
- Multi-person pose estimation
- Instance segmentation
- Evaluation
- Dataset
- MPII human multi-person http://human-pose.mpi-inf.mpg.de/
- 25K images containing over 40K people with annotated body joints. Covers 410 human activities.
- COCO 2016 keypoints challenge. http://cocodataset.org/#keypoints-challenge2017
- Evaluation metrics
- Average precision (AP)
- Question: How are the tags for the training data obtained?
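A minimal NumPy sketch of the two losses and of grouping joints by tag, as described in these notes. It is an illustration under stated assumptions, not the authors' code: joints are identified by name, every person has all K joints annotated, σ = 1, the push term skips the n = n' pairs and is divided by N² (normalization assumed), and `group_by_tag` is a hypothetical, simplified greedy pass rather than the paper's own inference procedure.

```python
import numpy as np


def gaussian_heatmap(height, width, center, sigma=1.0):
    """Ground-truth detection heatmap: a 2D Gaussian centered on keypoint (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def detection_loss(pred, target):
    """Detection loss: mean squared error between predicted and ground-truth heatmaps."""
    return float(np.mean((pred - target) ** 2))


def grouping_loss(tags, sigma=1.0):
    """Pull/push grouping loss on tags read at ground-truth joint locations.

    tags[n][k] is the predicted tag at joint k of person n (assumed layout,
    assumes all K joints are visible). sigma and normalization are assumptions.
    """
    tags = np.asarray(tags, dtype=float)
    n_people, n_joints = tags.shape
    ref = tags.mean(axis=1)  # reference (mean) tag for each person
    # Pull: tags within a person should match that person's reference tag.
    pull = np.sum((tags - ref[:, None]) ** 2) / (n_people * n_joints)
    # Push: reference tags of different people should be far apart.
    diff = ref[:, None] - ref[None, :]
    push_terms = np.exp(-diff ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(push_terms, 0.0)  # skip n == n' (only a constant term)
    push = push_terms.sum() / n_people ** 2
    return float(pull + push)


def group_by_tag(detections, threshold=0.5):
    """Hypothetical greedy grouping of (joint_name, tag) detections into people.

    Each detection joins the person whose mean tag is closest (within threshold)
    and who does not yet have that joint; otherwise it starts a new person.
    """
    people = []  # each person is a dict: joint_name -> tag
    for joint, tag in detections:
        best, best_dist = None, threshold
        for person in people:
            if joint in person:
                continue  # a person has at most one detection per joint
            dist = abs(np.mean(list(person.values())) - tag)
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:
            people.append({joint: tag})
        else:
            best[joint] = tag
    return people
```

For example, `group_by_tag([("head", 0.1), ("head", 5.0), ("neck", 0.2), ("neck", 4.9)])` yields two people, each with one head and one neck whose tags are close. The paper's actual grouping step is more involved than this greedy pass; the sketch only illustrates why similar tags are enough to separate people.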
Notes for Paper “A Simple, Fast and Highly-Accurate Algorithm to Recover 3D Shape from 2D Landmarks on a Single Image”
Paper:
Zhao, Ruiqi, Yan Wang, and Aleix M. Martinez. “A simple, fast and highly-accurate algorithm to recover 3d shape from 2d landmarks on a single image.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
Notes for Paper “Deep Kinematic Pose Regression”
Paper:
Zhou, Xingyi, et al. “Deep kinematic pose regression.” European Conference on Computer Vision. Springer, Cham, 2016.
Notes for Paper “Structured prediction of 3d human pose with deep neural networks”
Paper:
Tekin, Bugra, et al. “Structured prediction of 3d human pose with deep neural networks.” arXiv preprint arXiv:1605.05180 (2016).
Notes for Paper “Parsing Occluded People by Flexible Compositions”
Paper:
Chen, Xianjie, and Alan Yuille. “Parsing occluded people by flexible compositions.” Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.