Notes for Paper “Associative Embedding: End-to-End Learning for Joint Detection and Grouping”

Paper:

Newell, Alejandro, Zhiao Huang, and Jia Deng. “Associative embedding: End-to-end learning for joint detection and grouping.” Advances in Neural Information Processing Systems. 2017.

  • Performance
  • Basics
    • Associative embedding
    • Jointly perform detections and grouping using a single-stage deep network trained end-to-end
    • For each detection, introduce a “tag” (a number) that identifies which group the detection belongs to.
      • Note: There are no ground-truth tags for the network to predict, because what matters is not the particular tag values but only the differences between them.
    • Output: two heatmaps
      • A heatmap of per-pixel detection scores (the detection score at each pixel for each joint).
      • A heatmap of per-pixel identity tags (the tag value at each pixel for each joint).
      • For multi-person pose estimation, output a detection heatmap and a tagging heatmap for each body joint, then group body joints with similar tags into individual people.
    • Two loss functions, used together
      • Detection loss: mean squared error (MSE) between each predicted detection heatmap and its ground-truth heatmap (a 2D Gaussian activation at each keypoint location).
      • Grouping loss: compare the tags within each person and across people. Tags within a person should be the same, while tags across people should be different.
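The two losses above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the notes do not give the exact formulas, so the pull/push form of `grouping_loss`, the Gaussian-shaped push penalty, and the `sigma` defaults are choices made here, and all function names are hypothetical.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Ground-truth heatmap: a 2D Gaussian peaked at keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def detection_loss(pred, target):
    """Mean squared error between a predicted and a ground-truth heatmap."""
    count = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / count

def grouping_loss(tags_per_person, sigma=1.0):
    """tags_per_person[n] holds the predicted tags at person n's joints.

    Pull term: tags of one person move toward that person's mean tag.
    Push term (assumed form): mean tags of different people are penalized
    when they are close, with a penalty that decays as they move apart.
    """
    means = [sum(tags) / len(tags) for tags in tags_per_person]
    n = len(means)
    pull = sum((t - m) ** 2
               for tags, m in zip(tags_per_person, means)
               for t in tags) / n
    push = sum(math.exp(-((means[i] - means[j]) ** 2) / (2 * sigma ** 2))
               for i in range(n) for j in range(n) if i != j) / (n * n)
    return pull + push

# Consistent tags per person, well separated across people: low loss.
good = grouping_loss([[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]])
# Tags mixed up between the two people: high loss.
bad = grouping_loss([[1.0, 5.0, 1.0], [5.0, 1.0, 5.0]])
# Shifting every tag by the same constant leaves the loss unchanged.
shifted = grouping_loss([[11.0, 11.0, 11.0], [15.0, 15.0, 15.0]])
```

Because only differences between tags enter `grouping_loss`, `good` and `shifted` agree, which matches the note that the particular tag values do not matter.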
  • Other methods mentioned.
    • Vector embedding
    • Perceptual organization: group pixels of an image into regions, parts and objects.
    • Multi-person pose estimation
    • Instance segmentation
  • Evaluation
  • Questions: How are the tags for the training data obtained?
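One possible reading of this question, consistent with the note under Basics that no ground-truth tag values exist: the training annotations only need to say which keypoints belong to which person. The predicted tags are then read off at those annotated locations and grouped per person before a grouping loss is applied. The sketch below assumes a hypothetical data layout (`tag_maps[k][y][x]` for joint type `k`, and one dict per person mapping joint index to `(x, y)`); neither the layout nor the function name comes from the paper.

```python
def tags_at_keypoints(tag_maps, people):
    """Read the predicted tag at each annotated joint, grouped by person.

    tag_maps[k][y][x]: predicted tag for joint type k at pixel (x, y).
    people[n]: dict mapping joint index k -> (x, y) for person n.
    Returns one list of tags per person, ordered by joint index.
    """
    return [[tag_maps[k][y][x] for k, (x, y) in sorted(person.items())]
            for person in people]

# Two joint types on a 2x3 tag map, two annotated people.
tag_maps = [
    [[0.1, 0.0, 0.9], [0.0, 0.0, 0.0]],  # joint 0
    [[0.0, 0.0, 0.0], [0.2, 0.0, 1.1]],  # joint 1
]
people = [
    {0: (0, 0), 1: (0, 1)},  # person 0: joint 0 at (0, 0), joint 1 at (0, 1)
    {0: (2, 0), 1: (2, 1)},  # person 1: joint 0 at (2, 0), joint 1 at (2, 1)
]
groups = tags_at_keypoints(tag_maps, people)
```

Here `groups` contains one list of predicted tags per person. Supervision comes from the person-to-keypoint assignment alone, with no target tag values involved.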