Notes for Paper “A limb-based graphical model for human pose estimation”

Paper:

Liang, Guoqiang, et al. “A limb-based graphical model for human pose estimation.” IEEE Transactions on Systems, Man, and Cybernetics: Systems (2017).

- - Code not available
  - Caffe
  - NVIDIA Tesla K40m GPU
  - Basics
    - New task: Human limb detection
      - Detect and represent the local image appearance.
    - Use human limbs to augment constraints between neighboring human joints.
    - Design a new limb representation: Model a limb as a wide line.
  - Main method: The ConvNet consists of two modules: a limb and joint detector, and a limb-based graphical model. Both output heatmaps and are trained with a Euclidean distance loss.
    - Unified framework detector: VGG16 architecture.
      - Human limb detection combined with joint localization
      - Integrate the two detection processes in a single CNN
    - After the initial detections, a two-step graphical model is applied.
      - It captures the spatial relationships among human joints and among limbs in a coarse-to-fine way.
      - First step: A fully connected graphical model is used to capture the coarse relation from an arbitrary
      - Second step: Construct a new pairwise relation term based on limbs.
  - Other methods mentioned
    - Define the relationship as a geometric constraint on the relative locations of two neighboring joints.
      - Does not use the local appearance (the image input itself) of the region connecting two neighboring joints.
      - Leads to problems: double counting and localization failure.
    - PS model (Pictorial Structures)
      - Most popular and influential model.
      - Model a human limb as a rigid oriented rectangle.
      - Model a human limb as a bar and detect it by searching for parallel edges.
      - Model a limb with two joints, or add an extra joint at the midpoint.
      - Use image segmentation methods to distinguish limbs from background.
    - ConvNet based pose estimation
      - Extract appearance and type score.
      - Heat-map
        
        Heat-map-based methods treat pose estimation as a per-pixel classification problem with large contextual information.
      - Use a ConvNet to learn an MRF-based graphical model.
    - Add motion feature
    - For Spatial relations:
      - Tree structure.
    - Appearance and relation models.
      - The relation among human parts is defined as geometric constraints on the location and orientation of parts.
        
        Spring like model
        
        Conditional probability of joints location
      - Note: For joints with higher flexibility, the constraint is too weak.
    - Graphical model over parts.
      - Nodes representing parts
      - Edges encoding constraints.
      - Note: Accuracy was limited by hand-crafted features and tree-based graphical models.
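
Returning to the paper's own limb representation noted under Basics (a limb modeled as a wide line, with heatmap outputs trained under a Euclidean distance loss), the idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation, since their code is not available. The function names, the `line_width` parameter, and the choice of a hard 0/1 mask are all hypothetical, and plain Python lists stand in for Caffe blobs.

```python
import math


def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Distance from pixel (px, py) to the segment joining joints A and B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate limb: both joints coincide.
        return math.hypot(px - ax, py - ay)
    # Project the pixel onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def limb_heatmap(height, width, joint_a, joint_b, line_width=3.0):
    """Target map that is 1.0 inside a wide line between two joints, else 0.0.

    Joints are (x, y) pixel coordinates. `line_width` (full width in pixels)
    is an assumed parameter; the notes do not record the paper's value.
    """
    half = line_width / 2.0
    ax, ay = joint_a
    bx, by = joint_b
    return [
        [
            1.0 if point_to_segment_distance(x, y, ax, ay, bx, by) <= half else 0.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def euclidean_loss(pred, target):
    """Half the sum of squared per-pixel differences between two maps."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
    return 0.5 * total
```

For example, `limb_heatmap(64, 64, (20, 10), (30, 40))` would produce a target map for a limb between two joints, and `euclidean_loss(predicted_map, target_map)` would score a predicted map against it. A smooth (for instance Gaussian-weighted) line could replace the hard mask; the notes do not say which form the paper uses.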

- Limb modeling:
- Evaluation
  - PCP 74.6 on LSP
  - Dataset: FLIC, LSP
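
For readers unfamiliar with the PCP (Percentage of Correct Parts) score reported above, here is a minimal sketch of one common "strict" variant: a limb counts as correct when both predicted endpoints lie within half the ground-truth limb length of their ground-truth positions. The notes do not say which PCP variant the paper uses, so the function name `pcp`, the `threshold` parameter, and the limb layout are assumptions made for illustration.

```python
import math


def pcp(pred_limbs, gt_limbs, threshold=0.5):
    """Percentage of limbs whose two endpoints are both close to ground truth.

    Each limb is ((x1, y1), (x2, y2)). A limb is counted as correct when each
    predicted endpoint is within `threshold` times the ground-truth limb
    length of the matching ground-truth endpoint (strict PCP, assumed here).
    """
    if not gt_limbs:
        raise ValueError("gt_limbs must not be empty")
    correct = 0
    for (p1, p2), (g1, g2) in zip(pred_limbs, gt_limbs):
        limit = threshold * math.dist(g1, g2)
        if math.dist(p1, g1) <= limit and math.dist(p2, g2) <= limit:
            correct += 1
    return 100.0 * correct / len(gt_limbs)
```

In practice the score is accumulated over all limbs of all test images in a dataset such as LSP. Note that `math.dist` requires Python 3.8 or later.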

Lingyu Zhang

I completed my Ph.D. study in Distributed and Multidimensional Computer Vision Lab, RPI
