Notes for Paper “A limb-based graphical model for human pose estimation”

Paper:

Liang, Guoqiang, et al. “A limb-based graphical model for human pose estimation.” IEEE Transactions on Systems, Man, and Cybernetics: Systems (2017).

      • Code not available
      • Caffe
      • NVIDIA Tesla K40m GPU
      • Basics
        • New task: Human limb detection
          • Detect and represent the local image appearance.
        • Use human limbs to augment constraints between neighboring human joints.
        • Design a new limb representation: Model a limb as a wide line.
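The "wide line" limb representation above can be illustrated with a small NumPy sketch. This is not the paper's code (none is available). The function name `limb_wide_line_map`, the `width` parameter, and the binary 0/1 target are assumptions made for illustration only; the notes do not record the line width or target values actually used.

```python
import numpy as np

def limb_wide_line_map(joint_a, joint_b, shape, width=3.0):
    """Return an (h, w) map that is 1 within `width` pixels of segment a-b.

    joint_a, joint_b: (x, y) pixel coordinates of the limb's two joints.
    shape: (h, w) of the output map.
    width: half-thickness of the line in pixels (hypothetical parameter).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.stack([xs, ys], axis=-1).astype(float)  # (h, w, 2), stored as (x, y)
    a = np.asarray(joint_a, dtype=float)
    b = np.asarray(joint_b, dtype=float)
    ab = b - a
    denom = float(ab @ ab)
    if denom == 0.0:
        # Degenerate limb: both joints coincide, so the segment is a point.
        t = np.zeros((h, w))
    else:
        # Projection of each pixel onto the segment, clamped to its endpoints.
        t = np.clip(((p - a) @ ab) / denom, 0.0, 1.0)
    closest = a + t[..., None] * ab
    dist = np.linalg.norm(p - closest, axis=-1)
    return (dist <= width).astype(np.float32)

# Example: a horizontal limb from (2, 5) to (7, 5) on a 10x10 grid.
limb_map = limb_wide_line_map((2, 5), (7, 5), (10, 10), width=1.0)
```

Pixels on or near the segment are set to 1 and everything else to 0, so the map covers the image region connecting the two joints rather than only the joints themselves.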
      • Main method: The ConvNet consists of two modules: a limb and joint detector, and a limb-based graphical model. Both output heatmaps and are trained with a Euclidean distance loss.
        • Unified framework detector: VGG16 architecture.
          • Human limb detection combined with joint localization
          • Integrate the two detection processes in a single CNN
        • After initial detections, a two-step graphical model.
          • Captures the spatial relationships among human joints and among limbs in a coarse-to-fine way.
          • First step: A fully connected graphical model is used to capture the coarse relation from an arbitrary
          • Second step: Construct a new pairwise relation term based on limbs.
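A minimal sketch of a Euclidean distance loss on heatmaps follows. It assumes the loss uses the same convention as Caffe's `EuclideanLoss` layer (sum of squared differences divided by twice the batch size), since the notes list Caffe as the framework. The function name and array layout are illustrative, not taken from the paper.

```python
import numpy as np

def euclidean_heatmap_loss(pred, target):
    """Sum of squared differences over all heatmaps, divided by 2 * batch size.

    pred, target: arrays of shape (batch, channels, height, width).
    The 1 / (2N) scaling mirrors Caffe's EuclideanLoss layer; whether the
    paper uses exactly this scaling is an assumption.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    n = pred.shape[0]
    return float(np.sum((pred - target) ** 2) / (2.0 * n))

# Example: 2 images, 1 heatmap channel each, 2x2 resolution.
loss = euclidean_heatmap_loss(np.zeros((2, 1, 2, 2)), np.ones((2, 1, 2, 2)))
```

Because both modules output heatmaps, the same loss form can be applied to the joint maps and the limb maps.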
      • Other methods mentioned
        • Define the relationship as a geometric constraint on the relative locations of two neighboring joints.
          • Does not use the local appearance (the image input itself) of the region connecting two neighboring joints.
          • Leads to problems: double-counting and localization failure.
        • PS model (Pictorial Structures)
          • Most popular and influential model.
          • Model a human limb as a rigid oriented rectangle.
          • Model a human limb as a bar and detect it by searching for parallel edges.
          • Model a limb with 2 joints, or add an extra joint at the midpoint.
          • Use image segmentation methods to distinguish limbs from background.
        • ConvNet based pose estimation
          • Extract appearance and type score.
          • Heat-map
            • Heat-map based methods are per-pixel classification problems with large contextual information.
          • Use a ConvNet to learn an MRF-based graphical model.
        • Add motion features
        • For Spatial relations:
          • Tree structure.
        • Appearance and relation models.
          • The relation among human parts is defined as geometric constraints on the location and orientation of parts.
            • Spring-like model
            • Conditional probability of joint locations
          • Note: For joints with higher flexibility, the constraint is too weak.
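The spring-like relation above is commonly written as a quadratic penalty on how far the offset between two connected parts deviates from a rest offset. The sketch below shows that generic form. The names `spring_cost`, `rest_offset`, and `stiffness` are illustrative assumptions and are not taken from the paper.

```python
def spring_cost(parent, child, rest_offset, stiffness=(1.0, 1.0)):
    """Quadratic (spring-like) deformation cost between two connected parts.

    parent, child: (x, y) locations of the two parts.
    rest_offset: expected (dx, dy) from parent to child (hypothetical value,
        in practice estimated from training data).
    stiffness: per-axis spring weights (hypothetical values).
    """
    dx = child[0] - parent[0] - rest_offset[0]
    dy = child[1] - parent[1] - rest_offset[1]
    return stiffness[0] * dx * dx + stiffness[1] * dy * dy

# A child exactly at the rest offset costs nothing; deviations cost quadratically.
at_rest = spring_cost((0, 0), (3, 4), rest_offset=(3, 4))
stretched = spring_cost((0, 0), (4, 6), rest_offset=(3, 4))
```

For a highly flexible joint the stiffness has to be small to allow the full range of motion, which is one way to read the note above: the resulting penalty barely distinguishes plausible from implausible placements.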
        • Graphical model over parts.
          • Nodes representing parts
          • Edges encoding constraints.
          • Note: Limited by hand-crafted features and tree-based graphical models, accuracy was poor.

 

    • Limb modeling:
    • Evaluation
      • PCP 74.6 on LSP
      • Dataset: FLIC, LSP
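For reference, PCP (Percentage of Correct Parts) can be sketched as below. This uses the common strict definition, in which a limb counts as correct when both predicted endpoints lie within half the ground-truth limb length of their true positions. PCP has several variants in the literature, and the notes do not record which one the paper uses, so the variant and the `alpha` threshold here are assumptions.

```python
import numpy as np

def pcp(pred_limbs, gt_limbs, alpha=0.5):
    """Percentage of limbs whose two endpoints are both localized correctly.

    pred_limbs, gt_limbs: arrays of shape (num_limbs, 2, 2), holding the
        (x, y) coordinates of each limb's two endpoints.
    alpha: fraction of the ground-truth limb length used as the threshold
        (0.5 is the usual choice; assumed here).
    """
    pred = np.asarray(pred_limbs, dtype=float)
    gt = np.asarray(gt_limbs, dtype=float)
    limb_len = np.linalg.norm(gt[:, 0] - gt[:, 1], axis=-1)  # (num_limbs,)
    err = np.linalg.norm(pred - gt, axis=-1)                 # (num_limbs, 2)
    correct = np.all(err <= alpha * limb_len[:, None], axis=1)
    return 100.0 * float(correct.mean())

# Two limbs of length 10: the first is within threshold, the second is not.
gt = [[(0, 0), (10, 0)], [(0, 0), (0, 10)]]
pred = [[(1, 0), (10, 2)], [(0, 6), (0, 10)]]
score = pcp(pred, gt)
```

In this example one of the two limbs passes the threshold, so the score is 50.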
