Paper:
Wang, Hongsong, and Liang Wang. “Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks.” The Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
- Basics
- Skeleton based action recognition
- Two-stream RNN
- Two architectures for temporal streams
- Stacked RNN
- Hierarchical RNN
- Model spatial structure by converting spatial graph into a sequence of joints.
- Obtain 3D skeletons from depth images.
- Main method
- End-to-end two-stream RNN
- Fusion is performed by combining the softmax class posteriors from the two nets.
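
The fusion step can be sketched as a weighted average of the two streams' softmax outputs. This is a minimal illustration, not the authors' code: the function names, the toy logits, and the `temporal_weight` parameter are assumptions made here.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def fuse_posteriors(temporal_logits, spatial_logits, temporal_weight=0.5):
    """Weighted average of the class posteriors of the two streams.

    temporal_weight is a hypothetical parameter, not a value from the paper.
    """
    p_temporal = softmax(np.asarray(temporal_logits, dtype=float))
    p_spatial = softmax(np.asarray(spatial_logits, dtype=float))
    return temporal_weight * p_temporal + (1.0 - temporal_weight) * p_spatial

# Toy logits for one sample and three action classes (made-up numbers).
temporal_logits = np.array([2.0, 0.5, 0.1])
spatial_logits = np.array([0.2, 1.5, 0.3])

fused = fuse_posteriors(temporal_logits, spatial_logits, temporal_weight=0.5)
predicted_class = int(np.argmax(fused))
```

Because each stream outputs a valid probability distribution, any convex combination of the two is also a valid distribution, so the fused scores can be compared across classes directly.
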
- Temporal channel: concatenate the 3D coordinates of all joints at each time step and model the resulting sequence with an RNN.
- Stacked RNN
- Feed the concatenated coordinates of all joints into the RNN. Stack two layers; adding more layers does not improve performance.
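
A rough sketch of the stacked temporal stream, assuming a vanilla tanh RNN cell as a stand-in for whatever recurrent unit is actually used. The sizes, the random weights, and the helper names `rnn_layer` and `init_layer` are illustrative assumptions.

```python
import numpy as np

def rnn_layer(inputs, w_in, w_rec, bias):
    """Run a vanilla tanh RNN over inputs of shape (T, D); return (T, H)."""
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t in inputs:
        hidden = np.tanh(x_t @ w_in + hidden @ w_rec + bias)
        outputs.append(hidden)
    return np.stack(outputs)

rng = np.random.default_rng(0)
num_frames, num_joints, hidden_size = 30, 25, 16  # illustrative sizes

def init_layer(input_size, output_size):
    """Random, untrained weights; only the data flow matters here."""
    return (rng.normal(scale=0.1, size=(input_size, output_size)),
            rng.normal(scale=0.1, size=(output_size, output_size)),
            np.zeros(output_size))

# skeleton[t, j] holds the (x, y, z) coordinates of joint j at frame t.
skeleton = rng.normal(size=(num_frames, num_joints, 3))

# Temporal-stream input: concatenate all joint coordinates per frame.
frames = skeleton.reshape(num_frames, num_joints * 3)

# Two stacked layers: the second layer consumes the first layer's outputs.
h1 = rnn_layer(frames, *init_layer(num_joints * 3, hidden_size))
h2 = rnn_layer(h1, *init_layer(hidden_size, hidden_size))

# The last hidden state is one simple choice of sequence-level feature.
sequence_feature = h2[-1]
```
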
- Hierarchical RNN
- Divide human skeleton into 5 parts.
- Use hierarchical RNN to model the motions of different parts of the body (first layer) and the whole body (second layer).
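
The two-level idea can be sketched as follows: one RNN per body part in the first layer, then one RNN over the concatenated part outputs in the second layer. The 15-joint toy skeleton, the joint indices in `BODY_PARTS`, the part names, and the vanilla tanh cell are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical joint indices for a 15-joint toy skeleton, split into 5 parts.
BODY_PARTS = {
    "trunk": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}

def run_rnn(inputs, hidden_size, rng):
    """Vanilla tanh RNN with random weights; maps (T, D) to (T, hidden_size)."""
    w_in = rng.normal(scale=0.1, size=(inputs.shape[1], hidden_size))
    w_rec = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    hidden = np.zeros(hidden_size)
    outputs = []
    for x_t in inputs:
        hidden = np.tanh(x_t @ w_in + hidden @ w_rec)
        outputs.append(hidden)
    return np.stack(outputs)

rng = np.random.default_rng(0)
num_frames = 20
skeleton = rng.normal(size=(num_frames, 15, 3))  # (frames, joints, xyz)

# First layer: one RNN per body part, fed only that part's coordinates.
part_outputs = []
for part_name, joints in BODY_PARTS.items():
    part_input = skeleton[:, joints, :].reshape(num_frames, -1)
    part_outputs.append(run_rnn(part_input, hidden_size=8, rng=rng))

# Second layer: one RNN over the concatenated part outputs (whole body).
body_input = np.concatenate(part_outputs, axis=1)
body_output = run_rnn(body_input, hidden_size=16, rng=rng)
```
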
- Spatial RNN
- Nodes denote the joints and edges denote the physical connections.
- When an action takes place, the undirected graph displays varied patterns of spatial structure.
- For each time step, select a temporal window centered at that step and feed the coordinates of each joint inside the window to the RNN, one joint at a time, to model the spatial relationships among joints.
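
A small sketch of the windowing step: for a chosen time step, collect each joint's coordinates over the surrounding frames into one vector per joint. The function name, the `half_width` parameter, and the boundary handling (repeating the first or last frame) are assumptions, not details from the notes.

```python
import numpy as np

def joint_window_features(skeleton, center, half_width):
    """Return one feature vector per joint from the frames around `center`.

    skeleton has shape (T, J, 3). Frame indices outside the sequence are
    clipped, which repeats the first or last frame (an assumption).
    """
    num_frames = skeleton.shape[0]
    frame_ids = np.arange(center - half_width, center + half_width + 1)
    frame_ids = np.clip(frame_ids, 0, num_frames - 1)
    window = skeleton[frame_ids]            # (W, J, 3)
    per_joint = window.transpose(1, 0, 2)   # (J, W, 3)
    return per_joint.reshape(per_joint.shape[0], -1)  # (J, W * 3)

# Toy sequence: 10 frames, 4 joints, values chosen so indices are traceable.
skeleton = np.arange(10 * 4 * 3, dtype=float).reshape(10, 4, 3)

features = joint_window_features(skeleton, center=5, half_width=2)
edge_features = joint_window_features(skeleton, center=0, half_width=2)
```

Each row of `features` is the input for one joint; feeding the rows to an RNN in some joint order turns the spatial structure into a sequence.
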
- Three graph representations
- Undirected graph
- Chain sequence
- Traversal sequence
- The spatial RNN can recognize actions based on just one graph representation.
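
One plausible way to build a traversal sequence is a depth-first tour of the skeleton tree that records a joint again each time the walk returns to it, so consecutive entries are always physically connected. The toy skeleton, the joint names, and the choice of root below are hypothetical; the paper's exact visiting order may differ.

```python
# Toy skeleton given as undirected edges (joint names are hypothetical).
SKELETON_EDGES = [
    ("spine", "neck"), ("neck", "head"),
    ("spine", "left_shoulder"), ("left_shoulder", "left_hand"),
    ("spine", "right_shoulder"), ("right_shoulder", "right_hand"),
    ("spine", "left_hip"), ("left_hip", "left_foot"),
    ("spine", "right_hip"), ("right_hip", "right_foot"),
]

def build_adjacency(edges):
    """Undirected adjacency list: each edge is stored in both directions."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency

def traversal_sequence(adjacency, root):
    """Depth-first tour that records a joint on entry and after each child.

    Every pair of consecutive joints in the result shares an edge.
    """
    sequence = []

    def visit(joint, parent):
        sequence.append(joint)
        for neighbor in adjacency[joint]:
            if neighbor != parent:
                visit(neighbor, joint)
                sequence.append(joint)  # walk back to the current joint

    visit(root, None)
    return sequence

adjacency = build_adjacency(SKELETON_EDGES)
tour = traversal_sequence(adjacency, "spine")
```

For a tree with E edges the tour has 2E + 1 entries, since each edge is crossed once in each direction.
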
- Data augmentation
- 3D transformation of skeletons
- Rotation
- Scaling
- Shear
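
The three transformations can each be written as a 3x3 matrix applied to every joint. The sketch below assumes a (frames, joints, 3) array layout, an Rz·Ry·Rx rotation order, and made-up parameter values; none of these are taken from the paper.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Rotation about the x, y, z axes (radians), composed as Rz @ Ry @ Rx."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return rz @ ry @ rx

def scaling_matrix(sx, sy, sz):
    """Independent scaling along each axis."""
    return np.diag([sx, sy, sz]).astype(float)

def shear_matrix(sh_xy, sh_xz, sh_yx, sh_yz, sh_zx, sh_zy):
    """Shear: each coordinate gets a fraction of the other two added."""
    return np.array([[1.0, sh_xy, sh_xz],
                     [sh_yx, 1.0, sh_yz],
                     [sh_zx, sh_zy, 1.0]])

def transform_skeleton(skeleton, matrix):
    """Apply a 3x3 matrix to every joint of a (T, J, 3) skeleton sequence."""
    return skeleton @ matrix.T

rng = np.random.default_rng(0)
skeleton = rng.normal(size=(30, 25, 3))

# Illustrative parameter values (assumptions, not the paper's ranges).
rotated = transform_skeleton(skeleton, rotation_matrix(0.1, -0.2, 0.3))
scaled = transform_skeleton(skeleton, scaling_matrix(1.1, 0.9, 1.0))
sheared = transform_skeleton(skeleton, shear_matrix(0.1, 0, 0, 0.1, 0, 0))
```

Rotation preserves bone lengths, while scaling and shear deliberately distort them, so the three cover different kinds of variation.
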
- Take home messages
- Other methods mentioned
- Body part based action recognition and Joint based action recognition.
- Based on hand-crafted low level features, use Markov Random Fields.
- Fully connected deep LSTM network with regularization terms to learn co-occurrence features of joints.
- These methods
- RGB based action recognition
- Hierarchical RNN, RNN with regularization, differential RNN, and part-aware Long Short-Term Memory (LSTM)