Paper:
Sun, Lin, et al. “Lattice long short-term memory for human action recognition.” arXiv preprint arXiv:1708.03958 (2017).
- Basics
- CNN methods for spatial appearance
- RNN methods (LSTM) for temporal dynamics: naively applying an RNN is only suitable for short-term motion.
- Main methods
- Lattice-LSTM: extends LSTM by learning independent hidden-state transitions of memory cells for individual spatial locations.
- Control gates are shared between the RGB and optical flow streams.
- Greatly enhances the capacity of the memory cell to learn motion dynamics.
- Multi-modal training procedure: trains both the input gates and the forget gates in the network jointly. (Other two-stream networks train the two streams separately.)
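A toy sketch of the "independent hidden-state transitions for individual spatial locations" idea, to make the note concrete. This is not the paper's formulation: it assumes a single-channel H x W grid with scalar weights instead of learned kernels, and the names `lattice_lstm_step`, `gate`, `w_xc`, and `w_hc` are hypothetical. The only point it illustrates is that the recurrent weight feeding the memory cell is indexed by location, while the control gates use weights shared across locations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate(params, x, h):
    """Sigmoid control gate with scalar weights shared by every location."""
    w_x, w_h, b = params
    return sigmoid(w_x * x + w_h * h + b)


def lattice_lstm_step(x, h, c, gates, w_xc, w_hc):
    """Advance a toy single-channel LSTM by one time step on an H x W grid.

    x, h, c : H x W nested lists (input, hidden state, memory cell).
    gates   : {"i" | "f" | "o": (w_x, w_h, b)}, shared across locations.
    w_xc    : scalar input weight for the cell candidate (assumption).
    w_hc    : H x W nested list, one recurrent weight per location.
              A location-shared LSTM would use the same value everywhere.
    """
    rows, cols = len(x), len(x[0])
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for k in range(cols):
            i = gate(gates["i"], x[r][k], h[r][k])
            f = gate(gates["f"], x[r][k], h[r][k])
            o = gate(gates["o"], x[r][k], h[r][k])
            # Location-specific hidden-state transition into the memory cell.
            g = math.tanh(w_xc * x[r][k] + w_hc[r][k] * h[r][k])
            c_new[r][k] = f * c[r][k] + i * g
            h_new[r][k] = o * math.tanh(c_new[r][k])
    return h_new, c_new


# Two locations with identical input and state; only w_hc differs.
x = [[1.0, 1.0]]
h0 = [[0.5, 0.5]]
c0 = [[0.0, 0.0]]
gates = {"i": (0.0, 0.0, 0.0), "f": (0.0, 0.0, 0.0), "o": (0.0, 0.0, 0.0)}
h_lattice, _ = lattice_lstm_step(x, h0, c0, gates, 0.0, [[2.0, -2.0]])
h_shared, _ = lattice_lstm_step(x, h0, c0, gates, 0.0, [[2.0, 2.0]])
print(h_lattice, h_shared)
```

With per-location weights the two positions evolve differently even though they see the same input. With one shared weight they stay identical, which is the limitation the notes attribute to the plain LSTM.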
- Take-home message
- Other methods mentioned
- Extension of CNN: C3D learns both space and time, but only covers a short range of the sequence.
- Training another neural network on optical flow.
- Methods for obtaining a better combination of appearance and motion: spatio-temporal features learned with a sequential procedure, using 2D spatial (short) and 1D temporal (long) information.
- ResNets
- RNN, LSTM — encoder and decoder