Paper: Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." International Conference on Machine Learning, 2015.
- Basics
- Vanishing gradient problem in plain RNNs.
- Solution: LSTM (Long Short-Term Memory)
- Unsupervised learning model
- Crucial to have the right inductive biases and right objective functions.
- Main methods
- LSTM encoder to get representations of videos
- Multiple LSTM decoders for different tasks; each decoder produces a target sequence
- Different choices of target sequences.
- Same as input
- To predict the future
- Inputs (two kinds):
- Image patches
- High-level percepts (features extracted with a pretrained convolutional net)
- Simple squared-error loss as a starting point (see the formula below)
- Encoder-decoder RNN framework that can be used with any loss function
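In symbols (my notation, not the paper's): for decoder outputs $\hat{v}_t$ and target frames $v_t$, the starting objective is just the summed squared error

$$
\mathcal{L} = \sum_{t=1}^{T} \lVert \hat{v}_t - v_t \rVert_2^2
$$

where the targets $v_t$ are either the reversed input frames or the future frames, depending on the decoder.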
- LSTM details
- Cell: memory unit
- Input gate, forget gate and output gate (sketch below)
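A minimal single-step LSTM cell in NumPy, just to make the memory cell and the three gates concrete; the weight layout and names here are my assumptions, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x      : input vector at time t, shape (d_in,)
    h_prev : previous hidden state, shape (d_hid,)
    c_prev : previous cell (memory) state, shape (d_hid,)
    W      : stacked gate weights, shape (4*d_hid, d_in + d_hid)
    b      : stacked gate biases, shape (4*d_hid,)
    """
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i = sigmoid(i)           # input gate: how much new content to write
    f = sigmoid(f)           # forget gate: how much old memory to keep
    o = sigmoid(o)           # output gate: how much of the cell to expose
    g = np.tanh(g)           # candidate cell content
    c = f * c_prev + i * g   # cell (memory unit) update
    h = o * np.tanh(c)       # new hidden state
    return h, c
```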
- LSTM autoencoder model
- Two RNNs: encoder and decoder
- Input: a sequence of vectors
- After the last input has been read, the decoder starts predicting and outputs the target sequence
- Output: same as the input sequence, in reverse order (sketch below)
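A rough PyTorch sketch of the autoencoder idea (an assumed implementation, not the authors' code): the encoder LSTM reads the sequence, its final state initializes the decoder LSTM, and the target is the input sequence in reverse order under squared loss. This is the unconditioned decoder variant, approximated here by feeding zeros instead of the previous prediction; layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Encoder LSTM reads the input; decoder LSTM reconstructs it in reverse."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):                       # x: (batch, T, input_dim)
        _, state = self.encoder(x)              # (h_T, c_T) summarizes the clip
        dec_in = torch.zeros_like(x)            # unconditioned decoder input
        dec_out, _ = self.decoder(dec_in, state)
        return self.readout(dec_out)            # one reconstructed frame per step

model = LSTMAutoencoder(input_dim=1024, hidden_dim=2048)
x = torch.randn(8, 16, 1024)                    # 8 clips, 16 frames of percepts
recon = model(x)
target = torch.flip(x, dims=[1])                # target = input sequence reversed
loss = ((recon - target) ** 2).mean()           # simple squared loss
loss.backward()
```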
- LSTM future predictor model
- Decoder predicts the frames that come after the input sequence, one frame at each time step
- Composite model: combination of the two models above, with one shared encoder and two decoders (input reconstruction and future prediction); a sketch follows below
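A hedged sketch of how the composite model could be wired, reusing the autoencoder-style pieces above (names and sizes are illustrative): one shared encoder state feeds two decoders, one reconstructing the input in reverse and one predicting future frames, both trained with squared loss.

```python
import torch
import torch.nn as nn

class CompositeLSTM(nn.Module):
    """Shared encoder with two decoders: input reconstruction + future prediction."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.recon_decoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.future_decoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, input_dim)

    def forward(self, x, n_future):             # x: (batch, T, input_dim)
        _, state = self.encoder(x)               # encoder state summarizes the input
        recon_h, _ = self.recon_decoder(torch.zeros_like(x), state)
        future_in = x.new_zeros(x.size(0), n_future, x.size(2))
        future_h, _ = self.future_decoder(future_in, state)
        return self.readout(recon_h), self.readout(future_h)

model = CompositeLSTM(input_dim=1024, hidden_dim=2048)
past, future = torch.randn(8, 10, 1024), torch.randn(8, 10, 1024)
recon, pred = model(past, n_future=10)
loss = ((recon - torch.flip(past, dims=[1])) ** 2).mean() \
     + ((pred - future) ** 2).mean()            # squared loss on both targets
loss.backward()
```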
- Take home messages
- Other methods mentioned for video representation learning
- Supervised learning: 3D convolutional nets.
- Unsupervised learning:
- ISA (Independent Subspace Analysis)
- Generative models for understanding transformations between two consecutive images
- Generative models for predicting the next frame or interpolating between frames; important to choose the right loss function, and squared loss is not the ideal choice