Two Stream
Two-Stream Convolutional Networks for Action Recognition in Videos
Inspiration for me
Problems raised in this article, research directions to explore, new ideas
- Video provides a natural form of data augmentation
- Bidirectional, pyramid structure, or cascade
Basic Thought
- Two types of information
- appearance
- motion between frames
- Video is a natural form of data augmentation
- Human eye
- ventral stream → object recognition
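- dorsal stream → motion recognition
- optical flow → dense trajectory. A video of L frames yields L-1 optical flow maps. By analyzing adjacent frames, every point gets a motion vector.
- Networks before this paper did not learn temporal information.
- How to improve?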
- Late Fusion -> Early Fusion?
- AlexNet -> VGG, NiN, LSTM...
- Single Frame -> long video processing
- Optical flow
- Sample along the trajectory. Results show it is worse.
- Bidirectional
- Sparse optical flow: greatly reduces the storage size of the optical flow
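To make the optical flow bookkeeping above concrete, here is a minimal sketch in plain Python. It assumes a toy brute-force block-matching estimator, which is swapped in for illustration and is not the flow method used in the paper. The names `make_frame` and `block_matching_flow`, the 8x8 frame size, and the `radius` and `search` parameters are all hypothetical. It only shows that L frames give L-1 flow maps and that every pixel gets a (dx, dy) motion vector.

```python
def make_frame(size, top, left):
    """Hypothetical test frame: black image with a 2x2 bright square."""
    frame = [[0.0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 1.0
    return frame


def block_matching_flow(prev, curr, radius=1, search=2):
    """Toy dense flow (an assumption, not the paper's method).

    For each pixel, pick the displacement (dx, dy) within +/-search that
    minimises the sum of squared differences over a small patch.
    """
    h, w = len(prev), len(prev[0])

    def cost(y, x, dy, dx):
        total = 0.0
        for py in range(-radius, radius + 1):
            for px in range(-radius, radius + 1):
                y0, x0 = y + py, x + px
                y1, x1 = y0 + dy, x0 + dx
                inside = 0 <= y0 < h and 0 <= x0 < w and 0 <= y1 < h and 0 <= x1 < w
                if not inside:
                    return float("inf")  # patch leaves the image: reject
                total += (prev[y0][x0] - curr[y1][x1]) ** 2
        return total

    flow = [[(0, 0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Start from zero motion so ties in flat regions stay at (0, 0).
            best, best_cost = (0, 0), cost(y, x, 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    c = cost(y, x, dy, dx)
                    if c < best_cost:
                        best, best_cost = (dx, dy), c
            flow[y][x] = best
    return flow


# L = 3 frames: the square moves one pixel to the right per frame.
frames = [make_frame(8, 3, 2 + t) for t in range(3)]

# Adjacent frame pairs give L - 1 = 2 flow maps.
flows = [block_matching_flow(a, b) for a, b in zip(frames, frames[1:])]

# Each flow map has 2 channels (dx, dy); stacking them gives 2 * (L - 1).
stacked_channels = 2 * len(flows)
```

In practice a real estimator would replace `block_matching_flow`. The storage cost of keeping two channels per pixel for every frame pair is what the sparse optical flow idea above is trying to reduce.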
model architecture
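A rough sketch of the late-fusion step mentioned in the notes, assuming each stream (spatial on a single RGB frame, temporal on stacked optical flow) outputs per-class logits and that the softmax scores are simply averaged. The function names, the `temporal_weight` parameter, and the example logits are hypothetical and only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Fuse the two streams after their softmax layers (late fusion).

    temporal_weight is a hypothetical knob; 0.5 is plain averaging.
    """
    spatial_scores = softmax(spatial_logits)
    temporal_scores = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(spatial_scores, temporal_scores)
    ]


# Made-up logits for three action classes.
spatial_logits = [2.0, 1.0, 0.1]    # appearance stream prefers class 0
temporal_logits = [0.5, 2.5, 0.3]   # motion stream prefers class 1

fused = late_fusion(spatial_logits, temporal_logits)
predicted_class = max(range(len(fused)), key=lambda i: fused[i])
```

Early fusion, by contrast, would combine the two streams before the classification layers, so the network could learn interactions between appearance and motion features.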
new points
What is new in this article?
- New model
- Why? Where? New architecture? How? Important
- New idea
- New function
- New mission
- New training method
advancement
Progress made by later work building on this article
- Beyond Short Snippets: Deep Networks for Video Classification
- LSTM → on short snippets, not as effective as we might have thought
- optical flow