Two Stream

2024/05/10 00:00:00·2026/05/19 10:23:00

AI vision models·3 min read

Two Stream, video understanding, deep learning, optical flow

Two-Stream Convolutional Networks for Action Recognition in Videos

Inspiration for me

Problems raised in this article, research directions to explore, new ideas

Video provides a natural form of data augmentation
Bidirectional, pyramid structure, or cascade

Basic Thought

Two types of information
- appearance
- motion between frames
Video is a natural form of data augmentation
Human eye
- ventral stream → object recognition
- dorsal stream → recognize motion
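optical flow → dense trajectory. A video of L frames yields L-1 optical flow maps. By analyzing adjacent frames, every pixel gets a motion vector.
Networks before this paper did not learn temporal information.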
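A minimal sketch of the "L frames give L-1 flow maps, one motion vector per pixel" idea. It uses brute-force block matching in pure Python, which is a stand-in chosen for readability and is not the optical flow method used in the paper. The function name `block_matching_flow`, the helper `make_frame`, and the toy 8x8 frames are all hypothetical. The last step stacks the horizontal and vertical components of each flow map as separate channels, which is how stacked flow is usually fed to a temporal stream.

```python
def block_matching_flow(prev, curr, radius=1, search=2):
    """Estimate one (dx, dy) motion vector per pixel of `prev`.

    For each pixel, try every displacement within +/- `search` pixels and
    keep the one whose patch in `curr` best matches the patch in `prev`
    (sum of absolute differences). Ties go to the smaller displacement.
    """
    h, w = len(prev), len(prev[0])

    def pixel(img, y, x):
        # Clamp coordinates so patches near the border stay inside the image.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    flow = [[(0, 0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_key, best_d = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cost = 0
                    for py in range(-radius, radius + 1):
                        for px in range(-radius, radius + 1):
                            a = pixel(prev, y + py, x + px)
                            b = pixel(curr, y + dy + py, x + dx + px)
                            cost += abs(a - b)
                    key = (cost, abs(dx) + abs(dy))
                    if best_key is None or key < best_key:
                        best_key, best_d = key, (dx, dy)
            flow[y][x] = best_d
    return flow


def make_frame(size, left_col):
    """Hypothetical toy frame: a bright 2x2 square on a dark background."""
    frame = [[0] * size for _ in range(size)]
    for y in (3, 4):
        for x in (left_col, left_col + 1):
            frame[y][x] = 255
    return frame


L = 4
# The square moves one pixel to the right in every frame.
frames = [make_frame(8, 2 + t) for t in range(L)]

# One flow map per pair of adjacent frames: L frames -> L - 1 flow maps.
flows = [block_matching_flow(a, b) for a, b in zip(frames, frames[1:])]

# Stack dx and dy of every flow map as separate channels: 2 * (L - 1) channels.
channels = []
for flow in flows:
    channels.append([[d[0] for d in row] for row in flow])
    channels.append([[d[1] for d in row] for row in flow])

print(len(frames), "frames ->", len(flows), "flow maps ->", len(channels), "channels")
print("motion vector at the square's top-left corner:", flows[0][3][2])
```

With L = 4 this gives 3 flow maps and 6 stacked channels. Flat background regions carry no texture, so their vectors default to zero motion here; a real dense flow method handles such regions with smoothness constraints.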
How to improve?
- Late Fusion -> Early Fusion?
- AlexNet -> VGG, NiN, LSTM...
- Single frame -> long video processing
- Optical flow
  - sample along the trajectory. Results show it is worse
  - Bi-directional
  - sparse optical flow, which greatly reduces the storage size of the flow

Pasted image 20240401164549.png

model architecture

Pasted image 20240331153354.png
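A minimal sketch of the late fusion step between the two streams: each stream produces its own class scores, and the scores are combined only at the end by a weighted average of the softmax outputs. The class names, the logit values, and the `temporal_weight` parameter are hypothetical and chosen only to illustrate the idea. Averaging is one simple fusion choice, not necessarily the best performing one.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Fuse two streams by a weighted average of their softmax scores.

    spatial_logits:  scores from the appearance (single frame) stream
    temporal_logits: scores from the motion (stacked optical flow) stream
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [
        (1 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(p_spatial, p_temporal)
    ]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical example: appearance alone cannot separate "walk" from "run",
# but the motion stream is confident about "run".
CLASSES = ["walk", "run", "jump"]
spatial_logits = [2.0, 1.9, 0.1]
temporal_logits = [0.2, 3.0, 0.1]

fused = late_fusion(spatial_logits, temporal_logits)
spatial_only = CLASSES[argmax(softmax(spatial_logits))]
predicted = CLASSES[argmax(fused)]

print("spatial stream alone:", spatial_only)
print("after late fusion:", predicted)
```

Because the streams interact only through their final scores, each can be trained and swapped independently. Early fusion would instead merge features inside the network, which is the direction the "Late Fusion -> Early Fusion" question points to.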

new points

What is new in this article?

New model
- Why? Where? New architecture? How? Important
New ideology
New function
New mission
New training method

advancement

Progress built on this article

Beyond Short Snippets: Deep Networks for Video Classification
- Google
- LSTM → on short snippets, not as effective as one might expect
- optical flow