Video Processing
questions before reading
- How can real-time video processing be achieved?
- How long does training take?
- How much data is required?
- What is the fastest and most efficient algorithm to date?
term explanations
- optical flow
Hand-crafted → CNN
CNNs were less efficient than hand-crafted features.
Two-Stream
Beyond Short Snippets: Deep Networks for Video Classification
Convolutional Two-Stream Network Fusion for Video Action Recognition
- Spatial Early fusion
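- What? Conv fusion performs well:
  $$y^{conv}=f^{conv}(x^a, x^b)$$
  $$y^{conv}=y^{cat}*f+b$$
  where $f$ is a bank of filters, $b$ is a bias term, and $y^{cat}$ is the channel-wise concatenation of the two feature maps $x^a$ and $x^b$.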
- Where?
- UCF101 (video dataset)
- HMDB51 (video dataset)
- Temporal fusion
- Global structure
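The conv fusion formula in these notes can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes the filter bank `f` consists of 1×1 filters (so the convolution reduces to a per-pixel linear map over channels), and the shapes, the name `conv_fusion`, and the random inputs are all made up for the example.

```python
import numpy as np

def conv_fusion(x_a, x_b, f, b):
    """Fuse two feature maps of shape (H, W, D) into one of shape (H, W, D_out).

    Assumption: f has shape (2D, D_out), i.e. a bank of 1x1 filters.
    """
    # y_cat: stack the two streams along the channel axis -> (H, W, 2D)
    y_cat = np.concatenate([x_a, x_b], axis=-1)
    # A 1x1 convolution is a linear map over channels at every pixel.
    return np.tensordot(y_cat, f, axes=([-1], [0])) + b

rng = np.random.default_rng(0)
H, W, D = 7, 7, 4                      # illustrative sizes only
x_a = rng.standard_normal((H, W, D))   # stand-in for the spatial stream
x_b = rng.standard_normal((H, W, D))   # stand-in for the temporal stream
f = rng.standard_normal((2 * D, D))
b = np.zeros(D)

y_conv = conv_fusion(x_a, x_b, f, b)
print(y_conv.shape)
```

Under this 1×1 assumption, choosing `f` as two stacked identity matrices with zero bias reproduces plain sum fusion, which is one way to see why a learned filter bank is at least as expressive as summing the two streams.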
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (TSN)
- long videos
- good practices
- temporal segment method
- partial BN method
- starting from a pre-trained model, freeze the mean and variance of all BN layers except the first one
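A rough sketch of the temporal segment idea: split the video into equal-length segments, draw one frame (snippet) from each, and combine the per-snippet class scores. The function names, the default of three segments, and the use of a plain average as the consensus are assumptions made for illustration, not the TSN code.

```python
import random

def sample_segment_indices(num_frames, num_segments=3, rng=random):
    """Pick one random frame index from each of num_segments equal segments."""
    if num_frames < num_segments:
        raise ValueError("video is shorter than the number of segments")
    indices = []
    for k in range(num_segments):
        # Integer arithmetic keeps the segment boundaries exact.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive
        indices.append(rng.randrange(start, end))
    return indices

def segmental_consensus(snippet_scores):
    """Average per-class scores over snippets (one simple consensus choice)."""
    n = len(snippet_scores)
    return [sum(col) / n for col in zip(*snippet_scores)]

# Hypothetical 90-frame video: one index from each third of the video.
print(sample_segment_indices(90, 3, random.Random(0)))
# Two snippets, two classes.
print(segmental_consensus([[1.0, 0.0], [0.0, 1.0]]))
```

Because every segment contributes exactly one snippet, the cost per video stays constant regardless of its length, which is what makes the approach practical for long videos.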
- optical flow
- TVL1
- data augmentation
- corner cropping
- typical cropping focuses on the center area; in corner cropping, the extracted regions are selected only from the corners or the center of the image
- scale-jittering
- the width and height of the cropped region are randomly selected from {256, 224, 192, 168}
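Corner cropping and scale jittering can be combined in one small helper. This is a sketch under assumptions: the function name `corner_crop_box`, the (left, top, width, height) return convention, and the 340 × 256 frame in the usage line are illustrative choices, and a real pipeline would typically resize the crop to the network input size afterwards.

```python
import random

# Candidate crop sizes from the scale-jittering note above.
SIZES = (256, 224, 192, 168)

def corner_crop_box(img_w, img_h, rng=random):
    """Return (left, top, width, height) for one jittered corner/center crop."""
    crop_w = rng.choice(SIZES)  # width and height are drawn independently
    crop_h = rng.choice(SIZES)
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("frame is smaller than the sampled crop size")
    max_x, max_y = img_w - crop_w, img_h - crop_h
    anchors = [
        (0, 0), (max_x, 0), (0, max_y), (max_x, max_y),  # four corners
        (max_x // 2, max_y // 2),                         # center
    ]
    left, top = rng.choice(anchors)
    return left, top, crop_w, crop_h

# Hypothetical 340 x 256 frame, chosen only for illustration.
print(corner_crop_box(340, 256, random.Random(0)))
```

Restricting the crop origin to five fixed positions, instead of sampling it uniformly, is what keeps the augmentation from implicitly favoring the center of the frame.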
- GPU
- modified Caffe
- data-parallel with multiple GPUs
- Results