目录

Video Processing

2024/05/10 00:00:00·2026/05/19 10:23:00

AI视觉模型·3 min read

wiki.en anomaly detection

question before reading

how to realize real-time video processing ?
training time
how many data is required ?
The most fast and efficient algorithm till now.

term explanation

optical flow

Hand-crafted → CNN

DeepVideo article code
- Single Frame
- Late Fusion
- Early Fusion
- Slow Fusion

300

Less efficient than hand-crafted.

Two-Stream

Beyond Short snippets: Deep Networks for Video Classification
Convolutional Two-Stream Network Fusion for Video Action Recognition
- Spatial Early fusion
  - What? Conv Fusion performs well $y^{conv}=f^{conv}(x^a, x^b)$ y^{conv}=y^{cat}*f +b$$ where f is a bank of filter
  - Where?
  - UCF101 D-datasets Video Dataset
  - HMDB51 D-datasets Video Dataset
- Temporal fusion
- Global structure

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (TSN) code
- long video
- good practices
- temporal segment 方法
- partial BN 方法
- pre-trained model → freeze all BN except the first one
- wiki.en optical flow
  - TVL1
- wiki.en data augmentation
  - corner cropping
    - typical cropping focus on center area. In corner cropping, extracted regions are only selected from the corners or the center
  - scale-jittering
    - width and height of cropped region are randomly selected from {256, 224, 192, 168}
- GPU
  - modified caff
  - data-parallel with multiple GPUs
- Results

3D-CNN

Video transformer