To accelerate the video recognition architectures, one typically build a lightweight video key frame sampler to firstly sample a few salient frames and then evoke a recognizer on them, to reduce temporal redundancies. However, existing methods neglect the discrimination between non-salient frames and salient ones in training objectives. We introduced a novel multi-granularity supervision scheme to suppress the non-salient frames and achieved SOTA accuracy with very low GFLOPs and wall-clock time ($4\times$ faster than SOTA methods) on 4 video recognition benchmarks.
  • 0
  • 0
Interest Score
1
HIT Score
0.00
Domain
lawrencexia2008.github.io

Actual
lawrencexia2008.github.io

IP
185.199.108.153, 185.199.109.153, 185.199.110.153, 185.199.111.153

Status
OK

Category
Company
0 comments Add a comment