IMAGE MODELS

Updated 412 days ago
  • ID: 51343341/1
In video understanding, a common practice is to bootstrap from a pre-trained image model and then finetune on video data. There are two dominant directions, as shown in the left part of the figure above: one extends an image model with an additional temporal module, and the other inflates an image model into a video model. However, fully finetuning such a video model can be computationally expensive and unnecessary, given that pre-trained image transformer models have demonstrated exceptional transferability. In this work, we propose a novel method to Adapt pre-trained Image Models (AIM) for efficient video understanding. By freezing the pre-trained image model and adding a few lightweight Adapters, we introduce spatial adaptation, temporal adaptation and joint adaptation to gradually equip an image model with spatiotemporal reasoning capability. We show that our proposed AIM can achieve competitive or even better performance than prior art with substantially fewer..
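The idea of keeping the image model frozen and training only small Adapters can be illustrated with a generic bottleneck adapter. The following is a minimal sketch in plain Python, not the authors' implementation: the `Adapter` class, the helper names, the GELU activation, the absence of bias terms, the zero-initialized up-projection, and the dimensions used are all assumptions made for illustration.

```python
import math
import random


def gelu(x):
    # Exact GELU expressed through the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weights, x):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class Adapter:
    """Hypothetical bottleneck adapter: down-project, GELU, up-project, residual."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection (bottleneck x dim), small random init (assumed).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Up-projection (dim x bottleneck), zero init (assumed) so the
        # adapter starts out as an identity mapping.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        # Only these weights would be trained; the backbone stays frozen.
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])

    def __call__(self, x):
        hidden = [gelu(h) for h in matvec(self.down, x)]
        delta = matvec(self.up, hidden)
        # Residual connection: frozen features plus a learned correction.
        return [xi + di for xi, di in zip(x, delta)]


def frozen_linear_params(dim):
    # Weight count of a hypothetical frozen dim x dim linear layer, for comparison.
    return dim * dim


adapter = Adapter(dim=8, bottleneck=2)
features = [float(i) for i in range(8)]
print(adapter(features) == features)   # identity at initialization
print(adapter.num_params(), "trainable vs", frozen_linear_params(8), "frozen")
```

With a zero-initialized up-projection, the adapter leaves the frozen features unchanged at the start of training, so finetuning begins from the pre-trained model's behavior. The bottleneck width controls how many parameters are trained relative to the frozen layer.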
Interest Score
3
HIT Score
0.00
Domain
adapt-image-models.github.io

Actual
adapt-image-models.github.io

IP
185.199.108.153, 185.199.109.153, 185.199.110.153, 185.199.111.153

Status
OK

Category
Company