V2A-MAPPER

Updated: 23 days ago
ID: 52909838/1
In this project, we propose a lightweight solution to vision-to-audio generation by leveraging foundation models (FMs), specifically CLIP, CLAP, and AudioLDM. We first investigate the domain gap between the latent spaces of the visual CLIP model and the auditory CLAP model. We then propose a simple yet effective mapper mechanism, V2A-Mapper, to bridge this gap by translating the visual input from the CLIP space to the CLAP space. Conditioned on the translated CLAP embedding, the pretrained audio generative FM AudioLDM is used to generate the corresponding audio. Our lightweight method only requires training the V2A-Mapper to bridge the domain gap between the vision representation FM CLIP and the audio generative FM AudioLDM; the V2A-Mapper is supervised by the audio representation FM CLAP to learn the translation from the visual space to the auditory space. By leveraging the generalization and knowledge-transfer ability of foundation models, the V2A-Mapper is trained with the same modestly sized dataset, yet the overall system achieves much better performance.
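As a rough illustration of the idea, the following is a minimal, hypothetical PyTorch sketch of such a mapper: an MLP that translates a frozen CLIP image embedding into the CLAP embedding space, supervised by the CLAP embedding of the paired ground-truth audio. The architecture, embedding dimensions, loss, and all names below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a V2A-Mapper-style translation module.
# Assumptions: CLIP and CLAP embeddings are 512-d, the mapper is a small MLP,
# and supervision is an MSE + cosine objective against the CLAP audio embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class V2AMapper(nn.Module):
    """Maps a visual CLIP embedding into the CLAP audio embedding space."""
    def __init__(self, clip_dim: int = 512, clap_dim: int = 512, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, clap_dim),
        )

    def forward(self, clip_emb: torch.Tensor) -> torch.Tensor:
        return self.net(clip_emb)

def mapper_loss(pred_clap: torch.Tensor, target_clap: torch.Tensor) -> torch.Tensor:
    # Supervision from the CLAP FM: pull the translated embedding toward the
    # CLAP embedding of the paired audio (MSE plus a cosine-alignment term).
    mse = F.mse_loss(pred_clap, target_clap)
    cos = 1.0 - F.cosine_similarity(pred_clap, target_clap, dim=-1).mean()
    return mse + cos

# Hypothetical training step: clip_emb would come from a frozen CLIP image
# encoder, target_clap from a frozen CLAP audio encoder on the paired audio.
mapper = V2AMapper()
optimizer = torch.optim.AdamW(mapper.parameters(), lr=1e-4)
clip_emb = torch.randn(8, 512)     # placeholder batch of visual embeddings
target_clap = torch.randn(8, 512)  # placeholder batch of audio embeddings
loss = mapper_loss(mapper(clip_emb), target_clap)
loss.backward()
optimizer.step()
```

At inference time, the translated CLAP embedding would condition the pretrained AudioLDM generator in place of a text-derived CLAP embedding; that step is omitted here since it depends on the specific AudioLDM interface.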
Interest Score: 3
HIT Score: 0.00
Domain: v2a-mapper.github.io
Actual: v2a-mapper.github.io
IP: 185.199.108.153, 185.199.109.153, 185.199.110.153, 185.199.111.153
Status: OK
Category: Company