# music2video Overview
A repo for making an AI-generated music video from any song with Wav2CLIP and VQGAN-CLIP.
The base code was derived from [VQGAN-CLIP](https://github.com/nerdyrodent/VQGAN-CLIP).
The CLIP embedding for audio was derived from [Wav2CLIP](https://github.com/descriptinc/lyrebird-wav2clip).
A technical paper describing the mechanism is available here: [Music2Video: Automatic Generation of Music Video with fusion of audio and text](https://arxiv.org/abs/2201.03809v2).
The citation for the technical paper is provided below:
```bibtex
@article{jang2022music2video,
title={Music2Video: Automatic Generation of Music Video with fusion of audio and text},
author={Jang, Joel and Shin, Sumin and Kim, Yoonjeon},
journal={arXiv preprint arXiv:2201.03809},
year={2022}
}
```
## Sample
A sample music video created with this repository is available on [YouTube](https://youtu.be/CaS-ruEiUcg).
Here are sample snapshots from a generated music video, shown with the corresponding lyrics:
![sample](https://user-images.githubusercontent.com/41067235/146651217-6fee9676-42a6-4359-9c5b-49beef42c6c9.png)
You can make one with your own song too!
## Set up
This example uses [Anaconda](https://www.anaconda.com/products/individual#Downloads) to manage virtual Python environments.
Create a new virtual Python environment for VQGAN-CLIP:
```sh
conda create --name vqgan python=3.9
conda activate vqgan
```
Install PyTorch in the new environment:
Note: This installs the CUDA version of PyTorch. If you want to use an AMD graphics card, read the [AMD section below](#using-an-amd-graphics-card).
```sh
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
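After installing, it can help to confirm that PyTorch actually sees a CUDA device before starting a long generation run. The following is a minimal sketch, not part of this repository; the function name `describe_torch_device` is hypothetical, and the check falls back gracefully if PyTorch is not installed.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return a short description of the device PyTorch would use."""
    # Avoid an ImportError if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    # torch.cuda.is_available() is True only when a usable CUDA device is found.
    if torch.cuda.is_available():
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this prints `cpu` on a machine with an NVIDIA card, the installed wheel is probably a CPU-only build or the driver does not match the CUDA version of the wheel.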
Install other required Python packages:
```sh
pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops torch_optimizer wav2clip
```
Or use the ```requirements.txt``` file, which includes version numbers.
Clone required repositories:
```sh
git clone 'https://github.com/nerdyrodent/VQGAN-CLIP'
cd VQGAN-CLIP
git clone 'https://github.com/openai/CLIP'
git clone 'https://github.com/CompVis/taming-transformers'
```
Note: In my development environment both CLIP and taming-transformers are present in the local directory, and so aren't present in the `requirements.txt` or `vqgan.yml` files.
As an alternative, you can also pip install taming-transformers and CLIP.
You will also need at least one pretrained VQGAN model, for example:
```sh
mkdir checkpoints
curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1' #ImageNet 16384
```
Note that users of ```curl``` on Microsoft Windows should use double quotes.
The `download_models.sh` script is an optional way to download a number of models. By default, it will download just 1 model.
See <https://github.com/CompVis/taming-transformers#overview-of-pretrained-models> for more information about VQGAN pre-trained models, including download links.
By default, the model .yaml and .ckpt files are expected in the `checkpoints` directory.
See <https://github.com/CompVis/taming-transformers> for more information on datasets and models.
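Because the model `.yaml` and `.ckpt` files are expected in the `checkpoints` directory, a quick check before running can catch a failed or partial download. This is a small sketch under the assumption that both files share the model name as their stem, as in the `curl` example above; the helper `missing_checkpoint_files` is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_checkpoint_files(name: str, directory: str = "checkpoints") -> list[str]:
    """Return the expected files for model `name` that are not present."""
    base = Path(directory)
    # Each model is assumed to consist of a config (.yaml) and weights (.ckpt).
    expected = [base / f"{name}.yaml", base / f"{name}.ckpt"]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("vqgan_imagenet_f16_16384")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Checkpoint files found.")
```

The check only tests that the files exist; it does not verify that a download completed or that the checkpoint loads.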
## Making the music video
To generate a video from music, specify your audio file and use whichever of the following commands fits your needs. We provide a sample music file and lyrics file from Yannic Kilcher's [repo](https://github.com/yk/clip_music_video).
If you have a lyrics file with timestamp information, such as the example in `lyrics/imagenet_song_lyrics.csv`, you can make a music video guided by both lyrics and audio with the following command:
```sh
python generate.py -vid -o outputs/output.png -ap "imagenet_song.mp3" -lyr "lyrics/imagenet_song_lyrics.csv" -gid 2 -ips 100
```
To interpolate between the audio representation and the text representation, use the following command (this gives more of a "music video" feeling):
```sh
python generate_interpolate.py -vid -ips 100 -o outputs/output.png -ap "imagenet_song.mp3" -lyr "lyrics/imagenet_song_lyrics.csv" -gid 0
```
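To illustrate what interpolating between two representations can mean, here is a toy sketch of linear interpolation between an audio embedding and a text embedding. It is an assumption made for illustration only: this README does not state how `generate_interpolate.py` blends the two, and the names `interpolate` and `blend_schedule` are hypothetical. Plain Python lists stand in for the real CLIP-space vectors.

```python
def interpolate(audio_emb, text_emb, weight):
    """Linearly blend two equal-length embeddings.

    weight=0.0 returns the audio embedding, weight=1.0 the text embedding.
    """
    if len(audio_emb) != len(text_emb):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [(1.0 - weight) * a + weight * t for a, t in zip(audio_emb, text_emb)]


def blend_schedule(audio_emb, text_emb, steps):
    """Yield `steps` embeddings moving evenly from audio to text."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for i in range(steps):
        yield interpolate(audio_emb, text_emb, i / (steps - 1))


if __name__ == "__main__":
    audio = [1.0, 0.0]  # toy stand-in for a Wav2CLIP audio embedding
    text = [0.0, 1.0]   # toy stand-in for a CLIP text embedding
    for vec in blend_schedule(audio, text, 3):
        print(vec)
```

With a schedule like this, the prompt that guides the image drifts gradually from the sound of the music toward the meaning of the lyric instead of switching abruptly.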
If you do not have lyrics information, you can run the following command using only audio prompts:
```sh
python generate.py -vid -o outputs/output.png -ap "imagenet_song.mp3" -gid 2 -ips 100
```
If any of the above commands fails while merging the video segments, use `combine_mp4.py` to concatenate the segments from the output directory separately, or download the segments from the output directory and merge them manually with video editing software.
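If you prefer to merge the segments yourself, one option is ffmpeg's concat demuxer. The sketch below is not the content of `combine_mp4.py`; it is a hypothetical helper that assumes the segments are `.mp4` files whose names sort in playback order (for example, zero-padded indices) and that no path contains a single quote. It writes the list file and returns the ffmpeg command without running it.

```python
from pathlib import Path


def write_concat_list(segment_dir: str, list_path: str = "segments.txt",
                      output: str = "merged.mp4") -> list[str]:
    """Write an ffmpeg concat list for the .mp4 files in `segment_dir`.

    Returns the ffmpeg command (as an argument list) that would merge them.
    """
    # Lexicographic order is assumed to match playback order.
    segments = sorted(Path(segment_dir).glob("*.mp4"))
    if not segments:
        raise FileNotFoundError(f"no .mp4 segments found in {segment_dir}")
    # The concat demuxer reads one "file '<path>'" directive per line.
    lines = [f"file '{p.resolve().as_posix()}'" for p in segments]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    # "-c copy" joins the streams without re-encoding;
    # "-safe 0" allows the absolute paths written above.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output]


if __name__ == "__main__":
    print(" ".join(write_concat_list("outputs")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on your `PATH`. Stream copy only works when all segments share the same codec and resolution.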
## Citations
```bibtex
@misc{unpublished2021clip,
title = {CLIP: Connecting Text and Images},
author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
year = {2021}
}
```
```bibtex
@misc{esser2020taming,
title={Taming Transformers for High-Resolution Image Synthesis},
author={Patrick Esser and Robin Rombach and Björn Ommer},
year={2020},
eprint={2012.09841},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
```bibtex
@article{wu2021wav2clip,
title={Wav2CLIP: Learning Robust Audio Representations From CLIP},
author={Wu, Ho-Hsiang and Seetharaman, Prem and Kumar, Kundan and Bello, Juan Pablo},
journal={arXiv preprint arXiv:2110.11499},
year={2021}
}
```