# MotionCLIP
![teaser](visuals/clouds_white_bg.png)
## Getting started
### 1. Create conda environment
```bash
conda env create -f environment.yml
conda activate motionclip
```
The code was tested on Python 3.8 and PyTorch 1.8.1.
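A quick way to verify that the environment resolved to the tested versions:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```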
### 2. Download data
**NEW! Download the parsed data directly**
[Parsed AMASS dataset](https://drive.google.com/drive/folders/18guyyud1iobxASZxoGe-798mOxNBKGWf?usp=sharing) -> `./data/amass_db`
<details>
<summary><b>If you prefer to parse the data yourself, follow this:</b></summary>
Download and unzip the following datasets and place them as listed:
* [AMASS](https://amass.is.tue.mpg.de/) -> `./data/amass` (Download the SMPL+H version of each dataset separately; note that you must download ALL the datasets listed on the AMASS website)
* [BABEL](https://babel.is.tue.mpg.de/) -> `./data/babel_v1.0_release`
* [Rendered AMASS images](https://drive.google.com/file/d/1F8VLY4AC2XPaV3DqKZefQJNWn4KY2z_c/view?usp=sharing) -> `./data/render`
Then, process the three datasets into a unified dataset with `(text, image, motion)` triplets:
To parse according to the AMASS split (for all applications except action recognition), run:
```bash
python -m src.datasets.amass_parser --dataset_name amass
```
**Only if** you intend to use **Action Recognition**, run also:
```bash
python -m src.datasets.amass_parser --dataset_name babel
```
</details>
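Either way, you end up with a serialized database under `./data/amass_db` (e.g., `amass_30fps_db.pt`, used by the training commands below). A minimal sketch to inspect it, assuming it is a standard torch-serialized, dict-like container (the exact keys are dataset-specific):
```python
# Hedged sanity check for the parsed database; structure is dataset-specific.
import torch

db = torch.load("./data/amass_db/amass_30fps_db.pt")
print(type(db))
if isinstance(db, dict):
    print(list(db.keys())[:10])  # peek at the first few entries
```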
### 3. Download the SMPL body model
```bash
bash prepare/download_smpl_files.sh
```
This will download the SMPL neutral model from this [**GitHub repo**](https://github.com/classner/up/blob/master/models/3D/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl) along with additional files.
In addition, download the **Extended SMPL+H model** (used in the AMASS project) from [MANO](https://mano.is.tue.mpg.de/), and place it in `./models/smplh`.
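A quick check that both body models landed where the code expects them (the SMPL+H files under `./models/smplh`, per the step above; exact file names depend on the downloads):
```bash
ls -R ./models
```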
## Using the pretrained model
First, [download the model](https://drive.google.com/file/d/1VTIN0kJd2-0NW1sKckKgXddwl4tFZVDp/view?usp=sharing) and place it at `./exps/paper-model`
### 1. Text-to-Motion
To reproduce paper results, run:
```bash
python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt
```
To run MotionCLIP with your own texts, create a text file in which each line specifies a different text input (see `paper_texts.txt` for reference), and point to it with `--input_file` instead.
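For example (the prompts below are illustrative):
```bash
# Write your own prompts, one per line, then point --input_file at the file
cat > assets/my_texts.txt <<EOF
a person walks forward and waves
jumping jacks
sitting down on a chair
EOF
python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/my_texts.txt
```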
### 2. Vector Editing
To reproduce paper results, run:
```bash
python -m src.visualize.motion_editing ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_edits.csv
```
To obtain the input motions, two modes are supported:
* `data` - Retrieve motions from the train/validation sets according to their textual label. On its first run, `src.visualize.motion_editing` generates a file listing all textual labels; you can look it up and choose motions for your own edits.
* `text` - The inputs are free texts instead of motions. We use the CLIP text encoder to get CLIP representations, perform vector editing, and then use the MotionCLIP decoder to output the edited motion.
To run MotionCLIP on your own edits, create a csv file in which each line specifies a different edit (see `paper_edits.csv` for reference), and point to it with `--input_file` instead.
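The `text` mode boils down to vector arithmetic in CLIP space. A conceptual sketch, not the repo's actual code (assumes OpenAI's `clip` package; `motionclip_decoder` is a hypothetical handle to the trained decoder):
```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Encode three texts; the edit follows the analogy pattern A + (B - C)
tokens = clip.tokenize(["run", "walk with hands up", "walk"]).to(device)
with torch.no_grad():
    z = model.encode_text(tokens).float()  # (3, 512) CLIP embeddings

z_edit = z[0] + (z[1] - z[2])  # ~ "run with hands up" in CLIP space
# motion = motionclip_decoder(z_edit)  # decode the edited latent to motion
```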
### 3. Interpolation
To reproduce paper results, run:
```bash
python -m src.visualize.motion_interpolation ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_interps.csv
```
To obtain the input motions, we use the `data` mode described above.
To run MotionCLIP on your own interpolations, create a csv file in which each line specifies a different interpolation (see `paper_interps.csv` for reference), and point to it with `--input_file` instead.
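Conceptually, interpolation blends two latent codes, each of which is then decoded to a motion. A minimal sketch of the idea (plain linear interpolation; not necessarily the repo's exact scheme):
```python
import torch

def interpolate_latents(z_a: torch.Tensor, z_b: torch.Tensor, steps: int = 5):
    """Linearly blend two latent codes; each blend can be decoded to a motion."""
    return [torch.lerp(z_a, z_b, float(a)) for a in torch.linspace(0.0, 1.0, steps)]
```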
### 4. Action Recognition
For action recognition, we use a model trained on text class names. [Download it](https://drive.google.com/file/d/1koQMhpqmoffIB0C0P99a8l23YLGfthJ4/view?usp=sharing) and place it at `./exps/classes-model`, then run:
```bash
python -m src.utils.action_classifier ./exps/classes-model/checkpoint_0200.pth.tar
```
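Conceptually, this follows the CLIP zero-shot recipe: embed each class name with the text encoder, embed the motion with the motion encoder, and pick the most similar class. A hedged sketch (function and argument names are illustrative):
```python
import torch
import torch.nn.functional as F

def classify_motion(motion_z: torch.Tensor, class_z: torch.Tensor, class_names):
    # motion_z: (d,) motion embedding; class_z: (num_classes, d) class-name embeddings
    sims = F.cosine_similarity(motion_z.unsqueeze(0), class_z, dim=-1)
    return class_names[int(sims.argmax())]
```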
## Train your own
To reproduce `paper-model`, run:
```bash
python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 100 --snapshot 10 \
--device <GPU DEVICE ID> \
--dataset amass \
--datapath ./data/amass_db/amass_30fps_db.pt \
--folder ./exps/my-paper-model
```
To reproduce `classes-model`, run:
```bash
python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 95 --lambda_rc 95 --lambda_rcxyz 95 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 200 --snapshot 10 \
--device <GPU DEVICE ID> \
--dataset babel \
--datapath ./data/amass_db/babel_30fps_db.pt \
--folder ./exps/my-classes-model
```
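Checkpoints are written every `--snapshot` epochs with the `checkpoint_XXXX.pth.tar` naming seen above, so the final `my-paper-model` checkpoint should be usable with any of the visualization scripts, e.g.:
```bash
python -m src.visualize.text2motion ./exps/my-paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt
```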