# Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
This repository contains the implementation of the following paper:
> **Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation**
>
> Yuanxun Lu, [Jinxiang Chai](https://scholar.google.com/citations?user=OcN1_gwAAAAJ&hl=zh-CN&oi=ao), [Xun Cao](https://cite.nju.edu.cn/People/Faculty/20190621/i5054.html) *(SIGGRAPH Asia 2021)*
>
> **Abstract**: To the best of our knowledge, we first present a live system that generates personalized photorealistic talking-head animation only driven by audio signals at over 30 fps. Our system contains three stages. The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions, where the former is generated by an autoregressive probabilistic model which models the head pose distribution of the target person. Upper body motions are deduced from head poses. In the final stage, we generate conditional feature maps from previous predictions and send them with a candidate image set to an image-to-image translation network to synthesize photorealistic renderings. Our method generalizes well to wild audio and successfully synthesizes high-fidelity personalized facial details, e.g., wrinkles, teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
>
> [[Project Page]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/) [[Paper]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/resources/SIGGRAPH_Asia_2021__Live_Speech_Portraits__Real_Time_Photorealistic_Talking_Head_Animation.pdf) [[Arxiv]](https://arxiv.org/abs/2109.10595) [[Web Demo]](https://replicate.ai/yuanxunlu/livespeechportraits)
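The three-stage pipeline from the abstract can be sketched as a per-frame loop. Everything below is a high-level illustration with placeholder functions and dummy tensors, not the actual API or models of this repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_audio_features(audio_window) -> np.ndarray:
    """Stage 1a (placeholder): a deep network maps a short audio window
    to a per-frame feature vector."""
    return rng.standard_normal(256)

def manifold_projection(feat: np.ndarray, speech_space: np.ndarray) -> np.ndarray:
    """Stage 1b (placeholder): pull the feature toward the target person's
    speech space, here by blending with its nearest stored feature."""
    nearest = speech_space[np.argmin(np.linalg.norm(speech_space - feat, axis=1))]
    return 0.5 * feat + 0.5 * nearest

def predict_motion(feat: np.ndarray, prev_pose: np.ndarray):
    """Stage 2 (placeholder): mouth-related features, an autoregressive
    head-pose sample conditioned on the previous pose, and upper-body
    motion deduced from the head pose."""
    mouth = feat[:25]
    head_pose = 0.9 * prev_pose + 0.1 * rng.standard_normal(6)
    upper_body = 0.5 * head_pose
    return mouth, head_pose, upper_body

def render_frame(mouth, head_pose, upper_body) -> np.ndarray:
    """Stage 3 (placeholder): rasterize predictions into conditional feature
    maps and run image-to-image translation; here just a blank frame."""
    return np.zeros((512, 512, 3), dtype=np.uint8)

# Per-frame loop over an audio stream (two dummy frames shown).
speech_space = rng.standard_normal((100, 256))
pose = np.zeros(6)
frames = []
for _ in range(2):
    feat = manifold_projection(extract_audio_features(None), speech_space)
    mouth, pose, body = predict_motion(feat, pose)
    frames.append(render_frame(mouth, pose, body))
```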
![Teaser](./doc/Teaser.jpg)
Figure 1. Given an arbitrary input audio stream, our system generates personalized and photorealistic talking-head animation in real-time. Right: May and Obama are driven by the same utterance but present different speaking characteristics.
<a href="https://replicate.ai/yuanxunlu/livespeechportraits"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo%20and%20Docker%20Image&color=blue"></a>
## Requirements
- This project was trained and tested on Windows 10 with PyTorch 1.7 (Python 3.6). Linux and lower PyTorch versions should also work (not tested). We recommend creating a new environment:
```
conda create -n LSP python=3.6
conda activate LSP
```
- Clone the repository:
```
git clone https://github.com/YuanxunLu/LiveSpeechPortraits.git
cd LiveSpeechPortraits
```
- FFmpeg is required to combine the audio with the silent generated videos. Please check [FFmpeg](http://ffmpeg.org/download.html) for installation instructions. Linux users can also install it via:
```
sudo apt-get install ffmpeg
```
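The muxing step that FFmpeg performs (combining a silent video with its driving audio) conceptually amounts to a command like the one built below. The file paths and codec flags are illustrative assumptions, not the exact command the demo script runs:

```python
# Build an illustrative FFmpeg command that muxes a silent video with its
# driving audio. File names are placeholders; the demo script performs
# this step for you.
def mux_command(video_path: str, audio_path: str, out_path: str) -> list:
    return [
        "ffmpeg", "-y",
        "-i", video_path,      # silent generated video
        "-i", audio_path,      # driving audio track
        "-c:v", "copy",        # keep the video stream untouched
        "-c:a", "aac",         # encode audio to AAC
        "-shortest",           # stop at the shorter stream
        out_path,
    ]

cmd = mux_command("results/May.avi", "data/Input/00083.wav", "results/May.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually invoke FFmpeg
```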
- Install the dependencies:
```
pip install -r requirements.txt
```
## Demo
- Download the pre-trained models and data from [Google Drive](https://drive.google.com/drive/folders/1sHc2xEEGwnb0h2rkUhG9sPmOxvRvPVpJ?usp=sharing) to the `data` folder. Data for five subjects are released (May, Obama1, Obama2, Nadella, and McStay).
- Run the demo:
```
python demo.py --id May --driving_audio ./data/Input/00083.wav --device cuda
```
Results can be found under the `results` folder.
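To drive several of the released subjects with the same clip, a small wrapper can loop over the demo CLI shown above. The subject names come from the download list; the flags mirror the demo command, but treat this as a convenience sketch rather than a supported batch interface:

```python
import subprocess
import sys

SUBJECTS = ["May", "Obama1", "Obama2", "Nadella", "McStay"]

def demo_command(subject: str, audio: str) -> list:
    # Mirrors the single-subject demo invocation above.
    return [sys.executable, "demo.py",
            "--id", subject,
            "--driving_audio", audio,
            "--device", "cuda"]

commands = [demo_command(s, "./data/Input/00083.wav") for s in SUBJECTS]
# for cmd in commands:
#     subprocess.run(cmd, check=True)  # uncomment to run sequentially
```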
- **(New!) Docker and Web Demo**
We are really grateful to [Andreas](https://github.com/andreasjansson) from [Replicate](https://replicate.ai/home) for his amazing work building the web demo! You can now run the [Demo](https://replicate.ai/yuanxunlu/livespeechportraits) in your browser.
## Citation
If you find this project useful for your research, please consider citing:
```
@article{lu2021live,
author = {Lu, Yuanxun and Chai, Jinxiang and Cao, Xun},
title = {{Live Speech Portraits}: Real-Time Photorealistic Talking-Head Animation},
journal = {ACM Transactions on Graphics},
numpages = {17},
volume={40},
number={6},
month = dec,
year = {2021},
doi={10.1145/3478513.3480484}
}
```
## Acknowledgment
- This repo is built on the framework of [pix2pix-pytorch](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
- Thanks to the authors of [MakeItTalk](https://github.com/adobe-research/MakeItTalk), [ATVG](https://github.com/lelechen63/ATVGnet), [RhythmicHead](https://github.com/lelechen63/Talking-head-Generation-with-Rhythmic-Head-Motion), and [Speech-Driven Animation](https://github.com/DinoMan/speech-driven-animation) for making their excellent work and code publicly available.
- Thanks to [Andreas](https://github.com/andreasjansson) for his efforts on the web demo.