# Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
This repository contains the implementation of the following paper:
> **Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation**
>
> Yuanxun Lu, [Jinxiang Chai](https://scholar.google.com/citations?user=OcN1_gwAAAAJ&hl=zh-CN&oi=ao), [Xun Cao](https://cite.nju.edu.cn/People/Faculty/20190621/i5054.html) *(SIGGRAPH Asia 2021)*
>
> **Abstract**: To the best of our knowledge, we first present a live system that generates personalized photorealistic talking-head animation only driven by audio signals at over 30 fps. Our system contains three stages. The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions, where the former is generated by an autoregressive probabilistic model which models the head pose distribution of the target person. Upper body motions are deduced from head poses. In the final stage, we generate conditional feature maps from previous predictions and send them with a candidate image set to an image-to-image translation network to synthesize photorealistic renderings. Our method generalizes well to wild audio and successfully synthesizes high-fidelity personalized facial details, e.g., wrinkles, teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
>
> [[Project Page]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/) [[Paper]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/resources/SIGGRAPH_Asia_2021__Live_Speech_Portraits__Real_Time_Photorealistic_Talking_Head_Animation.pdf) [[Arxiv]](https://arxiv.org/abs/2109.10595) [[Web Demo]](https://replicate.ai/yuanxunlu/livespeechportraits)
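The three-stage pipeline from the abstract can be sketched as a per-frame loop. Everything below is a high-level illustration with placeholder functions and dummy tensors, not the actual API or models of this repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_audio_features(audio_window) -> np.ndarray:
    """Stage 1a (placeholder): a deep network maps a short audio window
    to a per-frame feature vector."""
    return rng.standard_normal(256)

def manifold_projection(feat: np.ndarray, speech_space: np.ndarray) -> np.ndarray:
    """Stage 1b (placeholder): pull the feature toward the target person's
    speech space, here by blending with its nearest stored feature."""
    nearest = speech_space[np.argmin(np.linalg.norm(speech_space - feat, axis=1))]
    return 0.5 * feat + 0.5 * nearest

def predict_motion(feat: np.ndarray, prev_pose: np.ndarray):
    """Stage 2 (placeholder): mouth-related features, an autoregressive
    head-pose sample conditioned on the previous pose, and upper-body
    motion deduced from the head pose."""
    mouth = feat[:25]
    head_pose = 0.9 * prev_pose + 0.1 * rng.standard_normal(6)
    upper_body = 0.5 * head_pose
    return mouth, head_pose, upper_body

def render_frame(mouth, head_pose, upper_body) -> np.ndarray:
    """Stage 3 (placeholder): rasterize predictions into conditional feature
    maps and run image-to-image translation; here just a blank frame."""
    return np.zeros((512, 512, 3), dtype=np.uint8)

# Per-frame loop over an audio stream (two dummy frames shown).
speech_space = rng.standard_normal((100, 256))
pose = np.zeros(6)
frames = []
for _ in range(2):
    feat = manifold_projection(extract_audio_features(None), speech_space)
    mouth, pose, body = predict_motion(feat, pose)
    frames.append(render_frame(mouth, pose, body))
```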
![Teaser](./doc/Teaser.jpg)
Figure 1. Given an arbitrary input audio stream, our system generates personalized and photorealistic talking-head animation in real-time. Right: May and Obama are driven by the same utterance but present different speaking characteristics.
<a href="https://replicate.ai/yuanxunlu/livespeechportraits"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo%20and%20Docker%20Image&color=blue"></a>
## Requirements
- This project was trained and tested on Windows 10 with PyTorch 1.7 (Python 3.6). Linux and lower PyTorch versions should also work (not tested). We recommend creating a new environment:
```
conda create -n LSP python=3.6
conda activate LSP
```
- Clone the repository:
```
git clone https://github.com/YuanxunLu/LiveSpeechPortraits.git
cd LiveSpeechPortraits
```
- FFmpeg is required to combine the audio with the silent generated videos. Please check [FFmpeg](http://ffmpeg.org/download.html) for installation instructions. Linux users can also install it via:
```
sudo apt-get install ffmpeg
```
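The muxing step that FFmpeg performs (combining a silent video with its driving audio) conceptually amounts to a command like the one built below. The file paths and codec flags are illustrative assumptions, not the exact command the demo script runs:

```python
# Build an illustrative FFmpeg command that muxes a silent video with its
# driving audio. File names are placeholders; the demo script performs
# this step for you.
def mux_command(video_path: str, audio_path: str, out_path: str) -> list:
    return [
        "ffmpeg", "-y",
        "-i", video_path,      # silent generated video
        "-i", audio_path,      # driving audio track
        "-c:v", "copy",        # keep the video stream untouched
        "-c:a", "aac",         # encode audio to AAC
        "-shortest",           # stop at the shorter stream
        out_path,
    ]

cmd = mux_command("results/May.avi", "data/Input/00083.wav", "results/May.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually invoke FFmpeg
```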
- Install the dependencies:
```
pip install -r requirements.txt
```
## Demo
- Download the pre-trained models and data from [Google Drive](https://drive.google.com/drive/folders/1sHc2xEEGwnb0h2rkUhG9sPmOxvRvPVpJ?usp=sharing) to the `data` folder. Data for five subjects are released (May, Obama1, Obama2, Nadella, and McStay).
- Run the demo:
```
python demo.py --id May --driving_audio ./data/Input/00083.wav --device cuda
```
Results can be found under the `results` folder.
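To drive several of the released subjects with the same clip, a small wrapper can loop over the demo CLI shown above. The subject names come from the download list; the flags mirror the demo command, but treat this as a convenience sketch rather than a supported batch interface:

```python
import subprocess
import sys

SUBJECTS = ["May", "Obama1", "Obama2", "Nadella", "McStay"]

def demo_command(subject: str, audio: str) -> list:
    # Mirrors the single-subject demo invocation above.
    return [sys.executable, "demo.py",
            "--id", subject,
            "--driving_audio", audio,
            "--device", "cuda"]

commands = [demo_command(s, "./data/Input/00083.wav") for s in SUBJECTS]
# for cmd in commands:
#     subprocess.run(cmd, check=True)  # uncomment to run sequentially
```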
- **(New!) Docker and Web Demo**
We are really grateful to [Andreas](https://github.com/andreasjansson) from [Replicate](https://replicate.ai/home) for his amazing work building the web demo! You can now run the [Demo](https://replicate.ai/yuanxunlu/livespeechportraits) in your browser.
## Citation
If you find this project useful for your research, please consider citing:
```
@article{lu2021live,
author = {Lu, Yuanxun and Chai, Jinxiang and Cao, Xun},
title = {{Live Speech Portraits}: Real-Time Photorealistic Talking-Head Animation},
journal = {ACM Transactions on Graphics},
numpages = {17},
volume={40},
number={6},
month = dec,
year = {2021},
doi={10.1145/3478513.3480484}
}
```
## Acknowledgment
- This repo is built on the framework of [pix2pix-pytorch](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
- Thanks to the authors of [MakeItTalk](https://github.com/adobe-research/MakeItTalk), [ATVG](https://github.com/lelechen63/ATVGnet), [RhythmicHead](https://github.com/lelechen63/Talking-head-Generation-with-Rhythmic-Head-Motion), and [Speech-Driven Animation](https://github.com/DinoMan/speech-driven-animation) for making their excellent work and code publicly available.
- Thanks to [Andreas](https://github.com/andreasjansson) for his efforts on the web demo.