ViDA-MAN: Visual Dialog with Digital Humans
Tong Shen¹, Jiawei Zuo¹, Fan Shi¹, Jin Zhang², Liqin Jiang², Meng Chen¹, Zhengchen Zhang¹, Wei Zhang¹, Xiaodong He¹, Tao Mei¹
¹JD AI Research, Beijing, China; ²Migu Culture Technology, Beijing, China
ABSTRACT
We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers real-time audio-visual responses to instant speech inquiries. Compared to traditional text- or voice-based systems, ViDA-MAN offers human-like interactions (e.g., vivid voice, natural facial expressions and body gestures). Given a speech request, the demonstration is able to respond with high-quality videos at sub-second latency. To deliver an immersive user experience, ViDA-MAN seamlessly integrates multi-modal techniques including Automatic Speech Recognition (ASR), multi-turn dialog, Text To Speech (TTS), and talking-head video generation. Backed by a large knowledge base, ViDA-MAN is able to chat with users on a number of topics including chit-chat, weather, device control, news recommendations, and hotel booking, as well as answering questions via structured knowledge.
CCS CONCEPTS
• Computing methodologies → Computer vision; Natural language processing; • Human-centered computing → Human computer interaction (HCI).
KEYWORDS
Multimodal Interaction, Digital Human, Dialog System, Speech
Recognition, Text to Speech, Talking-head Generation
ACM Reference Format:
Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang, Meng Chen, Zhengchen Zhang, Wei Zhang, Xiaodong He, and Tao Mei. 2021. ViDA-MAN: Visual Dialog with Digital Humans. In Proceedings of the 29th ACM Int'l Conference on Multimedia (MM '21), Oct. 20–24, 2021, Virtual Event, China. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3474085.3478560
1 INTRODUCTION
Digital humans are virtual avatars backed by Artificial Intelligence (AI), designed to behave like real humans. Agents powered by such systems can be applied in a wide range of scenarios such as personal assistants, customer service, and news broadcasting. In this paper, we present ViDA-MAN, a multi-modal interaction system for digital humans. The system is complex by nature, integrating multi-modal techniques such as ASR, TTS, dialog systems, and visual synthesis.
Figure 1: Illustration of our interactive demo. The digital characters have realistic appearance, voice, natural facial expressions and body motions, offering lifelike interaction experiences.
One of the core parts of a digital human system is the ability to receive signals from the user and output the corresponding feedback, which can be viewed as a chatbot. We develop a multi-type spoken dialogue system that handles user requests with multiple dialog skills, such as chit-chat, task-oriented dialog, and question answering over a knowledge graph.
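To make the skill-routing idea concrete, the following is a minimal sketch in Python; the intent classifier and the skill functions are hypothetical placeholders for illustration, not the actual ViDA-MAN implementation.

```python
# Minimal sketch of multi-skill dialog routing. All skill functions and
# the intent classifier below are hypothetical placeholders, not the
# actual ViDA-MAN implementation.
from typing import Callable, Dict, List

def chitchat_skill(query: str, history: List[str]) -> str:
    return "Happy to chat about that!"          # stub generative reply

def weather_skill(query: str, history: List[str]) -> str:
    return "It is sunny in Beijing today."      # stub weather lookup

def kgqa_skill(query: str, history: List[str]) -> str:
    return "Answer retrieved from the knowledge graph."  # stub KG-QA

SKILLS: Dict[str, Callable[[str, List[str]], str]] = {
    "chitchat": chitchat_skill,
    "weather": weather_skill,
    "kgqa": kgqa_skill,
}

def classify_intent(query: str) -> str:
    """Stand-in for a trained intent classifier."""
    q = query.lower()
    if "weather" in q:
        return "weather"
    if q.startswith(("who", "what", "when", "where")):
        return "kgqa"
    return "chitchat"

def respond(query: str, history: List[str]) -> str:
    """Route the query to a skill, keeping multi-turn context."""
    history.append(query)
    reply = SKILLS[classify_intent(query)](query, history)
    history.append(reply)
    return reply
```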
What makes ViDA-MAN different from a pure chatbot is its concrete visual appearance and voice, which express far more information than a pure text-based system, e.g., body language or facial expressions. The voice is empowered by a high-quality TTS system, consisting of a novel Duration Informed Auto-regressive Network (DIAN) [13] and a speaker-specific neural vocoder, LPCNet. The appearance is powered by neural rendering techniques [3, 5, 8–10, 15]. Different from graphics rendering engines [1, 2], neural renderers do not require a specific high-quality 3D model and are able to produce far more realistic visual results. Figure 1 shows some examples. In this paper, we present our digital human system, ViDA-MAN, to draw more attention to multi-modal interaction systems.
2 SYSTEM ARCHITECTURE
The whole system is designed to pursue low latency and high visual fidelity, seeking intelligent, real-time interactions with a lifelike digital character. As shown in Figure 2, the system mainly consists of six modules. The system accepts human voice through an ASR module and feeds the transcription to a dialog system. The response is further converted to realistic voice using TTS. A driving system and a rendering system are responsible for updating the appearance. A streaming service integrates everything and encodes it into a media stream sent back to the user.
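For readers who prefer code, the end-to-end flow of the six modules can be sketched as below. All class and method names are hypothetical placeholders rather than the actual ViDA-MAN interfaces; in the real system these stages run as streaming services to keep latency sub-second.

```python
# Illustrative end-to-end flow of the six modules. Every class below is a
# stub standing in for a real service; the interfaces are assumptions for
# this sketch, not the actual ViDA-MAN APIs.

class ASR:
    def transcribe(self, audio: bytes) -> str:
        return "what is the weather today"      # stub transcription

class Dialog:
    def respond(self, text: str) -> str:
        return "It is sunny in Beijing today."  # stub dialog response

class TTS:
    def synthesize(self, text: str) -> bytes:
        return b"\x00" * 16000                  # stub waveform

class Driver:
    def animate(self, wave: bytes) -> list:
        return [{"mouth_open": 0.5}]            # stub motion parameters

class Renderer:
    def render(self, frames: list) -> list:
        return [b"frame"] * len(frames)         # stub video frames

class Streamer:
    def encode(self, video: list, wave: bytes) -> bytes:
        return b"".join(video) + wave           # stub audio-video muxing

def handle_utterance(audio: bytes) -> bytes:
    """One pass through the pipeline: voice in, audio-visual stream out."""
    text = ASR().transcribe(audio)         # speech -> text
    reply = Dialog().respond(text)         # text   -> response text
    wave = TTS().synthesize(reply)         # text   -> voice
    frames = Driver().animate(wave)        # voice  -> motion parameters
    video = Renderer().render(frames)      # motion -> video frames
    return Streamer().encode(video, wave)  # mux into a media stream
```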