# Digital Human Intelligent Dialogue System - Linly-Talker â 'Interactive Dialogue with Your Virtual Self'
<div align="center">
<h1>Linly-Talker WebUI</h1>
[![madewithlove](https://img.shields.io/badge/made_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange)](https://github.com/Kedreamix/Linly-Talker)
<img src="docs/linly_logo.png" /><br>
[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/Kedreamix/Linly-Talker/blob/main/colab_webui.ipynb)
[![Licence](https://img.shields.io/badge/LICENSE-MIT-green.svg?style=for-the-badge)](https://github.com/Kedreamix/Linly-Talker/blob/main/LICENSE)
[![Huggingface](https://img.shields.io/badge/ð¤%20-Models%20Repo-yellow.svg?style=for-the-badge)](https://huggingface.co/Kedreamix/Linly-Talker)
[**English**](./README.md) | [**ä¸æç®ä½**](./README_zh.md)
</div>
**2023.12 Update** ð
**Users can upload any images for the conversation**
**2024.01 Update** ðð
- **Exciting news! I've now incorporated both the powerful GeminiPro and Qwen large models into our conversational scene. Users can now upload images during the conversation, adding a whole new dimension to the interactions.**
- **The deployment invocation method for FastAPI has been updated.**
- **The advanced settings options for Microsoft TTS have been updated, increasing the variety of voice types. Additionally, video subtitles have been introduced to enhance visualization.**
- **Updated the GPT multi-turn conversation system to establish contextual connections in dialogue, enhancing the interactivity and realism of the digital persona.**
**2024.02 Update** ð
- **Updated Gradio to the latest version 4.16.0, providing the interface with additional functionalities such as capturing images from the camera to create digital personas, among others.**
- **ASR and THG have been updated. FunASR from Alibaba has been integrated into ASR, enhancing its speed significantly. Additionally, the THG section now incorporates the Wav2Lip model, while ER-NeRF is currently in preparation (Coming Soon).**
- **I have incorporated the GPT-SoVITS model, which is a voice cloning method. By fine-tuning it with just one minute of a person's speech data, it can effectively clone their voice. The results are quite impressive and worth recommending.**
- **I have integrated a web user interface (WebUI) that allows for better execution of Linly-Talker.**
**2024.04 Update** ð
- **Updated the offline mode for Paddle TTS, excluding Edge TTS.**
- **Updated ER-NeRF as one of the choices for Avatar generation.**
- **Updated app_talk.py to allow for the free upload of voice and images/videos for generation without being based on a dialogue scenario.**
**2024.05 Update** ð
- **Updated the beginner-friendly AutoDL deployment tutorial, and also updated the codewithgpu image, allowing for one-click experience and learning.**
- **Updated WebUI.py: Linly-Talker WebUI now supports multiple modules, multiple models, and multiple options**
**2024.06 Update** ð
- **Integrated MuseTalk into Linly-Talker and updated the WebUI, enabling basic real-time conversation capabilities.**
- **The refined WebUI defaults to not loading the LLM model to reduce GPU memory usage. It directly responds with text to complete voiceovers. The enhanced WebUI features three main functions: personalized character generation, multi-turn intelligent dialogue with digital humans, and real-time MuseTalk conversations. These improvements reduce previous GPU memory redundancies and add more prompts to assist users effectively.**
**2024.08 Update** ð
- **Updated CosyVoice to offer high-quality text-to-speech (TTS) functionality and voice cloning capabilities; also upgraded to Wav2Lipv2 to enhance overall performance.**
**2024.09 Update** ð
- **Added Linly-Talker API documentation, providing detailed interface descriptions to help users access Linly-Talkerâs features via the API.**
---
<details>
<summary>Content</summary>
<!-- TOC -->
- [Digital Human Intelligent Dialogue System - Linly-Talker â 'Interactive Dialogue with Your Virtual Self'](#digital-human-intelligent-dialogue-system---linly-talker--interactive-dialogue-with-your-virtual-self)
- [Introduction](#introduction)
- [TO DO LIST](#to-do-list)
- [Example](#example)
- [Setup Environment](#setup-environment)
- [API Documentation](#api-documentation)
- [ASR - Speech Recognition](#asr---speech-recognition)
- [Whisper](#whisper)
- [FunASR](#funasr)
- [Coming Soon](#coming-soon)
- [TTS - Text To Speech](#tts---text-to-speech)
- [Edge TTS](#edge-tts)
- [PaddleTTS](#paddletts)
- [Coming Soon](#coming-soon-1)
- [Voice Clone](#voice-clone)
- [GPT-SoVITSï¼Recommendï¼](#gpt-sovitsrecommend)
- [XTTS](#xtts)
- [CosyVoice](#cosyvoice)
- [Coming Soon](#coming-soon-2)
- [THG - Avatar](#thg---avatar)
- [SadTalker](#sadtalker)
- [Wav2Lip](#wav2lip)
- [Wav2Lipv2](#wav2lipv2)
- [ER-NeRF](#er-nerf)
- [MuseTalk](#musetalk)
- [Coming Soon](#coming-soon-3)
- [LLM - Conversation](#llm---conversation)
- [Linly-AI](#linly-ai)
- [Qwen](#qwen)
- [Gemini-Pro](#gemini-pro)
- [ChatGPT](#chatgpt)
- [ChatGLM](#chatglm)
- [GPT4Free](#gpt4free)
- [LLM Multiple Model Selection](#llm-multiple-model-selection)
- [Coming Soon](#coming-soon-4)
- [Optimizations](#optimizations)
- [Gradio](#gradio)
- [Start WebUI](#start-webui)
- [WebUI](#webui)
- [Old Verison](#old-verison)
- [Folder structure](#folder-structure)
- [Reference](#reference)
- [License](#license)
- [Star History](#star-history)
<!-- /TOC -->
</details>
## Introduction
Linly-Talker is an innovative digital human conversation system that integrates the latest artificial intelligence technologies, including Large Language Models (LLM) ð¤, Automatic Speech Recognition (ASR) ðï¸, Text-to-Speech (TTS) ð£ï¸, and voice cloning technology ð¤. This system offers an interactive web interface through the Gradio platform ð, allowing users to upload images ð· and engage in personalized dialogues with AI ð¬.
The core features of the system include:
1. **Multi-Model Integration**: Linly-Talker combines major models such as Linly, GeminiPro, Qwen, as well as visual models like Whisper, SadTalker, to achieve high-quality dialogues and visual generation.
2. **Multi-Turn Conversational Ability**: Through the multi-turn dialogue system powered by GPT models, Linly-Talker can understand and maintain contextually relevant and coherent conversations, significantly enhancing the authenticity of the interaction.
3. **Voice Cloning**: Utilizing technologies like GPT-SoVITS, users can upload a one-minute voice sample for fine-tuning, and the system will clone the user's voice, enabling the digital human to converse in the user's voice.
4. **Real-Time Interaction**: The system supports real-time speech recognition and video captioning, allowing users to communicate naturally with the digital human via voice.
5. **Visual Enhancement**: With digital human generation technologies, Linly-Talker can create realistic digital human avatars, providing a more immersive experience.
The design philosophy of Linly-Talker is to create a new form of human-computer interaction that goes beyond simple Q&A. By integrating advanced technologies, it offers an intelligent digital human capable of understanding, responding to, and simulating human communication.
![The system architecture of multimodal humanâcomputer interaction.](docs/HOI_en.png)
> [!NOTE]
>
> You can watch the demo video [here](https://www.bilibili.com/video/BV1rN4y1a76x/).
>
> I have recorded a series of videos on Bilibili, which also represent every step of my updates and methods of use. For detailed information, please refer to [Digital Human Dialogue System - Linly-Talker Collection](https://space.bilib
没有合适的资源?快使用搜索试试~ 我知道了~
温馨提示
该资源是数字人与大模型结合项目,专注于利用现代AI技术实现数字人生成、语音合成以及虚拟主播的相关功能。该项目整合了多个最新的模型和工具,旨在为开发者提供创建虚拟人物和语音交互系统的高效框架。是一个基于多种深度学习模型的框架,支持包括面部生成、语音驱动、自然语言处理、虚拟人物动画等功能的实现。它整合了多个开源AI模型,如 NeRF(神经辐射场)、FunASR、ERNeRF 等,能够完成数字人的语音生成、语音识别、面部动画生成以及基于自然语言的对话系统。
资源推荐
资源详情
资源评论
收起资源包目录
该资源是数字人与大模型结合项目,专注于利用现代AI技术实现数字人生成、语音合成以及虚拟主播的相关功能 (511个子文件)
bindings.cpp 2KB
bindings.cpp 282B
bindings.cpp 282B
bindings.cpp 268B
log.csv 1KB
raymarching.cu 78KB
shencoder.cu 37KB
gridencoder.cu 20KB
freqencoder.cu 4KB
.gitignore 155B
.gitignore 13B
.gitmodules 259B
raymarching.h 7KB
gridencoder.h 966B
freqencoder.h 549B
shencoder.h 439B
colab_webui.ipynb 842KB
full4.jpeg 26KB
Alipay.jpg 199KB
UI2.jpg 156KB
QR.jpg 143KB
WeChatpay.jpg 142KB
UI.jpg 62KB
vocab.json 914KB
vocab.json 779KB
english.json 55KB
s2.json 2KB
tokenizer_config.json 604B
tokenizer_config.json 236B
special_tokens_map.json 90B
special_tokens_map.json 90B
added_tokens.json 25B
LICENSE 1KB
BBRegressorParam_r.mat 22KB
boy.mat 2KB
girl.mat 2KB
similarity_Lm3D_all.mat 994B
README.md 51KB
README_zh.md 44KB
常见问题汇总.md 23KB
README.md 8KB
AutoDL部署.md 8KB
README.md 8KB
README.md 7KB
speed_benchmark.md 6KB
README.md 4KB
README.md 4KB
README.md 3KB
README.md 2KB
install.md 2KB
README.md 1KB
Certificate.md 1KB
eval.md 655B
README.md 498B
README.md 209B
README.md 209B
README.md 33B
modelzoo.md 0B
seaside4_musev.mp4 17.74MB
man_musev.mp4 2.22MB
sun_musev.mp4 2.12MB
monalisa_musev.mp4 2.02MB
yongen_musev.mp4 1.78MB
sit_musev.mp4 1000KB
musk_musev.mp4 441KB
mel_filters.npz 2KB
key.pem 3KB
cert.pem 2KB
engdict_cache.pickle 6.23MB
art_4.png 3.46MB
art_8.png 2.97MB
art_17.png 2MB
art_16.png 1.41MB
art_3.png 1.29MB
boy.png 1.29MB
art_9.png 1.2MB
art_5.png 1.17MB
GPT-SoVITS.png 882KB
UI3.png 854KB
art_2.png 812KB
art_0.png 733KB
UI.png 705KB
art_12.png 704KB
art_20.png 694KB
WebUI.png 665KB
art_15.png 657KB
art_14.png 635KB
full3.png 617KB
art_13.png 617KB
art_10.png 556KB
art_7.png 509KB
art_1.png 478KB
art_11.png 477KB
art_19.png 462KB
UI5.png 443KB
XTTS.png 433KB
UI2.png 410KB
WebUI3.png 373KB
UI4.png 311KB
HOI.png 239KB
共 511 条
- 1
- 2
- 3
- 4
- 5
- 6
资源评论
萧鼎
- 粉丝: 2w+
- 资源: 109
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功