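This resource is a project that combines digital humans with large language models. It focuses on using modern AI techniques for digital human generation, speech synthesis, and virtual-streamer features.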

511 files in total

py: 349

png: 49

md: 21

Tags: natural language processing, artificial intelligence, python

Uploaded 2024-10-14 14:48:53 · 78.28 MB ZIP · 124 views

Archive contents (511 files):

bindings.cpp 2KB

bindings.cpp 282B

bindings.cpp 268B

log.csv 1KB

raymarching.cu 78KB

shencoder.cu 37KB

gridencoder.cu 20KB

freqencoder.cu 4KB

.gitignore 155B

.gitignore 13B

.gitmodules 259B

raymarching.h 7KB

gridencoder.h 966B

freqencoder.h 549B

shencoder.h 439B

colab_webui.ipynb 842KB

full4.jpeg 26KB

Alipay.jpg 199KB

UI2.jpg 156KB

QR.jpg 143KB

WeChatpay.jpg 142KB

UI.jpg 62KB

vocab.json 914KB

vocab.json 779KB

english.json 55KB

s2.json 2KB

tokenizer_config.json 604B

tokenizer_config.json 236B

special_tokens_map.json 90B

added_tokens.json 25B

LICENSE 1KB

BBRegressorParam_r.mat 22KB

boy.mat 2KB

girl.mat 2KB

similarity_Lm3D_all.mat 994B

README.md 51KB

README_zh.md 44KB

常见问题汇总.md 23KB

README.md 8KB

AutoDL部署.md 8KB

README.md 8KB

README.md 7KB

speed_benchmark.md 6KB

README.md 4KB

README.md 3KB

README.md 2KB

install.md 2KB

README.md 1KB

Certificate.md 1KB

eval.md 655B

README.md 498B

README.md 209B

README.md 33B

modelzoo.md 0B

seaside4_musev.mp4 17.74MB

man_musev.mp4 2.22MB

sun_musev.mp4 2.12MB

monalisa_musev.mp4 2.02MB

yongen_musev.mp4 1.78MB

sit_musev.mp4 1000KB

musk_musev.mp4 441KB

mel_filters.npz 2KB

key.pem 3KB

cert.pem 2KB

engdict_cache.pickle 6.23MB

art_4.png 3.46MB

art_8.png 2.97MB

art_17.png 2MB

art_16.png 1.41MB

art_3.png 1.29MB

boy.png 1.29MB

art_9.png 1.2MB

art_5.png 1.17MB

GPT-SoVITS.png 882KB

UI3.png 854KB

art_2.png 812KB

art_0.png 733KB

UI.png 705KB

art_12.png 704KB

art_20.png 694KB

WebUI.png 665KB

art_15.png 657KB

art_14.png 635KB

full3.png 617KB

art_13.png 617KB

art_10.png 556KB

art_7.png 509KB

art_1.png 478KB

art_11.png 477KB

art_19.png 462KB

UI5.png 443KB

XTTS.png 433KB

UI2.png 410KB

WebUI3.png 373KB

UI4.png 311KB

HOI.png 239KB


# Digital Human Intelligent Dialogue System - Linly-Talker – 'Interactive Dialogue with Your Virtual Self'

<div align="center">
<h1>Linly-Talker WebUI</h1>

[![madewithlove](https://img.shields.io/badge/made_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange)](https://github.com/Kedreamix/Linly-Talker)

<img src="docs/linly_logo.png" /><br>

[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/Kedreamix/Linly-Talker/blob/main/colab_webui.ipynb)
[![Licence](https://img.shields.io/badge/LICENSE-MIT-green.svg?style=for-the-badge)](https://github.com/Kedreamix/Linly-Talker/blob/main/LICENSE)
[![Huggingface](https://img.shields.io/badge/🤗%20-Models%20Repo-yellow.svg?style=for-the-badge)](https://huggingface.co/Kedreamix/Linly-Talker)

[**English**](./README.md) | [**中文简体**](./README_zh.md)

</div>

**2023.12 Update**

**Users can upload any images for the conversation**

**2024.01 Update**

- **Exciting news! I've now incorporated both the powerful GeminiPro and Qwen large models into our conversational scene. Users can now upload images during the conversation, adding a whole new dimension to the interactions.**
- **The deployment invocation method for FastAPI has been updated.**
- **The advanced settings options for Microsoft TTS have been updated, increasing the variety of voice types. Additionally, video subtitles have been introduced to enhance visualization.**
- **Updated the GPT multi-turn conversation system to establish contextual connections in dialogue, enhancing the interactivity and realism of the digital persona.**

**2024.02 Update**

- **Updated Gradio to the latest version 4.16.0, providing the interface with additional functionalities such as capturing images from the camera to create digital personas, among others.**
- **ASR and THG have been updated. FunASR from Alibaba has been integrated into ASR, enhancing its speed significantly. Additionally, the THG section now incorporates the Wav2Lip model, while ER-NeRF is currently in preparation (Coming Soon).**
- **I have incorporated the GPT-SoVITS model, which is a voice cloning method. By fine-tuning it with just one minute of a person's speech data, it can effectively clone their voice. The results are quite impressive and worth recommending.**
- **I have integrated a web user interface (WebUI) that allows for better execution of Linly-Talker.**

**2024.04 Update**

- **Updated the offline mode for Paddle TTS, excluding Edge TTS.**
- **Updated ER-NeRF as one of the choices for Avatar generation.**
- **Updated app_talk.py to allow for the free upload of voice and images/videos for generation without being based on a dialogue scenario.**

**2024.05 Update**

- **Updated the beginner-friendly AutoDL deployment tutorial, and also updated the codewithgpu image, allowing for one-click experience and learning.**
- **Updated WebUI.py: Linly-Talker WebUI now supports multiple modules, multiple models, and multiple options**

**2024.06 Update**

- **Integrated MuseTalk into Linly-Talker and updated the WebUI, enabling basic real-time conversation capabilities.**
- **The refined WebUI defaults to not loading the LLM model to reduce GPU memory usage. It directly responds with text to complete voiceovers. The enhanced WebUI features three main functions: personalized character generation, multi-turn intelligent dialogue with digital humans, and real-time MuseTalk conversations. These improvements reduce previous GPU memory redundancies and add more prompts to assist users effectively.**

**2024.08 Update**

- **Updated CosyVoice to offer high-quality text-to-speech (TTS) functionality and voice cloning capabilities; also upgraded to Wav2Lipv2 to enhance overall performance.**

**2024.09 Update**

- **Added Linly-Talker API documentation, providing detailed interface descriptions to help users access Linly-Talker's features via the API.**

---

<details>
<summary>Content</summary>

- [Digital Human Intelligent Dialogue System - Linly-Talker – 'Interactive Dialogue with Your Virtual Self'](#digital-human-intelligent-dialogue-system---linly-talker--interactive-dialogue-with-your-virtual-self)
  - [Introduction](#introduction)
  - [TO DO LIST](#to-do-list)
  - [Example](#example)
  - [Setup Environment](#setup-environment)
  - [API Documentation](#api-documentation)
  - [ASR - Speech Recognition](#asr---speech-recognition)
    - [Whisper](#whisper)
    - [FunASR](#funasr)
    - [Coming Soon](#coming-soon)
  - [TTS - Text To Speech](#tts---text-to-speech)
    - [Edge TTS](#edge-tts)
    - [PaddleTTS](#paddletts)
    - [Coming Soon](#coming-soon-1)
  - [Voice Clone](#voice-clone)
    - [GPT-SoVITS（Recommend）](#gpt-sovitsrecommend)
    - [XTTS](#xtts)
    - [CosyVoice](#cosyvoice)
    - [Coming Soon](#coming-soon-2)
  - [THG - Avatar](#thg---avatar)
    - [SadTalker](#sadtalker)
    - [Wav2Lip](#wav2lip)
    - [Wav2Lipv2](#wav2lipv2)
    - [ER-NeRF](#er-nerf)
    - [MuseTalk](#musetalk)
    - [Coming Soon](#coming-soon-3)
  - [LLM - Conversation](#llm---conversation)
    - [Linly-AI](#linly-ai)
    - [Qwen](#qwen)
    - [Gemini-Pro](#gemini-pro)
    - [ChatGPT](#chatgpt)
    - [ChatGLM](#chatglm)
    - [GPT4Free](#gpt4free)
    - [LLM Multiple Model Selection](#llm-multiple-model-selection)
    - [Coming Soon](#coming-soon-4)
  - [Optimizations](#optimizations)
  - [Gradio](#gradio)
  - [Start WebUI](#start-webui)
    - [WebUI](#webui)
    - [Old Version](#old-verison)
  - [Folder structure](#folder-structure)
  - [Reference](#reference)
  - [License](#license)
  - [Star History](#star-history)

</details>

## Introduction

Linly-Talker is a digital human conversation system that integrates several artificial intelligence technologies, including Large Language Models (LLM), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and voice cloning. The system offers an interactive web interface built with Gradio, allowing users to upload images and engage in personalized dialogues with AI.

The core features of the system include:

1. **Multi-Model Integration**: Linly-Talker combines large language models such as Linly, GeminiPro, and Qwen with speech and visual models such as Whisper and SadTalker to achieve high-quality dialogue and visual generation.
2. **Multi-Turn Conversational Ability**: Through the multi-turn dialogue system powered by GPT models, Linly-Talker can keep track of context and hold coherent conversations, which makes the interaction feel more authentic.
3. **Voice Cloning**: Using technologies like GPT-SoVITS, users can upload a one-minute voice sample for fine-tuning, and the system will clone the user's voice, enabling the digital human to converse in the user's voice.
4. **Real-Time Interaction**: The system supports real-time speech recognition and video captioning, allowing users to communicate naturally with the digital human via voice.
5. **Visual Enhancement**: With digital human generation technologies, Linly-Talker can create realistic digital human avatars, providing a more immersive experience.

The design philosophy of Linly-Talker is to create a new form of human-computer interaction that goes beyond simple Q&A. By integrating these technologies, it offers an intelligent digital human capable of understanding, responding to, and simulating human communication.
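
The feature list above implies a chain of stages: speech recognition, a language model that sees earlier turns, speech synthesis, and avatar rendering. As a rough sketch only, the Python below wires four placeholder callables together in that order. `TalkerPipeline`, `step`, and the `fake_*` stubs are hypothetical names introduced for illustration; they are not Linly-Talker's actual classes or API, and the real project may organize these stages differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user question, assistant answer)


@dataclass
class TalkerPipeline:
    """Hypothetical ASR -> LLM -> TTS -> avatar chain (not the project's real API)."""

    asr: Callable[[bytes], str]              # audio bytes -> recognized text
    llm: Callable[[List[Turn], str], str]    # (history, question) -> answer
    tts: Callable[[str], bytes]              # answer text -> synthesized audio
    avatar: Callable[[bytes, bytes], str]    # (portrait image, audio) -> video label
    history: List[Turn] = field(default_factory=list)

    def step(self, user_audio: bytes, image: bytes) -> str:
        question = self.asr(user_audio)
        answer = self.llm(self.history, question)
        # Store the turn so the next call can use it as context (multi-turn dialogue).
        self.history.append((question, answer))
        speech = self.tts(answer)
        return self.avatar(image, speech)


# Stub stages standing in for real models (assumptions, for demonstration only).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Turn], question: str) -> str:
    return f"turn {len(history) + 1}: you said {question!r}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_avatar(image: bytes, audio: bytes) -> str:
    return f"video({len(image)} image bytes, {len(audio)} audio bytes)"


pipeline = TalkerPipeline(fake_asr, fake_llm, fake_tts, fake_avatar)
```

Calling `pipeline.step(b"hello", b"\x00\x01")` twice leaves two `(question, answer)` pairs in `pipeline.history`, which is how the stub language model can tell the second turn from the first. Swapping a stub for a real model only requires a callable with the same signature.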
![The system architecture of multimodal human–computer interaction.](docs/HOI_en.png)

> [!NOTE]
>
> You can watch the demo video [here](https://www.bilibili.com/video/BV1rN4y1a76x/).
>
> I have recorded a series of videos on Bilibili, which also represent every step of my updates and methods of use. For detailed information, please refer to [Digital Human Dialogue System - Linly-Talker Collection](https://space.bilib
