# Bark Web UI
This application is a Python Flask-based web UI for generating text-to-speech audio with [Suno AI's Bark](https://github.com/suno-ai/bark). It offers customisation options such as voice pitch, speed, and other generation parameters.
## Screenshot
![Bark Web UI Screenshot](barkwebui_screenshot.png)
## Sample audio
Some of these samples have pitch and speed adjustments applied.
![Sample Audio 01](https://github.com/bradsec/barkwebui/assets/7948876/477d6410-e9df-4809-ac74-f22647292a36)
![Sample Audio 02](https://github.com/bradsec/barkwebui/assets/7948876/cf09b7b6-133e-435f-8b99-dfae8d5278da)
![Sample Audio 03](https://github.com/bradsec/barkwebui/assets/7948876/287472ce-896f-4412-b096-e78fc738f6dd)
![Sample Audio 04](https://github.com/bradsec/barkwebui/assets/7948876/04fbd340-7605-41b8-8c7b-abfbb923259a)
## Installation
1. Install Bark by following the instructions from the [Bark repository](https://github.com/suno-ai/bark).
1a. If you have not run Bark before, you will need to download the models. Running a test generation downloads and caches the required models (model sizes vary, and one is over 5 GB).
```terminal
python -m bark --text "Let's get this party started!" --output_filename "party.wav"
```
2. Once Bark is running, clone this repo into a directory called `webui` inside the `bark` installation directory.
```terminal
cd bark
git clone https://github.com/bradsec/barkwebui webui
```
3. Install any additional Python packages listed in [requirements.txt](requirements.txt) to satisfy the imports in `barkwebui_server.py` and `barkwebui_connector.py`. Some of these imports will already have been installed by the Bark setup process. If you use a Python venv or a conda/miniconda environment for Bark, activate it before installing.
4. Run `python barkwebui_server.py` from within the `webui` folder to start the Flask web server. Output similar to the following should be displayed:
```terminal
* Serving Flask app 'barkwebui_server'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
```
5. Open the web application in a browser using the address shown in the terminal window.
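
Before starting the server, it can help to confirm that the imports from step 3 resolve in the active environment. The following is a minimal sketch, not part of the project. The module names in `ASSUMED_MODULES` are guesses based on the libraries this README mentions and should be adjusted to match `requirements.txt`.

```python
import importlib.util

# Hypothetical list of top-level module names (an assumption, not taken from
# requirements.txt); edit it to match the packages the project actually needs.
ASSUMED_MODULES = ["bark", "flask", "flask_socketio", "librosa", "noisereduce", "webrtcvad"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules were found.")
```

Run it with the same interpreter (venv or conda environment) that will run `barkwebui_server.py`.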
## Structure
- `barkwebui_server.py` provides the Flask web server. It receives information from the web interface, passes it to `barkwebui_connector.py`, and returns the results. It also handles writing and deleting entries in the JSON dataset.
- `barkwebui_connector.py` breaks up the text input before passing it to Bark. It also applies any selected audio effects, such as changes to speed and pitch, noise reduction, and silence removal. It then writes the `.wav` file with a unique filename to the `static/output` directory.
- `templates/index.html` is the only HTML file used by the app. It references other files, such as CSS and JavaScript, from the `static` directory.
- `static/js` contains the JavaScript files for the `index.html` template page.
  - `barkwebui.js` provides most of the page functionality and the link to `barkwebui_server.py` using [Socket.IO](https://socket.io/).
  - `populate.js` populates the select dropdown options in `index.html`.
  - `theme.js` handles dark and light theme switching.
- `static/output` contains the completed `.wav` audio files.
- `static/json` contains `barkwebui.json`, which stores information about the generated audio files.
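
The connector's two bookkeeping steps described above (splitting long text and choosing a unique output filename) could look roughly like the sketch below. This is an illustration under assumptions, not the code in `barkwebui_connector.py`: the function names, the sentence-based splitting rule, the 200-character limit, and the use of `uuid4` are all hypothetical.

```python
import re
import uuid


def split_text(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters where possible.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def unique_wav_name():
    """Return a filename built from a random UUID (32 hex characters plus '.wav')."""
    return f"{uuid.uuid4().hex}.wav"
```

Each chunk would then be sent to Bark separately and the resulting audio segments joined before the file is written to `static/output`.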
<details>
<summary>Text Temperature</summary>
<br>
This parameter affects how the model generates speech from text. A higher text temperature value makes the model's output more random, while a lower text temperature value makes the model's output more deterministic. In other words, with a high text temperature, the model is more likely to generate unusual or unexpected speech from a given text prompt. On the other hand, with a low text temperature, the model is more likely to stick closely to the most probable output.
</details>
<details>
<summary>Waveform Temperature</summary>
<br>
This parameter affects how the model generates the final audio waveform. A higher waveform temperature value introduces more randomness into the audio output, which might result in more unusual sounds or voice modulations. A lower waveform temperature, on the other hand, makes the audio output more predictable and consistent.
</details>
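
To make the effect of temperature concrete, the sketch below shows the usual way a sampling temperature reshapes a probability distribution. It is a generic illustration, not Bark's internal code, and the example scores are made up. In Bark's Python API the two settings are, as far as I know, passed to `generate_audio` as `text_temp` and `waveform_temp`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for temp in (0.2, 0.7, 1.5):
        probs = softmax_with_temperature(scores, temp)
        print(temp, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability goes to the highest-scoring option, so output is more repeatable. At a high temperature the probabilities flatten out, so less likely options are sampled more often.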
<details>
<summary>Reduce Noise / Noise Reduction (NR)</summary>
<br>
Reduces background noise. It is not as effective as an AI-enhanced cleaner, and its impact is often hard to judge because each Bark generation varies even with identical settings. It also cannot remove echo or AI hallucinations.

Code ref (`barkwebui_connector.py`): if `reduce_noise` is `True`, noise reduction is applied to the generated audio using the noisereduce library. Its `reduce_noise` function takes the audio data and the sample rate as parameters and returns the audio with reduced noise. If `reduce_noise` is `False`, no noise reduction is applied and the original audio is used.
</details>
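
The noise-reduction switch described above can be sketched as follows. This is a hedged illustration rather than the project's code: the wrapper name `maybe_reduce_noise` is hypothetical, and the call assumes the keyword form `reduce_noise(y=..., sr=...)` used by recent noisereduce releases.

```python
def maybe_reduce_noise(audio, sample_rate, reduce_noise):
    """Return denoised audio when reduce_noise is True, otherwise the input unchanged."""
    if not reduce_noise:
        return audio
    # Imported lazily so the pass-through path works without the library installed.
    import noisereduce as nr

    # Assumes noisereduce's keyword arguments y (audio samples) and sr (sample rate).
    return nr.reduce_noise(y=audio, sr=sample_rate)
```

Here `audio` would be the NumPy array returned by Bark and `sample_rate` its sample rate (24000 Hz according to the Remove Silence notes in this README).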
<details>
<summary>Remove Silence (RS)</summary>
<br>
Removes extended pauses or silence. It may not do much; it was included for cases where the generated voice contains long pauses for unknown reasons.

Code ref (`barkwebui_connector.py`): if `remove_silence` is `True`, aggressive silence removal is enabled by setting the VAD (Voice Activity Detection) level to 3. The webrtcvad library is used for voice activity detection. If `remove_silence` is `False`, the VAD level is set to 0, which means no silence removal is applied. The sample rate also had to be reduced from 24000 Hz to 16000 Hz to work with the webrtcvad library.
</details>
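
Frame-based silence removal with a voice activity detector can be sketched as below. This is an assumption-laden illustration, not the code in the connector: the function names are hypothetical, the input is assumed to be 16-bit mono PCM bytes already resampled to 16000 Hz, and the detector is passed in as a callable so the frame logic can be tried without webrtcvad installed.

```python
FRAME_MS = 30            # webrtcvad accepts frames of 10, 20 or 30 ms
VAD_SAMPLE_RATE = 16000  # one of the sample rates webrtcvad supports
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM


def drop_silent_frames(pcm_bytes, is_speech, sample_rate=VAD_SAMPLE_RATE):
    """Keep only the frames that is_speech(frame, sample_rate) flags as speech.

    A trailing partial frame (shorter than FRAME_MS) is discarded.
    """
    frame_bytes = sample_rate * FRAME_MS // 1000 * BYTES_PER_SAMPLE
    kept = []
    for start in range(0, len(pcm_bytes) - frame_bytes + 1, frame_bytes):
        frame = pcm_bytes[start:start + frame_bytes]
        if is_speech(frame, sample_rate):
            kept.append(frame)
    return b"".join(kept)


def webrtc_is_speech(aggressiveness=3):
    """Build a detector callable from webrtcvad (imported lazily)."""
    import webrtcvad

    vad = webrtcvad.Vad(aggressiveness)  # 0 is least aggressive, 3 is most
    return vad.is_speech
```

A call might look like `drop_silent_frames(pcm, webrtc_is_speech(3))`. Dropping every non-speech frame can sound abrupt, so a real implementation may keep a little padding around detected speech.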
<details>
<summary>Adjusting audio speed and pitch</summary>
<br>
Changes to speed and pitch may introduce a fair amount of echo and reverb in the output audio. Running the audio through a third-party AI audio tool may help remove it. The librosa library is used to manipulate the audio speed and pitch.

Speed is adjusted with the `librosa.effects.time_stretch` function, which stretches or compresses the audio by a given factor. If the speed parameter passed into the `generate_voice` function is not 1.0, the audio is time-stretched by that rate. For instance, a speed of 2 halves the audio's duration, making it play twice as fast.

Pitch is adjusted with the `librosa.effects.pitch_shift` function, which shifts the pitch by a number of half-steps (semitones). If the pitch parameter passed into the `generate_voice` function is not 0, the pitch is shifted by that number of half-steps. For instance, a pitch of 2 raises the audio by 2 half-steps.
</details>
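
The speed and pitch steps described above can be sketched as follows. This is a minimal illustration, not the project's `generate_voice` function: the helper names are hypothetical, and the librosa calls use keyword arguments (`rate`, `sr`, `n_steps`) on the assumption that they match the installed librosa version.

```python
def expected_duration(duration_s, speed):
    """Duration after time-stretching: a speed of 2 halves the length."""
    return duration_s / speed


def pitch_ratio(half_steps):
    """Frequency multiplier for a shift in half-steps (12 half-steps per octave)."""
    return 2 ** (half_steps / 12)


def apply_speed_and_pitch(audio, sample_rate, speed=1.0, pitch=0):
    """Apply time-stretch and pitch-shift only when the values differ from the defaults."""
    if speed == 1.0 and pitch == 0:
        return audio  # nothing to do, so librosa is never imported
    import librosa

    if speed != 1.0:
        audio = librosa.effects.time_stretch(audio, rate=speed)
    if pitch != 0:
        audio = librosa.effects.pitch_shift(audio, sr=sample_rate, n_steps=pitch)
    return audio
```

For example, `expected_duration(10.0, 2.0)` gives 5.0 seconds, and a shift of 2 half-steps multiplies every frequency by `pitch_ratio(2)`, which is about 1.12.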
### Clearer Speech and Audio Results
**You will usually get cleaner speech and better results by generating with NR and RS unchecked and then running the output through an AI-enhanced tool such as [Adobe Podcast Enhance](https://podcast.adobe.com/enhance) or a similar service.**