Generating Music with RNNs (用RNN生成音乐_王雪婷1)


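The title, "Generating Music with RNNs" (用RNN生成音乐_王雪婷1), indicates that this document is about using Recurrent Neural Networks (RNNs) to generate music. An RNN is a deep learning model that is well suited to sequence data such as time series or natural language, because it carries state forward from earlier steps. That property matters for music generation, since music is made up of consecutive notes and rhythms.

The description notes that neural networks have revolutionized several fields, including image classification and language understanding, and have shown promise in artistic tasks: rendering photos in a particular painting style, writing stories that match a particular writing style, and even giving fashion advice. These examples suggest that neural networks can both imitate and create, which makes music generation plausible. Early work such as Bharucha & Todd (1989), Mozer (1996), Chen & Miikkulainen (2001) and Eck & Schmidhuber (2002) already used RNNs to compose music.

The paper "SONG FROM PI: A MUSICALLY PLAUSIBLE NETWORK FOR POP MUSIC GENERATION" presents a hierarchical RNN framework for generating pop music. The hierarchy reflects how pop music is constructed: the lower layers generate the melody, while the higher layers handle drums and chords, which helps keep the music structured and coherent. In human evaluations, listeners preferred its output over music produced by a recent method from Google.

The framework also demonstrates several applications. "Neural dancing" may refer to creating dance moves synchronized with the generated music; "karaoke" may involve providing lyrics or accompaniment for the generated music; and "neural story singing" may mean combining music generation with text generation to create songs that tell a story. These applications further illustrate the flexibility and practicality of the framework.

The use of RNNs in music generation goes beyond simple imitation and can produce pieces that are artistic and listenable. Deep learning models can learn the internal regularities of music and use them to generate new, appealing pieces. Research in this area has implications for music composition and may also inspire innovation in other art forms.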

SONG FROM PI: A MUSICALLY PLAUSIBLE NETWORK FOR POP MUSIC GENERATION

Hang Chu, Raquel Urtasun, Sanja Fidler

Department of Computer Science

University of Toronto

Ontario, M5S 3G4, Canada

{chuhang1122, urtasun, fidler}@cs.toronto.edu

ABSTRACT

We present a novel framework for generating pop music. Our model is a hierarchical Recurrent Neural Network, where the layers and the structure of the hierarchy encode our prior knowledge about how pop music is composed. In particular, the bottom layers generate the melody, while the higher levels produce the drums and chords. We conduct several human studies that show a strong preference for our generated music over that produced by a recent method from Google. We additionally show two applications of our framework: neural dancing and karaoke, as well as neural story singing.

1. INTRODUCTION

Neural networks have revolutionized many fields. They have not only proven to be powerful in performing perception tasks such as image classification and language understanding, but have also been shown to be surprisingly good "artists". In Gatys et al. (2015), photos were turned into paintings by exploiting particular drawing styles such as Van Gogh's; Kiros et al. (2015) produced stories about images biased by writing style (e.g., romance books); Karpathy et al. (2016) wrote Shakespeare-inspired novels; and Simo-Serra et al. (2015) gave fashion advice.

Music composition is another artistic domain where neural-based approaches have been proposed. Early approaches exploiting Recurrent Neural Networks (Bharucha & Todd (1989); Mozer (1996); Chen & Miikkulainen (2001); Eck & Schmidhuber (2002)) date back to the 1980s. The main differences between the models are the representation of the notes and the outputs they produce, which typically encode melody and chord. Most of these approaches were single track, in that they produced only one note per time step. The exception is Boulanger-Lewandowski et al. (2012), which generated polyphonic music, i.e., simultaneous independent melodies.
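The difference between a single-track and a polyphonic representation can be made concrete with a small sketch. The data layout, the helper name `to_piano_roll` and the pitch range below are illustrative assumptions, not the encoding used by any of the cited models.

```python
# Single-track: exactly one MIDI pitch (or None for a rest) per time step.
single_track = [60, 62, None, 64]

# Polyphonic: a set of simultaneously sounding pitches per time step.
polyphonic = [{60, 64, 67}, {62, 65}, set(), {64, 67, 72}]


def to_piano_roll(steps, low=60, high=72):
    """Return a binary (time x pitch) matrix: 1 where a pitch sounds."""
    roll = []
    for step in steps:
        if isinstance(step, set):
            pitches = step
        elif step is None:
            pitches = set()
        else:
            pitches = {step}
        roll.append([1 if p in pitches else 0 for p in range(low, high + 1)])
    return roll


single_roll = to_piano_roll(single_track)
poly_roll = to_piano_roll(polyphonic)
```

In the single-track roll every row has at most one active pitch, whereas rows of the polyphonic roll can have several, which is what makes the polyphonic output space much larger.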

In this paper, we aim to generate pop music, where not only the melody but also chords and other instruments make up what is typically called a song. We draw inspiration from the Song from π by Macdonald¹, a piano video on YouTube, where the pleasing music is created from a sequence of digits of π. This video shows both the randomness and the regularity of music. On one hand, since any possible digit sequence is a subset of the π digit sequence, this implies that pleasing music can be created even from a totally random base signal. On the other hand, the composer uses specific rules such as the A harmonic minor scale and harmonies to convert the digit sequence into a music sheet. It is these rules that play the key role in converting randomness into music.
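A minimal sketch of the idea that a fixed rule turns arbitrary digits into in-scale notes is given below. The specific digit-to-note mapping (digit modulo the scale length, wrapping into the next octave) is an assumption made for illustration; the source does not specify the mapping used in the video.

```python
# Illustrative only: this digit-to-note rule is an assumption,
# not the mapping used in the "Song from π" video.

PI_DIGITS = "31415926535897932384"  # first 20 digits of π

# A harmonic minor (A B C D E F G#) as MIDI note numbers, starting at A4 = 69.
A_HARMONIC_MINOR = [69, 71, 72, 74, 76, 77, 80]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def digit_to_midi(digit: int) -> int:
    """Map a digit 0-9 to a scale degree, wrapping into the next octave."""
    octave, degree = divmod(digit, len(A_HARMONIC_MINOR))
    return A_HARMONIC_MINOR[degree] + 12 * octave


def midi_to_name(midi: int) -> str:
    """Convert a MIDI note number to a name such as 'A4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


melody = [digit_to_midi(int(d)) for d in PI_DIGITS]
names = [midi_to_name(m) for m in melody]
print(" ".join(names))
```

Whatever digits are fed in, every output note lies in the chosen scale, which is the sense in which the rules, rather than the digits, carry the musical structure.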

Following the ideas of Song from π, we aim to generate both the melody and accompanying effects such as chords and drums. Arguably, these turn even a not particularly pleasing melody into a good-sounding song. We propose a hierarchical approach, where each level is a Recurrent Neural Network producing a key aspect of the song. The bottom layers generate the melody, while the higher levels produce drums and chords. This enables the drum and chord layers to compensate for the melody in order to produce pleasing music. Adopting the key idea from Song from π, we condition our model on the scale type, allowing the melody generator to learn the notes that are typically played in a particular scale.
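The layering described above, with melody generated first and chords and drums conditioned on it, can be sketched as a toy pipeline. This is not the paper's model: each layer below is a simple hand-written rule standing in for a Recurrent Neural Network, and the names `SCALES`, `melody_layer`, `chord_layer`, `drum_layer` and `generate_song` are hypothetical.

```python
import random

# Hypothetical scale table: pitch classes (relative to the tonic) per scale type.
SCALES = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "harmonic_minor": [0, 2, 3, 5, 7, 8, 11],
}
TONIC = 60  # MIDI middle C, an arbitrary choice for this sketch


def melody_layer(scale_type, n_steps, rng):
    """Bottom layer: a random walk over scale degrees (stand-in for an RNN)."""
    scale = SCALES[scale_type]
    degree = 0
    notes = []
    for _ in range(n_steps):
        degree = max(0, min(len(scale) - 1, degree + rng.choice([-1, 0, 1])))
        notes.append(TONIC + scale[degree])
    return notes


def chord_layer(melody, scale_type, steps_per_bar):
    """Higher layer: one triad per bar, rooted on the bar's most common note."""
    scale = SCALES[scale_type]
    chords = []
    for start in range(0, len(melody), steps_per_bar):
        bar = melody[start:start + steps_per_bar]
        root = max(sorted(set(bar)), key=bar.count)  # ties go to the lowest note
        i = scale.index(root - TONIC)
        # Stack scale thirds on the root: degrees i, i+2, i+4 (wrapping).
        chords.append([TONIC + scale[(i + k) % len(scale)] for k in (0, 2, 4)])
    return chords


def drum_layer(melody, steps_per_bar):
    """Higher layer: hit on every downbeat and wherever the melody moves."""
    hits = []
    for t, note in enumerate(melody):
        downbeat = t % steps_per_bar == 0
        moved = t > 0 and note != melody[t - 1]
        hits.append(1 if downbeat or moved else 0)
    return hits


def generate_song(scale_type="harmonic_minor", n_bars=4, steps_per_bar=8, seed=0):
    """Run the layers bottom-up: melody first, then chords and drums."""
    rng = random.Random(seed)
    melody = melody_layer(scale_type, n_bars * steps_per_bar, rng)
    return {
        "melody": melody,
        "chords": chord_layer(melody, scale_type, steps_per_bar),
        "drums": drum_layer(melody, steps_per_bar),
    }


song = generate_song()
```

The point of the sketch is the data flow: the scale type constrains the melody layer, and the chord and drum layers only see the finished melody, so they can adapt to it rather than the other way round.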

We train our model on 100 hours of MIDI music containing user-composed pop songs and video game music. We conduct human studies with music generated with our approach and compare it against a recent approach by Google, showing that our songs are strongly preferred over the baseline. In our human study we also perform an ablation analysis of our model. We additionally show two new applications: neural dancing and karaoke, as well as neural story singing. As part of the first application we generate a stickman dancing to our music and lyrics that can be sung along with it, while in the second application we condition on the output of Kiros et al. (2015), which writes a story about an image, and convert it into a pop song. We refer the reader to http://www.cs.toronto.edu/songfrompi/ for our demos and results.

¹ https://youtu.be/OMq9he-5HUU

2. RELATED WORK

Generating music has been an active research area for decades. It brings together machine learning researchers who aim to capture the complex structure of music (Eck & Schmidhuber (2002); Boulanger-Lewandowski et al. (2012)), as well as music professionals (Chan et al. (2006)) and enthusiasts (Johnson; Sun) who want to see how far a computer can get toward being a real composer. Real-time music generation has also been explored for gaming (Engels et al. (2015)).

Early approaches mostly instilled knowledge from music theory into generation, using rules for how music segments can be stitched together in a plausible way, e.g., Chan et al. (2006). Neural networks, on the other hand, have been used for music generation since the 1980s (Bharucha & Todd (1989); Mozer (1996); Chen & Miikkulainen (2001); Eck & Schmidhuber (2002)). Mozer (1996) used a Recurrent Neural Network that produced pitch, duration and chord at each time step. Unlike most other neural network approaches, this work encodes music knowledge into the representation. Eck & Schmidhuber (2002) were the first to use LSTMs to generate both melody and chord. Compared to Mozer (1996), the LSTM captured more global music structure across the song.

Like us, Kang et al. (2012) built upon the randomness of melody by trying to accompany it with drums. However, in their model the scale type is enforced. No details about
