Q4: "Please generate a piece of singing voice. Text sequence is
小酒窝长睫毛 AP 是你最美的记号. Note sequence is C#4/Db4 | F#4/Gb4 |
G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest |
C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4.
Note duration sequence is 0.407140 | 0.376190 | 0.242180 | 0.509550
0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 |
0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340."
A4:
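The Q4 prompt packs three parallel sequences: lyric tokens, note groups, and duration groups, with `|` separating the groups and spaces separating multiple notes sung on one token. The following is a minimal sketch of how these sequences appear to line up, not the paper's actual parser. The function names are hypothetical, and treating `AP` as a single non-lyric token (paired with `rest`) is an assumption read off the example itself.

```python
# Sequences copied from the Q4 prompt above.
TEXT = "小酒窝长睫毛 AP 是你最美的记号"
NOTES = (
    "C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | "
    "C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | "
    "F4 | C#4/Db4"
)
DURATIONS = (
    "0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | "
    "0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | "
    "0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340"
)


def split_tokens(text):
    """Split lyrics into one token per character; keep ASCII marks like AP whole."""
    tokens = []
    for chunk in text.split():
        if chunk.isascii():
            tokens.append(chunk)   # assumption: AP is one non-lyric token
        else:
            tokens.extend(chunk)   # one token per Chinese character
    return tokens


def split_groups(sequence):
    """Split a '|'-separated sequence into groups of space-separated items."""
    return [group.split() for group in sequence.split("|")]


def align(text, notes, durations):
    """Pair each lyric token with its note group and duration group."""
    tokens = split_tokens(text)
    note_groups = split_groups(notes)
    dur_groups = [[float(d) for d in g] for g in split_groups(durations)]
    if not (len(tokens) == len(note_groups) == len(dur_groups)):
        raise ValueError("token, note, and duration group counts differ")
    for n, d in zip(note_groups, dur_groups):
        if len(n) != len(d):
            raise ValueError("a note group and its duration group differ in size")
    return list(zip(tokens, note_groups, dur_groups))


if __name__ == "__main__":
    for token, note_group, dur_group in align(TEXT, NOTES, DURATIONS):
        print(token, note_group, dur_group)
```

Under these assumptions, every token gets exactly one note group, and tokens such as 长 carry two notes with two matching durations.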
Sound effects
Q5: "Generate an audio of a piano playing."
A5:
Q6: Give me the description of this audio.
A6: The audio is a recording of a goat bleating nearby several times.
3D talking head
Q7: Generate a talking human portrait video.
In addition, the paper includes an example of multi-turn dialogue and context understanding: