Kaldi Speech Recognition Lab Handout (Complete) -- University of Edinburgh

The main goal of this lab is to get acquainted with Kaldi. We will begin by creating and exploring a data directory for the TIMIT dataset. Then we will extract features for TIMIT upon which we can train a complete speech recognition system in the coming labs. An underlying goal of this lab is to get
Your work dir now has a typical directory structure for Kaldi. Type the following command to list its contents:

    ls

You should see the following files and folders:

    steps     contains scripts for creating an ASR system
    utils     contains scripts to modify Kaldi files in certain ways, for example to subset data directories into smaller pieces
    local     typically contains files that relate only to the corpus we're working on (e.g. TIMIT). In this case it may also contain files we have provided for you.
    data      will contain any data directories, such as a train and test directory for TIMIT. We will create these below.
    exp       contains the actual experiments and models, as well as logs
    conf      contains configurations for certain scripts that may read them; more on this later
    path.sh   contains the path to the Kaldi source directory

Kaldi binaries are stored some place else than the experiment directory. To access them from anywhere, we set an environment variable KALDI_ROOT inside the file path.sh, pointing to the Kaldi installation. To set this variable, type

    source path.sh

or equivalently

    . ./path.sh

To see whether it is set and where it points, run

    echo $KALDI_ROOT

The command echo prints any string to the terminal, along with any variables (e.g. $KALDI_ROOT). To omit newlines, use the flag -n.

Similarly, the steps and utils directories are symbolic links to directories in the base install. steps includes essential scripts to train and decode ASR systems, while utils contains a number of scripts to, for example, manipulate the data directory. Local files pertaining to the current experiment go in local. This is a good place to put any utility scripts you write.

The path.sh file is sourced at the beginning of all Kaldi scripts; e.g. look at the first 18 lines of the script that computes MFCCs:

    head -18 steps/make_mfcc.sh

head -N file and tail -N file print the first and last N lines of file.
A useful variation is tail -n +N, which prints from line N to the end.

Now that the environment variables are set, try to run a typical Kaldi binary without any arguments, e.g.

    feat-to-dim

It should print an explanation of its purpose and usage instructions. This is common to all Kaldi binaries and scripts.

Forgetting to run source path.sh is one of the most common mistakes. If you are getting errors like

    bash: feat-to-dim: command not found

try to source path.sh and rerun the previous command. Another common mistake is running commands from other directories than ~/asrworkdir; if you get an error, make sure that you are in ~/asrworkdir with the pwd command. Other common errors can be found in Appendix 2.3.

Kaldi comes with recipes for various corpora. These are typically embodied in a run.sh script in the main directory, with supporting files in local. This script calls high-level scripts in steps and utils, which in turn call binaries that perform the actual computation:

    recipe        scripts                 binaries
    run.sh        steps/make_mfcc.sh      compute-mfcc-feats
    shell         shell/Perl/Python       mainly C++

2 TIMIT

We will create a data directory for TIMIT and extract features. We will write the commands one by one into the shell.

If you're new to Bash scripting or need a refresher, here are a few resources you may find useful. The first three expect no previous knowledge of Bash; the last is good for getting to know many useful commands.

    Bash Programming - Introduction HOW-TO: http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
    Advanced Bash-Scripting Guide: http://www.tldp.org/LDP/abs/html/index.html
    Bash Guide for Beginners: http://www.tldp.org/LDP/Bash-Beginners-Guide/html/index.html
    Unix for Poets: http://www.cs.upc.edu/~padro/unixforpoets.pdf

2.1 Data preparation

In the data preparation step we will create directories in data which will store any training and test sets, features, and eventually a language model.

The first line sets the environment variables, if path.sh exists.
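To make the conditional sourcing concrete, here is a hypothetical sketch: the path.sh contents and install location below are placeholders, not the lab's actual path.sh, and the demo writes its own copy so it is self-contained.

```shell
# Hypothetical sketch of what a path.sh typically does (your actual
# file was provided with the lab and may differ).
cat > path.sh.demo <<'EOF'
export KALDI_ROOT=/path/to/kaldi           # placeholder install location
export PATH=$KALDI_ROOT/src/bin:$PATH      # put Kaldi binaries on PATH
EOF

# '&&' runs the next command only if the previous one succeeded, so this
# sources the file only when it exists:
[ -f ./path.sh.demo ] && . ./path.sh.demo
echo "KALDI_ROOT is now: $KALDI_ROOT"

# Kaldi scripts use the opposite, '||': ending a line with '|| exit 1'
# aborts the script with status 1 when that step fails.  mkdir succeeds
# here, so the script carries on:
mkdir -p demo_dir || exit 1
```

The same `[ -f file ] && . file` idiom appears at the top of most Kaldi recipe scripts.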
It's a good idea to run this at the beginning of any Kaldi script:

    [ -f ./path.sh ] && . ./path.sh

&& will execute the next command only if the previous one succeeded. (A typical Kaldi convention is using the opposite, ||, in its scripts, ending lines with command || exit 1, which means to exit the script with status 1 (error) if the preceding command did not succeed.)

Next, we create data directories for TIMIT by running the following two lines. Don't worry about warnings of nonzero return status.

    timit=/group/corporapublic/timit/original
    local/timit_create_data.sh $timit

The data we just created is in the data directory. To appreciate better what this script does, navigate to the original TIMIT corpus training data directory and list its contents:

    cd /group/corporapublic/timit/original/train
    ls

It's split into multiple folders. Dive into the first and look at it:

    cd dr1
    ls

Each of these directories represents a speaker. Move into the first speaker's directory and list the contents:

    cd fcjf0
    ls

For each utterance there are four files: .phn, .txt, .wav and .wrd. Look at each file in turn to figure out what they represent, using less:

    less sa1.phn
    less sa1.txt
    less sa1.wav
    less sa1.wrd

In less, you can use the up and down arrows to navigate. Hit q to quit.

less is useful when you only want to view a file and not edit it. It is also smart in that it doesn't read the entire file into memory at once, so files like sa1.wav, which are not normally something you would look at, are handled neatly.

It's probably more interesting to listen to sa1.wav. On DICE you can write

    play sa1.wav

Let's go back to our Kaldi work directory and see what we created with the command above:

    cd ~/asrworkdir

Navigate to one of the created subdirs and look at the contents:

    cd data/train
    ls

The following files should be present.
Have a look at each:

    less text
    less spk2utt
    less wav.scp
    less spk2gender

The script has combined all the information from the TIMIT directory we just looked at into files that neatly contain the information in a way that Kaldi can work with.

The files are closely related by utterance and speaker ids, abbreviated to utt_id and spk_id in Figure 2. If each utterance is from a different speaker, or if we have no information about speakers, then the utterance and speaker ids match. Otherwise the speaker information is used to pool statistics across utterances for speaker adaptation and for speaker-specific scoring. In the absence of a segments file, which sets out what portions of each audio file should be used for an utterance id, the recording ids are equivalent to the utterance ids. In this case we use the entire length of each audio file set out in wav.scp.

[Figure 2: Illustration of a Kaldi data directory structure. text maps utt_id to transcription; utt2spk maps utt_id to spk_id (obtainable via utils/spk2utt_to_utt2spk.pl); spk2utt maps spk_id to utt_id1 utt_id2 ...; wav.scp maps rec_id to ext_filename, where ext_filename can be a pipe, e.g. 'sox /path/file.wav -r 16000 -t wav - |'; feats.scp maps utt_id to mfcc.ark:xx, an fseek to position xx in mfcc.ark. spk_id = utt_id if there is no speaker info.]

Change directory back to the main workdir:

    cd ~/asrworkdir

To check that the data directories conform to Kaldi specifications, validate them by running the following two lines:

    utils/validate_data_dir.sh data/train
    utils/validate_data_dir.sh data/test

Uh oh. We're missing utt2spk, but we have spk2utt. These two files contain the same information, just with the mapping reversed, so we can easily convert one into the other. In the utils directory there is a file called spk2utt_to_utt2spk.pl. This is a Perl script which reads from stdin and writes to stdout.
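The mapping reversal that spk2utt_to_utt2spk.pl performs can be sketched with a toy awk one-liner; this is only an illustration of the idea (in practice, use the Kaldi script), and the speaker/utterance ids below are made up.

```shell
# Toy spk2utt file: each line is "spk_id utt_id1 utt_id2 ...".
printf 'spkA utt1 utt2\nspkB utt3\n' > spk2utt.demo

# Reverse the mapping: for every utt id on a line, emit "utt_id spk_id".
awk '{for (i = 2; i <= NF; i++) print $i, $1}' spk2utt.demo > utt2spk.demo
cat utt2spk.demo
```

The output has one line per utterance ("utt1 spkA", "utt2 spkA", "utt3 spkB"), which is exactly the utt2spk format.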
To pipe into the script we use <; to pipe out and into a file we use >. Run the following commands:

    utils/spk2utt_to_utt2spk.pl < data/train/spk2utt > data/train/utt2spk
    utils/spk2utt_to_utt2spk.pl < data/test/spk2utt > data/test/utt2spk

Run the validation scripts again. There should only be a feats.scp file missing, which we'll create next.

Have a look at the file you just created. How does it relate to spk2utt?

    less data/train/utt2spk
    less data/train/spk2utt

To see how many utterances there are in the training directory, we can use the command wc:

    wc -l < data/train/utt2spk

How many speakers are there in the training data?

2.2 Features

We'll now generate the features and the corresponding feats.scp script file that will map utterance ids to positions in an archive, e.g. feats.ark. For GMM-HMM systems we typically use MFCC or PLP features, and then apply cepstral mean and variance normalisation.

For the next step it can be handy to use a for loop, to loop over directory names. In Bash the syntax is

    for var in item1 item2 item3; do
        echo $var
    done

This will print

    item1
    item2
    item3

We will create MFCCs for our data. Run the following lines, which loop over the data directories and extract features for each:

    for dir in train test; do
        steps/make_mfcc.sh data/$dir exp/make_mfcc/$dir mfcc
    done

This will have created feats.scp, with corresponding archives in a folder called mfcc, and written log files to exp/make_mfcc.

You will now compute cepstral mean and variance normalisation statistics for the data. Find the appropriate script in the steps folder, perhaps using ls steps/*cmvn*. Then run the script as above:

    for dir in train test; do
        steps/<insert-script-here> data/$dir
    done

This will create cmvn.scp in each data directory. Validate the data directories again.

2.2.1 Scripts and archives (*.scp, *.ark)

scp files map utterance ids to positions in ark files. The latter contain the actual data.
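For the speaker-count question, a generic shell recipe is to isolate the speaker column of utt2spk and count the distinct values. Shown here on a toy file with made-up ids so it is self-contained; run the same pipeline on data/train/utt2spk for the real answer.

```shell
# Toy utt2spk file: "utt_id spk_id" per line.
printf 'utt1 spkA\nutt2 spkA\nutt3 spkB\n' > utt2spk.demo

# cut takes the 2nd space-separated field, sort -u deduplicates,
# wc -l counts the remaining (unique) speaker ids.
cut -d' ' -f2 utt2spk.demo | sort -u | wc -l
```

This prints 2 for the toy file, since spkA appears twice but is counted once.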
Kaldi binaries generally read and write scripts and archives interchangeably, as long as the filename is prepended with the type of file you wish to read or write, e.g. scp:feats.scp or ark:mfcc.ark, or ark:- to write to stdout. Archives are written in binary unless you append the t modifier: ark,t:mfcc.ark.

    feats.scp                    feats.ark
    utt1 feats.ark:14            utt1  [ 51.49503 -2.626585 -10.14908 ... ]
    utt2 feats.ark:201           utt2  [ 52.92405 -3.383574 -10.91502
                                         52.92405 -1.301857 -13.80937 ... ]

For more, see the documentation on Kaldi I/O mechanisms: http://kaldi-asr.org/doc/io.html#io_sec_tables

Kaldi binaries typically read and/or write script and archive files. When this is the case, the usage message will show rspecifier or wspecifier. Scripts and archives represent the same data, so passing either to a program yields the same results.

Let's try using the program feat-to-dim to find the dimensions of the features we just created:

    feat-to-dim scp:data/train/feats.scp
    feat-to-dim ark:mfcc/raw_mfcc_train.1.ark

Are they the same?

Let's have a look at the actual features too. The archives are by default written in binary, but we can make a suitable copy using the program copy-feats and a suitable write specifier (see box above). We pipe it into head to avoid overflowing the terminal window:

    copy-feats scp:data/train/feats.scp ark,t:- | head

Do the features match?

Read specifiers can take bash commands ending with a pipe (|) as arguments. This can be handy if you only want to look at the features for a particular utterance. Try replacing the read specifier scp:data/train/feats.scp in your previous solution with

    'scp:grep fdfb0-sx58 data/train/feats.scp |'

What does this do?

grep searches for a string in a file and by default outputs each entire matching line: grep string filename. The string can be a regex query, and there are a lot of options; see man grep for more.

Try the same trick as above, but find how many frames that utterance has, using the program feat-to-len.

While write specifiers can write to stdout (e.g.
ark:-), read specifiers can read from stdin. What does the following command do? This syntax is crucial to piping Kaldi programs together.

    head -10 data/train/feats.scp | tail -1 | copy-feats scp:- ark,t:- | head

steps/make_mfcc.sh, which you ran above, uses the program compute-mfcc-feats to extract features. This program looks for conf/mfcc.conf in the conf folder for any non-default parameters. These are passed to the corresponding binaries. Look at that file:

    less conf/mfcc.conf

and compare it to the options for the program by running it without arguments:

    compute-mfcc-feats

If you have time, let's combine what we've learned and create filterbank features. Create copies of your data directories and generate filterbank and pitch features for each (look in the steps folder for a suitable script). First, however, create a conf/fbank.conf file (using some text editor, or see the box below). Include an argument to set the dimension of the filterbank features to 40. (Hint: look at compute-fbank-feats for arguments.) Finally, check the feature dimension and make sure it is 43 (there are three pitch features).

To open or create a file in nano, type

    nano conf/fbank.conf

Inside nano, use the arrow keys to move around the text file. To exit, hit Ctrl+X and answer Y or N to the question of whether to save any changes. Other commands are listed at the bottom of the window.

We're done! Next time we'll build a GMM-HMM system.
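For reference, the conf/fbank.conf from the filterbank exercise can also be written straight from the shell instead of nano. The option name --num-mel-bins is an assumption based on compute-fbank-feats; verify it against the program's usage message before relying on it.

```shell
# Create conf/fbank.conf non-interactively.  Config files hold one
# option per line, which the wrapper script passes to the binary.
# NOTE: --num-mel-bins is assumed here; check compute-fbank-feats.
mkdir -p conf
cat > conf/fbank.conf <<'EOF'
--num-mel-bins=40
EOF
cat conf/fbank.conf
```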
