A paper by Alex Graves.

This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function.
Towards End-to-End Speech Recognition with Recurrent Neural Networks

[Figure 1. Long Short-term Memory cell.]
[Figure 2. Bidirectional recurrent neural network.]
[Figure 3. Deep recurrent neural network.]

A bidirectional RNN (BRNN) processes the data in both directions with two separate hidden layers, which are then fed forwards to the same output layer. As illustrated in Fig. 2, a BRNN computes the forward hidden sequence $\overrightarrow{h}$, the backward hidden sequence $\overleftarrow{h}$ and the output sequence $y$ by iterating the backward layer from $t = T$ to $1$, the forward layer from $t = 1$ to $T$, and then updating the output layer:

$$\overrightarrow{h}_t = \mathcal{H}(W_{x\overrightarrow{h}}\, x_t + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}) \qquad (8)$$
$$\overleftarrow{h}_t = \mathcal{H}(W_{x\overleftarrow{h}}\, x_t + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}) \qquad (9)$$
$$y_t = W_{\overrightarrow{h}y}\, \overrightarrow{h}_t + W_{\overleftarrow{h}y}\, \overleftarrow{h}_t + b_y \qquad (10)$$

Combining BRNNs with LSTM gives bidirectional LSTM (Graves & Schmidhuber, 2005), which can access long-range context in both input directions.

A crucial element of the recent success of hybrid systems is the use of deep architectures, which are able to build up progressively higher-level representations of acoustic data. Deep RNNs can be created by stacking multiple RNN hidden layers on top of each other, with the output sequence of one layer forming the input sequence for the next, as shown in Fig. 3. Assuming the same hidden layer function is used for all $N$ layers in the stack, the hidden vector sequences $h^n$ are iteratively computed from $n = 1$ to $N$ and $t = 1$ to $T$:

$$h^n_t = \mathcal{H}(W_{h^{n-1}h^n}\, h^{n-1}_t + W_{h^n h^n}\, h^n_{t-1} + b^n_h) \qquad (11)$$

where $h^0 = x$. The network outputs $y_t$ are

$$y_t = W_{h^N y}\, h^N_t + b_y \qquad (12)$$

Deep bidirectional RNNs can be implemented by replacing each hidden sequence $h^n$ with the forward and backward sequences $\overrightarrow{h}^n$ and $\overleftarrow{h}^n$, and ensuring that every hidden layer receives input from both the forward and backward layers at the level below. If LSTM is used for the hidden layers, the complete architecture is referred to as deep bidirectional LSTM (Graves et al., 2013).
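As an illustration of Eqs. (8)-(10), a single bidirectional layer can be sketched in NumPy. This is a minimal sketch under stated assumptions: tanh stands in for the hidden function $\mathcal{H}$ (the paper uses LSTM cells), the weights are random, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def birnn_layer(x, Wf, Uf, bf, Wb, Ub, bb):
    """One bidirectional layer (Eqs. 8-9): a forward recurrence over t = 1..T
    and a backward recurrence over t = T..1, with tanh in place of H."""
    T, n_hid = x.shape[0], bf.shape[0]
    h_fwd = np.zeros((T, n_hid))
    h_bwd = np.zeros((T, n_hid))
    for t in range(T):                        # forward layer
        prev = h_fwd[t - 1] if t > 0 else np.zeros(n_hid)
        h_fwd[t] = np.tanh(x[t] @ Wf + prev @ Uf + bf)
    for t in reversed(range(T)):              # backward layer
        nxt = h_bwd[t + 1] if t < T - 1 else np.zeros(n_hid)
        h_bwd[t] = np.tanh(x[t] @ Wb + nxt @ Ub + bb)
    # Eq. (10): the output layer sees both directions; here we simply
    # concatenate them and leave the output projection to the caller.
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, n_in, n_hid = 6, 3, 4
x = rng.normal(size=(T, n_in))

def make_params(n_inputs):
    return (0.1 * rng.normal(size=(n_inputs, n_hid)),
            0.1 * rng.normal(size=(n_hid, n_hid)),
            np.zeros(n_hid))

h = birnn_layer(x, *make_params(n_in), *make_params(n_in))
```

Stacking such layers, with `h` as the input of the next layer, gives the deep bidirectional network of Eq. (11).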
3. Connectionist Temporal Classification

Neural networks (whether feedforward or recurrent) are typically trained as frame-level classifiers in speech recognition. This requires a separate training target for every frame, which in turn requires the alignment between the audio and transcription sequences to be determined by the HMM. However, the alignment is only reliable once the classifier is trained, leading to a circular dependency between segmentation and recognition (known as Sayre's paradox in the closely related field of handwriting recognition). Furthermore, the alignments are irrelevant to most speech recognition tasks, where only the word-level transcriptions matter. Connectionist Temporal Classification (CTC) (Graves, 2012, Chapter 7) is an objective function that allows an RNN to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences.

The output layer contains a single unit for each of the transcription labels (characters, phonemes, musical notes etc.), plus an extra unit referred to as the 'blank', which corresponds to a null emission. Given a length $T$ input sequence $x$, the output vectors $y_t$ are normalised with the softmax function, then interpreted as the probability of emitting the label (or blank) with index $k$ at time $t$:
$$\Pr(k,t|x) = \frac{\exp(y_t^k)}{\sum_{k'} \exp(y_t^{k'})} \qquad (13)$$

where $y_t^k$ is element $k$ of $y_t$. A CTC alignment $a$ is a length $T$ sequence of blank and label indices. The probability $\Pr(a|x)$ of $a$ is the product of the emission probabilities at every timestep:

$$\Pr(a|x) = \prod_{t=1}^{T} \Pr(a_t, t|x) \qquad (14)$$

For a given transcription sequence, there are as many possible alignments as there are different ways of separating the labels with blanks. For example (using $-$ to denote blanks), the alignments $(a,-,b,c,-,-)$ and $(-,-,a,-,b,c)$ both correspond to the transcription $(a,b,c)$. When the same label appears on successive timesteps in an alignment, the repeats are removed: therefore $(a,b,b,b,c,c)$ and $(a,-,b,-,c,c)$ also both correspond to $(a,b,c)$. Denoting by $\mathcal{B}$ an operator that removes first the repeated labels, then the blanks, from alignments, and observing that the total probability of an output transcription $y$ is equal to the sum of the probabilities of the alignments corresponding to it, we can write

$$\Pr(y|x) = \sum_{a \in \mathcal{B}^{-1}(y)} \Pr(a|x) \qquad (15)$$

This 'integrating out' over possible alignments is what allows the network to be trained with unsegmented data. The intuition is that, because we don't know where the labels within a particular transcription will occur, we sum over all the places where they could occur. Eq. (15) can be efficiently evaluated and differentiated using a dynamic programming algorithm (Graves et al., 2006). Given a target transcription $y^*$, the network can then be trained to minimise the CTC objective function:

$$\mathrm{CTC}(x) = -\log \Pr(y^*|x) \qquad (16)$$

4. Expected Transcription Loss

The CTC objective function maximises the log probability of getting the sequence transcription completely correct. The relative probabilities of the incorrect transcriptions are therefore ignored, which implies that they are all equally bad. In most cases, however, transcription performance is assessed in a more nuanced way. In speech recognition, for example, the standard measure is the word error rate (WER), defined as the edit distance between the true word sequence and the most probable word sequence emitted by the transcriber. We would therefore prefer transcriptions with low WER to be more probable than those with high WER. In the interest of reducing the gap between the objective function and the test criteria, this section proposes a method that allows an RNN to be trained to optimise the expected value of an arbitrary loss function defined over output transcriptions (such as WER).

The network structure and the interpretation of the output activations as the probability of emitting a label (or blank) at a particular timestep remain the same as for CTC. Given input sequence $x$, the distribution $\Pr(y|x)$ over transcription sequences $y$ defined by CTC, and a real-valued transcription loss function $\mathcal{L}(x, y)$, the expected transcription loss $\mathcal{L}(x)$ is defined as

$$\mathcal{L}(x) = \sum_{y} \Pr(y|x)\, \mathcal{L}(x, y) \qquad (17)$$

In general we will not be able to calculate this expectation exactly, and will instead use Monte-Carlo sampling to approximate both $\mathcal{L}$ and its gradient. Substituting Eq. (15) into Eq. (17) we see that

$$\mathcal{L}(x) = \sum_{y} \sum_{a \in \mathcal{B}^{-1}(y)} \Pr(a|x)\, \mathcal{L}(x, y) \qquad (18)$$
$$= \sum_{a} \Pr(a|x)\, \mathcal{L}(x, \mathcal{B}(a)) \qquad (19)$$

Eq. (14) shows that samples can be drawn from $\Pr(a|x)$ by independently picking from $\Pr(k,t|x)$ at each timestep and concatenating the results, making it straightforward to approximate the loss:

$$\mathcal{L}(x) \approx \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}(x, \mathcal{B}(a^i)), \qquad a^i \sim \Pr(a|x) \qquad (20)$$
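On a toy output distribution, the collapse operator $\mathcal{B}$, the exact expected loss of Eq. (19), and the Monte-Carlo estimate of Eq. (20) can be checked against each other by brute force. This is a sketch under stated assumptions: the sizes are hypothetical, and a character-level edit distance stands in for the transcription loss; none of the names come from the paper's code.

```python
import numpy as np
from itertools import product

BLANK = 0

def collapse(a):
    """The operator B: remove repeated labels, then remove blanks."""
    dedup = [s for i, s in enumerate(a) if i == 0 or s != a[i - 1]]
    return tuple(s for s in dedup if s != BLANK)

def edit_distance(a, b):
    """Levenshtein distance, the loss underlying WER / CER."""
    d = np.arange(len(b) + 1)
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return int(d[-1])

rng = np.random.default_rng(1)
T, K = 5, 3                                   # 5 timesteps, blank + 2 labels
probs = rng.dirichlet(np.ones(K), size=T)     # Pr(k, t | x); each row sums to 1
target = (1, 2)                               # reference transcription y*
loss = lambda a: edit_distance(collapse(a), target)

# Eq. (19): exact expectation by enumerating all K^T alignments.
exact = sum(np.prod(probs[np.arange(T), list(a)]) * loss(a)
            for a in product(range(K), repeat=T))

# Eq. (20): Monte-Carlo estimate from independent per-timestep samples.
N = 20000
samples = np.stack([rng.choice(K, p=probs[t], size=N) for t in range(T)], axis=1)
mc = np.mean([loss(tuple(a)) for a in samples])
```

With a few thousand samples the estimate is already close to the exact value, which is the behaviour the gradient estimator below relies on.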
To differentiate $\mathcal{L}$ with respect to the network outputs, first observe from Eq. (13) that

$$\frac{\partial \log \Pr(a|x)}{\partial \Pr(k,t|x)} = \frac{\delta_{a_t,k}}{\Pr(k,t|x)} \qquad (21)$$

Then substitute into Eq. (19), applying the identity $\nabla_x f(x) = f(x)\, \nabla_x \log f(x)$, to yield

$$\frac{\partial \mathcal{L}(x)}{\partial \Pr(k,t|x)} = \sum_{a} \Pr(a|x)\, \frac{\partial \log \Pr(a|x)}{\partial \Pr(k,t|x)}\, \mathcal{L}(x, \mathcal{B}(a)) = \sum_{a:\, a_t = k} \Pr(a|x, a_t = k)\, \mathcal{L}(x, \mathcal{B}(a))$$

This expectation can also be approximated with Monte-Carlo sampling. Because the output probabilities are independent, an unbiased sample $a^i$ from $\Pr(a|x)$ can be converted to an unbiased sample from $\Pr(a|x, a_t = k)$ by setting $a^i_t = k$. Every $a^i$ can therefore be used to provide a gradient estimate for every $\Pr(k,t|x)$ as follows:

$$\frac{\partial \mathcal{L}(x)}{\partial \Pr(k,t|x)} \approx \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}(x, \mathcal{B}(a^{i,t,k})) \qquad (22)$$

with $a^i \sim \Pr(a|x)$, $a^{i,t,k}_{t'} = a^i_{t'}\ \forall\, t' \neq t$ and $a^{i,t,k}_t = k$.

The advantage of reusing the alignment samples (as opposed to picking separate alignments for every $k, t$) is that the noise due to the loss variance largely cancels out, and only the difference in loss due to altering individual labels is added to the gradient. As has been widely discussed in the policy gradients literature and elsewhere (Peters & Schaal, 2008), noise minimisation is crucial when optimising with stochastic gradient estimates. The $\Pr(k,t|x)$ derivatives are passed through the softmax function to give

$$\frac{\partial \mathcal{L}(x)}{\partial y_t^k} \approx \frac{\Pr(k,t|x)}{N} \sum_{i=1}^{N} \left[ \mathcal{L}(x, \mathcal{B}(a^{i,t,k})) - Z(a^i, t) \right]$$

where

$$Z(a^i, t) = \sum_{k'} \Pr(k',t|x)\, \mathcal{L}(x, \mathcal{B}(a^{i,t,k'}))$$

The derivative added to $y_t$ by a given $a^i$ is therefore equal to the difference between the loss with $a^i_t = k$ and the expected loss with $a^i_t$ sampled from $\Pr(k',t|x)$. This means the network only receives an error term for changes to the alignment that alter the loss. For example, if the loss function is the word error rate and the sampled alignment yields the character transcription "WTRD ERROR RATE", the gradient would encourage outputs changing the second character to 'O', discourage outputs making changes to the other two words, and be close to zero everywhere else.

For the sampling procedure to be effective, there must be a reasonable probability of picking alignments whose variants receive different losses. The vast majority of alignments drawn from a randomly initialised network will give completely wrong transcriptions, and there will therefore be little chance of altering the loss by modifying a single output. We therefore recommend that expected loss minimisation is used to retrain a network already trained with CTC, rather than applied from the start.

Sampling alignments is cheap, so the only significant computational cost in the procedure is recalculating the loss for the alignment variants. However, for many loss functions (including word error rate) this could be optimised by only recalculating that part of the loss corresponding to the alignment change. For our experiments, five samples per sequence gave sufficiently low variance gradient estimates for effective training. Note that in order to calculate the word error rate, an end-of-word label must be used as a delimiter.

5. Decoding

Decoding a CTC network (that is, finding the most probable output transcription $y$ for a given input sequence $x$) can be done to a first approximation by picking the single most probable output at every timestep and returning the corresponding transcription:

$$\mathop{\mathrm{argmax}}_{y} \Pr(y|x) \approx \mathcal{B}\big( \mathop{\mathrm{argmax}}_{a} \Pr(a|x) \big)$$

More accurate decoding can be performed with a beam search algorithm, which also makes it possible to integrate a language model. The algorithm is similar to decoding methods used for HMM-based systems, but differs slightly due to the changed interpretation of the network outputs. In a hybrid system the network outputs are interpreted as posterior probabilities of state occupancy, which are then combined with transition probabilities provided by a language model and an HMM. With CTC the network outputs themselves represent transition probabilities (in HMM terms, the label activations are the probability of making transitions into different states, and the blank activation is the probability of remaining in the current state). The situation is further complicated by the removal of repeated emissions on successive timesteps, which makes it necessary to distinguish alignments ending with blanks from those ending with labels.

The pseudocode in Algorithm 1 describes a simple beam search procedure for a CTC network, which allows the integration of a dictionary and language model. Define $\Pr^-(y,t)$, $\Pr^+(y,t)$ and $\Pr(y,t)$ respectively as the blank, non-blank and total probabilities assigned to some (partial) output transcription $y$ at time $t$ by the beam search, and set $\Pr(y,t) = \Pr^-(y,t) + \Pr^+(y,t)$. Define the extension probability $\Pr(k, y, t)$ of $y$ by label $k$ at time $t$ as follows:

$$\Pr(k, y, t) = \Pr(k,t|x)\, \Pr(k|y) \begin{cases} \Pr^-(y, t-1) & \text{if } y^e = k \\ \Pr(y, t-1) & \text{otherwise} \end{cases}$$

where $\Pr(k,t|x)$ is the CTC emission probability of $k$ at $t$, as defined in Eq. (13), $\Pr(k|y)$ is the transition probability from $y$ to $y + k$, and $y^e$ is the final label in $y$. Lastly, define $\hat{y}$ as the prefix of $y$ with the last label removed, and $\emptyset$ as the empty sequence, noting that $\Pr^+(\emptyset, t) = 0\ \forall t$.

Algorithm 1: CTC Beam Search
    Initialise: B ← {∅}; Pr^-(∅, 0) ← 1
    for t = 1 ... T do
        B̂ ← the W most probable sequences in B
        B ← {}
        for y ∈ B̂ do
            if y ≠ ∅ then
                Pr^+(y, t) ← Pr^+(y, t-1) Pr(y^e, t|x)
                if ŷ ∈ B̂ then
                    Pr^+(y, t) ← Pr^+(y, t) + Pr(y^e, ŷ, t)
            Pr^-(y, t) ← Pr(y, t-1) Pr(-, t|x)
            Add y to B
            for k = 1 ... K do
                Pr^-(y + k, t) ← 0
                Pr^+(y + k, t) ← Pr(k, y, t)
                Add (y + k) to B
    Return: max_{y ∈ B} Pr^{1/|y|}(y, T)

The transition probabilities $\Pr(k|y)$ can be used to integrate prior linguistic information into the search. If no such knowledge is present (as with standard CTC) then all $\Pr(k|y)$ are set to 1. Constraining the search to dictionary words can be easily implemented by setting $\Pr(k|y) = 1$ if $(y + k)$ is in the dictionary and 0 otherwise. To apply a statistical language model, note that $\Pr(k|y)$ should represent normalised label-to-label transition probabilities. To convert a word-level language model to a label-level one, first note that any label sequence $y$ can be expressed as the concatenation $y = (w; p)$, where $w$ is the longest complete sequence of dictionary words in $y$ and $p$ is the remaining word prefix. Both $w$ and $p$ may be empty. Then we can write

$$\Pr(k|y) = \left( \frac{\sum_{w' \in (p+k)^*} \Pr(w'|w)}{\sum_{w' \in p^*} \Pr(w'|w)} \right)^{\gamma} \qquad (23)$$

where $\Pr(w'|w)$ is the probability assigned to the transition from the word history $w$ to the word $w'$, $p^*$ is the set of dictionary words prefixed by $p$, and $\gamma$ is the language model weighting factor.

The length normalisation in the final step of the algorithm is helpful when decoding with a language model, as otherwise sequences with fewer transitions are unfairly favoured; it has little impact otherwise.
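The two decoding strategies above can be sketched as follows. This is a simplified sketch, not the paper's implementation: it drops the dictionary and language model (all $\Pr(k|y) = 1$) and the final length normalisation, but it keeps the blank/non-blank split $\Pr^-$, $\Pr^+$ from Algorithm 1. With an unbounded beam the prefix search computes $\Pr(y|x)$ exactly, so on a toy problem it can be checked against brute-force enumeration of alignments.

```python
import numpy as np
from collections import defaultdict
from itertools import product

BLANK = 0

def collapse(a):
    """B: remove repeated labels, then blanks."""
    dedup = [s for i, s in enumerate(a) if i == 0 or s != a[i - 1]]
    return tuple(s for s in dedup if s != BLANK)

def best_path_decode(probs):
    """First approximation: B(argmax_a Pr(a|x))."""
    return collapse(tuple(np.argmax(probs, axis=1)))

def beam_search_decode(probs, beam_width=8):
    """CTC prefix beam search, tracking (Pr-, Pr+) per prefix."""
    T, K = probs.shape
    beams = {(): (1.0, 0.0)}                  # prefix -> (Pr-, Pr+)
    for t in range(T):
        new = defaultdict(lambda: [0.0, 0.0])
        for y, (pb, pnb) in beams.items():
            total = pb + pnb
            new[y][0] += probs[t, BLANK] * total      # emit a blank
            if y:                                      # repeat the last label
                new[y][1] += probs[t, y[-1]] * pnb
            for k in range(1, K):                      # extend by label k
                ext = y + (k,)
                if y and k == y[-1]:                   # repeat needs a blank first
                    new[ext][1] += probs[t, k] * pb
                else:
                    new[ext][1] += probs[t, k] * total
        ranked = sorted(new.items(), key=lambda kv: -sum(kv[1]))
        beams = {y: tuple(v) for y, v in ranked[:beam_width]}
    return max(beams, key=lambda y: sum(beams[y]))

rng = np.random.default_rng(2)
T, K = 6, 3
probs = rng.dirichlet(np.ones(K), size=T)

# Brute-force Pr(y|x), summed over all alignments, for checking.
scores = defaultdict(float)
for a in product(range(K), repeat=T):
    scores[collapse(a)] += np.prod(probs[np.arange(T), list(a)])
brute_best = max(scores, key=scores.get)
beam_best = beam_search_decode(probs, beam_width=1000)
```

With a realistic beam width the search is approximate but far cheaper than enumeration, and the per-prefix $\Pr(k|y)$ factor of Eq. (23) can be multiplied into each extension step.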
6. Experiments

The experiments were carried out on the Wall Street Journal (WSJ) corpus (available as LDC corpora LDC93S6B and LDC94S13B). The RNN was trained on both the 14 hour subset 'train-si84' and the full 81 hour set, with the 'test-dev93' development set used for validation. For both training sets, the RNN was trained with CTC, as described in Section 3, using the characters in the transcripts as the target sequences. The RNN was then retrained to minimise the expected word error rate using the method from Section 4, with five alignment samples per sequence.

There were a total of 43 characters (including upper case letters, punctuation and a space character to delimit the words). The input data were presented as spectrograms derived from the raw audio files using the 'specgram' function of the matplotlib python toolkit, with width 254 Fourier windows and an overlap of 127 frames, giving 128 inputs per frame.

The network had five levels of bidirectional LSTM hidden layers, with 500 cells in each layer, giving a total of 26.5M weights. It was trained using stochastic gradient descent with one weight update per utterance, a learning rate of 1e-4 and a momentum of 0.9.

The RNN was compared to a baseline deep neural network-HMM hybrid (DNN-HMM). The DNN-HMM was created using alignments from an SGMM-HMM system trained using Kaldi recipe 's5', model 'tri4b' (Povey et al., 2011). The 14 hour subset was first used to train a Deep Belief Network (DBN) (Hinton & Salakhutdinov, 2006) with six hidden layers of 2000 units each. The input was 15 frames of Mel-scale log filterbanks (1 centre frame plus 7 frames of context on each side) with 40 coefficients, deltas and accelerations. The DBN was trained layerwise, then used to initialise a DNN. The DNN was trained to classify the central input frame into one of 3385 triphone states. The DNN was trained with stochastic gradient descent, starting with a learning rate of 0.1 and a momentum of 0.9. The learning rate was divided by two at the end of each epoch which failed to reduce the frame error rate on the development set. After six failed attempts, the learning rate was frozen. The DNN posteriors were divided by the square root of the state priors during decoding.

The RNN was first decoded with no dictionary or language model, using the space character to segment the character outputs into words, and thereby calculate the WER. The network was then decoded with a 146K word dictionary, followed by monogram, bigram and trigram language models. The dictionary was built by extending the default WSJ dictionary with 125K words using the augmentation rules implemented in the Kaldi recipe 's5'. The language models were built on this extended dictionary, using data from the WSJ CD (see the script 'wsj_extend_dict.sh' and the LM-building scripts in recipe 's5'). The language model weight was optimised separately for all experiments. For the RNN experiments with no linguistic information, and those with only a dictionary, the beam search algorithm in Section 5 was used for decoding. For the RNN experiments with a language model, an alternative method was used, partly due to implementation difficulties and partly to ensure a fair comparison with the baseline system: an N-best list of at most 300 candidate transcriptions was extracted from the baseline DNN-HMM and rescored by the RNN using Eq. (16). The RNN scores were then combined with the language model to rerank the N-best lists, and the WER of the best resulting transcripts was recorded. The best results were obtained with an RNN score weight of 7.7 and a language model weight of 16.

For the 81 hour training set, the oracle error rates for the monogram, bigram and trigram candidates ranged from 8.9% down to 1.4%, while the anti-oracle (rank 300) error rates varied from 45.5% for monograms to 33% for trigrams. Using larger N-best lists (up to N=1000) did not yield significant performance improvements, from which we concluded that the list was large enough to approximate the true decoding performance of the RNN.

An additional experiment was performed to measure the effect of combining the RNN and DNN. The candidate scores for 'RNN-WER' trained on the 81 hour set were blended with the DNN acoustic model scores and used to rerank the candidates. Best results were obtained with a language model weight of 1, an RNN score weight of 1 and a DNN weight of 1.

Table 1. Wall Street Journal results. All scores are word error rate / character error rate (where known) on the evaluation set. 'LM' is the language model used for decoding. '14 Hr' and '81 Hr' refer to the amount of data used for training.

    SYSTEM       LM           14 HR        81 HR
    RNN-CTC      NONE         74.2/30.9    30.1/9.2
    RNN-CTC      DICTIONARY   69.2/30.0    24.0/8.0
    RNN-CTC      MONOGRAM     25.8         15.8
    RNN-CTC      BIGRAM       -            10.4
    RNN-CTC      TRIGRAM      13.5         8.7
    RNN-WER      NONE         74.5/31.3    27.3/8.4
    RNN-WER      DICTIONARY   69.7/31.0    21.9/7.3
    RNN-WER      MONOGRAM     26.0         15.2
    RNN-WER      BIGRAM       -            9.8
    RNN-WER      TRIGRAM      -            8.2
    BASELINE     DICTIONARY   -            -
    BASELINE     MONOGRAM     23.4         19.9
    BASELINE     BIGRAM       11.6         9.4
    BASELINE     TRIGRAM      9.4          7.8
    COMBINATION  TRIGRAM      -            6.7

The results in Table 1 demonstrate that on the full training set the character-level RNN outperforms the baseline model when no language model is present. The RNN retrained to minimise word error rate (labelled 'RNN-WER' to distinguish it from the original 'RNN-CTC' network) performed particularly well in this regime. This is likely due to two factors: firstly, the RNN is able to learn a more powerful acoustic model, as it has access to more acoustic context; and secondly, it is able to learn an implicit language model from the training transcriptions. However, the baseline system overtook the RNN as the LM was strengthened: in this case the RNN's implicit LM may work against it by interfering with the explicit model. Nonetheless the difference was small, considering that so much more prior information (audio preprocessing, pronunciation dictionary, state tying, forced alignment) was encoded into the baseline system. Unsurprisingly, the gap between RNN-CTC and RNN-WER also shrank as the LM became more dominant.

The baseline system improved only incrementally from the 14 hour to the 81 hour training set, while the RNN error rate dropped dramatically. A possible explanation is that 14 hours of transcribed speech is insufficient for the RNN to learn how to 'spell' enough of the words it needs for accurate transcription, whereas it is enough to learn to identify phonemes.

The combined model performed considerably better than either the RNN or the baseline individually. The improvement of more than 1% absolute over the baseline is considerably larger than the slight gains usually seen with model averaging; this is presumably due to the greater difference between the systems.

7. Discussion

To provide character-level transcriptions, the network must not only learn how to recognise speech sounds, but how to transform them into letters. In other words it must learn how to spell. This is challenging, especially in an orthographically irregular language like English. The following examples from the evaluation set, decoded with no dictionary or language model, give some insight into how the network operates:

    target: TO ILLUSTRATE THE POINT A PROMINENT MIDDLE EAST ANALYST IN WASHINGTON RECOUNTS A CALL FROM ONE CAMPAIGN
    output: TWO ALSTRAIT THE POINT A PROMINENT MIDILLE EAST ANALYST IM WASHINGTON RECOUNCACALL FROM ONE CAMPAIGN

    target: T. W. A. ALSO PLANS TO HANG ITS BOUTIQUE SHINGLE IN AIRPORTS AT LAMBERT SAINT
    output: T. W. A. ALSO PLANS TOHING ITS BOOTIK SINGLE IN AIRPORTS AT LAMBERT SAINT

    target: ALL THE EQUITY RAISING IN MILAN GAVE THAT STOCK MARKET INDIGESTION LAST YEAR
    output: ALL THE EQUITY RAISING IN MULONG GAVE THAT STACRK MARKET IN TO JUSTIAN LAST YEAR

    target: THERE'S UNREST BUT WE'RE NOT GOING TO LOSE THEM TO DUKAKIS
    output: THERE'S UNREST BUT WERE NOT GOING TO LOSE THEM TO DEKAKIS

[Figure 4. Network outputs. The figure shows the frame-level character probabilities emitted by the CTC layer (a different colour for each character, dotted grey line for 'blanks'), along with the corresponding training errors, while processing an utterance. The target transcription was 'HIS_FRIENDS_', where the underscores are end-of-word markers. The network was trained with WER loss, which tends to give very sharp output decisions, and hence sparse error signals (if an output probability is 1, nothing else can be sampled, so the gradient is 0 even if the output is wrong). In this case the only gradient comes from the extraneous apostrophe before the 'S'. Note that the characters in common sequences such as 'IS', 'RI' and 'END' are emitted very close together, suggesting that the network learns them as single sounds.]
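Returning to the input representation from Section 6: a 254-sample Fourier window with an overlap of 127 yields 254 // 2 + 1 = 128 frequency bins per frame. The recipe can be sketched in plain NumPy; this is an illustrative approximation of that setup, not the paper's exact 'specgram' call, and the random signal stands in for real audio.

```python
import numpy as np

def spectrogram(signal, nfft=254, overlap=127):
    """Power spectrogram: sliding windows of `nfft` samples with hop
    nfft - overlap; each window yields nfft // 2 + 1 frequency bins."""
    step = nfft - overlap
    window = np.hanning(nfft)
    frames = [signal[i:i + nfft]
              for i in range(0, len(signal) - nfft + 1, step)]
    return np.stack([np.abs(np.fft.rfft(window * f)) ** 2 for f in frames])

rng = np.random.default_rng(3)
audio = rng.normal(size=16000)   # one second of noise at 16 kHz as a stand-in
spec = spectrogram(audio)        # shape: (num_frames, 128)
```

Each row of `spec` is one 128-dimensional input frame for the network.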
Like all speech recognition systems, the network makes phonetic mistakes, such as 'single' instead of 'shingle', and sometimes confuses homophones like 'two' and 'to'. The latter problem may be harder than usual to fix with a language model, as words that are close in sound can be quite distant in spelling. Unlike phonetic systems, the network also makes lexical errors, e.g. 'bootik' for 'boutique', and errors that combine the two, such as 'alstrait' for 'illustrate'.

It is able to correctly transcribe fairly complex words such as 'campaign', 'analyst' and 'equity' that appear frequently in financial texts (possibly learning them as special cases), but struggles with both the sound and spelling of unfamiliar words, especially proper names such as 'Milan' and 'Dukakis'. This suggests that out-of-vocabulary words may still be a problem for character-level recognition, even in the absence of a dictionary. However, the fact that the network can spell at all shows that it is able to infer significant linguistic information from the training transcripts, paving the way for a truly end-to-end speech recognition system.

In the future, it would be interesting to apply the system to datasets where the language model plays a lesser role, such as spontaneous speech, or where the training set is sufficiently large that the network can learn a language model from the transcripts alone. Another promising direction would be to integrate the language model into the CTC or expected transcription loss objective functions during training.

8. Conclusion

This paper has demonstrated that character-level speech transcription can be performed by a recurrent neural network with minimal preprocessing and no explicit phonetic representation. We have also introduced a novel objective function that allows the network to be directly optimised for word error rate, and shown how to integrate the network outputs with a language model during decoding. Finally, by combining the new model with a baseline, we have achieved state-of-the-art accuracy on the Wall Street Journal corpus for speaker-independent recognition.

Acknowledgements

The authors wish to thank Daniel Povey for his assistance with Kaldi. This work was partially supported by the Canadian Institute for Advanced Research.

References

Bahl, L., Brown, P., De Souza, P. V., and Mercer, R. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In ICASSP, volume 11, pp. 49-52, 1986.
Bisani, Maximilian and Ney, Hermann. Open vocabulary speech recognition with flat hybrid models. In INTERSPEECH, pp. 725-728, 2005.
Bourlard, Herve A. and Morgan, Nelson. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell, MA, USA, 1993.
Ciresan, Dan C., Meier, Ueli, Masci, Jonathan, and Schmidhuber, Jürgen. A committee of neural networks for traffic sign classification. In IJCNN, pp. 1918-1921, 2011.
Davis, S. and Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4):357-366, 1980.
Eyben, F., Wöllmer, M., Schuller, B., and Graves, A. From speech to letters: using a novel neural network architecture for grapheme based ASR. In Proc. Automatic Speech Recognition and Understanding Workshop (ASRU 2009), Merano, Italy, 2009.
Galescu, Lucian. Recognition of out-of-vocabulary words with sub-lexical language models. In INTERSPEECH, 2003.
Gers, F., Schraudolph, N., and Schmidhuber, J. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3:115-143, 2002.
Graves, A. and Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602-610, 2005.
Graves, A., Fernández, S., Gomez, F., and Schmidhuber, J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML, Pittsburgh, USA, 2006.
Graves, A., Mohamed, A., and Hinton, G. Speech recognition with deep recurrent neural networks. In Proc. ICASSP 2013, Vancouver, Canada, 2013.
Graves, Alex. Supervised Sequence Labelling with Recurrent Neural Networks, volume 385 of Studies in Computational Intelligence. Springer, 2012.
Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George, Mohamed, Abdel-rahman, Jaitly, Navdeep, Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Sainath, Tara, and Kingsbury, Brian. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 2012.
Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
Jaitly, Navdeep and Hinton, Geoffrey E. Learning a better representation of speech sound waves using restricted Boltzmann machines. In ICASSP, pp. 5884-5887, 2011.
Jaitly, Navdeep, Nguyen, Patrick, Senior, Andrew W., and Vanhoucke, Vincent. Application of pretrained deep neural networks to large vocabulary speech recognition. In INTERSPEECH, 2012.
Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
Lee, Li and Rose, R. A frequency warping approach to speaker normalization. IEEE Transactions on Speech and Audio Processing, 6(1):49-60, 1998.
Peters, J. and Schaal, S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., and Vesely, K. The Kaldi speech recognition toolkit. In IEEE Workshop on Automatic Speech Recognition and Understanding, 2011.
Schuster, M. and Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45:2673-2681, 1997.