Connectionist Temporal Classification: A Tutorial with Gritty Details

This tells us that the output we really want is neither y nor π. Instead, it is a sequence called the label, denoted by l, where |l| ≤ T. For the above example, all those paths correspond to the same label: l = [a]. A few more examples help reveal the relationship between paths π and labels l. Writing the blank symbol as "_", all the following paths correspond to l = [h, e]:

    hhh_eeee
    _heeee_
    hh_eee_
    h_ee_

Please note that blanks could appear before, between, and after the consecutive appearances of "h" (and "e"). A slightly more complex example is that paths such as

    bbb_eee
    _bb_ee_
    b_eeeeee
    bbb_eeee

all correspond to l = [b, e], whereas a path like bb_e_e corresponds to l = [b, e, e]. This example shows the importance of the blank in separating consecutive occurrences of the same letter.

In summary, the output of speech recognition is not the alphabet probability distribution y or the path π, but the label l. And the CTC approach is all about inferring l from y, by integrating out π:

    P(l|x) = Σ_π P(l|π) P(π|x)    (3)

where

    P(l|π) = 1 if π matches l, and 0 otherwise.

Dynamic Time Warping

Because the length of y might differ from (and is often longer than) that of l, the inference of l from y is actually a dynamic time warping problem. There is a dynamic programming algorithm that solves this time warping problem, and it enables us to train the RNN using pairs of x and l, instead of pairs of x and y. This is very valuable because there is no need to segment y and align y with x before training.

When we say time warping, we mean that we want to map each frame y_t to some l_s. The second example above shows that we actually want to map y_t to either an l_s or a blank, because only if we do so do we have the chance to recognize successive identical letters like "ff" and "ee" in "coffee", and "ee" in "bee".

Here is a more detailed example. Suppose that we have an audio clip of "bee".
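The many-to-one mapping from paths π to labels l described above can be sketched in a few lines of Python (a minimal illustration; the function name `collapse` and the use of "_" for the blank are our own choices, not from the tutorial):

```python
import itertools

BLANK = "_"  # stand-in for the CTC blank symbol

def collapse(path):
    """Map a path pi to its label l: first merge runs of repeated
    symbols, then drop the blanks."""
    merged = [sym for sym, _ in itertools.groupby(path)]
    return [sym for sym in merged if sym != BLANK]

# All of these paths correspond to the same label l = [h, e]:
assert collapse("hhh_eeee") == ["h", "e"]
assert collapse("_heeee_") == ["h", "e"]
# A blank between repeated letters keeps them separate:
assert collapse("bb_e_e") == ["b", "e", "e"]   # "bee"
assert collapse("bbeeee") == ["b", "e"]        # "be"
```

Note that merging repeats must happen before dropping blanks; doing it in the other order would collapse bb_e_e into "be" as well, losing exactly the distinction the blank exists to preserve.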
The most probable path π is

    bbeeee

and we would like to warp it to l = [b, e, e]. It looks like we can map the b's in π to "b" and split the e's between the two e's in l, but the truth is that we do not have much information to prevent the algorithm from mapping all the e's in π into the first e of l, leaving nothing for the second. And another problem is: to which element should we map the blanks in π? A solution is, rather than mapping π to l, to map π to l', which is constructed by inserting blanks before, after, and into l. For the above example, we have

    l' = [_, b, _, e, _, e, _]

and the warping maps each frame of π to an element of l', with blank frames going to the blanks of l'.

A slightly more special case is that π does not contain leading blanks, as in

    bbbeeee

In this case, it is reasonable to map the first frame to "b" instead of to the leading blank of l'. A similar edge case is that π does not have trailing blanks, and therefore no frame should be mapped to the trailing blank of l'. These two cases are what we need to care about in designing the algorithm that computes the map from π to l'.

The Forward-Backward Algorithm

Here we derive the dynamic programming algorithm for computing P(l'|x), which is notably equivalent to P(l|x), because l' is constructed to be unique for each l.

The Forward Algorithm

The above time warping example shows that π_T could be mapped to either l'_{|l'|}, the trailing blank of l', or l'_{|l'|-1}, which, according to the construction rule of l', is the last element of l. So

    P(l'|x) = Σ_π P(l', π|x)

We can generalize P(l', π_T ↦ l'_{|l'|} | x) and P(l', π_T ↦ l'_{|l'|-1} | x) to the forward variable α(t, s), the total probability of all warpings of the first t frames that map π_t to l'_s, so that

    P(l'|x) = α(T, |l'|) + α(T, |l'|-1)

The above time warping example also shows that π_1 could be mapped to either l'_1, the leading blank of l', or l'_2, the first element of l. So we have

    α(1, 1) = y_{1, l'_1},  α(1, 2) = y_{1, l'_2},  α(1, s) = 0 for all s > 2

Here let us take an example. Suppose that l = [h, e], so l' = [_, h, _, e, _], and consider a y which, for simplicity, is illustrated by the most probable symbol of each frame. It is reasonable to map π_1 to l'_1, the leading blank of l', if argmax_k y_{1,k} is the blank, or to l'_2 = "h", if argmax_k y_{1,k} is "h".

Then we think a step further: mapping π_2, or, more generally, π_t. Roughly, it is reasonable to map π_t to 1. l'_{s(t-1)}, the element that π_{t-1} was mapped to, where s(t-1) denotes its index in l'; 2.
the element next to it, l'_{s(t-1)+1}; or 3. l'_{s(t-1)+2}, if l'_{s(t-1)+1} is a blank.

Among these, case 3 is reasonable when we want to skip the blank l'_{s(t-1)+1}. An example is that when we want to map π = [h, h, h, h, e, e, e] to l' = [_, h, _, e, _], it is not mandatory to map any π_t to a blank in l' in order to recognize the word "he": the h's map to l'_2 and the e's to l'_4, skipping the blank l'_3.

But case 3 is not reasonable when l'_{s(t-1)+2} = l'_{s(t-1)}. In this case, we should not skip that blank. For example, to recognize the word "bee", we have l' = [_, b, _, e, _, e, _]. If we skipped over the blank between the two e's in l', we would misunderstand the double e as a single e. So even if some middle frame sounds more like "e" than like the blank, we want to map it to the blank l'_5, so as to recognize the word "bee".

Summarizing the above three cases, we get the following generalization rule for α:

    α(t, s) = (α(t-1, s) + α(t-1, s-1)) · y_{t, l'_s}                  if l'_s = _ or l'_{s-2} = l'_s
    α(t, s) = (α(t-1, s) + α(t-1, s-1) + α(t-1, s-2)) · y_{t, l'_s}    otherwise

The Backward Algorithm

Similar to the forward variable α(t, s), we can define the backward variable β(t, s) as the total probability of all warpings of the frames t+1, ..., T that complete l' starting from π_t ↦ l'_s. Because the time warping must map π_T to either the trailing blank l'_{|l'|} or the last element l'_{|l'|-1} of l', we have

    β(T, |l'|) = 1,  β(T, |l'|-1) = 1

Similar to the generalization rule of the forward algorithm, we have

    β(t, s) = Σ_{i=s}^{s+1} β(t+1, i) · y_{t+1, l'_i}    if l'_s = _ or l'_{s+2} = l'_s
    β(t, s) = Σ_{i=s}^{s+2} β(t+1, i) · y_{t+1, l'_i}    otherwise

The Search Space

The general rule for α shows that, to compute α(t, s), we need α(t-1, s), α(t-1, s-1), and α(t-1, s-2). Similarly, to compute β(t, s), we need β(t+1, s), β(t+1, s+1), and β(t+1, s+2). Some of these values are obviously zero. The following figure (Figure 7.2 in Alex Graves' Ph.D. thesis) helps us understand which are zeros.

(Figure: the search space of the forward algorithm, with states laid out on a grid of time steps t = 1, ..., T and label positions s = 1, ..., |l'|.)

Every circle in this figure shows a possible state in the search space. These states are aligned on the grid of t and s. Arrows connect a state with its consequent states. The connected states are the possible ones, whereas the rest are impossible and should have zero probability.
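To make the recursions above concrete, here is a small sketch of the forward and backward passes: the extended-label construction and a non-vectorized dynamic program, under the assumptions that y is a T×K matrix of per-frame symbol probabilities and that the blank has index 0 (the helper names are ours, not the tutorial's):

```python
import numpy as np

BLANK = 0  # index reserved for the blank symbol (an illustrative choice)

def with_blanks(label):
    """Construct l' from l by inserting blanks before, between, and after."""
    lp = [BLANK]
    for sym in label:
        lp.extend([sym, BLANK])
    return lp

def ctc_forward_backward(y, lp):
    """Forward variables alpha, backward variables beta, and P(l'|x).

    y  : (T, K) array; y[t, k] is the probability of symbol k at frame t.
    lp : the extended label l' as a list of symbol indices.
    """
    T, S = y.shape[0], len(lp)
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))

    # alpha(1,1) = y_{1,l'_1}, alpha(1,2) = y_{1,l'_2}, alpha(1,s) = 0 for s > 2
    alpha[0, 0] = y[0, lp[0]]
    alpha[0, 1] = y[0, lp[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # case 1: stay on l'_s
            if s >= 1:
                a += alpha[t - 1, s - 1]             # case 2: advance by one
            if s >= 2 and lp[s] != BLANK and lp[s] != lp[s - 2]:
                a += alpha[t - 1, s - 2]             # case 3: skip a blank
            alpha[t, s] = a * y[t, lp[s]]

    # beta(T,|l'|) = beta(T,|l'|-1) = 1
    beta[T - 1, S - 1] = 1.0
    beta[T - 1, S - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1, s] * y[t + 1, lp[s]]
            if s + 1 < S:
                b += beta[t + 1, s + 1] * y[t + 1, lp[s + 1]]
            if s + 2 < S and lp[s + 2] != BLANK and lp[s + 2] != lp[s]:
                b += beta[t + 1, s + 2] * y[t + 1, lp[s + 2]]
            beta[t, s] = b

    p = alpha[T - 1, S - 1] + alpha[T - 1, S - 2]    # P(l'|x)
    return alpha, beta, p
```

A handy sanity check is that Σ_s α(t, s)·β(t, s) equals P(l'|x) for every t, since each product accounts for exactly the warpings passing through state (t, s).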
We do not need to go into the impossible area to the top-right of those connected states when we compute α(t, s):

    α(t, s) = 0, for all s < |l'| - 2(T - t) - 1

And we do not need to go into the impossible area to the bottom-left when we compute β(t, s):

    β(t, s) = 0, for all s > 2t

Actually, in order to bound the dynamic programming algorithm, we also need

    α(t, 0) = 0 for all t, and β(t, |l'| + 1) = 0 for all t.

Exercises

1. Please illustrate the search space of the forward-backward algorithm given l = [b, e, e], like Alex Graves illustrates the case of l = [c, a, t] with Figure 7.2 in his Ph.D. thesis.
2. Suppose that we have motion data collected from a sensor placed on our elbows. The data is a sequence of frames, and each frame records the 3D position and velocity of the sensor. Can we train a CTC network that counts how many push-ups we are doing?
3. How can we extend CTC to the two-dimensional case and use this extended version of CTC for object recognition in computer vision?
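The two pruning bounds above carve the grid of states down to a diagonal band, which is also a useful starting point for exercise 1. A tiny sketch can print which states survive the bounds (purely illustrative; for labels with repeated symbols the truly reachable set can be slightly smaller, since the blank between repeats cannot be skipped):

```python
def reachable(T, S):
    """Grid of states (t, s) not excluded by the two pruning bounds:
    alpha(t, s) = 0 for s < S - 2*(T - t) - 1, and
    beta(t, s)  = 0 for s > 2*t.
    Returns one string per time step; 'o' marks a surviving state."""
    rows = []
    for t in range(1, T + 1):
        row = "".join(
            "o" if (s <= 2 * t and s >= S - 2 * (T - t) - 1) else "."
            for s in range(1, S + 1)
        )
        rows.append(row)
    return rows

# l = [b, e, e] gives |l'| = 7; with T = 6 frames:
for row in reachable(6, 7):
    print(row)
```

The printed band starts at the top-left states (1, 1) and (1, 2), where α is initialized, and narrows to the bottom-right states (T, |l'|-1) and (T, |l'|), which are the two terms of P(l'|x).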
