Introduction
Recurrent Neural Networks are all about learning sequences.
You may have already learned about Markov models for sequence modeling, which make use of the Markov assumption, p( x(t) | x(t-1), …, x(1) ) = p( x(t) | x(t-1) ). In other words, the current value depends only on the previous value. While easy to train, you can imagine this may not be very realistic. For example, if the previous word in a sentence is "and", what's the next word?
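To make the Markov assumption concrete, here is a minimal sketch (the corpus and function names are my own, not from the book) of estimating the first-order transition probabilities p( x(t) | x(t-1) ) by counting adjacent word pairs:

```python
from collections import defaultdict

def transition_probs(sentences):
    """Estimate p(x(t) | x(t-1)) by counting adjacent word pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        for prev, curr in zip(words[:-1], words[1:]):
            counts[prev][curr] += 1
    # Normalize counts into conditional probabilities per previous word.
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

corpus = [["cats", "and", "dogs"], ["fish", "and", "chips"]]
p = transition_probs(corpus)
# After "and", the model splits probability evenly between "dogs" and
# "chips" -- it has no way to use anything earlier than the previous word.
```

The limitation shows up immediately: the word before "and" ("cats" vs. "fish") would fully determine the next word, but a first-order Markov model cannot see it.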
Whereas Markov models are limited by the Markov assumption, Recurrent Neural Networks are not. As a result, they are more expressive, and they have achieved breakthroughs on tasks where progress had stalled for decades.
In the first section of the book we are going to add time to our neural networks. I'll introduce you to the Simple Recurrent Unit, also known as the Elman unit.
The Simple Recurrent Unit will help you understand the basics of recurrent neural networks: the types of tasks they can be used for, how to construct the objective functions for these tasks, and backpropagation through time.
We are going to revisit a classic neural network problem, the XOR problem, but we're going to extend it so that it becomes the parity problem. You'll see that regular feedforward neural networks have trouble solving this problem, but recurrent networks work because the key is to treat the input as a sequence.
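As a sketch of why treating the input as a sequence helps (hypothetical code, not from the book): parity is just a running XOR, so a model that processes the bits one at a time only needs to carry a single bit of state from step to step, instead of memorizing all 2^N input patterns at once as a feedforward network must.

```python
def parity(bits):
    """Running XOR over a bit sequence.

    The one-bit 'state' here plays the role of the hidden state a
    recurrent network must learn to maintain.
    """
    state = 0
    for b in bits:
        state ^= b  # new state depends only on previous state and current input
    return state

parity([1, 0, 1, 1])  # three 1s, so the parity is odd
```

The same per-step update works for any sequence length, which is exactly the property a recurrent unit exploits.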
In the next section of the book, we are going to revisit one of the most popular applications of recurrent neural networks: language modeling, which plays a large role in natural language processing, or NLP.
Another popular application of neural networks for language is word vectors, or word embeddings. The most common technique for this is called Word2Vec, but I'll show you how recurrent neural networks can also be used for creating word vectors.
In the section after that, we'll look at the very popular LSTM, or long short-term memory unit, and the more modern and efficient GRU, or gated recurrent unit, which has been shown to yield comparable performance.