Mastering Text Mining with R
Table of Contents
Credits
About the Authors
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Statistical Linguistics with R
Probability theory and basic statistics
Probability space and event
Theorem of compound probabilities
Conditional probability
Bayes' formula for conditional probability
Independent events
Random variables
Discrete random variables
Continuous random variables
Probability frequency function
Probability distributions using R
Cumulative distribution function
Joint distribution
Binomial distribution
Poisson distribution
Counting occurrences
Zipf's law
Heaps' law
Lexical richness
Lexical variation
Lexical density
Lexical originality
Lexical sophistication
Language models
N-gram models
Markov assumption
Hidden Markov models
Quantitative methods in linguistics
Document-term matrix
Inverse document frequency
Word similarity and edit-distance functions
Euclidean distance
Cosine similarity
Levenshtein distance
Damerau-Levenshtein distance
Hamming distance
Jaro-Winkler distance
Measuring readability of a text
Gunning fog index
R packages for text mining
OpenNLP
RWeka
RcmdrPlugin.temis
tm
languageR
koRpus
RKEA
maxent
lsa
Summary
2. Processing Text
Accessing text from diverse sources
File system
PDF documents
Microsoft Word documents
HTML
XML
JSON
HTTP
Databases
Processing text using regular expressions
Tokenization and segmentation
Word tokenization
Operations on a document-term matrix
Sentence segmentation
Normalizing texts
Lemmatization and stemming
Stemming
Lemmatization
Synonyms
Lexical diversity
Analyse lexical diversity
Calculate lexical diversity
Readability
Automated readability index
Language detection
Summary
3. Categorizing and Tagging Text
Part-of-speech tagging
POS tagging with R packages
Hidden Markov Models for POS tagging
Basic definitions and notations
Implementing HMMs
Viterbi underflow
Forward algorithm underflow
OpenNLP chunking
Chunk tags
Collocation and contingency tables
Extracting co-occurrences
Surface co-occurrence
Textual co-occurrence
Syntactic co-occurrence
Co-occurrence in a document
Quantifying the relation between words
Contingency tables
Detailed analysis of textual collocations
Feature extraction
Synonymy and similarity
Multiwords, negation, and antonymy
Concept similarity
Path length
Resnik similarity
Lin similarity
Jiang–Conrath distance
Summary
4. Dimensionality Reduction
The curse of dimensionality
Distance concentration and computational infeasibility
Dimensionality reduction