NLP Tools for Biology Literature Mining
Qiaozhu Mei
Jing Jiang
ChengXiang Zhai
Nov 3, 2004
What do we have?
Biology Literature (huge amount of text)
E.g. "Mites in the genus Varroa are the primary parasites of honey bees … Ten of 22 transfer RNAs are in different locations relative to hard ticks, and the 12S ribosomal RNA subunit is inverted and separated from the 16S rRNA by a novel non-coding region, a trait not yet seen in other arthropods. …" (from Biological Abstracts)
What do we want?
Named entities:
gene names, protein names, drugs, etc.
Interaction events between entities:
transcription, translation, post-translational modification, etc.
Relationships between basic events:
caused by, inhibited by, etc.
(from Hirschman et al. '02)
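One way to make the three kinds of target information concrete is to give each its own record type. The sketch below is a hypothetical data model, not part of the system described here: the class names, field names and label strings are assumptions, and "geneA" and "proteinB" are placeholders rather than real biological facts.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical record types for the three kinds of information listed above.
# All class names, field names and label strings are illustrative assumptions.

@dataclass(frozen=True)
class Entity:
    text: str   # surface form found in the text
    kind: str   # e.g. "gene", "protein", "drug"

@dataclass(frozen=True)
class Event:
    kind: str                         # e.g. "transcription", "translation"
    participants: Tuple[Entity, ...]  # entities taking part in the event

@dataclass(frozen=True)
class EventRelation:
    kind: str         # e.g. "caused_by", "inhibited_by"
    event: Event      # the event being described
    by_event: Event   # the event that causes or inhibits it

# Placeholder entities, not real biological facts.
gene = Entity("geneA", "gene")
protein = Entity("proteinB", "protein")
transcription = Event("transcription", (gene,))
modification = Event("post-translational modification", (protein,))

# "Transcription of geneA is inhibited by modification of proteinB."
relation = EventRelation("inhibited_by", transcription, modification)
```

Making the records frozen lets extracted entities and events be compared, hashed and deduplicated across sentences.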
Preliminary System Structure
Collections of raw textual data
  ↓
Text Pre-processing (NLP):
  POS Tagger → Nouns, Verbs, etc.
  Parser → NPs, VPs, Relations
  Entity Extractor → Genes, proteins, other entities
  …
  ↓
Pre-processed data ready to mine
  ↓
Text Mining Modules (TM)
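The structure above is a chain of pre-processing stages, each adding a layer of annotation that later stages and the mining modules can use. The sketch below assumes a simple design where a document is a dict and each stage adds a key; the stage functions, the tag lexicon and the entity dictionary are all hypothetical toys standing in for real tools (a parser stage would slot in the same way).

```python
import re
from functools import reduce
from typing import Callable, Dict, List

# A document is a dict that each stage enriches with a new annotation layer.
Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]

def tokenize(doc: Doc) -> Doc:
    # Words, optionally hyphenated (e.g. "non-coding"); punctuation is dropped.
    doc["tokens"] = re.findall(r"\w+(?:-\w+)*", str(doc["text"]))
    return doc

# Hypothetical tag lexicon; unknown words default to NN (noun).
TOY_TAGS = {"the": "DT", "a": "DT", "is": "VBZ", "and": "CC",
            "from": "IN", "by": "IN", "inverted": "VBN", "separated": "VBN"}

def pos_tag(doc: Doc) -> Doc:
    doc["pos"] = [TOY_TAGS.get(t.lower(), "NN") for t in doc["tokens"]]
    return doc

# Hypothetical entity dictionary (gazetteer) mapping tokens to entity types.
TOY_GAZETTEER = {"rRNA": "RNA", "Varroa": "organism"}

def extract_entities(doc: Doc) -> Doc:
    doc["entities"] = [(t, TOY_GAZETTEER[t])
                       for t in doc["tokens"] if t in TOY_GAZETTEER]
    return doc

def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    # Feed the output of each stage into the next one, in order.
    return reduce(lambda doc, stage: stage(doc), stages, {"text": text})

doc = run_pipeline("the 12S subunit is separated from the 16S rRNA",
                   [tokenize, pos_tag, extract_entities])
print(doc["entities"])  # [('rRNA', 'RNA')]
```

Keeping every stage's output in the same document object means a mining module can read tokens, tags and entities together without re-running earlier stages.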
POS Taggers
TreeTagger
Brill Tagger
SNoW Tagger
LT Chunk
Stanford Tagger
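To show what a POS tagger takes in and returns, here is a most-frequent-tag baseline trained on a tiny hand-made corpus. It is not any of the taggers listed above; it only sketches the initial tag assignment that a transformation-based tagger such as Brill's then refines with learned correction rules. The training sentences, function names and the NN default for unknown words are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedSent = List[Tuple[str, str]]

def train_most_frequent_tag(tagged_sents: List[TaggedSent]) -> Dict[str, str]:
    # Count how often each (lower-cased) word carries each tag.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    # Keep only the single most frequent tag per word.
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(words: List[str], model: Dict[str, str],
        default: str = "NN") -> TaggedSent:
    # Unknown words fall back to the default tag (a noun guess).
    return [(w, model.get(w.lower(), default)) for w in words]

# Hypothetical hand-tagged training sentences (Penn Treebank-style tags).
TRAIN: List[TaggedSent] = [
    [("the", "DT"), ("subunit", "NN"), ("is", "VBZ"), ("inverted", "VBN")],
    [("mites", "NNS"), ("are", "VBP"), ("parasites", "NNS")],
    [("the", "DT"), ("region", "NN"), ("is", "VBZ"), ("novel", "JJ")],
]

model = train_most_frequent_tag(TRAIN)
print(tag(["The", "region", "is", "inverted", "ticks"], model))
```

Because it ignores context, this baseline cannot tell apart different uses of the same word; the listed taggers add context through rules, probabilistic models or learned classifiers.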