Third-Generation Genome Assembly Software

Points/C-coins required: 3 | 2018-07-03 17:36:37 | 6.15 MB | PDF

MECAT is an ultra-fast Mapping, Error Correction and de novo Assembly Tool for single-molecule sequencing (SMRT) reads. MECAT employs novel alignment and error correction algorithms that are much more efficient than state-of-the-art aligners and error correction tools. MECAT can be used for effe
1. Introduction to third-generation sequencing (PacBio principle)

SMRT (single-molecule, real-time) sequencing technology (Pacific Biosciences, USA).

[Figure: SMRT sequencing principle; labels: aluminum, glass, excitation, emission, pulse, timer]

Advantages:
(1) Long reads (14 kbp on average; second-generation reads are 50-200 bp).
(2) No PCR amplification is required, so no GC bias is introduced.

Introduction to third-generation sequencing (wide range of applications)

Genome assembly: "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data", Nature Methods.
DNA methylation: "Direct detection of DNA methylation during single-molecule, real-time sequencing", Nature Methods 7, 461-465 (1 June 2010), doi:10.1038/nmeth.1459.
Full-length transcriptome: "A single-molecule long-read survey of the human transcriptome", Nature Biotechnology.
Resolving complex regions: "Resolving the complexity of the human genome using single-molecule sequencing", Nature (Letter), doi:10.1038/nature13907.

[Figure: clone assembly of Contig 2 (1.2 Mb) against GRCh37, with segmental duplications (SDs) marked]

Introduction to third-generation sequencing (computational challenges)

(1) High error rate (15%).
(2) Finding overlapping regions by sequence alignment is the most time-consuming step: "It requires 250,000 CPU hours to assemble a 54x human genome by PBcR-MHAP" (Nature Biotechnology (2015) 33:6, 623), which is roughly 200,000 in computing costs.
(3) Two alignment problems arise: alignment to a reference genome, and pairwise alignment between reads.

[Figure: assembly workflow; long reads (14 kbp) -> longest reads taken as seed reads -> construct preassembled reads -> assemble to finished genome]

2. Computational method design: sequence alignment method

[Figure: example alignment of two noisy long reads showing mismatches and gaps]

Block-based alignment method for third-generation sequencing reads

A BLOCK data structure is proposed: the reference is compressed Z times into blocks, and a seed counter records the candidate positions and the number of seeds.

Seed-voting global scoring mechanism based on the difference in sequence distance

The distance difference factor (DDF) between two seeds is

    DDF = |1 - L2/L1|,    DDF < ε (ε ∈ [0, 0.5])

where L1 and L2 appear, from the figure labels, to be the distances between the same two seeds on the two sequences.

[Figure: seed voting in four steps over Block 1 to Block 3 of long reads LR1 and LR2. Step 1: seed; Step 2: DDF > 0.3 versus DDF ≤ 0.3; Step 3: seed, DDF ≤ 0.3; Step 4: sequence alignment by diff. Global seed voting score in the example: 8.]

Evaluation of seed-voting global scoring

[Figure: global score versus overlap size for arabidopsis, drosophila and yeast]
Conclusion: the global score is positively correlated with the overlap length.

[Figure: global score and local score by species (arabidopsis, drosophila, yeast)]
Conclusion: global scoring can filter out 2/3 of the local candidate positions.

2.1 Evaluation of alignment to reference genomes

Running time in minutes. Values are shown as extracted; decimal points appear to have been lost in several cells, and the columns of the E. coli row cannot be separated reliably.

Dataset           Data size   BLASR     BWA-mem   MECAT2REF
E. coli           6494M 162.2 56G 756,7 40865
A. thaliana       361G        25117     23234     308
D. melanogaster   293G        24063     32445     388
Human             389,2G      26940.0   94175     1047,2

Conclusion: MECAT2REF runs 10 to 70 times faster than BLASR and BWA-mem.

Mapping accuracy. Columns: Dataset, Method, SMS reads count, Mapped count, Correct count, Correct mapped length, Precision, Sensitivity, Coverage. Cell boundaries were lost in extraction, so each row is given as extracted after the method name.

E. coli   BLASR       6.634 6.634 660688207.1989958% 9958%0 9953%0
E. coli   BWA-mem     6.634 6634 651186980.84798.15 98.15%98.15
E. coli   MECAT2ref   6.634 6.634 6.63388.592740999800 9998 99970
Yeast     BLASR       17.38617.38617.315231.112.187 9959999.5909955
Yeast     BWA-mem     17.3861738616921225918564 97.33%97.33%0 97.32
Yeast     MECAT       17.3861738417.367231880.652 9990%99,89099.9900
Human     BLASR       4,422,3504,079,1864,040,51553,9695784299905%91.37%91.31%
Human     BWA-mem     4,422,3504,079,1863,925,31352.454,1096299623%88.76%88.74
Human     MECAT2ref   4422.3504.0790214046.19554073.550.5279920%91.490091480

Conclusion: sensitivity and precision are slightly higher than those of BLASR and BWA-mem.
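
The seed-voting idea from section 2 (scoring a candidate by how many seeds agree under a distance difference factor, DDF) can be illustrated with a small Python sketch. This is a simplified illustration and not MECAT's actual implementation: the names `ddf` and `seed_voting_score`, the representation of a seed as a pair of positions, the use of the first seed as the anchor, and counting the anchor itself as one vote are all assumptions made here. The 0.3 threshold is the value shown in the slide figure.

```python
from typing import List, Tuple

# A seed match: (position on read 1, position on read 2). Hypothetical representation.
Seed = Tuple[int, int]


def ddf(a: Seed, b: Seed) -> float:
    """Distance difference factor |1 - L2/L1| between two seed matches.

    L1 is the distance between the two seeds on read 1,
    L2 the distance between the same two seeds on read 2.
    """
    l1 = abs(b[0] - a[0])
    l2 = abs(b[1] - a[1])
    if l1 == 0:
        # Undefined ratio: treat as inconsistent (an assumption of this sketch).
        return float("inf")
    return abs(1.0 - l2 / l1)


def seed_voting_score(seeds: List[Seed], anchor: int = 0,
                      threshold: float = 0.3) -> int:
    """Count the seeds whose DDF against the anchor seed is within the threshold.

    The anchor seed contributes one vote for itself (assumption).
    """
    ref = seeds[anchor]
    score = 1
    for i, seed in enumerate(seeds):
        if i != anchor and ddf(ref, seed) <= threshold:
            score += 1
    return score


# Three seeds keep roughly the same spacing on both reads; the last one does not.
seeds = [(100, 110), (600, 615), (1100, 1090), (1600, 2600)]
print(seed_voting_score(seeds))
```

In this example the fourth seed lies 1500 bases from the anchor on read 1 but 2490 bases on read 2, so its DDF is 0.66 and it does not vote. The other two seeds have DDF values of 0.01 and 0.02, giving a score of 3. A true overlap between two noisy reads tends to collect many such consistent seeds, which fits the slide's conclusion that the global score grows with overlap length.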
