ncbi_cxx--7_0_0.tar.gz_blast_cxx7_it_ncbi_ncbi_cxx--7_0


blast

ncbi

121 views, uploaded 2022-09-14 19:35:46, 17.55MB, GZ

7191 files in total

cpp: 1948

hpp: 1790

in: 471

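**NCBI C++ Toolkit 7.0.0: The C++ Implementation of BLAST, an In-Depth Look**

NCBI (National Center for Biotechnology Information) is a major institution in bioinformatics. It provides a wide range of biological sequence analysis tools, the best known of which is BLAST (Basic Local Alignment Search Tool). This resource, "ncbi_cxx--7_0_0.tar.gz", contains version 7.0.0 of the NCBI C++ Toolkit. The toolkit is a large, feature-rich package written mainly in C++; besides BLAST and other sequence-analysis libraries, the archive contains general infrastructure such as database access (DBAPI), CGI support, and ASN.1 serialization tools (datatool, asn2asn).

BLAST is the tool of choice for biologists comparing sequences: it quickly finds regions of local similarity between a query sequence and one or more subject sequences. The C++ implementation of BLAST (BLAST C++) combines efficient algorithms with the flexibility of C++ and gives developers lower-level control, so they can tailor processing to specific bioinformatics problems. Release 7.0.0 may bring optimizations and improvements, such as better performance, new features, and refinements of existing functionality. In the C++ toolkit we can expect to find the following core components:

1. **BLAST engine**: The core of the tool, which performs the sequence comparison. Rather than computing a full Smith-Waterman or Needleman-Wunsch alignment against every database sequence, BLAST is heuristic: it finds short word matches (seeds), extends them without gaps, and then runs a gapped dynamic-programming extension with affine (Gotoh-style) gap costs, bounded by an X-drop criterion, on promising hits. Full dynamic-programming aligners, such as Needleman-Wunsch global alignment, are provided separately in the toolkit's alignment libraries.
2. **Data structures**: To handle large volumes of sequence data, the toolkit includes efficient data structures such as dynamic-programming matrices and lookup (hash) tables for fast word lookup and computation.
3. **Search programs**: The toolkit provides a program for each combination of sequence types: BLASTN (nucleotide vs. nucleotide), BLASTP (protein vs. protein), BLASTX (translated nucleotide query vs. protein database), TBLASTN (protein query vs. translated nucleotide database), and TBLASTX (translated nucleotide vs. translated nucleotide).
4. **Input/output interfaces**: The toolkit reads and writes many sequence formats, such as FASTA and GenBank, as well as NCBI's native ASN.1 data, and can emit results in formats such as XML and JSON for downstream processing and analysis.
5. **Extensibility**: Because it is written in C++, users can extend or modify existing functionality through inheritance and polymorphism to suit specific research needs.
6. **Parallel computing support**: For large alignment workloads, the toolkit can use multiple threads on multi-core CPUs to speed up computation.
7. **Developer documentation**: In addition to the source code, NCBI provides documentation that helps users understand and use the toolkit, including API references, sample code, and tutorials. The archive itself includes files such as README.API and sample applications (for example blast_sample, blast_demo, and remote_blast_demo).

For IT professionals, and especially bioinformatics developers, learning to use and develop with the NCBI C++ Toolkit 7.0.0 can make sequence analysis more efficient and open up new possibilities for custom research and application development. Before diving in, read the official documentation to understand the toolkit's architecture and usage, then explore its internal implementation step by step to make the best use of this powerful tool.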
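The seed-and-extend idea behind the BLAST engine can be illustrated with a toy C++ sketch. This is not the toolkit's code or API: the names `Hit`, `extendUngapped`, and `seedAndExtend` are hypothetical, seeds are exact k-mers, scoring is a plain match/mismatch scheme (the values 1, -3 and the X-drop of 10 are illustrative assumptions), and only the ungapped extension step is shown. Real BLAST adds scoring matrices, gapped extension, and E-value statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy illustration of BLAST-style "seed and extend" (not NCBI's implementation).
struct Hit {
    std::size_t queryStart;
    std::size_t subjectStart;
    std::size_t length;
    int score;
};

// Extend an exact k-mer seed in both directions without gaps, stopping when
// the running score falls more than xDrop below the best score seen so far.
Hit extendUngapped(const std::string& q, const std::string& s,
                   std::size_t qi, std::size_t si, std::size_t k,
                   int match, int mismatch, int xDrop) {
    const int seedScore = static_cast<int>(k) * match;
    // Extend to the right of the seed.
    int run = seedScore, best = seedScore;
    std::size_t bestRight = 0;  // extra bases kept beyond the seed
    for (std::size_t d = 0; qi + k + d < q.size() && si + k + d < s.size(); ++d) {
        run += (q[qi + k + d] == s[si + k + d]) ? match : mismatch;
        if (run > best) { best = run; bestRight = d + 1; }
        else if (best - run > xDrop) break;
    }
    // Extend to the left, starting from the best right-extended score.
    run = best;
    int bestTotal = best;
    std::size_t bestLeft = 0;
    for (std::size_t d = 1; d <= qi && d <= si; ++d) {
        run += (q[qi - d] == s[si - d]) ? match : mismatch;
        if (run > bestTotal) { bestTotal = run; bestLeft = d; }
        else if (bestTotal - run > xDrop) break;
    }
    return {qi - bestLeft, si - bestLeft, bestLeft + k + bestRight, bestTotal};
}

// Index every k-mer of the query, scan the subject for exact word hits, and
// extend each hit; keep extensions scoring at least minScore.
std::vector<Hit> seedAndExtend(const std::string& query, const std::string& subject,
                               std::size_t k, int minScore) {
    assert(k > 0);
    std::vector<Hit> hits;
    if (query.size() < k || subject.size() < k) return hits;
    std::unordered_multimap<std::string, std::size_t> words;
    for (std::size_t i = 0; i + k <= query.size(); ++i)
        words.emplace(query.substr(i, k), i);
    for (std::size_t j = 0; j + k <= subject.size(); ++j) {
        auto range = words.equal_range(subject.substr(j, k));
        for (auto it = range.first; it != range.second; ++it) {
            Hit h = extendUngapped(query, subject, it->second, j, k, 1, -3, 10);
            if (h.score >= minScore) hits.push_back(h);
        }
    }
    return hits;
}
```

For example, `seedAndExtend("ACGTACGTTT", "GGACGTACGTTTGG", 4, 8)` reports the full 10-base match at query offset 0 and subject offset 2 once for every seed on that diagonal. Real BLAST implementations avoid such duplicates by remembering how far each diagonal has already been extended.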


ncbi_cxx--7_0_0.tar.gz_blast_cxx7_it_ncbi_ncbi_cxx--7_0_0 (7191 files)

bzip2.1 16KB

bzmore.1 4KB

bzgrep.1 1KB

bzdiff.1 897B

aa.129295 303B

set.xml.2 47KB

set.ent.2 12KB

Makefile.windowmasker_2.2.22_adapter 403B

zlib.3 4KB

nt.555 698B

configure_dialog._ 189B

configure._ 184B

configure_dialog._ 135B

configure_dialog._ 134B

configure_dialog._ 132B

configure_dialog._ 131B

configure._ 130B

configure._ 129B

configure._ 127B

configure_dialog._ 127B

configure._ 126B

configure_dialog._ 126B

configure._ 126B

configure_dialog._ 126B

configure._ 122B

configure._ 121B

configure.ac 225KB

bad_accessions.agp 413KB

bad_test.agp 11KB

obj_eq_comp_test.agp 1KB

alt_bad_test.agp 1KB

obj_id_OrderNotNumerical.agp 695B

fa_test.agp 558B

wgs_component_id.agp 504B

overlap_test.agp 464B

obj_len_test.agp 370B

space_in_object_id.agp 137B

showdefline-cppunit.aln 792KB

blastfmtutil-cppunit.aln 792KB

README.API 6KB

Makefile.cn3d_nowin.app 2KB

Makefile.blast_unit_test.app 2KB

Makefile.cn3d.app 2KB

Makefile.dbapi_unit_test.app 1KB

Makefile.test_ncbidiag_f_mt.app 1KB

Makefile.oligofar.app 1KB

Makefile.asn_sample.app 1KB

Makefile.test_objmgr_data_mt.app 1KB

Makefile.fadice.app 1KB

Makefile.project_tree_builder.app 1KB

Makefile.test_objmgr_data.app 1008B

Makefile.dbapi_context_test.app 976B

Makefile.pkl2hdf5.app 960B

Makefile.test_mshdf5.app 952B

Makefile.mshdf2mzXML.app 952B

Makefile.hdf5_speed.app 949B

Makefile.grid_cgi_sample.app 925B

Makefile.cgi_sample.app 920B

Makefile.demo_genomic_compart.app 916B

Makefile.demo_score_builder.app 910B

Makefile.datatool.app 903B

Makefile.fcgi_sample.app 895B

Makefile.mzXML2hdf5.app 840B

Makefile.fixMsHdf5.app 839B

Makefile.rcgi_sample.app 835B

Makefile.remote_blast_demo.app 821B

Makefile.blastfmt_unit_test.app 820B

Makefile.test_bdb_cursor.app 815B

Makefile.unit_test_gene_model.app 812B

Makefile.unit_test_idmapper.app 810B

Makefile.asn2asn.app 808B

Makefile.blast_demo.app 800B

Makefile.vsrun_sample.app 792B

Makefile.unit_test_sample.app 775B

Makefile.blast_sample.app 772B

Makefile.splign.app 761B

Makefile.test_serial.app 758B

Makefile.ctl_lang_ftds64.app 752B

Makefile.dbapi_driver_test_ftds_ctlib.app 730B

Makefile.bmfilter.app 728B

Makefile.python_ncbi_dbapi_test.app 724B

Makefile.objmgr_sample.app 720B

Makefile.app 712B

Makefile.unit_test_alt_sample.app 682B

Makefile.testsamrecord.app 677B

Makefile.blast_format_unit_test.app 670B

Makefile.ctl_sp_databases_ftds64.app 667B

Makefile.align_format_unit_test.app 664B

Makefile.fasthello.app 658B

Makefile.basic_sample_lib_test.app 655B

7191 entries in total

oligoFAR 3.101 03-NOV-2009 1-NCBI

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This file may be obsolete and will be removed - see man/oligofar.* for documentation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

NAME
oligoFAR version 3.101 - global alignment of single or paired short reads

SYNOPSIS
usage: [-hV] [--help[=full|brief|extended]] [-U version] [short-read-options] [-0 qbase] [-d genomedb] [-b snpdb] [-g guidefile] [-v featfile] [-l gilist|-y seqID] [--hash-bitmap-file=file] [-o output] [-O -eumxtdhz] [-B batchsz] [-s 1|2|3] [-k skipPos] [--pass0 hash-options] [--pass1 hash-options] [-a maxamb] [-A maxamb] [-P phrap] [-F dust] [-X xdropoff] [-Y bandhw] [-I idscore] [-M mismscore] [-G gapscore] [-Q gapextscore] [-D minPair[-maxPair]] [-m margin] [-R geometry] [-p cutoff] [-x dropoff] [-u topcnt] [-t toppct] [-L memlimit] [-T +|-] [--NaHSO3=yes|no]
where hash-options are: [-w win[/word]] [-N wcnt] [-f wstep] [-r wstart] [-S stride] [-H bits] [-n mism] [-e gaps] [-j ins] [-J del] [-E dist] [--add-splice=pos([min:]max)] [--longest-del=val] [--longest-ins=val] [--max-inserted=val] [--max-deleted=val]
and short-read-options are: [-i reads.col] [-1 reads1] [-2 reads2] [-q 0|1|4] [-c yes|no]

EXAMPLES
    oligofar -i pairs.tbl -d contigs.fa -b snpdb.bdb -l gilist -g pairs.guide \
        -w 20/12 -B 250000 -H32 -n2 -p90 -D100-500 -m50 -Rp \
        -L16G -o output -Omx

INPUT FORMAT OPTIONS
The following combinations of input format and data flags are allowed:
1. with a column file:
    -q0 -i input.col -c no
    -q1 -i input.col -c no
    -q0 -i input.col -c yes
2. with FASTA or FASTQ files:
    -q0 -1 reads1.fa [-2 reads2.fa] -c yes|no
    -q1 -1 reads1.faq [-2 reads2.faq] -c no
3. with Solexa 4-channel data:
    -q4 -i input.id -1 reads1.prb [-2 reads2.prb] -c no
See options and file formats for more info.
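The table of allowed input combinations can be encoded as a small validity check, which makes the rules easy to test. This is a hypothetical C++ sketch, not part of oligoFAR: `ReadInputFlags` and `isListedCombination` are illustrative names, and the check only mirrors the three cases listed above without interpreting what `-c` selects.

```cpp
#include <cassert>

// Hypothetical encoding of the INPUT FORMAT OPTIONS table (not oligoFAR code).
struct ReadInputFlags {
    int q;      // value of -q (0, 1 or 4)
    bool hasI;  // -i given (column file, or id file together with -q4)
    bool has1;  // -1 given
    bool has2;  // -2 given
    bool cYes;  // -c yes (false means -c no)
};

// Returns true only for combinations that appear in the table.
bool isListedCombination(const ReadInputFlags& f) {
    assert(f.q == 0 || f.q == 1 || f.q == 4);
    if (f.has2 && !f.has1) return false;  // -2 is only listed together with -1
    if (f.q == 4)                         // 3. Solexa 4-channel data
        return f.hasI && f.has1 && !f.cYes;
    if (f.hasI)                           // 1. column file: no -1/-2; -q1 needs -c no
        return !f.has1 && !f.has2 && (f.q == 0 || !f.cYes);
    if (f.has1)                           // 2. FASTA/FASTQ: -q1 needs -c no
        return f.q == 0 || !f.cYes;
    return false;                         // no read input given
}
```

For instance, `{0, true, false, false, true}` (`-q0 -i input.col -c yes`) is accepted, while the same flags with `-q1` are rejected because the table lists `-q1` only with `-c no`.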
CHANGES
The following parameters are new, have changed, or have disappeared:
in version 3.25: -n, -w, -N, -S, -x, -f, -R
in version 3.26: -n, -w, -N, -z, -Z, -D, -m, -S, -x, -f, -k
in version 3.27: -n, -w, -e, -H, -S, -a, -A, --pass0, --pass1
in version 3.28: -y, -R, -N
in version 3.29: --NaHSO3 (Development)
in version 3.91: -X -Y -r -O --NaHSO3
in version 3.98: -x -g -O -B
in version 3.100: -v
in version 3.101: -i -1 -2 -q -O

DESCRIPTION
Performs global alignment of multiple single or paired short reads with a noticeable error rate to a genome or to a set of transcripts provided in a BLAST database or a FASTA file. Reads may be provided as IUPACna base calls, possibly accompanied by phrap scores (referred to below as 1-channel quality scores), or as 4-channel Solexa scores. The input file format is described below in section FILE FORMATS.

The output of srsearch (referred to below as the guide file), or of a similar program that performs exact or nearly exact short-read alignment, may be used as input for oligoFAR so that it skips processing of perfectly matched reads but still formats those matches in the output in the same way as its own matches.

Input is processed in batches whose size is controlled by option -B. Reads to match are hashed (one window per read unless option -N is used, preferably at the 5' end) with a window size controlled by option -w. Option -n controls how many mismatches are allowed within hashed values. Option -a controls how many ambiguous bases within a window of a read may be hashed independently of the mismatches allowed. Low-quality 3' ends of the reads may be clipped. Low-complexity reads (controlled by the -F argument) and low-quality reads may be ignored.

OligoFAR may use different implementations of the hash table (see -H): vector (uses a lot of memory, but is faster for big batches) and arraymap (lower memory requirements for smaller batches). For vector, -L should always be used and set to a large value (gigabytes).

The database is scanned.
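One way to realize the hashing step described above is to store the 5'-end window of every read together with all of its single-mismatch variants, then look up each database window during the scan. The C++ sketch below assumes an `-n1`-like setting, plain ACGT windows, reads stored 5' to 3', and a single pass; the names `hashReads` and `scanDatabase` are hypothetical, and oligoFAR's actual hash layout (`-H`) and window/word split (`-w win/word`) are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch only: maps a hashed window (exact or 1-mismatch variant) to a read index.
using ReadTable = std::unordered_multimap<std::string, std::size_t>;

ReadTable hashReads(const std::vector<std::string>& reads, std::size_t win) {
    assert(win > 0);
    static const char kBases[] = {'A', 'C', 'G', 'T'};
    ReadTable table;
    for (std::size_t r = 0; r < reads.size(); ++r) {
        if (reads[r].size() < win) continue;             // read shorter than window
        const std::string w = reads[r].substr(0, win);   // 5'-end window
        table.emplace(w, r);                             // exact window
        for (std::size_t p = 0; p < win; ++p) {          // every 1-mismatch variant
            for (char b : kBases) {
                if (b == w[p]) continue;
                std::string v = w;
                v[p] = b;
                table.emplace(v, r);
            }
        }
    }
    return table;
}

// Scan the database once; each window is looked up in the read table.
// Returns (read index, database position) candidate seeds.
std::vector<std::pair<std::size_t, std::size_t>>
scanDatabase(const std::string& db, const ReadTable& table, std::size_t win) {
    std::vector<std::pair<std::size_t, std::size_t>> seeds;
    for (std::size_t pos = 0; pos + win <= db.size(); ++pos) {
        auto range = table.equal_range(db.substr(pos, win));
        for (auto it = range.first; it != range.second; ++it)
            seeds.emplace_back(it->second, pos);
    }
    return seeds;
}
```

Storing the variants on the read side keeps the scan to one lookup per database position, at the cost of a table that holds 1 + 3w entries per read for windows of length w; this is one reason memory limits such as `-L` matter for large batches.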
If the database is provided as a BLAST database, the scan can be limited to a set of GIs with option -l. If a SNP database is provided, all allele variants are used to compute hash values, as are the regular IUPACna ambiguities of the sequences in the database. Option -A controls the maximum number of ambiguities in the same window.

Alignments are seeded by the hash and may be extended by the Smith-Waterman algorithm (unless -X0 or -Y0 is used). Alignments are filtered (see option -p). For paired reads, geometric constraints are applied: reads of the same pair should be mutually oriented according to the -R option, and their distance is set by the -D and -m options. Hits are then ranked by score (hits with the same score have the same rank; the best hits have rank 0). Weak or too repetitive hits are discarded (see the -t and -u options). At the end of each batch, both the alignments produced by oligoFAR and the alignments imported from the guide file that have passed filtering and ranking are printed to the output file (if set) or to stdout (see FILE FORMATS for the output format).

NOTE
Since it is a global alignment tool, independent runs against, say, individual chromosomes and a run against the full genome will produce different results. To save disk space and computational resources, oligoFAR ranks hits by score and reports only the best hits and ties to the best hits. In two-pass mode, tie hits may be reported incompletely; in this case only hits with the same score as the best are guaranteed to appear in the output, whatever value of -t is set. Reported hit scores are percentages of the best score theoretically possible for the read. Scores of paired hits are sums of the individual scores, so they may be as high as 200.
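The post-alignment rules described above (percentage scores, the pair distance range `[ $a - $m ; $b + $m ]` given in the PAIRED READS section, and shared ranks for equal scores) can be sketched in C++ as follows. All names are hypothetical and this is not oligoFAR's implementation; the 0-based coordinates, the distance definition (highest minus lowest position), and the dense-ranking scheme are assumptions, and orientation checks (`-R`) and the `-t`/`-u` cut-offs are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Percent score: raw score relative to the best score theoretically possible
// for the read (for example, read length times the match score).
double percentScore(int rawScore, int bestPossible) {
    assert(bestPossible > 0);
    return 100.0 * rawScore / bestPossible;
}

// Pair rule for "-D a-b -m m": the distance from the lowest to the highest
// position covered by the two reads must lie in [a - m, b + m].
// Assumption: 0-based coordinates and distance = highest - lowest.
bool pairDistanceOk(long start1, long end1, long start2, long end2,
                    long a, long b, long m) {
    assert(start1 <= end1 && start2 <= end2 && a <= b);
    const long distance = std::max(end1, end2) - std::min(start1, start2);
    return distance >= a - m && distance <= b + m;
}

// Ranking: equal scores share a rank and the best score gets rank 0.
// Assumption: dense ranking (each next lower distinct score gets rank + 1).
std::vector<int> rankHits(const std::vector<double>& scores) {
    std::vector<double> distinct(scores);
    std::sort(distinct.begin(), distinct.end(), std::greater<double>());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());
    std::vector<int> ranks;
    ranks.reserve(scores.size());
    for (double s : scores) {
        const auto it = std::find(distinct.begin(), distinct.end(), s);
        ranks.push_back(static_cast<int>(it - distinct.begin()));
    }
    return ranks;
}
```

A paired hit's score is then the sum of its two component percentages, which is why it can reach 200. With `-D100-500 -m50`, as in the EXAMPLES line, the sketch accepts pair distances from 50 to 550.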
PAIRED READS
Pairs are looked up subject to the following requirements:
- relative orientation (geometry), which may be set by --geometry or -R (see section OPTIONS, subsection "Filtering and ranking options");
- the distance between the lowest position of the two reads and the highest position of the two reads should be in the range [ $a - $m ; $b + $m ], where $a, $b and $m are the arguments of parameters -D $a-$b -m $m.
If a pair has no hits that comply with the constraints above, individual hits for the pair components will still be reported. Also, for each component, unpaired hits better than the best paired hit will be reported. Paired reads have one ID per pair. Individual reads in this case do not have individual IDs, although the report indicates which component(s) of the pair produced the hit.

SODIUM BISULFITE TREATMENT
To discover the methylation state of DNA, sodium bisulfite treatment may be applied before producing reads. To simulate this procedure, oligoFAR has a special mode, which may be turned on by:
    --NaHSO3=true
Longer words and windows are advised in this mode for better performance. This mode is not compatible with colorspace computations.

MULTIPASS MODE
By default oligoFAR aligns all reads just once, but if option --pass1 is used, oligoFAR switches to two-pass mode. Parameters -w, -n, -e, -H, and some others affect the first run when they precede --pass1 or follow --pass0; the same parameters apply to the second run when they follow --pass1. In the second run, only reads (or pairs) having more mismatches or indels than allowed by the first-pass parameters are hashed and aligned. So using something like:
    oligofar --pass0 -w22/22 -n0 -e0 --pass1 -w22/13 -n2 -e1
will pick up exact matches first, and then run a search with less strict parameters only for those reads that did not have exact hits.

WINDOW
