# generand
Generate random genomic data in FASTA, FASTQ, SAM, or VCF format and write
it to the standard output.
Output data is completely random and suitable for basic testing and
benchmarking of new programs, or for quickly generating small samples for
academic purposes.
Generand is especially useful for generating large test inputs on-the-fly
so that they need not be manually generated and/or stored.
Currently the randomization is very simplistic, but we will likely add
parameters in the future to allow generating more realistic data for specific
situations.
# Usage
generand fasta sequences sequence-length
generand fastq sequences sequence-length
generand sam chromosomes alignments-per-chromosome sequence-length
generand vcf chromosomes calls-per-chromosome samples
# Description
generand fast[aq] sequences sequence-length
generates a FASTA or FASTQ stream of
"sequences" sequences, each of length "sequence-length". The sequence
content is random with a uniform distribution of bases, so that GC content
should be very close to 50%.
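To illustrate why uniformly random bases give a GC content near 50%, here is a minimal Python sketch. It is not generand's actual implementation: the function names, the `seed` parameter, and the `>seqN` header format are assumptions made for this example.

```python
import random

def random_fasta(sequences, sequence_length, seed=None):
    """Yield FASTA lines for `sequences` records of uniformly random bases."""
    rng = random.Random(seed)
    for i in range(1, sequences + 1):
        yield f">seq{i}"  # hypothetical header format, not taken from generand
        yield "".join(rng.choices("ACGT", k=sequence_length))

def gc_fraction(lines):
    """Return the fraction of G and C bases over all sequence lines."""
    seq = "".join(line for line in lines if not line.startswith(">"))
    return sum(base in "GC" for base in seq) / len(seq)

lines = list(random_fasta(100, 1000, seed=1))
print(f"GC content: {gc_fraction(lines):.3f}")
```

Each base is drawn with probability 1/4, so G and C together account for half of the draws on average, and the observed fraction approaches 0.5 as the total number of bases grows.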
PHRED scores in FASTQ streams are generated in blocks of equal scores and
are mostly high-quality. The last few scores are lower quality and
independent to simulate Illumina sequencing, where quality tends to drop
near the end of each read.
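The quality scheme described above could be sketched roughly as follows in Python. The block size, tail length, and score ranges are assumptions chosen for illustration, not values taken from generand; only the Phred+33 ASCII encoding is standard FASTQ.

```python
import random

def random_phred_string(length, block_size=10, tail=5, seed=None):
    """Return a Phred+33 quality string: equal-score blocks, then a weak tail."""
    rng = random.Random(seed)
    tail = min(tail, length)
    scores = []
    # Body of the read: blocks that share one high score (assumed range 30-40).
    while len(scores) < length - tail:
        q = rng.randint(30, 40)
        scores.extend([q] * block_size)
    scores = scores[:length - tail]
    # Tail of the read: independent, lower scores (assumed range 2-20).
    scores.extend(rng.randint(2, 20) for _ in range(tail))
    # Phred+33: each score is stored as the ASCII character with code score + 33.
    return "".join(chr(q + 33) for q in scores)

print(random_phred_string(40, seed=3))
```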
generand sam chromosomes alignments-per-chromosome sequence-length
generates a SAM stream with chromosomes * alignments-per-chromosome total
alignments. It outputs increasing indexes for QNAME and RNAME (the
chromosome), randomly increasing POS, random QUAL scores, and random
sequences and PHRED scores
as stated for FASTQ above.
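A rough Python sketch of "increasing chromosome index, randomly increasing POS" follows. It assumes that POS restarts for each chromosome and grows by a random positive step of at most `max_step`; both are guesses for illustration, and the function name and parameters are hypothetical rather than part of generand.

```python
import random

def alignment_coordinates(chromosomes, alignments_per_chromosome,
                          max_step=1000, seed=None):
    """Yield (chromosome index, POS) pairs in sorted order."""
    rng = random.Random(seed)
    for chrom in range(1, chromosomes + 1):
        pos = 0  # assumption: positions start over on every chromosome
        for _ in range(alignments_per_chromosome):
            pos += rng.randint(1, max_step)  # positive step keeps POS increasing
            yield chrom, pos

for chrom, pos in alignment_coordinates(2, 3, seed=1):
    print(chrom, pos, sep="\t")
```

Because every step is positive, the stream is already coordinate-sorted, which is what many downstream tools expect from their input.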
generand vcf chromosomes calls-per-chromosome samples
generates a VCF stream with chromosomes * calls-per-chromosome calls.
It outputs chromosomes with increasing indexes, randomly increasing POS,
uniformly random REF and ALT, uniformly random QUAL scores, and random
sample columns including GT (genotype), AD (allelic depth) and DP (depth).
REF counts are always >= ALT counts in the AD data and DP = REF count + ALT
count.
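The AD and DP constraints above can be illustrated with a short Python sketch. The depth limit, the set of genotypes, and the choice to draw GT independently of the counts are assumptions for this example, not a description of generand's internals.

```python
import random

GENOTYPES = ("0/0", "0/1", "1/1")  # assumed diploid, biallelic genotypes

def random_sample_column(rng, max_depth=100):
    """Return one GT:AD:DP sample field with REF count >= ALT count."""
    alt = rng.randint(0, max_depth // 2)
    ref = rng.randint(alt, max_depth - alt)  # lower bound enforces ref >= alt
    gt = rng.choice(GENOTYPES)               # assumption: GT is independent of AD
    return f"{gt}:{ref},{alt}:{ref + alt}"   # DP is the sum of both counts

rng = random.Random(42)
samples = [random_sample_column(rng) for _ in range(3)]
print("GT:AD:DP", *samples, sep="\t")
```

Drawing the ALT count first and then the REF count from a range that starts at the ALT count guarantees both properties without any rejection loop.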
