Mapping Whole DNA Sequences on Variant Maps


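Whole DNA sequences are naturally related to big data streams, and classifying and visualizing them is a challenging task. This paper proposes a new mapping method for whole DNA sequences, using a special mapping scheme to transfer a whole DNA sequence into multiple 2D statistical probability maps. A sample is selected from a South American night monkey species (Aotus Nancymaae), and interesting patterns are observed in the resulting maps.

### Key Points of Mapping Whole DNA Sequences on Variant Maps

#### I. Introduction

With the rapid development of genomics, large volumes of DNA sequence data keep emerging and provide rich resources for biological research. Processing these data streams effectively, and classifying and identifying them at the level of whole sequences, remains a challenging task. The paper introduces a new whole-DNA-sequence mapping method that uses a special mapping scheme to convert a whole DNA sequence into multiple 2D statistical probability maps for analysis and visualization.

#### II. Gene Sequence Mapping Method

##### 1. Overview of the mapping method

The proposed method is intended both for processing gene sequence data and for supporting later analysis. Its core is a special mapping scheme that converts a DNA sequence into 2D statistical probability maps, which display features of the sequence more directly and may help reveal underlying patterns and structure.

##### 2. The special mapping scheme

The special mapping scheme is the key technique of the method. It maps each fixed-length segment of the DNA sequence, through the counts of its symbols, to a position in a 2D space, and the accumulated positions form a statistical probability map. Such maps may reflect properties of the sequence, and comparing the maps of different samples may expose their similarities and differences.

#### III. Sequence Model and Variant Maps

##### 1. Sequence model

To understand and analyze DNA sequences, researchers usually build a sequence model. The mapping method described here rests on such a model: the sequence is modeled as a series of segments so that it can be converted into a visual form.

##### 2. Variant maps

The variant map is an emerging technology that treats the four basic symbols of a DNA sequence (A, T, C, G) as a meta structure for processing random sequences. The maps present variation in the sequence graphically, which may make notable variations easier to identify.

#### IV. Case Study: Night Monkey (Aotus Nancymaae)

The paper selects a night monkey species from South America (Aotus Nancymaae) as its sample. Applying the proposed mapping method to a DNA sequence from this species produces maps in which interesting patterns are observed.

##### 1. Observed patterns

The variant maps generated from the night monkey sequence show visible patterns. These patterns may help in understanding the genomic structure of the species and its relationship to other species.

##### 2. Analysis of results

Further analysis of the observed patterns could clarify the genetic characteristics of the species. For example, comparing the variant maps of different individuals could be one way to explore how variation relates to adaptation, and the results might serve as a reference for later studies.

#### V. Conclusion

The paper proposes a new whole-DNA-sequence mapping method that uses a special mapping scheme to convert a whole DNA sequence into multiple 2D statistical probability maps, and illustrates it with a case study on the South American night monkey (Aotus Nancymaae). Future work could apply the method to other species and refine the mapping scheme.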


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

ASONAM '17, July 31-August 03, 2017, Sydney, Australia

licensed to ACM.

ACM ISBN 978-1-4503-4993-2/17/07…$15.00

http://dx.doi.org/10.1145/3110025.3110140

Mapping Whole DNA Sequence on Variant Maps

Yuyuan Mao, Jeffrey Zheng, Wenjia Liu

School of Software, Yunnan University

Kunming, China

yujemao@qq.com, conjugatelogic@yahoo.com

Abstract— Whole DNA sequences are naturally related to big data streams, and it is a challenging task to classify and visualize them. In this paper, a new mapping method for whole DNA sequences is proposed, and a special mapping scheme is used to transfer a whole DNA sequence into multiple 2D statistical probability maps. A sample case is selected from a night monkey species from South America (Aotus Nancymaae), and interesting patterns are observed in the relevant maps.

Keywords— Gene sequence, Aotus Nancymaae, mapping

method, sequential model, variant map

I. INTRODUCTION

In modern biology, DNA from a wide range of species, from humans to simple cells, is being sequenced and stored in DNA data banks as big data streams. It is difficult to process these DNA streams to classify and identify species from whole sequences. The main task of current genomic research is to obtain more biological information by processing and analyzing DNA sequences from multiple angles and at multiple levels. In recent years, biological gene data has been processed and used in a variety of ways, such as gene feature extraction and gene sequence location.

The variant map is an emerging technology that treats four symbols as a meta structure for processing random sequences, ranging from cryptographic sequences and DNA sequences to ECG signals. Multiple statistical probability distributions are generated from selected sequences and represented as 2D or 3D visual maps. This scheme makes whole data sequences more compact and easier to visualize, and the mapping results may be useful for exploring the non-linear complex behaviors of whole genomes.

In this paper, a special scheme is proposed and used to show a series of mapping results from a selected gene sequence of an Aotus Nancymaae.

II. PROCESS MODEL

A. Architecture

The architecture of the process model is shown in Figure 1(a). The process model consists of five parts: input, processing, measurement, projection and output. Of these, three are modules: Processing, Measurement and Projection.

Input: A DNA sequence

Output: A 2D map

Modules: Processing, Measurement, Projection

Process: In the Processing module, the selected DNA sequence is divided sequentially into multiple segments of a fixed length m. In the Measurement module, the occurrences of the four symbols {A, C, G, T} are counted in each segment, which transfers the sequence of segments into a measuring sequence of four measures. In the Projection module, a special combination, X: {AT} and Y: {AG}, is selected to turn each set of four measures into a projection position, and the whole measuring sequence is projected onto a 2D map.
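The three steps can be written compactly as follows. This is one possible formalization, not notation from the paper: the symbols N, S_i, n_σ, x_i, y_i and H are introduced here for illustration, and reading X: {AT} as the combined count of A and T (and Y: {AG} as the combined count of A and G) is an assumption, since the text does not spell out how the combination is computed.

```latex
\begin{aligned}
S &= s_1 s_2 \cdots s_N, \qquad s_j \in \{A, C, G, T\} \\
S_i &= s_{(i-1)m+1} \cdots s_{im}, \qquad i = 1, \dots, \lfloor N/m \rfloor \\
n_\sigma(i) &= \bigl|\{\, j : (i-1)m < j \le im,\ s_j = \sigma \,\}\bigr|, \qquad \sigma \in \{A, C, G, T\} \\
x_i &= n_A(i) + n_T(i), \qquad y_i = n_A(i) + n_G(i) \qquad \text{(assumed reading)} \\
H(x, y) &= \bigl|\{\, i : x_i = x,\ y_i = y \,\}\bigr|, \qquad 0 \le x, y \le m
\end{aligned}
```

Under this reading each count lies between 0 and m, so every position (x_i, y_i) falls on an (m+1) by (m+1) grid, and H is the 2D histogram that forms the map.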

B. Processing module

The input DNA sequence is cut into consecutive segments of a fixed length m, which generates a sequence of segments.

Input: a DNA sequence

Output: a sequence of segments
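A minimal Python sketch of this step is given below. The function name `split_into_segments` is hypothetical, and two details are assumptions because the paper does not state them: segments do not overlap, and a trailing remainder shorter than m is discarded.

```python
def split_into_segments(sequence: str, m: int) -> list[str]:
    """Split a DNA string into consecutive segments of fixed length m."""
    if m <= 0:
        raise ValueError("segment length m must be positive")
    sequence = sequence.upper()
    # Assumption: segments do not overlap, and a trailing remainder
    # shorter than m is dropped rather than padded.
    usable = len(sequence) - len(sequence) % m
    return [sequence[i:i + m] for i in range(0, usable, m)]


# Usage: a 12-symbol toy sequence cut into segments of length 4.
print(split_into_segments("ACGTACGGTTAC", 4))
```

Dropping the remainder keeps every segment the same length, so all counts in the next step share the range 0 to m.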

C. Measurement module

In this module, shown in Figure 1(b), the numbers of {A, G, C, T} are counted in each segment. As a result, each count is an integer between 0 and m, and the segment sequence is transferred into a measuring sequence of four measures.

Input: a sequence of segments

Output: a sequence of four measures
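The counting step might look like the sketch below. The names `SYMBOLS` and `measure_segments` are hypothetical, the tuple order (A, C, G, T) is an arbitrary choice, and ignoring any symbol outside {A, C, G, T} (for example an ambiguous base such as N) is an assumption rather than something the paper specifies.

```python
from collections import Counter

# Fixed order of the four measures in each output tuple (an arbitrary choice).
SYMBOLS = ("A", "C", "G", "T")


def measure_segments(segments: list[str]) -> list[tuple[int, int, int, int]]:
    """Turn each segment into four counts (n_A, n_C, n_G, n_T)."""
    measures = []
    for segment in segments:
        counts = Counter(segment.upper())
        # Counter returns 0 for a missing key; symbols outside SYMBOLS
        # are simply not read (assumption).
        measures.append(tuple(counts[s] for s in SYMBOLS))
    return measures


# Usage: three segments of length 4.
print(measure_segments(["ACGT", "ACGG", "TTAC"]))
```

For a segment of length m that contains only the four symbols, the four counts sum to m.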

D. Projection module

The projection module is shown in Figure 1(c) as two units: Position and Projecting. For each set of four measures, two axis positions are determined by X(AT) and Y(AG) respectively. When all measures have been processed, a 2D histogram is established as a statistical distribution, which forms the 2D map.

Input: a sequence of four measures

Output: a 2D map
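The sketch below illustrates one way to build the histogram. It assumes that X(AT) means the combined count of A and T and that Y(AG) means the combined count of A and G; the paper does not define the combination explicitly, so treat this as an interpretation. The function name `project_to_map` and the nested-list layout are hypothetical.

```python
def project_to_map(measures, m):
    """Accumulate (n_A, n_C, n_G, n_T) tuples into an (m+1) x (m+1) histogram."""
    # hist[y][x] holds the number of segments projected to position (x, y).
    hist = [[0] * (m + 1) for _ in range(m + 1)]
    for n_a, n_c, n_g, n_t in measures:
        x = n_a + n_t  # assumed meaning of X(AT)
        y = n_a + n_g  # assumed meaning of Y(AG)
        hist[y][x] += 1
    return hist


# Usage: the three measures below come from segments of length m = 4.
measures = [(1, 1, 1, 1), (1, 1, 2, 0), (1, 1, 0, 2)]
for row in project_to_map(measures, 4):
    print(row)
```

Dividing every cell by the number of segments would turn the counts into relative frequencies, which is one way to read the term "statistical probability map".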

[Figure 1(a): Input: {A DNA sequence} → Processing → Measurement → Projection → Output: {A 2D map}]
