Application of the Huffman Tree Method to the 2D Graphic Representation of Protein Sequences


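The Huffman tree method plays a useful role in bioinformatics, particularly in the 2D graphic representation of protein sequences. Protein sequence analysis is an important research area in bioinformatics, and different representations help scientists understand protein structure, function and evolution. Huffman coding is an effective data compression technique that assigns variable-length codes to symbols, so that frequent symbols receive shorter codes and rare symbols receive longer ones. Applied to protein sequences, these codes can be converted into a 2D graphic from which researchers can read sequence information.

Protein sequences are usually composed of the 20 standard amino acids, and the frequency of each amino acid differs from one protein sequence to another. The Huffman tree method first uses these frequencies to build a Huffman tree, which yields a 0–1 code for each amino acid. The tree is built by repeatedly merging the two lowest-frequency nodes until all nodes are joined into a single tree. Each root-to-leaf path in the tree corresponds to one amino acid, and the 0s and 1s along the path form that amino acid's code.

A 2D graphic representation maps the one-dimensional protein sequence onto the plane by some rule, so that the resulting graphic reflects features of the sequence. Here the Huffman tree method is part of that mapping rule: by converting the Huffman codes of the amino acids into graphic elements, a long protein sequence can be drawn as a 2D graphic.

The paper applies the method to ten ND5 genes and seven Escherichia coli strains, and uses the resulting graphics to show evolutionary relationships among the sequences. The authors conclude that the approach may offer new insights into the evolution patterns determined from protein sequences and complete genomes.

The paper was published in 2012 in Computers in Biology and Medicine, a journal of Elsevier, an internationally known scientific and medical publisher. Its keywords ("Protein sequence", "Graphic representation", "Sequence analysis", "Escherichia coli", "Huffman tree") indicate the main direction and scope of the work. At that time bioinformatics was developing rapidly and demand for biological sequence analysis was growing. Converting protein sequences into 2D graphics with the Huffman tree method can deepen understanding of protein structure and function, and may also help disease diagnosis and drug development. As bioinformatics and computer science advance, similar algorithms and models for biological sequence data are expected to yield further findings and applications.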
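The tree-building step described in this summary (repeatedly merging the two lowest-frequency nodes, then reading each root-to-leaf path as a 0–1 code) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `huffman_codes`, the choice of 0 for the left branch and 1 for the right branch, and the toy sequence are all assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freq):
    """Return {symbol: 0-1 code string} for a {symbol: frequency} mapping."""
    if not freq:
        raise ValueError("need at least one symbol")
    if len(freq) == 1:
        # A single symbol still needs a one-bit code.
        return {sym: "0" for sym in freq}

    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two lowest-frequency nodes into one parent node.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # assumed convention: left = 0
            walk(node[1], prefix + "1")  # assumed convention: right = 1
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Toy sequence (not real data): A occurs 8 times, L 4, K 2, M 1.
seq = "AAAAAAAALLLLKKM"
codes = huffman_codes(Counter(seq))
for aa in sorted(codes, key=lambda s: len(codes[s])):
    print(aa, codes[aa])
```

In this toy case the most frequent residue receives a one-bit code and the two rarest receive three-bit codes, which is the frequency-to-length relationship the summary describes.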


This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues.

Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit:

http://www.elsevier.com/copyright


Application of 2D graphic representation of protein sequence based on Huffman tree method

Zhao-Hui Qi, Jun Feng, Xiao-Qin Qi, Ling Li

College of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei 050043, People’s Republic of China

Basic Courses Department, Zhejiang Shuren University, Hangzhou, Zhejiang 310015, People’s Republic of China

Article info

Article history: Received 13 May 2011; accepted 30 January 2012

Keywords: Protein sequence; Graphic representation; Sequence analysis; Escherichia coli; Huffman tree

Abstract

Based on the Huffman tree method, we propose a new 2D graphic representation of protein sequences. This representation completely avoids loss of information in the transfer of data from a protein sequence to its graphic representation. The method consists of two parts. The first assigns 0–1 codes to the 20 amino acids using a Huffman tree built from amino acid frequencies, where the frequency of an amino acid is defined as the number of times it occurs in the analyzed protein sequences. The second builds the 2D graphic representation of a protein sequence from these 0–1 codes. Applications of the method to ten ND5 genes and seven Escherichia coli strains are then presented in detail. The results show that the proposed model may provide some new insights into the evolution patterns determined from protein sequences and complete genomes.
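The claim that no information is lost can be illustrated with a small round trip: sequence to 0–1 string, 0–1 string to 2D curve, and back again. The sketch below rests on two loudly stated assumptions. First, the code table `CODES` is a hypothetical prefix-free table for four residues, not the table derived in the paper. Second, the plotting rule in `bits_to_curve` (each bit moves one unit right, with 1 stepping up and 0 stepping down) is an assumed rule chosen for illustration; this excerpt does not state the authors' actual mapping.

```python
# Hypothetical prefix-free code table for four residues (illustration only;
# the paper derives its table from amino acid frequencies via a Huffman tree).
CODES = {"A": "1", "L": "01", "M": "000", "K": "001"}


def encode(seq, codes=CODES):
    """Concatenate the 0-1 code of every residue in the sequence."""
    return "".join(codes[aa] for aa in seq)


def bits_to_curve(bits):
    """Assumed rule: every bit moves one unit right; 1 goes up, 0 goes down."""
    points = [(0, 0)]
    for b in bits:
        x, y = points[-1]
        points.append((x + 1, y + (1 if b == "1" else -1)))
    return points


def curve_to_bits(points):
    """Read the bits back from the direction of each segment."""
    return "".join("1" if q[1] > p[1] else "0" for p, q in zip(points, points[1:]))


def decode(bits, codes=CODES):
    """Decode a 0-1 string; unambiguous because no code is a prefix of another."""
    inverse = {code: aa for aa, code in codes.items()}
    seq, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            seq.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(seq)


seq = "MKLA"
curve = bits_to_curve(encode(seq))
recovered = decode(curve_to_bits(curve))
print(encode(seq), curve[-1], recovered == seq)
```

Because the x coordinate increases with every bit, the curve never overlaps itself, and because the code is prefix-free, the bit string decodes in only one way. Under these assumptions the original sequence is recovered exactly from the curve.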

1. Introduction

The rapid growth of biological sequence data, such as DNA and protein sequences, has created many challenges for bioscientists. In response to this explosive growth, experimental, mathematical and graphic approaches have been employed to study the structure, function, evolution and attribution [1] of these sequences.

Graphic techniques have emerged as a powerful tool for the analysis and visualization of long biological sequences. The advantage of graphic representations of biological sequences is that they provide a simple way of viewing, sorting, and comparing various gene structures, helping to recognize major differences among similar DNA and protein sequences. A graphical method for visualizing DNA sequences was first proposed by Hamori in 1983 [2]. Afterwards, Hamori [3] and Jeffrey [4] considered two other graphical representation methods for DNA sequences. The original plot of a DNA sequence as a random walk on a 2D grid, using the four cardinal directions to represent the four bases A (adenine), G (guanine), T (thymine) and C (cytosine), was done by Gates [5], Nandy [6] and Leong and Morgenthaler [7]. In the past ten years, authors such as Bielinska-Waz [8,9], Randic [10–13], Jaklic [14], Novic [15] and Qi [16–18] have also presented their own graphical representations. These graphical methods for visualizing DNA sequences provide useful insights into local and global characteristics along a sequence that are not easily observed from the sequence itself. Two recent references, Randic et al. [19] and Ghosh and Nandy [20], give a more detailed introduction to graphical methods for visualizing DNA sequences, and readers can find there more detailed accounts of the various graphical representations of DNA.
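The random-walk plot mentioned above can be sketched as follows. The assignment of bases to directions in `STEPS` is an assumption for illustration; the cited authors use different assignments, and this excerpt does not specify any of them.

```python
# Assumed base-to-direction assignment (illustration only).
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def dna_walk(seq):
    """Return the list of grid points visited by the walk, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


path = dna_walk("GGCATT")
print(path)
# Count how often each grid point is visited; repeats mean the curve overlaps itself.
revisited = {p for p in path if path.count(p) > 1}
print(revisited)
```

For this toy input the walk passes through the point (1, 0) twice. When a walk retraces grid points like this, the drawn curve alone no longer determines the sequence, which is the kind of information loss that the representation proposed in this paper is designed to avoid.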

Graphical representation of proteins came later than that of DNA: the first graphical representation of proteins was published in 2004 [21]. It assumes a unique correspondence between one selected collection of 20 nucleotide triplets and the 20 amino acids which they represent. This Virtual Genetic Code converts a protein sequence into a hypothetical DNA sequence, and allows one to use available graphical representations of DNA to generate a graphical representation for proteins [19]. Subsequently, some novel graphical approaches were developed that allow a direct representation of proteins [22,23]. In addition, to reflect the differences among the 20 natural amino acids, some graphic representations of proteins take physicochemical properties into account. For example, Chou et al. [24] proposed a 2D representation method, the ‘wenxiang diagram’, to characterize the disposition of hydrophobic and hydrophilic residues. Wen and Zhang [25] proposed a 2D graphic representation based on the pKa values of different amino acids. Wu et al. [26] built a web server for creating graphic representations of protein sequences from two different physicochemical properties of their constituent amino acids.
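The idea behind the Virtual Genetic Code (one fixed triplet per amino acid, so that a protein becomes a hypothetical DNA sequence) can be sketched as below. The table `TRIPLET` is an assumption: each entry is a valid codon for its amino acid in the standard genetic code, but the particular collection of 20 triplets selected in [21] is not given in this excerpt and may differ.

```python
# One assumed triplet per amino acid (each is a standard codon for that residue;
# the actual selection used in the cited work may be different).
TRIPLET = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}
INVERSE = {codon: aa for aa, codon in TRIPLET.items()}


def to_virtual_dna(protein):
    """Replace every residue by its assigned triplet."""
    return "".join(TRIPLET[aa] for aa in protein.upper())


def from_virtual_dna(dna):
    """Invert the mapping; possible because the 20 triplets are all distinct."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(INVERSE[dna[i:i + 3]] for i in range(0, len(dna), 3))


virtual = to_virtual_dna("MKW")
print(virtual, from_virtual_dna(virtual))
```

Since the correspondence is one-to-one, the hypothetical DNA sequence can be passed to any DNA plotting method and still be mapped back to the protein.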

In the present study, we propose a new 2D graphic representation of protein sequences based on the 0–1 codes of the 20 amino acids obtained from a Huffman tree. These Huffman-based 0–1 codes provide a compressed way to represent protein sequences in binary units. Further, the use of

Contents lists available at SciVerse ScienceDirect

journal homepage: www.elsevier.com/locate/cbm

Computers in Biology and Medicine

doi:10.1016/j.compbiomed.2012.01.011

Corresponding author.

E-mail address: zhqi_yh2004@yahoo.com.cn (Z.-H. Qi).

Computers in Biology and Medicine 42 (2012) 556–563
