Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics


Introductory Review

Population genomics: patterns of genetic variation within populations

Greg Gibson

North Carolina State University, Raleigh, NC, USA

1. Polymorphism

Polymorphism at the nucleotide level ranges over at least an order of magnitude

within species, and average polymorphism ranges over two orders of magnitude

between species. Homo sapiens is among the least polymorphic of all species,

with a heterozygous single nucleotide polymorphism (SNP) generally occurring

once every 500 to 1000 bp (International SNP Map Working Group, 2001). By

contrast, marine invertebrates such as the sea squirt and echinoderms have an

astonishing level of sequence diversity with a SNP every 5 to 10 bp (Dehal et al.,

2002). Diversity is a function of organism-level factors such as population size,

generation time, and breeding structure (Aquadro et al., 2001), but variation within

and among chromosomes signifies that recombination and mutation rates are also

critical (Begun and Aquadro, 1992; Charlesworth et al., 1995). In most species,

centromeric and telomeric regions are less recombinogenic, hence have smaller

effective population sizes, and tend to be less polymorphic (Nachman, 2002). Even

within a locus, polymorphism can vary over an order of magnitude, according

primarily to functional constraint: synonymous substitution rates tend to be uniform,

whereas replacements can be excluded from highly conserved domains. Noncoding

gene sequences are typically more polymorphic than exons and less polymorphic

than intergenic DNA, but core regulatory sequences up to several hundred basepairs

in length may often be the most conserved of all sequences (Wray et al., 2003).

Significant disparity between two measures of polymorphism, namely, the number

of segregating sites and the average heterozygosity, provides evidence for

departure from “neutrality” (Hudson et al., 1987; Kreitman, 2000). However, neutrality

comes in many flavors, and demographic processes are just as likely to

affect the difference between these two measures as is selection (Nielsen, 2001).

Heterozygosity is a function of allele frequency as well as density, so unexpectedly

high or low numbers of heterozygotes relative to the number of SNPs in a population

can arise as a result of several processes that may be superimposed on random

drift. Thus, rapid population expansion or strong purifying selection both reduce

heterozygosity, whereas admixture or balancing selection will increase heterozygosity.

Tests such as Tajima’s D (Tajima, 1989) have remained useful descriptors

of diversity, but have been joined by a new series of tests that are more firmly

rooted in coalescent theory (Wall and Hudson, 2001). Rather than strictly interpreting

test scores relative to theoretical expectations, comparison of the distribution of

test scores across tens or hundreds of loci among species emphasizes that diversity

is affected by a complex interplay of factors and that it is the location of a gene

at either extreme of the continuum that marks it as a candidate target of selection,

rather than a p-value per se (Hey, 1999; Bustamante et al., 2002).
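To make the contrast between the two measures concrete, the following is a minimal Python sketch, not taken from the source, of how the number of segregating sites, the average pairwise heterozygosity, and Tajima's D might be computed from a small alignment. The function name `diversity_stats` and the toy sequences are illustrative assumptions; the variance constants follow the standard formulation of Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (S, pi, theta_w, tajima_d) for aligned, equal-length sequences.

    S        number of segregating sites
    pi       average number of pairwise differences (heterozygosity measure)
    theta_w  Watterson's estimator, S / a1
    tajima_d Tajima's D, or None when there are no segregating sites
    """
    n = len(seqs)
    length = len(seqs[0])

    # A site is segregating if more than one state is present in the sample.
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)

    # Average number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return s, pi, theta_w, None

    # Constants for the variance of (pi - theta_w), as in Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, theta_w, d


# Toy data (hypothetical): intermediate-frequency variants versus singletons.
intermediate = ["AAAAA", "AAAAT", "AAATT", "AATTT"]
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]
stats_intermediate = diversity_stats(intermediate)
stats_singletons = diversity_stats(singletons)
```

Both toy samples have three segregating sites, but the sample made only of singletons has lower pairwise heterozygosity and therefore a negative D, the direction expected after rapid expansion or under purifying selection.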

A trend toward empirical evaluation of significance by permutation in light

of genomic data is also seen in relation to population structure. Standard

F-statistics introduced by Sewall Wright based on differences in genotype frequencies

among populations (Weir and Hill, 2002) have been extended into an analysis

of molecular variance (AMOVA) framework, one popular implementation of which

is the Arlequin software (Schneider et al., 2000). Estimates of SNP, indel, haplotype,

or microsatellite allele frequency differences are sensitive to sample size,

so samples of at least 100 individuals per population are recommended. Using

genomic data, the multiple comparison issue also arises: in a set of 500 sites, a single

site with a testwise p-value of 0.0001 is not unexpected, but in a large sample

this may correspond to an allele frequency difference of just 10%. Consequently,

population structure is best estimated from multilocus data. For example, Pritchard

et al. (2000) have introduced Bayesian statistics to assign individuals to likely subpopulations

with numerous applications in evolutionary, conservation, quantitative,

and human genetics. It is well known that over 90% of all human polymorphism

is common to all populations, but the ability to genotype hundreds of loci has

led to the recognition that given sufficient data there is a detectable signature of

demographic history even in our species (Rosenberg et al., 2002). Similarly, long-held

assumptions of panmixia in Drosophila melanogaster are being challenged

by deeper sampling (Glinka et al., 2003), as are commonly held notions about the

genetic uniformity of crops such as maize (Matsuoka et al., 2002), and in fact the

power to discriminate population structure in most species will have a profound

impact on quantitative biology. An important implication of the ability to detect

population structure is inference of departure from neutrality, by comparison of

the observed F-statistics with those obtained from a collection of assumed neutral

markers (Lewontin and Krakauer, 1973; Rockman et al., 2003).
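As a rough illustration of why a single site is a weak guide to population structure, the sketch below computes a simple heterozygosity-based F_ST, (H_T - H_S) / H_T, for one biallelic site and then pools several sites. This is an assumption-laden simplification with equally weighted populations: it is not the Weir and Hill estimator, nor the AMOVA procedure implemented in Arlequin, and the function names are hypothetical.

```python
def heterozygosities(freqs):
    """Return (H_T, H_S) for one biallelic site.

    freqs holds the frequency of one allele in each population;
    populations are weighted equally (a simplifying assumption).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within
    return h_t, h_s


def wright_fst(freqs):
    """Single-site F_ST = (H_T - H_S) / H_T; 0.0 if the site is fixed."""
    h_t, h_s = heterozygosities(freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def multilocus_fst(loci):
    """Pool sites as a ratio of sums rather than averaging per-site values."""
    pairs = [heterozygosities(freqs) for freqs in loci]
    total_t = sum(t for t, _ in pairs)
    total_s = sum(s for _, s in pairs)
    return (total_t - total_s) / total_t if total_t > 0 else 0.0


# A 10% frequency difference between two populations (0.45 versus 0.55).
fst_small_difference = wright_fst([0.45, 0.55])
# Alternately fixed alleles give the maximum value.
fst_fixed = wright_fst([0.0, 1.0])
```

Under these assumptions a 10% allele frequency difference corresponds to an F_ST of about 0.01, which shows how a site can be statistically significant in a large sample while reflecting very little differentiation.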

The advent of new sequencing and genotyping technologies will only accelerate

the data-driven nature of evolutionary genetic research (see Article 7, Single

molecule array-based sequencing, Volume 3). ABI 3730 automated DNA

sequencing machines routinely generate traces with over 1 kb of high-quality

sequence and have a throughput capacity exceeding 1 Mb per day. Single-molecule

sequencing methods are expected to make the sequencing of complete eukaryotic

genomes for $1000 each a reality, possibly in the next decade (Meldrum, 2000),

while massively parallel resequencing by hybridization to wafers of tiled oligonucleotides

has already been used to characterize polymorphism between primate

species (Frazer et al., 2003). Such studies have identified hundreds of loci that are

candidates for adaptive evolution in the recent human lineage, some of which are

likely to contribute to the etiology of common disease (Tishkoff and Verrilli, 2003;

Clark et al., 2003). Molecular evolutionary studies of single genes in samples of 30

individuals have been typical but will soon be dwarfed by genome-scale sampling,

and increasingly, attention will be placed on efficient sampling design and the formulation

of hypotheses that utilize patterns of variation across the genome to interpret

unusual patterns of variation at focal loci. Describing the variance of standard

population-genetic parameters at a genome-wide scale is unprecedented territory,

and developing approaches to quantify this variation across these expansive contiguous

regions is the challenge for the near future. This type of data will also allow

reexamination of some of the most basic assumptions underlying many population-genetic

approaches, such as the infinite sites and island migration models.

2. Recombination and linkage disequilibrium

Recombination and mutation are the two biochemical processes that influence

the distribution of molecular variation. Recombination can be directly measured

by monitoring the coinheritance of markers transmitted from parent to offspring,

but with the exception of technically demanding single sperm typing (Jeffreys

et al., 2000), the resolution of this method is of the order of just centimorgans

or hundreds of kilobases. Since an important consequence of recombination is its

effect on linkage disequilibrium over scales from tens of bases to tens of kilobases,

indirect methods for measuring recombination have been introduced based

on population-genetic measurement of the cosegregation of markers (Hudson and

Kaplan, 1985; Stumpf and McVean, 2003). Linkage disequilibrium (LD) is the

nonrandom assortment of genetic markers: given two alleles each at a frequency

of 20%, just 4% of individual chromosomes should have both alleles if assortment

is random, but physically adjacent markers will often cosegregate more frequently than this. In

this case, the maximum possible LD would have 20% of the chromosomes with

both less common alleles, and 80% with both common alleles. Two commonly

used statistics measure this departure from randomness, $D'$ and $r^2$, only the latter

of which explicitly takes allele frequencies into account (Hill and Robertson, 1966;

Weir, 1996). A further technical challenge in the measurement of LD is establishing

the linkage phase of double heterozygotes, which can be addressed directly by

studying trios of parents and their offspring (which is however impractical for many

species) or computationally with EM likelihood algorithms (Fallin and Schork,

2000; Stephens et al., 2001).
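The 20% example above can be worked through numerically. The following Python sketch, which assumes phased haplotype frequencies are already known, computes the raw disequilibrium coefficient D together with the usual textbook definitions of D' (D scaled by its maximum given the allele frequencies) and r^2 (the squared correlation between alleles). The function name `ld_stats` is an illustrative assumption.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab  frequency of chromosomes carrying allele A and allele B together
    p_a   frequency of allele A at the first marker (0 < p_a < 1)
    p_b   frequency of allele B at the second marker (0 < p_b < 1)
    """
    # D is the excess of the AB haplotype over its random-assortment value.
    d = p_ab - p_a * p_b

    # The largest |D| attainable depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Two alleles at 20% each: random assortment puts both on 4% of chromosomes.
random_assortment = ld_stats(0.04, 0.2, 0.2)
# Maximum LD: both less common alleles always travel together (20%).
complete_ld = ld_stats(0.20, 0.2, 0.2)
```

With random assortment all three values are zero up to floating-point rounding; in the maximum case D is 0.16 and both D' and r^2 are 1.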

Quantitative geneticists have long been interested in LD because detection of

association between markers and phenotypes is dependent on LD between anonymous

markers and the causative disease or quantitative trait nucleotide(s) (Zondervan

and Cardon, 2004). This idea has given rise to the human HapMap project,

which is an effort to describe the complete pattern of haplotypes in the human

genome (International HapMap Consortium, 2003). Haplotypes are sets of multilocus

alleles, and because of LD the distinct haplotypes observed tend to be fewer than chance would

predict: there are 32 possible ways that five biallelic sites can combine, but typically

just a handful of these will be at any appreciable frequency in a population.

Standard population-genetic theory predicts that LD should decay monotonically

with distance, but at least in the human genome it now appears that there are often

fairly discrete boundaries that define haplotype blocks that range in length from

10 to 100 kb or more (Gabriel et al., 2002; see also Article 12, Haplotype mapping,

Volume 3 and Article 74, Finding and using haplotype blocks in candidate

gene association studies, Volume 4). Thus, while there are in excess of

5 million SNPs in the human genome, there may be as few as 50 000 common

haplotype blocks, and it is argued that a similar number of markers

will be sufficient to perform genome scans for association with disease (Risch and

Merikangas, 1996). According to the common disease–common variant hypothesis,

the polymorphisms that contribute to many complex human diseases are likely

to have arisen early in human history, but sufficiently recently that they remain

embedded in observable common haplotypes. Similarly, selected phenotypes or

polymorphic traits of interest to evolutionary biologists and ecologists may be due

to nucleotide variants that can be identified by LD mapping.
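The gap between the number of possible and observed haplotypes can be illustrated with a short Python sketch. The sample below is invented purely for illustration (20 phased chromosomes typed at five biallelic sites, alleles coded 0 and 1), and `haplotype_summary` and its `min_freq` threshold are hypothetical names, not part of any published method.

```python
from collections import Counter


def haplotype_summary(haplotypes, min_freq=0.1):
    """Return (possible, observed, common) for phased biallelic haplotypes.

    possible  number of combinations of the sites, 2 ** number_of_sites
    observed  number of distinct haplotypes present in the sample
    common    dict of haplotype -> frequency for those at or above min_freq
    """
    n_sites = len(haplotypes[0])
    possible = 2 ** n_sites
    counts = Counter(haplotypes)
    total = len(haplotypes)
    common = {
        hap: count / total
        for hap, count in counts.items()
        if count / total >= min_freq
    }
    return possible, len(counts), common


# Hypothetical sample of 20 chromosomes at five sites.
sample = ["00000"] * 10 + ["11111"] * 6 + ["00111"] * 3 + ["10000"] * 1
possible, observed, common = haplotype_summary(sample)
```

In this made-up sample only 4 of the 32 possible haplotypes occur, and 3 of them reach a frequency of 10% or more, the kind of pattern that lets a few markers tag a whole block.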

There is considerable debate over the reasons for the detection of haplotype

blocks, with explanations ranging from sampling variance to unequal recombination

rates and/or gene conversion hotspots within loci (Wall and Pritchard, 2003; Stumpf

and Goldstein, 2003), and studies of the population structure of haplotypes are in

their infancy. With respect to evolutionary and agricultural genetics, measurement

of haplotype structure is increasingly important. Domesticated crops and livestock

are likely to have strong haplotype structure as a result of their breeding history

(Flint-Garcia et al., 2003), whereas outbred and highly polymorphic species such

as Drosophila melanogaster are almost devoid of haplotypes (see Article 10,

Linking DNA to production: the mapping of quantitative trait loci in livestock,

Volume 3). More recent is the advent of population genetics in nonmodel systems

that are important with respect to epidemiology, particularly in humans, such as

HIV and Plasmodium (malaria). The frequency of outcrossing or mixing among

these species may contribute to these organisms’ ability to evade host immunity

(Awadalla, 2003). The ability to dissect quantitative traits to the nucleotide level in

any species is ultimately dependent on the thorough characterization of haplotype

diversity.

3. Mutation, gene content, and the transcriptome

Population genomics also encompasses several novel aspects of variation that were

beyond the technical reach of classical population genetics. For example, direct

measurement of mutation rates is now possible, and will complement a large body

of literature on the genetic consequences of mutation accumulation (Keightley and

Lynch, 2003). For many species, it has been estimated that new genetic variance for

fitness or morphological traits is generated at a rate within an order of magnitude

of 0.1% of the environmental variance per generation (Clayton and Robertson,

1955; Houle et al., 1996). Similarly, genetic evidence suggests that a typical per-locus

spontaneous mutation rate is approximately $10^{-6}$ per generation, from which

nucleotides are inferred to substitute in each meiosis at a rate close to $10^{-9}$.

Microsatellites evolve at a much accelerated rate, but with a high variance, as

directly measured by comparison of parent and offspring genotypes in several

studies (Ellegren, 2000). Insertion–deletion (indel) polymorphism is prevalent,
