PrediXcan
=========
PrediXcan is a command-line tool that predicts gene expression from
genotype data and performs gene-based association tests, allowing
researchers to prioritize genes that are likely to be causal for a
phenotype.
## Reference
Gamazon ER†, Wheeler HE†, Shah KP†, Mozaffari SV, Aquino-Michaels K,
Carroll RJ, Eyler AE, Denny JC, Nicolae DL, Cox NJ, Im HK*. (2015) A
gene-based association method for mapping traits using reference
transcriptome data. Nat Genet. doi:10.1038/ng.3367.
† equal contribution
[An open access preprint can be found on BioRxiv](http://biorxiv.org/content/early/2015/06/17/020164)
## Instructions
To run PrediXcan you will need the following.
Software Requirements:
- Linux or Mac OS
- Python 2.7
- numpy package
- R
Scripts:
- PrediXcan.py
- PrediXcanAssociation.R
Input Files:
- genotype file
- sample file
- transcriptome prediction model (SQLite database, available for download from [PredictDB](http://predictdb.org/))
- phenotype file
- filter file (optional): specifies a subset of rows on which to perform
association tests
### Predicting/Imputing Expression
To predict the transcriptome from a given genotype file, include the
`--predict` flag when running PrediXcan.py and specify the following
arguments:
1. genelist: list of genes. Optional. By default, all genes available
in the model database are used.
2. dosages: path to the directory of imputed genotype (dosage) files.
Default value: 'data/dosages/'
3. dosages_prefix: prefix of the dosage file names. Default value: 'chr'
4. weights: full path to the prediction model database. Default value:
'data/weights.db'
5. output_prefix: prefix for output files. This includes the path to
the output files as well as the prefix of the file name.
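
Conceptually, prediction combines the per-SNP weights from the model database with the dosages in the genotype files. The following is a minimal sketch of that idea, assuming one weight per SNP per gene and a plain weighted sum; the rsids, weights, and dosages are made up, `predict_expression` is a hypothetical helper, and this is not the code PrediXcan.py itself runs.

```python
# Hypothetical weights for one gene: rsid -> weight (made-up values).
weights = {"rs1": 0.5, "rs2": -0.25}

# Hypothetical dosages: rsid -> one allele count (0 to 2) per individual.
dosages = {
    "rs1": [0.0, 1.0, 2.0],
    "rs2": [2.0, 1.0, 0.0],
}


def predict_expression(weights, dosages, n_samples):
    """Return one predicted expression value per individual."""
    predicted = [0.0] * n_samples
    for rsid, weight in weights.items():
        if rsid not in dosages:
            continue  # SNP is in the model but absent from the genotype data
        for i, dose in enumerate(dosages[rsid]):
            predicted[i] += weight * dose
    return predicted


predicted = predict_expression(weights, dosages, 3)
print(predicted)
```

SNPs that appear in the model but not in the genotype data simply contribute nothing in this sketch; the sketch also ignores any allele matching between the model and the dosage files.
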
#### Dosage File Format
- Columns are `chromosome rsid position allele1 allele2 MAF id1 .....
idn`.
- The dosage for each person is the number of copies of the second
  allele listed (allele2), a value between 0 and 2.
- It is expected that there will be one file per chromosome.
- Files must be gzipped, and their names are expected to end with ".gz".
- In the dosages directory, there must be a samples file that lists the
  individuals' IDs in the same order as the genotype columns.
  - The first column must contain the family ID (FID), and the second
    must contain the individual ID (IID).
  - If the family ID is unavailable, the individual ID can be copied
    into the FID column.
  - The remaining columns of the samples file are not used in creating
    the output, so a file with only two columns is sufficient; a
    [PLINK .fam file](https://www.cog-genomics.org/plink2/formats#fam)
    is also an acceptable format.
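
The layout described above can be checked with a short script before running PrediXcan. This is a sketch only: it assumes whitespace-delimited columns, is written for Python 3 rather than the Python 2.7 that PrediXcan.py requires, and uses hypothetical helper names (`read_dosage_file`, `read_samples_file`) and made-up file names and data.

```python
import gzip
import os
import tempfile


def read_dosage_file(path):
    """Yield (rsid, allele1, allele2, dosages) for each row of a gzipped dosage file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()  # assumption: whitespace-delimited columns
            if not fields:
                continue
            # Column order: chromosome rsid position allele1 allele2 MAF id1 ... idn
            rsid, allele1, allele2 = fields[1], fields[3], fields[4]
            yield rsid, allele1, allele2, [float(value) for value in fields[6:]]


def read_samples_file(path):
    """Return (FID, IID) pairs in file order; any extra columns are ignored."""
    with open(path) as handle:
        return [tuple(line.split()[:2]) for line in handle if line.strip()]


# Build a tiny made-up data set in a temporary directory and read it back.
workdir = tempfile.mkdtemp()
dosage_path = os.path.join(workdir, "chr1.dosage.txt.gz")  # hypothetical file name
samples_path = os.path.join(workdir, "samples.txt")

with gzip.open(dosage_path, "wt") as handle:
    handle.write("chr1 rs1 1000 A G 0.25 0 1 2\n")
    handle.write("chr1 rs2 2000 C T 0.10 0.5 0 1.5\n")
with open(samples_path, "w") as handle:
    handle.write("F1 I1\nF2 I2\nF3 I3\n")

samples = read_samples_file(samples_path)
rows = list(read_dosage_file(dosage_path))

# Every row must carry exactly one dosage per listed individual.
for rsid, allele1, allele2, dosages in rows:
    assert len(dosages) == len(samples), rsid
```

A mismatch between the number of dosage columns and the number of listed individuals usually means the samples file and the genotype files are out of sync.
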
#### Usage
> ./PrediXcan.py --predict --dosages dosagefile_path --dosages_prefix
chr --samples samples_file --weights prediction_db --output_prefix
results/tissue
### Running Association with Phenotype
To perform an association test between the predicted expression levels
and phenotype, include the `--assoc` flag when running PrediXcan.py and
specify the following arguments:
1. pred_exp: predicted transcriptome from a previous run of PrediXcan.
Default value: 'predicted_expression.txt'.
2. pheno: phenotype file. No default value. See below for file format.
3. filter: filter file to specify which rows to include in test and a
number to filter on. Optional. See below for details.
4. linear or logistic: specify one of these to perform a linear or
logistic regression between the expression levels of each gene and
phenotype. Default is linear.
5. output_prefix: prefix for output files. This includes the path to
the output files as well as the prefix for the file name
This will produce a file with suffix `association.txt`, containing
summary statistics on the association between each gene and the
phenotype.
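
To illustrate what a per-gene linear association test computes, here is a minimal sketch that regresses the phenotype on the predicted expression of each gene and reports the slope, its standard error, and the t statistic. It is not the tool's own implementation, the exact contents of the `association.txt` file may differ, and the gene names and values are made up.

```python
import math


def simple_linear_regression(x, y):
    """Regress y on x with an intercept; return (beta, standard_error, t_statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return beta, standard_error, beta / standard_error


# Made-up predicted expression for two hypothetical genes across four individuals.
predicted_expression = {
    "GENE_A": [0.0, 1.0, 2.0, 3.0],
    "GENE_B": [1.0, 0.0, 1.5, 0.5],
}
phenotype = [1.0, 3.0, 5.0, 8.0]

results = {
    gene: simple_linear_regression(values, phenotype)
    for gene, values in predicted_expression.items()
}
for gene, (beta, standard_error, t_statistic) in results.items():
    print(gene, beta, standard_error, t_statistic)
```
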
#### Phenotype File Format
Phenotype files are expected to be in a format similar to the format
required for PLINK. Most commonly, the phenotype file is tab delimited,
and preferably has a header. By default, PrediXcan will assume the
first column is the Family ID, the second column is the Individual ID,
and the *last* column is the phenotype column.
**Note**: If the phenotype file has a header line (recommended), the
first two columns *must* be labeled FID and IID, respectively.
If there are multiple phenotype columns, you can select one by its
header name with the `--pheno_name` flag.
When the file has more than one phenotype column, you can also select
the phenotype by position with the `--mpheno` option. For example,
`--mpheno 1` runs the association on the 3rd column of the phenotype
file (columns 1 and 2 are the IDs), `--mpheno 2` runs it on the 4th
column, and so on. This option is mainly intended for files without a
header line, and it may behave unexpectedly if the options are not
specified carefully.
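
The column arithmetic behind `--mpheno` is easy to get wrong by one, so a small sketch may help. The helper `phenotype_column_index` is hypothetical and only mirrors the rules stated above: the last column by default, and otherwise the n-th column after the two ID columns.

```python
def phenotype_column_index(n_columns, mpheno=None):
    """Return the 0-based index of the phenotype column in a row."""
    if mpheno is None:
        return n_columns - 1  # default: the last column holds the phenotype
    index = mpheno + 1  # indices 0 and 1 are FID and IID, so --mpheno 1 -> index 2
    if mpheno < 1 or index >= n_columns:
        raise ValueError("mpheno points outside the phenotype columns")
    return index


row = ["F1", "I1", "1.7", "0.2", "5.0"]  # made-up row: FID, IID, three phenotypes
print(row[phenotype_column_index(len(row))])            # default: last column
print(row[phenotype_column_index(len(row), mpheno=1)])  # 3rd column of the file
```
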
By default, PrediXcan performs a linear regression for association
tests, and assumes quantitative traits in the phenotype file.
Unlike PLINK, for logistic tests on qualitative traits, the trait is
assumed by default to be encoded as 0 for unaffected and 1 for affected.
0 is NOT a missing value.
By default, NA specifies a missing phenotype value. To specify a
missing phenotype value that is encoded numerically, say -9 for example,
include `--missing_phenotype -9`.
If a logistic test is specified and there are more than two levels of
the phenotype, the user will receive an error.
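
The missing-value and logistic rules above can be sketched as a small validation step. The function `parse_phenotypes` is a hypothetical helper, not part of PrediXcan, and the error message is illustrative only.

```python
def parse_phenotypes(raw_values, missing="NA", logistic=False):
    """Convert phenotype strings to floats, mapping the missing code to None."""
    values = [None if raw == missing else float(raw) for raw in raw_values]
    if logistic:
        # 0 and 1 are both valid levels here; only the missing code is excluded.
        levels = {value for value in values if value is not None}
        if len(levels) > 2:
            raise ValueError("logistic test requires at most two phenotype levels")
    return values


# Default missing code "NA" with a 0/1 trait.
print(parse_phenotypes(["0", "1", "NA"], logistic=True))
# Numeric missing code, as with `--missing_phenotype -9`.
print(parse_phenotypes(["-9", "1.5"], missing="-9"))
```
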
#### Filter File Format
Filter files specify a subset of rows in the pheno file on which to
perform the association. A filter file is tab delimited, with the first
2 columns identical to those of the pheno file. The third column holds
the numerical values on which to filter. If the filter file is called
filter.txt, with filter values 1 and 2, including `--filter filter.txt 2`
will perform the association test only on individuals marked 2 in the
filter file. Header rows are optional for the filter file, but if they
are included, the first two columns must be labeled FID and IID.
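
As an illustration of the filter semantics, the sketch below keeps only the individuals whose third column equals the requested value. `select_individuals` is a hypothetical helper, and the `GROUP` header label and the rows are made up.

```python
def select_individuals(lines, filter_value):
    """Return the set of (FID, IID) pairs whose filter column equals filter_value."""
    keep = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[:2] == ["FID", "IID"]:
            continue  # skip blank or short lines and the optional header row
        fid, iid, value = fields[:3]
        if float(value) == float(filter_value):
            keep.add((fid, iid))
    return keep


filter_lines = [
    "FID\tIID\tGROUP\n",
    "F1\tI1\t1\n",
    "F2\tI2\t2\n",
    "F3\tI3\t2\n",
]
selected = select_individuals(filter_lines, "2")
print(sorted(selected))
```
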
#### Usage
> ./PrediXcan.py --assoc --pheno phenotype_file --pred_exp
predicted_expression_file --linear --filter filter_file filter_val
--output_dir output_dir
## Example for Prediction and Association
- Download and extract the
[PrediXcan Example tar file](https://s3.amazonaws.com/imlab-open/Data/PredictDB/PrediXcanExample_3_29_17.tar.gz).
- Go to the extracted folder and run the following:
```
./PrediXcan.py --predict --assoc --linear \
--weights weights/TW_Cells_EBV-transformed_lymphocytes_0.5.db \
--dosages genotype \
--samples samples.txt \
--pheno phenotype/igrowth.txt \
--output_prefix results/Cells_EBV-transformed_lymphocytes
```
#### Helper Scripts
Conversion from PLINK to dosage format (provided by scottritchie73 via
pull request, thank you!):
[convert_plink_to_dosage.py](https://github.com/hakyimlab/PrediXcan/blob/master/Software/convert_plink_to_dosage.py)