# pandas-plink
[![Travis](https://img.shields.io/travis/com/limix/pandas-plink.svg?style=flat-square&label=linux%20%2F%20macos%20build)](https://travis-ci.com/limix/pandas-plink) [![AppVeyor](https://img.shields.io/appveyor/ci/Horta/pandas-plink.svg?style=flat-square&label=windows%20build)](https://ci.appveyor.com/project/Horta/pandas-plink) [![Documentation](https://img.shields.io/readthedocs/pandas-plink.svg?style=flat-square&version=stable)](https://pandas-plink.readthedocs.io/)
Pandas-plink is a Python package for reading files in the [PLINK binary file format](https://www.cog-genomics.org/plink2/formats) and (since version 2.0.0) PLINK and GCTA realized relationship matrices.
Files are read via [lazy loading](https://en.wikipedia.org/wiki/Lazy_loading): only the genotypes that the user actually accesses are read from disk, which saves memory.
Notable changes are listed in [CHANGELOG.md](CHANGELOG.md).
## Install
It can be installed using [pip](https://pypi.python.org/pypi/pip):
```bash
pip install pandas-plink
```
Alternatively, it can be installed via [conda](http://conda.pydata.org/docs/index.html):
```bash
conda install -c conda-forge pandas-plink
```
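After installing, it can be handy to confirm which version ended up in the environment, since reading relationship matrices requires version 2.0.0 or later. The snippet below is a small sketch that uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names `installed_version` and `major_version` are made up for this example and are not part of pandas-plink.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Leading integer of a version string such as '2.0.0'."""
    return int(version.split(".")[0])


version = installed_version("pandas-plink")
if version is None:
    print("pandas-plink is not installed")
elif major_version(version) < 2:
    print(f"pandas-plink {version} predates relationship-matrix support")
else:
    print(f"pandas-plink {version} is installed")
```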
## Usage
It is as simple as
```python
>>> from pandas_plink import read_plink1_bin
>>> G = read_plink1_bin("chr11.bed", "chr11.bim", "chr11.fam", verbose=False)
>>> print(G)
<xarray.DataArray 'genotype' (sample: 14, variant: 779)>
dask.array<shape=(14, 779), dtype=float64, chunksize=(14, 779)>
Coordinates:
  * sample   (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
  * variant  (variant) object '11_316849996' '11_316874359' ... '11_345698259'
    father   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    fid      (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
    gender   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    i        (sample) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13
    iid      (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
    mother   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    trait    (sample) <U2 '-9' '-9' '-9' '-9' '-9' ... '-9' '-9' '-9' '-9' '-9'
    a0       (variant) <U1 'C' 'G' 'G' 'C' 'C' 'T' ... 'T' 'A' 'C' 'A' 'A' 'T'
    a1       (variant) <U1 'T' 'C' 'C' 'T' 'T' 'A' ... 'C' 'G' 'T' 'G' 'C' 'C'
    chrom    (variant) <U2 '11' '11' '11' '11' '11' ... '11' '11' '11' '11' '11'
    cm       (variant) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    pos      (variant) int64 157439 181802 248969 ... 28937375 28961091 29005702
    snp      (variant) <U9 '316849996' '316874359' ... '345653648' '345698259'
>>> print(G.sel(sample="B003", variant="11_316874359").values)
0.0
>>> print(G.a0.sel(variant="11_316874359").values)
G
>>> print(G.sel(sample="B003", variant="11_316941526").values)
2.0
>>> print(G.a1.sel(variant="11_316941526").values)
C
```
Portions of the genotype matrix are read from disk only when the user accesses them.
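To make the idea of on-demand reading concrete, here is a minimal sketch written with only the Python standard library. It is not how pandas-plink is implemented (the output above shows that the package hands back a dask-backed array); it only illustrates why a single variant can be fetched without loading the whole file. It assumes the variant-major PLINK 1 `.bed` layout: a 3-byte header followed by one block per variant, with four samples packed into each byte. The names `bytes_per_variant`, `pack_codes`, `read_variant_codes`, and `toy.bed` are made up for this example.

```python
import os
import tempfile

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, variant-major layout


def bytes_per_variant(n_samples):
    """Each sample takes 2 bits, so 4 samples fit in one byte."""
    return (n_samples + 3) // 4


def pack_codes(codes):
    """Pack 2-bit codes into bytes, filling each byte from its low-order bits."""
    out = bytearray(bytes_per_variant(len(codes)))
    for i, code in enumerate(codes):
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def read_variant_codes(bed_path, n_samples, variant_index):
    """Return the raw 2-bit codes of one variant, reading only its bytes."""
    width = bytes_per_variant(n_samples)
    with open(bed_path, "rb") as f:
        if f.read(3) != BED_MAGIC:
            raise ValueError("not a variant-major PLINK 1 .bed file")
        # Jump straight to the requested variant; other variants are not read.
        f.seek(3 + variant_index * width)
        block = f.read(width)
    if len(block) != width:
        raise IndexError("variant index out of range")
    # Raw codes per the PLINK format: 0b00 homozygous for the first .bim allele,
    # 0b01 missing, 0b10 heterozygous, 0b11 homozygous for the second .bim allele.
    return [(block[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_samples)]


# Build a toy file with 5 samples and 3 variants, then read only the second variant.
variants = [[0, 0, 2, 3, 1], [0, 2, 3, 1, 3], [3, 3, 3, 0, 2]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.bed")
    with open(path, "wb") as f:
        f.write(BED_MAGIC + b"".join(pack_codes(v) for v in variants))
    second = read_variant_codes(path, n_samples=5, variant_index=1)

print(second)  # [0, 2, 3, 1, 3]
# Under this layout, a file with 14 samples and 779 variants takes 3119 bytes.
print(3 + 779 * bytes_per_variant(14))
```

The function returns raw 2-bit codes rather than allele counts; translating them into the `0.0`, `1.0`, `2.0`, and missing values seen in a genotype array is deliberately left out of this sketch.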
Covariance matrices can also be read very easily.
Example:
```python
>>> from pandas_plink import read_rel
>>> K = read_rel("plink2.rel.bin")
>>> print(K)
<xarray.DataArray (sample_0: 10, sample_1: 10)>
array([[ 0.885782, 0.233846, -0.186339, -0.009789, -0.138897, 0.287779,
0.269977, -0.231279, -0.095472, -0.213979],
[ 0.233846, 1.077493, -0.452858, 0.192877, -0.186027, 0.171027,
0.406056, -0.013149, -0.131477, -0.134314],
[-0.186339, -0.452858, 1.183312, -0.040948, -0.146034, -0.204510,
-0.314808, -0.042503, 0.296828, -0.011661],
[-0.009789, 0.192877, -0.040948, 0.895360, -0.068605, 0.012023,
0.057827, -0.192152, -0.089094, 0.174269],
[-0.138897, -0.186027, -0.146034, -0.068605, 1.183237, 0.085104,
-0.032974, 0.103608, 0.215769, 0.166648],
[ 0.287779, 0.171027, -0.204510, 0.012023, 0.085104, 0.956921,
0.065427, -0.043752, -0.091492, -0.227673],
[ 0.269977, 0.406056, -0.314808, 0.057827, -0.032974, 0.065427,
0.714746, -0.101254, -0.088171, -0.063964],
[-0.231279, -0.013149, -0.042503, -0.192152, 0.103608, -0.043752,
-0.101254, 1.423033, -0.298255, -0.074334],
[-0.095472, -0.131477, 0.296828, -0.089094, 0.215769, -0.091492,
-0.088171, -0.298255, 0.910274, -0.024663],
[-0.213979, -0.134314, -0.011661, 0.174269, 0.166648, -0.227673,
-0.063964, -0.074334, -0.024663, 0.914586]])
Coordinates:
  * sample_0  (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
  * sample_1  (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    fid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    iid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(K.values)
[[ 0.89 0.23 -0.19 -0.01 -0.14 0.29 0.27 -0.23 -0.10 -0.21]
[ 0.23 1.08 -0.45 0.19 -0.19 0.17 0.41 -0.01 -0.13 -0.13]
[-0.19 -0.45 1.18 -0.04 -0.15 -0.20 -0.31 -0.04 0.30 -0.01]
[-0.01 0.19 -0.04 0.90 -0.07 0.01 0.06 -0.19 -0.09 0.17]
[-0.14 -0.19 -0.15 -0.07 1.18 0.09 -0.03 0.10 0.22 0.17]
[ 0.29 0.17 -0.20 0.01 0.09 0.96 0.07 -0.04 -0.09 -0.23]
[ 0.27 0.41 -0.31 0.06 -0.03 0.07 0.71 -0.10 -0.09 -0.06]
[-0.23 -0.01 -0.04 -0.19 0.10 -0.04 -0.10 1.42 -0.30 -0.07]
[-0.10 -0.13 0.30 -0.09 0.22 -0.09 -0.09 -0.30 0.91 -0.02]
[-0.21 -0.13 -0.01 0.17 0.17 -0.23 -0.06 -0.07 -0.02 0.91]]
```
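For readers curious about what such a file may contain, the following is a rough standard-library sketch, not pandas-plink's own reader. It rests on several assumptions that should be checked against the PLINK documentation for your files: the `.rel.bin` file stores a full square matrix of little-endian 64-bit floats, and the companion `.rel.id` file lists one `FID IID` pair per line, optionally preceded by a header line starting with `#`. Triangular, single-precision, and compressed variants are not handled. The names `read_square_doubles`, `read_ids`, and the `toy.rel.*` files are made up for this example.

```python
import os
import struct
import tempfile


def read_ids(id_path):
    """Read (FID, IID) pairs, skipping blank lines and '#' header lines."""
    with open(id_path) as f:
        return [
            tuple(line.split()[:2])
            for line in f
            if line.strip() and not line.startswith("#")
        ]


def read_square_doubles(bin_path, n):
    """Read an n-by-n matrix of little-endian float64 values, row by row."""
    expected = n * n * 8
    with open(bin_path, "rb") as f:
        raw = f.read()
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, found {len(raw)}")
    flat = struct.unpack(f"<{n * n}d", raw)
    return [list(flat[r * n:(r + 1) * n]) for r in range(n)]


# Write a toy 3-sample matrix and its ID file, then read both back.
K_toy = [[1.0, 0.25, -0.5], [0.25, 1.0, 0.125], [-0.5, 0.125, 1.0]]
with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "toy.rel.bin")
    id_path = os.path.join(tmp, "toy.rel.id")
    with open(bin_path, "wb") as f:
        f.write(struct.pack("<9d", *[v for row in K_toy for v in row]))
    with open(id_path, "w") as f:
        f.write("#FID\tIID\nF1\tS1\nF2\tS2\nF3\tS3\n")
    ids = read_ids(id_path)
    matrix = read_square_doubles(bin_path, len(ids))

print(ids)        # [('F1', 'S1'), ('F2', 'S2'), ('F3', 'S3')]
print(matrix[0])  # [1.0, 0.25, -0.5]
```

The number of samples comes from the ID file, so a size mismatch between the two files is reported as an error instead of silently producing a misshapen matrix.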
Please refer to the [pandas-plink documentation](https://pandas-plink.readthedocs.io/) for more information.
## Authors
* [Danilo Horta](https://github.com/horta)
## License
This project is licensed under the [MIT License](https://raw.githubusercontent.com/limix/pandas-plink/master/LICENSE.md).