GCN-MF: Disease-Gene Association Identification By Graph
Convolutional Networks and Matrix Factorization
Peng Han
King Abdullah University of Science
and Technology
peng.han@kaust.edu.sa
Peng Yang
Cognitive Computing Lab
Baidu Research USA
yangpeng1985521@gmail.com
Peilin Zhao∗
Tencent AI Lab
peilinzhao@hotmail.com
Shuo Shang∗
University of Electronic Science and
Technology of China
Inception Institute of Artificial
Intelligence
jedi.shang@gmail.com
Yong Liu
Alibaba-NTU Singapore Joint
Research Institute, Nanyang
Technological University
stephenliu@ntu.edu.sg
Jiayu Zhou
Michigan State University
jiayuz@msu.edu
Xin Gao
King Abdullah University of Science
and Technology
xin.gao@kaust.edu.sa
Panos Kalnis
King Abdullah University of Science
and Technology
panos.kalnis@kaust.edu.sa
ABSTRACT
Discovering disease-gene associations is a fundamental and critical biomedical task that helps biologists and physicians uncover the pathogenic mechanisms of syndromes. With various clinical biomarkers measuring the similarities among genes and disease phenotypes, network-based semi-supervised learning (NSSL) has been commonly used in such studies to address this class-imbalanced, large-scale data problem. However, most existing NSSL approaches are based on linear models and suffer from two major limitations: 1) they implicitly consider only a local-structure representation for each candidate; 2) they are unable to capture nonlinear associations between diseases and genes. In this paper, we propose a new framework for the disease-gene association task that combines a Graph Convolutional Network (GCN) with matrix factorization, named GCN-MF. With the help of the GCN, we can capture nonlinear interactions and exploit measured similarities. Moreover, we define a margin control loss function to reduce the effect of sparsity. Empirical results demonstrate that the proposed deep learning algorithm outperforms all other state-of-the-art methods on most metrics.
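The combination of graph convolution and matrix factorization described above can be illustrated with a minimal sketch. It assumes the widely used normalized propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W) for a single graph convolution layer and an inner product between disease and gene embeddings as the factorization-style score. The abstract does not specify the layer form, so the function names, the toy gene graph, the weights, and the disease embedding below are all hypothetical, and the margin control loss is not reproduced here.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def score(disease_emb, gene_emb):
    """Factorization-style association score: inner product of two embeddings."""
    return sum(d * g for d, g in zip(disease_emb, gene_emb))


# Hypothetical toy gene-similarity graph: genes 0 and 1 are similar, gene 2 is isolated.
gene_adj = [[0.0, 1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]
gene_features = [[1.0, 0.0, 0.0],   # one-hot input features (assumption)
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]
weights = [[1.0, 0.0],              # fixed illustrative weights, normally learned
           [0.0, 1.0],
           [1.0, 1.0]]

gene_emb = gcn_layer(normalize_adjacency(gene_adj), gene_features, weights)
disease_emb = [1.0, 0.0]            # hypothetical disease embedding
scores = [score(disease_emb, g) for g in gene_emb]
```

In this toy case the two connected genes end up with the same embedding, which is how the similarity graph influences the association scores; in a trained model the weights and embeddings would be learned from known associations.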
CCS CONCEPTS
• Computing methodologies → Semantic networks.
∗Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
KDD ’19, August 4–8, 2019, Anchorage, AK, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6201-6/19/08...$15.00
https://doi.org/10.1145/3292500.3330912
KEYWORDS
graph convolutional networks; deep learning; disease-gene association
ACM Reference Format:
Peng Han, Peng Yang, Peilin Zhao, Shuo Shang, Yong Liu, Jiayu Zhou, Xin
Gao, and Panos Kalnis. 2019. GCN-MF: Disease-Gene Association Identification
By Graph Convolutional Networks and Matrix Factorization. In The
25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD ’19), August 4–8, 2019, Anchorage, AK, USA. ACM, New York, NY, USA,
9 pages. https://doi.org/10.1145/3292500.3330912
1 INTRODUCTION
Identifying disease genes in the human genome is an important and fundamental problem in biomedical research [9, 15, 31, 43]. Although many machine learning methods have been applied to discover new disease genes, the task remains challenging because of the pleiotropy of genes, the limited number of confirmed disease genes in the whole genome, and the genetic heterogeneity of diseases. Recent approaches have applied the concept of 'guilt by association' to investigate the association between a disease phenotype and its causative genes: candidate genes with characteristics similar to those of known disease genes are more likely to be associated with diseases.
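The 'guilt by association' principle can be made concrete with a small sketch. It is not the method proposed in this paper; it only illustrates the idea of ranking candidate genes by their average similarity to genes already known to be associated with a disease. The function name `rank_candidates`, the gene identifiers, and the similarity values are hypothetical.

```python
def rank_candidates(similarity, known_genes, candidates):
    """Rank candidate genes by mean similarity to known disease genes.

    `similarity` maps an unordered gene pair (a frozenset) to a similarity
    value; pairs that are missing are treated as having similarity 0.
    """
    scores = {}
    for gene in candidates:
        sims = [similarity.get(frozenset((gene, known)), 0.0) for known in known_genes]
        scores[gene] = sum(sims) / len(sims) if sims else 0.0
    # Highest-scoring candidates come first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy similarities, not real data.
similarity = {
    frozenset(("G1", "G3")): 0.9,
    frozenset(("G2", "G3")): 0.8,
    frozenset(("G1", "G4")): 0.1,
    frozenset(("G2", "G4")): 0.2,
}

ranking = rank_candidates(similarity, known_genes=["G1", "G2"], candidates=["G3", "G4"])
```

Here the candidate G3 is ranked above G4 because it is far more similar to the two known disease genes. A scheme this simple uses only direct neighbors of each candidate, which is the kind of local-structure view that network-based and nonlinear models aim to go beyond.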
However, because of the class imbalance in disease-gene identification (few genes in the human genome are experimentally confirmed as disease-related), semi-supervised approaches, such as label propagation and positive-unlabeled learning, are widely used to identify candidate disease-gene links [27, 28]. These methods make use of unknown genes for training, typically in the scenario of a small number of confirmed disease genes (labeled data) and a large number of genes with unknown status (unlabeled data). The performance of disease-gene association models is limited by the potential bias of single learning models, incompleteness and noise
Research Track Paper