Network predicting drug's anatomical therapeutic chemical code
Yong-Cui Wang¹, Shi-Long Chen¹, Nai-Yang Deng² and Yong Wang³,∗
¹ Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau
Biology, Chinese Academy of Sciences, Xining, China, 810001.
² College of Science, China Agricultural University, Beijing, China, 100083.
³ National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and
Systems Science, Chinese Academy of Sciences, Beijing, China, 100190.
ABSTRACT
Motivation: Discovering the rules of the Anatomical Therapeutic
Chemical (ATC) classification of drugs at the molecular level is of vital
importance for understanding the action of the vast majority of drugs.
However, few studies have attempted to annotate a drug’s potential
ATC-codes by computational approaches.
Results: Here, we introduce the drug-target network to computationally
predict a drug’s ATC-codes and propose a novel method named
NetPredATC. Starting from the assumption that drugs with similar
chemical structures or target proteins share common ATC-codes, our
method aims to assign a drug’s potential ATC-codes by
integrating chemical structures and target proteins. Specifically, we
first construct a gold-standard positive dataset from drug ATC-code
annotation databases. Then we characterize ATC-codes and drugs by
their similarity profiles and define a kernel function to correlate them.
Finally, we use a kernel method, the support vector machine (SVM),
to automatically predict a drug’s ATC-codes. Our method was validated
on four drug datasets with various target proteins, including enzymes,
ion channels (ICs), G-protein-coupled receptors (GPCRs), and nuclear
receptors (NRs). We found that both a drug’s chemical structure
and its target proteins are predictive, and that target protein information
yields better accuracy. Further integrating these two data sources revealed
more experimentally validated ATC-codes for drugs. We extensively
compared NetPredATC with SuperPred, a method based on chemical
similarity alone. Experimental results showed that
NetPredATC outperforms SuperPred not only in predictive coverage
but also in accuracy. In addition, database search and functional
annotation analysis indicate that our novel predictions are worthy of
future experimental validation.
Conclusion: Our new method, NetPredATC, predicts a drug’s
ATC-codes more accurately by incorporating the drug-target
network and integrating data, which will promote the understanding
of drug mechanisms as well as drug repositioning and discovery.
Availability: NetPredATC is available at http://doc.aporc.org/wiki/NetPredATC.
Contact: ycwang@nwipb.cas.cn, ywang@amss.ac.cn
∗ To whom correspondence should be addressed.
1 INTRODUCTION
The Anatomical Therapeutic Chemical (ATC) classification system
categorizes drug substances at different levels by their therapeutic
properties, chemical properties, pharmacological properties, and
practical applications. This classification system is recommended by
the World Health Organization (WHO) and drug’s ATC-codes have
been widely applied in almost all drug utilization studies (WHO,
2006). Specifically, the ATC classification system can be used as a basic
tool for drug utilization research. It also enables the presentation
and comparison of drug consumption statistics at the international
level. In addition, ATC prediction will greatly facilitate recent
drug repositioning and drug combination studies. Though useful,
mapping ATC-codes to drugs is quite challenging.
Recently, ATC-codes for some well characterized drugs have
been deposited in databases, such as KEGG BRITE (Kanehisa
et al., 2006) and DrugBank (Wishart et al., 2008). These databases
provide high-quality, expert-curated data. However, they are small in
scale, and their coverage is far from sufficient for practical
use. Even for some well-collected drug datasets, the ATC-code
assignments are far from complete. For example, the
dataset in Yamanishi et al. (2008) contains drugs with four different
types of target proteins: enzymes, ion channels (ICs),
G-protein-coupled receptors (GPCRs), and nuclear receptors (NRs).
These drugs all have manually curated target proteins from KEGG
BRITE (Kanehisa et al., 2006), BRENDA (Schomburg et al., 2004),
SuperTarget (Gunther et al., 2008), and DrugBank (Wishart et al.,
2008). Even in this high-quality dataset, 102 of the 445 drugs
targeting enzymes, 13 of the 210 drugs targeting ICs, 23 of the
223 drugs targeting GPCRs, and 4 of the 54 drugs targeting NRs
have no ATC-codes at all. Computed from these counts, the percentage
of drugs without ATC-codes ranges from about 6% (ICs) to about
23% (enzymes).
The bottleneck is that the current data collection procedure relies
heavily on human curation and is not efficient. One way out
is to learn the underlying ATC-code classification rules
from the available high-quality ATC-code annotations, and
then automatically assign new ATC-codes to drugs with a
computational predictor. This strategy will accelerate the functional
characterization of drugs under the ATC classification system,
especially for barely characterized drugs. Importantly, it will
greatly speed up our understanding of the mechanisms of action of
the vast majority of drugs, and narrow the gap between medical
indications and the elucidation of drug effects at the molecular
level (Dunkel et al., 2008).
Associate Editor: Dr. Olga Troyanskaya
© The Author (2013). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Bioinformatics Advance Access published April 5, 2013
However, few studies attempt to address this important problem.
Dunkel et al. tackled this challenge by proposing a computational
method to classify the given compounds into ATC classification
system. Their method is based on the drug similarity in chemical
structures and physicochemical properties (Dunkel et al., 2008).
They also developed a useful web server, which allows users to make
prognoses about the medical indication of novel compounds and to find new
leads for known targets (Dunkel et al., 2008). Nevertheless,
chemical structure only describes the static state of a drug, whereas
cells use networks of proteins and small molecules (drugs, metabolites, or
ligands) to dynamically coordinate multiple biological
functions. For instance, a single drug may have different biological
functions by targeting different proteins. Therefore, if drug
target information is integrated into the prediction, an improvement
in performance can be expected. In this paper, we follow this idea to
design a new predictive method. That is, we map ATC-codes to a
given drug based not only on its chemical structure similarity with
other compounds, but also on its target proteins.
The commonly accepted assumption in drug discovery is that
drugs with similar pharmacological or therapeutic properties usually
share common functions (Yamanishi et al., 2008, 2010; Zhao and
Li, 2010; Wang et al., 2010). Existing efforts demonstrated that
chemical structure similarity is useful in classifying compounds
into the ATC classification system (Dunkel et al., 2008). Here we
note that the pharmacological or therapeutic similarity of drugs may
be due to the fact that they interact with common or similar target
proteins. Thus it is reasonable to assume that drugs with similar
target proteins usually share common ATC-codes. Starting with
this assumption, we propose a novel computational approach called
NetPredATC to predict potential ATC-codes for drugs. Specifically,
we first construct the drug and ATC-code interaction network based
on the known drug ATC-code annotations. Then we characterize
ATC-codes and drugs by their similarity profiles, and define a kernel
function to correlate drugs with ATC-codes. Finally, we infer
a drug’s ATC-codes by training a machine learning model, namely
support vector machines (SVMs). SVMs are motivated by statistical
learning theory and have proven successful on many different
classification problems in bioinformatics (Scholkopf et al., 2004).
Our contributions lie not only in incorporating drug target
information into ATC-code prediction for the first time, but also
in designing a novel predictive model by data integration.
The performance of our method was validated on four classes
of drug target proteins, including enzymes, ICs, GPCRs, and
NRs. We show that both chemical structure and target protein
are predictive via cross-validation experiments and statistical
evaluation. Moreover, target protein information is the more predictive
of the two. By combining them, our method outperforms the method
based on chemical similarity alone and uncovers more experimentally
observed drug ATC-code annotations.
The remainder of this paper is structured as follows. In the
Materials and Methods section, we construct the drug and
ATC-code interaction network by collecting available drug ATC-code
annotations. Chemical structure and target protein
information are then investigated in detail. We characterize the drugs
and ATC-codes by their similarity profiles and train the SVM-based
predictor. In the Results section, we compare the predictive power of
chemical structure, target proteins, and their combination, and show
that the improvement in accuracy arises from the drug-target network
and data integration. Lastly, the discussion and conclusions are
presented.
2 MATERIALS AND METHODS
We propose a novel computational algorithm, NetPredATC, to infer a drug’s
ATC-codes by using drug-target network information. Our algorithm works
in three phases (Fig. 1): (A) Formulating known drug ATC annotations
as a bipartite graph. We extracted the known drug ATC annotations from the
KEGG BRITE (Kanehisa et al., 2006) and DrugBank (Wishart et al., 2008)
databases. (B) Extracting drug-drug and ATC-code-ATC-code similarity
metrics. Drug similarity is derived from chemical structure and target protein
information. ATC-code similarity profiles are calculated by a probabilistic
model (Lin, 1998). (C) Feeding the similarities among drugs and the
similarities among ATC-codes into a kernel method and applying an SVM-based
classifier to predict a drug’s unknown ATC-codes.
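Phase (B) cites Lin (1998) for ATC-code similarity, but this section does not spell out the exact formula used. As a rough sketch only, the following assumes the standard form of Lin's measure, sim(a, b) = 2 log P(c) / (log P(a) + log P(b)), where c is the deepest ATC prefix shared by a and b and P is estimated from how often each prefix occurs in a set of annotated codes. The function names and the toy code list are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

# Character lengths of the five ATC levels, e.g. A / A10 / A10B / A10BA / A10BA02.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def prefixes(code):
    """Return the ATC prefixes of a code, from level 1 down to the code itself."""
    return [code[:n] for n in ATC_PREFIX_LENGTHS if len(code) >= n]


def prefix_probabilities(codes):
    """Estimate P(prefix) as the fraction of annotated codes under that prefix."""
    counts = Counter()
    for code in codes:
        counts.update(prefixes(code))
    return {p: c / len(codes) for p, c in counts.items()}


def lin_similarity(a, b, prob):
    """Lin-style similarity between two ATC codes (assumed form, see lead-in)."""
    shared = [p for p, q in zip(prefixes(a), prefixes(b)) if p == q]
    if not shared:
        return 0.0  # different main groups: no common ancestor below the root
    denom = math.log(prob[a]) + math.log(prob[b])
    if denom == 0.0:
        return 1.0  # both codes have probability 1, so they are the same code
    return 2.0 * math.log(prob[shared[-1]]) / denom


# Toy annotation set (illustrative only, not the paper's data).
toy_codes = ["A10BA02", "A10BB01", "A10AB01", "C07AB02"]
prob = prefix_probabilities(toy_codes)
sim_close = lin_similarity("A10BA02", "A10BB01", prob)  # share level 3 (A10B)
sim_far = lin_similarity("A10BA02", "A10AB01", prob)    # share level 2 (A10)
sim_none = lin_similarity("A10BA02", "C07AB02", prob)   # share nothing
```

Codes that share a deeper prefix receive a higher score, which is the behaviour one would want from an ATC-code similarity profile.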
2.1 Constructing drug and ATC-codes interaction network
In the ATC system, drugs are divided into fourteen main groups (1st level),
with pharmacological/therapeutic subgroups at the 2nd level. The 3rd and 4th
levels are chemical/pharmacological/therapeutic subgroups and the 5th level
is the chemical substance. The hierarchical structure of ATC-codes makes
the prediction a hierarchical multi-label classification problem. Existing
models for this problem are complicated and computationally
expensive (Rousu et al., 2004; Cai and Hofmann, 2004), which greatly
restricts the application scope of such methods. Here, we propose a
low-cost computational method by treating the ATC-code prediction problem as a
binary classification problem. Specifically, we construct the drug and ATC-code
interaction network based on available drug ATC annotations, which
are extracted from the KEGG BRITE (Kanehisa et al., 2006) and DrugBank
(Wishart et al., 2008) databases. That is, by using the known ATC-codes
for drugs, we construct a bipartite graph (Fig. 1A), i.e., interactions
exist only between drugs and ATC-codes. In this way, ATC-code prediction
can be cast as a binary classification problem: we aim to determine whether
a given drug and ATC-code pair interacts or not. The advantage is that we
can use a very popular machine learning method, the SVM, to handle this
high-dimensional learning problem at relatively low cost.
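The reformulation above can be sketched as follows. The first helper splits a 7-character ATC code into its five levels; the second enumerates every drug and ATC-code pair of a bipartite annotation graph and labels it +1 if the annotation is known. Treating every unannotated pair as a negative (-1) is an assumption made here for illustration; this section only describes a gold-standard positive set. The drug identifiers and annotations are hypothetical toy data.

```python
from itertools import product


def atc_levels(code):
    """Split a 7-character ATC code into its five hierarchical levels."""
    if len(code) != 7:
        raise ValueError("expected a 7-character 5th-level ATC code")
    return (code[:1], code[:3], code[:4], code[:5], code[:7])


def build_pairs(annotations):
    """Turn a bipartite annotation graph into labelled (drug, ATC-code) pairs.

    annotations maps each drug to the set of its known ATC codes.
    Known pairs get label +1; all other pairs get -1 (an assumption,
    since absence of an annotation is not proof of a true negative).
    """
    drugs = sorted(annotations)
    codes = sorted({c for known in annotations.values() for c in known})
    pairs, labels = [], []
    for drug, code in product(drugs, codes):
        pairs.append((drug, code))
        labels.append(1 if code in annotations[drug] else -1)
    return pairs, labels


# Hypothetical toy annotations (not taken from KEGG BRITE or DrugBank).
toy_annotations = {
    "drug_a": {"A10BA02"},
    "drug_b": {"A10BA02", "C07AB02"},
    "drug_c": {"C07AB02"},
}
pairs, labels = build_pairs(toy_annotations)
```

With n drugs and m ATC-codes this yields n × m candidate pairs, so a single binary classifier over pairs replaces a hierarchical multi-label model.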
2.2 Collecting chemical structure and target protein data
Given two drug ATC-code pairs, we construct a kernel function that
reflects their similarity. Since a kernel function represents, in some
sense, the similarities among the training samples (Hofmann et al.,
2008), we focus on the similarity scores among drugs and the similarity
scores among ATC-codes. Therefore, we construct similarity profiles to
characterize drugs and ATC-codes in the following subsections.
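One common way to build a kernel on pairs from two similarity tables is to multiply them. The sketch below assumes that product form, K((d, a), (d', a')) = S_drug(d, d') · S_atc(a, a'); this section does not state which combination NetPredATC actually uses, so the form, the function names, and the toy similarity values are all illustrative assumptions.

```python
def pair_kernel(pair_x, pair_y, drug_sim, atc_sim):
    """Product kernel between two (drug, ATC-code) pairs (assumed form)."""
    (drug_x, code_x), (drug_y, code_y) = pair_x, pair_y
    return drug_sim[drug_x][drug_y] * atc_sim[code_x][code_y]


def gram_matrix(pairs, drug_sim, atc_sim):
    """Kernel values for all pairs of (drug, ATC-code) pairs, as nested lists."""
    return [[pair_kernel(p, q, drug_sim, atc_sim) for q in pairs] for p in pairs]


# Hypothetical similarity tables with made-up values.
drug_sim = {
    "d1": {"d1": 1.0, "d2": 0.6},
    "d2": {"d1": 0.6, "d2": 1.0},
}
atc_sim = {
    "A10BA02": {"A10BA02": 1.0, "A10BB01": 0.5},
    "A10BB01": {"A10BA02": 0.5, "A10BB01": 1.0},
}
toy_pairs = [("d1", "A10BA02"), ("d2", "A10BB01"), ("d2", "A10BA02")]
K = gram_matrix(toy_pairs, drug_sim, atc_sim)
```

A matrix like `K` could then be handed to any SVM implementation that accepts a precomputed kernel, together with the ±1 pair labels.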
2.2.1 Chemical structure data It is generally believed that drugs with
similar chemical structures carry out common therapeutic functions and are
thus likely to share common ATC-codes. Each drug can therefore be characterized
by its chemical structure similarity profile with other drugs. The chemical
structure similarity between two drugs $d$ and $d'$ is computed by the SIMCOMP
algorithm (Hattori et al., 2003), which is a graph-based method for comparing
pairwise chemical structures. Suppose that we have $n_c$ drugs in total; a matrix
$S_{chem} \in \mathbb{R}^{n_c \times n_c}$ is then constructed to represent the
chemical structure similarity for all drug pairs. Each row (or column) of this
matrix is the chemical structure similarity profile of a single drug.
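The construction of the $n_c \times n_c$ similarity matrix can be sketched as below. SIMCOMP itself is not reimplemented here: as a plainly labelled stand-in, the sketch uses the Tanimoto (Jaccard) coefficient on sets of substructure labels, which is a different and much simpler measure. The drug names and feature sets are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two feature sets (a stand-in for SIMCOMP)."""
    union = features_a | features_b
    if not union:
        return 1.0  # two empty descriptions are treated as identical
    return len(features_a & features_b) / len(union)


def similarity_matrix(drugs, features, similarity=tanimoto):
    """Build the symmetric n_c x n_c matrix of pairwise drug similarities.

    Row i (or column i) is the similarity profile of drugs[i].
    """
    n = len(drugs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = similarity(features[drugs[i]], features[drugs[j]])
            matrix[i][j] = matrix[j][i] = score  # fill both triangles
    return matrix


# Hypothetical substructure labels, for illustration only.
toy_drugs = ["d1", "d2", "d3"]
toy_features = {
    "d1": {"ring", "amine", "carboxyl"},
    "d2": {"ring", "amine"},
    "d3": {"sulfonyl"},
}
S_chem = similarity_matrix(toy_drugs, toy_features)
profile_d1 = S_chem[0]  # similarity profile of d1 against all drugs
```

Passing a different `similarity` callable, for example one that wraps precomputed SIMCOMP scores, would leave the matrix construction unchanged.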