Journal of Computer and Communications, 2018, 6, 299-310
http://www.scirp.org/journal/jcc
ISSN Online: 2327-5227
ISSN Print: 2327-5219
DOI: 10.4236/jcc.2018.611027
Medical Data Visualization Analysis and Processing Based on Machine Learning
Tong Wang, Lei Zhao, Yanfeng Cao, Zhijian Qu, Panjing Li*
School of Computer Science and Technology, Shandong University of Technology, Zibo, China
Abstract
To provide a medical data visualization analysis tool, machine learning methods are introduced to classify malignant neoplasm of the lung within the medical database MIMIC-III (Medical Information Mart for Intensive Care III, USA). K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest (RF) are selected as the predictive tools. Based on the experimental results, the machine learning predictive tools are integrated into the medical data visualization analysis platform. The platform software provides doctors with a flexible medical data visualization analysis tool. Related practice indicates that doctors can generate visualization analysis results in a few simple steps and carry out research on the data accumulated in hospitals, even if they have not received special training in data analysis.
Keywords
Data Visualization Analysis, Machine Learning, KNN, SVM, RF
*Corresponding author.
How to cite this paper: Wang, T., Zhao, L., Cao, Y.F., Qu, Z.J. and Li, P.J. (2018) Medical Data Visualization Analysis and Processing Based on Machine Learning. Journal of Computer and Communications, 6, 299-310. https://doi.org/10.4236/jcc.2018.611027
Received: October 15, 2018
Accepted: November 25, 2018
Published: November 28, 2018
Copyright © 2018 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/
Open Access

1. Introduction
Medical data mainly include clinical trial data, biomedical data, electronic medical records and diagnosis books, and individual health information [1]. The data types range from images and text to numbers. The sheer volume leaves doctors drowning in the medical data accumulated in hospitals yet starved of information. Sometimes doctors want to reveal the rules behind the data, for instance, whether a particular disease is related to sex, age, region of residence, or other factors, and why. Medical data visualization analysis and processing can provide an intuitive graphical tool, and more and more methods have been developed in the past decades.

For instance, in 2014, Akilah L. [2] organized hierarchical data structures by using treemaps to examine large amounts of data in one overall view, which served as proof that treemaps could be beneficial in assessing surgical data retrospectively by allowing surgeons and healthcare administrators to make quick visual judgments. In 2015, Gilbert Chien Liu [3] provided health services researchers with a visualization tool to construct logic models for clinical decision support within an electronic health record. The mapping relationships could be acquired using software for social network analysis: NodeXL and CMAP. Seonah Lee [4] developed a time-oriented visualization for problems and outcomes and a matrix visualization for problems and interventions by using PHN-generated Omaha System data to help PHNs consume data and plan care at the point of care. In 2016, Shahid Mahmud [5] presented a data analytics and visualization framework for health-shock prediction on a large-scale health informatics dataset based on fuzzy rule summarization, which can provide interpretable linguistic rules to explain the causal factors affecting health shocks. Usman Iqbal [6] put forward an animated visualization tool called Cancer Associations Map Animation (CAMA), which can depict the association of 9 major cancers with other diseases over time based on 782 million outpatient records in a health insurance database. Dror G. Feitelson [7] introduced the multilevel spie chart to create a visualized combination of cancer incidence and mortality statistics. In 2017, Fleur Mougin [8] reviewed the current methods and techniques dedicated to information visualization and their current use in software development related to omics and/or clinical data.

It can be seen from past research on medical visualization that progress has been made on the processing of medical big data, the visualization of electronic health records and the correlation analysis of disease characteristics. However, research on medical data visualization analysis fused with algorithms is still to be explored. Against this background, this paper puts forward a general-purpose medical data visualization analysis tool built with R and machine learning methods, which serve as the predictive tools.
2. Machine Learning Classification Algorithms
Sometimes medical data of a particular type need to be classified into groups, so that the relationship between each group and a disease can be explored. Such cluster analysis is an important method in data visualization analysis. Therefore, the typical machine learning methods KNN, Support Vector Machine (SVM) and Random Forest (RF) are selected as the predictive tools for data classification.
2.1. K-Nearest Neighbor
K-Nearest Neighbor (KNN) [9] [10] is a typical supervised machine learning method. KNN is a non-parametric method used for classification, where the output is a class membership. An object is classified by a majority vote of its neighbors and is allocated to the class most common among its $k$ nearest neighbors. For a medical data object, if most of the $k$ nearest samples in the feature space belong to a certain category, the object is assigned to that category and is assumed to share the attributes of that category. The KNN algorithm depends only on the category of the nearest sample or the nearest several samples to determine the category of the object to be classified. The selected neighbors are objects that have already been correctly classified. Distance is used as a dissimilarity index between objects to address the problem of matching between objects. The commonly used distances are the Euclidean distance (1) and the Manhattan distance (2).
$$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} \tag{1}$$

$$d(x, y) = \sum_{k=1}^{n} |x_k - y_k| \tag{2}$$
KNN makes decisions based on the dominant category among the $k$ nearest objects, rather than on the category of a single object. The KNN algorithm can be described as follows:
Step 1: Calculate the distance between the test data and each training sample;
Step 2: Sort the distances in increasing order;
Step 3: Select the $k$ points with the smallest distances;
Step 4: Determine the occurrence frequency of each category among these $k$ points;
Step 5: Return the category with the highest frequency among the $k$ points as the predicted class of the test data.
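
The five steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the paper: the function names (`euclidean`, `manhattan`, `knn_predict`), the toy two-dimensional samples and the labels "A" and "B" are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance, Equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan distance, Equation (2)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Step 1: distance between the query and every training sample.
    dists = [(distance(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 2: sort by increasing distance.
    dists.sort(key=lambda pair: pair[0])
    # Step 3: keep the k nearest samples.
    nearest = dists[:k]
    # Step 4: count how often each category occurs among them.
    votes = Counter(label for _, label in nearest)
    # Step 5: return the most frequent category.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in a 2-D feature space.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_X, train_y, (5.0, 4.9), k=3, distance=manhattan))
```

Passing the distance function as a parameter makes it easy to compare Equations (1) and (2) on the same data; in practice the features would usually be scaled first, because both distances are sensitive to the units of each attribute.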
2.2. Support Vector Machines
Support Vector Machine (SVM) [11] [12] [13] [14] is a supervised learning model proposed by Corinna Cortes and Vapnik in 1995. In the SVM classification algorithm, given a set of training samples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, a hyperplane is sought in the sample space, based on the training set $D$, that marks each sample as belonging to one or the other of two categories. For medical data, there are two possible situations: the data are linearly separable, or they are not. If the data are linearly separable, Equation (3) is used in the $n$-dimensional space to find a set of weights $w$ that specify the two hyperplanes in (4).
$$w \cdot x + b = 0 \tag{3}$$

$$\begin{aligned} w \cdot x + b &\ge +1 \\ w \cdot x + b &\le -1 \end{aligned} \tag{4}$$
The distance between the two hyperplanes is $2 / \|w\|$, where $\|w\|$ stands for the Euclidean norm of $w$. Maximizing this margin is expressed as the constrained problem (5). When the data are not linearly separable, slack variables $\xi_i$ and a penalty parameter $C$ are introduced, and the problem becomes (6).
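$$\min \frac{1}{2}\|w\|^2, \quad \text{subject to } y_i (w \cdot x_i + b) \ge 1, \ \forall x_i \tag{5}$$

$$\min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i, \quad \text{subject to } y_i (w \cdot x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ \forall i \tag{6}$$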
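
To make the quantities in Equations (3) to (6) concrete, the following Python sketch evaluates them for a hand-picked hyperplane. It does not train an SVM: the weights `w`, the offset `b`, the toy samples and the function names are all assumptions chosen for illustration, and the sign convention $w \cdot x + b$ of Equations (3) and (4) is used throughout.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b from Equation (3); its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance 2 / ||w|| between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def soft_margin_objective(w, b, X, y, C=1.0):
    """Return the objective (1/2)||w||^2 + C * sum(xi_i) and the slack values.

    Each slack is the smallest value satisfying the constraints of the
    soft-margin problem: xi_i = max(0, 1 - y_i (w . x_i + b)).
    """
    slacks = [max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y)]
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical, linearly separable toy samples with labels in {-1, +1}.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 1.0), -3.0  # chosen by hand, not learned

objective, slacks = soft_margin_objective(w, b, X, y, C=2.0)
print(margin_width(w), objective, slacks)

# Adding a positive sample that lies on the separating hyperplane forces a non-zero slack.
objective2, slacks2 = soft_margin_objective(w, b, X + [(1.5, 1.5)], y + [1], C=2.0)
print(objective2, slacks2)
```

With all slacks at zero the objective reduces to the hard-margin case; a larger `C` penalizes margin violations more heavily, at the cost of a narrower margin once the problem is actually optimized.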