
SCIENTIFIC REPORTS | (2019) 9:17418 | https://doi.org/10.1038/s41598-019-54031-2

www.nature.com/scientificreports

An Early Intestinal Cancer Prediction Algorithm Based on Deep Belief Network

Jing-Jing Wan, Bo-Lun Chen²,³*, Yi-Xiu Kong²,³, Xing-Gang Ma¹,³ & Yong-Tao Yu²,³

The incidence of colorectal cancer (CRC) in China has increased in recent years, and its mortality rate has become one of the highest among all cancers. CRC increasingly affects people's health and quality of life, and the lack of sufficient medical resources in China has further increased the workloads of medical doctors. The goal of this study was to construct an automated expert system that uses a deep learning technique to predict the probability of early-stage CRC from the patient's case report and the patient's attributes. Compared with previous prediction methods, which either rely on sophisticated examinations or have high computational complexity, this method is shown to provide valuable information, such as potentially important early signs, to assist in the early diagnosis, early treatment and prevention of CRC, thereby helping medical doctors reduce the workloads of endoscopies and other treatments.

CRC is a common malignant tumor in China. As people's living standards have improved and their eating habits have changed, the incidence and mortality of CRC have continued to rise, seriously endangering the health and quality of life of the Chinese people. According to Chinese cancer statistics from 2015, the incidence and mortality of CRC ranked fifth among all malignant tumors, with nearly 400,000 new cases and nearly 200,000 deaths, a mortality of 50%. In addition, a recently published study showed a significant increase in the annual rate of CRC incidence among young people. Due to its high morbidity and mortality, CRC prevention is an urgent problem that needs to be addressed.

CRC prognosis is closely related to early diagnosis. Most CRC cases can be cured when they are discovered at an early stage; the 5-year survival rate after early diagnosis can be as high as 90%. In contrast, when the disease is discovered only in the later stages, the 5-year survival rate is less than 10%. In the clinic, screening is generally used for early diagnosis and early treatment in order to reduce the incidence and mortality of CRC. Colonoscopy is the primary means of early diagnosis. However, domestic and foreign studies have shown that CRC screening programs for early diagnosis are not sufficiently accurate; only a small number of cases are detected among a large number of people screened, resulting in low screening compliance among patients [4,5].

In addition, the heavy workloads of medical professionals in China are well known, and a series of social and economic problems have been reported [7–10]. These problems are mainly due to the insufficiency of medical resources in China and their inefficient allocation. Moreover, such causes will likely be difficult to address in the short term. Therefore, we believe that a technical approach can partially reduce doctors' workloads by freeing doctors from repetitive work that does not require in-depth thinking. The goal of this study is to reduce doctors' workloads by designing an automated forecasting system that helps them make decisions more easily.

Previous early CRC predictions were conducted on a case-by-case basis, using either statistical analyses or patient records. However, a generalized predictive mechanism has yet to be developed because the mechanism of CRC is not yet fully understood. Thus, a solution to the prediction problem has great practical value. For example, in biological research, the nodes of protein interaction networks and metabolic networks are linked through interaction relationships. Revealing the hidden interactions in such networks has high experimental costs; however, the results of prediction methods can guide experiments and increase their success rates, thereby reducing their costs. Studying losses in disease-gene networks and predicting suspicious links aids in exploring the mechanism behind the disease, in predicting and evaluating corresponding treatments, and in finding new drug targets, thereby opening up new avenues for drug research and development.

¹Department of Gastroenterology, The Affiliated Huai'an Hospital of Xuzhou Medical University, the Second People's Hospital of Huai'an, Huaian, 223002, China. ²College of Computer Engineering, Huaiyin Institute of Technology, Huaian, 223003, China. ³These authors contributed equally: Bo-Lun Chen, Yi-Xiu Kong, Xing-Gang Ma and Yong-Tao Yu. *email: [email protected]

The medical industry has incorporated high-tech solutions such as artificial intelligence and sensing technologies, making medical services increasingly intelligent. The recent "New Healthcare Reform" policy in China has made intelligent healthcare accessible to ordinary people. Intelligent healthcare aims to capitalize on artificial intelligence technology to assist in various types of medical decision making, including disease risk prediction, intelligent healthcare consultation, medical image analysis, electronic medical record information extraction, medical health data analysis, medical insurance evaluation, and medication recommendation. In 2017, Esteva et al. developed a deep neural network that can successfully classify skin cancer from sample data, demonstrating that deep learning methods have great potential for use in medical fields. Intelligent systems that can make early disease predictions or help provide information for doctors during the diagnosis process are valuable in both scientific research and clinical medicine.

In recent years, many research teams have used machine learning methods to classify cancer patients as high or low risk. These technologies can play important roles in cancer research and treatment. The purpose of machine learning methods is to detect key features in complex sample data and to reveal their contributions. Machine learning methods such as artificial neural networks, Bayesian networks, support vector machines (SVM), and decision trees have been widely used in cancer research and provide effective and accurate basic models for the early prediction of various types of cancers.

The dimensionality of the sample data increases with the number of examination items recorded during the early diagnosis of cancer. However, because the specific examination items collected vary from case to case, the constructed sample dataset is naturally sparse. Consequently, the noise in the data also increases, which inevitably degrades the performance of early CRC prediction algorithms. In addition, because of the high dimensionality of the sample data, the time complexity of traditional prediction algorithms is usually high. Therefore, we intend to devise a method that effectively addresses both data sparsity and high dimensionality and eliminates noise in prediction problems, allowing us to learn which sample features play key roles in early CRC prediction.

Wang et al. dened the problem of feature selection as a combinatorial optimization or search problem in

intelligent healthcare, rather than the commonly used ltering, packaging and embedded feature selection meth-

ods

. ey applied several feature selection methods, including exhaustive search, heuristic search and hybrid

methods. e heuristic search methods include feature ordering metrics either with or without data extraction.

Kleogiannis et al. combined an SVM with a genetic algorithm (GA) to perform feature selection and parameter

optimization

. Duan proposed a backward elimination feature extraction method similar to the SVM recursive

feature elimination method (SVM-RFE)

. e method classies the feature ranking scores by statistically analyz-

ing the weight vectors of the plurality of linear SVMs trained on subsamples of the original training data at each

step. Zhong et al. used an SVM to analyze protein characteristics based on the Pearson correlation coecient to

eliminate redundant features

. Fong et al. combined the particle swarm optimization algorithm with three dif-

ferent classication methods—pattern network, decision tree and naive Bayes—to search for the optimal feature

subset

. e results show that the method achieves high classication precision on specic datasets. Inspired by

evolutionary algorithms, Mohapatra et al. proposed a modied cat swarm optimization (MCSO) algorithm to

extract features from datasets, applied it to several biomedical datasets, and achieved favorable results

. Metsis

et al. proposed a feature extraction method based on a structural sparse induction specication and compared it

with existing feature extraction methods on four published ACGH datasets

. Boreto et al. proposed an analytical

geometric feature extraction method to supervise variational correlation learning (suvrel) using a variational

method that determines the tensor of the metric to dene the distance-based similarity during pattern classica-

tion

. e variational method was applied to a cost function that penalizes the distance within the large class and

the distance within the preferred class. eir approach yields a metric tensor that minimizes the cost function.

Bennasar et al. introduced the joint mutual information maximization (JMIM) and the normalized joint mutual

information maximization (NJMIM) methods, both of which use the maximum value of mutual information and

minimum criteria, thus alleviating the theoretical and experimental overestimation of the meanings of features

Xu et al. used the minimum redundancy maximum correlation (MRMR) metric, forward feature extraction and

an SVM, and found that this combination outperformed other classiers such as Bayesian decision theory, K

nearest neighbor and random forest
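To make the correlation-based redundancy elimination mentioned above more concrete, the following is a minimal Python sketch. It is not the procedure of Zhong et al. or of any other cited work. It assumes a dense numeric sample matrix without constant columns, and the function name `drop_correlated_features`, the greedy keep-first strategy and the 0.9 threshold are illustrative choices introduced here.

```python
import numpy as np

def drop_correlated_features(X, threshold=0.9):
    """Greedy redundancy filter (illustrative, not from the cited papers).

    A feature (column) is kept only if its absolute Pearson correlation with
    every feature kept so far is below `threshold`. Returns kept column indices.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlations
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in kept):
            kept.append(j)
    return kept

# Synthetic stand-in data: three independent features plus a near-copy of feature 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
near_copy = 2.0 * base[:, 0] + 0.01 * rng.normal(size=100)
X = np.column_stack([base, near_copy])

kept = drop_correlated_features(X)
print(kept)
```

The filter is unsupervised: it looks only at correlations between features, so it would normally be followed by a classifier-driven selection step such as the SVM-based methods described above.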

In addition, matrix decomposition is currently a commonly used technique for addressing the sparsity and noise of the data in such problems; its implementation is relatively simple and its prediction accuracy is relatively high. The most famous matrix decomposition methods include singular value decomposition (SVD) [25,26], principal component analysis (PCA), independent component analysis (ICA), and others. Among these, SVD requires completing the data to avoid the sample sparseness problem; however, this operation not only increases the required data storage space but also potentially violates the practical significance of the sample data in a specific environment. Meanwhile, because SVD is a highly complex algorithm, it is not applicable to networks with large sample sizes. Therefore, based on SVD, Simon Funk proposed the latent factor model (LFM), which absorbs the diagonal matrix of singular values of the sample data matrix into the decomposed matrices and learns them by optimizing the RMSE evaluation index on the training matrix. In real prediction systems, no uniform standard exists for each new data sample; therefore, Koren added the user's historical scores to the LFM and proposed the SVD++ model.
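As a rough illustration of the latent factor idea described above, the sketch below fits two small factor matrices by stochastic gradient descent on the observed entries only, so that missing entries never need to be filled in. This is the commonly described form of Funk's approach, not code from the paper; the function names, the toy triples and all hyperparameters (`k`, `lr`, `reg`, `n_epochs`) are assumptions made for the example.

```python
import numpy as np

def fit_lfm(observed, n_rows, n_cols, k=2, lr=0.01, reg=0.02, n_epochs=500, seed=0):
    """Fit X[i, j] ~ P[i] . Q[j] by SGD over observed (i, j, value) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_rows, k))  # row (sample) factors
    Q = 0.1 * rng.standard_normal((n_cols, k))  # column (feature) factors
    for _ in range(n_epochs):
        for i, j, x in observed:
            err = x - P[i] @ Q[j]
            p_old = P[i].copy()
            # Gradient step on the squared error with L2 regularization.
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    return P, Q

def rmse(observed, P, Q):
    """Root mean squared error over the observed entries."""
    return float(np.sqrt(np.mean([(x - P[i] @ Q[j]) ** 2 for i, j, x in observed])))

# Toy sparse matrix (4 rows, 3 columns) given as (row, column, value) triples.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]

rmse_before = rmse(observed, np.zeros((4, 2)), np.zeros((3, 2)))  # all-zero baseline
P, Q = fit_lfm(observed, n_rows=4, n_cols=3)
rmse_after = rmse(observed, P, Q)
print(rmse_before, rmse_after)
```

Note that nothing in this model constrains the learned factors to be nonnegative.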

However, the above feature extraction models do not consider the existence of negative values in the sample data. In a prediction system, negative values in the sample matrix have no practical meaning in a real situation. For example, during early cancer diagnosis, a patient attribute or indicator with a negative value may be meaningless when reconstructing the sample data. Therefore, Lee and Seung proposed the nonnegative matrix factorization (NMF) method [31,32], which finds a low-rank approximation of the matrix and decomposes it into nonnegative matrices. This method not only greatly reduces the dimensionality of the matrix but also removes redundant data, making the decomposed result more interpretable in practice. NMF has been widely applied in the health care, medical imaging [34–36] and biomedical fields [37,38]; however, it has not attracted widespread attention in early cancer prediction. Therefore, this paper combines NMF with a deep learning method to facilitate early CRC detection.
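The NMF step can be sketched as follows. This is a minimal illustration using the Lee and Seung multiplicative updates for the squared-error objective, under stated assumptions: the data are synthetic random values, the sample count of 200 is arbitrary, and the function name `nmf` and its parameters are introduced here. Only the 50 input features and the 14 hidden features follow the figures reported in the Results section. It is not the authors' implementation.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (n x m) as W (n x k) times H (k x m).

    Uses multiplicative updates, which keep W and H nonnegative as long as
    they are initialized with positive values.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update feature loadings
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update sample encodings
    return W, H

# Synthetic stand-in for a sample matrix: 200 samples, 50 nonnegative features.
rng = np.random.default_rng(42)
X = rng.random((200, 50))

W, H = nmf(X, k=14)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, rel_error)
```

Each row of `W` is a 14-dimensional nonnegative encoding of one sample, which is the kind of reduced representation that could then be passed to a downstream classifier.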

Multiple examples of deep learning applications exist in medical research, most of which focus on automatically identifying tumor images or detecting gene sequences, and these algorithms have achieved good results. Xiao et al. developed a deep learning-based 5-class model to make cancer predictions using RNA sequence data. Danaee et al. used a deep learning approach (a stacked denoising autoencoder) to analyze gene expression data and identify genes potentially correlated with breast cancer. Some researchers have applied deep learning techniques to analyze cancer imagery. Bychkov et al. proposed a deep learning method to analyze CRC images, and their results showed that state-of-the-art deep learning techniques are able to extract more prognostic information from the tissue morphology of CRC than can an experienced medical professional. Cruz-Roa et al. presented and evaluated a deep learning model for automated basal cell carcinoma detection that learns the image representation, performs image classification, and interprets the results. Coudray et al. discovered that a deep learning method can classify non-small cell lung cancer and predict its mutations from histopathology images. Other researchers have also employed deep learning methods to investigate other types of medical data related to cancer prediction. Mamoshina et al. used deep neural networks (DNNs) to analyze 'omics data and achieved state-of-the-art results. Burke et al. used artificial neural networks to analyze the American College of Surgeons' Patient Care Evaluation (PCE) data and obtained improved predictions of patient 5-year survival rates.

However, in real conditions, especially those in developing countries, examination data such as tumor imagery and genetic testing data are not easily obtained. Given the constraints on patients' economic and medical conditions, numerous patients do not have access to these techniques. In addition, test procedures such as tumor imaging and genetic testing are typically performed only for patients already strongly suspected of having cancer. Therefore, during the most important period (i.e., the prevention and early diagnosis period), these data provide minimal help. In this paper, we attempt to use the simplest and most commonly available test data, the medical examination report, to create a new prediction system to help doctors make decisions. The medical examination report is a basic test that almost every patient undergoes; thus, our early cancer prediction system can be applied to a broader range of patients.

CRC is a multifactor disease. In CRC prediction, an expert system that uses deep learning techniques to combine patient attributes (such as age, gender, family history of CRC, BMI and past history) with patient case reports in order to predict the likelihood of early cancer can greatly reduce missed diagnoses by clinicians during endoscopy and treatment and can also provide effective help for the early diagnosis, early treatment and prevention of CRC.

This paper explores and analyzes patient data from a deep learning perspective, combining patient attributes and case reports to construct an expert system that predicts the probability of early cancer. Owing to its relatively effective dimensionality reduction and noise cancellation techniques, this method shows great promise for application in real scenarios. By greatly reducing missed diagnoses by clinicians during endoscopy and treatment, it will provide effective help for the early diagnosis, early treatment and prevention of CRC.

Results

The sample dataset includes each sample's attributes (e.g., age, gender, smoking history, and drinking history), endoscopic features (e.g., lesion location, polyp size, and no leaf) and blood attributes (e.g., white blood cells and hemoglobin). There are 50 features across all categories.

We compare our early cancer prediction (ECP) algorithm with four classic machine learning algorithms, i.e., an SVM, K-nearest neighbor (KNN), ensembles for boosting (EB), and random forest (RF), and three deep learning methods, i.e., a convolutional neural network (CNN), a recurrent neural network (RNN1), and a recursive neural network (RNN2). Each method's performance is averaged over 100 runs in which the data are randomly separated into a training set (containing 90% of the samples) and a test set (containing 10% of the samples). Normally, precision and recall are not necessarily related; however, in large-scale datasets, these two indicators are correlated. A false negative (FN) means that the prediction model incorrectly assigned a sample from the positive category to the negative category. Specifically, in this experiment, an FN means that a sample from a cancer patient was classified as being from a noncancer patient. In the clinic, the false negative rate (FNR) is important because a false negative may lead to a missed diagnosis. Therefore, in this paper, we mainly use the F1_Score and FNR as the evaluation metrics of the algorithms. The experimental results are as follows:

From Table1, we can see that our ECP algorithm achieves the highest F1_Score on the real sample data-

set. Both the Precision and Recall of our method outperform other algorithms. In addition, the FNR is the

smallest among all algorithms. Aer dimensional reduction by a nonnegative matrix, we reduced the original

50-dimensional matrix to 14 dimension and extracted the hidden features. is idea facilitates eective early

diagnosis, early treatment and prevention of cancer. erefore, our algorithm not only reduces the spatial com-

plexity of the sample but also achieves better prediction results. False negatives can also be caused by instability

in the patient’s condition, and related data may be collected during the window period of other diseases, resulting

in data noise.
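For reference, the four quantities compared in Table 1 can be computed from the counts of true positives, false positives and false negatives, as in the minimal Python sketch below. The labels are made-up toy values (1 = cancer, 0 = noncancer), not data from this study, and the function name `classification_metrics` is introduced here for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, F1 score and false negative rate (FNR)
    for binary labels where 1 is the positive (cancer) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer, predicted cancer
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # noncancer, predicted cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer, predicted noncancer
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fnr = fn / (fn + tp) if fn + tp else 0.0  # equals 1 - recall
    return {"precision": precision, "recall": recall, "f1": f1, "fnr": fnr}

# Toy example: 4 cancer samples (one missed) and 6 noncancer samples (one false alarm).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because the FNR is the complement of recall, a method with higher recall necessarily has a lower FNR; precision and hence the F1 score additionally account for false alarms.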

Next, we analyze the multidimensional features of the original dataset. In this paper, we input $m$ attributes and $n$ samples, where $X_{ij}$ corresponds to the eigenvalue of the $j$-th attribute of the $i$-th sample. Here, $k$ is a hypothetical number of important features in the NMF, which is generally less than the number of attributes. After NMF decomposition, $W_{ik}$ corresponds to the correlation probability of the $i$-th sample and the $k$-th important feature, and $H_{kj}$ corresponds to the probabilistic correlation of the $j$-th attribute and the $k$-th important feature. The result of the NMF is as follows:
