Artificial Intelligence - Deep Learning - Research on Plant miRNA Target Gene Prediction Based on Deep Learning.pdf
Uploaded 2022-06-27 06:04:45, PDF, 7.93 MB, 63 pages
List of Symbols
3'UTR: 3' Untranslated Region
5'UTR: 5' Untranslated Region
AUC: Area Under Curve, the area under the ROC curve
BiLSTM: Bi-Directional Long Short-Term Memory network
CNN: Convolutional Neural Network
FN: False Negative, predicted negative but actually positive
FP: False Positive, predicted positive but actually negative
FPR: False Positive Rate
HMM: Hidden Markov Model
miRNA: microRNA, a class of non-coding single-stranded RNA molecules
mRNA: messenger RNA
piRNA: PIWI-interacting RNA, RNA that interacts with Piwi proteins
Pol II: RNA polymerase II
PTGS: Post-Transcriptional Gene Silencing
RISC: RNA-Induced Silencing Complex
RNA: Ribonucleic Acid
RNN: Recurrent Neural Network
ROC: Receiver Operating Characteristic curve
siRNA: Small interfering RNA
SVM: Support Vector Machine
TN: True Negative, predicted negative and actually negative
TP: True Positive, predicted positive and actually positive
TPR: True Positive Rate
XGBoost: Extreme Gradient Boosting, a scalable machine learning system for tree boosting
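The TP, FP, TN, FN, TPR, and FPR entries in the symbol list relate to each other through the standard confusion-matrix definitions. The following is a minimal Python sketch of those textbook formulas. The class name `ConfusionCounts`, the function names, and the example counts are illustrative assumptions and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of the four outcomes of a binary classifier (hypothetical helper)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive


def true_positive_rate(c: ConfusionCounts) -> float:
    """TPR = TP / (TP + FN): share of real positives that were found."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """FPR = FP / (FP + TN): share of real negatives flagged as positive."""
    return c.fp / (c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    """Accuracy = (TP + TN) / all predictions."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Made-up counts for illustration only; these are not results from the thesis.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
```

An ROC curve plots TPR against FPR as the decision threshold varies, and AUC is the area under that curve.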
Wanfang Data
Contents
Chinese Abstract
English Abstract
1 Introduction
1.1 Background and Significance
1.2 Research Status at Home and Abroad
1.3 Research Content and Technical Route
2 Background Knowledge
2.1 miRNA Fundamentals
2.1.1 Overview of miRNA
2.1.2 Plant miRNA Biogenesis
2.1.3 miRNA Target Genes and Their Mechanism of Action
2.2 Deep Learning Techniques
2.2.1 Convolutional Neural Networks
2.2.2 Activation Functions
2.2.3 Bidirectional Long Short-Term Memory Network (BiLSTM)
2.2.4 XGBoost
2.3 Summary
3 Data Acquisition and Processing for Plant miRNA Target Gene Prediction
3.1 Data Acquisition
3.1.1 Plant miRNA Data Acquisition
3.1.2 miRNA-target Data Acquisition
3.2 Building a Balanced Dataset
3.3 Processing of Raw Gene Data
3.3.1 Base Substitution and Sequence Padding
3.3.2 Data Preprocessing
3.3.3 Setting Data Labels
3.4 Summary
4 Target Gene Prediction Based on Deep Learning
4.1 Network Model Implementation
4.1.1 Convolution Stage
4.1.2 BiLSTM Stage
4.1.3 Fully Connected Stage
4.2 Model Parameter Settings and Dataset Construction
4.2.1 Model Parameter Settings
4.2.2 Dataset Construction
4.3 Deep Learning Framework and Environment Configuration
4.4 Experimental Results and Analysis
4.4.1 Validation and Evaluation Methods
4.4.2 Training and Validation Prediction Results
4.4.3 Comparison of Results Across Methods
4.4.4 Validation on Experimentally Verified Real Data
4.5 Summary
5 Design and Implementation of the Plant miRNA Target Gene Prediction System
5.1 System Design and Development
5.2 System Implementation
5.3 Summary
6 Conclusions and Outlook
6.1 Research Conclusions
6.2 Research Outlook
References
Acknowledgements
Research Achievements During the Degree Program
Master's Professional Degree Thesis, Shandong Agricultural University
Chinese Abstract
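Living organisms contain RNAs of many types and functions. Non-coding RNAs, a class discovered in recent years, mainly play regulatory roles in the life processes of organisms, and miRNA is the most representative class among them. In plants, a miRNA recognizes its target gene through complementary base pairing and then either mediates translational repression of the target or cleaves it, thereby affecting the expression of the gene's traits. Given the importance of miRNAs and their targeting mechanism to organisms, this thesis studies the biological characteristics of plant miRNAs and their target genes, uses deep learning algorithms to design a plant miRNA target gene prediction model, DeepMiRNA, and develops a web-based plant miRNA target gene prediction system.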
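Recognition by complementary base pairing can be illustrated with a short Python sketch. It counts Watson-Crick pairs between a miRNA and an equally long candidate site aligned antiparallel to it. This is only a toy illustration of the pairing idea, not the DeepMiRNA method: the function name, the exclusion of G:U wobble pairs, and the example sequences are all assumptions made here.

```python
# Watson-Crick pairs for RNA; G:U wobble pairs are deliberately not counted here.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_complementary_pairs(mirna: str, target_site: str) -> int:
    """Count paired positions when the miRNA (written 5'->3') is aligned
    antiparallel to an equally long target site (also written 5'->3')."""
    if len(mirna) != len(target_site):
        raise ValueError("sequences must have the same length")
    # Reverse the target so that position i of the miRNA faces its partner.
    facing = reversed(target_site.upper())
    return sum((m, t) in WATSON_CRICK for m, t in zip(mirna.upper(), facing))


# Made-up 6-nt sequences: a perfect site and a site with one mismatch.
perfect = count_complementary_pairs("UGACAG", "CUGUCA")
one_mismatch = count_complementary_pairs("UGACAG", "CUGUCC")
```

A learned model replaces this kind of hand-written score with features extracted from the sequences themselves.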
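Since miRNAs were discovered, the volume of miRNA-related data has kept growing, and miRNA target gene prediction has shifted from traditional verification of individual target gene sequences to prediction with computational techniques such as big data, machine learning, and deep learning. Because a single miRNA generally has multiple target genes, the emergence of computational methods has greatly improved the efficiency and accuracy of miRNA target gene prediction. Based on this state of research, this thesis combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (BiLSTM), a special form of recurrent neural network, both of which perform well in sequence-oriented natural language processing, to design DeepMiRNA, a prediction model for plant miRNA target genes. For data, the thesis selects miRNA data from three plants, Arabidopsis thaliana, soybean, and rice, and also mixes the three to produce a mixed dataset. Data processing includes base substitution, sequence padding, and data encoding of the raw gene data, which convert the raw gene data into a data structure that can be fed to the model. Training and testing experiments show that DeepMiRNA reaches an accuracy of about 93% on the Arabidopsis data, about 89% on the soybean data, about 91% on the rice data, and about 90% on the mixed data. Compared with other classification algorithms, DeepMiRNA performs very well on plant miRNA target gene prediction and outperforms the algorithms it was compared against, indicating that the model classifies this problem well.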
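The three processing steps named above (base substitution, sequence padding, data encoding) could look roughly like the following Python sketch. The abstract does not give the exact rules, so everything specific here is an assumption: replacing U with T, right-padding with the symbol N, one-hot encoding over the alphabet ACGT, and all function names.

```python
from typing import List

ALPHABET = "ACGT"  # assumed alphabet after base substitution
PAD = "N"          # assumed padding symbol, encoded as an all-zero vector


def substitute_bases(seq: str) -> str:
    """Map RNA 'U' to DNA 'T' so all sequences share one alphabet (assumed rule)."""
    return seq.upper().replace("U", "T")


def pad_sequence(seq: str, length: int) -> str:
    """Right-pad with PAD up to `length`; longer sequences are truncated."""
    return seq[:length].ljust(length, PAD)


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator vector.
    PAD and any unknown symbol become [0, 0, 0, 0]."""
    return [[int(base == letter) for letter in ALPHABET] for base in seq]


def encode(seq: str, length: int) -> List[List[int]]:
    """Run the three steps in order: substitute, pad, encode."""
    return one_hot(pad_sequence(substitute_bases(seq), length))


# A made-up 4-nt sequence padded to length 6.
matrix = encode("augc", 6)
```

The result is a fixed-size matrix (length by 4), the kind of input a convolutional layer can consume; the fixed length is what makes padding necessary.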
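To further promote the use of DeepMiRNA in plant miRNA target gene prediction, this thesis developed a plant miRNA target gene prediction system with which users can run target gene predictions online and obtain the results (http://www.deepbiology.cn/deepmi/).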
Keywords: plant miRNA; target gene prediction; convolutional neural network; long short-term memory network
Plant miRNA Target Prediction Research Based on Deep Learning
Abstract
Living organisms contain RNAs of various types and functions. Non-coding RNAs, a type of RNA discovered in recent years, mainly include microRNA (miRNA), piRNA, and others. They mainly play a regulatory role in the life processes of organisms, and miRNA is the most representative type of non-coding RNA. In plants, a miRNA and its target gene recognize each other through complementary base pairing; the miRNA then mediates translational inhibition of the target gene or cleaves it, thereby affecting the expression of gene traits. Given the important role that miRNAs and their targeting mechanism play in organisms, this thesis studies the biological characteristics of plant miRNAs and target genes, uses deep learning algorithms to design a plant miRNA target gene prediction model, DeepMiRNA, and develops a web-based plant miRNA target gene prediction system.
Since the discovery of miRNA, the amount of miRNA-related data has kept rising. The prediction of miRNA target genes has also changed from traditional verification of single target gene sequences to prediction with computational technologies such as big data, machine learning, and deep learning. Because a miRNA generally has multiple target genes, the emergence of computational methods has greatly improved the efficiency and accuracy of miRNA target gene prediction. Therefore, based on the current state of research, this thesis uses a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), a special form of recurrent neural network, both of which perform well in sequence-based natural language processing, to design DeepMiRNA, a prediction model for plant miRNA target genes. For data, this thesis selects miRNA data from three plants, Arabidopsis thaliana, soybean, and rice, and mixes the three types of plant data to generate a mixed dataset. Data processing includes base replacement, sequence completion (padding), and data encoding of the raw gene data, which convert the raw gene data into a data structure that can be input to the model. Model training and testing experiments show that DeepMiRNA achieves an accuracy of about 93% on the Arabidopsis thaliana data, about 88% on the soybean data, about 91% on the rice data, and about 90% on the mixed data. After