# ER breast cancer prediction using gradient descent logistic regression
### By Georgina Gonzalez
Copy number aberrations, gains and losses of genomic regions, are a hallmark of cancer. Copy number data is high-dimensional and characterized by heavily correlated features. Often, as in this case, the number of samples is small compared to the number of features. In this work I first reduce the dimensionality using Topological Analysis of array CGH (TAaCGH) [1], which detects regions of the genome with significant copy number aberrations in patients with over-expression of the estrogen receptor (ER+). Next, each patient is tested for aberration in each of those regions, producing a set of binary variables. These variables are used as features in a logistic regression model, fitted by gradient descent, to predict ER+ breast cancer [2].
The file regularized_optim_logit_ERpos.pdf shows the results and describes each step leading to the final model, which reaches an F1 score of 71.4% for ER+ breast cancer.
### References
[1] Daniel DeWoskin, Joan Climent, I Cruz-White, Mariel Vazquez, Catherine Park, and Javier Arsuaga. Applications of computational homology to the analysis of treatment response in breast cancer patients. Topology and its Applications, 157(1):157–164, 2010.
[2] Gonzalez G, Ushakova A, Sazdanovic R, Arsuaga J. Prediction in cancer genomics using topological signatures and machine learning. The Abel Symposium “Topological Data Analysis”, Geiranger, Norway, 2018 (in press).
Climent data set: Joan Climent, Peter Dimitrow, Jane Fridlyand, Jose Palacios, Reiner Siebert, Donna G Albertson, Joe W Gray, Daniel Pinkel, Ana Lluch, and Jose A Martinez-Climent. Deletion of chromosome 11q predicts response to anthracycline-based chemotherapy in early breast cancer. Cancer Research, 67(2):818–826, 2007.
Horlings data set: Hugo M Horlings, Carmen Lai, Dimitry SA Nuyten, Hans Halfwerk, Petra Kristel, Erik van Beers, Simon A Joosse, Christiaan Klijn, Petra M Nederlof, Marcel JT Reinders, et al. Integration of DNA copy number alterations and prognostic gene expression signatures in breast cancer patients. Clinical Cancer Research, 16(2):651–663, 2010.
### The scripts
- sigmoid.R - sigmoid function
- cost_reg_logit.R - cost function for logistic regression
- predict.R - prediction function
- score.R - score for logistic regression: F1 or Accuracy
- grad_reg_logit.R - gradient descent for one lambda
- grad_reg_logit_optim_iterLambda.R - gradient descent for a vector of lambdas
- curveLambdaVSscore.R - plot score (F1 or Accuracy) for every lambda
- plot_thetaVSlambda.R - plot coefficients vs lambda
- learningCurve.R - plot learning curve for the final model
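
The scripts above are written in R and are not reproduced here. As a rough illustration of the steps they name (sigmoid, regularized cost, gradient descent, prediction, F1 score, and a sweep over lambda), here is a minimal, self-contained Python sketch. It is not the repository's code. It assumes that lambda is an L2 regularization strength, that the intercept is not penalized, and that a fixed learning rate is used; all function names, parameters, and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    """P(y = 1 | x); theta[0] is the intercept, x holds the binary features."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def cost(theta, X, y, lam):
    """Cross-entropy plus an L2 penalty (assumed form; intercept excluded)."""
    m = len(y)
    eps = 1e-12  # keeps log() finite when a probability saturates
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(theta, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / m + lam / (2 * m) * sum(t * t for t in theta[1:])


def gradient_descent(X, y, lam, alpha=0.1, n_iter=2000):
    """Batch gradient descent for one lambda, starting from theta = 0."""
    m, n = len(y), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(n_iter):
        errors = [predict_proba(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(errors) / m]  # intercept: no penalty term
        for j in range(n):
            g = sum(e * xi[j] for e, xi in zip(errors, X)) / m
            grad.append(g + lam / m * theta[j + 1])
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def predict(theta, X, threshold=0.5):
    """Class labels from predicted probabilities."""
    return [1 if predict_proba(theta, xi) >= threshold else 0 for xi in X]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: rows are patients, columns are binary
# "aberrant in region k" indicators; labels are 1 for ER+.
X_toy = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
y_toy = [1, 1, 0, 0, 1, 0]

# Fit once per lambda and report the training F1 for each value.
for lam in [0.0, 0.1, 1.0, 10.0]:
    theta = gradient_descent(X_toy, y_toy, lam)
    print(lam, f1_score(y_toy, predict(theta, X_toy)))
```

In practice the score for each lambda would be computed on held-out data rather than on the training set, which is what a score-versus-lambda curve and a learning curve are meant to show.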