# TFDeepSurv
A deep Cox proportional hazards model and survival analysis toolkit implemented in TensorFlow.
The recommended TensorFlow version is 1.15.3; module tests pass under TensorFlow 1.15.3.
**NOTE:** [tfdeepsurv-v2.1.0](https://github.com/liupei101/TFDeepSurv/releases) has been released. The old version is on branch `archive_v1`. Compared with version `v1.0`, the current version improves considerably on:
- speed of building the computational graph
- use of raw TensorFlow ops to compute the loss function (for handling ties)
- a unified format for survival data
- code elegance and simplicity
If you run into problems, read the FAQ below first, or send me an email directly.
## 1. Differences from DeepSurv
[DeepSurv](https://github.com/jaredleekatzman/DeepSurv), a package implementing a deep Cox proportional hazards model, is open source on GitHub. TFDeepSurv differs from it in the following ways:
- It supports ties in death times in your survival data, which requires a different loss function and a different estimator of the survival function (`Breslow` approximation).
- It provides survival function estimation.
- It tunes the hyperparameters of the DNN with Bayesian hyperparameter optimization.
- It is implemented in the popular deep learning framework TensorFlow.
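To make the first point concrete, here is a minimal NumPy sketch of a negative log partial likelihood that uses the Breslow approximation for tied event times. It is an illustration only, not the TensorFlow implementation used inside this package; the function name and argument layout are assumptions made for this example.

```python
import numpy as np

def breslow_neg_log_partial_likelihood(risk, time, event):
    """Illustrative Breslow loss (hypothetical helper, not part of tfdeepsurv).

    risk  : predicted log hazard ratio per sample
    time  : observed time per sample
    event : 1 if the event was observed, 0 if right-censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)

    loss = 0.0
    # Iterate over distinct event times; tied events share one risk set.
    for t in np.unique(time[event == 1]):
        died = (time == t) & (event == 1)   # samples with an event at time t
        at_risk = time >= t                 # samples still under observation at t
        d = died.sum()                      # number of tied events at t
        loss -= risk[died].sum() - d * np.log(np.exp(risk[at_risk]).sum())
    return loss

# Two tied events at t=1, one censored sample at t=2, one event at t=3.
loss = breslow_neg_log_partial_likelihood(
    risk=[0.0, 0.0, 0.0, 0.0],
    time=[1.0, 1.0, 2.0, 3.0],
    event=[1, 1, 0, 1],
)
print(loss)
```

With tied events, the Breslow approximation reuses the same risk-set denominator for every event at that time, which is why the log term is multiplied by the number of ties `d`.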
## 2. Contributors
Special thanks to those who contributed or gave helpful suggestions (listed by GitHub account or nickname):
- [taketakeseijin](https://github.com/taketakeseijin)
- [rmaanyam](https://github.com/rmaanyam)
- [yiyansong](https://github.com/yiyansong)
- H. Quan
- F. Pang
## 3. Installation
### From source
Download the TFDeepSurv package and install it from the source directory (**Python version >= 3.5**):
```bash
git clone https://github.com/liupei101/TFDeepSurv.git
cd TFDeepSurv
pip install .
```
## 4. Getting started
### 4.1 Running with simulated data
Read [Notebook - tfdeepsurv_data_simulated.ipynb](examples/tfdeepsurv_data_simulated.ipynb) for more details!
#### 4.1.1 prepare datasets
```python
from tfdeepsurv.datasets import load_simulated_data
### generate simulated data (Pandas.DataFrame)
# data configuration:
# hazard ratio = 2000
# number of features = 10
# number of valid features = 2
# No. of training data = 2000
train_data = load_simulated_data(2000, N=2000, num_var=2, num_features=10, seed=1)
# No. of test data = 800
test_data = load_simulated_data(2000, N=800, num_var=2, num_features=10, seed=1)
```
#### 4.1.2 obtain statistics of survival dataset
```python
from tfdeepsurv.datasets import survival_stats
survival_stats(train_data, t_col="t", e_col="e", plot=True)
```
result :
```txt
--------------- Survival Data Statistics ---------------
# Rows: 2000
# Columns: 10 + e + t
# Event Percentage: 74.00%
# Min Time: 0.0001404392
# Max Time: 15.0
```
![](tools/README-survival-status.png)
#### 4.1.3 transform survival data
The transformed survival data contains the existing covariates plus a new label column. In the label column, a negative value marks a right-censored sample and a positive value marks an observed event. The label column 'Y' is generated from the time and event columns according to the rule below.
```
Y = time, if event = 1
Y = -time, if event = 0
```
**NOTE**: In the latest version 2.1, survival data must be transformed via `tfdeepsurv.datasets.survival_df`.
```python
from tfdeepsurv.datasets import survival_df
surv_train = survival_df(train_data, t_col="t", e_col="e", label_col="Y")
surv_test = survival_df(test_data, t_col="t", e_col="e", label_col="Y")
# columns 't' and 'e' are packed into a new column 'Y'
```
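For intuition, the labeling rule can be sketched in plain pandas as below. This only illustrates the sign convention; the helper `make_label` is hypothetical, and `survival_df` remains the function to use with this package.

```python
import pandas as pd

def make_label(df, t_col="t", e_col="e", label_col="Y"):
    """Hypothetical helper: pack time and event columns into one signed label."""
    out = df.drop(columns=[t_col, e_col]).copy()
    # Keep the time where an event occurred, negate it for right-censored samples.
    out[label_col] = df[t_col].where(df[e_col] == 1, -df[t_col])
    return out

toy = pd.DataFrame({
    "x_0": [0.1, 0.5, 0.9],
    "t": [2.0, 3.5, 1.2],
    "e": [1, 0, 1],
})
labeled = make_label(toy)
print(labeled)
```

The second row is censored (`e = 0`), so its label becomes `-3.5`, while the other two rows keep their positive times.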
#### 4.1.4 initialize your neural network
```python
from tfdeepsurv import dsnn
input_nodes = len(surv_train.columns) - 1
hidden_layers_nodes = [6, 3, 1]
# the arguments of dsnn can be obtained by Bayesian Hyperparameters Tuning
nn_config = {
"learning_rate": 0.7,
"learning_rate_decay": 1.0,
"activation": 'relu',
"L1_reg": 3.4e-5,
"L2_reg": 8.8e-5,
"optimizer": 'sgd',
"dropout_keep_prob": 1.0,
"seed": 1
}
# ESSENTIAL STEP: Pass arguments
model = dsnn(
input_nodes,
hidden_layers_nodes,
nn_config
)
# ESSENTIAL STEP: Build Computation Graph
model.build_graph()
```
#### 4.1.5 train your neural network model
```python
Y_col = ["Y"]
X_cols = [c for c in surv_train.columns if c not in Y_col]
# model saving and loading is also supported!
# read comments of `train()` function if necessary.
watch_list = model.train(
surv_train[X_cols], surv_train[Y_col],
num_steps=1900,
num_skip_steps=100,
plot=True
)
```
result :
```
Average loss at step 100: 7.07983
Average loss at step 200: 7.07982
Average loss at step 300: 7.07981
...
Average loss at step 1700: 6.29165
Average loss at step 1800: 6.29007
Average loss at step 1900: 6.28687
```
Curve of loss and CI:
Loss Value | CI
:-------------------------------:|:--------------------------------------:
![loss curve](tools/README-loss.png)|![ci curve](tools/README-ci.png)
#### 4.1.6 evaluate model performance
```python
print("CI on training data:", model.evals(surv_train[X_cols], surv_train[Y_col]))
print("CI on test data:", model.evals(surv_test[X_cols], surv_test[Y_col]))
```
result :
```txt
CI on training data: 0.8193206851448683
CI on test data: 0.8175830825866967
```
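Assuming CI here refers to the concordance index, the metric can be sketched as follows on the signed label format. This is a simple O(n²) illustration with a hypothetical function name, not the implementation behind `model.evals`.

```python
import numpy as np

def concordance_index(y_signed, risk):
    """Illustrative concordance index on signed labels (hypothetical helper).

    y_signed : positive time = event, negative time = right-censored
    risk     : predicted risk score, higher means earlier expected event
    """
    y = np.asarray(y_signed, dtype=float)
    risk = np.asarray(risk, dtype=float)
    time, event = np.abs(y), y > 0

    concordant, comparable = 0.0, 0
    for i in range(len(y)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier sample had an event
        for j in range(len(y)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

ci_perfect = concordance_index([1.0, -2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0])
ci_reversed = concordance_index([1.0, -2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0])
print(ci_perfect, ci_reversed)
```

A value of 0.5 corresponds to random ranking, so the scores of about 0.82 reported above indicate that the model orders most comparable pairs correctly.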
#### 4.1.7 Model prediction
Model prediction includes:
- predicting hazard ratio or log hazard ratio
- predicting survival function
```python
# predict log hazard ratio
print(model.predict(surv_test.loc[0:4, X_cols]))
# predict hazard ratio
print(model.predict(surv_test.loc[0:4, X_cols], output_margin=False))
```
result:
```txt
[[4.629786 ]
[4.8222055]
[0. ]
[1.4019105]
[0. ]]
[[102.49213 ]
[124.2388 ]
[ 1. ]
[ 4.062955]
[ 1. ]]
```
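The two outputs above are related by exponentiation: the hazard ratio is the exponential of the log hazard ratio. A short NumPy check using the printed values:

```python
import numpy as np

# Log hazard ratios copied from the first output above.
log_hr = np.array([[4.629786], [4.8222055], [0.0], [1.4019105], [0.0]])

# Exponentiating gives the hazard ratios shown in the second output.
hr = np.exp(log_hr)
print(hr)
```

A log hazard ratio of 0 therefore maps to a hazard ratio of exactly 1, as seen for the third and fifth samples.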
```python
# predict survival function
model.predict_survival_function(surv_test.loc[0:4, X_cols], plot=True)
```
result:
![Survival rate](tools/README-surv.png)
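As a rough illustration of how a survival function can be derived from predicted log hazard ratios, the sketch below uses the Breslow estimator of the cumulative baseline hazard. It is an assumption-laden example with a hypothetical function name, not the code behind `predict_survival_function`.

```python
import numpy as np

def breslow_survival(risk_train, time, event, risk_new):
    """Illustrative survival curves via the Breslow estimator (hypothetical helper).

    risk_train  : log hazard ratios of the training samples
    time, event : observed times and event indicators of the training samples
    risk_new    : log hazard ratios of the samples to predict
    """
    risk_train = np.asarray(risk_train, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk_new = np.asarray(risk_new, dtype=float)

    event_times = np.unique(time[event == 1])
    # Baseline hazard increment at each event time: d_j / sum of exp(risk) over the risk set.
    increments = np.array([
        ((time == t) & (event == 1)).sum() / np.exp(risk_train[time >= t]).sum()
        for t in event_times
    ])
    cum_baseline = np.cumsum(increments)
    # S(t | x) = exp(-H0(t) * exp(h(x))); one row per new sample.
    surv = np.exp(-np.outer(np.exp(risk_new), cum_baseline))
    return event_times, surv

times, surv = breslow_survival(
    risk_train=[0.0, 0.0, 0.0],
    time=[1.0, 2.0, 3.0],
    event=[1, 1, 1],
    risk_new=[0.0, np.log(2.0)],
)
print(times)
print(surv)
```

Each row of `surv` is a non-increasing step curve evaluated at the distinct event times; a larger predicted risk pushes the whole curve down.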
### 4.2 Running with real-world data
The procedure for real-world data is similar to the one described for simulated data. The main point to pay attention to is data preparation.
For more details, refer to [Notebook - tfdeepsurv_data_real.ipynb](examples/tfdeepsurv_data_real.ipynb).
## 5. More properties
We provide tools for hyperparameter tuning (Bayesian hyperparameter optimization) that automatically search for good hyperparameters of the deep neural network.
For more on using Bayesian hyperparameter optimization, refer to [bysopt/README.md](bysopt/README.md).
## 6. TODO List
Items planned for future version `v2.1.0`:
- requirements statement
- Dockerfile and Docker images of tfdeepsurv
- GitHub package tools
## FAQ
Updated from time to time.
**Q1.** How do I install this package?
> Download or clone the latest package, then install it with pip. TensorFlow is installed as a dependency. The required TensorFlow version is `>=1.14.0, <2.0.0`, as specified in `setup.py`.
**Q2.** My loss curve does not converge. Why?
> First, refer to [Notebook - tfdeepsurv_data_real.ipynb](examples/tfdeepsurv_data_real.ipynb) to understand the modeling procedure. Then check the following items one by one:
> 1. Whether your survival data satisfies the requirements. Your original data must include covariates, time and event. It must then be transformed as described in [Notebook - tfdeepsurv_data_real.ipynb](examples/tfdeepsurv_data_real.ipynb).
> 2. Data normalization. The covariates should be normalized to the same magnitude if you want quick convergence.
> 3. Learning rate setting. It is better to set a relatively low learning rate, such as 0.01.
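As one possible way to handle the data normalization item, the covariates can be standardized with statistics computed on the training set only. The helper below is a hypothetical sketch in pandas and is not part of tfdeepsurv; the column names are made up for the example.

```python
import pandas as pd

def standardize(train, test, feature_cols):
    """Hypothetical helper: z-score covariates using training-set statistics."""
    mean = train[feature_cols].mean()
    # Guard against constant columns, whose standard deviation is zero.
    std = train[feature_cols].std(ddof=0).replace(0, 1.0)
    train_out, test_out = train.copy(), test.copy()
    train_out[feature_cols] = (train[feature_cols] - mean) / std
    test_out[feature_cols] = (test[feature_cols] - mean) / std
    return train_out, test_out

train = pd.DataFrame({"x_0": [1.0, 2.0, 3.0], "x_1": [100.0, 300.0, 500.0], "Y": [2.0, -3.5, 1.2]})
test = pd.DataFrame({"x_0": [2.0], "x_1": [300.0], "Y": [4.0]})
train_std, test_std = standardize(train, test, ["x_0", "x_1"])
print(train_std)
print(test_std)
```

Only the covariate columns are rescaled; the label column 'Y' is left untouched so that its sign and magnitude keep their meaning.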