# hd_knn_tree
Decision Tree and K-Nearest Neighbors analysis of the Heart Disease dataset using RStudio. The results are also compared with [Logistic Regression](https://github.com/danypark91/hd_log_reg) to determine which model predicts the dataset better.
### Tech/Framework used
* RStudio
* R Markdown
### R libraries used
* library(caTools)
* library(class)
* library(kknn)
* library(caret)
* library(ROCR)
* library(rpart)
* library(rpart.plot)
* library(MASS)
* library(tidyverse)
* library(ggsci)
### Installation of R packages
`rpack <- c("kknn", "caret", "class","caTools", "ROCR", "rpart", "rpart.plot", "MASS", "tidyverse", "ggsci")`
`install.packages(rpack)`
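
If some of these packages are already installed, reinstalling all of them is unnecessary. The following is a minimal sketch, not part of the original project, that installs only the missing ones. The name `missing_pkgs` and the `interactive()` guard are assumptions added for illustration.

```r
# Same package list as above
rpack <- c("kknn", "caret", "class", "caTools", "ROCR", "rpart",
           "rpart.plot", "MASS", "tidyverse", "ggsci")

# Packages from the list that are not yet in the local library
missing_pkgs <- setdiff(rpack, rownames(installed.packages()))

# Install only in an interactive session, so scripted runs
# do not trigger downloads (an assumption of this sketch)
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```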
### Dataset
The [original dataset](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) from UCI contains 76 attributes that describe a patient's condition. The dataset for this project comes from [Kaggle - Heart Disease UCI](https://www.kaggle.com/ronitf/heart-disease-uci), a subset of 14 attributes in which each row represents one patient.
### Project Description
This project applies decision tree and k-nearest neighbor (KNN) models to the dataset. It begins by importing the dataset from the local device and checking whether it requires data cleansing. The cleansed data is split into train and test sets at a ratio of 3 to 1. The best-fit model is derived from `train_df`. The model undergoes statistical tests to assess its validity, and it is then applied to `test_df` to check how well it predicts unseen data. Finally, the predictive performance of the resulting models is compared with that of the logistic regression model.
This project does not itself include data cleansing or visualization. The [hd_log_reg](https://github.com/danypark91/hd_log_reg) notebook covers both of these steps for the same dataset.
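
The workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code. The data frame is synthetic (it is not the heart disease data), the column names and the rule that generates `target` are made up, and `k = 5` is an arbitrary choice. The split uses base R `sample()`; `sample.split()` from `caTools`, which the project lists, could be used instead.

```r
library(rpart)  # decision tree
library(class)  # knn()

set.seed(42)

# Synthetic stand-in for the dataset (NOT the real heart disease data)
n <- 200
age     <- round(rnorm(n, mean = 54, sd = 9))
thalach <- round(rnorm(n, mean = 150, sd = 22))
oldpeak <- round(abs(rnorm(n, mean = 1, sd = 1)), 1)
# Made-up rule so that the toy target is learnable
score <- 0.04 * (thalach - 150) - 0.8 * (oldpeak - 1) + rnorm(n, sd = 0.5)
df <- data.frame(age, thalach, oldpeak,
                 target = factor(ifelse(score > 0, 1, 0), levels = c(0, 1)))

# Split into train and test sets at a ratio of 3 to 1
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train_df  <- df[train_idx, ]
test_df   <- df[-train_idx, ]

# Decision tree: fit on train_df, predict classes for test_df
tree_fit  <- rpart(target ~ ., data = train_df, method = "class")
tree_pred <- predict(tree_fit, newdata = test_df, type = "class")

# KNN: scale predictors using the training set's mean and sd only
predictors <- c("age", "thalach", "oldpeak")
train_x <- scale(train_df[, predictors])
test_x  <- scale(test_df[, predictors],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))
knn_pred <- knn(train = train_x, test = test_x, cl = train_df$target, k = 5)

# Share of correct predictions on the held-out test set
tree_acc <- mean(tree_pred == test_df$target)
knn_acc  <- mean(knn_pred == test_df$target)
print(c(tree = tree_acc, knn = knn_acc))
```

Scaling the test predictors with the training set's statistics keeps information from `test_df` out of the fitting step, which matters for KNN because it is distance-based.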
### Reference
* [Classification of Decision Tree](https://pages.mtu.edu/~shanem/psy5220/daily/Day12/classification.html)
* [Classification of KNN](https://pages.mtu.edu/~shanem/psy5220/daily/Day13/treesforestsKNN.html)
* James, G., Witten, D., Hastie, T., & Tibshirani, R. (2017). An introduction to statistical learning with applications in R. New York, N.Y: Springer.