## <i> Project 2: Supervised Machine Learning </i>
### **_by Sebastian Sbirna_**
***
In this report, we will evaluate the performance and characteristics of various types of supervised learning models on our chosen Heart Disease dataset. For more information on the data dictionary and the properties of our observations and attributes, please refer to our previous project [1].
### I. Regression Models
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure, plot, xlabel, ylabel, legend, show, clim, semilogx, loglog, title, subplot, grid
import sklearn.linear_model as lm
from sklearn.linear_model import Ridge, LinearRegression, LogisticRegression
from sklearn import model_selection, tree
from scipy import stats
import torch
from toolbox_02450 import feature_selector_lr, bmplot, rlr_validate, train_neural_net
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
```
### Regression, part A
Our dataset was collected for classification purposes and has 14 attributes: 5 numerical and 9 categorical. One of the 14 is the ‘target’ variable, meant to be predicted in a classification setting. None of the variables were collected specifically for regression, so the model results in this section may be prone to larger errors.
Still, we will use one of the five numerical (_of ratio type_) attributes as our criterion (_i.e. dependent_) variable for a regression analysis, with the other 13 attributes (_which expand to 20 attributes after a one-out-of-K encoding_) all acting as predictor (_i.e. independent_) variables. We must now decide which variable is best suited to serve as the criterion.
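As a quick illustration of why 13 predictors become 20 after encoding: `pd.get_dummies` expands each k-level categorical attribute into k − 1 dummy columns when the first level is dropped (a toy sketch with hypothetical data):

```python
import pandas as pd

# A hypothetical 3-level categorical column expands into 2 dummy columns
# once the first level is dropped
toy = pd.DataFrame({'cp': [1, 2, 3, 2]})
encoded = pd.get_dummies(toy, columns=['cp'], drop_first=True)
print(encoded.columns.tolist())  # -> ['cp_2', 'cp_3']
```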
However, our dataset has one important peculiarity which makes regression particularly difficult: most of our attributes are uncorrelated or only weakly correlated with each other, so the dataset stores very little predictive power for any candidate criterion variable.
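On the full dataset this claim can be verified with `df[numerical_columns].corr()`; the sketch below only illustrates the check, using the five sample observations shown later in this notebook rather than the full data:

```python
import pandas as pd

# Pairwise correlations among the numerical attributes; the five rows here
# are just the first observations of the dataset, so the numbers printed are
# illustrative, not the full-dataset correlations
sample = pd.DataFrame({
    'age':      [63, 37, 41, 56, 57],
    'trestbps': [145, 130, 130, 120, 120],
    'chol':     [233, 250, 204, 236, 354],
    'thalach':  [150, 187, 172, 178, 163],
    'oldpeak':  [2.3, 3.5, 1.4, 0.8, 0.6],
})
print(sample.corr().round(2))
```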
#### Loading the dataset and performing data wrangling:
```python
df = pd.read_csv('heart.csv')
```
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>age</th>
<th>sex</th>
<th>cp</th>
<th>trestbps</th>
<th>chol</th>
<th>fbs</th>
<th>restecg</th>
<th>thalach</th>
<th>exang</th>
<th>oldpeak</th>
<th>slope</th>
<th>ca</th>
<th>thal</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>63</td>
<td>1</td>
<td>3</td>
<td>145</td>
<td>233</td>
<td>1</td>
<td>0</td>
<td>150</td>
<td>0</td>
<td>2.3</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<th>1</th>
<td>37</td>
<td>1</td>
<td>2</td>
<td>130</td>
<td>250</td>
<td>0</td>
<td>1</td>
<td>187</td>
<td>0</td>
<td>3.5</td>
<td>0</td>
<td>0</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>41</td>
<td>0</td>
<td>1</td>
<td>130</td>
<td>204</td>
<td>0</td>
<td>0</td>
<td>172</td>
<td>0</td>
<td>1.4</td>
<td>2</td>
<td>0</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<th>3</th>
<td>56</td>
<td>1</td>
<td>1</td>
<td>120</td>
<td>236</td>
<td>0</td>
<td>1</td>
<td>178</td>
<td>0</td>
<td>0.8</td>
<td>2</td>
<td>0</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<th>4</th>
<td>57</td>
<td>0</td>
<td>0</td>
<td>120</td>
<td>354</td>
<td>0</td>
<td>1</td>
<td>163</td>
<td>1</td>
<td>0.6</td>
<td>2</td>
<td>0</td>
<td>2</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
```python
# Remove observations carrying invalid category codes
# (ca == 4 and thal == 0 are undefined in the data dictionary)
df.drop(index = (df[df.ca == 4]).index, inplace = True)
df.drop(index = (df[df.thal == 0]).index, inplace = True)

# Remap the 0-based codes of this dataset version back to the original UCI
# Cleveland encoding (thal: 3/6/7; cp: 1-4; slope: 1-3); temporary values
# (e.g. cp == 7) prevent already-remapped codes from being overwritten
df.loc[df.thal == 1, 'thal'] = 6
df.loc[df.thal == 3, 'thal'] = 7
df.loc[df.thal == 2, 'thal'] = 3

df.loc[df.cp == 0, 'cp'] = 4
df.loc[df.cp == 3, 'cp'] = 7
df.loc[df.cp == 2, 'cp'] = 3
df.loc[df.cp == 1, 'cp'] = 2
df.loc[df.cp == 7, 'cp'] = 1

df.loc[df.slope == 2, 'slope'] = 3
df.loc[df.slope == 1, 'slope'] = 2
df.loc[df.slope == 0, 'slope'] = 1
```
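The chained `.loc` reassignments above depend on their order; a minimal sketch on a hypothetical toy column shows how the `cp` permutation routes one code through the temporary value 7 so that it is not clobbered by later assignments:

```python
import pandas as pd

# Toy column carrying the four cp codes used by this dataset version
toy = pd.DataFrame({'cp': [0, 1, 2, 3]})

# Same remapping order as above: code 3 is parked at 7 until the end
toy.loc[toy.cp == 0, 'cp'] = 4
toy.loc[toy.cp == 3, 'cp'] = 7
toy.loc[toy.cp == 2, 'cp'] = 3
toy.loc[toy.cp == 1, 'cp'] = 2
toy.loc[toy.cp == 7, 'cp'] = 1

print(toy.cp.tolist())  # -> [4, 2, 3, 1]
```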
```python
numerical_columns = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
```
```python
# Rename the binary 'sex' attribute and one-out-of-K encode the multi-level
# categorical attributes; drop_first = True drops one dummy level from each
# to avoid perfect multicollinearity among the predictors
df['sex_male'] = df.sex
df.drop(columns = 'sex', inplace = True)

df = pd.get_dummies(data = df, columns = ['cp'], drop_first=True)
df.rename({'cp_2': 'cp_atypical', 'cp_3' : 'cp_non_anginal', 'cp_4': 'cp_asymptomatic'}, axis = 'columns', inplace = True)

df['fbs_true'] = df.fbs
df.drop(columns = 'fbs', inplace = True)

df = pd.get_dummies(data = df, columns = ['restecg'], drop_first=True)
df.rename({'restecg_1': 'restecg_st_t', 'restecg_2' : 'restecg_hypertrophy'}, axis = 'columns', inplace = True)

df['exang_yes'] = df.exang
df.drop(columns = 'exang', inplace = True)

df = pd.get_dummies(data = df, columns = ['slope'], drop_first=True)
df.rename({'slope_2': 'slope_flat', 'slope_3' : 'slope_downsloping'}, axis = 'columns', inplace = True)

df = pd.get_dummies(data = df, columns = ['ca'], drop_first=True)
df = pd.get_dummies(data = df, columns = ['thal'], drop_first=True)
df.rename({'thal_6': 'thal_fixed', 'thal_7' : 'thal_reversible'}, axis = 'columns', inplace = True)

# Move 'target' to the last column position by re-creating it under a
# temporary name
df['target_true'] = df.target
df.drop(columns = 'target', inplace = True)
df.rename({'target_true': 'target'}, axis = 'columns', inplace = True)
```
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>age</th>
<th>trestbps</th>
<th>chol</th>
<th>thalach</th>
<th>oldpeak</th>
<th>sex_male</th>
<th>cp_atypical</th>
<th>cp_non_anginal</th>
<th>cp_asymptomatic</th>
<th>fbs_true</th>
<th>...</th>
<th>restecg_hypertrophy</th>
<th>exang_yes</th>
<th>slope_flat</th>
<th>slope_downsloping</th>
<th>ca_1</th>
<th>ca_2</th>
<th>ca_3</th>
<th>thal_fixed</th>
<th>thal_reversible</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>63</td>
<td>145</td>
<td>233</td>
<td>150</td>
<td>2.3</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>...</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<th>1</th>
<td>37</td>
<td>130</td>
<td>250</td>
<td>187</td>
<td>3.5</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>41</td>
<td>130</td>
<td>204</td>
<td>172</td>
<td>1.4</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>