CTG fetal health classification: dataset, test set, training set, and implementation code

4 files in total

xlsx: 2

py: 1

csv: 1

Uploaded 2023-11-05 15:40:53, 91KB ZIP

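Machine learning, which uses statistical methods to let computer systems improve from experience, is a major area of the IT industry. This resource applies it to a fetal health classification problem based on CTG (cardiotocography). CTG is a medical monitoring technique that records fetal heart rate and uterine contractions and is used to assess fetal well-being. The data is split into a training set and a test set, which is the standard arrangement for building a machine learning model.

`train.csv` most likely holds the data used to train the model. A dataset of this kind typically contains several features, such as fetal heart rate variation, uterine contraction frequency, and fetal movement, plus a target variable, which may be binary (for example, normal or abnormal) or more fine-grained. In the code preview the target column is `fetal_health`, and predictions are counted for classes 1, 2, and 3. Before training, the features usually need preprocessing, such as filling in missing values, standardizing numeric values, and encoding categorical variables. Part of the data (the training set) is used to fit the model, while another part (the validation set) is held back for tuning model parameters and detecting overfitting.

`new_baby.py` is most likely the Python script that implements this process. Such a script would typically import Pandas for data handling, NumPy for numerical computation, and Scikit-learn for building and training the model. The choice of model depends on the problem: candidates include logistic regression, support vector machines, random forests, and neural networks. After training, cross-validation can be used to evaluate performance, and grid search or random search can be used to tune hyperparameters.

`test.xlsx` is probably the test set, used to check how well the trained model generalizes. It should consist of data the model has not seen during training, so that it approximates the model's behavior in real use. Evaluation metrics on the test data may include accuracy, precision, recall, and F1 score.

`predict.xlsx` probably contains the model's predictions. Comparing them with the actual outcomes allows further analysis of model performance, and the file could also serve as a reference for doctors or other medical professionals when making decisions.

The project covers data preprocessing, model selection, training, validation, and testing, and is a typical application of machine learning in medicine. The goal is a system that predicts fetal health from CTG signals, with the aim of improving the efficiency and quality of medical care. Work of this kind requires a solid understanding of machine learning algorithms, data processing techniques, and the medical background, and reflects how closely IT and medicine are now linked.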
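The preprocessing, validation split, cross-validation, and grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `new_baby.py`: the data is synthetic (generated with `make_classification` as a stand-in for `train.csv`), and the `svc__C` grid and the 25% validation split are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for train.csv (assumption): 21 numeric features, labels 1-3.
X, y = make_classification(n_samples=300, n_features=21, n_informative=8,
                           n_classes=3, random_state=0)
y = y + 1  # shift labels from 0-2 to 1-3

# Hold out a validation set, keeping class proportions the same in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Preprocessing lives inside the pipeline so it is refit on every CV fold.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize features
    ("svc", SVC(kernel="linear")),
])

# Grid search over a small, hypothetical set of C values with 5-fold CV.
search = GridSearchCV(
    pipeline,
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_c = search.best_params_["svc__C"]
val_accuracy = search.score(X_val, y_val)
print("best C:", best_c)
print("cross-validated accuracy: %.3f" % search.best_score_)
print("validation accuracy: %.3f" % val_accuracy)
```

Keeping the imputer and scaler inside the `Pipeline` means they are fitted only on the training portion of each fold, so no statistics leak from the held-out rows into the model being evaluated.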

CTG.zip (4 files)

new_baby.py 11KB

predict.xlsx 7KB

test.xlsx 48KB

train.csv 133KB

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# xlwt, SelectFromModel, and cross_val_score are imported by the script
# but are not used in the portion shown in this preview.
import xlwt
from sklearn import svm
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

# Training data
data_train = pd.read_csv("E:\\mltest1\\train.csv")
# Test data
data_test = pd.read_excel("E:\\mltest1\\test.xlsx")

# Font settings that allow Chinese text in plot labels
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# The 21 feature columns
FEATURES = [
    'baseline value', 'accelerations', 'fetal_movement',
    'uterine_contractions', 'light_decelerations', 'severe_decelerations',
    'prolongued_decelerations', 'abnormal_short_term_variability',
    'mean_value_of_short_term_variability',
    'percentage_of_time_with_abnormal_long_term_variability',
    'mean_value_of_long_term_variability', 'histogram_width',
    'histogram_min', 'histogram_max', 'histogram_number_of_peaks',
    'histogram_number_of_zeroes', 'histogram_mode', 'histogram_mean',
    'histogram_median', 'histogram_variance', 'histogram_tendency',
]

# (column, axis label) pairs for the first figure, a 2 x 4 grid
HIST_PANELS_1 = [
    ('baseline value', 'Baseline value'),
    ('accelerations', 'Accelerations'),
    ('fetal_movement', 'Fetal movement'),
    ('uterine_contractions', 'Uterine contractions'),
    ('light_decelerations', 'Light decelerations'),
    ('severe_decelerations', 'Severe decelerations'),
    ('prolongued_decelerations', 'Prolonged decelerations'),
    ('abnormal_short_term_variability', 'Abnormal short-term variability'),
]

# (column, axis label) pairs for the second figure, a 3 x 4 grid
HIST_PANELS_2 = [
    ('mean_value_of_short_term_variability', 'Mean short-term variability'),
    ('percentage_of_time_with_abnormal_long_term_variability',
     'Percentage of time with abnormal long-term variability'),
    ('mean_value_of_long_term_variability', 'Mean long-term variability'),
    ('histogram_width', 'Histogram width'),
    ('histogram_min', 'Histogram minimum'),
    ('histogram_max', 'Histogram maximum'),
    ('histogram_number_of_peaks', 'Histogram number of peaks'),
    ('histogram_number_of_zeroes', 'Histogram number of zeroes'),
    ('histogram_mode', 'Histogram mode'),
    ('histogram_mean', 'Histogram mean'),
    ('histogram_median', 'Histogram median'),
    ('histogram_variance', 'Histogram variance'),
]


def explore():
    """Print column info, summary statistics, and value counts."""
    # info() prints its report itself and returns None, so it is not wrapped in print()
    data_train.info()
    data_test.info()
    # Basic statistics for each column
    print(data_train.describe())
    print(data_test.describe())
    # Distribution of the fetal health classes
    print(data_train['fetal_health'].value_counts())
    # Counts of each histogram tendency value
    print(data_train['histogram_tendency'].value_counts())


def plot_histograms(panels, grid_shape):
    """Draw one histogram per (column, label) pair on its own figure."""
    # A new figure per grid keeps the 2 x 4 and 3 x 4 layouts from overlapping
    plt.figure()
    n_cols = grid_shape[1]
    for i, (column, label) in enumerate(panels):
        plt.subplot2grid(grid_shape, divmod(i, n_cols))
        data_train[column].hist()
        plt.ylabel('Count')
        plt.xlabel(label)
        plt.title(label + ' distribution')
    plt.show()


def plot_correlations(frame):
    """Print the feature correlation matrix and show it as a heatmap."""
    cor = frame.corr()
    print(cor)
    sns.heatmap(cor)
    plt.show()


def plot_pca(frame):
    """Standardize the features, reduce them to 2 components, and plot them."""
    scaled_data = StandardScaler().fit_transform(frame[FEATURES])
    pca = PCA(n_components=2)  # keep 2 principal components
    transformed_data = pca.fit_transform(scaled_data)
    plt.scatter(transformed_data[:, 0], transformed_data[:, 1])
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.show()


def train_svm():
    """Fit a linear SVM on the first 1000 training rows and predict the test set."""
    train_array = np.asarray(data_train)
    # Column 0 is skipped, columns 1-21 hold the features, column 22 holds the label
    x_train1, y_train1 = train_array[:1000, 1:22], train_array[:1000, 22]
    # The remaining training rows act as a held-out set
    x_test1, y_test1 = train_array[1000:, 1:22], train_array[1000:, 22]
    x_test = np.asarray(data_test)[:, 1:22]

    # SVM classifier settings
    clf = svm.SVC(C=1, kernel='linear', decision_function_shape='ovr')
    clf.fit(x_train1, y_train1.ravel())

    # Accuracy on the fitted rows and on the held-out rows
    print('training prediction:%.3f' % clf.score(x_train1, y_train1))
    print('training_test prediction:%.3f' % clf.score(x_test1, y_test1))

    # Predict the test set once, then count the predictions per class
    test_predict = clf.predict(x_test)
    print(test_predict)
    for label in (1, 2, 3):
        print((test_predict == label).sum())
    return clf, test_predict


df = data_test[FEATURES]

# The exploratory steps are disabled in the script; uncomment to run them.
# explore()
# plot_histograms(HIST_PANELS_1, (2, 4))
# plot_histograms(HIST_PANELS_2, (3, 4))
# plot_correlations(df)
# plot_pca(df)

# In the page preview this step also sits inside an unterminated ''' block.
train_svm()
# The page preview of new_baby.py ends here.
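The preview above reports accuracy and per-class prediction counts. When classes are imbalanced, per-class precision, recall, and F1 score give a fuller picture of the kind the description mentions. The sketch below computes them in plain Python. The function name `per_class_metrics` and the two label lists are hypothetical, made up for illustration, and are not results from this dataset.

```python
def per_class_metrics(y_true, y_pred, labels=(1, 2, 3)):
    """Return {label: (precision, recall, f1)} for two equal-length label sequences."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        # True positives, false positives, and false negatives for this class
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels, made up for illustration only
y_true = [1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 1, 3, 3]

for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print("class %d: precision=%.2f recall=%.2f f1=%.2f" % (label, p, r, f1))
```

With real data, the held-out labels and the model's predictions for the same rows would take the place of the two lists. Scikit-learn's `classification_report` in `sklearn.metrics` reports the same quantities.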
