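Predicting Depression: a project that uses machine learning to predict depression from health care data published on the CDC NHANES website. A companion dashboard built with Streamlit lets users explore the data used in the project. The main analysis workflow was written in Python in Jupyter Notebook, and Visual Studio Code was used to write the custom functions and build the dashboard.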

42 files in total

ipynb: 19

csv: 9

pyc: 3

data-science

python3

healthcare

machinelearning

streamlit-dashboard

Uploaded 2021-02-23 08:04:27 · 105.86 MB ZIP

Predicting-Depression-main.zip (42 files)

Predicting-Depression-main

first_notebook.ipynb 2.52MB

Images

Tuned SGD Linear Model.png 28KB

Depression.jpg 1.76MB

Most Important Features.png 55KB

Dashboard

depression_app.py 28KB

app_cleaning.ipynb 777KB

FullData.csv 60.03MB

.ipynb_checkpoints

app_cleaning-checkpoint.ipynb 777KB

plotly_figures-checkpoint.ipynb 353KB

project_functions

__pycache__

oi.cpython-36.pyc 417B

custom_functions.cpython-36.pyc 190B

__init__.cpython-36.pyc 12KB

__init__.py 13KB

scratch

.ipynb_checkpoints

first_notebook-Copy1-checkpoint.ipynb 1.65MB

main_notebook-checkpoint.ipynb 18.36MB

first_notebook_copy-checkpoint.ipynb 1.85MB

main_notebook.ipynb 18.36MB

second_notebook-copy.ipynb 16.32MB

first_notebook_copy.ipynb 1.85MB

PredictingDepressionSlides.pdf 834KB

second_notebook.ipynb 16.63MB

requirements.txt 47B

.gitignore 6B

CSVFiles

yTrain.csv 228KB

XTrainFinal.csv 92.27MB

yTrainResample.csv 148KB

XTestFinal.csv 23.08MB

XTrainResample.csv 39.14MB

yTest.csv 57KB

FullData.csv 60.03MB

.ipynb_checkpoints

first_notebook-Copy1-checkpoint.ipynb 1.65MB

plotly_figures-checkpoint.ipynb 353KB

first_notebook-checkpoint.ipynb 2.52MB

Covid-checkpoint.ipynb 17KB

second_notebook-checkpoint.ipynb 16.63MB

main_notebook-checkpoint.ipynb 18.36MB

main_notebook1-checkpoint.ipynb 231KB

old_notebook-checkpoint.ipynb 3.78MB

README.md 14KB

StreamlitData.csv 65.24MB

.gitattributes 66B

stethoscope.jpg 1MB

# Predicting Depression Using Health Care Data

**Author**: Vivienne DiFrancesco

A companion dashboard for exploring the data used in this project can be found [here](https://share.streamlit.io/heythatsviv/predicting-depression/main/Dashboard/depression_app.py).

The contents of this repository are an analysis of using machine learning models to predict depression in people using health care data. The analysis is documented in detail in the hope of making the work accessible and replicable.

![Depression](https://raw.githubusercontent.com/HeyThatsViv/Predicting-Depression-Using-Health-Care-Data/main/Images/Depression.jpg)

## Repository Structure

- README.md: The top-level README for reviewers of this project
- first_notebook.ipynb: Beginning of the narrative documentation of the analysis in a Jupyter notebook, up through the data cleaning stages
- second_notebook.ipynb: Continuation of the narrative documentation, beginning after data cleaning at the explore stage of the project
- PredictingDepressionSlides.pdf: PDF version of the project presentation slides
- project_functions folder: Contains the custom functions written for use in first_notebook and second_notebook
- Dashboard folder: Contains the files for creating the companion dashboard for this project

## Abstract

Millions of people globally suffer from depression, and it is a debilitating condition. At best it makes it difficult for people to live their lives normally and happily, and at worst it leads to death by suicide. Primary care doctors overwhelmingly find that they need to treat mental health conditions such as depression without any particular training in how to handle such cases. There is evidence that an integrated approach, in which physicians regularly screen patients for mental health disorders and work together with psychologists and other mental health professionals to treat patients, leads to reduced costs and better patient outcomes.
However, this approach can require a lot of buy-in from many individuals and extra training, and it is often not logistically feasible. Using data from the CDC National Health and Nutrition Examination Survey, machine learning was applied to predict which patients may have depression based on information that could typically be found in a medical file. These predictions could be used to put patients in touch with experienced mental health professionals sooner and more easily. The results show that 71% of those who have depression and 80% of those who don't have depression can be correctly identified. Though more work is needed to create a more accurate model, this is a proof of concept that the prediction task is realistic. Better results might be obtained by adding more patient information to the data or by testing more types of models.

## Introduction

According to the World Health Organization, more than 264 million people globally have depression. Many suicides each year are caused by depression, and suicide is among the leading causes of death for young people in particular.[1](https://www.who.int/news-room/fact-sheets/detail/depression) The National Institute of Mental Health found that the prevalence of a major depressive episode among U.S. adults in 2017 was 7.1%, with young adults being the most affected.[2](https://www.nimh.nih.gov/health/statistics/major-depression.shtml)

The American Psychological Association identified that primary care physicians are often asked to diagnose mental disorders such as depression without adequate training in how to handle such treatments. According to their numbers, 70% of primary care visits are because of patients’ psychological problems, more than 80% of patients who have symptoms with no diagnosis receive psychological treatment from a physician, and only 10% follow up with a mental health professional. Patients are not getting the care they desperately need, as 70% of individuals with depression go undiagnosed.
Among people who commit suicide, 90% had a mental disorder and 40% had visited their doctor within the last month.[3](https://www.apa.org/health/briefs/primary-care.pdf)

In a study published in JAMA, doctors compared patient outcomes, cost of care, and other factors between patients who were provided more overt diagnosis and treatment for mental health at standard doctor appointments and patients who were not. They found that for patients who received mental health intervention, costs went down, health care services were better utilized, patient outcomes improved, primary care doctor visits declined, treatment interventions were started earlier, and hospital and emergency care visits declined.[4](https://jamanetwork.com/journals/jama/fullarticle/2545685)

## Goal

The goal of this project is to predict depression from data that would typically be in a patient’s medical record. Many clinics or doctors may find it impossible to offer integrated mental health services like those in the study cited above. Standard services in which patients are constantly screened for mental health disorders and treatment is tightly integrated across teams of physicians and psychological professionals can be expensive, require a lot of training, require participation from many individual doctors who may already feel overwhelmed, and may not be possible in certain areas because of logistical factors. Using machine learning and data that may already be in a patient’s medical file, the goal is to predict who may have depression in a way that requires very little participation from doctors and has lower time and money costs. Patients who are predicted to have depression could potentially be referred straight to mental health professionals who are in their area or who accept their health care coverage.
The patient’s file could also be flagged to alert the medical staff at the patient's next physician appointment, prompting doctors to start the conversation. At the very least, information and resources could be sent to patients directly to encourage them to take action on their own behalf.

## Data

The data for this project is from the Centers for Disease Control and Prevention National Health and Nutrition Examination Survey. The survey collects a wide array of health data from a sample of the American population each year, and the data is released every two years. The data can be found at this website: https://wwwn.cdc.gov/nchs/nhanes/default.aspx. For this project, data was taken from the years 2005 to 2018 and comprises 36,259 entries in total, all U.S. adults. Only data that was consistent across years was used, and an effort was made to include only data that would reasonably be found in a patient's medical file. Using as little data as possible while still making accurate predictions is desirable: it would catch more people who may not have a very deep medical history, and it puts less burden on providers to capture so much information.

## Approach

The target was calculated using the PHQ-9 depression screening tool, which was asked of all participants in the NHANES data. A study showed that this screening tool has a specificity and sensitivity of 88% for major depression at a threshold score of 10 or more.[5](https://pubmed.ncbi.nlm.nih.gov/11556941/) People were divided into “depressed” and “not depressed” categories based on their score on the screening tool, with a score of 10 or more being “depressed”.

The approach for this project was to create many different model types to see which performs best and to compare and contrast them. The modeling effort started with simpler models and moved to more complex models.
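
The target rule described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function names `phq9_total` and `label_depressed` are hypothetical, and treating any answer outside the standard 0 to 3 item range (for example a refusal code) as missing is an assumption, since the README does not say how incomplete questionnaires were handled.

```python
from typing import Optional, Sequence

PHQ9_THRESHOLD = 10  # cutoff stated in the README: 10 or more is "depressed"
VALID_ITEM_SCORES = {0, 1, 2, 3}  # standard PHQ-9 item range


def phq9_total(item_scores: Sequence[Optional[int]]) -> Optional[int]:
    """Sum the nine PHQ-9 item scores.

    Returns None when any item is missing or outside 0..3
    (assumed handling; the source does not specify it).
    """
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in VALID_ITEM_SCORES for score in item_scores):
        return None
    return sum(item_scores)


def label_depressed(item_scores: Sequence[Optional[int]]) -> Optional[bool]:
    """Return True for a total of 10 or more, False below 10, None if unscorable."""
    total = phq9_total(item_scores)
    if total is None:
        return None
    return total >= PHQ9_THRESHOLD
```

For example, nine answers of 1 give a total of 9 and the label "not depressed", while raising any one of those answers to 2 reaches the threshold of 10.
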
The OSEMiN process is the overarching structure of this project.

## Methods

The way the data was preprocessed with feature engi
