# Predicting Depression Using Health Care Data
**Author**: Vivienne DiFrancesco
<b> A companion dashboard for exploring the data used in this project can be found [here](https://share.streamlit.io/heythatsviv/predicting-depression/main/Dashboard/depression_app.py)</b>
This repository contains an analysis of using machine learning models to predict depression in people from health care data. The analysis is documented in detail in the hope of making the work accessible and replicable.
![Depression](https://raw.githubusercontent.com/HeyThatsViv/Predicting-Depression-Using-Health-Care-Data/main/Images/Depression.jpg)
## Repository Structure
- <b>README.md:</b> The top level README for reviewers of this project
- <b>first_notebook.ipynb:</b> Beginning of the narrative documentation of the analysis in a Jupyter notebook, up through the data cleaning stages
- <b>second_notebook.ipynb:</b> Continuation of the narrative documentation, beginning after data cleaning at the explore stage of the project
- <b>PredictingDepressionSlides.pdf:</b> PDF version of project presentation slides
- <b>project_functions folder:</b> Contains the custom functions written for use in the first_notebook and second_notebook
- <b>Dashboard folder:</b> Folder containing files for creating the companion dashboard for this project
## Abstract
Millions of people globally suffer from depression, and it is a debilitating condition. At best it makes it difficult for people to live their lives normally and happily, and at worst it leads to death by suicide. Primary care doctors increasingly find that they need to treat mental health conditions such as depression without any particular training in how to handle such cases.
There is evidence that an integrated approach where physicians regularly screen patients for mental health disorders and work together with psychologists and other mental health professionals to treat patients leads to reduced costs and better patient outcomes. However, this approach can require a lot of buy-in from many individuals, require extra training, and is often not logistically feasible.
Using data from the CDC National Health and Nutrition Examination Survey, machine learning was applied to predict which patients may have depression based on information that could typically be found in a medical file. These predictions could be used to put patients in touch with experienced mental health professionals sooner and more easily.
The results show that 71% of those who have depression and 80% of those who don't have depression can be correctly identified. Though more work needs to be done to create a more accurate model, this shows proof of concept that this is a realistic prediction task. Better results could be yielded by adding more patient information to the data or testing more types of models.
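The two percentages above correspond to what is usually called sensitivity (the share of people with depression who are correctly identified) and specificity (the share of people without depression who are correctly identified). The following is a minimal sketch of how these two rates can be computed from true and predicted labels. The function name and the toy labels are hypothetical and are not taken from the project notebooks.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = depressed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # depressed, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # depressed, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not depressed, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not depressed, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (made-up labels, not project data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both rates matters here because most people in the data are not depressed, so overall accuracy alone could look high even if few depressed patients were identified.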
## Introduction
According to the World Health Organization, more than 264 million people globally have depression. Many suicides each year are caused by depression, and suicide is among the leading causes of death for young people in particular.[1](https://www.who.int/news-room/fact-sheets/detail/depression) The National Institute of Mental Health found that the prevalence of a major depressive episode among U.S. adults in 2017 was 7.1%, with young adults being the most affected.[2](https://www.nimh.nih.gov/health/statistics/major-depression.shtml)
The American Psychological Association identified that primary care physicians are often asked to diagnose mental disorders such as depression without adequate training in how to handle such treatments. According to their numbers, 70% of primary care visits are because of patients’ psychological problems, more than 80% of patients who have symptoms with no diagnosis receive psychological treatment by a physician, and only 10% follow up with a mental health professional. Patients are not getting the care they desperately need, as 70% of individuals with depression go undiagnosed. Among people who die by suicide, 90% had a mental disorder and 40% had visited their doctor within the last month.[3](https://www.apa.org/health/briefs/primary-care.pdf)
In a study published in JAMA, doctors compared patient outcomes, cost of care, and other factors between patients who were provided more overt diagnosis and treatment for mental health at standard doctor appointments and patients who were not. They found that for patients who received mental health intervention, costs went down, health care services were better utilized, patient outcomes improved, primary care doctor visits declined, treatment interventions were started earlier, and hospital and emergency care visits declined.[4](https://jamanetwork.com/journals/jama/fullarticle/2545685)
## Goal
The goal of this project is to gather data about people that would typically be in a patient’s medical record to predict depression.
Many clinics or doctors may find it impossible to offer integrated mental health services like those in the previously mentioned study. Standard services in which patients are constantly screened for mental health disorders, and in which treatment is tightly integrated across teams of physicians and psychological professionals, can be expensive, require a lot of training, require participation from many individual doctors who may feel too overwhelmed, and may not be possible in certain areas due to various logistical factors. Using machine learning and data that may already be in a patient’s medical file, the goal is to predict who may have depression in a way that requires very little participation from doctors and costs less time and money. Patients who are predicted to have depression could potentially be referred straight to mental health professionals in their area or to those who accept their health care coverage. The patient’s file could also be flagged to alert medical staff at the patient’s next physician appointment of any kind, prompting doctors to start the conversation. At the very least, information and resources could be sent to patients directly to encourage them to take action on their own behalf.
## Data
The data for this project is from the Centers for Disease Control and Prevention National Health and Nutrition Examination Survey (NHANES). The survey collects a vast array of health data from a sample of the American population each year and is released every two years. The data can be found at this website: https://wwwn.cdc.gov/nchs/nhanes/default.aspx.
For this project, data was taken from the years between 2005 and 2018 and comprised 36,259 entries in total, all for U.S. adults. Only data that was consistent across years was used, and there was an effort to include only data that would reasonably be found in a patient's medical file. Using as little data as possible while still making accurate predictions is desirable, because it would catch more people who may not have a very deep medical history and also puts less burden on providers to capture so much information.
## Approach
The target was calculated using the PHQ-9 depression screening tool, which was administered to all participants in the NHANES data. A study showed that this screening tool has a specificity and sensitivity of 88% for major depression at a threshold score of 10 or more.[5](https://pubmed.ncbi.nlm.nih.gov/11556941/) People were divided into “depressed” and “not depressed” categories based on their screening score, with a score of 10 or more being “depressed”.
The approach for this project was to create many different model types to see which performs best and to compare and contrast the different types of models. The modeling effort started with simpler models and moved to more complex ones. The OSEMiN process (Obtain, Scrub, Explore, Model, iNterpret) is the overarching structure of this project.
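As an illustration of the target definition above, here is a minimal sketch of PHQ-9 scoring. The PHQ-9 has nine items, each answered on a 0 to 3 scale, so totals range from 0 to 27. The function names are hypothetical, and the handling of missing or refused answers (rejected here with an error) is an assumption, since this README does not state how the project treated them.

```python
PHQ9_ITEM_COUNT = 9
DEPRESSION_THRESHOLD = 10  # cutoff described above: a score of 10 or more


def phq9_score(answers):
    """Sum nine PHQ-9 item responses, each an integer from 0 to 3."""
    if len(answers) != PHQ9_ITEM_COUNT:
        raise ValueError("PHQ-9 requires exactly nine answers")
    if any(a not in (0, 1, 2, 3) for a in answers):
        # Assumption: anything outside 0-3 (e.g. a refusal code) is rejected.
        raise ValueError("each answer must be 0, 1, 2, or 3")
    return sum(answers)


def label_depression(answers, threshold=DEPRESSION_THRESHOLD):
    """Return 'depressed' if the total score meets the threshold."""
    return "depressed" if phq9_score(answers) >= threshold else "not depressed"


# Two made-up respondents on either side of the cutoff
print(label_depression([1, 1, 1, 1, 1, 1, 1, 1, 1]))  # total 9
print(label_depression([2, 2, 1, 1, 1, 1, 1, 1, 0]))  # total 10
```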
## Methods
The way the data was preprocessed with feature engi