Data Visualization Assignment - R
Name: Abhishek Dubey
Student ID: D20123718
Full time: MSc Data Science 2020-21
Class: TU59 Full Time
Date: 15/12/2020
Title: Analysis of Suicide Trends Globally
Introduction:
Suicide is the act of intentionally causing one's own death, and there are many reasons behind it. This report does not discuss those reasons; instead, it examines how the number of cases has changed over time.
According to published estimates, suicide caused about 828,000 deaths worldwide in 2015, up from about 712,000 in 1990.
This makes suicide the 10th leading cause of death worldwide.
Using data from 1985 to 2016, we examine suicide rates for every country and then look at how a country's wealth relates to its suicide cases.
My Goal
My goal is simply to raise awareness of these cases through the visualizations I prepared, so that more lives can be saved.
STOP SUICIDE
Problem:
Suicide has long been a major problem, and governments everywhere are taking steps to control it. In this analysis we examine the global trend in suicide cases from 1985 to 2016.
We use country-level suicide data from the World Health Organization, together with per-capita GDP data from the World Bank, to analyze how a country's per-capita income relates to its suicide cases.
Which rich countries show the highest suicide rates to date? This leads to further questions, such as why suicide rates are so high in some rich countries.
What is the global trend of suicide cases each year from 1985 to 2015?
Description of intended audience:
Governments can use these visualizations to see where their country stands globally and to compare suicide cases between countries, so that they can implement awareness programs to reduce these cases.
Health ministries can use these insights to set up consultation programs, similar to those organized in other countries, to reduce the suicide rate, for example by offering one-to-one consultations with a psychologist to people who feel stressed. According to one study, 99% of suicide cases involve some kind of stress, and many governments run consultation programs that help people talk openly; these programs have shown good results.
Research students can use this study and its visualizations as a basis for developing specific techniques to prevent suicide.
The general public can use these visualizations to help friends, relatives and family members by offering them extra support and care.
Dataset
I am using two datasets from public sources:
Dataset 1:
Contains suicide cases by country for every year from 1985 to 2016.
This dataset is taken from the World Health Organization.
Reference:
World Health Organization. (2018). Suicide prevention.
Retrieved from http://www.who.int/mental_health/suicide-prevention/en/
Dataset 2:
Contains per-capita income data for each country from 1985 to 2016.
This data helps identify which countries are rich and which are poor.
This dataset is taken from the World Bank.
Reference:
World Bank. (2018). World development indicators: GDP (current US$) by country, 1985 to 2016.
Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#
PRE-PROCESSING, CLEANING AND WRANGLING OF DATA
1. Both datasets are uploaded to my GitHub repository, so they can be read directly from anywhere; there is no need to import them from a local computer because the data is already in the cloud.
library(readr)  # provides read_csv()
Country_df = data related to each country with per-capita income from 1985-2016
Country_df <- read_csv("https://raw.githubusercontent.com/Abhidubey96/Analysis-of-Suicide-Trends-Globally-in-R/main/Country%20Data.csv")
Suicide_df = data related to suicide cases by each country from 1985-2016
Suicide_df <- read_csv("https://raw.githubusercontent.com/Abhidubey96/Analysis-of-Suicide-Trends-Globally-in-R/main/Suicide%20Data%20by%20Country.csv")
2. After importing them into RStudio, we combine them into a single data frame.
df <- cbind(Suicide_df, Country_df)  # cbind() pairs rows by position, so both files must list country-years in the same order
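Because cbind() pairs rows purely by position, a key-based join is a more defensive way to combine the two files. The sketch below uses base R merge() on made-up toy data. It assumes both files share `country` and `year` columns and that the country table should hold one row per country-year (hence the unique() call); these column names are assumptions, not taken from the CSV files.

```r
# Made-up stand-ins for Suicide_df and Country_df (assumed column names)
suicide_part <- data.frame(
  country = c("A", "A", "B"),
  year = c(1987, 1988, 1987),
  suicides_no = c(20, 15, 300),
  population = c(300000, 310000, 7000000)
)
country_part <- data.frame(
  country = c("B", "A", "A", "A"),  # different row order, plus one duplicated row
  year = c(1987, 1988, 1987, 1987),
  gdp_per_capita = c(20000, 800, 790, 790)
)

# Keep one row per country-year so the join cannot multiply rows
country_part <- unique(country_part)

# Join on the assumed keys; all.x = TRUE keeps suicide rows without a GDP match
merged <- merge(suicide_part, country_part, by = c("country", "year"), all.x = TRUE)

# Rows with no GDP match would show up as NA and can be inspected here
unmatched <- merged[is.na(merged$gdp_per_capita), ]
```

If the real Country_df repeats the GDP values on every age/sex row, unique() collapses those repeats before the join; if the values differ within a country-year, that is worth investigating before merging.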
3. Remove the variable “HDI per Year” from the data frame, since 75% of its values are missing.
4. Remove the variable “suicides/100k pop” from the data frame because its precomputed values are wrong; we recalculate this rate later and use it in our visualizations.
5. Rename some variables for our analysis, such as “gdp_per_year”, “gdp_per_capita” and “country-year”.
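Steps 3 to 5 can be sketched in base R as follows. The raw column names (`HDI for year`, `suicides/100k pop`, `gdp_for_year ($)`, `gdp_per_capita ($)`, `country-year`) and the new name `country_year` are assumptions for illustration, and all values are made up; check names(df) on the real data first.

```r
# Made-up rows using assumed raw column names
raw <- data.frame(
  country = c("A", "A", "A", "A"),
  year = c(1987, 1987, 1988, 1988),
  `suicides/100k pop` = c(6.7, 5.2, 4.8, 4.6),
  `HDI for year` = c(NA, NA, NA, 0.62),
  `gdp_for_year ($)` = c(2.1e9, 2.1e9, 2.2e9, 2.2e9),
  `gdp_per_capita ($)` = c(790, 790, 800, 800),
  `country-year` = c("A1987", "A1987", "A1988", "A1988"),
  check.names = FALSE  # keep the non-syntactic names as given
)

# Share of missing values per column (the basis for dropping HDI in step 3)
missing_share <- colMeans(is.na(raw))

# Steps 3 and 4: drop the mostly-missing HDI column and the precomputed rate
drop_cols <- c("HDI for year", "suicides/100k pop")
clean <- raw[, !(names(raw) %in% drop_cols), drop = FALSE]

# Step 5: map old names to new, syntactic ones (hypothetical targets)
rename_map <- c(
  "gdp_for_year ($)" = "gdp_per_year",
  "gdp_per_capita ($)" = "gdp_per_capita",
  "country-year" = "country_year"
)
to_rename <- names(clean) %in% names(rename_map)
names(clean)[to_rename] <- unname(rename_map[names(clean)[to_rename]])
```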
Extensive Cleaning of Data:
6. Each country should have 12 observations per year (6 age groups × 2 genders). The year 2016 has fewer entries than that, so we remove 2016 from the analysis because its data is too sparse.
7. Our final data frame needs a continent field, so we use the countrycode library to impute the continent from the country name.
8. Some variables have the wrong data type, so we convert age to an ordinal variable using the factor function.
9. Similarly, the variable generation has the wrong data type, so we convert it to an ordinal variable using the factor function.
** For the code, please see the attached .Rmd file.
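Steps 6, 8 and 9 can be sketched in base R on made-up data. The age and generation labels below are assumptions about how the dataset spells them (check unique(df$age) and unique(df$generation)), and the generation order, oldest to youngest, is likewise an assumption.

```r
# Assumed age labels, youngest to oldest
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

# Made-up data: country A has all 12 age/sex rows in 2015 but only 4 in 2016
grid <- expand.grid(sex = c("female", "male"), age = age_levels,
                    stringsAsFactors = FALSE)
toy <- rbind(
  data.frame(country = "A", year = 2015, grid, stringsAsFactors = FALSE),
  data.frame(country = "A", year = 2016, grid[1:4, ], stringsAsFactors = FALSE)
)

# Step 6: count rows in each country-year; a complete one has 6 ages x 2 sexes = 12
toy$n_rows <- ave(rep(1, nrow(toy)), toy$country, toy$year, FUN = length)
short <- unique(toy[toy$n_rows < 12, c("country", "year")])

# Following the write-up, drop 2016 entirely
cleaned <- toy[toy$year != 2016, ]

# Step 8: age as an ordered (ordinal) factor
cleaned$age <- factor(cleaned$age, levels = age_levels, ordered = TRUE)

# Step 9: the same idea for generation (assumed labels and order)
gen_levels <- c("G.I. Generation", "Silent", "Boomers",
                "Generation X", "Millennials", "Generation Z")
gen_example <- factor(c("Boomers", "Generation Z", "Silent"),
                      levels = gen_levels, ordered = TRUE)
```

An ordered factor lets plots and comparisons follow the natural order of the groups instead of alphabetical order.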
10. Calculate the global suicide rate (suicides per 100k population) over 1985 to 2015
Global <- (sum(as.numeric(df$suicides_no)) / sum(as.numeric(df$population))) * 100000
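The formula above yields one pooled number for the whole period. To see the trend over time, the same ratio can be computed per year. The sketch below uses base R aggregate() on made-up rows; the column names follow the formula above. The as.numeric() calls in the original formula matter: summing a large integer column can overflow R's integer type and return NA.

```r
# Made-up rows: two countries over two years
d <- data.frame(
  country = c("A", "B", "A", "B"),
  year = c(1985, 1985, 1986, 1986),
  suicides_no = c(100, 300, 150, 300),
  population = c(1e6, 2e6, 1e6, 2e6)
)

# One pooled rate for the whole period, same formula as above
global_rate <- sum(as.numeric(d$suicides_no)) / sum(as.numeric(d$population)) * 100000

# The same pooled ratio, computed separately for every year
by_year <- aggregate(cbind(suicides_no, population) ~ year, data = d, FUN = sum)
by_year$suicides_per_100k <- by_year$suicides_no / by_year$population * 100000
```

Dividing summed deaths by summed population weights each country by its population; averaging each country's own rate instead would give small countries the same weight as large ones.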
VISUALIZATION 1
What does a world heat map of suicides per 100k population look like across countries?
Which countries and continents show the highest rates?
To answer this, we first draw a plain world map of total suicide cases.
Iteration 1
This graph is confusing: it is hard to relate each country to its suicide cases.
Countries that are not present in the dataset have no colour, which makes the map hard for the audience to analyze.
We also need to normalize by population, i.e., calculate suicide cases per 100k population.
Iteration 2:
Calculating suicide per 100k population
Variable “suicide_per_100k” (for each country) = (sum(as.numeric(suicides_no)) / sum(as.numeric(population))) * 100000
Countries missing from the data are coloured gray:
missingCountryCol="gray"
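The per-country input for the map can be sketched in base R. The `missingCountryCol` setting here (and `mapRegion` further down) match arguments of mapCountryData() in the rworldmap package, which the map appears to use; the sketch stops at the data frame the map would be drawn from. All values are made up, and `map_countries` is a hypothetical stand-in for the map's own country list.

```r
# Made-up rows for two countries over two years
d <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(1985, 1986, 1985, 1986),
  suicides_no = c(100, 150, 30, 20),
  population = c(1e6, 1e6, 5e5, 5e5)
)

# suicide_per_100k for each country, pooled over all years
per_country <- aggregate(cbind(suicides_no, population) ~ country, data = d, FUN = sum)
per_country$suicide_per_100k <- per_country$suicides_no / per_country$population * 100000
per_country <- per_country[order(-per_country$suicide_per_100k), ]

# Countries on the base map but absent from the data get the "missing" colour
map_countries <- c("A", "B", "C")  # hypothetical map country list
missing_countries <- setdiff(map_countries, per_country$country)
```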
The map colours each country from white to dark red according to its suicides per 100k population.
The scale runs from 0 to 45:
0 is the lowest number of suicides per 100k population.
45 is the highest number of suicides per 100k population.
**Countries shown in gray have missing data and are not part of this analysis.
Some countries in Asia and Africa do not have sufficient data for analysis.
Blue represents the ocean.
Insights from Map:
1. Russia and Lithuania show the highest suicides per 100k population from 1985 to 2015, with values close to the top of the scale (45).
2. Based on the available data, the majority of cases appear to come from Europe. However, because data from some continents is insufficient, this conclusion could change if more data were available.
Code for Map
As we can see, most of the cases come from Europe and North America.
Visualizing suicides per 100k population in these regions:
mapRegion = "north america"
mapRegion = "europe"
VISUALIZATION 2:
How do age and generation relate to suicides per 100k population over the years 1985 to 2015?
Is it true that suicide cases among newer generations are growing rapidly worldwide?
To answer these questions, we follow the iterations below.
Iteration 1
We analyze the age variable against suicides per 100k population to identify which age group shows the most cases.
This shows that the 75+ years age group has the highest suicide rate, while the 5-14 years age group has the lowest.
We now have the relationship between age and suicides per 100k population globally.
Next, let's find the relationship between generation and suicide cases.
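The age comparison can be sketched in base R: pool deaths and population per age group, then convert to a rate. The numbers are made up to show why the per-100k rate matters: here the 75+ group has fewer deaths than middle-aged groups but a much smaller population, so it still has the highest rate. The age labels are assumptions.

```r
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

# Made-up counts for two countries; 75+ has a much smaller population
d <- data.frame(
  age = factor(rep(age_levels, 2), levels = age_levels, ordered = TRUE),
  suicides_no = c(1, 10, 12, 15, 14, 8, 2, 12, 13, 16, 15, 9),
  population = rep(c(1e6, 1e6, 1e6, 1e6, 1e6, 1e5), 2)
)

# Pool deaths and population per age group, then convert to a rate
by_age <- aggregate(cbind(suicides_no, population) ~ age, data = d, FUN = sum)
by_age$suicides_per_100k <- by_age$suicides_no / by_age$population * 100000

highest <- as.character(by_age$age[which.max(by_age$suicides_per_100k)])
lowest <- as.character(by_age$age[which.min(by_age$suicides_per_100k)])
```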
Iteration 2:
Let's analyze the relationship between generation and suicides per 100k population.
For reference, the generation groups correspond to these birth years:
Generation Z: Born 1996 onward
Millennials: Born 1977 – 1995
Generation X: Born 1965 – 1976
Boomers: Born 1946 – 1964
Silent: Born 1928 – 1945
G.I. Generation: Born before 1928
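The generation comparison, and the question of whether rates among newer generations are rising, can be sketched the same way: pool deaths and population per generation and year, then compare one generation's rate across years. The generation labels, their order and all numbers below are assumptions for illustration.

```r
gen_levels <- c("G.I. Generation", "Silent", "Boomers",
                "Generation X", "Millennials", "Generation Z")

# Made-up counts for two generations observed in two years
d <- data.frame(
  year = c(2000, 2000, 2010, 2010),
  generation = factor(c("Generation X", "Millennials", "Generation X", "Millennials"),
                      levels = gen_levels, ordered = TRUE),
  suicides_no = c(50, 20, 45, 40),
  population = c(1e6, 1e6, 1e6, 1e6)
)

# Rate per 100k for every generation-year combination present in the data
trend <- aggregate(cbind(suicides_no, population) ~ generation + year, data = d, FUN = sum)
trend$suicides_per_100k <- trend$suicides_no / trend$population * 100000

# Change in the Millennials' rate between the first and last year observed
mill <- trend[trend$generation == "Millennials", ]
mill <- mill[order(mill$year), ]
mill_change <- tail(mill$suicides_per_100k, 1) - head(mill$suicides_per_100k, 1)
```

Because every generation gets older over the period, a change within one generation mixes an age effect with a cohort effect, so it is worth reading this alongside the age comparison.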
Ins