Data Visualization Assignment- R
Name: Abhishek Dubey
Student ID: D20123718
Full time: MSc Data Science 2020-21
Class: TU59 Full Time
Date: 15/12/2020
Analysis of Suicide Trends Globally
Suicide is the act of intentionally causing one's own death, and there are many reasons behind it. This report does not discuss those reasons; it looks at how the number of cases has changed over time.
According to the sources, about 828,000 cases were reported worldwide in 2015, up from about 712,000 cases in 1990.
This makes suicide the 10th leading cause of death worldwide.
We will look at suicide rates for every country from 1985 to 2016, and then at how a country's wealth relates to its suicide cases.
My Goal
My only goal is to raise awareness of these cases through the visualizations I have prepared, so that we can save more lives.
Suicide has been a major problem for a long time, and governments are taking steps to control it. In this analysis we look at the global trend of suicide cases from 1985 to 2016.
We use per-country suicide data from the World Health Organization, together with per-capita income data from the World Bank, to analyze how a country's per-capita income relates to its suicide cases.
Which rich countries show the highest suicide rates to date? This leads to further questions, such as why suicide rates are so high in rich countries.
What is the global trend of suicide cases each year from 1985 to 2015?
Description of intended audience:
Governments can use these visualizations to see where their country stands globally and to compare suicide cases between countries, so that they can implement awareness programs to reduce these cases.
Health ministries can use these insights to implement consultation programs, like those already organized in other countries, to reduce this rate, for example by providing one-to-one consultations with a psychologist to people who feel stressed. According to the study, 99% of suicide cases are due to some kind of stress. Many governments are organizing consultation programs so that people can open up, and these programs show good results.
Research students can use this study and its visualizations to develop specific techniques for reducing these cases.
The general population can look at these visualizations so that they can help their friends, relatives, and family members by providing extra support and care.
I am using two datasets from public sources.
Dataset 1:
Contains data on suicide cases by country for every year from 1985 to 2016.
This dataset is taken from the World Health Organization.
World Health Organization. (2018). Suicide prevention.
Retrieved from
Dataset 2:
Contains data on the per-capita income of each country from 1985 to 2016.
This data helps in analyzing which countries are rich and which are poor.
This dataset is taken from the World Bank.
World Bank. (2018). World development indicators: GDP (current US$) by country:1985 to 2016.
Retrieved from
1. Both datasets are uploaded to my GitHub so that they can be read directly from anywhere. There is no need to import them from a local computer, as the data is already in the cloud.
Country_df = Data related to countries with per capita income from 1985-2016
library(readr)  # read_csv() comes from the readr package
Country_df <- read_csv("")
Suicide_df = Data related to suicide cases by each country from 1985-2016
Suicide_df <- read_csv("")
2. After importing them into RStudio, we merge them into a final data frame.
df <- cbind(Suicide_df, Country_df)
3. Removing the variable “HDI per Year” from the data frame, as 75% of its values are missing. This variable is not needed.
4. Removing the variable “suicides/100k pop” from the data frame, as the calculated field is wrong. We will recalculate it later and use it in our visualizations.
5. Renaming some variables for our statistics, such as “gdp_per_year”, “gdp_per_capita”, and “country-year”.
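Steps 2 to 5 can be sketched in base R as follows. This is only an illustration on made-up toy rows, not the code from the .Rmd file. It assumes both tables share `country` and `year` columns and uses a keyed `merge()` as an alternative to `cbind()`, since `cbind()` pairs rows correctly only when both tables have the same rows in the same order. The original column name `gdp_per_capita ($)` is a hypothetical stand-in.

```r
# Toy stand-ins for the two imported tables (all values are made up).
Suicide_df <- data.frame(
  country = c("A", "A", "B"),
  year = c(1985, 1986, 1985),
  suicides_no = c(10, 12, 5),
  population = c(100000, 110000, 50000),
  `suicides/100k pop` = c(9.9, 10.8, 9.7),
  `HDI per Year` = c(NA, NA, 0.7),
  check.names = FALSE
)
Country_df <- data.frame(
  country = c("B", "A", "A"),
  year = c(1985, 1985, 1986),
  `gdp_per_capita ($)` = c(900, 1500, 1600),  # hypothetical original name
  check.names = FALSE
)

# Step 2 (alternative to cbind): join on shared keys so row order does not matter.
df <- merge(Suicide_df, Country_df, by = c("country", "year"))

# Step 3: measure the share of missing values before dropping the column.
hdi_missing <- mean(is.na(df[["HDI per Year"]]))

# Steps 3 and 4: drop the sparse column and the miscalculated rate column.
df <- df[, !(names(df) %in% c("HDI per Year", "suicides/100k pop"))]

# Step 5: rename a column to a simpler, syntactic name.
names(df)[names(df) == "gdp_per_capita ($)"] <- "gdp_per_capita"
```

Checking `hdi_missing` first makes the reason for dropping the column visible in the script itself.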
Extensive Data Cleaning:
6. In our data frame, every country-year should have 12 observations, one for each combination of 6 age groups and 2 genders. The year 2016 has fewer entries than that, so we remove 2016 from the analysis.
7. Our final data frame needs continent data, so we use the country code library to derive the continent field.
8. Some variables have the wrong data type, so we convert age to an ordinal type using the factor function.
9. Similarly, the variable generation has the wrong data type, so we convert it to an ordinal type using the factor function.
** For the code, please check the attached .Rmd file.
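As a rough illustration of steps 6 to 9 (not the code from the .Rmd file), the sketch below uses made-up toy rows. The age and generation labels are assumptions about how the levels are spelled in the data, and the continent lookup assumes the `countrycode` package, so it only runs when that package is installed.

```r
# Toy rows (made up); the real data has one row per country, year, sex and age group.
df <- data.frame(
  country = c("Ireland", "Ireland", "Japan", "Japan"),
  year = c(2015, 2016, 2015, 2016),
  age = c("5-14 years", "75+ years", "15-24 years", "35-54 years"),
  generation = c("Generation Z", "Silent", "Millennials", "Generation X"),
  stringsAsFactors = FALSE
)

# Step 6: count rows per country and year; a complete group would have 12 rows.
rows_per_group <- table(df$country, df$year)

# Step 6: drop the incomplete year.
df <- df[df$year != 2016, ]

# Step 7: derive the continent from the country name (assumes the countrycode package).
if (requireNamespace("countrycode", quietly = TRUE)) {
  df$continent <- countrycode::countrycode(
    df$country, origin = "country.name", destination = "continent"
  )
}

# Step 8: make age an ordered factor (level labels are assumed).
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")
df$age <- factor(df$age, levels = age_levels, ordered = TRUE)

# Step 9: make generation an ordered factor, oldest first (labels are assumed).
gen_levels <- c("G.I. Generation", "Silent", "Boomers",
                "Generation X", "Millennials", "Generation Z")
df$generation <- factor(df$generation, levels = gen_levels, ordered = TRUE)
```

With ordered factors, plots and comparisons follow the natural order of the groups instead of alphabetical order.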
10. Calculating global suicide rates over time from 1985 to 2015
Global <- (sum(as.numeric(df$suicides_no)) / sum(as.numeric(df$population))) * 100000
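The expression above pools every row into one overall figure. To see the trend over time, the same ratio can be computed within each year. The sketch below shows one way to do that in base R on made-up toy rows; the column names `year`, `suicides_no`, and `population` are assumed to match the data frame.

```r
# Toy rows (values are made up).
df <- data.frame(
  year = c(1985, 1985, 1986, 1986),
  suicides_no = c(10, 30, 20, 40),
  population = c(100000, 300000, 200000, 200000)
)

# Sum cases and population within each year.
yearly <- aggregate(cbind(suicides_no, population) ~ year, data = df, FUN = sum)

# Rate per 100k: ratio of the yearly sums, not an average of row-level rates.
yearly$suicide_per_100k <- yearly$suicides_no / yearly$population * 100000
```

Dividing the summed cases by the summed population weights each group by its size, which a plain mean of per-row rates would not do.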
What does the world heat map of suicides per 100k population look like for each country?
Also, which countries and continents show higher cases?
To answer this, we first make a plain world map of total suicide cases.
Iteration 1
This graph is confusing about the countries and their suicide cases.
Countries that are not present in the dataset show no colour, which makes the map hard for the audience to read.
Also, we need to calculate the suicide cases per 100k population.
Iteration 2:
Calculating suicide per 100k population
Variable “suicide_per_100k” = (sum(as.numeric(suicides_no)) / sum(as.numeric(population))) * 100000
Missing countries are given a gray colour.
The map colours countries from white to dark red according to their suicide cases per 100k population.
The scale runs from 0 to 45.
0 is the lowest number of suicide cases per 100k population.
45 is the highest number of suicide cases per 100k population.
**Some countries are gray, which means their data is missing and they are not part of this analysis.
Some countries in Asia and Africa do not have sufficient data for analysis.
Blue shows the ocean.
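A minimal sketch of this iteration in base R is shown below, on made-up toy rows. The list of map countries and the `fill` column are hypothetical; the actual map drawing is left to whichever mapping package is used.

```r
# Toy rows (values are made up).
df <- data.frame(
  country = c("A", "A", "B"),
  suicides_no = c(20, 25, 3),
  population = c(50000, 50000, 100000)
)

# Suicides per 100k population for each country.
by_country <- aggregate(cbind(suicides_no, population) ~ country, data = df, FUN = sum)
by_country$suicide_per_100k <- by_country$suicides_no / by_country$population * 100000

# Hypothetical list of all countries drawn on the map; "C" has no data.
map_countries <- data.frame(country = c("A", "B", "C"))
map_df <- merge(map_countries, by_country[, c("country", "suicide_per_100k")],
                by = "country", all.x = TRUE)

# White-to-dark-red shades for 0..45, gray for countries without data.
shades <- colorRampPalette(c("white", "darkred"))(46)
shade_index <- pmin(round(map_df$suicide_per_100k), 45) + 1
map_df$fill <- ifelse(is.na(map_df$suicide_per_100k), "gray", shades[shade_index])
```

Keeping countries without data as `NA` after the merge is what lets them be coloured gray rather than silently dropped.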
Insights from Map:
1. Russia and Lithuania show the highest number of suicide cases per 100k population from 1985 to 2015, with values very close to 45 on the scale.
2. Because data is insufficient for some continents, the majority of cases appear to come from Europe. With more data, this statement might no longer hold.
Code for Map
As we can see, most of the cases come from Europe and North America.
Visualizing suicide cases per 100k population in those regions:
MapRegion = “north America” MapRegion = “europe”
How do age and generation relate to suicide cases per 100k population over the years 1985 to 2015?
Is it true that suicide cases among the newer generations are growing rapidly worldwide?
To answer the above questions, we follow the iterations below.
Iteration 1
Analyzing the age variable against suicide cases per 100k population, to identify which age group shows more cases.
This shows that the 75+ years age group has the greatest number of suicide cases, while the 5-14 years age group shows a smaller number of cases.
Now we have the relation between age and suicides per 100k population globally.
Let’s find out the relation between generation and suicide cases.
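The age comparison can be sketched in base R as below. The rows are made-up toy values that only demonstrate the calculation and do not reproduce the actual results; the column names are assumed to match the data frame.

```r
# Toy rows (values are made up and do not reflect the real data).
df <- data.frame(
  age = c("5-14 years", "5-14 years", "75+ years", "75+ years"),
  suicides_no = c(1, 1, 30, 20),
  population = c(100000, 100000, 100000, 150000)
)

# Sum cases and population within each age group, then compute the rate.
by_age <- aggregate(cbind(suicides_no, population) ~ age, data = df, FUN = sum)
by_age$suicide_per_100k <- by_age$suicides_no / by_age$population * 100000

# Sort so the age group with the highest rate comes first.
by_age <- by_age[order(-by_age$suicide_per_100k), ]
```

The same pattern works for generation by replacing `age` with `generation` in the formula.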
Iteration 2:
Let’s analyze the relation between generation and suicides per 100k population.
For reference, the generation groups are defined as:
Gen Z: Born 1996 – TBD
Millennials: Born 1977 – 1995
Generation X: Born 1965 – 1976
Boomers: Born 1946 – 1964
Silent: Born 1928 – 1945
G.I. Generation: Born 1927 and before
