This report is mainly about global suicide trends. It uses two datasets from public sources; one contains suicide case data for each country from 1985 to 2016

Uploaded 2023-10-05 13:51:08, 2.18MB ZIP

Analysis-of-Suicide-Trends-Globally-in-R-main.zip (7 files)

Data Visualization Assignment - R.pdf 1.61MB

Country Data.csv 1.34MB

.gitattributes 24B

Project.Rmd 9KB

Project.nb.html 863KB

Suicide Data by Country.csv 1.27MB

README.md 11KB

Data Visualization Assignment - R

Name: Abhishek Dubey
Student ID: D20123718
Full time: MSc Data Science 2020-21
Class: TU59 Full Time
Date: 15/12/2020

Title: Analysis of Suicide Trends Globally

Introduction:
Suicide is the act of intentionally causing one's own death, and it has many causes. This report does not discuss the reasons; it looks at how the number of cases is changing. According to published sources, 828,000 cases were reported worldwide in 2015, up from about 712,000 in 1990. This makes suicide the 10th leading cause of death worldwide. We will look at suicide rates for every country from 1985 to 2016, and then at the relationship between a country's wealth and its suicide cases.

My Goal
My goal is only to raise awareness of these cases through the visualizations I prepared, so that we can save more lives. STOP SUICIDE

Problem:
Suicide has been a major problem for a long time, and all governments are taking steps to control it. In this analysis we look at the global trend in suicide cases from 1985 to 2016. We use global suicide data per country from the World Health Organization, together with per capita data per country from the World Bank, to analyze how a country's per capita income relates to its suicide cases. Which rich countries show the highest suicide rates to date? This leads to further questions, such as why suicide rates are so high in rich countries, and what the global trend in suicide cases is for each year from 1985 to 2015.

Description of intended audience:
Governments can use these visualizations to check where their country stands in the global figures and to compare suicide cases between countries, so that they can implement awareness programs to reduce these cases.
The health ministry of every country can study these insights and implement consultation programs, like those organized by other countries, to reduce this rate.
Offering one-to-one consultations with a psychologist to people who feel stressed is one such measure. According to the study, 99% of suicide cases are caused by some kind of stress. Many governments are organizing consultation programs so that people feel able to open up, and these programs show the best results.
Research students can use this study and its visualizations to develop specific techniques for reducing these cases.
The general population should look at these visualizations so that they can help their friends, relatives and family members by providing extra support and care.

Dataset
I am using 2 datasets from public resources.

Dataset 1: Contains data on suicide cases by country for every year from 1985 to 2016. This dataset is taken from the World Health Organization.
Reference: World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/

Dataset 2: Contains data on the per capita income of each country from 1985 to 2016. This data helps in analyzing which countries are very rich and which are poor. This dataset is taken from the World Bank.
Reference: World Bank. (2018). World development indicators: GDP (current US$) by country: 1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#

PRE-PROCESSING, CLEANING AND WRANGLING OF DATA
1. Both datasets are uploaded to my GitHub so that they can be used directly from anywhere. There is no need to import them from a local computer, as the data is already in the cloud.
library(readr)  # provides read_csv
Country_df = Data related to country with per capita income from 1985-2016
Country_df <- read_csv("https://raw.githubusercontent.com/Abhidubey96/Analysis-of-Suicide-Trends-Globally-in-R/main/Country%20Data.csv")
Suicide_df = Data related to suicide cases by each country from 1985-2016
Suicide_df <- read_csv("https://raw.githubusercontent.com/Abhidubey96/Analysis-of-Suicide-Trends-Globally-in-R/main/Suicide%20Data%20by%20Country.csv")
2. After importing them in R Studio we merge them into the final data frame.
df <- cbind(Suicide_df, Country_df)
3. Removing the variable "HDI per Year" from the data frame, as 75% of its values are missing. This variable is not needed.
4. Removing the variable "suicides/100k pop" from the data frame, as the calculated field is wrong. We will recalculate it later and use it in our visualizations.
5. Renaming some variables for our statistics, such as "gdp_per_year", "gdp_per_capita" and "country-year".

Extensive Cleaning in Data:
6. Each country should have 12 observations for every year: 6 age groups times 2 genders. The year 2016 has fewer entries than that, so we remove 2016 and do not use it further.
7. The final data frame needs continent data, so we use the countrycode library to fill in the continent field.
8. Some variables have the wrong data type, so we convert age to an ordinal variable using the factor function.
9. Similarly, the variable generation has the wrong data type, so we convert it to an ordinal variable using the factor function.
** For the code, please check the attached .Rmd file.
10. Calculating the global suicide rate over the period 1985 to 2015:
Global <- (sum(as.numeric(df$suicides_no)) / sum(as.numeric(df$population))) * 100000

VISUALIZATION 1
What does the world heat map look like according to suicides per 100k population for each country? Also, which countries and continents show higher case counts?
To answer this, we first make a plain world map with total suicide cases.

Iteration 1
This graph creates confusion about the countries and their suicide cases. Countries that are not present in the dataset show no colour, which makes the map hard for the audience to analyze. Also, we need to calculate the suicide cases per 100k population.
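The report defers the cleaning code to the .Rmd file. As a rough base-R sketch of steps 6, 8 and 10 above (dropping 2016, ordering the age factor, computing the rate per 100k), the following uses a made-up toy data frame. Only `suicides_no` and `population` are names taken from the report; the other column names, the intermediate age levels and all values are assumptions for illustration.

```r
# Toy data (hypothetical values); country, year and age are assumed column names.
df <- data.frame(
  country     = c("A", "A", "B", "B", "A", "B"),
  year        = c(2014, 2015, 2014, 2015, 2016, 2016),
  age         = c("5-14 years", "75+ years", "15-24 years",
                  "35-54 years", "25-34 years", "55-74 years"),
  suicides_no = c(10, 40, 5, 20, 1, 2),
  population  = c(100000, 200000, 50000, 100000, 10000, 20000),
  stringsAsFactors = FALSE
)

# Step 6: drop the incomplete year 2016.
df <- df[df$year != 2016, ]

# Step 8: make age an ordered factor (the four middle levels are assumed).
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")
df$age <- factor(df$age, levels = age_levels, ordered = TRUE)

# Step 10: one global rate per 100k over the whole period.
global_rate <- sum(as.numeric(df$suicides_no)) /
  sum(as.numeric(df$population)) * 100000

# The same ratio computed per year, to see the trend over time.
by_year <- aggregate(cbind(suicides_no, population) ~ year, data = df, FUN = sum)
by_year$rate_per_100k <- by_year$suicides_no / by_year$population * 100000
print(by_year)
```

Summing cases and population first and dividing afterwards weights each country by its population, which differs from averaging per-country rates. Note also that `cbind` in step 2 pairs rows by position, so it assumes both files list the same rows in the same order; if that did not hold, a keyed `merge()` would be the safer choice.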
Iteration 2: Calculating suicides per 100k population
Variable "suicide_per_100k" = (sum(as.numeric(suicides_no)) / sum(as.numeric(population))) * 100000
Missing countries are given a gray colour:
missingCountryCol="gray"
The map colours countries from white to dark red according to the suicide cases per 100k population of each country. The scale runs from 0 to 45:
0 means the lowest number of suicide cases per 100k population.
45 means the highest number of suicide cases per 100k population.
** Countries shown in gray have missing data and are not part of this analysis. Some countries in Asia and Africa do not have sufficient data for analysis. Blue shows the ocean.

Insights from Map:
1. Russia and Lithuania show the highest number of suicide cases per 100k population from 1985 to 2015. Their values are very close to the top of the scale (45).
2. Because data from some continents is insufficient, we can only say that the majority of recorded cases come from Europe. With more data, this statement could turn out to be wrong.

Code for Map
As we can see, most of the cases come from Europe and North America, so we visualize suicides per 100k population in those regions:
MapRegion = "north America"
MapRegion = "europe"

VISUALIZATION 2:
What is the relation between age, generation and suicide cases per 100k population over the years 1985 to 2015? Is it true that suicide cases among the new generation are growing rapidly worldwide?
To answer this question, we follow the iterations below.

Iteration 1
Analyzing the age variable against suicide cases per 100k population, to identify which age group shows more cases.
This shows that the 75+ years age group has the greatest number of suicide cases, and the 5-14 years age group has the smallest.
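The per-group rate used for both the map (per country) and the age comparison could be computed along these lines. This is a minimal base-R sketch on invented numbers; `rate_per_100k` is a hypothetical helper, and `country` and `age` are assumed column names, with only `suicides_no`, `population` and `suicide_per_100k` taken from the report.

```r
# Toy data (hypothetical values).
df <- data.frame(
  country     = c("X", "X", "Y", "Y"),
  age         = c("5-14 years", "75+ years", "5-14 years", "75+ years"),
  suicides_no = c(2, 30, 1, 15),
  population  = c(200000, 100000, 100000, 50000)
)

# Hypothetical helper: sum cases and population within each group,
# then apply the report's formula to the group totals.
rate_per_100k <- function(data, group_col) {
  totals <- aggregate(
    data[c("suicides_no", "population")],
    by  = setNames(list(data[[group_col]]), group_col),
    FUN = sum
  )
  totals$suicide_per_100k <- totals$suicides_no / totals$population * 100000
  totals
}

by_age     <- rate_per_100k(df, "age")      # input for the age comparison
by_country <- rate_per_100k(df, "country")  # input for the world map

# Rank age groups from highest to lowest rate.
print(by_age[order(-by_age$suicide_per_100k), ])
```

Using one helper for both groupings keeps the map and the age chart on the same definition of the rate.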
Now we have the relation between age and suicides per 100k population globally. Let's find out the relation between generation and suicide cases.

Iteration 2:
Let's analyze the relation between generation and suicides per 100k population.
For reference, the generation groups are:
G I Gen / Gen Z (current): Born 1996 – TBD
Millennials: Born 1977 – 1995
Generation X: Born 1965 – 1976
Boomers: Born 1946 – 1964
Silent: Born 1945 and before
Ins
