A Descriptive Analysis of the Forbes Richest Athletes Dataset
(1990-2020)
2023-03-04
Introduction
This report presents an exploratory analysis of the Forbes richest athletes data set
(1990-2020). We define our own research questions and answer them with summary statistics
and data visualization. After preprocessing the data and selecting appropriate visualization
methods, we draw conclusions about the earnings of the highest-paid athletes and about
earnings by country. During the analysis we handle missing data, recode variables and
aggregate data where needed. Where necessary, we make reasonable assumptions and justify
both the methods and the assumptions. Finally, we explain and summarize the results in
detail.
Data
library(tidyverse)
library(VIM)
library(corrplot)
df <- read.csv("Forbes Richest Atheletes (1990-2020).csv")
str(df)
## 'data.frame': 301 obs. of 7 variables:
## $ name              : chr "Ayrton Senna" "Alain Prost" "Michael Jordan" "Mike Tyson" ...
## $ nationality       : chr "Brazil" "France" "USA" "USA" ...
## $ current_rank      : int 4 5 8 1 2 3 8 6 7 8 ...
## $ previous_year_rank: chr "" "" "" "" ...
## $ sport             : chr "Auto Racing" "Auto Racing" "Basketball" "Boxing" ...
## $ year              : int 1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
## $ earnings          : num 10 9 8.1 28.6 26 13 8.1 8.6 8.5 8.1 ...
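The str() output above shows previous_year_rank stored as character, with empty strings
in the first rows. The following is a minimal sketch of one way to recode such a column,
assuming that an empty string means the previous rank is unknown. The helper name
recode_rank is hypothetical, and the second test vector is made up for illustration only.

```r
# First four values of previous_year_rank, copied from the str() output above
previous_year_rank <- c("", "", "", "")

# Hypothetical helper: treat empty or whitespace-only strings as missing,
# then convert the remaining values to integer.
# Assumption: any non-numeric label in the full column should also become NA.
recode_rank <- function(x) {
  x <- trimws(x)
  x[x == ""] <- NA_character_
  suppressWarnings(as.integer(x))
}

# All four sample values are empty, so all four become NA
recode_rank(previous_year_rank)

# Illustrative input (not taken from the data set)
recode_rank(c("", "3", " 7 "))
```

On the full data this could be applied as df$previous_year_rank <- recode_rank(df$previous_year_rank),
after first inspecting unique(df$previous_year_rank) to see which labels would be turned into NA.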
The data set contains the Forbes list of the ten highest-paid athletes in the world for
each year from 1990 to 2020. For every entry it records the athlete’s name, nationality,
rank in the current year, rank in the previous year, sport, year and earnings (in millions
of dollars). Because Forbes changed its reporting period from the calendar year to
June-to-June, the data set has no entries for 2001.
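The description above can be checked against the data. The sketch below lists the years
with no rows and counts the rows per year. The helper name missing_years is hypothetical,
and the small sample_df holds only the first four rows shown in the str() output, so that
the example runs without the CSV file.

```r
# Hypothetical helper: years in [from, to] that have no rows in `data`
missing_years <- function(data, from = 1990, to = 2020) {
  setdiff(seq(from, to), unique(data$year))
}

# First four rows, copied from the str() output above
sample_df <- data.frame(
  name         = c("Ayrton Senna", "Alain Prost", "Michael Jordan", "Mike Tyson"),
  nationality  = c("Brazil", "France", "USA", "USA"),
  current_rank = c(4L, 5L, 8L, 1L),
  sport        = c("Auto Racing", "Auto Racing", "Basketball", "Boxing"),
  year         = c(1990L, 1990L, 1990L, 1990L),
  earnings     = c(10, 9, 8.1, 28.6)
)

# The sample only covers 1990, so 1991 is reported as missing here
missing_years(sample_df, from = 1990, to = 1991)

# Number of rows per year
table(sample_df$year)
```

Applied to the full data frame, missing_years(df) should return 2001 if the description
holds. Thirty reported years of ten athletes each would give 300 rows, so the 301
observations suggest that one year holds an extra entry; table(df$year) would show which.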
Descriptive statistics
Data pre-processing is an important step in the data analysis process as it helps to ensure
the quality and validity of the data being analyzed. In the case of the Forbes richest athletes
dataset, the following pre-processing steps may be necessary: