Cleaning, Analyzing, and Visualizing
Survey Data in Python
A tutorial using pandas, matplotlib, and seaborn to
produce digestible insights from dirty data
If you work in data at a D2C startup, there’s a good chance you will be
asked to look at survey data at least once. And since SurveyMonkey is
one of the most popular survey platforms out there, there’s a good
chance it’ll be SurveyMonkey data.
The way SurveyMonkey exports data is not necessarily ready for
analysis right out of the box, but it’s pretty close. Here I’ll
demonstrate a few examples of questions you might want to ask of
your survey data, and how to extract those answers quickly. We’ll
even write a few functions to make our lives easier when plotting
future questions.
Charlene Chambliss
Mar 31 · 10 min read
We’ll be using pandas, matplotlib, and seaborn to make sense
of our data. I used Mockaroo to generate this data; specifically, for
the survey question fields, I used "Custom List" and entered the
appropriate values. You could achieve the same effect by using
random.choice in the random module, but I found it easier to let
Mockaroo create the whole thing for me. I then tweaked the data in
Excel so that it mirrored the structure of a SurveyMonkey export.
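For readers who would rather skip Mockaroo, here is a minimal sketch of the random.choice route mentioned above. The answer options, field names, and the make_mock_responses helper are all hypothetical, since the real survey's fields are not listed here; a seeded random.Random instance is used so the output is reproducible, and its .choice() method behaves like the module-level random.choice.

```python
import random

# Hypothetical answer options; the real survey's options are not shown here.
ANSWER_OPTIONS = ["Very satisfied", "Satisfied", "Neutral", "Dissatisfied"]


def make_mock_responses(n_rows, seed=0):
    """Return a list of dicts, one per fake respondent (illustrative only)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    return [
        {
            "respondent_id": i,
            "age": rng.randint(18, 75),            # inclusive on both ends
            "answer": rng.choice(ANSWER_OPTIONS),  # one option picked at random
        }
        for i in range(1, n_rows + 1)
    ]


mock_rows = make_mock_responses(5)
for row in mock_rows:
    print(row)
```

The resulting list of dicts could be passed to pd.DataFrame() and written out with .to_csv() if you want a file to read back in.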
Your first reaction to this might be “Ugh. It’s horrible.” I mean, the
column names didn’t read in properly, there are a ton of NaNs,
instead of numerical representations like 0/1 or 1/2/3/4/5 we have
the actual text answers in each cell…And should we actually be
reading this in with a MultiIndex?
But don’t worry, it’s not as bad as you might think. And we’re going to
ignore MultiIndexes in this post. (Nobody really likes working with
them anyway.) The team needs those insights ASAP — so we’ll come
up with some hacky solutions.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style('ticks')
survey_data = pd.read_csv('MOCK_DATA.csv')
survey_data.head()
Oh boy…here we go
First order of business: we’ve been asked to find how the answers to
these questions vary by age group. But age is just an age--we don't
have a column for age groups! Well, luckily for us, we can pretty
easily define a function to create one.
But if we try to run it like this, we’ll get an error! That’s because we
have that first row, and its value for age is the word “age” instead of a
number. Since the first step is to convert each age to an int, this
will fail.
We need to remove that row from the DataFrame, but it’ll be useful
for us later when we rename columns, so we’ll save it as a separate
variable.
def age_group(age):
    """Creates an age bucket for each participant.
    Meant to be used on a DataFrame with .apply().
    """
    # Convert to an int, in case the data is read in as text
    age = int(age)
    if age < 30:
        bucket = '<30'
    # Age 30 to 39 ('range' excludes upper bound)
    elif age in range(30, 40):
        bucket = '30-39'
    elif age in range(40, 50):
        bucket = '40-49'
    else:
        # The remaining buckets are cut off in this copy of the article;
        # '50+' is a placeholder assumption so the function always returns.
        bucket = '50+'
    return bucket
# Save it as headers, so we can refer to it later when renaming columns
headers = survey_data.loc[0]
# .drop() defaults to axis=0, which refers to dropping rows
survey_data = survey_data.drop(0)
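To see why that first row gets in the way, here is a small sketch on a toy DataFrame. The column names (respondent_id, age_q) and values are made up for illustration and are not the real export's headers; the point is only that int() cannot parse the sub-header text, and that the conversion works once row 0 is set aside.

```python
import pandas as pd

# Toy stand-in for the export: row 0 holds sub-header text, not data.
# Column names and values are hypothetical.
toy = pd.DataFrame({
    "respondent_id": ["id", "1", "2"],
    "age_q": ["age", "34", "52"],
})

# int() cannot parse the sub-header text, which is why .apply() would fail.
try:
    int(toy.loc[0, "age_q"])
    failed = False
except ValueError:
    failed = True

toy_headers = toy.loc[0]  # keep the sub-header row for later reference
toy = toy.drop(0)         # drop the row whose index label is 0

# With the text row gone, the column converts to integers cleanly.
ages = toy["age_q"].astype(int)
print(failed, list(ages))
```

Note that .drop(0) removes the row by its index label, not its position, so this relies on the default RangeIndex that read_csv produces.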
You will notice that, since removing headers, we've now lost some
information when looking at the survey data by itself. Ideally, you will
have a list of the questions and their options that were asked in the
survey, provided to you by whoever wants the analysis. If not, you
should keep a separate way to reference this info in a document or
note that you can look at while working.
OK, now let’s apply the age_group function to get our age_group
column.
Great. Next, let’s subset the data to focus on just the first question.
How do the answers to this first question vary by age group?
# The column name is truncated in this copy; .apply() runs age_group on each value
survey_data['age_group'] = survey_data['What is your ag…'].apply(age_group)
survey_data['age_group'].head(3)
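As an aside, the same bucketing can be done without a row-by-row function by using pd.cut. This is an alternative technique, not the approach the walkthrough uses, and the ages below are made up. Only the three buckets visible in the function above are included; with these bins, any age of 50 or more would come out as NaN, so you would extend bins and labels to cover your own ranges.

```python
import pandas as pd

# Made-up ages, stored as text the way a messy CSV column might be.
ages = pd.Series(["24", "30", "39", "40", "49"])

# right=False makes each bin include its left edge and exclude its right edge,
# so 30 falls in '30-39' and 40 falls in '40-49'.
age_groups = pd.cut(
    ages.astype(int),
    bins=[0, 30, 40, 50],
    labels=["<30", "30-39", "40-49"],
    right=False,
)
print(age_groups.tolist())
```

The result is a categorical Series whose categories keep the order given in labels, which tends to be convenient when plotting age groups in a sensible sequence.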
# Subset the columns from where the question "What was t…" starts
# through to all the available answers. Easiest to use .iloc for this.
survey_data.iloc[:5, 3:7]
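One way to start answering "how do the answers vary by age group" is to count non-null cells per answer column within each group. The sketch below assumes the layout described earlier: one column per answer option, with the answer text where it was chosen and NaN elsewhere. The data and the "Option A" / "Option B" column names are invented, and this is not necessarily how the rest of the article proceeds.

```python
import numpy as np
import pandas as pd

# Made-up data: one column per answer option, text where chosen, NaN elsewhere.
toy = pd.DataFrame({
    "age_group": ["<30", "<30", "30-39", "30-39", "40-49"],
    "Option A": ["Option A", np.nan, "Option A", np.nan, np.nan],
    "Option B": [np.nan, "Option B", np.nan, "Option B", "Option B"],
})

# .count() tallies non-null cells, so each cell of the result is the number of
# respondents in that age group who picked that option.
counts = toy.groupby("age_group")[["Option A", "Option B"]].count()
print(counts)
```

Dividing each row of counts by its row sum would turn these tallies into within-group proportions, which are easier to compare when the age groups differ in size.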