120 DATA SCIENCE INTERVIEW QUESTIONS
COMPILED AND CREATED BY:
CARL SHAN, MAX SONG, HENRY WANG, AND WILLIAM CHEN
INTRODUCTION
This guide is meant to bridge the gap between the knowledge of a recent graduate and the skillset
required to become a data scientist. By reading this guide and learning how to answer these
questions, recent graduates will equip themselves with the knowledge and skills expected of a
data scientist.
To help readers with these goals, we’ve gathered 120 interview questions in product metrics,
programming and databases, probability, experimentation and inference, data analysis, and
predictive modeling. These questions are all either real data science interview questions or
inspired by real ones, and should help readers develop the skills needed to succeed in a data
science role.
The role of a data scientist is highly malleable and company dependent. However, the general skillset
needed is similar. Candidates need:
• Technical skills - data analysis and programming
• Business/product intuition - metrics and identifying opportunities for impact
• Communication ability - clarity in explaining findings and insights
To prepare for your interview, you may want to brush up by reviewing some probability, data
analysis, SQL, coding, and experimental design. The questions in this guide should help you do so.
The background of data science applicants varies widely, so interviews may generally be more
holistic and test your intuition and your analytical and communication abilities rather than
focusing on specific technical concepts.
Prepare to discuss your past work involving analyzing large and complicated datasets, defending
your approaches and communicating what you learned during your project. Expect questions
involving how to measure “goodness” of a feature on the company’s product, and be sure to approach
these problems in a scientific and principled way. You have a good chance of getting a product
metrics or experimentation question based on some actual questions the company is tackling at this
time.
Check up on your company’s engineering / data blog and see if anything’s relevant. Be familiar with
A/B testing and common metrics that companies similar to the one you are interviewing for may
use. Brush up on your Python (especially iPython notebook) and/or R abilities to prepare for a
potential live data analysis problem.
And finally, of course, follow the general interview advice. Prepare to elaborate on related
projects from your resume. Be enthusiastic. Share your thoughts with your interviewer as you’re
going through a problem or doing a piece of analysis. And be sure to answer the question!
You have our best wishes!
Carl, Max, Henry, and William
Please feel free to reach out to us with questions, comments and suggestions at www.datasciencehandbook.me
1 (Given a Dataset) Analyze this dataset and give me a model
that can predict this response variable.
2 What could be some issues if the distribution of the test
data is significantly different from the distribution of the
training data?
3 What are some ways I can make my model more robust
to outliers?
4 What are some differences you would expect in a model
that minimizes squared error, versus a model that
minimizes absolute error? In which cases would each error
metric be appropriate?
5 What error metric would you use to evaluate how good
a binary classifier is? What if the classes are imbalanced?
What if there are more than 2 classes?
6 What are various ways to predict a binary response
variable? Can you compare two of them and tell me when
one would be more appropriate? What’s the difference
between these? (SVM, Logistic Regression, Naive Bayes,
Decision Tree, etc.)
7 What is regularization and where might it be helpful?
What is an example of using regularization in a model?
8 Why might it be preferable to include fewer predictors
rather than many?
9 Given training data on tweets and their retweets, how
would you predict the number of retweets a given tweet will
have after 7 days, having observed only 2 days’ worth of data?
10 How could you collect and analyze data to use social
media to predict the weather?
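As one possible way to build intuition for question 4, the sketch below compares the best constant prediction under squared error and under absolute error on a small made-up sample containing one outlier. The sample values, the candidate grid, and the helper names are assumptions for illustration only, not part of the original guide. It relies on the standard fact that the mean minimizes total squared error while the median minimizes total absolute error.

```python
import statistics


def total_squared_error(values, c):
    """Sum of squared differences between each value and the constant c."""
    return sum((v - c) ** 2 for v in values)


def total_absolute_error(values, c):
    """Sum of absolute differences between each value and the constant c."""
    return sum(abs(v - c) for v in values)


# Made-up sample with one large outlier (100).
values = [1, 2, 3, 4, 100]
mean = statistics.fmean(values)     # 22.0, pulled toward the outlier
median = statistics.median(values)  # 3, unaffected by the outlier's size

# Brute-force search over candidate constant predictions 0.0, 0.1, ..., 100.0.
candidates = [c / 10 for c in range(0, 1001)]
best_sq = min(candidates, key=lambda c: total_squared_error(values, c))
best_abs = min(candidates, key=lambda c: total_absolute_error(values, c))

print("squared-error minimizer:", best_sq, "(mean is", mean, ")")
print("absolute-error minimizer:", best_abs, "(median is", median, ")")
```

The squared-error minimizer lands on the mean and the absolute-error minimizer on the median, which is one reason absolute error is often described as more robust to outliers.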
PREDICTIVE MODELING
PRO TIP
If asked to predict a response variable during your interview, you should favor simpler models
that run quickly and which you can easily explain. If the task is specifically a predictive
modeling task, you should try to do, or at least mention, cross-validation, as it really is the
gold standard for evaluating the quality of one’s model. Talk about and justify your approach
while you’re doing it, and leave some time to plot and visualize the data.
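The cross-validation idea mentioned in the tip can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not a recommended implementation: the data are synthetic, the model is a one-variable least-squares line, and the helper names fit_line and k_fold_mse are hypothetical. In an interview you would more likely use a library routine.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Mean held-out squared error, averaged over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit only on the training part, then score on the held-out part.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_errors.append(statistics.fmean(errs))
    return statistics.fmean(fold_errors)


# Synthetic data (made up for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

cv_mse = k_fold_mse(xs, ys, k=5)
print(f"5-fold CV mean squared error: {cv_mse:.3f}")
```

Because every point is scored by a model that never saw it during fitting, the averaged error is a fairer estimate of out-of-sample performance than the error on the training data itself.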