Think Stats: Exploratory Data Analysis, Second Edition (PDF)
If you know how to program, you have the skills to turn data into knowledge, using tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. By working with a single case study throughout this thoroughly revised book, you’ll learn the entire process of exploratory data analysis—from collecting data and generating statistics to identifying
Think Stats
Allen B. Downey
Preface
This book is an introduction to the practical tools of exploratory data analysis.
The organization of the book follows the process I use when I start working with
a dataset:
Importing and cleaning: Whatever format the data is in, it usually takes some
time and effort to read the data, clean and transform it, and check that
everything made it through the translation process intact.
Single variable explorations: I usually start by examining one variable at a
time, finding out what the variables mean, looking at distributions of the
values, and choosing appropriate summary statistics.
Pair-wise explorations: To identify possible relationships between variables, I
look at tables and scatter plots, and compute correlations and linear fits.
Multivariate analysis: If there are apparent relationships between variables, I
use multiple regression to add control variables and investigate more complex
relationships.
Estimation and hypothesis testing: When reporting statistical results, it is
important to answer three questions: How big is the effect? How much
variability should we expect if we run the same measurement again? Is it
possible that the apparent effect is due to chance?
Visualization: During exploration, visualization is an important tool for
finding possible relationships and effects. Then if an apparent effect holds up
to scrutiny, visualization is an effective way to communicate results.
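As an illustration of the first and third questions in the estimation and hypothesis testing step, here is a minimal sketch of a permutation test. It is not code from the book: the function names, the synthetic groups, and the choice of a difference in means as the effect size are all assumptions made for this example, and it relies only on the Python standard library.

```python
import random
import statistics


def mean_difference(group_a, group_b):
    """Effect size: absolute difference between the two group means."""
    return abs(statistics.fmean(group_a) - statistics.fmean(group_b))


def permutation_p_value(group_a, group_b, iterations=1000, seed=17):
    """Return (observed effect, estimated p-value) from a permutation test.

    The p-value is the fraction of label shuffles that produce an effect
    at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n = len(group_a)
    count = 0
    for _ in range(iterations):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        if mean_difference(pooled[:n], pooled[n:]) >= observed:
            count += 1
    return observed, count / iterations


# Synthetic data (assumption: made up for illustration, not from the book).
data_rng = random.Random(1)
group_a = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
group_b = [data_rng.gauss(10.0, 2.0) for _ in range(200)]  # same distribution as group_a
group_c = [data_rng.gauss(13.0, 2.0) for _ in range(200)]  # mean shifted by 3

effect_same, p_same = permutation_p_value(group_a, group_b)
effect_shift, p_shift = permutation_p_value(group_a, group_c)

print(f"no real difference: effect={effect_same:.2f}, p~{p_same:.3f}")
print(f"shifted group:      effect={effect_shift:.2f}, p~{p_shift:.3f}")
```

The second question, how much the estimate would vary if the measurement were repeated, is not covered by this sketch; resampling the data and recomputing the effect is one common way to approach it.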
This book takes a computational approach, which has several advantages over
mathematical approaches:
I present most ideas using Python code, rather than mathematical notation. In
general, Python code is more readable; also, because it is executable, readers
can download it, run it, and modify it.
Each chapter includes exercises readers can do to develop and solidify their
learning. When you write programs, you express your understanding in code;
while you are debugging the program, you are also correcting your
understanding.
Some exercises involve experiments to test statistical behavior. For example,
you can explore the Central Limit Theorem (CLT) by generating random
samples and computing their sums. The resulting visualizations demonstrate
why the CLT works and when it doesn’t.
Some ideas that are hard to grasp mathematically are easy to understand by
simulation. For example, we approximate p-values by running random
simulations, which reinforces the meaning of the p-value.
Because the book is based on a general-purpose programming language
(Python), readers can import data from almost any source. They are not
limited to datasets that have been cleaned and formatted for a particular
statistics tool.
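The Central Limit Theorem experiment mentioned in the list above might look roughly like the following sketch. It is an assumption about how such an experiment could be set up, not the book's own code: it uses sample skewness as a numeric stand-in for a plot, and it picks an exponential distribution (finite variance) and a Pareto distribution with shape parameter 1 (infinite variance) as illustrative cases.

```python
import random
import statistics


def sample_sums(draw, n, trials, rng):
    """Return `trials` sums, each adding up `n` independent draws."""
    return [sum(draw(rng) for _ in range(n)) for _ in range(trials)]


def skewness(values):
    """Sample skewness: close to 0 for a symmetric, bell-shaped sample."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def draw_exponential(rng):
    """Exponential draw with rate 1: skewed, but with finite variance."""
    return rng.expovariate(1.0)


def draw_pareto(rng):
    """Pareto draw with shape 1: heavy tail, infinite variance."""
    return rng.paretovariate(1.0)


rng = random.Random(42)
sizes = (1, 10, 100)
expo_skew = {n: skewness(sample_sums(draw_exponential, n, 2000, rng)) for n in sizes}
pareto_skew = {n: skewness(sample_sums(draw_pareto, n, 2000, rng)) for n in sizes}

for n in sizes:
    print(f"n={n:>3}  exponential skew={expo_skew[n]:.2f}  pareto skew={pareto_skew[n]:.2f}")
```

For the exponential case the skewness of the sums shrinks toward zero as n grows, which is what convergence to a normal distribution predicts. For the Pareto case the sums stay strongly skewed, because the finite-variance condition of the theorem is not met.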
The book lends itself to a project-based approach. In my class, students work on
a semester-long project that requires them to pose a statistical question, find a
dataset that can address it, and apply each of the techniques they learn to their
own data.
To demonstrate my approach to statistical analysis, the book presents a case
study that runs through all of the chapters. It uses data from two sources:
The National Survey of Family Growth (NSFG), conducted by the U.S.
Centers for Disease Control and Prevention (CDC) to gather “information on
family life, marriage and divorce, pregnancy, infertility, use of contraception,
and men’s and women’s health.” (See http://cdc.gov/nchs/nsfg.htm.)
The Behavioral Risk Factor Surveillance System (BRFSS), conducted by the
National Center for Chronic Disease Prevention and Health Promotion to
“track health conditions and risk behaviors in the United States.” (See
http://cdc.gov/BRFSS/.)
Other examples use data from the IRS, the U.S. Census, and the Boston
Marathon.
This second edition of Think Stats includes the chapters from the first edition,
many of them substantially revised, and new chapters on regression, time series
analysis, survival analysis, and analytic methods. The previous edition did not
use pandas, SciPy, or StatsModels, so all of that material is new.
How I Wrote This Book
When people write a new textbook, they usually start by reading a stack of old
textbooks. As a result, most books contain the same material in pretty much the
same order.
I did not do that. In fact, I used almost no printed material while I was writing
this book, for several reasons:
My goal was to explore a new approach to this material, so I didn’t want
much exposure to existing approaches.
Since I am making this book available under a free license, I wanted to make
sure that no part of it was encumbered by copyright restrictions.
Many readers of my books don’t have access to libraries of printed material,
so I tried to make references to resources that are freely available on the
Internet.
Some proponents of old media think that the exclusive use of electronic
resources is lazy and unreliable. They might be right about the first part, but I
think they are wrong about the second, so I wanted to test my theory.
The resource I used more than any other is Wikipedia. In general, the articles I
read on statistical topics were very good (although I made a few small changes
along the way). I include references to Wikipedia pages throughout the book and
I encourage you to follow those links; in many cases, the Wikipedia page picks
up where my description leaves off. The vocabulary and notation in this book are
generally consistent with Wikipedia, unless I had a good reason to deviate. Other
resources I found useful were Wolfram MathWorld and the Reddit statistics
forum.