Python Data Visualization Cookbook

Copyright © 2013 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: November 2013
Production Reference: 1191113
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK
ISBN 978-1-78216-336-7
Cover Image by Gorkee Bhardwaj

Credits
Author: Igor Milovanovic
Reviewers: Tarek Amr, Simeone Franklin, Jayesh K. Gupta, Kostiantyn Kucher, Kenneth Emeka Odoh
Acquisition Editor: James Jones
Ankita Shashi
Technical Editors: Pratik More, Amit Ramadas, Ritika Singl
Copy Editors: Brandt D'Mello, Janbal Dharmaraj, Deepa Nambiar, Kirti Pa
Project Coordinator: Rahul Dixit
Proofreaders: Amy Johnson, Lindsey Thomas
Indexer: Mariammal Chettiya
Graphics: Abhinash Sahu
Production Coordinator: Shantanu Zagade
Cover Work: Shantanu Zagade

About the author
Igor Milovanovic is an experienced developer with a strong background in Linux system and software engineering. He has skills in building scalable, data-driven, distributed, software-rich systems. He is an evangelist for high-quality systems design who holds strong interests in software architecture and development methodologies.
He is always persistent in advocating methodologies that promote high-quality software, such as test-driven development, one-step builds, and continuous integration. He also possesses solid knowledge of product development. Having field experience and official training, he is capable of transferring knowledge and communication flow from business to developers and vice versa.

I am most grateful to my fiancée for letting me spend endless hours on the work instead of with her and for being an avid listener to my endless book monologues. I also want to thank my brother for always being my strongest supporter. I am thankful to my parents for letting me develop myself in various ways and become the person I am today.

I could not have written this book without the enormous energy of the open source community that developed Python, matplotlib, and all the libraries that we have used in this book. I owe the most to the people behind all these projects. Thank you.

About the reviewers
Tarek Amr achieved his postgraduate degree in Data Mining and Information Retrieval from the University of East Anglia. He has about 10 years' experience in software development. He has been volunteering in Global Voices Online (GVO) since 2007, and currently he is the local ambassador of the Open Knowledge Foundation (OKFN) in Egypt. Words such as Open Data, Government 2.0, Data Visualisation, Data Journalism, Machine Learning, and Natural Language Processing are like music to his ears. Tarek's Twitter handle is @gr33ndata and his homepage is

Jayesh K. Gupta is the Lead Developer of Matlab Toolbox for Biclustering Analysis (MTBA). He is currently an undergraduate student and researcher at IIT Kanpur. His interests lie in the field of pattern recognition. His interests also lie in basic sciences, recognizing them as the means of analyzing patterns in nature. Coming to IIT, he realized how this analysis is being augmented by Machine Learning algorithms with various diverse applications.
He believes that augmenting human thought with machine intelligence is one of the best ways to advance human knowledge. He is a long-time technophile and a free-software evangelist. He usually goes by the handle rejuvyesh online. He is also an avid reader, and his books can be checked out at Goodreads. Check out his projects at Bitbucket and GitHub. For all links, visit http://

Kostiantyn Kucher was born in Odessa, Ukraine. He received his Master's degree in Computer Science from Odessa National Polytechnic University in 2012. He used Python as well as matplotlib and PIL for machine learning and image recognition purposes. Currently, Kostiantyn is a PhD student in Computer Science specializing in Information Visualization. He conducts his research under the supervision of Prof. Dr. Andreas Kerren with the ISOVIS group at the Computer Science Department of Linnaeus University (Växjö, Sweden).

Kenneth Emeka Odoh performs research on state-of-the-art data visualization techniques. His research interests include exploratory search, where users are guided to their search results using visual clues. Kenneth is proficient in Python programming. He presented a talk at PyCon Finland 2012, where he spoke about data visualization in Django to a packed audience. He currently works as a Graduate Researcher at the University of Regina, Canada. He is a polyglot with experience in developing applications in the C, C++, Python, and Java programming languages. When Kenneth is not writing source code, you can find him singing in the Campion College chant choir.

Table of contents
Preface
Chapter 1: Preparing Your Working Environment
  Introduction
  Installing matplotlib, NumPy, and SciPy
  Installing virtualenv and virtualenvwrapper
  Installing matplotlib on Mac OS X
  Installing matplotlib on Windows
  Installing Python Imaging Library (PIL) for image processing
  Installing a requests module
  Customizing matplotlib's parameters in code
  Customizing matplotlib's parameters per project
Chapter 2: Knowing Your Data
  Introduction
  Importing data from CSV
  Importing data from Microsoft Excel files
  Importing data from fixed-width datafiles
  Importing data from tab-delimited files
  Importing data from a JSON resource
  Exporting data to JSON, CSV, and Excel
  Importing data from a database
  Cleaning up data from outliers
  Reading files in chunks
  Reading streaming data sources
  Importing image data into NumPy arrays
  Generating controlled random datasets
  Smoothing the noise in real-world data

