Text Analytics
with Python
A Practical Real-World Approach to
Gaining Actionable Insights from
Your Data
Dipanjan Sarkar
Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable
Insights from Your Data
Dipanjan Sarkar
Bangalore, Karnataka
India
ISBN-13 (pbk): 978-1-4842-2387-1
ISBN-13 (electronic): 978-1-4842-2388-8
DOI 10.1007/978-1-4842-2388-8
Library of Congress Control Number: 2016960760
Copyright © 2016 by Dipanjan Sarkar
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical
way, and transmission or information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark
symbol with every occurrence of a trademarked name, logo, or image we use the names, logos,
and images only in an editorial fashion and to the benefit of the trademark owner, with no
intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even
if they are not identified as such, is not to be taken as an expression of opinion as to whether or
not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the
date of publication, neither the authors nor the editors nor the publisher can accept any legal
responsibility for any errors or omissions that may be made. The publisher makes no warranty,
express or implied, with respect to the material contained herein.
Managing Director: Welmoed Spahr
Lead Editor: Mr. Sarkar
Technical Reviewer: Shanky Sharma
Editorial Board: Steve Anglin, Pramila Balan, Laura Berendson, Aaron Black,
Louise Corrigan, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John,
Nikhil Karkal, James Markham, Susan McDermott, Matthew Moodie, Natalie Pao,
Gwenan Spearing
Coordinating Editor: Sanchita Mandal
Copy Editor: Corbin Collins
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global
Distributed to the book trade worldwide by Springer Science+Business Media New York,
233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505,
e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a
California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc
(SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
For information on translations, please e-mail rights@apress.com, or visit www.apress.com.
Apress and friends of ED books may be purchased in bulk for academic, corporate,
or promotional use. eBook versions and licenses are also available for most titles.
For more information, reference our Special Bulk Sales–eBook Licensing web page at
www.apress.com/bulk-sales.
Any source code or other supplementary materials referenced by the author in this text are
available to readers at www.apress.com. For detailed information about how to locate your
book’s source code, go to www.apress.com/source-code/. Readers can also access source code
at SpringerLink in the Supplementary Material section for each chapter.
Printed on acid-free paper
This book is dedicated to my parents, partner, well-wishers,
and especially to all the developers, practitioners, and
organizations who have created a wonderful and thriving
ecosystem around analytics and data science.
Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
■ Chapter 1: Natural Language Basics
■ Chapter 2: Python Refresher
■ Chapter 3: Processing and Understanding Text
■ Chapter 4: Text Classification
■ Chapter 5: Text Summarization
■ Chapter 6: Text Similarity and Clustering
■ Chapter 7: Semantic and Sentiment Analysis
Index