FEATURE ENGINEERING FOR
MACHINE LEARNING AND
DATA ANALYTICS
Chapman & Hall/CRC
Data Mining and Knowledge Series
Series Editor: Vipin Kumar
RapidMiner
Data Mining Use Cases and Business Analytics Applications
Markus Hofmann and Ralf Klinkenberg
Computational Business Analytics
Subrata Das
Data Classication
Algorithms and Applications
Charu C. Aggarwal
Healthcare Data Analytics
Chandan K. Reddy and Charu C. Aggarwal
Accelerating Discovery
Mining Unstructured Information for Hypothesis Generation
Scott Spangler
Event Mining
Algorithms and Applications
Tao Li
Text Mining and Visualization
Case Studies Using Open-Source Tools
Markus Hofmann and Andrew Chisholm
Graph-Based Social Media Analysis
Ioannis Pitas
Data Mining
A Tutorial-Based Primer, Second Edition
Richard J. Roiger
Data Mining with R
Learning with Case Studies, Second Edition
Luís Torgo
Social Networks with Rich Edge Semantics
Quan Zheng and David Skillicorn
Large-Scale Machine Learning in the Earth Sciences
Ashok N. Srivastava, Ramakrishna Nemani, and Karsten Steinhaeuser
Data Science and Analytics with Python
Jesus Rogel-Salazar
Feature Engineering for Machine Learning and Data Analytics
Guozhu Dong and Huan Liu
For more information about this series please visit:
https://www.crcpress.com/Chapman--HallCRC-Data-Mining-and-Knowledge-Discovery-Series/book-series/CHDAMINODIS
FEATURE ENGINEERING FOR
MACHINE LEARNING AND
DATA ANALYTICS
Edited by
Guozhu Dong and Huan Liu
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
Version Date: 20180301
International Standard Book Number-13: 978-1-1387-4438-7 (Hardback)
is book contains information obtained from authentic and highly regarded sources. Reasonable
eorts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. e authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microlming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-prot organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identication and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
MATLAB
•
is a trademark of e MathWorks, Inc. and is used with permission. e MathWorks
does not warrant the accuracy of the text or exercises in this book. is book’s use or discussion
of MATLAB
•
software or related products does not constitute endorsement or sponsorship by e
MathWorks of a particular pedagogical approach or particular use of the MATLAB
•
software.