Practical Hive

Points/C-coins required: 11 2019-05-17 16:14:01 9.34 MB PDF

Practical Hive: A Guide to Hadoop's Data Warehouse System

Scott Shaw, Saint Louis, Missouri, USA
Andreas Francois Vermeulen, West Kilbride, North Ayrshire, United Kingdom
Ankur Gupta, Uxbridge, United Kingdom
David Kjerrumgaard, Henderson, Nevada, USA

ISBN-13 (pbk): 978-1-4842-0272-2
ISBN-13 (electronic): 978-1-4842-0271-5
DOI 10.1007/978-1-4842-0271-5
Library of Congress Control Number: 2016951940

Copyright © 2016 by Scott Shaw, Andreas Francois Vermeulen, Ankur Gupta, David Kjerrumgaard

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director: Welmoed Spahr
Acquisitions Editor: Robert Hutchinson
Developmental Editor: Matt Moodie
Technical Reviewer: Ancil McBarnett, Chris Hillman
Editorial Board: Steve Anglin, Pramila Balan, Laura Berendson, Aaron Black, Louise Corrigan, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John, Nikhil Karkal, James Markham, Susan McDermott, Matthew Moodie, Natalie Pao, Gwenan Spearing
Coordinating Editor: Rita Fernando
Copy Editor: Kezia Endsley
Compositor: SPi Global
Indexer: SPi Global
Cover Image: Designed by FreePik

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springer.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail rights@apress.com, or visit www.apress.com.

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales-eBook Licensing web page at www.apress.com/bulk-sales.

Any source code or other supplementary materials referenced by the author in this text is available to readers at www.apress.com. For detailed information about how to locate your book's source code, go to www.Apress.com/source-code/.

Printed on acid-free paper

I dedicate this book to my family. They put up with me being on the computer every day and yet they have no idea what I do for a living. Love you!
-Scott Shaw

I dedicate this book to my family and wise mentors for their support. Special thanks to Denise and Laurence.
-Andreas Francois Vermeulen

I would like to express my gratitude to the many people who saw me through this book. Above all I want to thank my wife, Jasveen, and the rest of my family, who supported and encouraged me in spite of all the time it took me away from them.
-Ankur Gupta

By perseverance, study, and eternal desire, any man can become great. -George S. Patton
-David Kjerrumgaard

Contents at a Glance

About the Authors
About the Technical Reviewers
Acknowledgments
Introduction
Chapter 1: Setting the Stage for Hive: Hadoop
Chapter 2: Introducing Hive
Chapter 3: Hive Architecture
Chapter 4: Hive Tables DDL
Chapter 5: Data Manipulation Language (DML)
Chapter 6: Loading Data into Hive
Chapter 7: Querying Semi-Structured Data
Chapter 8: Hive Analytics
Chapter 9: Performance Tuning: Hive
Chapter 10: Hive Security
Chapter 11: The Future of Hive
Appendix A: Building a Big Data Team
Appendix B: Hive Functions
Index

Contents

About the Authors
About the Technical Reviewers
Acknowledgments
Introduction

Chapter 1: Setting the Stage for Hive: Hadoop
    An Elephant Is Born
    Hadoop Mechanics
    Data Redundancy
    Traditional High Availability
    Hadoop High Availability
    Processing with MapReduce
    Beyond MapReduce
    YARN and the Modern Data Architecture
    Hadoop and the Open Source Community
    Where Are We Now

Chapter 2: Introducing Hive
    Hadoop Distributions
    Cluster Hive Installation
    Finding Your Way Around
    Hive CLI

Chapter 3: Hive Architecture
    Hive Components
    HCatalog
    HiveServer2
    Client Tools
    Execution Engine: Tez

Chapter 4: Hive Tables DDL
    Schema-on-Read
    Hive Data Model
    Schemas/Databases
    Why Use Multiple Schemas/Databases
    Creating Databases
    Altering Databases
    Dropping Databases
    List Databases
    Data Types in Hive
    Primitive Data Types
    Choosing Data Types
    Complex Data Types
    Tables
    Creating Tables
    Listing Tables
    Internal/External Tables
    Internal or Managed Tables
    External/Internal Table Example
    Table Properties
    Generating a Create Table Command for Existing Tables
    Partitioning and Bucketing
    Partitioning Considerations
    Efficiently Partitioning on Date Columns
    Bucketing Considerations
    Altering Tables
    ORC File Format
    Altering Table Partitions
    Modifying Columns
    Dropping Tables/Partitions
    Protecting Tables/Partitions
    Other Create Table Command Options

Chapter 5: Data Manipulation Language (DML)
    Loading Data into Tables
    Loading Data Using Files Stored on the Hadoop Distributed File System
    Loading Data Using Queries
    Writing Data into the File System from Queries
    Inserting Values Directly into Tables
    Updating Data Directly in Tables
    Deleting Data Directly in Tables
    Creating a Table with the Same Structure
    Joins
    Using Equality Joins to Combine Tables
    Using Outer Joins
    Using Left Semi-Joins
    Using Join with Single Map
    Using Largest Table Last
    Transactions
    What Is ACID and Why Use It?
    Hive Configuration

Chapter 6: Loading Data into Hive
    Design Considerations Before Loading Data
    Loading Data into HDFS
    Ambari Files View
    Hadoop Command Line
