Apache Sqoop Cookbook

2016-11-14 21:41:03 · 8.89 MB · PDF

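Sqoop (pronounced "skup") is an open-source tool for transferring data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (for example MySQL, Oracle, or Postgres) into Hadoop's HDFS, and it can export data from HDFS back into a relational database. The Sqoop project started in 2009, initially as a third-party module for Hadoop. Later, so that users could deploy it quickly and developers could iterate faster, Sqoop became an independent Apache project.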
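The two directions described above (database to HDFS, and HDFS back to a database) can be sketched with Sqoop's `import` and `export` tools. This is only a minimal illustration, not an excerpt from the book: the JDBC URL, user name, table name, and HDFS directory are hypothetical placeholders, and the `run` helper is an assumption added here so the script prints the commands unless `RUN=1` is set on a host where Sqoop is installed.

```shell
#!/bin/sh
# Hypothetical connection details (assumptions for illustration only).
JDBC_URL="jdbc:mysql://mysql.example.com/sqoop"
DB_USER="sqoop"
TABLE="cities"
HDFS_DIR="/user/example/cities"

# Helper (not part of Sqoop): print the command by default,
# execute it only when RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Import: copy a relational table into a directory on HDFS.
# -P makes Sqoop prompt for the password instead of taking it on the command line.
run sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
  --table "$TABLE" --target-dir "$HDFS_DIR"

# Export: copy files from an HDFS directory into an existing relational table.
run sqoop export --connect "$JDBC_URL" --username "$DB_USER" -P \
  --table "$TABLE" --export-dir "$HDFS_DIR"
```

Run as-is, the script only prints the two command lines, which makes it easy to review them before executing anything against a real database.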
Apache Sqoop Cookbook
Kathleen Ting and Jarek Jarcec Cecho
O'Reilly: Beijing, Cambridge, Farnham, Köln, Sebastopol, Tokyo

Apache Sqoop Cookbook
by Kathleen Ting and Jarek Jarcec Cecho

Copyright © 2013 Kathleen Ting and Jarek Jarcec Cecho. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Courtney Nash
Production Editor: Rachel Steely
Copyeditor: BIM Proofreading Services
Proofreader: Julie Van Keuren
Cover Designer: Randy Comer
Interior Designer: David Futato

July 2013: First Edition
Revision History for the First Edition:
2013-06-28: First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449364625 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Apache Sqoop Cookbook, the image of a Great White Pelican, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. Apache, Sqoop, Apache Sqoop, and the Apache feather logos are registered trademarks or trademarks of The Apache Software Foundation.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-36462-5

Table of Contents

Foreword
Preface

1. Getting Started
   1.1. Downloading and Installing Sqoop
   1.2. Installing JDBC Drivers
   1.3. Installing Specialized Connectors
   1.4. Starting Sqoop
   1.5. Getting Help with Sqoop

2. Importing Data
   2.1. Transferring an Entire Table
   2.2. Specifying a Target Directory
   2.3. Importing Only a Subset of Data
   2.4. Protecting Your Password
   2.5. Using a File Format Other Than CSV
   2.6. Compressing Imported Data
   2.7. Speeding Up Transfers
   2.8. Overriding Type Mapping
   2.9. Controlling Parallelism
   2.10. Encoding NULL Values
   2.11. Importing All Your Tables

3. Incremental Import
   3.1. Importing Only New Data
   3.2. Incrementally Importing Mutable Data
   3.3. Preserving the Last Imported Value
   3.4. Storing Passwords in the Metastore
   3.5. Overriding the Arguments to a Saved Job
   3.6. Sharing the Metastore Between Sqoop Clients

4. Free-Form Query Import
   4.1. Importing Data from Two Tables
   4.2. Using Custom Boundary Queries
   4.3. Renaming Sqoop Job Instances
   4.4. Importing Queries with Duplicated Columns

5. Export
   5.1. Transferring Data from Hadoop
   5.2. Inserting Data in Batches
   5.3. Exporting with All-or-Nothing Semantics
   5.4. Updating an Existing Data Set
   5.5. Updating or Inserting at the Same Time
   5.6. Using Stored Procedures
   5.7. Exporting into a Subset of Columns
   5.8. Encoding the NULL Value Differently
   5.9. Exporting Corrupted Data

6. Hadoop Ecosystem Integration
   6.1. Scheduling Sqoop Jobs with Oozie
   6.2. Specifying Commands in Oozie
   6.3. Using Property Parameters in Oozie
   6.4. Installing JDBC Drivers in Oozie
   6.5. Importing Data Directly into Hive
   6.6. Using Partitioned Hive Tables
   6.7. Replacing Special Delimiters During Hive Import
   6.8. Using the Correct NULL String in Hive
   6.9. Importing Data into HBase
   6.10. Importing All Rows into HBase
   6.11. Improving Performance When Importing into HBase

7. Specialized Connectors
   7.1. Overriding Imported boolean Values in PostgreSQL Direct Import
   7.2. Importing a Table Stored in Custom Schema in PostgreSQL
   7.3. Exporting into PostgreSQL Using pg_bulkload
   7.4. Connecting to MySQL
   7.5. Using Direct MySQL Import into Hive
   7.6. Using the upsert Feature When Exporting into MySQL
   7.7. Importing from Oracle
   7.8. Using Synonyms in Oracle
   7.9. Faster Transfers with Oracle
   7.10. Importing into Avro with OraOop
   7.11. Choosing the Proper Connector for Oracle
   7.12. Exporting into Teradata
   7.13. Using the Cloudera Teradata Connector
   7.14. Using Long Column Names in Teradata
