If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.
Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.
Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You'll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.
Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
Leverage Spark's powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
Learn how to deploy interactive, batch, and streaming applications
Connect to data sources including HDFS, Hive, JSON, and S3
Master advanced topics like data partitioning and shared variables
Spark GraphX in Action starts out with an overview of Apache Spark and the GraphX graph processing API. This example-based tutorial then teaches you how to configure GraphX and how to use it interactively. Along the way, you'll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
高可用性的HDFS:Hadoop分布式文件系统深度实践专注于Hadoop分布式文件系统(HDFS)的主流HA解决方案,内容包括:HDFS元数据解析、Hadoop元数据备份方案、Hadoop Backup Node方案、AvatarNode解决方案以及最新的HA解决方案Cloudrea HA Name Node等。其中有关Backup Node方案及AvatarNode方案的内容是本书重点,尤其是对AvatarNode方案从运行机制到异常处理方案的步骤进行了详尽介绍,同时还总结了各种异常情况下AvatarNode的各种处理方案。
高可用性的HDFS:Hadoop分布式文件系统深度实践从代码入手并结合情景分析、案例解说对HDFS的元数据以及主流的HDFS HA解决方案的运行机制进行了深入剖析,力求使读者在解决问题时做到心中有数,不仅知其然还知其所以然。
高可用性的HDFS:Hadoop分布式文件系统深度实践主要为云计算相关领域的研发人员、云计算系统管理维护人员,也适合作为高校研究生和高年级本科生的专业课辅助教材。