Design and Implementation of a Kubernetes-Based Spark Platform for Big Data Stream Computing



Unit code: 10293
Classification level:

Master's Thesis

Title: Design and Implementation of a Kubernetes-Based Spark Platform for Big Data Stream Computing

Student ID: 1214043009
Name: 杜威科 (Du Weike)
Supervisor: Prof. 肖甫 (Xiao Fu)
Professional degree category: Master of Engineering
Type: Full-time
Field of study: Computer Technology
Thesis submission date: March 201[?]

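摘要 (Abstract)

At present, cloud platforms mainly rely on traditional virtual machine technology to manage the underlying physical resources and scale them elastically, which incurs considerable overhead in start and stop speed, resource utilization, operations and monitoring, and performance. Deploying big data computing frameworks on cloud platforms is a typical application scenario, but as data volumes grow massively, traditional cloud platform architectures and processing methods cannot cope effectively with big data processing workloads.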

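As an emerging lightweight virtualization technology, containers, with the Docker container as the basic unit, let developers quickly build, deploy, and migrate distributed applications, which greatly simplifies deployment and operations and lowers server costs. Kubernetes is an open-source system from Google for automatically deploying and managing large-scale Docker container applications. It provides containerized applications with a full set of capabilities, including resource scheduling, automatic deployment, service discovery, and elastic scaling, and it also supports the big data distributed computing framework MapReduce well. Docker still has shortcomings in areas such as security and storage, however, and as a foundation for building cloud platforms it remains in a stage of rapid development.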

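This paper focuses on designing and implementing a platform that uses Docker containers as the underlying platform for big data workloads and Kubernetes as the container management and scheduling system, and that deploys the Spark big data distributed computing framework on Docker containers. The containerized big data platform can greatly improve resource utilization and computational parallelism, reduce operations and management costs, and elastically scale Spark compute nodes in response to real-time load. For deploying a Spark cluster on Kubernetes, the main work of this paper is as follows: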

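(1) Cross-host communication between Docker containers. Docker itself does not provide cross-host container communication, so flannel is used to build an overlay network that lets containers on different physical hosts communicate with each other.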

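(2) Design and implementation of a Spark cluster on Kubernetes. This paper analyzes the communication mechanism of a Spark cluster, builds a Spark image from a Dockerfile, and designs and implements a Spark cluster for big data stream computing on the Kubernetes platform, so that the Spark cluster can be deployed quickly and scaled out horizontally.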

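(3) Design and implementation of load-based elastic scaling of Spark nodes. To monitor the resources of Docker containers, the system collects container resource usage data on each Node and performs the corresponding scaling action on the Spark nodes according to the real-time load.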

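(4) Deployment and testing of the platform. Experiments show that building the Spark framework on Docker containers improves resource utilization and simplifies operations, which verifies the feasibility and effectiveness of the system.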
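The load-based scaling in item (3) can be illustrated with a small decision rule. The abstract does not say which rule the platform uses, so the sketch below is an assumption: it applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × observed / target). The function name `desired_workers`, the 60% CPU target, the 10% tolerance, and the worker bounds are all hypothetical values chosen for illustration.

```python
import math


def desired_workers(current_workers, cpu_utilizations, target=0.6,
                    tolerance=0.1, min_workers=1, max_workers=10):
    """Suggest a Spark worker count from per-container CPU utilization (0.0-1.0).

    Hypothetical sketch: proportional scaling with a tolerance band and
    hard bounds. Not taken from the thesis.
    """
    if not cpu_utilizations:
        # No metrics collected yet: leave the cluster size unchanged.
        return current_workers

    observed = sum(cpu_utilizations) / len(cpu_utilizations)
    ratio = observed / target

    # Ignore small deviations so the cluster does not oscillate.
    if abs(ratio - 1.0) <= tolerance:
        return current_workers

    desired = math.ceil(current_workers * ratio)
    # Clamp to the allowed range.
    return max(min_workers, min(max_workers, desired))
```

With three workers averaging 90% CPU against the assumed 60% target, the rule asks for five workers; the result would then be applied by changing the replica count of the Spark worker controller.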

Keywords: cloud computing, Docker, Kubernetes, Spark, elastic scaling

Abstract

Nowadays, cloud platforms are mainly based on traditional virtual machine (VM) technology to manage hardware resources and scale them elastically. This incurs considerable overhead in start and stop speed, resource utilization, operational monitoring, and performance. Deploying a big data computing framework on a cloud platform is a typical application scenario. With the rapid growth in the amount of data, traditional cloud platform architectures and processing methods cannot effectively adapt to the big data processing environment.

With the advent of lightweight container technology, Docker containers give developers the ability to rapidly build, deploy, and migrate distributed applications, which greatly simplifies the deployment process and reduces server costs. Kubernetes is an open-source system for automating the deployment and management of large-scale Docker container applications. It provides resource scheduling, automatic deployment, service discovery, and elastic scaling for containerized applications, and it supports the big data distributed computing framework MapReduce. Docker is still deficient in security, storage, and other aspects, however, and as a basis for cloud platforms it is still in a stage of rapid development.

This paper focuses on deploying the Spark distributed computing framework on Docker containers, with Docker as the underlying platform and Kubernetes as the container management and scheduling system. The containerized big data platform can greatly improve resource utilization and computational parallelism, reduce operation and maintenance costs, and automatically scale the Spark compute nodes according to the real-time load. For the deployment of Spark clusters on Kubernetes, the main work of this paper is as follows:

(1) Communication between Docker containers on different hosts. Docker itself does not provide communication between containers on different hosts. flannel is used to build an overlay network that enables containers on different physical hosts to communicate.

(2) Design and implementation of a Spark cluster on Kubernetes. This paper analyzes the communication mechanism of the Spark cluster, builds a Spark image from a Dockerfile, and designs and implements a Spark cluster on Kubernetes, which can rapidly
