LocoFS: A Loosely-Coupled Metadata Service for Distributed File Systems (CSDN Library resource)

Uploaded 2021-03-18 22:16:46 · 745 KB · PDF

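LocoFS is a distributed file system that introduces a loosely-coupled metadata service to improve file system metadata performance. Metadata management is a critical part of a distributed storage system: metadata covers the attributes of files and directories (permissions, location, owner, and so on) as well as the directory tree itself. Traditional approaches built on relational databases or directory trees hit performance bottlenecks at scale because they do not fit the key-value (KV) access pattern well. Existing distributed file systems organize metadata in a hierarchical directory structure, so reaching a file takes several levels of lookup. This mismatch with key-value stores limits performance.

To solve this problem, LocoFS proposes a new distributed file system architecture built on two techniques:

1. Decoupling directory content from directory structure: LocoFS organizes file and directory index nodes in a flat space and indexes directory entries in reverse. File metadata can then be reached without the full directory structure, which cuts the time spent looking up and traversing the directory tree.
2. Decoupling the file metadata: this further improves performance under the key-value access pattern, so the system responds to client requests faster.

In the evaluation, LocoFS with eight nodes boosts metadata throughput by 5 times and approaches 93% of the throughput of a single-node key-value store, compared with 18% for the state-of-the-art IndexFS.

The paper's keywords are Key-value stores, File systems management, Distributed storage, and Distributed architectures. Its reference entry follows the ACM format (authors, year, place of publication, publisher, and DOI). The copyright notice permits copies for personal or classroom use without fee, provided they are not made or distributed for profit or commercial advantage. Copyrights for components owned by others must be honored. Abstracting with credit is permitted; other copying or republication requires prior permission.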


LocoFS: A Loosely-Coupled Metadata Service for Distributed File Systems

Siyang Li∗, Tsinghua University, lisiyang@tsinghua.edu.cn

Youyou Lu, Tsinghua University, luyouyou@tsinghua.edu.cn

Jiwu Shu†, Tsinghua University, shujw@tsinghua.edu.cn

Yang Hu, University of Texas, Dallas, huyang.ece@ufl.edu

Tao Li, University of Florida, taoli@ece.ufl.edu

ABSTRACT

Key-Value stores provide scalable metadata service for distributed file systems. However, the metadata’s organization itself, which is organized using a directory tree structure, does not fit the key-value access pattern, thereby limiting the performance. To address this issue, we propose a distributed file system with a loosely-coupled metadata service, LocoFS, to bridge the performance gap between file system metadata and key-value stores. LocoFS is designed to decouple the dependencies between different kinds of metadata with two techniques. First, LocoFS decouples the directory content and structure, which organizes file and directory index nodes in a flat space while reversely indexing the directory entries. Second, it decouples the file metadata to further improve the key-value access performance. Evaluations show that LocoFS with eight nodes boosts the metadata throughput by 5 times, which approaches 93% throughput of a single-node key-value store, compared to 18% in the state-of-the-art IndexFS.

KEYWORDS

Key-value stores, File systems management, Distributed storage, Distributed architectures

ACM Reference Format:

Siyang Li, Youyou Lu, Jiwu Shu, Yang Hu, and Tao Li. 2017. LocoFS: A Loosely-Coupled Metadata Service for Distributed File Systems. In Proceedings of SC17. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3126908.3126928

∗ Also with State Key Laboratory of Mathematical Engineering and Advanced Computing.

† Jiwu Shu is the corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

SC17, November 12–17, 2017, Denver, CO, USA

© 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.

ACM ISBN 978-1-4503-5114-0/17/11... $15.00

https://doi.org/10.1145/3126908.3126928

1 INTRODUCTION

As clusters or data centers are moving from Petabyte level to Exabyte level, distributed file systems are facing challenges in metadata scalability. The recent work IndexFS [ ] uses hundreds of metadata servers to achieve high-performance metadata operations. However, most of the recent active supercomputers deploy only 1 to 4 metadata servers to reduce the complexity of management and guarantee reliability. Besides, previous work [ ] has also revealed that metadata operations account for more than half of all operations in file systems. The metadata service therefore plays a major role in distributed file systems, and it is important to support parallel processing of a large number of files with a small number of metadata servers. Unfortunately, an inefficient scalable metadata service cannot utilize the performance of the key-value (KV) stores in each metadata server node, and thus degrades throughput.

On the other hand, KV stores have been introduced to build file systems [ ]. They not only export a simple interface (i.e., get and put) to users, but also use efficient data organization (e.g., Log-Structured Merge Tree [ ]) in the storage. Since data values are independent and are organized in such a simple way, KV stores enable small objects to be accessed efficiently and provide excellent scalability, which makes them a promising technique for file system metadata servers. The advantages of KV stores have been leveraged in file system metadata (e.g., inode and dirent) for small objects [18, 38].

However, we observe that there is a huge performance gap between KV stores and file system metadata, even for those file systems that have been optimized using KV stores. For example, in a single server, the LevelDB KV store [ ] can achieve performance at 128K IOPS for random put operations and 190K IOPS for random get operations [ ]. Nevertheless, IndexFS [ ], which stores file system metadata using LevelDB and shows much better scalability than traditional distributed file systems, only achieves 6K IOPS per node for create operations, which is 1.7% of LevelDB.

We identify that file metadata accesses have strong dependencies due to the semantics of a directory tree. The limitation is transferred from the local KV store to network latency because of the complicated communication within a metadata operation. For instance, a file create operation needs to write at least three locations in the metadata: its file inode, its dirent and inode. In a local file system, these update operations occur in one node, and the cost of the file operation in the software layer is mainly in the data organization itself. However, in a distributed file system, the main performance bottleneck is caused by the network latency among different nodes. Recent distributed file systems distribute metadata either to data servers [ ] or to a metadata server cluster (MDS cluster) [ ] to scale the metadata service. In such distributed metadata services, a metadata operation may need to access multiple server nodes. Considering that these accesses have to be atomic [ ] or performed in the correct order [ ] to keep consistency, a file operation needs to traverse different server nodes or access a single node many times. Under such circumstances, the network latency can severely impact the inter-node access performance of the distributed metadata service.
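The cost of traversing several server nodes for one operation can be made concrete with a small Python sketch. It resolves a path component by component in a tree-structured namespace and records which metadata server each lookup lands on. The tree, the server names, and the placement table are hypothetical and chosen only for illustration; they are not taken from any particular file system.

```python
# Hypothetical placement of directory and file inodes on four metadata servers.
# Both the namespace and the placement are made up for illustration.
PLACEMENT = {
    "/": "n1",
    "/a": "n2",
    "/a/b": "n3",
    "/a/b/file": "n4",
}


def ancestors_and_self(path: str) -> list[str]:
    """Return every prefix of `path` that must be looked up, root first."""
    parts = [p for p in path.split("/") if p]
    prefixes = ["/"]
    for i in range(1, len(parts) + 1):
        prefixes.append("/" + "/".join(parts[:i]))
    return prefixes


def resolve(path: str) -> list[str]:
    """Walk the tree one component at a time and record the server contacted at each step."""
    return [PLACEMENT[p] for p in ancestors_and_self(path)]


def server_switches(visits: list[str]) -> int:
    """Count how often two consecutive lookups land on different servers."""
    return sum(1 for a, b in zip(visits, visits[1:]) if a != b)


visits = resolve("/a/b/file")
print(visits)                   # ['n1', 'n2', 'n3', 'n4']
print(server_switches(visits))  # 3
```

In this toy placement every path component lives on a different server, so a single file access pays for four sequential lookups and three cross-server hops before any file metadata is read.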

Our goal in this paper is two-fold: (1) to reduce the network latency within a metadata operation; (2) to fully utilize the KV store’s performance benefits. Our key idea is to reduce the dependencies among file system metadata (i.e., the logical organization of file system metadata), ensuring that an important operation communicates with only one or two metadata servers during its life cycle.

To this end, we propose LocoFS, a loosely-coupled metadata service in a distributed file system, to reduce network latency and improve utilization of the KV store. LocoFS first cuts the file metadata (i.e., file inode) from the directory tree. These file metadata are organized independently and form a flat space, where the dirent-inode relationship for files is kept with the file inode in the form of a reverted index. This flattened directory tree structure matches the KV access patterns better. LocoFS also divides the file metadata into two parts: an access part and a content part. This partition of the file metadata further improves the utilization of KV stores for operations that use only part of the metadata. In these ways, LocoFS reorganizes the file system directory tree with reduced dependency, enabling higher efficiency in KV-based accesses. Our contributions are summarized as follows:

(1) We propose a flattened directory tree structure to decouple the file metadata and directory metadata. The flattened directory tree reduces dependencies among metadata, resulting in lower latency.

(2) We further decouple the file metadata into two parts to make their accesses KV-friendly. This separation further improves file system metadata performance on KV stores.

(3) We implement and evaluate LocoFS. Evaluations show that LocoFS achieves 100K IOPS for file create and mkdir when using one metadata server, achieving 38% of the KV store’s performance. LocoFS also achieves low latency and maintains scalable and stable performance.
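The flat metadata space, the reverse index from a file to its directory, and the split of file metadata into an access part and a content part can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LocoFS implementation: the key prefixes `D:`, `A:` and `C:`, the use of a directory id inside file keys, the choice of fields in each part, and the in-memory `dict` standing in for a KV store are all hypothetical.

```python
import uuid

# A dict stands in for the KV store's put/get interface (assumption).
kv: dict[str, dict] = {}


def mkdir(path: str) -> str:
    """Store a directory inode under its full path and return its id."""
    dir_id = uuid.uuid4().hex
    kv[f"D:{path}"] = {"id": dir_id}
    return dir_id


def file_key(dir_path: str, name: str) -> str:
    """Build a file key from the parent's id, so the file record points back at its directory."""
    parent_id = kv[f"D:{dir_path}"]["id"]  # one get, no tree walk
    return f"{parent_id}/{name}"


def create(dir_path: str, name: str, mode: int, uid: int) -> None:
    """Create a file with two independent puts; the parent directory record is not rewritten."""
    key = file_key(dir_path, name)
    kv[f"A:{key}"] = {"mode": mode, "uid": uid}  # access part (hypothetical fields)
    kv[f"C:{key}"] = {"size": 0, "mtime": 0}     # content part (hypothetical fields)


def chmod(dir_path: str, name: str, mode: int) -> None:
    """Permission change: reads and writes the access part only."""
    kv[f"A:{file_key(dir_path, name)}"]["mode"] = mode


def readdir(dir_path: str) -> list[str]:
    """List a directory by scanning for file keys that carry its id as a prefix."""
    prefix = "A:" + kv[f"D:{dir_path}"]["id"] + "/"
    return sorted(k[len(prefix):] for k in kv if k.startswith(prefix))


mkdir("/home")
create("/home", "a.txt", mode=0o644, uid=1000)
create("/home", "b.txt", mode=0o600, uid=1000)
chmod("/home", "a.txt", 0o400)
print(readdir("/home"))  # ['a.txt', 'b.txt']
```

Because each file record names its parent, a directory listing is recovered from the file keys rather than from a list stored in the directory, and an operation such as `chmod` touches one small value instead of the whole inode.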

The rest of this paper is organized as follows. Section 2 discusses the implication of directory structure in distributed file systems and the motivation of this paper. Section 3 describes the design and implementation of the proposed loosely-coupled metadata service, LocoFS. It is evaluated in Section 4. Related work is given in Section 5, and the conclusion is made in Section 6.

2 MOTIVATION

In this section, we first demonstrate the huge performance gap between distributed file system (DFS) metadata and key-value (KV) stores. We then explore the design of current DFS directory tree to identify the performance bottlenecks caused by the latency and scalability.

[Figure 1, plot not reproduced. Y-axis: File Creation (OPS), ticks 100k to 300k. X-axis: Number of Metadata Servers (1, 2, 4, 8, 16). Series: CephFS, Gluster, Lustre, IndexFS, Single-KV. Data labels: 14k, 17k, 29k, 48k, 92k, 260k; 95%, 93%, 89%, 82%, 65%.]

Figure 1: Performance Gap between File System Metadata (Lustre, CephFS and IndexFS) and KV Stores (Kyoto Cabinet (Tree DB)).

2.1 Performance Gap Between FS Metadata and KV Store

There are four major schemes for metadata management in distributed file systems: the single metadata server (single-MDS) scheme (e.g., HDFS), and multi-metadata server (multi-MDS) schemes with a hash-based scheme (e.g., Gluster [ ]), a directory-based scheme (e.g., CephFS [ ]), or a stripe-based scheme (e.g., Lustre DNE, Giga+ [ ]). Compared with the directory-based scheme, hash-based and stripe-based schemes achieve better scalability but sacrifice locality on a single node. One reason is that the multi-MDS schemes issue multiple requests to the MDS even if these requests are located in the same server. As shown in Figure 1, compared with the KV store, the file systems suffer IOPS degradation both with a single MDS on one node (95% IOPS degradation) and with multiple nodes (65% IOPS degradation on 16 nodes).
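The degradation percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the throughput labels printed with Figure 1 (14k, 17k, 29k, 48k, 92k) are file-creation rates at 1, 2, 4, 8 and 16 metadata servers and that 260k is the single-node KV store. This pairing is inferred from the order of the labels and is not stated in the text.

```python
# Data labels read off Figure 1; pairing them with server counts is an assumption.
SINGLE_KV_OPS = 260_000
fs_ops = {1: 14_000, 2: 17_000, 4: 29_000, 8: 48_000, 16: 92_000}


def degradation_pct(fs: int, kv: int = SINGLE_KV_OPS) -> int:
    """Throughput lost relative to the KV store, as a rounded percentage."""
    return round(100 * (1 - fs / kv))


degradation = {n: degradation_pct(ops) for n, ops in fs_ops.items()}
print(degradation)  # {1: 95, 2: 93, 4: 89, 8: 82, 16: 65}
```

Under this reading the computed values match the percentage labels in the figure, including the 95% figure for one node and the 65% figure for 16 nodes.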

From the gure 1 , we can also see that IndexFS, which stores

metadata using LevelDB, achieves an IOPS that is only 1. 6% of

LevelDB [

], when using one single server. To achieve the Kyoto

Cabinet’s performance on a single server, IndexFS needs to scale-out

to 32 servers. Therefore, there is still a large headroom to exploit the

performance benets of key-value stores in the le system metadata

service.

2.2 Problems with File System Directory Tree

We further study the file system directory tree structure to understand the underlying reasons for the huge performance gap. We find that the cross-server operations caused by strong dependencies among DFS metadata dramatically worsen the metadata performance. We identify two major problems, discussed in the following.

2.2.1 Long Locating Latency. Distributed file systems spread metadata to multiple servers to increase the metadata processing capacity. A metadata operation needs to communicate with multiple metadata servers, and this may lead to high latency of metadata operations. Figure 2 shows an example of a metadata operation in a distributed file system with distributed metadata service. In this example, inodes are distributed in servers n1 to n4. If there is a request to access file 6, the file system client has to access n1 first to read dir 0, and then read dir 1, 5 and 6 sequentially. When these
