Abstract
With the exponential growth of data volumes, traditional centralized storage can no
longer meet business needs. Unlike centralized storage, distributed storage pools the
storage resources of many commodity computers over the network into a single virtual
storage device that provides storage services through a storage interface over the
Internet. As a result, distributed storage offers good usability and scalability.
Ceph is a distributed storage system built on object storage. Because it uses object
storage, data processing is highly parallel, and Ceph can easily scale to the PB level by
adding commodity servers to the cluster. CRUSH (Controlled Replication Under Scalable
Hashing), one of its core algorithms, computes data storage locations dynamically, which
makes Ceph a system without a single point of failure.
However, the CRUSH algorithm does not take into account storage efficiency under
different network conditions. The Straw bucket type that CRUSH uses by default is poorly
suited to workloads with a large number of query operations, while the Tree bucket type
moves too much data when data is migrated. In view of these shortcomings, this paper
makes the following contributions:
1. By default, Ceph uses the primary-replica model for read and write operations, which
creates a bottleneck during write operations. In addition, a node's weight depends only
on its capacity and ignores the impact of network latency on storage performance. To
address this problem, this paper proposes a method for optimizing Ceph storage
performance based on network latency. When a node's network is in poor condition, the
method dynamically adjusts the node's weight to limit the amount of data flowing into
it, so that data is more likely to be placed on nodes with low network latency, which
improves system performance.
2. Nodes in a Ceph cluster must be in the same network segment, so when one server is
attacked, the other servers are likely to be compromised as well, and data security is
not effectively protected. Motivated by the need for secure data storage and the
protection of user privacy, this paper designs and provides an initial implementation of
a distributed storage system spanning multiple cloud centers. The system splits data
files into several blocks and stores them in multiple cloud centers, so that if one cloud
center is compromised, the data it holds cannot be restored into a complete file. To
improve data migration efficiency when a node fails, two storage topology construction
and disaster recovery schemes are studied, which improve data migration speed and reduce
the amount of data migrated, respectively:
(1) A two-tier topology construction scheme based on cloud centers.
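
The latency-aware weighting described in item 1 can be illustrated with a minimal Python sketch. It is not the method of this paper: the scaling rule in `adjusted_weight`, the `baseline_ms` parameter, and all function names are assumptions made for illustration, and node selection uses a simplified straw2-style draw rather than Ceph's actual CRUSH implementation.

```python
import hashlib
import math


def adjusted_weight(capacity_weight, latency_ms, baseline_ms=1.0):
    """Scale a capacity-based weight down when latency exceeds a baseline.

    Hypothetical rule: the weight is unchanged at or below baseline_ms and
    shrinks in inverse proportion to latency above it.
    """
    if latency_ms <= baseline_ms:
        return capacity_weight
    return capacity_weight * baseline_ms / latency_ms


def _unit_hash(object_id, node):
    """Map (object, node) to a deterministic value in (0, 1]."""
    digest = hashlib.sha256(f"{object_id}:{node}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / 2**64


def choose_node(object_id, weights):
    """Pick one node with a straw2-style draw: maximise ln(u) / weight."""
    best_node, best_draw = None, -math.inf
    for node, weight in weights.items():
        if weight <= 0:
            continue  # a node with no weight never receives data
        draw = math.log(_unit_hash(object_id, node)) / weight
        if draw > best_draw:
            best_node, best_draw = node, draw
    return best_node


def placement_counts(latencies_ms, capacity_weights, num_objects=10000):
    """Count how many objects land on each node after latency adjustment."""
    weights = {
        node: adjusted_weight(capacity_weights[node], latencies_ms[node])
        for node in capacity_weights
    }
    counts = {node: 0 for node in capacity_weights}
    for i in range(num_objects):
        counts[choose_node(f"obj-{i}", weights)] += 1
    return counts
```

With three nodes of equal capacity where one has four times the baseline latency, `placement_counts` sends noticeably fewer objects to the slow node, which is the qualitative effect the method aims for. Placement stays deterministic because each draw depends only on the object identifier, the node name, and the current weight.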