Original Foreign-Language Text
A Comparative Study of MongoDB and Document-Based MySQL
for Big Data Application Data Management
By: Győrödi Cornelia A., Dumşe-Burescu Diana V., Zmaranda Doina R., Győrödi Robert Ş.
Source: Big Data and Cognitive Computing, Volume 6, Issue 2, 2022, p. 49
1. Introduction
Currently, an explosion of data to be stored has been observed to originate from
social media, cloud computing services, and the Internet of Things (IoT). The term
“Internet of Things” actually refers to the combination of three distinct ideas: a large
number of “smart” objects, all connected to the Internet, with applications and
services using the data from these objects to create interactions. Nowadays, IoT
applications can be made very complex by using interdisciplinary approaches
and integrating several emerging technologies such as human–computer interaction,
machine learning, pattern recognition, and ubiquitous computing. Additionally,
several approaches and environments for carrying out analytics on clouds for Big
Data applications have appeared in recent years.
The widespread deployment of IoT drives the high growth of data, both in
quantity and category, thus leading to a need for the development of Big Data
applications. The large volume of data from IoT has three characteristics that conform
to the Big Data paradigm: (i) abundant terminals generate a large volume of data;
(ii) the data generated from IoT are usually semi-structured or unstructured; (iii) the
data from IoT are only useful when they are analyzed.
As the volume of data has increased exponentially, and applications must handle
millions of users simultaneously and process huge volumes of unstructured and
complex data sets, the relational database model shows serious limitations.
These limitations have led to the development of
non-relational databases, commonly known as NoSQL (Not Only SQL). These
huge, unstructured, and complex data sets, typically referred to as
Big Data, are characterized by large volume, velocity, and variety, and cannot be
managed efficiently with relational databases because of their static structure. For this
reason, software developers have begun to consider NoSQL data storage
solutions. In today’s context of Big Data, NoSQL databases have matured into an
infrastructure that is well adapted to the heavy demands of Big Data.
NoSQL databases are especially useful for accessing and
analyzing huge amounts of unstructured data or data that are stored remotely on
multiple virtual servers. A NoSQL database does not store information in the
traditional relational format. NoSQL databases are not built on tables and, in some
cases, they do not fully satisfy the properties of atomicity, consistency, isolation, and
durability (ACID). A feature that is common to almost all NoSQL databases is that
they handle individual items, which are identified by unique keys. Additionally, their
structures are flexible, in the sense that schemas are often relaxed or absent. A
classification based on data models has been proposed in [6,8]; it
groups NoSQL databases into four major families, each based on a different data
model: key–value stores (Redis, Riak, Amazon’s DynamoDB, and Project
Voldemort), column-oriented databases (HBase and Cassandra), document-based
databases (MongoDB, CouchDB, and the document-based MySQL), and graph
databases (Neo4j, OrientDB, and AllegroGraph). Of the several NoSQL databases
available today, this paper focuses on document-based databases, choosing
two well-known NoSQL databases, MongoDB and document-based MySQL, and
analyzing their behavior in terms of the performance of CRUD operations.
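The ideas above (items addressed by unique keys, relaxed schemas, and CRUD operations) can be illustrated with a minimal in-memory sketch. The class `DocumentCollectionSketch` and its method names are hypothetical and written only for illustration; they are not the MongoDB or MySQL document-store API, and nothing here reflects how either database stores data internally.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory stand-in for a document collection (assumption, not a real driver API).
class DocumentCollectionSketch {
    // Each document is a free-form field map stored under a unique key.
    private final Map<String, Map<String, Object>> documents = new LinkedHashMap<>();

    // Create: store a copy of the document and return its generated key.
    String insert(Map<String, Object> document) {
        String id = UUID.randomUUID().toString();
        documents.put(id, new HashMap<>(document));
        return id;
    }

    // Read: look a document up by its key and return a defensive copy.
    Optional<Map<String, Object>> findById(String id) {
        Map<String, Object> found = documents.get(id);
        return found == null ? Optional.empty() : Optional.of(new HashMap<>(found));
    }

    // Update: merge new fields into an existing document; no schema restricts the fields.
    boolean update(String id, Map<String, Object> changes) {
        Map<String, Object> existing = documents.get(id);
        if (existing == null) {
            return false;
        }
        existing.putAll(changes);
        return true;
    }

    // Delete: remove the document stored under the key.
    boolean delete(String id) {
        return documents.remove(id) != null;
    }

    int size() {
        return documents.size();
    }

    public static void main(String[] args) {
        DocumentCollectionSketch clients = new DocumentCollectionSketch();
        String id = clients.insert(Map.of("name", "Ana", "city", "Oradea"));
        clients.update(id, Map.of("phone", "0123")); // adds a field the document did not have
        System.out.println(clients.findById(id).get().size()); // prints 3
        clients.delete(id);
        System.out.println(clients.size()); // prints 0
    }
}
```

The update step adds a field that was not present at insertion, which is the kind of flexibility a relaxed schema allows.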
To carry out the performance analysis, a server application was developed and
is presented in this paper. The application serves as a backend for streamlining the
activity of small service providers, using the two document-based databases, MongoDB and
MySQL. The emphasis is on the query operations through which
the CRUD operations are performed and tested; the analysis covers the
response times of these operations for a data volume of up to 100,000 items.
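As a rough illustration of what measuring response times over a growing data volume can look like, the following sketch times an operation repeated once per item. The helper `timeMillis`, the in-memory list used as a stand-in for a database, and the smaller volumes are all assumptions made for this example; the measurement code used in the paper is not shown in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical timing helper (assumption; not the paper's measurement code).
class CrudTimingSketch {
    // Runs the operation once per item index and returns the elapsed wall-clock time in milliseconds.
    static double timeMillis(int itemCount, IntConsumer operation) {
        long start = System.nanoTime();
        for (int i = 0; i < itemCount; i++) {
            operation.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // 100,000 is the upper bound named in the text; the smaller volumes are assumed.
        int[] volumes = {1_000, 10_000, 100_000};
        for (int volume : volumes) {
            List<String> store = new ArrayList<>(); // in-memory stand-in for a database collection
            double insertMs = timeMillis(volume, i -> store.add("item-" + i));
            double readMs = timeMillis(volume, i -> store.get(i));
            System.out.printf("%d items: insert %.2f ms, read %.2f ms%n", volume, insertMs, readMs);
        }
    }
}
```

In a real test the lambdas would call the database through the application's service layer, so that the measured time includes the driver and network overhead the application actually experiences.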
The paper is organized as follows: Section 1 contains a short introduction
emphasizing the motivation of the paper; Section 2 gives a short
overview of the two databases’ features; and Section 3 reviews the
related work. The structure of the databases, the methods, and the testing architecture
used in this work are described in Section 4. The experimental results and their
analysis on the two databases in an application that uses large amounts of data are
presented in Section 5. The obtained results are discussed
in Section 6, followed by some conclusions in Section 7.
2. Overview of MongoDB and the Document-Based MySQL
MongoDB is the most popular NoSQL database, with a continuous and
steady rise in popularity since its launch. It is a cross-platform, open-source,
document-based NoSQL database written in C++ that is completely schema-free
and manages JSON-style documents. Improvements in each version, together with its flexible
structure, which can change quite often during development, allow it to provide automatic
scaling with high performance and availability. The document-based MySQL is not as
popular yet, as MySQL has provided a solution for non-relational databases only since
2018, starting with version 8.0. It has several similarities to, but also several
differences from, MongoDB in its model approach, as shown in Table 1.
The structure of both databases is especially suitable for flexible applications
whose structure is not fixed from the beginning and is expected to undergo
many changes along the way. When it comes to large volumes of data, on the order
of millions of items, any type of database may allow thousands of queries per second;
what defines efficiency is the way in which operations are managed and the optimizations
that come with the package, and both databases are optimized to operate on a large
volume of data. In MongoDB, access is based on the roles defined for each
user, whereas in document-based MySQL, access is achieved by defining a username and
password, benefiting from all of the security features available in MySQL. Both
databases are available free of charge and can be used to develop individual or
small projects at no extra cost. For large applications, MongoDB offers monthly or annual
subscriptions, which involve a cost of several thousand dollars.
For document-based MySQL, this is not specified.
In terms of security, both databases provide security mechanisms.
Document-based MySQL is a relatively new database, but it benefits from all the
security features offered by MySQL: encryption, auditing, authentication,
and firewalls. MongoDB, for its part, provides role-based authentication, encryption, and
TLS/SSL configuration for clients.
3. Related Work
Many studies have compared different relational
databases with NoSQL databases in terms of the implementation language, replication,
transactions, and scalability. One study provides an overview of the different NoSQL
databases in terms of the data model, query model, replication model, and
consistency model, without testing the CRUD operations performed upon them. Another
study outlined the differences between the MySQL relational database and
MongoDB, a NoSQL database, through their integration in an online platform and
then through various operations being performed in parallel by many users. The
tests performed highlighted the advantage of using MongoDB over relational MySQL,
concluding that the query times of the MongoDB
database were much lower than those of the relational MySQL.
Another study presents a comparative analysis between NoSQL databases
such as HBase, MongoDB, BigTable, and SimpleDB, and relational databases such as
MySQL, highlighting their limits in implementing a real application by performing
some tests on the databases, analyzing both simple and more complex queries. An
Open Archival Information System (OAIS) that exploits the NoSQL
column-oriented database Cassandra has also been presented. As a result of the tests
performed, its authors noticed that in an undistributed infrastructure, Cassandra does
not perform very well compared to MySQL. Additionally, a framework has been proposed
that aims at analyzing semi-structured data applications using the MongoDB database. The
proposed framework focuses on the key aspects needed for semi-structured data
analytics in terms of data collection, data parsing, and data prediction. A further paper
focused mainly on comparing the execution speed of write/insert and
update/read operations under different benchmark workloads for seven NoSQL
database engines: Redis, Memcached, Voldemort, OrientDB, MongoDB,
HBase, and Cassandra.
The Cassandra and MongoDB database systems have also been described in a
comparative study that tested both systems on various workloads. The
study tested read and write operations under progressively
increasing numbers of clients, in order to compare the two
solutions in terms of performance.
Another study performed a comparative analysis of the performance of three
non-relational databases, Redis, MongoDB, and Cassandra, by utilizing the YCSB
(Yahoo! Cloud Serving Benchmark) tool. The purpose of the analysis was to evaluate
the performance of the three databases when primarily performing
inserts, updates, scans, and reads, by creating and running six
workloads. YCSB is a tool that is available
under an open-source license, and it allows for the benchmarking and comparison of
multiple systems by creating “workloads”.
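To make the notion of a workload more concrete, the sketch below models one as a mix of operation proportions and draws operations at random according to that mix. This is a hypothetical illustration only: `WorkloadMixSketch`, its methods, and the 50/50 read/update mix are assumptions for this example and are not the YCSB API or the workloads used in the cited study.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

// Hypothetical workload model (assumption; this is not the YCSB API).
class WorkloadMixSketch {
    enum Operation { READ, UPDATE, INSERT, SCAN }

    private final Map<Operation, Double> proportions = new EnumMap<>(Operation.class);
    private final Random random;

    WorkloadMixSketch(long seed) {
        this.random = new Random(seed);
    }

    // Registers the relative share of requests that should be of the given type.
    WorkloadMixSketch with(Operation operation, double proportion) {
        proportions.put(operation, proportion);
        return this;
    }

    // Picks the next operation by walking the cumulative proportions.
    Operation next() {
        double total = 0;
        for (double p : proportions.values()) {
            total += p;
        }
        double draw = random.nextDouble() * total;
        double cumulative = 0;
        Operation last = null;
        for (Map.Entry<Operation, Double> entry : proportions.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (draw < cumulative) {
                return last;
            }
        }
        if (last == null) {
            throw new IllegalStateException("no operations configured");
        }
        return last; // fallback in case of floating-point rounding
    }

    // Counts how often each operation is chosen over a number of requests.
    Map<Operation, Integer> simulate(int requests) {
        Map<Operation, Integer> counts = new EnumMap<>(Operation.class);
        for (int i = 0; i < requests; i++) {
            counts.merge(next(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // An assumed mix of half reads and half updates.
        WorkloadMixSketch mix = new WorkloadMixSketch(42L)
                .with(Operation.READ, 0.5)
                .with(Operation.UPDATE, 0.5);
        System.out.println(mix.simulate(1_000));
    }
}
```

A benchmark run would replace the counting step with actual calls against each database and record the latency of every operation.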
An analysis of the security of the most popular open-source databases,
both relational and NoSQL, including MongoDB and MySQL, has also been
described. From a security point of view, both of these databases need to
be properly configured so as to significantly reduce the risks of data exposure and
intrusion.
Several comparisons between MongoDB and MySQL exist in the literature, most
of them focusing on relational MySQL and not on
document-based MySQL. For example, a login system project developed in the Python
programming language was used to analyze the performance of MongoDB and
relational MySQL, based on the data-fetching speed from the databases. That paper
analyzed the two databases to decide which type of database was
more suitable for a login-based system. Another paper presents information on
the advantages of NoSQL databases over relational databases in the
investigation of Big Data, by making a performance comparison of various queries
and commands in MongoDB and relational MySQL; it also presents the concepts of
NoSQL and relational databases, together with their limitations. Consequently,
although MongoDB has been addressed in many scientific papers, to our
knowledge, at the time of writing, no paper has focused directly on
comparing it with the document-based MySQL.
4. Method and Testing Architecture
For each database considered, an application was created in Java using IntelliJ
IDEA Community Edition (4 February 2020), with the aim of developing a server for the
processing and storage of data from the frontend. When creating the testing architecture
setup, it was considered very important to test the databases under
exactly the conditions imposed by an application similar to the one to be
developed, and not just by using their own tools (for MongoDB, the MongoDB
web interface or the Mongo shell). There are differences both in how
these tools are used and in the response times: operations that seem easy
and fast when tested directly may prove slower or more difficult to achieve in practice.
The two applications are identical in terms of structure, with both containing the
objects that we need and a service class for each object, annotated with @Service. In
addition to these classes, each application contains a class that holds a cron job
(a process by which a method can be called automatically and repeatedly at an interval
set by us, taking as a parameter a string that is composed of six fields separated by a