Acit49673 2020 9208849
Abstract – A reliable distributed data storage system based on the Redundant Residue Number System (RRNS) is developed. The structure of the system and the data splitting and recovery algorithms based on RRNS are described. The total processing time and the time spent on converting ASCII-encoded data into RRNS are studied for files of various sizes. The data recovery time is studied for the inverse transformation from RRNS to ASCII codes.

Keywords: data storage system, Redundant Residue Number System, correcting codes, cloud services.

I. INTRODUCTION

The ubiquitous adoption of Internet of Things (IoT) technologies in various fields has caused rapid growth of data that require secure and reliable storage. IoT is the basis for creating smart healthcare systems, urban planning, smart farming, smart homes and smart cities. The transition to Industry 4.0 and the development of cyber-physical systems are further domains that will generate big data [1, 2].

The smart city is one of the important sources that generate big data every day, from applications such as healthcare, energy management, traffic management, environmental monitoring, video surveillance, public transport, public administration, etc. [3, 4].

Nowadays, cloud storage services make it possible to store huge amounts of data, including IoT data, with different providers: Dropbox, OneDrive, Google Drive, Amazon S3, iCloud, SkyDrive, Mega, Box, IDrive, pCloud and others [5, 6].

The advantages of cloud storage are high availability, scalability, internet access and simplicity.

Along with the benefits of cloud storage, there are some unresolved issues, such as data privacy (fraudulent access to data by cloud storage providers), long-term data storage (termination of service) and increasing cost. Recently, the number of cyber-attacks on cloud storage providers has increased, with the aim of gaining unauthorized access to files and/or obtaining buybacks (ransom).

One of the key factors leading to data loss is the insufficient reliability of the storage system. Therefore, the development of a storage system that provides high data reliability and security is a relevant scientific task.

II. RELATED WORK

Many scientific papers deal with improving the reliability of data storage in cloud services and data centers. Leading companies such as Google, Amazon, Microsoft, Dropbox and others are looking for ways to address the issue of secure data storage.

RAID (Redundant Array of Independent Disks) technology is one of the approaches used to improve the reliability and performance of storage systems. RAID storage groups improve storage reliability and fault tolerance in comparison with single-drive storage systems and increase input/output performance [7].

A promising approach to improving the reliability of storage systems is the use of correcting codes. Error-correcting codes are widely used in wireless sensor networks and in mobile and satellite systems [8].

In [9], a new storage media FEC model using a locked convolutional encoder with the enhanced NTC-Viterbi decoder is proposed.

The authors of [10] propose an approach that enables cloud service clients to use several cloud service providers simultaneously for data storage. Only the customers have full control of their data and, in addition, if a provider suddenly disappears and/or is no longer available, the customers can continue accessing their data by reconstructing them from data fragments replicated with other cloud storage providers. The authors showed how file size and redundancy affect the performance of the proposed system for the following cloud services: Google Drive, Dropbox and Copy. However, the influence of the number and values of the moduli on the storage system efficiency is not shown in the paper.

The relevant problem in using Residue Number System correcting codes in storage systems is choosing the optimal number of moduli to provide the specified reliability with minimal redundancy.

III. AIM OF THE WORK

The aim of the work is to develop a secure distributed data storage system using cloud storage and correcting codes based on RRNS that enables data processing on the user's own or a leased server (cloud services), as well as the recovery of lost data if one of the selected resources fails.

IV. DATA STORAGE CHARACTERISTICS

Let us consider the fundamental characteristics of a data storage system [6].

Scalability. It must be possible to remove and add nodes without stopping the storage, while maintaining a uniform distribution of data and, as a consequence, of the load on individual nodes.

With the proposed secure distributed storage system, the scalability requirement can be addressed by increasing the amount of leased storage on each of the connected cloud
Authorized licensed use limited to: University College London. Downloaded on November 02,2020 at 10:10:01 UTC from IEEE Xplore. Restrictions apply.
storage facilities, and by replacing storage that offers less available space.

Reliability. A replication mechanism is commonly used to provide the necessary storage reliability: any information stored in the repository has a specified number of copies that can be located at a considerable distance from each other, so even if some equipment fails, the user still has access to the information.

In the proposed system, the reliability requirement is met by the use of correcting codes, which make it possible to recover lost or distorted data by introducing redundancy.

Productivity. It should be possible to store big data while maintaining high performance through uniform data distribution, which balances workloads, and through metadata organization, which accelerates data search in a distributed system.

In the proposed distributed storage system, the performance requirement is met by using multiple network repositories at the same time. Data are evenly distributed among the defined resources, which is more advantageous than storing all information on one resource and replicating it.

The storage structure consists of four levels (Fig. 1) [6].

[Fig. 1. Four-level storage structure. Labels recoverable from the figure: Web User Interface, Productivity, Multitenancy, Scalability; Internet, Access Protocol, Productivity, Public / Private / Hybrid; Control, Data Algorithm, Reliability, Security; Cloud storage space, Storage efficiency, Cost.]

The user interface middleware provides access to the stored information and facilitates communication between users and the computer. As shown in Fig. 1, the basic characteristics of this level are:
- productivity - the efficiency of resource usage;
- multitenancy - allows multiple users to work in a software environment at the same time;
- scalability - flexible resource sharing among the users.

Next is the communication level. The significant differences between cloud storage and traditional storage are the access tools and protocols. System performance depends on the type of network and the type of access.

The next level (according to Fig. 1), Data Algorithm, includes such functions as data addressing and fragmentation [11, 12]. The information is distributed over the accessible cloud repositories, recorded in databases and encoded with RNS codes.

The last level of cloud storage evaluates the available storage space and forecasts the need for more storage.

V. THE STRUCTURE AND ALGORITHM OF DATA DISTRIBUTION SYSTEM

The proposed distributed storage system is based on dividing data into pieces (fragments) using the RRNS transformation and storing the residues on different storage devices or cloud services [13, 14].

The proposed approach improves the reliability and security of data storage through the use of redundant RNS, which makes it possible to recover distorted or lost data fragments. Access to all fragments stored on different cloud services is required to obtain access to the data, which improves data security.

The use of distributed storage and virtualization approaches makes it possible to increase the amount of stored data, but it requires specialized software and hardware for the direct organization and management of storage.

The structure of the secure distributed storage system is presented in Fig. 2. The storage system works as follows: the user uploads the file to the server via the Web interface. The file is split into chunks, and the hash sum of each fragment is calculated on the server using special software. The resulting file fragments (residue-segments) are stored in the appropriate cloud services. The database stores the hash sums of the fragments (residue-segments) and the data needed to reconstruct the original file.

[Fig. 2 diagram. Labels recoverable from the figure: User, Web Interface, Server, Cloud storage 1, Cloud storage 2, ..., Cloud storage n.]

Fig. 2. Structure of the distributed storage system based on RRNS and cloud services

The data splitting algorithm based on RRNS is shown in Fig. 3.

The main steps that have to be executed are:

Step 1. Uploading the file and specifying the initial data for the program.

Step 2. Calculation of the resulting file length L.

In our research we consider a system with four moduli, one of which is the check modulus. Because ASCII symbols are used, the information range for 5 characters should be greater than 2.55255E+14. The following system of moduli satisfies this requirement: m = [64937, 64951, 64969, 64997]. The concatenated elements are stored in the array a[], and the variable RE specifies the number of elements that will be uploaded simultaneously.
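As an illustration of the moduli system m = [64937, 64951, 64969, 64997], the following Python sketch checks the information-range condition and performs the forward RRNS transformation and its inverse via the Chinese Remainder Theorem. It rests on assumptions that the text does not state explicitly: that a block of five characters is packed by concatenating the three-digit decimal codes of its bytes (inferred only from the 2.55255E+14 bound), that RE equals 5, and that the first three moduli carry the information while 64997 is the check modulus. The function names are hypothetical, and this is not the authors' implementation.

```python
from math import prod

# Moduli quoted in the text. Assumption: the first three carry the
# information and the last one (64997) is the check modulus.
MODULI = [64937, 64951, 64969, 64997]
INFO_RANGE = prod(MODULI[:3])
RE = 5          # assumed block size: five characters per packed value
WIDTH = 3 * RE  # three decimal digits per byte


def pack_block(block):
    """Assumed packing: concatenate the 3-digit decimal code of each byte."""
    if len(block) != RE:
        raise ValueError("expected a block of RE bytes")
    return int("".join(f"{b:03d}" for b in block))


def unpack_block(value):
    """Inverse of pack_block: split the decimal string back into bytes."""
    digits = f"{value:0{WIDTH}d}"
    return bytes(int(digits[i:i + 3]) for i in range(0, WIDTH, 3))


def to_rrns(value):
    """Forward transformation: one residue per modulus."""
    return [value % m for m in MODULI]


def crt(residues, moduli):
    """Chinese Remainder Theorem for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return x % total


# The largest packed block (five bytes of value 255) must fit the range.
assert pack_block(bytes([255] * RE)) < INFO_RANGE

value = pack_block(b"Hello")
residues = to_rrns(value)          # four residue fragments
restored = crt(residues, MODULI)   # inverse transformation
assert unpack_block(restored) == b"Hello"
```

Because the product of any three of these moduli is at least as large as the information range, `crt` applied to any three of the four residues returns the same packed value. This is a general property of redundant residue codes and is what allows one missing fragment to be tolerated in a sketch like this one.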
Step 5. Concatenation of RE bytes and appending the result to the array a[].

Step 6. Calculation of the residues.

Step 7. Calculation of the hash sums of the residue-segments.

The hash sums and the data necessary to reconstruct the original file are stored in the database.

The proposed approach reduces storage consumption. Consider an example: the original file size is 40 bits; after processing the data with the proposed algorithm we obtain 64 bits, whereas a traditional redundancy approach, in which multiple copies of the file are stored, gives 80 bits. Thus, for equal error tolerance, our approach reduces the storage size by a factor of about 1.25 (80 / 64).

File recovery is executed according to the algorithm shown in Fig. 4.

Fig. 4. Data recovery algorithm

Steps 4-6. Recovery of the original file. If the hash sums are identical, the file can be reconstructed without any additional processing (step 4); otherwise it can be reconstructed by means of the Chinese Remainder Theorem (step 6).

Step 5. If two or more residue-segments are distorted, the download of these residue-segments is repeated to eliminate the possibility that they were damaged during transmission. If the recovery is successful, the final array is written to the reconstructed original file.

VI. ESTIMATION OF THE DEVELOPED SYSTEM EFFICIENCY

To evaluate the developed system, several experiments were conducted with Microsoft Azure.

The time spent on converting ASCII-encoded data to RRNS (direct transformation) was investigated for different file sizes: 1, 10, 20, 30, 40 and 50 Mbytes (Fig. 5).
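A direct-transformation timing of the kind described in Section VI could be reproduced locally along the following lines. This is only a sketch under stated assumptions: the block packing (three decimal digits per byte, five bytes per block, zero-padding of the last block) and the names `convert_to_rrns` and `time_conversion` are hypothetical, the payload is synthetic, and only the CPU-side conversion is timed, not any transfer to a cloud service such as the Microsoft Azure setup used by the authors.

```python
import time

MODULI = (64937, 64951, 64969, 64997)  # moduli quoted in Section V
RE = 5                                  # assumed number of bytes per block


def convert_to_rrns(data):
    """Return one residue-segment (list of residues) per modulus."""
    segments = [[] for _ in MODULI]
    for start in range(0, len(data), RE):
        # Assumption: the last block is zero-padded to RE bytes.
        block = data[start:start + RE].ljust(RE, b"\0")
        # Assumed packing: 3-digit decimal code per byte, concatenated.
        value = int("".join(f"{b:03d}" for b in block))
        for segment, m in zip(segments, MODULI):
            segment.append(value % m)
    return segments


def time_conversion(size_bytes):
    """Time the conversion of a synthetic ASCII payload, in seconds."""
    data = bytes(i % 128 for i in range(size_bytes))
    start = time.perf_counter()
    convert_to_rrns(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    for size_kb in (1, 10, 100):  # small sizes chosen only for illustration
        print(size_kb, "KB:", time_conversion(size_kb * 1024), "s")
```

Timings obtained this way depend on the machine and the interpreter and are not comparable with the figures reported in the paper; the sketch only separates the conversion step from the remaining operations.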
[Fig. 6 plot: time (s) versus file volume (MB) for files of 0.97, 10.1, 21.81, 30.32, 39.12 and 48.35 MB; series: "Convert to RNS" and "Total time".]

Fig. 6. Dependency between file size and processing time (inverse transformation)

As a result, the time spent on converting residue-segments into ASCII-encoded data increases as the file size grows.

VII. CONCLUSIONS

A reliable distributed system for storing residue segments in the cloud based on RRNS is developed and investigated in the paper. The proposed approach improves the reliability and enhances the data security of storage systems owing to the RRNS. The user is able to reconstruct the original file only if he/she has access to all residue segments stored on different cloud services.

Experimental studies have demonstrated that increasing the file size significantly affects the time spent on converting ASCII-encoded data to RRNS, while the time spent on operations unrelated to the conversion is 12-26 s and almost independent of file size. As the file size grew to 50 MB, the time spent on converting RRNS to ASCII-encoded data increased to 180 s.

The developed system provides high reliability of data storage due to the possibility of data recovery in case of failure

REFERENCES

https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/sw/raid/configuration/guide/RAID_GUIDE/IntroToRAID.html. [Accessed: 30-Jan-2020].
[8] Hassan, K., Michael, K., & Mrutu, S. I. Forward error correction for storage media: An overview. International Journal of Computer Science and Information Security (IJCSIS), Vol. 13, No. 12, December 2015, pp. 32-40.
[9] Villari, M., Celesti, A., Fazio, M., & Puliafito, A. Evaluating a file fragmentation system for multi-provider cloud storage. Scalable Computing: Practice and Experience, 14(4), 2013, pp. 265-277.
[10] Celesti, A., Fazio, M., Villari, M., & Puliafito, A. Adding long-term availability, obfuscation, and encryption to multi-cloud storage systems. Journal of Network and Computer Applications, 59, 2016, pp. 208-218.
[11] A. Drozd, S. Antoshchuk, J. Drozd, K. Zashcholkin, M. Drozd, N. Kuznietsov, M. Al-Dhabi, V. Nikul, “Checkable FPGA Design: Energy Consumption, Throughput and Trustworthiness,” in book: Green IT Engineering: Social, Business and Industrial Applications, Studies in Systems, Decision and Control, V. Kharchenko, Y. Kondratenko, J. Kacprzyk (Eds.), Vol. 171. Berlin, Heidelberg: Springer International Publishing, pp. 73-94, 2018.
[12] O. Drozd, V. Kharchenko, A. Rucinski, T. Kochanski, R. Garbos, D. Maevsky, “Development of Models in Resilient Computing,” Proc. of 10th IEEE International Conference on Dependable Systems, Services and Technologies (DESSERT’2019), Leeds, UK, June 5-7 2019, pp. 2-7. DOI: 10.1109/DESSERT.2019.8770035
[13] Yatskiv V., Tsavolyk T., Yatskiv N. The Correcting Codes Formation Method Based on the Residue Number System. Conference Proceedings of 14th International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM-2017), 21-25 February 2017, Polyana-Svalyava, Ukraine, 2017, pp. 237-240.