Acit49673 2020 9208849


Protected Distributed Data Storage Based on Residue Number System and Cloud Services


Vasyl Yatskiv
Department of Cyber Security
Ternopil National Economic University
Ternopil, Ukraine
vy@tneu.edu.ua

Serhii Kulyna
Department of Cyber Security
Ternopil National Economic University
Ternopil, Ukraine
sersks@tneu.edu.ua

Nataliya Yatskiv
Department for Information and Computing Systems and Control
Ternopil National Economic University
Ternopil, Ukraine
jatskiv@ukr.net

Halyna Kulyna
Department of financial management and insurance
Ternopil National Economic University
Ternopil, Ukraine
gkulyna@gmail.com

Abstract – A reliable distributed data storage system based on the Redundant Residue Number System (RRNS) is developed. The structure of the system and the RRNS-based data splitting and recovery algorithms are presented. The total processing time and the time spent on converting ASCII-encoded data into RRNS are studied for files of various sizes. The data recovery time is studied for the inverse transformation from RRNS to ASCII codes.

Keywords: data storage system, Redundant Residue Number System, correcting codes, cloud services.

I. INTRODUCTION

The ubiquitous implementation of Internet of Things (IoT) technologies in various fields has caused rapid growth of data that require secure and reliable storage. IoT is the basis for creating smart healthcare systems, urban planning, smart farming, smart homes and smart cities. The transition to Industry 4.0 and the development of cyber-physical systems are other domains that will generate big data [1, 2].

The smart city is one of the important sources that generate big data every day, from applications such as healthcare, energy management, traffic management, environment monitoring, video surveillance, public transport, public administration, etc. [3, 4].

Nowadays, cloud storage services make it possible to store huge amounts of data, including IoT data, with different providers: Dropbox, OneDrive, Google Drive, Amazon S3, iCloud, SkyDrive, Mega, Box, IDrive, pCloud and others [5, 6].

The advantages of cloud storage are high availability, scalability, internet access and simplicity.

Along with the benefits of cloud storage, there are some unresolved issues, such as data privacy (fraudulent access to data by cloud storage providers), long-term data storage (termination of service), and rising costs. Recently, the number of cyber-attacks on cloud storage providers aimed at gaining unauthorized access to files and/or at ransom payments has increased.

One of the key factors leading to data loss is the insufficient reliability of the storage system. Therefore, the development of a storage system that provides high data reliability and security is a relevant scientific task.

II. RELATED WORK

Many scientific papers deal with improving the reliability of data storage in cloud services and data centers. Leading companies such as Google, Amazon, Microsoft, Dropbox and others are looking for ways to address the issue of secure data storage.

RAID (Redundant Array of Independent Disks) technology is one of the approaches used to improve the reliability and performance of storage systems. RAID storage groups improve storage reliability and failure tolerance in comparison with single-drive storage systems and increase input/output performance [7].

A promising approach to improving the reliability of storage systems is the use of correcting codes. Error correcting codes are widely used in wireless sensor networks, mobile and satellite systems [8].

In [9] a new storage media FEC model using a locked convolutional encoder with the enhanced NTC-Viterbi decoder is proposed.

[10] proposes an approach that enables cloud service clients to use different cloud service providers simultaneously for data storage. In this approach only the customers have full control of their data; in addition, if a provider suddenly disappears or is no longer available, the customers can continue accessing their data by reconstructing them from data fragments replicated at other cloud storage providers. The authors showed how file size and redundancy affect the performance of the proposed system for the following cloud services: Google Drive, Dropbox, and Copy. However, the influence of the number and values of the moduli on the storage system efficiency is not shown in the paper.

The relevant problem in using Residue Number System correcting codes in storage systems is to choose the optimal number of moduli that provides the specified reliability with minimal redundancy.

III. AIM OF THE WORK

The aim of the work is to develop a secure distributed data storage system using cloud storage and correcting codes based on RRNS, which enables data processing on the user's own or a leased server (cloud services), as well as recovery of lost data if one of the selected resources fails.

IV. DATA STORAGE CHARACTERISTICS

Let us consider the fundamental characteristics of a data storage system [6].

Scalability. It must be possible to remove and add nodes without stopping the storage, while maintaining the uniformity of data distribution and, as a consequence, of the load on individual nodes.

With the proposed secure distributed storage system, the scalability requirement can be addressed by increasing the amount of leased storage on each of the connected cloud

978-1-7281-6760-2/20/$31.00 ©2020 IEEE 796

Authorized licensed use limited to: University College London. Downloaded on November 02,2020 at 10:10:01 UTC from IEEE Xplore. Restrictions apply.
storage facilities, and by replacing less available storage.

Reliability. A replication mechanism is used to provide the necessary storage reliability. In this case every piece of information in the repository has a specified number of copies, which can be located at a considerable distance from each other, so even if some equipment fails, the user will still have access to the information.

In the proposed system the reliability requirement is met by the use of correcting codes, which make it possible to recover lost or distorted data by introducing redundancy.

Productivity. It should be possible to store big data while maintaining high performance. This is achieved by uniform data distribution, which balances workloads, and by metadata organization, which accelerates data search in a distributed system.

In the proposed distributed storage system, the performance requirement is met by the use of multiple network repositories at the same time. Data are evenly distributed among the defined resources, which is more advantageous than storing all information on one resource and replicating it.

The storage structure consists of four levels (Fig. 1) [6].

[Figure omitted. Labels recoverable from the figure: User Interface, Access Protocol, Control, Data Algorithm, Cloud storage space; productivity, multitenancy, scalability, Internet, public / private / hybrid, reliability, security, storage efficiency, cost.]

Fig. 1 Data storage system architecture

The user interface middleware provides access to the stored information and facilitates communication between users and the computer. As shown in Fig. 1, the basic characteristics of this level are:
- productivity - the efficiency of resource usage;
- multitenancy - allows multiple users to work in a software environment at the same time;
- scalability - flexible resource sharing among the users.

Let us consider the communication level. The significant differences between cloud storage and traditional storage are the access tools and protocols. System performance depends on the type of network and the type of access.

The next level (according to Fig. 1), Data Algorithm, includes such functions as data addressing and fragmentation [11, 12]. The information is distributed over the accessible cloud repositories, recorded in databases and encoded with the RNS codes.

The last level of cloud storage evaluates the available storage space and forecasts the need for more storage.

V. THE STRUCTURE AND ALGORITHM OF DATA DISTRIBUTION SYSTEM

The proposed distributed storage system is based on dividing data into pieces (fragments) using the RRNS transformation and storing the residues on different storage devices or cloud services [13, 14].

The proposed approach improves the reliability and security of data storage through the use of a redundant RNS, which makes it possible to recover distorted or lost data fragments. Access to the data requires access to the fragments stored on the different cloud services, which improves data security.

The use of distributed storage and virtualization approaches makes it possible to increase the amount of stored data, but it requires specialized software and hardware for the direct organization and management of storage. The structure of the secure distributed storage system is presented in Fig. 2. The storage system works as follows: the user uploads the file to the server via the Web interface. The file is split into chunks and the hash sum of each fragment is calculated on the server using special software. The resulting file fragments (residue-segments) are stored in the appropriate cloud service. The database stores the hash sums of the fragments (residue-segments) and the data needed to reconstruct the original file.

[Figure omitted. Labels recoverable from the figure: User, Web Interface, Internet, Server, Database, Cloud storage 1, Cloud storage 2, ..., Cloud storage n.]

Fig. 2 Structure of the distributed storage system based on RRNS and cloud services

The data splitting algorithm based on RRNS is shown in Fig. 3.

The main steps that have to be executed are:

Step 1. Uploading the file and specifying the initial data for the program.

Step 2. Calculating the resulting file length L.

In our research we consider a system with four moduli, one of which is the check modulus. The information range for 5 characters should be greater than 2.55255E+14 because of the use of ASCII symbols. The following system of moduli satisfies this requirement: m = [64937, 64951, 64969, 64997]. The concatenated elements are stored in the array a[], and the variable RE specifies the number of elements that will be uploaded simultaneously.
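The range requirement for the moduli can be checked numerically. The sketch below is illustrative only and is not the authors' code. It assumes that a block of five characters becomes one integer by concatenating the three-digit decimal codes of its bytes (which would explain the bound 2.55255E+14, i.e. 255255255255255), that a short block is padded with trailing zero bytes, and that the last modulus in the list is the check modulus. The names `block_to_int` and `to_residues` are hypothetical.

```python
from itertools import combinations
from math import gcd, prod

# Moduli quoted in the text; treating the last one as the check modulus
# is an assumption made for this sketch.
MODULI = [64937, 64951, 64969, 64997]
INFO_MODULI = MODULI[:3]
BLOCK = 5  # characters per block, as in the text


def block_to_int(block: bytes) -> int:
    """Concatenate the 3-digit decimal codes of the bytes (assumed scheme)."""
    padded = block.ljust(BLOCK, b"\x00")  # zero-pad a short final block
    return int("".join(f"{b:03d}" for b in padded))


def to_residues(value: int) -> list:
    """Forward conversion: one residue per modulus."""
    return [value % m for m in MODULI]


# Largest possible block value: five bytes equal to 255.
max_value = block_to_int(b"\xff" * BLOCK)

# The product of the information moduli must exceed that value.
info_range = prod(INFO_MODULI)
range_ok = info_range > max_value

# The moduli must be pairwise coprime for the representation to be unique.
coprime_ok = all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))

residues = to_residues(block_to_int(b"Hello"))
```

Under these assumptions every residue is smaller than 2^16, so the four residues of a 40-bit block fit into 4 x 16 = 64 bits.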

Fig. 3 File splitting algorithm

Step 3. Concatenation starts if the file contains more than 5 bytes; otherwise it is necessary to calculate how many zeros have to be added to obtain 5 bytes (step 4).

Step 5. Concatenation of RE bytes and adding the result to the array a[].

Step 6. Calculation of the residues.

Step 7. Calculation of the hash sums of the residue-segments.

The hash sums and the data necessary to reconstruct the original file are stored in the database.

The proposed approach makes it possible to reduce storage consumption.

Consider an example: the original file size is 40 bits. After processing the data with the proposed algorithm we obtain 64 bits, whereas a traditional redundancy approach, in which multiple copies of the file are stored, yields 80 bits. Thus, for equal error tolerance, our approach reduces the storage size by a factor of about 1.25 (80 / 64 = 1.25).

File recovery is executed according to the algorithm shown in Fig. 4.

Fig. 4 Data recovery algorithm

Step 1. Downloading the residue-segments.

Step 2. Calculation of the hash sum of each residue-segment.

Step 3. Comparison of the calculated hash with the corresponding hash stored in the local database.

Steps 4-6. Recovery of the original file. If the hash sums are identical, the file can be reconstructed without any additional processing (step 4); otherwise it can be reconstructed by means of the Chinese Remainder Theorem (step 6).

Step 5. If two or more residue-segments are distorted, the download of these residue-segments is repeated to rule out damage during transmission. If recovery is successful, the final array is written to the reconstructed original file.

VI. ESTIMATION OF THE DEVELOPED SYSTEM EFFICIENCY

To evaluate the developed system, several experiments were conducted with Microsoft Azure.

The time spent on converting ASCII-encoded data to RRNS (direct transformation) was investigated for different file sizes: 1, 10, 20, 30, 40, 50 Mbytes (Fig. 5).
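The recovery steps of Fig. 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SHA-256 stands in for the unspecified hash function, a hash is kept per residue rather than per residue-segment, a residue whose hash does not match is simply discarded, and a block value is assumed to be the decimal concatenation of five three-digit byte codes. The names `crt`, `digest`, `recover` and `int_to_block` are hypothetical.

```python
import hashlib
from math import prod

MODULI = [64937, 64951, 64969, 64997]  # moduli quoted in the paper


def crt(residues, moduli):
    """Chinese Remainder Theorem for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


def digest(residue):
    """Hash of one residue; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(str(residue).encode()).hexdigest()


def recover(residues, stored_hashes):
    """Rebuild the block value from the residues whose hash matches.

    A lost residue is passed as None. Any three intact residues suffice.
    """
    good = [i for i, r in enumerate(residues)
            if r is not None and digest(r) == stored_hashes[i]]
    if len(good) < 3:
        # Corresponds to step 5: too many damaged segments, download again.
        raise ValueError("fewer than three intact residues")
    use = good[:3]
    return crt([residues[i] for i in use], [MODULI[i] for i in use])


def int_to_block(value):
    """Inverse of the assumed 3-digit decimal concatenation."""
    digits = f"{value:015d}"
    return bytes(int(digits[i:i + 3]) for i in range(0, 15, 3))


# Usage: encode one block, lose the residue of the second cloud service,
# and still recover the original bytes.
value = 72101108108111  # assumed encoding of b"Hello"
residues = [value % m for m in MODULI]
hashes = [digest(r) for r in residues]
lost = [residues[0], None, residues[2], residues[3]]
restored = int_to_block(recover(lost, hashes))
```

Because the product of any three of the four moduli exceeds the largest block value, any three intact residues determine the block, which is why the failure of a single cloud service can be tolerated in this sketch.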

[Plot omitted. Axes: Volume, Mb (0.97, 10.1, 21.81, 30.32, 39.12, 48.35) against Time, s (0 to 160). Series: Convert to RNS, Total time.]

Fig. 5 Dependency between file size and processing time (direct transformation)

Figure 5 shows that the time spent on operations not associated with the RNS transformation is 12-26 s and is almost independent of the file size.

The total processing time and the time spent on converting RRNS to ASCII-encoded data (inverse transformation) were investigated for different file sizes (Fig. 6).

[Plot omitted. Axes: Volume, Mb (0.97, 10.1, 21.81, 30.32, 39.12, 48.35) against Time, s (0 to 200). Series: Convert to RNS, Total time.]

Fig. 6 Dependency between file size and processing time (inverse transformation)

As a result, the time spent on converting residue-segments into ASCII-encoded data increases with the growth of the file size.

VII. CONCLUSIONS

A reliable distributed system for storing residue-segments in the cloud based on RRNS is developed and investigated in the paper. The proposed approach improves the reliability and enhances the data security of storage systems through the use of RRNS. The user is able to reconstruct the original file only if he/she has access to the residue-segments stored on the different cloud services.

Experimental studies have demonstrated that increasing the file size significantly affects the time spent on converting ASCII-encoded data to RRNS, while the time spent on operations unrelated to the conversion is 12-26 s and is almost independent of the file size.

As the file size grew to 50 MB, the time spent on converting RRNS to ASCII-encoded data increased to 180 s.

The developed system provides high reliability of data storage, because data can be recovered if one of the cloud services fails: all residue-segments are stored on different cloud services. It also provides strong protection of data from unauthorized access: the residue-segments calculated for one modulus are stored on one cloud service, which makes it impossible for the service provider, or for other persons who have obtained unauthorized access to that cloud service, to reconstruct the whole file.

REFERENCES

[1] Chamoso, P., González-Briones, A., Rodríguez, S., & Corchado, J. M. Tendencies of technologies and platforms in smart cities: A state-of-the-art review. Wireless Communications and Mobile Computing, 2018, pp. 1-17.
[2] Ansari, S., Aslam, T., Poncela, J., Otero, P., & Ansari, A. Internet of Things-Based Healthcare Applications. In IoT Architectures, Models, and Platforms for Smart City Applications, IGI Global, 2020, pp. 1-28.
[3] Salam, A., Shah, S. Internet of things in smart agriculture: Enabling technologies. In 2019 IEEE 5th World Forum on Internet of Things (WF-IoT), pp. 692-695.
[4] González-Briones, A., Chamoso, P., Casado-Vara, R., Rivas, A., Omatu, S., & Corchado, J. M. Internet of things platform to encourage recycling in a smart city. 2019, pp. 1-10.
[5] Amazon Simple Storage Service (Amazon S3). [Online]. http://aws.amazon.com/s3/. [Accessed: 30-Jan-2020].
[6] Jones M. T. “Anatomy of a cloud storage infrastructure,” IBM developerWorks (November 30, 2010), 2010. [Online]. Available: https://www.ibm.com/developerworks/cloud/library/cl-cloudstorage/. [Accessed: 30-Jan-2020].
[7] Cisco UCS Servers RAID Guide. [Online]. https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/sw/raid/configuration/guide/RAID_GUIDE/IntroToRAID.html. [Accessed: 30-Jan-2020].
[8] Hassan, K., Michael, K., & Mrutu, S. I. Forward error correction for storage media: An overview. International Journal of Computer Science and Information Security (IJCSIS), Vol. 13, No. 12, December 2015, pp. 32-40.
[9] Villari, M., Celesti, A., Fazio, M., & Puliafito, A. Evaluating a file fragmentation system for multi-provider cloud storage. Scalable Computing: Practice and Experience, 14(4), 2013, pp. 265-277.
[10] Celesti, A., Fazio, M., Villari, M., & Puliafito, A. Adding long-term availability, obfuscation, and encryption to multi-cloud storage systems. Journal of Network and Computer Applications, 59, 2016, pp. 208-218.
[11] A. Drozd, S. Antoshchuk, J. Drozd, K. Zashcholkin, M. Drozd, N. Kuznietsov, M. Al-Dhabi, V. Nikul, “Checkable FPGA Design: Energy Consumption, Throughput and Trustworthiness,” in book: Green IT Engineering: Social, Business and Industrial Applications, Studies in Systems, Decision and Control, V. Kharchenko, Y. Kondratenko, J. Kacprzyk (Edits), Vol. 171. Berlin, Heidelberg: Springer International Publishing, pp. 73-94, 2018.
[12] O. Drozd, V. Kharchenko, A. Rucinski, T. Kochanski, R. Garbos, D. Maevsky, “Development of Models in Resilient Computing,” Proc. of 10th IEEE International Conference on Dependable Systems, Services and Technologies (DESSERT’2019), Leeds, UK, June 5-7 2019, pp. 2-7. DOI: 10.1109/DESSERT.2019.8770035
[13] Yatskiv V., Tsavolyk T., Yatskiv N. The Correcting Codes Formation Method Based on the Residue Number System. Conference Proceedings of 14th International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM-2017), 21-25 February 2017, Polyana-Svalyava, Ukraine, 2017, pp. 237-240.
