A_low_cost_and_fast_controller_architecture_for_mu
DOI 10.1186/s13639-016-0060-8
EURASIP Journal on
Embedded Systems
Abstract
Real-time multimedia data access plays an important role in electronic systems; as time goes by, with decrease in
data processing speed and increase in communication time, storage time, and retrieval time, the overall response
time increases for real-time applications. Therefore, in this paper, a novel real-time, fast, low-cost, system-on-chip
(SoC) controller is proposed and implemented, with which large volumes of data can be efficiently stored to and
retrieved from flash memory cards. It is implemented using only a hardware description language (HDL) on a field
programmable gate array (FPGA) chip, without any other on-board or external hardware resources or high-level
languages. The entire controller architecture, in a single chip, contains five modules and is designed using a
finite state machine (FSM)-based approach. The modules are the card initialization module (CINM), idle module (IM),
card read module (CRM), card write module (CWM), and decision module (DM). The architecture is completely synthesized
for the Spartan 3E xc3s500e-4-fg320 FPGA with only 5% of the total logic utilization. Experimental results for
microSD, SD, and SDHC cards of different sizes show that the architecture uses less hardware and fewer clock
cycles for card initialization and single/multiblock read/write procedures.
Keywords: Flash memory read/write, Secure Digital High Capacity (SDHC) card, MicroSD card, Serial peripheral
interface (SPI), Finite state machine (FSM), Very high speed integrated circuit hardware description language (VHDL),
Field programmable gate array (FPGA)
© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
Banerjee and Mukhopadhyay EURASIP Journal on Embedded Systems (2016) 2016:24 Page 2 of 26
Fig. 1 Thematic model of a data archival system (figure not reproduced; labels: Input, Output, Data From External
World, Data To External World, Buffer, PISO, SIPO, Flash Memory Controller, SCLK, CS, MOSI, MISO, Vcc, GND, Power
Lines, Flash Based Memory Device)
removed from the system where it is presently housed in and accessed by some other system to retrieve the desired
signal.
The flash memory is very useful in fields where data transportation and archival are key requirements. The memory
can be used as a data concentrator (Fig. 2), where the proposed controller architecture along with the flash memory
can serve as the removable memory of any data concentrator network device. Flash memory has extensive
applications in databases [4], networking [5], biomedical applications [6, 7], virtualized storage systems [8], cloud
computing [9], geographic remote sensing [10], mobile devices [11–13], etc. It can be used in a router to store the
routing table for further access. NAND-based flash systems are also widely employed as cache in virtualized
systems [8]. In such application scenarios, the efficiency of single and multiblock data transfer is very important,
as it affects the input-output operations per second (IOPS) measure of the storage system. Generally, we wish to
maximize this metric with respect to different types of flash, as it indicates the measure of flash utilization. As
stated, a flash-based system can be used as a cache, as a data concentrator, or to cater to any such storage
requirement. However, in this paper we do not analyze the pros and cons of such utilization of flash in detail, as the
work mainly concentrates on the efficient implementation of a controller for single or multiblock data transfer with
respect to flash memory. The implementation may be exploited in any kind of flash resource utilization and will
ultimately contribute to the calculation of the metric of memory resource utilization. Therefore, the implementation
of an efficient flash-based data transfer is a fundamental driving factor in the improvement of flash resource
utilization, and the paper focuses on that rudimentary aspect.
Flash-based cards like microSD, SD, or SDHC cards work in two different bus modes: the Secure Digital (SD) bus
mode and the Serial Peripheral Interface (SPI) bus mode. The SPI mode is a synchronous serial protocol with less
complexity. It is extremely popular for interfacing peripheral devices, and no native-host interface is needed for
this mode. For its simplicity and usefulness in low-cost embedded system applications, we have considered designing
an entirely on-board hardware-based controller for smooth realization of an SPI bus mode-based data transfer
protocol to communicate with the flash-based memory card.
To date, we find that limited research work describes the design of a data archival system and its subsequent
implementation using HDL [1]. In some research work, an SPI mode-based data communication system has been
implemented. Another work was proposed in the literature where the SDHC card had been used in SD mode (i.e., bulk
data transfer mode) for video signal storage and processing [14]. With the newly emerging technologies, flash-based
memory devices have been used as efficient storage units, and they continue to meet the need for memory storage
even in the modern era of technological advancement [2, 3, 15–23].
In light of the above, this paper proposes a novel, real-time, low-cost, system-on-chip, application-specific
controller for multimedia data storage to and retrieval from flash-based memory cards. The architecture of this
real-time controller has been designed using an FSM-based approach. The HDL used here is very high speed
integrated circuit hardware description language (VHDL). Also, the design is such that there is no use of any
on-chip general purpose processor (GPP), external controller, hardware resources, or any high-level languages during
the operation. The prototype has been entirely implemented on the target FPGA board. The physical attributes of
an FPGA chip, being compact in size and low in power consumption, make it an ideal platform for the implementation.
Also, we have tried to exploit the parallel processing capability of the FPGA during design and implementation of
the various modules. To date, to the authors’ knowledge, an optimal-in-hardware implementation of such an
application-specific controller and the study of its various modules have not been explored in detail by earlier
research. In this work, the proposed controller has been examined for both audio and video data storage and
retrieval separately. Also, to test the importance of the work with respect to practical workloads [24], we have
collected datasets from MSR Cambridge Traces [25], SNIA Iotta Repository [26], and UMass Trace Repository [27] and
tried to establish the importance of the controller with respect to the flash read-write procedure.
In a nutshell, the objectives of this paper are as follows:
1. To design a modular, low-cost, system-on-chip, application-specific controller for multimedia data storage to
and retrieval from flash memory cards like SD, SDHC, and microSD cards in SPI mode with less overhead.
2. The design is completely FSM based, and the controller has been primarily realized using five different modules
and a control unit. These five modules, along with the control unit, are the different functional areas of the
proposed system, which is implemented completely in a single chip.
3. When the card is in idle state, the system has an option of working in power saving mode.
4. The controller will work in real time, in modular fashion, and the implementation is on a single FPGA. The
proposed design tries to utilize the parallel processing capability of the FPGA. No other external devices or
on-card intelligent controller has been used for this implementation. Here, the prototype has been completely
synthesized and tested for the Spartan 3E xc3s500e-4-fg320 FPGA.
5. The architecture has been designed using HDL only. In this paper, we have considered VHDL for implementing the
controller. No high-level languages were used for this design. This FSM-based design using VHDL is one of the basic
features of the proposed controller, which makes it faster than any other controller.
6. It is completely a prototype design of the proposed controller; the design can be implemented on any other
platform instead of the Spartan 3E target device (even using an ASIC) with a very minor modification in the
configuration part of the card implementation. There will be no change in the design phase.
The rest of the paper is organized as follows. Section 2 introduces the related works and highlights the novelty
of the proposed approach. Section 3 describes the proposed FPGA-based controller, its architecture, execution
process, and overall operation of the controller. Section 4 presents the hardware-specific implementation and
synthesis details for the target Xilinx Spartan-3E (xc3s500e-4-fg320) FPGA development platform. Experimental results
are described in Section 5 and Section 6 concludes the paper.

2 Review of the related works
FPGAs have been used for prototype design in a range of engineering applications [1, 14–19]; however, to date, to
the authors’ knowledge, the design of a complete application-specific controller for different flash memory card
access, with a detailed description of the modules and their operation, is limited. Table 1 depicts some of the
earlier work in this domain.

Table 1 Existing papers on SD card controller design
Work | Controller design | Platform model
[1] | Data archival to SD card using HDL | Altera Cyclone II
[14] | Design of SDHC card video player on SoPC | NIOS II CPU with IDCT hardware acceleration IP core
[15] | Simultaneous multi-channel data acquisition system | Altera Cyclone II
[16] | NAND flash memory controller for SD/MMC | Freescale DSP 56858 platform with UMC 0.18 μm CMOS process
[17] | Portable analog data capture using custom processing | WOLFSON WM8731 ADC, NIOS-II processor
[18] | A high efficient flash storage system for two-way cable modem | TWCNP-OS
Proposed | FSM-based application-specific controller using HDL | Xilinx Spartan 3E xc3s500e-4-fg320

Elkeelany et al. [1] proposed an FPGA-based data archival system to SD card, using Verilog HDL, and they
accessed the card in SPI mode. Scalability issues have not been addressed in this design. They have partially applied
an FSM-based approach, and the implementation issues of different SD cards have not been discussed explicitly in
this paper.
Yang et al. [14] presented the SDHC card video player based on SoPC technology. The IP core and two display
buffer SRAMs were alternately utilized for their proposed design. They have accessed the SDHC card in SD mode
for bulk data transfer. The proposed design has been implemented using a high-level language.
In another work, Abdallah and Elkeelany [15] proposed an FPGA-based simultaneous multi-channel data
acquisition system, and they verified the proposed architecture for analog signals. The design includes an
analog-to-digital converter to convert the analog signal to digital data. The time-critical tasks were implemented in
hardware, while the other tasks were implemented using embedded C. The use of the high-level language in this
paper makes the system slower with additional overhead.
Lin and Dung [16] proposed a novel NAND flash memory controller for SD/Multimedia Card (MMC). They
have designed a Bose-Chaudhuri-Hocquenghem (BCH) error correction code (ECC) [28] for correcting the random bit
errors of the flash memory chip. The UMC 0.18 μm CMOS process was used to implement the proposed memory
controller chip. This proposed controller was verified for MMC only.
Elkeelany and Vince [17] proposed a portable analog data capture system using a custom processor. The SD card
had been used in 1-bit SD mode for their proposed system. SPI mode or 4-bit SD mode-based communication
was not discussed in their design.
The works summarized in Table 1 have established the concept of FPGA-based implementation of the SD
card data archival system, either in SD mode or in SPI mode. Some researchers [15–18] have taken the help of
high-level languages or external controllers, on-board processors, and other resources apart from FPGA logic
resources alone during implementation of the data access mechanism. In some papers [14, 15, 23, 29, 30], partially
FSM-based approaches have been used for realization of the data transfer mechanism. The single/multiple block
read/write procedures were designed using FSMs, and those procedures have been implemented using HDL for the
target device. Also, the BCH code for NAND flash memory has been optimized in previous work [31], and
data-intensive applications using FPGA have been studied in earlier research [32, 33].
Flash-based storage systems have other advantages also. Many a time, it has been observed that the efficient data
communication of a flash-based system plays a significant role in the improvement of flash resource utilization in
many systems. Flash resources can be utilized as a cache-based storage system or can be integrated with a hard
disk drive (HDD) to act as a hybrid system. Flash resources are also utilized in virtualized storage systems [8],
where efficient managers are designed to achieve higher cost effectiveness than normal caching algorithms.
Flash-based multi-tiered systems are also presently studied in the literature. Some of them are a multi-tier SSD-based
solution [34], a hypervisor-based design [35], etc. Most of these works, however, emphasize the improvement of
caching policies with respect to standard existing caching algorithms like LRU, and that analysis is out of the scope
of this paper. Here we primarily analyze multimedia data storage to and retrieval from flash-based storage systems,
which in turn has a profound effect on the improvement of flash-based resource implementation.
In our work, we aim to design an application-specific controller for efficient multimedia data communication with
flash-based cards in SPI mode, and the controller architecture was entirely designed using an FSM-based approach.
There are mainly five states present in the proposed FSM: the initialization state, idle state, card-read state,
card-write state, and decision-making state. During the realization of the controller architecture, these states are
mapped into the modules of the controller. Some of these modules are used to accomplish the card read/write
procedures, and therefore the internal architecture of those modules is again implemented in FSM format for the
realization of those procedures. Note that the proposed architecture and implementation aim to minimize both the
clock utilization and the on-board resource utilization of the FPGA board. Also in this work, we have considered the
required clock cycles, workload, response time, etc. as performance metrics to compare the effectiveness of the
proposed approach with respect to other existing papers. However, only in the initialization phase, we have
represented the performance with respect to the “time” metric to compare the achieved results with the reported
values in the literature.

3 Proposed FPGA-based controller
This section initially describes the basic characteristics of the SD/SDHC card and microSD card and then introduces
the proposed controller in the rest of the section.

3.1 High capacity SD card
The SD/SDHC card communication is based on the advanced nine-pin interface, i.e., Clock, Command line/Master Out
Slave In (MOSI), 4 × Data lines/Master In Slave Out (MISO), and 3 × Power lines. The card supports three
communication protocols [21]. They are SD 1-bit mode, SD 4-bit mode, and SPI (Serial Peripheral Interface) mode.
Table 2 shows the pin configuration of the SD/SDHC card, and Fig. 3 shows the thematic representation of the
electrical interface of the card (slave) in SPI mode with the FPGA board (master).
The SD/SDHC card communication protocol in SPI mode is entirely a command-dependent protocol, and
the card responds to every command with a pre-defined response pattern. During initialization, the card is first
initiated with the CMD0 command. Then, the controller validates the voltage range by generating the CMD8
command. It also identifies the version of the card (a version 2 (SDHC) card or some other card). Subsequently, the
controller generates the application-specific commands (CMD55 + ACMD41) to complete the initialization
process. The controller continuously generates the (CMD55 + ACMD41) command until the card initializes
itself by giving a “00000000” response. The SDHC card supports two types of addressing mode. They are block
addressing mode and byte addressing mode. The CMD58 command identifies the addressing mode of the version 2
SDHC card. Also, the CMD16 command is issued to fix the data block length to 512 bytes. After the initialization
process, the card goes to the idle state until the next command is generated for single/multiblock read/write.
The speed class of the card denotes the minimum writing performance of the card needed to record video
normally [36]. The speed classes defined by the SD Association are 2, 4, 6, and 10. Throughout this work, we have
used an SDHC card and an SD card with speed classes 4 and 2, respectively, which means that the SDHC and SD
cards used for this purpose support minimum writing speeds of 4 and 2 MB/s, respectively, for video recording.

3.2 MicroSD card
The microSD card communication is very similar to the SD card communication. The difference between these
two widely used present-day data storage media is in their pin configuration. The microSD communication
is based on an 8-pin interface where all the pins from the SD card are present except the second ground (Vss2) pin.

Table 2 SD/SDHC card pin details
Pin No. | Name | Function in SD mode | Function in SPI mode
1 | DAT3/CS̄ | Data line 3 | Chip select/slave select (SS̄)
2 | CMD/DI | Command line | MOSI
3 | Vss1 | Ground | Ground
4 | VDD | Supply voltage | Supply voltage
5 | Clock | Clock | Clock (SCLK)
6 | Vss2 | Ground | Ground
7 | DAT0/DO | Data line 0 | MISO
8 | DAT1/IRQ | Data line 1 | Unused/IRQ
9 | DAT2/NC | Data line 2 | Unused
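The initialization sequence described in Section 3.1 can be sketched in software as follows. This is only an illustrative Python model, not the authors' VHDL: the function names are hypothetical, the 6-byte frame layout and the CRC-7 polynomial follow the usual SD command format in SPI mode, and the CMD8 argument (0x1AA) and the ACMD41 host-capacity bit are assumed values that the text does not state.

```python
def crc7(data: bytes) -> int:
    """CRC-7 with polynomial x^7 + x^3 + 1, as used in SD command frames."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):  # most significant bit first
            feedback = ((crc >> 6) & 1) ^ ((byte >> bit) & 1)
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= 0x09
    return crc


def command_frame(index: int, argument: int = 0) -> bytes:
    """Build a 6-byte frame: '01' + 6-bit index, 32-bit argument, CRC-7 + stop bit."""
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 0x01])


def initialization_frames(high_capacity: bool = True) -> list:
    """Frames in the order given in the text (ACMD41 is CMD55 followed by CMD41)."""
    # Assumption: the host-capacity bit is set when an SDHC card is expected.
    acmd41_argument = 0x40000000 if high_capacity else 0
    return [
        ("CMD0", command_frame(0)),                  # initiate the card
        ("CMD8", command_frame(8, 0x000001AA)),      # voltage check (assumed argument)
        ("CMD55", command_frame(55)),                # prefix for application commands
        ("ACMD41", command_frame(41, acmd41_argument)),  # repeat pair until response is 00000000
        ("CMD58", command_frame(58)),                # read addressing mode
        ("CMD16", command_frame(16, 512)),           # fix block length to 512 bytes
    ]
```

In hardware, each frame would be shifted out bit by bit on MOSI by the initialization module, and the CMD55 + ACMD41 pair would be reissued until the card answers with the all-zero response.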
Fig. 3 SDHC card electrical interface (figure not reproduced; FPGA chip as master and SDHC card as slave, with SS,
MOSI, SCLK, MISO, Vdd, and GND wired to card pins DAT3/CS, CMD/DI, CLK/SCLK, DAT0/DO, Vdd, and VSS; DAT1/IRQ and
DAT2/NC are marked NC)
Fig. 4 MicroSD card electrical interface with FPGA board (figure not reproduced; FPGA chip as master and microSD
card as slave, with SS, MOSI, VDD, SCLK, GND, and MISO wired to DAT3/CS, CMD/DI, VDD, CLK/SCLK, GND, and DAT0/DO;
DAT1/IRQ and DAT2/NC are marked NC)
As we observe from the above-mentioned flow sequence and the schematic of the internal architecture, the
proposed controller is divided into five different modules. They are the card initialization module (CINM), idle
module (IM), card read module (CRM), card write module (CWM), and decision module (DM). Along with the above
modules, a control unit (CU) monitors and controls the activities of each module and the flow of the
respective driving signals. The CU operates based on the FSM shown in Fig. 6.
Each of the modules and the CU contains several internal and external data and control lines. The communication
with the external world is done by the controller using either I/O interfacing units or a customized multiplexer.
The Reset, DTM (Data Transfer Mode), R/W̄, DATA, ACK, and BUSY signals are interfaced with the controller
via the I/O interfacing unit. The Reset signal, connected with the CU, initiates the data storage or retrieval
operation. The DTM signal selects the single/multiblock data transfer mode of the controller, R/W̄ is used for
read/write operation selection, and an 8-bit bidirectional DATA bus is used for communication with the external
world. Also, other signals like the Clock, Chip Select (CS), MISO, and MOSI signals are connected between the SDHC
card and the controller through a (4×1) bi-directional customized multiplexer, where each input line of the
multiplexer is a 4-bit-wide data bus. Each data bus consists of the Clock, Chip Select (CS̄), MISO, and MOSI signals.
These four input buses of the multiplexer connect the four modules, namely CINM, IM, CRM, and CWM, of the
controller with the external card based on the SELECT bus. The output bus from the multiplexer communicates with
the card. Only in the data bus from the IM does the clock signal remain unconnected, to realize the power saving
mode of the controller. Therefore, as a whole, the designed multiplexer has a 2-bit SELECT bus (S1 and S0), 4 input
bus lines (4 × 4 = 16 lines), and one output bus line (1 × 4 = 4 lines). The SELECT bus connected with the
multiplexer helps the individual modules communicate with the card in sequence, and the CU controls the entire
selection process. The module selection activity of the SELECT bus is described in Table 4.
BUSY and ACK are the two status signals present in the controller, and they are connected with the CU. The BUSY
signal represents the busy state of the controller, and the ACK signal acknowledges any assigned work accomplished
by the controller. On completion, the module deactivates the BUSY signal and activates the ACK signal to inform the
user that the task has been completed successfully. Failure to complete any assigned task leaves both the BUSY
and ACK signals de-asserted. The activity of the signals is tabulated in Table 5.
The CU also internally communicates with every module in sequence for efficient data transfer with the card.
The common signals for all the modules are the Reset, CS, and clock signals. The CS and clock signals are also supplied
to the card via the multiplexer. Once the Reset signal is received by the CU, it issues a START signal to the CINM
along with the clock signal. The CU also issues START and clock signals
for the other modules when they are about to initiate their action. In idle state, no clock is received by the
modules and thus they work in power saving mode. Once a module receives a START signal, it acknowledges this by
issuing a READY signal to the CU and starts working. After every successful completion of the work, the module
informs the CU with a DONE signal.
The CU starts working with the card initialization module. On receiving the Reset signal, the CU activates the CINM,
which issues the initialization commands to the card in order to initialize it in SPI mode. The card responds to
every command, and on completion of the initialization procedure, the module receives the final response from the
card.
After successful initialization of the card, the control transfers to the IM with the START signal. The IM
continuously monitors the R/W̄ and DTM signals. The R/W̄ signal specifies whether the next operation will be a card
read or a card write. The DTM signal informs the IM whether the data transfer is single block or multiblock.
Whenever the IM receives the R/W̄ signal, it passes both the R/W̄ and DTM values to the CU and goes to the idle
state by sending a DONE signal. Depending on the R/W̄ signal, the CU transfers the control either to the CRM or to
the CWM.
The CU generates different command sequences for either of these modules. In the card read sequence, the CRM
reads the data block from the card along with the CRC bits and publishes it to the I/O interfacing unit. In card write

Table 4 Table for SELECT bus activity
S1 | S0 | Selected module
0 | 0 | CINM
0 | 1 | IM
1 | 0 | CRM
1 | 1 | CWM

Table 5 Table for the status signal activity
BUSY | ACK | Controller status
0 | 0 | Time out, controller failed to read/write
0 | 1 | Task successfully accomplished
1 | X | Controller is busy
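The SELECT bus decoding of Table 4, the status decoding of Table 5, and the module hand-over performed by the CU can be modeled with a few lookup functions. The sketch below is a behavioral illustration in Python rather than the VHDL used by the authors; the function names are hypothetical, and both the R/W̄ polarity and the return to the idle module after a transfer are assumptions that the text does not confirm.

```python
from enum import Enum


class Module(Enum):
    CINM = "card initialization module"
    IM = "idle module"
    CRM = "card read module"
    CWM = "card write module"


# Table 4: SELECT bus value (S1, S0) -> module whose bus is routed to the card.
SELECT_TABLE = {
    (0, 0): Module.CINM,
    (0, 1): Module.IM,
    (1, 0): Module.CRM,
    (1, 1): Module.CWM,
}


def select_module(s1: int, s0: int) -> Module:
    """Return the module connected to the card for a given SELECT value."""
    return SELECT_TABLE[(s1, s0)]


def card_clock_connected(module: Module) -> bool:
    """The IM bus leaves the clock unconnected to realize the power saving mode."""
    return module is not Module.IM


def controller_status(busy: int, ack: int) -> str:
    """Table 5: BUSY dominates; ACK is only meaningful once BUSY is low."""
    if busy:
        return "Controller is busy"
    if ack:
        return "Task successfully accomplished"
    return "Time out, controller failed to read/write"


def next_module(current: Module, read_not_write: int) -> Module:
    """Module activated by the CU after `current` reports DONE."""
    if current is Module.CINM:
        return Module.IM
    if current is Module.IM:
        # Assumed polarity: R/W high selects read, low selects write.
        return Module.CRM if read_not_write else Module.CWM
    # Assumption: control returns to the idle module after a read or write.
    return Module.IM
```

For example, `select_module(1, 0)` yields the card read module, and `controller_status(0, 0)` reports the time-out case in which neither status signal is asserted.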
Fig. 11 Pseudo-code for card write process
Fig. 13 Pseudo-code for decision process
Fig. 15 Flow chart for read and write operation of SD/SDHC card
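As a rough software illustration of the single-block write path that the card write module and the flow chart of Fig. 15 cover, the sketch below assembles one 512-byte data packet from a repeated 8-bit pattern and checks the card's response for the "accepted" code. It is not the authors' VHDL: the function names are hypothetical, and the start token (0xFE), the CRC-16 polynomial (0x1021), and the 5-bit response mask are taken from the usual SPI-mode conventions for SD cards, which are assumptions relative to the text.

```python
BLOCK_LENGTH = 512          # bytes per block, as fixed by CMD16 in the text
START_TOKEN_SINGLE = 0xFE   # assumed start token for a single-block write


def crc16_ccitt(data: bytes) -> int:
    """CRC-16 with polynomial 0x1021 and zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def data_packet(pattern: int) -> bytes:
    """Start token, the 8-bit pattern repeated to fill one block, then two CRC bytes."""
    payload = bytes([pattern & 0xFF]) * BLOCK_LENGTH
    return bytes([START_TOKEN_SINGLE]) + payload + crc16_ccitt(payload).to_bytes(2, "big")


def write_accepted(response: int) -> bool:
    """True when the low five bits of the response are 00101 (data accepted)."""
    return (response & 0x1F) == 0x05
```

With the pattern `0b11010011`, `data_packet` returns 515 bytes (one token, 512 data bytes, two CRC bytes), and `write_accepted(0b00000101)` is true for the response value "00000101" reported in the results.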
the performance of the controller. Subsequently, different synthetic and practical workloads are described as case
studies of the paper.

5.1 Performance metrics
In this paper, we have used the following metrics to evaluate the performance of the controller. They are defined
as follows.
Clock cycle taken per process (CCTPP): In order to make the performance evaluation of the proposed controller
independent of the system clock and other system-specific parameters, we have described the performance of the
controller in terms of clock cycles. Here, CCTPP describes the clock cycles taken by an individual process to
complete successfully. If we port the proposed design to multiple systems, the CCTPP parameter will be system
independent.
Time taken per process (TTPP): This is the time taken by an individual process to accomplish the assigned task
successfully. We have used the conventional units of time to define the TTPP.
Input/output operations per second (IOPS): IOPS is a measurement used to characterize storage devices like flash
storage memory. The IOPS is not defined independently; it is a combination of three metrics. Along with IOPS, two
other metrics, namely response time and workload, are also defined to characterize the performance of the memory
module.

5.2 Simulation details
The proposed architecture was first simulated in the Xilinx ISE 14.1 virtual environment for initial verification.
In this phase, Fig. 16 gives the complete waveform representation of the initialization process. Figure 17
gives the timing diagram of the single block write, and Fig. 18 gives the timing diagram of the multiblock data
write operation. The simulation results are only feasible for the data write operation, and the implementation
section presents the entire results for initialization and both the read and write operations. Table 7 elaborates the
different abbreviations of the input and output ports used to simulate the design.

Table 6 Resource utilization during implementation
Logic blocks | Number of logics used | Number of logics available | Utilization (%)
Logic slices | 226 | 4656 | 5
Slice flip/flops | 167 | 9312 | 2
Slice LUTs | 421 | 9312 | 6
Clock buffer | 1 | 24 | 4
Number of bonded IOBs | 15 | 232 | 6
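The relation between the clock-cycle and time metrics of Section 5.1 can be made concrete with a small calculation. The Python helpers below are hypothetical and only illustrate the definitions: TTPP follows from CCTPP once a clock frequency is fixed, and IOPS is an operation count divided by elapsed time. The 50 MHz clock and the cycle count in the usage note are assumed values, not measurements from the paper.

```python
def ttpp_seconds(cctpp: int, clock_hz: float) -> float:
    """Time taken per process for a given clock-cycle count and clock frequency."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return cctpp / clock_hz


def iops(operations: int, elapsed_seconds: float) -> float:
    """Input/output operations per second over a measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return operations / elapsed_seconds
```

For instance, a process that takes 1000 clock cycles on an assumed 50 MHz clock corresponds to `ttpp_seconds(1000, 50e6)`, i.e., 20 microseconds, whereas the same CCTPP on a faster clock gives a proportionally smaller TTPP. This is why CCTPP is the system-independent quantity.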
the state of the art computing and also to solve practical world problems through technological innovation.
This is a collaboration of academics with government and industry researchers. There are 35 different
traces available in the MSR Cambridge Traces, and they represent 1-week block I/O traces of enterprise servers at
Microsoft Research Cambridge. The characteristics of the traces are given along with file names, their attributes,
file size, etc. in Table 8. We have used those datasets to check the performance of the proposed controller in the
multiblock data transfer procedure. The SanDisk 8 GB SDHC card has been used as the flash storage in this case
study.
Case VI: dataset from UMass Trace Repository: The UMass Trace Repository [27] is a commonly used data
repository. It provides storage, network, and other traces for analysis to the research community. This work is
supported by the National Science Foundation. This repository contains different traces, namely CPU and memory,
network, storage, weather, power, smart, and multimedia traces, under two different categories, namely Financial
and WebSearch. The characteristics of the traces are given in Table 9. We have used those datasets to verify the
multiblock data transfer process of the proposed controller, and the SanDisk 8 GB SDHC card has been used for this
purpose.
Case VII: dataset from SNIA Iotta Repository Historical Section: The Storage Networking Industry Association (SNIA)
Iotta Repository [26] is a commonly used repository, used to store, manage, and distribute different traces or
datasets for storage. The historical section of this repository includes all the traces which are older than 10 years.
The historical section contains five different trace groups, namely Block I/O Traces, Network File System Traces,
Parallel Traces, Static Snapshots, and System Call Traces. Each of these is further divided into multiple sub-traces.
We have used some sub-traces from those available to verify the multiblock data transfer process of the proposed
controller. The characteristics of the traces are given in Table 10. The SanDisk 8 GB SDHC card has been
used as the storage device in this case study.
Different flash translation layer settings: The functionality and efficiency of the proposed controller have also
been tested using different volumes of data. We have tested the read-write operations of the controller for data
volumes of 4, 8, 10, and 512 MB and 1 GB.

Table 8 Characteristics table for MSR Cambridge Repository [25]
Repository name | Abbreviation | File name | File size in KB
MSR Cambridge 1 | M11 | CAM-02-SRV-lvm0 | 227,577
MSR Cambridge 1 | M12 | CAM-02-SRV-lvm1 | 1,576,883
MSR Cambridge 1 | M13 | CAM-02-SRV-lvm2 | 117,104
MSR Cambridge 1 | M14 | CAM-02-SRV-lvm3 | 1,274,935
MSR Cambridge 1 | M15 | CAM-02-SRV-lvm4 | 1,274,935
MSR Cambridge 1 | M16 | CAMRESHMSA01-lvm0 | 197,530
MSR Cambridge 1 | M17 | CAMRESHMSA01-lvm1 | 29,069
MSR Cambridge 1 | M18 | CAMRESISAA02-lvm0 | 643,399
MSR Cambridge 1 | M19 | CAMRESISAA02-lvm1 | 8,832,021
MSR Cambridge 1 | M110 | CAMRESWMSA03-lvm0 | 62,457
MSR Cambridge 1 | M111 | CAMRESWMSA03-lvm1 | 86,267
MSR Cambridge 1 | M112 | CAM-USP-01-lvm0 | 284,762
MSR Cambridge 2 | M21 | CAM-01-SRV-lvm0 | 115,703
MSR Cambridge 2 | M22 | CAM-01-SRV-lvm1 | 2,393,907
MSR Cambridge 2 | M23 | CAM-01-SRV-lvm2 | 560,593
MSR Cambridge 2 | M24 | CAMRESIRA01-lvm0 | 75,897
MSR Cambridge 2 | M25 | CAMRESIRA01-lvm1 | 739
MSR Cambridge 2 | M26 | CAMRESIRA01-lvm2 | 10,894
MSR Cambridge 2 | M27 | CAMRESSDPA01-lvm0 | 2,028,692
MSR Cambridge 2 | M28 | CAMRESSDPA01-lvm1 | 2,476,675
MSR Cambridge 2 | M29 | CAMRESSDPA01-lvm2 | 98,864
MSR Cambridge 2 | M210 | CAMRESSDPA03-lvm0 | 80,578
MSR Cambridge 2 | M211 | CAMRESSDPA03-lvm1 | 34,773
MSR Cambridge 2 | M212 | CAMRESSDPA03-lvm2 | 61,550
MSR Cambridge 2 | M213 | CAMRESSTGA01-lvm0 | 103,206
MSR Cambridge 2 | M214 | CAMRESSTGA01-lvm1 | 113,632
MSR Cambridge 2 | M215 | CAMRESTSA01-lvm0 | 91,123
MSR Cambridge 2 | M216 | CAMRESWEBA03-lvm0 | 105,051
MSR Cambridge 2 | M217 | CAMRESWEBA03-lvm1 | 8,308
MSR Cambridge 2 | M218 | CAMRESWEBA03-lvm2 | 299,650
MSR Cambridge 2 | M219 | CAMRESWEBA03-lvm3 | 1611
Attributes (MSR Cambridge 1 and 2): Timestamp, Hostname, Disk Number, Type, Offset, Size, Response Time

Table 9 Characteristics table for UMass Repository [27]
Repository name | File name | File size in KB | Attributes
UMass | Financial1 | 151165 | ASU,
UMass | Financial2 | 102873 | LBA,

5.3.2 Results
The implemented architecture of the proposed controller works in three different modes, namely card initialization
mode, card read mode, and card write mode. After card initialization, based on the external commands, the controller
communicates with the card either for a read or for a write operation.
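For the data volumes and trace files used in the case studies above, the number of 512-byte blocks that a multiblock transfer has to move follows directly from the size of the data. The short Python sketch below shows the arithmetic; the helper name is hypothetical, and it assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB), which the text does not specify.

```python
BLOCK_LENGTH = 512  # bytes per block, the length fixed by CMD16

KIB = 1024          # assumed size of 1 KB in bytes
MIB = 1024 * KIB    # assumed size of 1 MB in bytes


def blocks_needed(size_bytes: int) -> int:
    """Number of 512-byte blocks needed to hold size_bytes (rounded up)."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return (size_bytes + BLOCK_LENGTH - 1) // BLOCK_LENGTH


# Data volumes listed in the text, expressed in bytes under the assumed units.
TEST_VOLUMES = {
    "4 MB": 4 * MIB,
    "8 MB": 8 * MIB,
    "10 MB": 10 * MIB,
    "512 MB": 512 * MIB,
    "1 GB": 1024 * MIB,
}

BLOCKS_PER_VOLUME = {name: blocks_needed(size) for name, size in TEST_VOLUMES.items()}
```

Under these assumptions a 4 MB volume corresponds to 8192 blocks, and the 739 KB trace M25 of Table 8 corresponds to 1478 blocks.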
Table 10 Characteristics table for SNIA Historical Section Repository [26]
Trace name          Sub-traces                     Number of files
Block I/O Traces    Cello 1999                     12
                    Cello 1996                     12
                    Cello 1992                     12
                    HP LAJW                        12
                    Cello 1991                     12
NFS Traces          Animation dataset              140
                    Harvard SOS Traces             7
Parallel Traces     Sprite Traces                  1
Static Snapshots    Multimedia file sizes          1
                    Microsoft Longitudinal Study   1
                    Microsoft 1998 Static Study    1
System Call Traces  LASR Traces                    13
                    Seer Traces (ASCII)            1
                    Seer Traces                    1
                    CMU DFS Traces                 14

To validate the read/write operation, two types of test units were considered. They are described as follows:

Test unit: on-board output unit (LED): The Spartan 3E target FPGA board contains an 8-bit output unit with eight on-board light emitting diodes (LEDs) [20]. In the testing setup of the experiment, the MISO signal of the controller was mapped to a single LED of the on-board 8-bit LED unit, and the resultant command responses and read/write data patterns were observed. Since the SPI mode is a serial bus interface mode, the SD card sends its response serially through its output port to the master. The MISO line signal is left shifted in every clock cycle so that a series of patterns can be observed on the 8-bit LED unit of the FPGA board. Additionally, in the output section, a clock down converter unit has been designed and integrated to slow down the response speed of the controller for better perception. This slightly compromises the speed of operation in the testing and verification section. However, this unit is purely optional and is to be omitted when the controller is integrated with a real-time high-speed system.

Test unit: DSO: A digital storage oscilloscope (DSO) has been connected to the input (MOSI) and the output (MISO) ports of the controller. Both the output and input data patterns have been observed on the DSO for further verification. The clock down converter unit has also been integrated here to slow down the response speed of the card. However, it is introduced only in the card read section for all the cards and in the initialization section for the SDHC card.

The results of the proposed controller have been observed and verified in chronological order as per the problem description stated above.

Case I: result: pre-declared embedded data pattern: The first case study represents the complete three-state access of a Canon 16 MB SD card. The steps in the different phases of data transfer are described below:

Initialization - Figures 19 and 20 show the initialization state output mapped to the on-board LEDs. The driving signal for the initialization module is the Reset signal. Figure 19 shows the output response pattern of the SD card after execution of the first command (CMD0). Figure 20 shows the output pattern after completion of the entire initialization step, when the card is in idle mode.

Fig. 19 CMD0 response pattern on the on-board LEDs for SD card

Card write operation - The pre-defined 8-bit embedded data pattern has been written to the SD card. Figure 21 shows the CRC status response (viz. “00000101”) coming from the SD card to the output port. The card responds to every successful write of an entire block. For a single block write, the card gives the response once, after the write of the entire block is complete. For a multiblock write, the card responds after every block write completion until the end block command (CMD12) has been issued.

Card read operation - Here, we consider an arbitrary data pattern, e.g., “11010011”, which has been read from the SD card. Figure 22 shows that data pattern on the LEDs. Both the single block and the multiblock write and read operations have been verified with this data pattern. The 8-bit data pattern was written repeatedly to fill the entire block (512 bytes) for the block write operation.

Case II: result: pseudo-random number sequences: The second case study represents the complete three-state access for a SanDisk 8 GB class 4 SDHC card. Both the input and output ports are connected to the DSO. The study shows the efficiency of the proposed controller for a high-capacity SD card. Later on, the 16 MB Canon SD card and the 2 GB microSD card were also verified for the read/write process.
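The CRC status response “00000101” reported for the card write operation in Case I has the shape of the data response token that an SD card returns in SPI mode after each written block (bit layout xxx0sss1, where sss is a 3-bit status). The Python sketch below decodes such a token. It is an illustration rather than part of the controller, and the function name `decode_data_response` is hypothetical.

```python
# Status codes carried in bits 3..1 of the SPI-mode data response token.
STATUS_MEANING = {
    0b010: "data accepted",
    0b101: "data rejected: CRC error",
    0b110: "data rejected: write error",
}


def decode_data_response(token):
    """Decode an 8-bit data response token of the form xxx0sss1."""
    if not 0 <= token <= 0xFF:
        raise ValueError("token must be an 8-bit value")
    # Bit 4 must be 0 and bit 0 must be 1 for a valid token.
    if token & 0x11 != 0x01:
        raise ValueError("not a data response token")
    status = (token >> 1) & 0b111
    return STATUS_MEANING.get(status, "reserved status")


print(decode_data_response(0b00000101))
```

Because the upper three bits are don't-care bits, a real receiver would mask them before comparing, which is why the check above looks only at bits 4 and 0.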
Fig. 21 CRC response pattern for write operation
Fig. 23 CMD0: input sequence and response pattern
Fig. 24 CMD8: input sequence and response pattern
Fig. 25 CMD55: input sequence and response pattern
Fig. 26 ACMD41: input sequence and response pattern with single block read sequence
Fig. 27 Idle mode input and response sequence
Fig. 28 Input data with response pattern for single block write operation on DSO
Fig. 29 Input data with response pattern for multiblock write operation on DSO
Fig. 30 Single block read mode input command and data response from SDHC
Fig. 31 Multiblock read mode input command and data response from SDHC
Fig. 33 Input data with response pattern for multiblock write operation to SD card
Fig. 34 Single block read mode input command and data response from SD
Fig. 36 Initialization: input sequences and response patterns for microSD card
Fig. 38 Multiblock read mode input command and data response from microSD
The third and the fourth case studies test and verify the efficiency of the proposed controller for real-time audio and video signal storage and retrieval.

Case III: result - audio signal transmission: In this study, an analog signal in the audio range is fed to the system. The received signal is converted and compressed into an 8-bit digital data pattern and fed to the input data bus of the controller to verify the multiblock data write procedure on the SDHC card. The data stored in this process are then accessed via the multiblock card read procedure of the controller. Both the input audio data and the data accessed from the SDHC card were fed to the DSO. Figure 39 shows the input data pattern and the read operation from the SDHC card. Read and write are two different operations that cannot be performed at the same time instant. Therefore, the analog signal was first converted and stored into the SDHC card, and during that phase the different commands, responses, and data patterns were recorded using the DSO. Later on, the same data pattern was retrieved from the SDHC card in read mode and also recorded. We got the stored data back along with the CRC bits, and on the DSO those CRC patterns were present with the original data. For simplicity, the result is given in the form of a bit pattern, and parts of the stored data have been encircled to show the same portion of the retrieved data.

Case IV: result - video signal transmission: The video signal has been stored and retrieved in this fourth case study. A video signal of 20 frames/second was fed to the system and subsequently stored into the card. The SanDisk 8 GB class 4 SDHC card was used to verify the video signal transmission process. The card operates at a writing speed of 4 MB/s to store and retrieve the video signal. The entire communication was governed by the proposed controller. To ensure successful data storage and retrieval, those frames were later retrieved from the card and displayed on a CRT monitor. The retrieved frames were compared with the original frames that were stored in the card. Figure 40 shows a snapshot of such video data storage and retrieval. The right-hand screen in Fig. 40 shows the actual frame that was written to the card, and the left-hand monitor shows the frame retrieved from the card. Here, the left-hand monitor, the implemented controller, and the card work together as a standalone unit.

Case V: result - dataset from MSR Cambridge Traces: The datasets were collected from MSR Cambridge traces [24, 25], as explained in the “Case studies” section, where the datasets are described. In our experiments, each file of the three traces is considered a separate dataset. The datasets were divided into multiple blocks so that multiblock read-write could be performed on the flash memory. In the performance comparison section, the metrics were computed for multiblock data read-write with respect to the different I/O traces. Results were tabulated for the datasets collected from the repositories.

Case VI: result - dataset from UMass Trace Repository: The datasets were collected from the UMass Trace Repository [24, 27], as explained in the “Case studies” section. In our experiments, those datasets were divided into multiple blocks and the data were stored in the SanDisk 8 GB class 4 SDHC card to perform multiblock read-write. In the performance comparison section, the metrics were computed for multiblock data read-write with respect to the different I/O traces. Results were tabulated for the datasets collected from the repositories.

Case VII: result - dataset from SNIA Iotta Repository Historical Section: The datasets were collected from the historical section of the SNIA Iotta Repository [24, 26], as explained in the “Case studies” section. In our experiments, those datasets were divided into multiple blocks to perform read-write on the flash memory. Results were tabulated for the datasets collected from the repositories.

6 Performance comparison
This section compares the performance of the controller with reported results. The logic utilization of the proposed controller is only 5% of the total logic present in the Spartan 3E FPGA board, as is evident from Table 6.

The proposed controller has been tested for single block as well as multiblock (5000 blocks) data read and write operations for the SD, SDHC, and microSD cards. Table 11 summarizes the clock cycles elapsed by the SDHC card, and Table 12 summarizes the clock cycles elapsed by the microSD card during initialization, single and multiblock read, and single and multiblock write operations. In the initialization phase, the proposed system utilizes the full bandwidth supported by SD card technology. Initially, the SDHC card receives the CMD0, CMD8, CMD55, ACMD41, CMD58, and CMD16 commands, and the controller receives the corresponding response for each command. The (CMD55, ACMD41) pair of commands may need to be sent “n” times until the card generates the response pattern x“00”. Therefore, the total number of clock cycles elapsed in the initialization phase by the SDHC card is 112(2 + n). In Table 11, however, we have considered the ideal case of n = 1, for which the initialization process takes the minimum of 336 clock cycles for the SanDisk 8 GB class 4 SDHC card in SPI bus mode. Similarly, the SD card initialization process requires 56(3 + 2n) clock cycles and the microSD card requires 56(1 + 2n) clock cycles. The difference is due to the fact that the CMD58 command is not necessary for the initialization phase of the SD card, and the CMD58 and CMD16 commands are not required for microSD card initialization. Only CMD0 and (CMD55 + ACMD41), in place of CMD1, initialize the microSD card in SPI mode. In the ideal case with n = 1, the initialization process takes the minimum of 280 clock cycles for the Canon 16 MB class 2 SD card and 168 clock cycles for the microSD card in SPI bus mode. The HCS bit should be high in the ACMD41 command for initialization of the SDHC card, whereas the SD card and the microSD card with capacity ≤ 2 GB require the HCS bit to be de-asserted at the time of initialization.
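The three initialization formulas can be checked with a few lines of Python. This is a sketch under two assumptions that are inferred from the formulas rather than stated explicitly in the text: every command/response exchange costs 56 clock cycles, and the one-time commands per card type are those listed in the dictionary below (for the microSD card the count follows the formula 56(1 + 2n), i.e., CMD0 only). The names `CYCLES_PER_COMMAND`, `ONE_TIME_COMMANDS`, and `init_clock_cycles` are hypothetical.

```python
# Assumed cost of one command/response exchange, inferred from the formulas.
CYCLES_PER_COMMAND = 56

# One-time commands per card type (assumption based on the text); the
# (CMD55, ACMD41) pair is counted separately because it repeats n times.
ONE_TIME_COMMANDS = {
    "SDHC": ["CMD0", "CMD8", "CMD58", "CMD16"],
    "SD": ["CMD0", "CMD8", "CMD16"],
    "microSD": ["CMD0"],
}


def init_clock_cycles(card, n=1):
    """Clock cycles for initialization when (CMD55, ACMD41) is sent n times."""
    if n < 1:
        raise ValueError("the (CMD55, ACMD41) pair is sent at least once")
    one_time = len(ONE_TIME_COMMANDS[card])
    # Each repetition of the pair contributes two commands.
    return CYCLES_PER_COMMAND * (one_time + 2 * n)


for card in ("SDHC", "SD", "microSD"):
    print(card, init_clock_cycles(card, n=1))
```

With n = 1 the function returns 336, 280, and 168 cycles, matching the ideal-case values quoted above; the SDHC expression 56(4 + 2n) is the same as 112(2 + n).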
Tables 13 and 14 show a comparative study of the speed of response for the SD card. Also in Table 13, the initialization time for SDHC card and microSD card has been
Table 12 Clock cycles achieved for microSD card
Process                   CCTPP
Initialization            168
Single block read         4176
Single block write        4184
Multiblock (5000) read    20600056

Table 14 Speed comparison in read/write phase
Process                        CCTPP, SD card achieved   CCTPP, Elkeelany et al. [1]   % Reduction
Single Block Read              4176                      12025                         65
Multiblock Read (5000 Block)   20600056                  26275000                      21.59
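The "% Reduction" column of Table 14 appears to be the relative saving in clock cycles with respect to the reference design. The short Python check below recomputes it from the two CCTPP columns; the formula (reference minus achieved, divided by reference) is an assumption that reproduces the tabulated values, and `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(reference_cycles, achieved_cycles):
    """Relative reduction in clock cycles, in percent of the reference."""
    if reference_cycles <= 0:
        raise ValueError("reference cycle count must be positive")
    return 100.0 * (reference_cycles - achieved_cycles) / reference_cycles


# Values taken from Table 14 (achieved vs. Elkeelany et al. [1]).
single_block = percent_reduction(12025, 4176)
multiblock = percent_reduction(26275000, 20600056)
print(f"single block read: {single_block:.2f} %")
print(f"multiblock read:   {multiblock:.2f} %")
```

The results are about 65.3 % and 21.6 %, in line with the 65 and 21.59 listed in the table once rounding or truncation is taken into account.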
Fig. 41 Comparison chart for the initialization process of the SD card
Fig. 42 Comparison chart for the single block data transfer of SD card
Fig. 43 Improvement chart for single block data transfer of SD card
Fig. 44 Comparison chart for multiblock data transfer of SD card
Fig. 45 Improvement chart for multiblock data transfer of SD card
Fig. 46 Speed up comparison chart of SDHC card
Fig. 47 Speed up comparison chart of SD card
Fig. 48 Speed up comparison chart of microSD card
We have also tested the efficiency, performance, and handling capacity of the proposed controller in different FTL settings, where the volume of data is varied over 4 MB, 8 MB, 10 MB, 512 MB, and 1 GB. Table 19 shows the performance in terms of CCTPP for the above-mentioned volumes of data.

7 Conclusions
An on-chip design and implementation of a controller has been proposed for SDHC and similar families of cards. The design has also been implemented for the microSD card. In addition, the same controller can be used for data communication with MMC. The FSM-based architecture design, its operation, FPGA-based implementation, control flow and execution, synthesis results,
further modification can be incorporated by changing the proposed architecture, and the design can be implemented on other high-end target platforms with very minor modification of the configuration procedure.

Authors’ contributions
The work has been carried out as the M.Tech. final year project of the first author, SB, under the supervision of the second author, SM. The selection and setup of the project were carried out by both authors together. The structuring and coding were carried out by SB, and the testing and debugging were done by both authors. The manuscript was prepared and checked by both authors together. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 28 May 2016 Accepted: 31 October 2016
Design of sdhc card video player based on sopc (IEEE Computer Society,
2012), pp. 900–904. doi:10.1109/IMCCC.2012.216
15. M Abdallah, O Elkeelany, in Computing, Engineering and Information, 2009.
ICC’09. International Conference On. Simultaneous multi-channel data
acquisition and storing system (IEEE, 2009), pp. 233–236.
doi:10.1109/ICC.2009.17
16. C-S Lin, K-Y Chen, Y-H Wang, L-R Dung, in 2006 13th IEEE International
Conference on Electronics, Circuits and Systems. A nand flash memory
controller for sd/mmc flash memory card (IEEE, 2006), pp. 1284–1287.
doi:10.1109/TMAG.2006.888520
17. O Elkeelany, G Vince, in 2007 Thirty-Ninth Southeastern Symposium on
System Theory. Portable analog data capture using custom processing
(IEEE, 2007), pp. 120–123
18. C Li, Q Wang, L Wang, in Computer and Information Technology
Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference On.
A high efficient flash storage system for two-way cable modem (IEEE,
2008), pp. 551–556. doi:10.1109/CIT.2008.Workshops.30
19. C-Y Lu, H Kuan, Nonvolatile semiconductor memory revolutionizing
information storage. IEEE Nanotechnol. Mag. 3(4), 4–9 (2009).
doi:10.1109/MNANO.2009.934861
20. Xilinx, Spartan-3E starter kit board user guide. UG230 (v1.0), March 9 (2006). http://www.xilinx.com/support/documentation/boards_and_kits/ug230.pdf
21. SanDisk, Secure Digital Card Product Manual. 1.9(80-13-00169) (2003)
22. Texas Instruments, MSP430x1xx family user’s guide. (SLAU049B, 2006)
23. Y Deng, J Zhou, Architectures and optimization methods of flash memory
based storage systems. J. Syst. Archit. 57(2), 214–227 (2011)
24. D Narayanan, A Donnelly, A Rowstron, Write off-loading: practical power management for enterprise storage. ACM Trans. Storage. 4(3), 10–11023 (2008)
25. Storage Networking Industry Association and others, MSR Cambridge
Traces (2010). http://iotta.snia.org/traces/388
26. Storage Networking Industry Association, et al, SNIA Iotta Repository. Microsoft Enterprise Traces, Colorado Springs, Colorado (iotta.snia.org/traces/130) (2011). http://iotta.snia.org/historical_section
27. Application, OLTP, I/O and search engine I/O. umass trace repository
(2007). http://traces.cs.umass.edu/index.php/Storage/Storage
28. S Chen, What types of ECC should be used on flash memory. Application Note for SPANSION (2007). http://www.spansion.com/support/application%20notes/types_of_ecc_used_on_flash_an.pdf
29. J No, Nand flash memory-based hybrid file system for high I/O
performance. J. Parallel Distrib. Comput. 72(12), 1680–1695 (2012)
30. R Wang, Z Mi, H Yu, W Yuan, The design of image processing system
based on SOPC and ov7670. Procedia Eng. 24, 237–241 (2011)
31. M Fabiano, M Indaco, S Di Carlo, P Prinetto, Design and optimization of
adaptable BCH codecs for nand flash memories. Microprocess. Microsyst.
37(4), 407–419 (2013)
32. M Baklouti, P Marquet, J Dekeyser, M Abid, FPGA-based many-core
system-on-chip design. Microprocessors and Microsystems. 39(4),
302–312 (2015)
33. F Thomas, M Nayak, S Udupa, J Kishore, V Agrawal, A hardware/software
codesign for improved data acquisition in a processor based embedded
system. Microprocess. Microsyst. 24(3), 129–134 (2000)
34. F Chen, DA Koufaty, X Zhang, in Proceedings of the International
Conference on Supercomputing. Hystor: Making the best use of solid state
drives in high performance storage systems (ACM, Tucson, Arizona, 2011),
pp. 22–32
35. J Guerra, H Pucha, JS Glider, W Belluomini, R Rangaswami, in FAST. Cost effective storage using extent based dynamic tiering, vol. 11, (2011), pp. 20–20
36. Technical Committee SD Card Association, et al., Speed Class Greater Performance Choices, Online available and accessed. Online at https://www.sdcard.org/developers/overview/speed_class/
37. Editor - Metering, Minsen - your ideal supplier of wireless water/gas/electricity meters (2013). https://www.metering.com/minsen-your-ideal-supplier-of-wireless-water-gaselectricity-meters/