
Banerjee and Mukhopadhyay EURASIP Journal on Embedded Systems (2016) 2016:24

DOI 10.1186/s13639-016-0060-8

RESEARCH Open Access

A low cost and fast controller architecture for multimedia data storage and retrieval to flash-based storage device
Samiran Banerjee* and Sumitra Mukhopadhyay

Abstract
Real-time multimedia data access plays an important role in electronic systems. As data processing speed decreases and communication, storage, and retrieval times increase, the overall response time of real-time applications increases. Therefore, in this paper, a novel real-time, fast, low-cost, system-on-chip (SoC) controller is proposed and implemented, with which large volumes of data can be efficiently stored to and retrieved from flash memory cards. It is implemented using only a hardware description language (HDL) on a field programmable gate array (FPGA) chip, without any other on-board or external hardware resources or high-level languages. The entire controller architecture, in a single chip, contains five modules and is designed using a finite state machine (FSM)-based approach. The modules are the card initialization module (CINM), idle module (IM), card read module (CRM), card write module (CWM), and decision module (DM). The architecture is completely synthesized for the Spartan 3E xc3s500e-4-fg320 FPGA with only 5% of the total logic utilization. Experiments with microSD, SD, and SDHC cards of different sizes show that the architecture uses less hardware and fewer clock cycles for card initialization and for single/multiblock read/write procedures.
Keywords: Flash memory read/write, Secure Digital High Capacity (SDHC) card, MicroSD card, Serial peripheral
interface (SPI), Finite state machine (FSM), Very high speed integrated circuit hardware description language (VHDL),
Field programmable gate array (FPGA)

*Correspondence: samiranbanerjee1991@gmail.com
Institute of Radio Physics and Electronics, University of Calcutta, 92, APC Road, Kolkata, India

1 Introduction
The flash-based memory storage device, introduced by Toshiba in 1984, is a non-volatile electronic memory and is used whenever shock resistance is a key requirement of the application [1]. The Secure Digital High Capacity (SDHC) card, for example, is a flash-based memory storage device and is mainly designed to meet requirements such as security, capacity, performance, and environmental issues inherent in newly emerging audio and video consumer electronic devices. The Secure Digital (SD) card standard is designed and licensed by the SD Card Association [2] and is a collaborative effort of three manufacturers, namely Toshiba, SanDisk, and MEI [3]. The SD card includes an on-card intelligent controller to manage the interface protocol, security algorithms, data storage and retrieval, error handling and the corresponding error correction code (ECC) algorithms, defect handling and diagnostics, power management, and clock control [1]. However, to interface the SDHC card (slave unit) with a master unit (e.g., computer, host, or any application-specific device), we need a system that can talk with the on-card controller of the SDHC card for smooth execution of single/multiblock data read/write.
Figure 1 represents a generic model of a data archival system and shows how flash-based cards such as microSD, SD, or SDHC cards can be used as a plug-and-play memory module for real-time applications. Generally, the signal is received from the external world in a buffer, converted into a serial bit stream, and subsequently stored in the memory card. The stored data may be transmitted at a later stage for further processing. Also, the flash memory acts as a portable unit and it can be

© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.

Fig. 1 Thematic model of a data archival system

removed from the system where it is presently housed and accessed by some other system to retrieve the desired signal.
The flash memory is very useful in fields where data transportation and archival are key requirements. The memory can be used in a data concentrator (Fig. 2), where the proposed controller architecture along with the flash memory can serve as the removable memory of any data concentrator network device. Flash memory has extensive applications in databases [4], networking [5], biomedical applications [6, 7], virtualized storage systems [8], cloud computing [9], geographic remote sensing [10], mobile devices [11–13], etc. It can be used in a router to store the routing table for further access. NAND-based flash systems are also widely employed as cache in virtualized systems [8]. In such application scenarios, the efficiency of single and multiblock data transfer is very important, as it affects the input-output operations per second (IOPS) measure of the storage system. Generally, we wish to maximize this metric with respect to different types of flash, as it indicates the measure of flash utilization. As stated, a flash-based system can be used as a cache, as a data concentrator, or to cater to any such storage requirement. However, in this paper, we do not analyze the pros and cons of such utilization of flash in detail, as the work mainly concentrates on the efficient implementation of a controller for single or multiblock data transfer with respect to flash memory. The implementation may be exploited in any kind of flash resource utilization and will ultimately contribute to the calculation of the metric of memory resource utilization. Therefore, the implementation of an efficient flash-based data transfer is a fundamental driving factor in the improvement of flash resource utilization, and the paper focuses on that rudimentary aspect.
The flash-based cards such as microSD, SD, or SDHC cards work in two different bus modes: the Secure Digital (SD) bus mode and the Serial Peripheral Interface (SPI) bus mode. The SPI mode is a synchronous serial protocol with less complexity. It is extremely popular for interfacing peripheral devices, and no native-host interface is needed for this mode. For its simplicity and usefulness in low-cost embedded system applications, we have considered designing an entirely on-board hardware-based controller for smooth realization of an SPI bus mode-based data transfer protocol to communicate with the flash-based memory card.
To date, we find that limited research work describes the design of a data archival system and its subsequent implementation using HDL [1]. In some research work, an SPI mode-based data communication system has been implemented. Another work was proposed in the literature where the SDHC card was used in SD mode (i.e., bulk data transfer mode) for video signal storage and processing [14]. With newly emerging technologies, flash-based memory devices have been used as efficient storage units, and they continue to meet the need for memory storage even in the modern era of technological advancement [2, 3, 15–23].
In light of the above, this paper proposes a novel, real-time, low-cost, system-on-chip application-specific controller for multimedia data storage and retrieval to flash-based memory cards. The architecture of this real-time controller has been designed using an FSM-based approach. The HDL used here is very high speed integrated circuit hardware description language (VHDL). Also, the design is such that there is no use of any on-chip general purpose processor (GPP), external controller, hardware resources, or any high-level languages during

Fig. 2 Block diagram of a real-time data concentrator system [37]

the operation. The prototype has been entirely implemented on the target FPGA board. The physical attributes of an FPGA chip, being compact in size and low in power consumption, make it an ideal platform for the implementation. Also, we have tried to exploit the parallel processing capability of the FPGA during design and implementation of the various modules. To the authors' knowledge, an optimal-in-hardware implementation of such an application-specific controller and the study of its various modules have not been explored in detail by earlier research. In this work, the proposed controller has been examined for both audio and video data storage and retrieval separately. Also, to test the importance of the work with respect to practical workloads [24], we have collected datasets from MSR Cambridge Traces [25], SNIA Iotta Repository [26], and UMass Trace Repository [27] and tried to establish the importance of the controller with respect to the flash read-write procedure.
In a nutshell, the objectives of this paper are as follows:
1. To design a modular, low-cost, system-on-chip, application-specific controller for multimedia data storage and retrieval to flash memory cards like SD, SDHC, and microSD cards in SPI mode with less overhead.
2. The design is completely FSM based, and the controller has been primarily realized using five different modules and a control unit. These five modules, along with the control unit, are the different functional areas of the proposed system, which is implemented completely in a single chip.
3. When the card is in idle state, the system has an option of working in power saving mode.
4. The controller will work in real time, in modular fashion, and the implementation is on a single FPGA. The proposed design tries to utilize the parallel processing capability of the FPGA. No other external devices or on-card intelligent controller has been used for this implementation. Here, the prototype has been completely synthesized and tested for the Spartan 3E xc3s500e-4-fg320 FPGA.

5. The architecture has been designed using HDL only. In this paper, we have considered VHDL for implementing the controller. No high-level languages were used for this design. This FSM-based design using VHDL is one of the basic features of the proposed controller, which makes it faster than the other controllers considered in this paper.
6. It is completely a prototype design of the proposed controller; the design can be implemented on any other platform instead of the Spartan 3E target device (even as an ASIC) with a very minor modification in the configuration part of the card implementation. There will be no change in the design phase.

The rest of the paper is organized as follows. Section 2 introduces the related works and highlights the novelty of the proposed approach. Section 3 describes the proposed FPGA-based controller, its architecture, execution process, and overall operation of the controller. Section 4 presents the hardware-specific implementation and synthesis details for the target Xilinx Spartan-3E (xc3s500e-4-fg320) FPGA development platform. Experimental results are described in Section 5, and Section 6 concludes the paper.

2 Review of the related works
FPGAs have been used for prototype design in a range of engineering applications [1, 14–19]; however, to the authors' knowledge, the design of a complete application-specific controller for access to different flash memory cards, with a detailed description of the modules and their operation, is limited. Table 1 depicts some of the earlier work in this domain.

Table 1 Existing papers on SD card controller design
Work | Controller design | Platform model
[1] | Data archival to SD card using HDL | Altera Cyclone II
[14] | Design of SDHC card video player on SoPC | NIOS II CPU with IDCT hardware acceleration IP core
[15] | Simultaneous multi-channel data acquisition system | Altera Cyclone II
[16] | NAND flash memory controller for SD/MMC | Freescale DSP 56858 platform with UMC 0.18 μm CMOS process
[17] | Portable analog data capture using custom processing | WOLFSON WM8731 ADC, NIOS-II processor
[18] | A high efficient flash storage system for two-way cable modem | TWCNP-OS
Proposed | FSM-based application-specific controller using HDL | Xilinx Spartan 3E xc3s500e-4-fg320

Elkeelany et al. [1] proposed an FPGA-based data archival system to SD card using Verilog HDL, and they accessed the card in SPI mode. Scalability issues have not been addressed in this design. They have partially applied an FSM-based approach, and the implementation issues of different SD cards have not been discussed explicitly in that paper.
Yang et al. [14] presented an SDHC card video player based on SoPC technology. The IP core and two display buffer SRAMs were alternately utilized in their proposed design. They accessed the SDHC card in SD mode for bulk data transfer. The proposed design was implemented using a high-level language.
In another work, Abdallah and Elkeelany [15] proposed an FPGA-based simultaneous multi-channel data acquisition system and verified the proposed architecture for analog signals. The design includes an analog-to-digital converter to convert the analog signal to digital data. The time-critical tasks were implemented in hardware, while the other tasks were implemented using embedded C. The use of the high-level language in that paper makes the system slower, with additional overhead.
Lin and Dung [16] proposed a novel NAND flash memory controller for SD/Multimedia Card (MMC). They designed a Bose-Chaudhuri-Hocquenghem (BCH) error correction code (ECC) [28] for correcting the random bit errors of the flash memory chip. The UMC 0.18 μm CMOS process was used to implement the proposed memory controller chip. This controller was verified for MMC only.
Elkeelany and Vince [17] proposed a portable analog data capture system using a custom processor. The SD card was used in 1-bit SD mode in their proposed system. SPI mode and 4-bit SD mode-based communication were not discussed in their design.
The works summarized in Table 1 have established the concept of FPGA-based implementation for the SD card data archival system in either SD mode or SPI mode. Some researchers [15–18] have relied on high-level languages or external controllers, on-board processors, and other resources apart from FPGA logic resources when implementing the data access mechanism. In some papers [14, 15, 23, 29, 30], FSM-based approaches have been partially used for realization of the data transfer mechanism. The single/multiple block read/write procedures were designed using FSM and implemented using HDL for the target device. Also, the BCH code for NAND flash memory has been optimized in previous work [31], and data-intensive applications using FPGA have been realized in earlier research [32, 33].
Flash-based storage systems have other advantages also. Many a time, it has been observed that the efficient data

communication of a flash-based system plays a significant role in the improvement of flash resource utilization in many systems. Flash resources can be utilized as a cache-based storage system or can be integrated with a hard disk drive (HDD) to act as a hybrid system. Flash resources are also utilized in virtualized storage systems [8], where efficient managers are designed to achieve higher cost effectiveness than normal caching algorithms. Flash-based multi-tiered systems are also studied presently in the literature. Some of them are a multi-tier SSD-based solution [34], a hypervisor-based design [35], etc. Most of the works, however, emphasize the improvement of caching policies with respect to standard existing caching algorithms like LRU, and that analysis is out of the scope of this paper. Here we primarily analyze multimedia data storage and retrieval to flash-based storage systems, which in turn has a profound effect on the improvement of flash-based resource implementation.
In our work, we aim to design an application-specific controller for efficient multimedia data communication with flash-based cards in SPI mode, and the controller architecture was entirely designed using an FSM-based approach. There are mainly five states present in the proposed FSM: initialization state, idle state, card-read state, card-write state, and decision-making state. During the realization of the controller architecture, these states are mapped into the modules of the controller. Some of these modules are used to accomplish the card read/write procedures, and therefore the internal architecture of those modules is again implemented in FSM format for the realization of those procedures. Note that the proposed architecture and implementation aim to minimize both the clock utilization and the on-board resource utilization of the FPGA board. Also in this work, we have considered the required clock cycles, workload, response time, etc. as performance metrics to compare the effectiveness of the proposed approach with respect to other existing papers. However, only in the initialization phase have we represented the performance with respect to the "time" metric, to compare the achieved results with the values reported in the literature.

3 Proposed FPGA-based controller
This section initially describes the basic characteristics of the SD/SDHC card and microSD card and then introduces the proposed controller in the rest of the section.

3.1 High capacity SD card
The SD/SDHC card communication is based on the advanced nine-pin interface, i.e., Clock, Command line/Master Out Slave In (MOSI), 4 × Data lines/Master In Slave Out (MISO), and 3 × Power lines. The card supports three communication protocols [21]: SD 1-bit mode, SD 4-bit mode, and SPI (Serial Peripheral Interface) mode. Table 2 shows the pin configuration of the SD/SDHC card, and Fig. 3 shows the thematic representation of the electrical interface of the card (slave) in SPI mode with the FPGA board (master).

Table 2 SD/SDHC card pin details
Pin No. | Name | Function in SD mode | Function in SPI mode
1 | DAT3/CS | Data line 3 | Chip select/slave select (SS), active low
2 | CMD/DI | Command line | MOSI
3 | Vss1 | Ground | Ground
4 | VDD | Supply voltage | Supply voltage
5 | Clock | Clock | Clock (SCLK)
6 | Vss2 | Ground | Ground
7 | DAT0/DO | Data line 0 | MISO
8 | DAT1/IRQ | Data line 1 | Unused/IRQ
9 | DAT2/NC | Data line 2 | Unused

The SD/SDHC card communication protocol in SPI mode is entirely a command-dependent protocol, and the card responds to every command with a pre-defined response pattern. During initialization, the card is first initiated with the CMD0 command. Then, the controller validates the voltage range by generating the CMD8 command. This also identifies the version of the card (a version 2 (SDHC) card or some other card). Subsequently, the controller generates the application-specific command pair (CMD55 + ACMD41) to complete the initialization process. The controller continuously generates the (CMD55 + ACMD41) pair until the card initializes itself by giving a "00000000" response. A version 2 card uses one of two addressing modes: block addressing mode or byte addressing mode. The CMD58 command identifies the addressing mode of the version 2 card. Also, the CMD16 command is issued to fix the data block length to 512 bytes. After the initialization process, the card goes to the idle state until the next command is generated for single/multiblock read/write.
The speed class of the card denotes the minimum writing performance of the card for recording video [36]. The speed classes defined by the SD Association are 2, 4, 6, and 10. Throughout this work, we have used SDHC and SD cards of speed classes 4 and 2, respectively, which means that the SDHC and SD cards used here support minimum writing speeds of 4 and 2 MB/s, respectively, for video recording.

3.2 MicroSD card
The microSD card communication is very similar to the SD card communication. The difference between these two widely used storage media is in their pin configuration. The microSD communication is based on the 8-pin interface, where all the pins from the SD card are present except the second ground (Vss2) pin.
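As an illustration of the SPI-mode initialization sequence described in Section 3.1, the following Python sketch steps a host through CMD0, CMD8, the repeated (CMD55 + ACMD41) pair, CMD58, and CMD16 against a toy card model. It is a software illustration only, not the VHDL implementation of this paper. The FakeCard class, the function name initialize, the poll limit, and the numeric response and argument values (taken here from common SD practice) are assumptions introduced for this sketch.

```python
# Hypothetical, simplified model of the SPI-mode initialization handshake.
# FakeCard stands in for a real card; none of these names come from the paper.

R1_IDLE = 0x01     # assumed R1 value while the card is still initializing
R1_READY = 0x00    # the "00000000" response that ends initialization
CCS_BIT = 1 << 30  # card capacity status bit in the 32-bit OCR


class FakeCard:
    """Toy card that reports ready after a fixed number of ACMD41 polls."""

    def __init__(self, high_capacity, polls_needed=3):
        self.high_capacity = high_capacity
        self.polls_left = polls_needed
        self.block_len = None
        self.log = []  # commands received, in order

    def command(self, name, arg=0):
        self.log.append(name)
        if name in ("CMD0", "CMD55"):
            return R1_IDLE
        if name == "CMD8":
            return R1_IDLE, arg & 0xFFF  # echo voltage field and check pattern
        if name == "ACMD41":
            self.polls_left -= 1
            return R1_READY if self.polls_left <= 0 else R1_IDLE
        if name == "CMD58":
            ocr = 0x00FF8000 | (CCS_BIT if self.high_capacity else 0)
            return R1_READY, ocr
        if name == "CMD16":
            self.block_len = arg
            return R1_READY
        raise ValueError("unsupported command: " + name)


def initialize(card, max_polls=100):
    """Run the initialization sequence; return 'block' or 'byte' addressing."""
    if card.command("CMD0") != R1_IDLE:
        raise RuntimeError("card did not enter idle state")
    _, echo = card.command("CMD8", 0x1AA)  # validate the voltage range
    if echo != 0x1AA:
        raise RuntimeError("voltage range not accepted")
    for _ in range(max_polls):             # repeat CMD55 + ACMD41 until ready
        card.command("CMD55")
        if card.command("ACMD41", 1 << 30) == R1_READY:
            break
    else:
        raise TimeoutError("card never left the idle state")
    _, ocr = card.command("CMD58")         # read the OCR and inspect the CCS bit
    if ocr & CCS_BIT:
        return "block"
    card.command("CMD16", 512)             # byte addressing: force 512-byte blocks
    return "byte"
```

Calling initialize(FakeCard(high_capacity=True)) walks through three polling rounds before reading the OCR, while a byte-addressed toy card additionally receives CMD16 with an argument of 512.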

Fig. 3 SDHC card electrical interface
Fig. 4 MicroSD card electrical interface with FPGA board

Table 3 shows the pin details of the microSD card, and Fig. 4 shows the interfacing of the microSD card with the Spartan 3E target FPGA board.
The microSD card communication is also based on a command-dependent protocol and is almost the same as the SD and SDHC card communication methods. The capacity of the microSD card determines how it works. If the capacity is less than or equal to 2 GB, the card works like the SD card; otherwise, the principle of operation is the same as that of the SDHC card.
The definition of the speed class for the microSD card is the same as for the SD card [36]. We have used a class 4 microSD card throughout the work, which means that the card supports a 4 MB/s writing speed for video recording in normal mode.

Table 3 MicroSD card pin details
Pin No. | Name | Function in SD mode | Function in SPI mode
1 | DAT2/NC | Data line 2 | Unused
2 | DAT3/CS | Data line 3 | Chip select/slave select (SS), active low
3 | CMD/DI | Command line | MOSI
4 | VDD | Supply voltage | Supply voltage
5 | Clock | Clock | Clock (SCLK)
6 | GND | Ground | Ground
7 | DAT0/DO | Data line 0 | MISO
8 | DAT1/IRQ | Data line 1 | Unused/IRQ

3.3 Architecture of the proposed controller
The workflow of the proposed host controller is based on the initialization of the card followed by data transfer (read/write) sequences. The overall external view of the controller interfacing the SDHC/SD card is given in Fig. 5. The same process can be used for interfacing the microSD card also. The Vss2 pin remains unconnected when the process is used for interfacing the microSD card, as that card has only an 8-pin interface and the Vss2 pin is not present in its physical architecture.
The state diagram of the overall control flow of the controller is shown in Fig. 6, and the internal architecture is shown in Fig. 7. The state diagram of Fig. 6 serves as the backbone for the architecture of the controller. The complete architecture has been implemented using VHDL.

Fig. 5 Overall schematic of the proposed controller
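The capacity rule of Section 3.2 and the two addressing modes of Section 3.1 together decide what address a read or write command carries. The following is a minimal sketch, assuming (as in common SD practice, not stated in this paper) that a byte-addressed card takes a byte offset and a block-addressed card takes a block number, with the 512-byte block length fixed during initialization. The function name command_address and its parameters are hypothetical.

```python
BLOCK_LEN = 512  # bytes; the block length fixed during initialization


def command_address(block_number, block_addressed):
    """Return the 32-bit address argument for a read/write command.

    block_addressed=True models an SDHC-like card (block addressing);
    False models an SD-like card of 2 GB or less (byte addressing).
    """
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    addr = block_number if block_addressed else block_number * BLOCK_LEN
    if addr > 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    return addr
```

For block 3, a block-addressed card receives the argument 3, whereas a byte-addressed card receives 3 × 512 = 1536.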

Fig. 6 State diagram of the controller
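The control flow of Fig. 6 can be approximated in software as a five-state machine, using the five states named in Section 2 (initialization, idle, card-read, card-write, decision-making) and the module hand-offs described in this section. The sketch below is a behavioural illustration in Python, not the VHDL of this paper. The names State and next_state, and the convention that read_not_write=True selects a read, are assumptions; the SELECT values follow Table 4.

```python
from enum import Enum


class State(Enum):
    INIT = "CINM"      # card initialization module active
    IDLE = "IM"        # idle module polls the read/write and DTM signals
    READ = "CRM"       # card read module active
    WRITE = "CWM"      # card write module active
    DECISION = "DM"    # decision module checks Reset


# SELECT bus values (S1, S0) as listed in Table 4.
# Table 4 lists no SELECT code for the DM, so it has no entry here.
SELECT = {
    State.INIT: (0, 0),
    State.IDLE: (0, 1),
    State.READ: (1, 0),
    State.WRITE: (1, 1),
}


def next_state(state, done, read_not_write=True, reset=False):
    """Return the control unit's next state once the active module reports DONE."""
    if not done:
        return state  # the active module is still working
    if state is State.INIT:
        return State.IDLE
    if state is State.IDLE:
        return State.READ if read_not_write else State.WRITE
    if state in (State.READ, State.WRITE):
        return State.DECISION
    # DECISION: an asserted Reset restarts initialization, otherwise back to idle
    return State.INIT if reset else State.IDLE
```

Stepping the function with done=True from State.INIT traces one full pass of the loop: initialization, idle, a read or write, the decision state, and then either idle again or a re-initialization when Reset is asserted.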

As we observe from the above-mentioned flow sequence and the schematic of the internal architecture, the proposed controller is divided into five different modules. They are the card initialization module (CINM), idle module (IM), card read module (CRM), card write module (CWM), and decision module (DM). Along with these modules, a control unit (CU) monitors and controls the activities of each module and the flow of the respective driving signals. The CU operates based on the FSM shown in Fig. 6.
Each of the modules and the CU contains several internal and external data and control lines. The controller communicates with the external world using either the I/O interfacing units or a customized multiplexer. The Reset, DTM (Data Transfer Mode), R/W̄, DATA, ACK, and BUSY signals are interfaced with the controller via the I/O interfacing unit. The Reset signal, connected with the CU, initiates the data storage or retrieval operation. The DTM signal selects the single/multiblock data transfer mode of the controller, R/W̄ is used for read/write operation selection, and an 8-bit bidirectional DATA bus is used for communication with the external world. Also, the Clock, Chip Select (CS), MISO, and MOSI signals are connected between the SDHC card and the controller through a (4×1) bi-directional customized multiplexer, where each input line of the multiplexer is a 4-bit-wide data bus. Each data bus consists of the Clock, Chip Select (CS), MISO, and MOSI signals. These four input buses of the multiplexer connect the four modules, namely CINM, IM, CRM, and CWM, of the controller with the external card based on the SELECT bus. The output bus from the multiplexer communicates with the card. Only in the data bus from the IM does the clock signal remain unconnected, to realize the power saving mode of the controller. Therefore, as a whole, the designed multiplexer has a 2-bit SELECT bus (S1 and S0), 4 input bus lines (4 × 4 = 16 lines), and one output bus line (1 × 4 = 4 lines). The SELECT bus connected with the multiplexer lets the individual modules communicate with the card in sequence, and the CU controls the entire selection process. The module selection activity of the SELECT bus is described in Table 4.
BUSY and ACK are the two status signals present in the controller, and they are connected with the CU. The BUSY signal represents the busy state of the controller, and the ACK signal acknowledges any assigned work accomplished by the controller. On completion, the module deactivates the BUSY signal and activates the ACK signal to inform the user that the task has been completed successfully. Failure to complete an assigned task leaves both the BUSY and ACK signals de-asserted. The activity of the signals is tabulated in Table 5.
The CU also internally communicates with every module in sequence for efficient data transfer with the card. The common signals for all the modules are the Reset, CS, and clock signals. The CS and clock signals are also supplied to the card via the multiplexer. Once the Reset signal is received by the CU, it issues a START signal to the CINM along with the clock signal. CU also issues START and clock signal

Fig. 7 Architecture of the proposed controller

for the other modules when they are about to initiate their action. In idle state, no clock is received by the modules, and thus they work in power saving mode. Once a module receives a START signal, it acknowledges this by issuing a READY signal to the CU and starts working. After every successful completion of the work, the module notifies the CU with a DONE signal.
The CU starts working with the card initialization module. On receiving the Reset signal, the CU activates the CINM, which issues the initialization commands to the card in order to initialize it in SPI mode. The card responds to every command, and on completion of the initialization procedure, the module receives the final response from the card.

Table 4 Table for SELECT bus activity
S1 | S0 | Selected module
0 | 0 | CINM
0 | 1 | IM
1 | 0 | CRM
1 | 1 | CWM

Table 5 Table for the status signal activity
BUSY | ACK | Controller status
0 | 0 | Time out, controller failed to read/write
0 | 1 | Task successfully accomplished
1 | X | Controller is busy

After successful initialization of the card, the control transfers to the IM with the START signal. The IM continuously monitors the R/W̄ and DTM signals. The R/W̄ signal specifies whether the next operation will be a card read or a card write. The DTM signal informs the IM whether the transfer is single-block or multiblock. Whenever it receives the R/W̄ signal, it passes both the R/W̄ and DTM values to the CU and goes to the idle state by sending a DONE signal. Depending on the R/W̄ signal, the CU transfers the control either to the CRM or to the CWM.
The CU generates different command sequences for each of these modules. In the card read sequence, the CRM reads the data block from the card along with the CRC bits and publishes it to the I/O interfacing unit. In card write

sequence, the CWM receives the data block as an input from the I/O interfacing unit, writes the received block along with the CRC to the card, and receives the CRC response from the card. After the read/write operation, the CRM/CWM issues a DONE signal and itself goes to the power saving mode. Finally, after completion of the entire data transfer, the controller generates the ACK signal to the external world through the I/O interfacing unit to indicate that the work has been completed successfully.
The DM monitors the Reset signal received from the I/O interfacing unit during operation. Assertion of the Reset signal means that the CU will reinitialize the process by activating the CINM. If the Reset signal remains de-asserted, then the CU will activate the IM and again follow the previous sequence of operation for continuous data transfer. The DM gives its decision to the CU through the DECISION signal and goes to power saving mode by issuing a DONE signal.

3.4 Overall system operation
The flow of execution and communication between the individual units of the controller and the SDHC card is now described. A similar procedure is also applicable for SD and microSD card-based data storage and retrieval. All of them work in three phases, namely the card initialization phase, card read phase, and card write phase. The five modules described earlier are the constituents of these three phases. The controller communicates with the external world through the I/O interfacing unit and the customized multiplexer. The controller initiates its operation on reception of the Reset, DTM, and R/W̄ signals and sends the signals to the CU. It then reports its status using the BUSY and ACK signals. When the controller is busy processing some task, it keeps the BUSY signal high until the task is accomplished. After every successful completion of a process, the controller informs the outer world by asserting the ACK signal and de-asserting the BUSY signal. If the controller fails to complete the given task, it de-asserts both the BUSY and ACK signals. Table 5 describes the operation of the two status signals.
The pseudo-code for the data transmission process to the card is given in Figs. 8, 9, 10, 11, 12 and 13, and the FSM of the controller and the architecture are essentially inspired by the activities described in the codes. The rest of the section describes the operation of the different modules.

Fig. 8 Pseudo-code for power on sequence
Fig. 9 Pseudo-code for initialization (card)

Card initialization module (CINM): The CINM initiates the SDHC card in SPI mode. Figure 14 contains the FSM of the initialization module. The START signal activates the CINM, and it acknowledges the CU with the READY signal. The module first lets 74 or more clock cycles elapse to initiate the card in SPI mode. Then, the commands are generated to complete the initialization process. After completion of the initialization process, the controller validates the addressing type of the card (either block addressing or byte addressing) by asking it to publish its card capacity status (CCS) bit in the operation conditions register (OCR). A high value of the CCS bit in the OCR means that the card is a version 2 high-capacity (SDHC) card supporting block addressing mode, and a low value of the CCS bit indicates a version 2 standard-capacity card supporting byte addressing mode. If the result matches byte addressing mode, the controller then generates the next command to force the block length to 512 bytes. On completion of the initialization process, the CINM issues a DONE signal.
Idle module (IM): The idle module works in two modes: polling mode and control transfer mode. The IM continuously polls the R/W̄ signal and the DTM signal. Depending upon the status of these two controlling signals, the CU transfers the control either to the CRM or to the CWM. On completion of the operation, the DONE signal is asserted to the CU. In polling mode, the controller is in power saving state as the card is in the idle state. The controller sends a constant high value on the MOSI line, and the card also responds with a constant high value via the MISO line of the controller.
Card read module (CRM): The SDHC and similar types of cards support two types of data transfer: single block data transfer and multiblock data transfer. The left branch of the flow chart in Fig. 15 describes the read operation from the card. The CRM has been designed to read the data blocks from the SDHC card. The CU issues two different commands for CRM

Fig. 10 Pseudo-code for idle state

depending upon the DTM signal value: one for single block data transfer and another for multiblock data transfer. The MOSI line of the controller is connected to the CMD/DI line of the card, and similarly the MISO line of the controller is connected to the DAT0/DO line of the card. After successful transmission of the command via the MOSI line of the controller, the CRM receives the command response from the card through MISO and then starts reading the data block(s) from the card along with the CRC bits. Each block transfer is preceded by a start block token “11111110”, followed by the block of data and then the CRC.
The multiple block transfer can be terminated by the command CMD12, generated by the controller.

Fig. 11 Pseudo-code for card write process
Fig. 12 Pseudo-code for card read process
Fig. 13 Pseudo-code for decision process

Card write module (CWM): The CWM has been designed to write the data block to the SDHC card. The right branch of the flow chart in Fig. 15 describes the write operation. During execution, the CU transfers control to the CWM along with the DTM signal to select a single block data write or a multiblock data write. This implementation supports both single and multiblock write operations. For a single block write operation, the controller generates the command with the starting address and then starts writing 512 bytes of data. For a multiblock write, the controller writes data blocks until the CMD12 stop command is issued. The CRC bits are appended to each data block for the entire write operation. The card sends back the response pattern on the MISO line of the controller, where “XXX00101” means the data block is accepted, “XXX01011” means the data block is rejected due to a CRC error, and “XXX01101” indicates that the data block is rejected due to a flash program (write) error (in the pattern, “XXX” means don't-care bits). The multiple block write improves the

throughput, as a single command is generated for writing a bulk of data blocks.
Decision module (DM): The DM has been designed to decide the destination of the controller after completing the data transfer operation, viz. whether control goes to the IM or to the CINM. It is required for successive sequences of data transfer. After the data transfer process, control comes to the DM. The DM constantly monitors the Reset signal and, depending on its status, decides whether control goes to the IM for the next data transfer operation or to the CINM for initialization of the card.
The ACK signal is finally issued by the controller through the I/O interfacing unit to inform the user that the work given to the controller has been completed successfully and that it is ready to process the next set of operations.

4 Hardware-specific implementation details
To explore the feasibility of the proposed architecture, the FPGA-based controller was implemented in synthesizable VHDL. This reduces the processing time of the proposed controller compared with implementations in high-level languages. The target development platform is based on the Spartan-3E (xc3s500e-4-fg320) FPGA chip. The card has been accessed through a multi-port card reader connected to the target FPGA board using a 6-pin cable. A SanDisk 8 GB SDHC card, a Cannon 16 MB SD card, and a 2 GB microSD card, with speed classes 4, 2, and 4, respectively, have been used for testing and verification of the proposed controller.

4.1 Application-specific controller initialization challenges
From the hardware point of view, traditionally, the personal computer accesses the flash memory through a
permanent device interface, which implies fixed-point access to the memory. In the proposed design, the system comes with a detachable unit that connects to the memory card so that the card can be accessed anywhere.
From the designer's point of view, clock synchronization throughout the design is a major challenge, as the flash memory card requires a 100- to 400-kHz clock frequency for the application-specific command execution during initialization [3] and a maximum clock frequency of 25 MHz for data transfer [1]. In a single clock input system, this ultimately forces a lower clock frequency and therefore a sacrifice in performance. If the clock frequency does not match during the initialization process, the card polls the command with a 50-ms time gap to complete the initialization process.
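The two clock limits quoted above can be turned into divider values with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes a 50 MHz system clock (a value not stated in this section) and uses a hypothetical helper, `even_divider`, that picks the smallest even integer divider keeping the derived card clock at or below a given limit.

```python
def even_divider(f_sys_hz: int, f_max_hz: int) -> int:
    """Return the smallest even divider N such that f_sys_hz / N <= f_max_hz."""
    n = -(-f_sys_hz // f_max_hz)  # ceiling division on integers
    n = max(n, 2)                 # a toggle-based divider divides by at least 2
    return n + (n % 2)            # round an odd value up to the next even one


F_SYS_HZ = 50_000_000  # assumed system clock (hypothetical value)

init_div = even_divider(F_SYS_HZ, 400_000)     # initialization limit: 400 kHz
data_div = even_divider(F_SYS_HZ, 25_000_000)  # data transfer limit: 25 MHz

init_clk_hz = F_SYS_HZ / init_div
data_clk_hz = F_SYS_HZ / data_div
```

Under these assumptions the initialization clock lands just below 400 kHz while the data clock runs at the full 25 MHz, which illustrates why a single fixed clock would have to give up transfer speed.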

4.2 Synthesis details


The design was successfully synthesized using Xilinx ISE version 14.1 for the Spartan-3E XC3S500E target device, and then it was compiled and built for implementation. This process consists of translation, mapping, placement, and routing of the signals. For the design implementation process, no partition was specified, and the design was translated and mapped successfully. All signals were placed and routed successfully as well, and all the timing constraints were met. Five percent of the total logic slices on the device were utilized for this implementation. A brief description of the resource utilization during implementation is given in Table 6.
The number of logic slices utilized for the proposed controller is 226 out of 4656 (5%).

5 Experiments and results


In this section, we test the working performance and efficiency of the proposed controller extensively on different synthetic datasets and practical workloads. First, we define the metrics used to evaluate the performance of the controller. Subsequently, different synthetic and practical workloads are described as the case studies of the paper.

Fig. 14 Flowchart for the initialization of SD/SDHC card
Fig. 15 Flow chart for read and write operation of SD/SDHC card

Table 6 Resource utilization during implementation
Logic blocks             Number of logics used   Number of logics available   Utilization (%)
Logic slices             226                     4656                         5
Slice flip/flops         167                     9312                         2
Slice LUTs               421                     9312                         6
Clock buffer             1                       24                           4
Number of bonded IOBs    15                      232                          6

5.1 Performance metrics
In this paper, we have used the following metrics to evaluate the performance of the controller. They are defined as follows.
Clock cycle taken per process (CCTPP): In order to make the performance evaluation of the proposed controller independent of the system clock and other system-specific parameters, we describe the performance of the controller in terms of clock cycles. Here, CCTPP is the number of clock cycles taken by an individual process to complete successfully. If the proposed design is ported to multiple systems, the CCTPP parameter remains system independent.
Time taken per process (TTPP): This is the time taken by an individual process to accomplish its assigned task successfully. We have used the conventional units of time to define the TTPP.
Input/output operations per second (IOPS): IOPS is a measure used to characterize storage devices such as flash storage memory. IOPS is not defined independently; it is used in combination with two other metrics, response time and workload, to characterize the performance of the memory module.

5.2 Simulation details
The proposed architecture was first simulated in the Xilinx ISE 14.1 virtual environment for initial verification. In this phase, Fig. 16 gives the complete waveform representation of the initialization process. Figure 17 gives the timing diagram of the single block write, and Fig. 18 gives the timing diagram of the multiblock data write operation. Simulation results are provided only for the data write operation; the implementation section presents the complete results for initialization and both the read and write operations. Table 7 lists the abbreviations of the input and output ports used to simulate the design.
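The relation between the CCTPP and TTPP metrics of Section 5.1 can be illustrated with a short sketch: the time taken by a process is its clock cycle count divided by the clock frequency. The function names below are hypothetical, and the example pairs the 336-cycle minimum SDHC initialization count reported later in the paper with an assumed 400 kHz initialization clock; that pairing is an assumption made here for illustration, not a measured result.

```python
def ttpp_seconds(cctpp: int, f_clk_hz: float) -> float:
    """Time taken per process: clock cycles divided by clock frequency."""
    return cctpp / f_clk_hz


def iops(completed_ops: int, elapsed_s: float) -> float:
    """I/O operations per second over a measured interval."""
    return completed_ops / elapsed_s


# Assumed example: 336 clock cycles at a 400 kHz card clock.
example_ttpp = ttpp_seconds(336, 400_000)
```

Because CCTPP carries no clock dependence, the same cycle count can be converted to a TTPP value for any target system simply by changing `f_clk_hz`.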
Fig. 16 Simulation waveform for initialization process
Fig. 17 Simulation waveform for single block write process

5.3 Implementation details
The proposed controller has been tested using various case studies with different types of input data pattern.

5.3.1 Case studies
Seven different case studies have been performed to verify the effectiveness of the proposed controller for SD, SDHC, and microSD cards with varying volumes of data. For the first four cases, synthetic text, audio, and video patterns were generated. The other three datasets were taken from practical workloads representing block I/O traces from MSR Cambridge Traces [25], SNIA Iotta Repository [26], and UMass Trace Repository [27].
Case I: pre-declared embedded data pattern: The first case study uses a pre-declared embedded data pattern with 8-bit data. The proposed controller tests the basic read/write operation for a Cannon 16 MB SD card using this data pattern. The same 8-bit data pattern has been written repeatedly to the SD card to test the single and multiblock data write operations. Later on, the same data pattern has been retrieved to validate the single as well as the multiblock data read operation from the same SD card.
Case II: pseudo-random number sequences: The second case study tests the proposed controller with an SDHC card. A pseudo-random number (PRN) generator has been designed to generate 8-bit random data patterns continuously in order to perform single block as well as multiblock write operations on a SanDisk 8 GB SDHC card. The previously used Cannon 16 MB SD card and a 2 GB microSD card have also been tested for single and multiblock read/write operation using those PRN sequences.
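The paper does not say how the 8-bit PRN generator of Case II is built. As a minimal sketch of one common hardware-friendly option, the code below models an 8-bit Galois linear feedback shift register; the tap mask 0xB8, the seed, and the function names are assumptions chosen for illustration, not the authors' design.

```python
from typing import Iterator


def lfsr8(seed: int = 0x01, taps: int = 0xB8) -> Iterator[int]:
    """Yield successive states of an 8-bit Galois LFSR (assumed tap mask 0xB8)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("seed must be non-zero, the all-zero state never changes")
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps  # apply the feedback taps when a 1 is shifted out


def prn_block(gen: Iterator[int], size: int = 512) -> bytes:
    """Collect one data block (512 bytes by default) from the generator."""
    return bytes(next(gen) for _ in range(size))


generator = lfsr8()
first_block = prn_block(generator)
```

With this tap mask the register cycles through all 255 non-zero byte values before repeating, so a 512-byte block contains slightly more than two full periods of the sequence.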

Table 7 Abbreviation of the ports in simulation waveform


Abbreviation Full name
miso Master in slave out
rd Read
wr Write
dm_in Data mode selection
reset Reset
din Data in
clk Clock
cs Chip select
mosi Master out slave in
sclk Slave clock
busy Busy signal
ack Acknowledgment
dout Data out

Fig. 18 Simulation waveform for multiblock write process

Case III: audio signal transmission: The third case study tests the efficiency of the proposed controller with continuous data transfer. An analog signal in the audio range is fed to an on-board LTC 1407A analog-to-digital converter (ADC) [20]. The digital data at the output of the converter are fed to the controller for data storage and retrieval. The LTC6912-1 available with the on-board ADC provides two independent inverting amplifiers with programmable gain to maximize the conversion range of the ADC to 1.65 ± 1.25 V. The gains of both channels are independently programmable through a 3-wire SPI interface, which selects voltage gains among “0,” “−1,” “−2,” “−5,” “−10,” “−20,” “−50,” and “−100” V/V (LTC6912-1) [20]. The analog-to-digital conversion formula is given below in Eq. 1:

D[13:0] = GAIN × [(VIN − 1.65 V) / 1.25 V] × 8192    (1)

Here, D[13:0] represents the 14-bit 2’s complement value of the analog input. The maximum sample rate of the ADC is approximately 1.5 MHz [20]. The 14-bit digital output from the ADC has been compressed to an 8-bit digital data pattern, so that the proposed 8-bit controller can easily read the data byte for storing into the SDHC memory card. In a later phase, the retrieval of the stored data from the same SDHC card exercises the memory read operation for the analog signal.
Case IV: video signal transmission: The fourth case study covers video signal storage and retrieval. The on-board DB15 VGA connector port [20] has been used to display the video frames on a CRT monitor. The VGA signal timing is specified, published, copyrighted, and sold by the Video Electronics Standards Association (VESA) [20]. Thus, the detailed specifications of the video signals are not discussed in depth in this paper. The frames of the video signal have been generated and then decoded into a binary-numbered matrix. Then, the data from the generated matrix have been stored in the card to test the card write operation. Here, the multiblock data write operation has been performed, as the volume of data is too high for the single block write operation. In a later phase, the previously written data have been retrieved from the same card to validate the multiblock card read operation. The FPGA board was connected to a CRT monitor through a VGA cable to display the retrieved result (i.e., the video signal) from the card. The CRT monitor used in this case study has a horizontal frequency of 90 kHz and a screen resolution of 1024 × 768 pixels with a color quality of 16 bits. So, the refresh rate of the monitor is 72 Hz.
Case V: dataset from MSR Cambridge Traces: The Microsoft Research (MSR) Cambridge traces [25] are a commonly used data repository, built by the Microsoft research group. This repository was formed to advance
the state of the art in computing and also to solve practical problems through technological innovation. It is a collaboration of academics with government and industry researchers. There are 35 different traces available in the MSR Cambridge Traces, and they represent 1-week block I/O traces of enterprise servers at Microsoft Research Cambridge. The characteristics of the traces are given along with file names, their attributes, file size, etc. in Table 8. We have used those datasets to check the performance of the proposed controller in the multiblock data transfer procedure. The SanDisk 8 GB SDHC card has been used as the flash storage in this case study.

Table 8 Characteristics table for MSR Cambridge Repository [25]
Repository name    Abbreviation   File name            File size in KB
MSR Cambridge 1    M11            CAM-02-SRV-lvm0      227,577
MSR Cambridge 1    M12            CAM-02-SRV-lvm1      1,576,883
MSR Cambridge 1    M13            CAM-02-SRV-lvm2      117,104
MSR Cambridge 1    M14            CAM-02-SRV-lvm3      1,274,935
MSR Cambridge 1    M15            CAM-02-SRV-lvm4      1,274,935
MSR Cambridge 1    M16            CAMRESHMSA01-lvm0    197,530
MSR Cambridge 1    M17            CAMRESHMSA01-lvm1    29,069
MSR Cambridge 1    M18            CAMRESISAA02-lvm0    643,399
MSR Cambridge 1    M19            CAMRESISAA02-lvm1    8,832,021
MSR Cambridge 1    M110           CAMRESWMSA03-lvm0    62,457
MSR Cambridge 1    M111           CAMRESWMSA03-lvm1    86,267
MSR Cambridge 1    M112           CAM-USP-01-lvm0      284,762
MSR Cambridge 2    M21            CAM-01-SRV-lvm0      115,703
MSR Cambridge 2    M22            CAM-01-SRV-lvm1      2,393,907
MSR Cambridge 2    M23            CAM-01-SRV-lvm2      560,593
MSR Cambridge 2    M24            CAMRESIRA01-lvm0     75,897
MSR Cambridge 2    M25            CAMRESIRA01-lvm1     739
MSR Cambridge 2    M26            CAMRESIRA01-lvm2     10,894
MSR Cambridge 2    M27            CAMRESSDPA01-lvm0    2,028,692
MSR Cambridge 2    M28            CAMRESSDPA01-lvm1    2,476,675
MSR Cambridge 2    M29            CAMRESSDPA01-lvm2    98,864
MSR Cambridge 2    M210           CAMRESSDPA03-lvm0    80,578
MSR Cambridge 2    M211           CAMRESSDPA03-lvm1    34,773
MSR Cambridge 2    M212           CAMRESSDPA03-lvm2    61,550
MSR Cambridge 2    M213           CAMRESSTGA01-lvm0    103,206
MSR Cambridge 2    M214           CAMRESSTGA01-lvm1    113,632
MSR Cambridge 2    M215           CAMRESTSA01-lvm0     91,123
MSR Cambridge 2    M216           CAMRESWEBA03-lvm0    105,051
MSR Cambridge 2    M217           CAMRESWEBA03-lvm1    8,308
MSR Cambridge 2    M218           CAMRESWEBA03-lvm2    299,650
MSR Cambridge 2    M219           CAMRESWEBA03-lvm3    1,611
MSR Cambridge 2    M220           CAMWEBDEV-lvm0       58,850
MSR Cambridge 2    M221           CAMWEBDEV-lvm1       55
MSR Cambridge 2    M222           CAMWEBDEV-lvm2       9,369
MSR Cambridge 2    M223           CAMWEBDEV-lvm3       35
Attributes (all traces): Timestamp, Hostname, Disk Number, Type, Offset, Size, Response Time
KB kilobytes

Case VI: dataset from UMass Trace Repository: The UMass Trace Repository [27] is a commonly used data repository. It provides storage, network, and other traces for analysis to the research community. This work is supported by the National Science Foundation. The repository contains different traces, namely CPU and memory, network, storage, weather, power, smart, and multimedia traces, under two different categories, namely Financial and WebSearch. The characteristics of the traces are given in Table 9. We have used those datasets to verify the multiblock data transfer process of the proposed controller, and the SanDisk 8 GB SDHC card has been used for this purpose.

Table 9 Characteristics table for UMass Repository [27]
Repository name   File name     File size in KB
UMass             Financial1    151,165
UMass             Financial2    102,873
UMass             WebSearch1    30,744
UMass             WebSearch2    135,948
UMass             WebSearch3    127,520
Attributes (all traces): ASU, LBA, Size, Opcode, Timestamps, Optical fields
ASU application-specific unit, LBA logical block address

Case VII: dataset from SNIA Iotta Repository Historical Section: The Storage Networking Industry Association (SNIA) Iotta Repository [26] is a commonly used repository that stores, manages, and distributes different traces or datasets for storage. The historical section of this repository includes all the traces that are older than 10 years. The historical section contains five different traces, namely Block I/O Traces, Network File System Traces, Parallel Traces, Static Snapshots, and System Call Traces. Each of these traces is further divided into multiple sub-traces. We have used some sub-traces from those available traces to verify the multiblock data transfer process of the proposed controller. The characteristics of the traces are given in Table 10. The SanDisk 8 GB SDHC card has been used as the storage device in this case study.
Different flash translation layer settings: The functionality and efficiency of the proposed controller have also been tested using different volumes of data. We have tested the read-write operations of the controller for volumes such as 4, 8, 10, and 512 MB and 1 GB.

5.3.2 Results
The implemented architecture of the proposed controller works in three different modes, namely card initialization mode, card read mode, and card write mode. After card initialization, based on the external commands, the controller communicates with the card either for read or for write operation.
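The mode sequencing summarized above, together with the module descriptions of Section 3.4, can be sketched as a small state transition function. This is an illustrative software model only, not the authors' VHDL: the flag names `request`, `read`, and `reset` are hypothetical stand-ins for the DTM, R/W̄, and Reset signals, and the polarity chosen for `read` is an assumption.

```python
from enum import Enum, auto


class Module(Enum):
    CINM = auto()  # card initialization module
    IM = auto()    # idle module
    CRM = auto()   # card read module
    CWM = auto()   # card write module
    DM = auto()    # decision module


def next_module(current: Module, *, request: bool = False,
                read: bool = True, reset: bool = False) -> Module:
    """Return the module that receives control after `current` finishes."""
    if current is Module.CINM:
        return Module.IM                  # initialization done, start polling
    if current is Module.IM:
        if not request:
            return Module.IM              # keep polling in power saving state
        return Module.CRM if read else Module.CWM
    if current in (Module.CRM, Module.CWM):
        return Module.DM                  # every transfer ends in the DM
    return Module.CINM if reset else Module.IM  # DM checks the Reset signal
```

For example, a write request followed by no reset walks through IM, CWM, DM, and back to IM, which matches the successive-transfer behavior described for the decision module.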

Table 10 Characteristics table for SNIA Historical Section Repository [26]
Trace name           Sub-traces                      Number of files
Block I/O Traces     Cello 1999                      12
Block I/O Traces     Cello 1996                      12
Block I/O Traces     Cello 1992                      12
Block I/O Traces     HP LAJW                         12
Block I/O Traces     Cello 1991                      12
NFS Traces           Animation dataset               140
NFS Traces           Harvard SOS Traces              7
Parallel Traces      Sprite Traces                   1
Static Snapshots     Multimedia file sizes           1
Static Snapshots     Microsoft Longitudinal Study    1
Static Snapshots     Microsoft 1998 Static Study     1
System Call Traces   LASR Traces                     13
System Call Traces   Seer Traces (ASCII)             1
System Call Traces   Seer Traces                     1
System Call Traces   CMU DFS Traces                  14

To validate the read/write operation, two types of test unit have been considered. They are described as follows:
Test unit: on-board output unit (LED): The Spartan 3E target FPGA board contains an 8-bit output unit with 8 on-board light emitting diodes (LEDs) [20]. In the testing setup of the experiment, the MISO signal of the controller has been mapped to a single LED of the on-board 8-bit LED unit, and the resultant command responses and read/write data patterns were observed. Since the SPI mode is a serial bus interface mode, the SD card gives its response serially through its output port to the master. The MISO line signal is left shifted in every clock cycle to observe a series of patterns on the 8-bit LED unit available on the FPGA board. Additionally, in the output section, a clock down converter unit has been designed and integrated to slow down the response speed of the controller for better perception. This is a small compromise in the speed of operation in the testing and verification section. However, this unit is purely optional and is to be omitted when the controller is integrated with a real-time high speed system.
Test unit: DSO: A digital storage oscilloscope (DSO) has been connected to the input (MOSI) and the output (MISO) ports of the controller. Both the output and input data patterns have been observed on the DSO for further verification. The clock down converter unit has also been integrated here to slow down the response speed of the card. However, it is introduced only for the card read section for all the cards and for the initialization section for the SDHC card only.
The results of the proposed controller have been observed and verified in chronological order as per the problem description stated above.
Case I: result: pre-declared embedded data pattern: The first case study represents the complete three-state access of a Cannon 16 MB SD card. The steps in the different phases of data transfer are described below:
Initialization - Figures 19 and 20 show the initialization state output mapped to the on-board LEDs. The driving signal for the initialization module is the Reset signal. Figure 19 shows the output response pattern of the SD card after execution of the first command (CMD0). Figure 20 shows the output pattern after completion of the entire initialization step, when the card is in idle mode.
Card write operation - The pre-defined 8-bit embedded data pattern has been written to the SD card. Figure 21 shows the CRC status response (viz. “00000101”) coming from the SD card to the output port. The card responds to every successful write of an entire block. For a single block write, the card gives the response once, after the write of the entire block completes. For a multiblock write, the card responds after every block write completion, until the end block command (CMD12) has been issued.
Card read operation - Here, we consider an arbitrary data pattern, e.g., “11010011”, which has been read from the SD card. Figure 22 shows that data pattern on the LEDs. Both the single block and the multiblock write and read operations have been verified with the data pattern. This 8-bit data pattern has been written repeatedly to form the entire block (of 512 bytes) to perform the block write operation.

Fig. 19 CMD0 response pattern on the on-board LEDs for SD card

Case II: result: pseudo-random number sequences: The second case study represents the complete 3-state access for a SanDisk 8 GB class 4 SDHC card. Both the input and output ports are connected to the DSO. The study shows the efficiency of the proposed controller for a high capacity SD card. Later on, the 16 MB Cannon SD card and the 2 GB microSD card have also been verified for the read/write process.
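The write response check reported for Case I (the “00000101” pattern) can be sketched in software as a mask-and-compare on the received byte. The helper below is a hypothetical illustration rather than the controller's logic: it ignores the three upper don't-care bits and distinguishes only the accepted and CRC-error patterns quoted in the paper, reporting anything else as "other".

```python
def classify_data_response(token: int) -> str:
    """Classify a write data response byte, ignoring the 3 don't-care MSBs."""
    low5 = token & 0x1F          # keep bits 4..0, drop the "XXX" bits
    if low5 == 0b00101:
        return "accepted"        # pattern XXX00101
    if low5 == 0b01011:
        return "crc_error"       # pattern XXX01011
    return "other"               # any other pattern, e.g. a write failure


led_pattern = 0b00000101         # the response observed in Fig. 21
status = classify_data_response(led_pattern)
```

Masking first means the same check works whether the card drives the unused upper bits high or low.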
Fig. 20 Initialization process completion response on the on-board LEDs for SD card
Fig. 21 CRC response pattern for write operation
Fig. 22 Data pattern retrieved in read operation
Fig. 23 CMD0: input sequence and response pattern

SDHC card - The results obtained for the 8 GB SanDisk SDHC card are described in this section.
Initialization - Figures 23, 24, 25, 26 and 27 show the complete initialization process of the SDHC card. The initialization commands and the corresponding response patterns of the card have been recorded using the DSO. In the idle state, the proposed controller drives a logic high signal on the MOSI line, and it also receives a logic high signal from the SDHC card.
Card write operation - Figures 28 and 29 show the input and output patterns for the write operation. The input data have been generated by a PRN generator. Both the single block and the multiblock data transfer operations have been performed. Figure 28 shows the input and output sequence of a single block write operation, whereas Fig. 29 shows the multiblock write operation.
Card read operation - Figures 30 and 31 show the read operation of the card. The previously written data pattern, generated by the PRN generator, has been read back to verify the complete operational performance. Figure 30 shows the single block read operation, and Fig. 31 shows the multiblock read operation.

SD card - The results obtained for the 16 MB Cannon SD card are described in this section. Figure 32 describes the complete initialization process. The clock divider section has not been introduced in this section, so the process operates at the full clock provided for the initialization of the SD card. To validate the data transfer procedure, the multiblock write operation, shown in Fig. 33, has been performed for the PRN sequences. Later on, the data pattern has been retrieved from the SD card by performing both the single block and the multiblock read operations. Figure 34 shows the single block read operation, and Fig. 35 describes the multiblock read operation.

MicroSD card - The results obtained for the 2 GB microSD card are described in this section. The operation method and response patterns are nearly the same as for the SD card. The microSD card initialization requires a 100- to 400-kHz clock frequency. The clock divider module has not been integrated during the initialization and block write phases. It has been used only for the read operation to make the validation process realizable. Only the multiblock read/write operation has been performed to validate the data transfer operation. Figure 36 describes the complete initialization process. Figure 37 describes the multiblock write process, and Fig. 38 indicates the multiblock read operation for the 2 GB microSD card.
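The initialization commands recorded in Figs. 23 to 27 (CMD0, CMD8, CMD55, ACMD41) are sent to the card as 6-byte frames in SPI mode. The sketch below shows how such a frame can be assembled following the standard SD command layout (start and transmission bits, 6-bit command index, 32-bit argument, 7-bit CRC, end bit). It is an illustration in software, not the authors' VHDL; the function names are hypothetical, and the CMD8 argument 0x1AA is the commonly used voltage and check-pattern value, assumed here since the paper does not list it.

```python
def crc7(data: bytes) -> int:
    """CRC-7 with polynomial x^7 + x^3 + 1, as used for SD command frames."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):      # process each byte MSB first
            in_bit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09
    return crc


def spi_command(index: int, argument: int = 0) -> bytes:
    """Build a 6-byte command frame: 0x40|index, 4 argument bytes, CRC7 + end bit."""
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 0x01])


cmd0 = spi_command(0)           # reset the card into idle state
cmd8 = spi_command(8, 0x1AA)    # interface condition, assumed argument
```

Each frame is 48 bits long, so a command plus its response and idle clocks maps naturally onto a fixed number of clock cycles per initialization step.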
Fig. 24 CMD8: input sequence and response pattern
Fig. 25 CMD55: input sequence and response pattern
Fig. 26 ACMD41: input sequence and response pattern with single block read sequence
Fig. 27 Idle mode input and response sequence
Fig. 28 Input data with response pattern for single block write operation on DSO
Fig. 29 Input data with response pattern for multiblock write operation on DSO
Fig. 30 Single block read mode input command and data response from SDHC
Fig. 31 Multiblock read mode input command and data response from SDHC
Fig. 32 Initialization: input sequences and response patterns
Fig. 33 Input data with response pattern for multiblock write operation to SD card
Fig. 34 Single block read mode input command and data response from SD
Fig. 35 Multiblock read mode input command and data response from SD
Fig. 36 Initialization: input sequences and response patterns for microSD card
Fig. 38 Multiblock read mode input command and data response from microSD

Fig. 37 Input data with response pattern for multiblock write operation to microSD card
Fig. 39 Audio signal processing: data stored and retrieval pattern

The third and the fourth case studies test and verify the efficiency of the proposed controller for real-time audio and video signal storage and retrieval.
Case III: result - audio signal transmission: In this study, an analog signal in the audio range is fed to the system. The received signal has been converted and compressed into an 8-bit digital data pattern and fed to the input data bus of the controller to verify the multiblock data write procedure on the SDHC card. The data stored in this process are then accessed again via the multiblock card read procedure of the controller. Both the input audio data and the data accessed from the SDHC card were fed to the DSO. Figure 39 shows the input data pattern and the read operation from the SDHC card. The read and write operations are two different operations that cannot be performed at the same time instant. Therefore, the analog signal was initially converted and stored into the SDHC card, and during that phase the different commands, responses, and data patterns were recorded using the DSO. Later on, the same data pattern has been retrieved from the SDHC card in read mode, and it was also recorded. We got the stored data back along with the CRC bits, and on the DSO those CRC patterns were present with the original data. For simplicity, the result has been given in the form of a bit pattern, and parts of the stored data have been encircled to show the same portion of the retrieved data.
Case IV: result - video signal transmission: The video signal has been stored and retrieved in this fourth case study. A video signal of 20 frames/second was fed to the system and subsequently stored into the card. The SanDisk 8 GB class 4 SDHC card has been used to verify the video signal transmission process. The card operates at a 4 MB/s writing speed to store and retrieve the video signal. The entire communication was governed by the proposed controller. To ensure successful data storage and retrieval, those frames were later retrieved from

the card and they were displayed on a CRT monitor. The retrieved frames were compared with the original frames which were stored in the card. Figure 40 shows a snapshot of such video data storage and retrieval. The right-hand screen in Fig. 40 shows the actual frame that was written to the card, and the left-hand monitor shows the frame retrieved from the card. Here the left-hand monitor, the implemented controller, and the card work together as a standalone unit.

Case V: result - dataset from MSR Cambridge Traces: The datasets were collected from the MSR Cambridge traces [24, 25], as explained in the “Case studies” section, where the datasets are described. In our experiments, each file of three traces is considered as a separate dataset. The datasets were divided into multiple blocks so that multiblock read-write could be performed on the flash memory. In the performance comparison section, the metrics were computed for multiblock data read-write with respect to the different I/O traces. Results were tabulated for the datasets collected from the repositories.

Case VI: result - dataset from UMass Trace Repository: The datasets were collected from the UMass Trace Repository [24, 27], as explained in the “Case studies” section. In our experiments, those datasets have been divided into multiple blocks and the data has been stored in the SanDisk 8 GB class 4 SDHC card to perform multiblock read-write. In the performance comparison section, the metrics were computed for multiblock data read-write with respect to the different I/O traces. Results were tabulated for the datasets collected from the repositories.

Case VII: result - dataset from SNIA Iotta Repository Historical Section: The datasets were collected from the historical section of the SNIA Iotta Repository [24, 26], as explained in the “Case studies” section. In our experiments, those datasets have been divided into multiple blocks to perform read-write on the flash memory. Results were tabulated for the datasets collected from the repositories.

6 Performance comparison

This section compares the performance of the controller with reported results. The logic utilization of the proposed controller is only 5% of the total logic available on the Spartan 3E FPGA board, as is evident from Table 6.

The proposed controller has been tested for single-block as well as multiblock (5000 blocks) data read and write operations on the SD, SDHC, and microSD cards. Table 11 summarizes the clock cycles elapsed for the SDHC card, and Table 12 summarizes the clock cycles elapsed for the microSD card, during initialization, single and multiblock reads, and single and multiblock writes. In the initialization phase, the proposed system utilizes the full bandwidth supported by SD card technology. Initially the SDHC card receives the CMD0, CMD8, CMD55, ACMD41, CMD58, and CMD16 commands, and the controller receives the corresponding response for each command. The (CMD55, ACMD41) pair of commands may have to be sent “n” times until the card generates the response pattern x“00”. Therefore the total number of clock cycles elapsed in the initialization phase for the SDHC card is 112(2 + n). However, in Table 11 we have considered the ideal case of n = 1, in which the initialization process takes the minimum of 336 clock cycles for the SanDisk 8 GB class 4 SDHC card in SPI bus mode. Similarly, the SD card initialization process requires 56(3 + 2n) clock cycles and the microSD card requires 56(1 + 2n) clock cycles. The difference arises because the CMD58 command is not necessary for the initialization of the SD card, and the CMD58 and CMD16 commands are not required for the microSD card initialization. Only CMD0 and (CMD55 + ACMD41), used in place of CMD1, initialize the microSD card in SPI mode. In the ideal case with n = 1, the initialization process takes a minimum of 280 clock cycles for the Cannon 16 MB class 2 SD card and 168 clock cycles for the microSD card in SPI bus mode. The HCS bit should be high in the ACMD41 command for initialization of the SDHC card, whereas the SD card and the microSD card with capacity ≤ 2 GB require the HCS bit to be de-asserted at the time of initialization.
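The initialization cycle counts above can be cross-checked with a few lines of Python. The sketch below is only an illustration: it assumes that every command/response exchange costs 56 clock cycles, which is inferred from the stated formulas (112(2 + n) = 56(4 + 2n)) rather than stated in the text, and the names `CYCLES_PER_COMMAND`, `SINGLE_COMMANDS`, and `init_cycles` are hypothetical.

```python
# Assumption (inferred from the formulas in the text, not stated there):
# each command/response exchange takes 56 clock cycles.
CYCLES_PER_COMMAND = 56

# Number of commands sent exactly once during initialization.
# SDHC: CMD0, CMD8, CMD58, CMD16; SD: the same without CMD58;
# microSD: a single command, as implied by the 56(1 + 2n) formula.
SINGLE_COMMANDS = {"SDHC": 4, "SD": 3, "microSD": 1}


def init_cycles(card, n=1):
    """Clock cycles for initialization when the (CMD55, ACMD41) pair is sent n times."""
    if n < 1:
        raise ValueError("the (CMD55, ACMD41) pair is sent at least once")
    # Each repetition of the pair contributes two command/response exchanges.
    return CYCLES_PER_COMMAND * (SINGLE_COMMANDS[card] + 2 * n)


if __name__ == "__main__":
    for card in ("SDHC", "SD", "microSD"):
        print(card, init_cycles(card, n=1))
```

With n = 1 this reproduces the 336, 280, and 168 cycles quoted for the SDHC, SD, and microSD cards; under the same assumption, each additional (CMD55, ACMD41) retry adds 112 cycles.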
Table 11 Clock cycles achieved for SDHC

Process                   CCTPP
Initialization            336
Single block read         4176
Single block write        4184
Multiblock (5000) read    20600056
Multiblock (5000) write   20640056

Fig. 40 Snapshot of video processing

Tables 13 and 14 show a comparative study of the speed of response for the SD card. Also in Table 13, the initialization time for the SDHC card and the microSD card has been
reported. Here the comparison is made only with respect to the SD card initialization time available in the literature. Also, Table 14 gives the percentage increase in the speed of response in the read-write phase for the SD card with the proposed controller.

Table 12 Clock cycles achieved for microSD card

Process                   CCTPP
Initialization            168
Single block read         4176
Single block write        4184
Multiblock (5000) read    20600056
Multiblock (5000) write   20640056

Table 13 Speed comparison in initialization phase

Process          TTPP, SD card achieved   TTPP, O. Elkeelany et al. [1]   TTPP, SDHC achieved   TTPP, MicroSD achieved
Initialization   22 ms                    63.88 ms                        27 ms                 20 ms

Table 14 Speed comparison in read/write phase

Process                        CCTPP, SD card achieved   CCTPP, Elkeelany et al. [1]   % Reduction
Single Block Read              4176                      12025                         65
Multiblock Read (5000 Block)   20600056                  26275000                      21.59
Single Block Write             4184                      7472                          44
Multiblock Write (5000 Block)  20640056                  21250000                      2.87

Table 15 shows the software speed-up for the proposed controller. The SanDisk 8 GB class 4 SDHC card, the Cannon 16 MB class 2 SD card, and the class 4 microSD card have been used for multiblock data transfer using the proposed controller, designed in VHDL. A similar data transfer setup has been implemented on two modern operating systems (Windows 7 and Windows XP (Service Pack 3)) to validate the speed-up of the proposed controller with respect to existing software-based approaches. These operating systems are mostly written in high-level languages, which ultimately makes the system's built-in controller slower than the proposed controller, optimized using VHDL. The computer used for this purpose has an Intel second-generation dual-core processor and 2 GB of DDR3 RAM as its basic specifications.

Table 15 Software speedup in multiblock data transfer phase

Card used   Process                          TTPP, VHDL achieved   TTPP, Windows 7   TTPP, Windows XP
SDHC        Multiblock read (5000 blocks)    60 μs                 2.01 s            2.70 s
            Multiblock write (5000 blocks)   100 μs                10.26 s           11.10 s
SD          Multiblock read (5000 blocks)    57 μs                 1.98 s            2.46 s
            Multiblock write (5000 blocks)   93 μs                 9.89 s            10.71 s
MicroSD     Multiblock read (5000 blocks)    48 μs                 1.53 s            1.99 s
            Multiblock write (5000 blocks)   75 μs                 9.12 s            10.02 s

Figure 41 shows the comparison chart of the initialization process performed on the SD card. The work has been compared with the closest work, reported in Elkeelany et al.'s paper [1]. The chart in Fig. 41 clearly shows the reduction in the initialization time: the proposed process introduces a 65.56% improvement. Figure 42 compares the clock cycles required for single block data transfer (single block read/write), and Fig. 43 shows the corresponding improvements in percentage. The proposed architecture introduces a 65% improvement in clock cycles for single block read and a 44% improvement for single block write. On the other hand, Fig. 44 compares multiblock data transfer, and Fig. 45 shows the corresponding improvement in percentage. Here, the proposed system introduces a 21.59% improvement in required clock cycles for multiblock read and a 2.87% improvement in required clock cycles for multiblock write. That is, the number of clock cycles needed for SD card data access is reduced by 65 and 44% for single block read and write, respectively, and by 21.59 and 2.87% for multiblock read and write, respectively. The comparison charts of the software speed-up of VHDL with respect to the high-level language are shown in Figs. 46, 47 and 48 for the SDHC, SD and microSD cards, respectively.

The performance of the proposed controller has also been tested on three commonly used repositories: MSR Cambridge Traces [25], UMass Trace Repository [27], and SNIA Iotta Repository [26]. Tables 8, 9 and 10 give the characteristics of MSR Cambridge Traces [25], UMass Trace Repository [27], and SNIA Iotta Repository [26], respectively. The files in each repository, the size of each file, and the attributes of the files are given in the tables. Tables 16, 17 and 18 show the performance of the controller for both the read and write processes with respect to the repository datasets, in terms of workload, average CCTPP (read), average CCTPP (write), and IOPS metrics. All the datasets of the MSR Cambridge and UMass repositories have been used, whereas only the Harvard SOS subtraces of the NFS trace of the SNIA Iotta repository have been used in the experiments.
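The percentage improvements quoted for Tables 13 and 14 follow from the usual relative-reduction formula, (reference − achieved) / reference × 100. The short Python sketch below recomputes them from the tabulated values; the function and dictionary names are illustrative only and are not part of the original work.

```python
def percent_reduction(reference, achieved):
    """Relative reduction of `achieved` with respect to `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * (reference - achieved) / reference


# Each entry: (achieved by the proposed controller, reported by Elkeelany et al. [1]).
TABLE_13_MS = {"Initialization": (22.0, 63.88)}
TABLE_14_CYCLES = {
    "Single block read": (4176, 12025),
    "Multiblock read (5000 blocks)": (20600056, 26275000),
    "Single block write": (4184, 7472),
    "Multiblock write (5000 blocks)": (20640056, 21250000),
}


def reductions(table):
    """Map each process name to its percentage reduction."""
    return {name: percent_reduction(ref, ach) for name, (ach, ref) in table.items()}


if __name__ == "__main__":
    for table in (TABLE_13_MS, TABLE_14_CYCLES):
        for name, value in reductions(table).items():
            print(f"{name}: {value:.2f}% reduction")
```

The recomputed values agree with the published ones to the stated precision. The multiblock read figure evaluates to about 21.598%, so the tabulated 21.59 corresponds to truncation rather than rounding.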
Fig. 41 Comparison chart for the initialization process of the SD card
Fig. 42 Comparison chart for the single block data transfer of SD card
Fig. 43 Improvement chart for single block data transfer of SD card
Fig. 44 Comparison chart for multiblock data transfer of SD card
Fig. 45 Improvement chart for multiblock data transfer of SD card
Fig. 46 Speed up comparison chart of SDHC card
Fig. 47 Speed up comparison chart of SD card
Fig. 48 Speed up comparison chart of microSD card
Table 16 Performance table for MSR Cambridge Repository [25]

Repository name   File name      Workload in      Average response   Average response   IOPS
                  abbreviation   blocks of data   CCTPP (Read)       CCTPP (Write)      (operations/second)
MSR Cambridge 1   M11            144,741          87,980             59,568             957
                  M12            453,996          43,481             228,544            957
                  M13            566,045          33,408             218,313            958
                  M14            929,401          79,007             118,519            958
                  M15            103,129          33,438             117,673            957
                  M16            233,319          108,512            51,613             958
                  M17            386,921          153,010            199,474            956
                  M18            59,242           149,141            20,467             957
                  M19            246,939          101,917            105,340            957
                  M110           123,427          36,811             52,927             957
                  M111           85,412           33,663             48,690             957
                  M112           127,229          165,980            50,863             957

MSR Cambridge 2   M21            189,660          154,168            72,762             957
                  M22            229,310          162,438            60,576             957
                  M23            283,342          42,193             133,960            958
                  M24            196,382          172,309            81,244             957
                  M25            277,067          106,205            115,738            960
                  M26            83,360           33,408             35,466             958
                  M27            136,364          33,419             122,567            958
                  M28            570,197          317,922            107,067            956
                  M29            1,157,651        44,544             53,952             956
                  M210           117,600          34,965             51,319             958
                  M211           164,072          33,709             107,546            957
                  M212           148,760          33,408             95,238             958
                  M213           192,269          88,610             72,740             958
                  M214           154,498          48,024             64,521             957
                  M215           272,685          125,598            66,648             957
                  M216           145,708          119,592            59,161             954
                  M217           190,880          81,501             78,696             957
                  M218           1,227,759        84,665             36,812             957
                  M219           811,852          889,535            166,042            957
                  M220           215,659          112,999            78,539             957
                  M221           10,816           35,696             42,812             957
                  M222           162,960          32,609             68,052             957
                  M223           10,596           33,527             44,698             957

Table 17 Performance table for UMass Repository [27]

Repository name   File name      Workload in      Average response   Average response   IOPS
                                 bytes of data    CCTPP (Read)       CCTPP (Write)      (operations/second)
UMass             Financial1     107,967          50,889             41,093             957
                  Financial2     57,254           20,371             41,542             957
                  WebSearch1     306,500          128,006            66,816             957
                  WebSearch2     303,744          126,855            66,816             956
                  WebSearch3     289,776          121,032            66,816             957
Table 18 Performance table for SNIA Historical Section [26]

Trace name   Sub-trace                           Workload in number   Response CCTPP (Read)   Response CCTPP (Write)   IOPS (operations/second)
NFS Trace    Harvard SOS Traces (Deasna Week1)   12357                38093                   16343                    956
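As a rough consistency check on the clock-cycle figures in Tables 11, 12 and 19, the tabulated values can be decomposed numerically. The Python sketch below is an observation about the published numbers, not a model stated in the text: it assumes a fixed overhead of 56 cycles per read or write command (`OVERHEAD`, a hypothetical name) and checks that the remaining cost per block is constant and that every Table 19 entry is an integer multiple of the single-block count.

```python
# CCTPP values copied from Tables 11 and 12.
SINGLE_READ, SINGLE_WRITE = 4176, 4184
MULTI_READ_5000, MULTI_WRITE_5000 = 20600056, 20640056

# Assumption: a fixed 56-cycle overhead per read/write command.
OVERHEAD = 56

# CCTPP values copied from Table 19: data volume -> (write, read).
TABLE_19 = {
    "4 MB": (33472, 33408),
    "8 MB": (66944, 66816),
    "10 MB": (83680, 83520),
    "512 MB": (4284416, 4276224),
    "1 GB": (8368000, 8352000),
}


def per_block(total_cycles, blocks, overhead=OVERHEAD):
    """Cycles per block after removing the assumed fixed overhead."""
    quotient, remainder = divmod(total_cycles - overhead, blocks)
    if remainder:
        raise ValueError("total is not overhead plus a whole number of blocks")
    return quotient


def multiblock_cycles(blocks, cycles_per_block, overhead=OVERHEAD):
    """Total cycles for a multiblock transfer under the assumed model."""
    return overhead + blocks * cycles_per_block


def single_block_multiples(table):
    """Express each Table 19 entry as a multiple of the single-block count."""
    result = {}
    for volume, (write, read) in table.items():
        w, w_rem = divmod(write, SINGLE_WRITE)
        r, r_rem = divmod(read, SINGLE_READ)
        if w_rem or r_rem:
            raise ValueError(f"{volume} is not a whole multiple")
        result[volume] = (w, r)
    return result
```

Under this assumption the per-block cost is 4120 cycles for reads and 4128 cycles for writes, for both the single-block and the 5000-block transfers, and the Table 19 entries correspond to 8, 16, 20, 1024, and 2000 single-block operations.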

We have also tested the efficiency, performance and handling capacity of the proposed controller in different FTL settings, where the volume of data is varied over 4 MB, 8 MB, 10 MB, 512 MB, and 1 GB. Table 19 shows the performance in terms of CCTPP for the above-mentioned data volumes.

Table 19 Performance in different FTL settings

Data volume   Process       CCTPP
4 MB          Write block   33,472
              Read block    33,408
8 MB          Write block   66,944
              Read block    66,816
10 MB         Write block   83,680
              Read block    83,520
512 MB        Write block   4,284,416
              Read block    4,276,224
1 GB          Write block   8,368,000
              Read block    8,352,000

7 Conclusions

An on-chip design and implementation of a controller has been proposed for SDHC and similar families of cards. The design has also been implemented for the microSD card. In addition, the same controller can be used for data communication with MMC. The FSM-based architecture design, its operation, FPGA-based implementation, control flow and execution, synthesis results, and all other implementation-related issues were discussed in detail. Results were tabulated for the different problems specified above, and they show that the proposed design improves efficiency. The number of clock cycles needed for SD card data access is reduced by 65 and 44% for single block read and write, respectively, and by 21.59 and 2.87% for multiblock read and write, respectively, compared with the closest reported work [1].

Future work will involve extending the proposed controller in a wider sense. This paper presently focuses on SPI mode-based data communication for SD, SDHC, and microSD cards. However, this approach can be extended to the other modes of data transfer supported by similar families of cards. To increase the speed of data transfer between the external system and the storage device, further modifications can be incorporated by changing the proposed architecture, and the design can be implemented on other high-end target platforms with very minor modification of the configuration procedure.

Authors’ contributions
The work has been carried out as the M.Tech. final year project of the first author, SB, under the supervision of the second author, SM. The selection and setup of the project were carried out by both authors together. The structuring and coding were carried out by SB, and the testing and debugging were done by both authors. This manuscript was prepared and checked by both authors together. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 28 May 2016 Accepted: 31 October 2016

References
1. O Elkeelany, VS Todakar, Data archival to SD card via hardware description language. IEEE Embed. Syst. Lett. 3(4), 105–108 (2011)
2. SD Card Association and Technical Committee and Specifications, SD, et al., Part 1, physical layer, simplified specification, Version 2.00. (SD Card Association, San Ramon, 2006). https://www.sdcard.org/downloads/pls/simplified_specs/archive/part1_200.pdf
3. Part, SD Specifications, Physical Layer Simplified Specification Version 2.00. 31, 1 (2010). http://www.sdcard.org/developers/tech/sdcard/pls/
4. A Lakshman, P Malik, Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review. 44(2), 35–40 (2010)
5. C Border, in ACM SIGCSE Bulletin. The development and deployment of a multi-user, remote access virtualization system for networking, security, and system administration classes, vol. 39 (ACM, 2007), pp. 576–580
6. H Müller, N Michoux, D Bandon, A Geissbuhler, A review of content-based image retrieval systems in medical applications–clinical benefits and future directions. Int. J. Med. Inform. 73(1), 1–23 (2004)
7. S-H Liao, Expert system methodologies and applications—a decade review from 1995 to 2004. Expert Syst. Appl. 28(1), 93–103 (2005)
8. J Tai, D Liu, Z Yang, X Zhu, J Lo, N Mi, Improving flash resource utilization at minimal management cost in virtualized flash based storage systems. IEEE Trans. Cloud Comput. 99, 1–14 (2015)
9. C Wang, Q Wang, K Ren, N Cao, W Lou, Toward secure and dependable storage services in cloud computing. IEEE Trans. Serv. Comput. 5(2), 220–232 (2012)
10. H Nie, Q Xie, Y Zhang, M Li, Q Liu, H Liu, J Zhang, B Li, in Intelligent System Design and Engineering Application (ISDEA), 2012 Second International Conference On. Research on solid state storage based remote sensing data storage (IEEE, 2012), pp. 1294–1297
11. H Chen, TN Cong, W Yang, C Tan, Y Li, Y Ding, Progress in electrical energy storage system: a critical review. Prog. Nat. Sci. 19(3), 291–312 (2009)
12. S Cohn, M Ross, Methods, systems, and devices for wireless delivery, storage, and playback of multimedia content on mobile devices. (Google Patents, 2001). US Patent App. 10/040,617
13. H Qi, A Gani, in Digital Information and Communication Technology and It’s Applications (DICTAP), 2012 Second International Conference On. Research on mobile cloud computing: Review, trend and perspectives (IEEE, 2012), pp. 195–202
14. Y Yang, Y Yang, L Niu, in 2012 Second International Conference on Instrumentation, Measurement, Computer, Communication and Control.
Design of sdhc card video player based on sopc (IEEE Computer Society,
2012), pp. 900–904. doi:10.1109/IMCCC.2012.216
15. M Abdallah, O Elkeelany, in Computing, Engineering and Information, 2009.
ICC’09. International Conference On. Simultaneous multi-channel data
acquisition and storing system (IEEE, 2009), pp. 233–236.
doi:10.1109/ICC.2009.17
16. C-S Lin, K-Y Chen, Y-H Wang, L-R Dung, in 2006 13th IEEE International
Conference on Electronics, Circuits and Systems. A nand flash memory
controller for sd/mmc flash memory card (IEEE, 2006), pp. 1284–1287.
doi:10.1109/TMAG.2006.888520
17. O Elkeelany, G Vince, in 2007 Thirty-Ninth Southeastern Symposium on
System Theory. Portable analog data capture using custom processing
(IEEE, 2007), pp. 120–123
18. C Li, Q Wang, L Wang, in Computer and Information Technology
Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference On.
A high efficient flash storage system for two-way cable modem (IEEE,
2008), pp. 551–556. doi:10.1109/CIT.2008.Workshops.30
19. C-Y Lu, H Kuan, Nonvolatile semiconductor memory revolutionizing
information storage. IEEE Nanotechnol. Mag. 3(4), 4–9 (2009).
doi:10.1109/MNANO.2009.934861
20. S Xilinx, 3E starter kit board user guide. UG230 (v1. 0) March. 9 (2006).
http://www.xilinx.com/support/documentation/boards_and_kits/ug230.
pdf
21. SanDisk, Secure Digital Card Product Manual. 1.9(80-13-00169) (2003)
22. Instruments, Texas, Msp430x1xx family user’s guide. (SLAU049B, 2006)
23. Y Deng, J Zhou, Architectures and optimization methods of flash memory
based storage systems. J. Syst. Archit. 57(2), 214–227 (2011)
24. D Narayanan, A Donnelly, A Rowstron, Write off-loading: practical power
management for enterprise storage. ACM Trans. Storage. 4(3), 10–11023
(2008)
25. Storage Networking Industry Association and others, MSR Cambridge
Traces (2010). http://iotta.snia.org/traces/388
26. Storage Networking Industry Association, et al, SNIA Iotta Repository.
Microsoft Enterprise Traces, Colorado Springs, Colorado (iotta. snia.
org/traces/130) (2011). http://iotta.snia.org/historical_section
27. Application, OLTP, I/O and search engine I/O. umass trace repository
(2007). http://traces.cs.umass.edu/index.php/Storage/Storage
28. S Chen, What types of ECC should be used on flash memory. Application
Note for SPANSION (2007). http://www.spansion.com/support/
application%20notes/types_of_ecc_used_on_flash_an.pdf
29. J No, Nand flash memory-based hybrid file system for high I/O
performance. J. Parallel Distrib. Comput. 72(12), 1680–1695 (2012)
30. R Wang, Z Mi, H Yu, W Yuan, The design of image processing system
based on SOPC and ov7670. Procedia Eng. 24, 237–241 (2011)
31. M Fabiano, M Indaco, S Di Carlo, P Prinetto, Design and optimization of
adaptable BCH codecs for nand flash memories. Microprocess. Microsyst.
37(4), 407–419 (2013)
32. M Baklouti, P Marquet, J Dekeyser, M Abid, FPGA-based many-core
system-on-chip design. Microprocessors and Microsystems. 39(4),
302–312 (2015)
33. F Thomas, M Nayak, S Udupa, J Kishore, V Agrawal, A hardware/software
codesign for improved data acquisition in a processor based embedded
system. Microprocess. Microsyst. 24(3), 129–134 (2000)
34. F Chen, DA Koufaty, X Zhang, in Proceedings of the International
Conference on Supercomputing. Hystor: Making the best use of solid state
drives in high performance storage systems (ACM, Tucson, Arizona, 2011),
pp. 22–32
35. J Guerra, H Pucha, JS Glider, W Belluomini, R Rangaswami, in FAST. Cost effective storage using extent based dynamic tiering, vol. 11, (2011), pp. 20–20
36. Technical Committee SD Card Association, et al., Speed Class Greater Performance Choices, Online available and accessed. Online at https://www.sdcard.org/developers/overview/speed_class/
37. Editor - Metering, Minsen - your ideal supplier of wireless water/gas/electricity meters (2013). https://www.metering.com/minsen-your-ideal-supplier-of-wireless-water-gaselectricity-meters/