
System-on-Chip Design with Arm® Cortex®-M Processors

Reference Book
JOSEPH YIU
Arm Education Media is an imprint of Arm Limited, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK

Copyright © 2019 Arm Limited (or its affiliates). All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, recording or any other information storage and retrieval
system, without permission in writing from the publisher, except under the following conditions:

Permissions
• You may download this book in PDF format from the Arm.com website for personal, non-commercial use only.

• You may reprint or republish portions of the text for non-commercial, educational or research purposes, but only if there is an attribution to Arm Education.

This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden
our understanding, changes in research methods and professional practices may become necessary.

Readers must always rely on their own experience and knowledge in evaluating and using any
information, methods, project work, or experiments described herein. In using such information or
methods, they should be mindful of their safety and the safety of others, including parties for whom
they have a professional responsibility.

To the fullest extent permitted by law, the publisher and the authors, contributors, and editors shall
not have any responsibility or liability for any losses, liabilities, claims, damages, costs or expenses resulting
from or suffered in connection with the use of the information and materials set out in this textbook.

Such information and materials are protected by intellectual property rights around the world and are
copyright © Arm Limited (or its affiliates). All rights are reserved. Any source code, models or other materials
set out in this textbook should only be used for non-commercial, educational purposes (and/or subject to
the terms of any license that is specified or otherwise provided by Arm). In no event shall purchasing this
textbook be construed as granting a license to use any other Arm technology or know-how.

ISBN: 978-1-911531-19-7

Version: 1.0.3 – pdf

For information on all Arm Education Media publications, visit our website at
https://www.arm.com/resources/education/books

To report errors or send feedback please email edumedia@arm.com


To our families
Contents
Foreword
Preface
Example Codes and Projects / Disclaimer / A note about the scope of this book
About the Author
Acknowledgments

1. Introduction to Arm Cortex-M
1.1 Why learn Cortex-M system design?
1.1.1 Starting Cortex-M system design is easy
1.1.2 Cortex-M processor systems on FPGA
1.1.3 Security by design is made easier with Arm architecture
1.2 Understanding different types of Arm processors
1.3 Cortex-M deliverables
1.3.1 Licensing through Arm Flexible Access and Arm DesignStart
1.3.2 Obfuscated Verilog – DesignStart Eval
1.3.3 Verilog RTL sources – DesignStart Pro
1.3.4 FPGA Packages – DesignStart FPGA
1.3.5 Documentation

2. Introduction to system design with Cortex-M processors
2.1 Overview of Cortex-M Processors
2.2 What memories are needed?
2.2.1 Overview of memories
2.2.2 Memory declarations in FPGA design tools
2.2.3 Memory handling in ASIC designs
2.2.4 Memory endianness
2.3 Defining the peripherals
2.4 Memory map definition
2.5 Bus and memory system design
2.6 TCM integration
2.7 Cache integration
2.8 Defining the processor’s configuration options
2.9 Interrupt signals and related areas
2.10 Event interface
2.11 Clock generation
2.12 Reset generation
2.13 SysTick
2.14 Debug integration
2.15 Power management features
2.16 Top-level pin assignment and pin multiplexing
2.17 Miscellaneous signals
2.18 Sign off requirements

3. AMBA, AHB, and APB
3.1 What is AMBA?
3.1.1 Introduction to Advanced Microcontroller Bus Architecture
3.1.2 History of AMBA
3.1.3 Various versions of AMBA specification
3.2 Overview of AHB
3.2.1 Various versions of AHB
3.2.2 AHB signals
3.2.3 Basic operations
3.2.4 Minimal AHB systems
3.2.5 Handling of multiple bus masters
3.3 More details on the AHB protocol
3.3.1 Address phase signals
3.3.2 Data phase signals
3.3.3 Legacy arbiter handshake signals
3.4 Exclusive access operations
3.4.1 Introduction to exclusive accesses
3.4.2 AHB5 exclusive access support
3.4.3 Mapping of Cortex-M3/M4/M7 exclusive access signals to AHB5
3.5 AHB5 TrustZone support
3.6 Overview of APB
3.6.1 Introduction to the APB bus system
3.6.2 APB signals and connection
3.6.3 Additional signals in APB protocol v2.0
3.6.4 Data values on APB
3.6.5 Mixing different versions of APB components

4. Building simple bus systems for Cortex-M processors
4.1 Introduction to the basics of bus design
4.2 Building a simple Cortex-M0 system
4.3 Building a simple Cortex-M0+ system
4.4 Building a simple Cortex-M1 system
4.5 Building a simple Cortex-M3/Cortex-M4 system
4.6 Handling multiple bus masters
4.7 Exclusive access support
4.8 Address remap
4.9 AHB-based memory connection versus TCM
4.10 Handling of embedded flash memories
4.10.1 IP requirements
4.10.2 Flash programming
4.10.3 Bringing up a new device without a valid program image

5. Debug integration with Cortex-M processor systems
5.1 Overview of debug and trace features
5.2 CoreSight Debug Architecture
5.2.1 Introduction to Arm CoreSight
5.2.2 Debug connection protocols
5.2.3 Debug connection concept - Debug Access Port (DAP)
5.2.4 Various arrangements of debug interface structure
5.2.5 Trace connection concept
5.2.6 Timestamp
5.2.7 Debug components discovery (ROM table and component IDs)
5.2.8 Debug authentication
5.2.9 Debug power request
5.2.10 Debug reset request
5.2.11 Cross Trigger Interface
5.3 Debug integration
5.3.1 JTAG / Serial Wire Debug connections
5.3.2 Trace port connections
5.3.3 Clocks for the debug and trace system
5.3.4 Multi-drop serial wire support
5.3.5 Debug authentication
5.4 Other related topics
5.4.1 Other signal connections
5.4.2 Daisy chain of JTAG connection

6. Low-power support
6.1 Overview of low-power Cortex-M features
6.2 Low-power design basics
6.3 Cortex-M low-power interfaces
6.3.1 Sleep status and GATEHCLK output
6.3.2 Q-channel low-power interface (Cortex-M23, Cortex-M33, Cortex-M35P)
6.3.3 Sleep hold interface
6.3.4 Wakeup Interrupt Controller (WIC)
6.3.5 SRPG’s impact on software
6.3.6 Software power-saving approach
6.4 Cortex-M processor characteristics that enable low-power designs
6.4.1 High code density
6.4.2 Short pipeline
6.4.3 Instruction fetch optimizations
6.5 System-level design considerations
6.5.1 Low-power designs overview
6.5.2 Clock sources
6.5.3 Low-power memories
6.5.4 Caches
6.5.5 Low-power analog components
6.5.6 Maximizing clock gating opportunities
6.5.7 Sleep mode that completely powers down the processor

7. Design of bus infrastructure components
7.1 Overview of a simple AMBA system design
7.2 Typical AHB slave design rules
7.3 Typical AHB infrastructure components
7.3.1 AHB decoders
7.3.2 Default slave
7.3.3 AHB Slave multiplexer
7.3.4 ROM and RAM with AHB interface
7.3.5 AHB to APB Bridge
7.4 Bridging from Cortex-M3/Cortex-M4 AHB Lite to AHB5

8. Design of simple peripherals
8.1 Common practices for peripheral designs
8.2 Designing Simple APB Peripherals
8.2.1 General Purpose Input Output (GPIO) interface
8.2.2 Simple APB Timer
8.2.3 Simple UART
8.3 ID registers
8.4 Other peripheral design considerations
8.4.1 Security of system control functions
8.4.2 Processor’s halting
8.4.3 Handling of 64-bit data

9. Putting the system together
9.1 Creating a simple microcontroller-like system
9.2 Design partitioning
9.3 What is inside a simulation environment?
9.4 Prepare the minimal software support for simulation
9.4.1 Overview of example code based on CMSIS-CORE
9.4.2 Device header file for example MCU (cm3_mcu.h)
9.4.3 Device start-up file for example MCU (startup_cm3_mcu.s)
9.4.4 UART utilities
9.4.5 System initialization function
9.4.6 Retargeting
9.4.7 Other software support package considerations
9.5 System-level simulation
9.5.1 Compiling hello world
9.5.2 Using Modelsim/QuestaSim to compile and simulate the design
9.6 Advanced processor systems and Corstone Foundation IP
9.7 Verification
9.8 ASIC implementation flow
9.9 Design for Testing/Testability (DFT)

10. Beyond the processor system
10.1 Clock system design
10.1.1 Clock system design overview
10.1.2 Clock switching
10.1.3 Low-power considerations
10.1.4 DFT considerations
10.2 Multiple power domains and power gating
10.3 Arm processors in a mixed-signal world
10.3.1 Convergence of microcontrollers and mixed-signal designs
10.3.2 Analog to digital conversions
10.3.3 Digital to analog conversions
10.3.4 Other analog interface approaches
10.3.5 Connecting ADC and DAC IPs into a Cortex-M system
10.4 Bring an SoC to life – Beetle test chip case study
10.4.1 Beetle test chip overview
10.4.2 Beetle test chip challenges
10.4.3 Beetle test chip system design
10.4.4 Implementation of the Beetle test chip
10.4.5 Other related tasks

11. Software Development
11.1 Introduction to CMSIS (Cortex Microcontroller Software Interface Standard)
11.2 Creating software support for multiple toolchains
11.2.1 What is needed for creating multiple toolchain support?
11.2.2 Compilation with Arm Compiler 6
11.2.3 Compilation with gcc
11.3 Introduction of the Arm Development Studio featuring Arm Keil Microcontroller Development Kit (MDK)
11.3.1 Overview of Keil MDK
11.3.2 Keil MDK Installation
11.3.3 Create an application
11.3.4 Using the project wizard to create a project
11.3.5 Create and add source files
11.3.6 Edit the source files
11.3.7 Defining project options
11.3.8 Compile the project
11.3.9 Download and debug the application
11.3.10 Using ITM for text message output (printf)
11.3.11 Software development in collaborative environments
11.4 Using an RTOS
11.4.1 RTOS software concepts
11.4.2 Using Keil RTX
11.4.3 Optimizing memory usage
11.4.3.1 The need for RAM usage analysis
11.4.3.2 Configure RTX for stack watermarking
11.4.3.3 RTX RTOS viewer in Watch windows
11.5 Other toolchains

Glossary of terms
References
Index

Foreword

Why Read this Book?


Right now, you are probably surrounded by Arm processors without even knowing they are there.
More than 145 billion chips containing an Arm processor have been produced to date, which is about 19 for every person on the planet.

The most surprising thing is that Arm does not produce chips. It only designs the technology and enables its partners to manufacture differentiated devices that integrate it.

Many more of these chips, also called SoCs (systems-on-chip), are expected to be produced in the coming years. There is even talk of trillions of devices for the Internet of Things (IoT). Of the total number of SoCs currently on the market, the great majority use the smallest processors in the Arm product range: the Cortex-M series. Small, highly energy efficient, and powerful enough for many applications, they are at the heart of many of today’s electronic devices.

This book explains how SoCs based on the Arm Cortex-M processor portfolio are designed, details the different elements that make up such a system, explains the design issues involved, describes integration into systems, and discusses how these SoCs are programmed.

A Brief History of Arm


The crazy years in the history of personal computing began in the 1980s. Acorn, a British company, became very successful with the BBC Microcomputer, which was used in many schools throughout the country. For its next generation of computers, the company wanted an updated processor and started a quest for such a component. Unfortunately, none of the available microprocessors suited its needs: most were either too complex or unavailable, and they required a large number of external components. The Acorn team then learned about the Reduced Instruction Set Computer (RISC) concept and found that it could lead to powerful, yet low-cost, solutions.

At the time, RISC processors were confined to high-end computers, where cost was less of an issue. Since no existing RISC processor was suitable, the team embarked on the journey to develop its own piece of silicon.

This secret project was named “Acorn RISC Machine” (ARM, for short). The first processor, ARM1, was launched in 1985. It was produced by VLSI Technology in a 3 µm process (a feature size almost 500 times larger than in today's most advanced designs) and could run at 6 MHz. One of the side benefits of this simple processor architecture was its low power consumption (compared with contemporary CPUs), which allowed the component to use a lower-cost plastic package without melting it.

At the heart of the processor design was the Arm instruction set, which progressively evolved to
optimize the performance and efficiency of new generations of processors. This is a key element of
what is called the ‘architecture.’

The Arm processors powered several models of Acorn computers, but a major change happened when
VLSI Technology, which was manufacturing the components in its factories, signed an agreement with
Acorn to resell the chips to other companies. This was the first ‘Arm license.’

In 1990, after discussions with Apple Computer, which needed a new processor for its Newton project, Acorn decided to spin off its processor division and form a joint venture with Apple and VLSI Technology. The meaning of the Arm name was then changed to ‘Advanced RISC Machines’, and the company later became Arm Ltd.

This evolution coincided with a major change in the new company’s business model. On the one hand, Arm had unique assets: great expertise in processor design and an original architecture. However, producing chips required attention to fabrication, yield, quality, logistics, sales channels, complex application-specific marketing, and all the other tasks that a silicon manufacturer must master to be successful. This was not optimal.

On the other hand, silicon manufacturers had a hard time staying competitive, because they had to
excel at these activities while simultaneously investing in design and innovation around processors,
at an increasingly fast pace. This was not great either.

The revolutionary idea for the newly formed company was to become a specialist in R&D and focus only on processor design. Instead of selling components, Arm would license ‘Intellectual Property’ (IP for short) to semiconductor manufacturers, who would then use this IP to design their chips, in combination with other elements that would be more application-specific.

Arm Ecosystem
The IP model selected from the start by Arm required a very tight relationship with the other
companies using the IP. As the company did not manufacture products, its success was entirely
dependent on the success of chip manufacturers embedding the Arm IP into their chips. Conversely, to ensure that they always got the best performance and efficiency for their products, silicon manufacturers needed the success of their products to benefit Arm as well, so that part of the growing revenue would be reinvested in improved, competitive IP. Together, Arm and its partners solidified this symbiosis with a royalty-based model: Arm’s revenue depended largely on the success of the chips containing its IP. This resulted in a strong partnership between the company and its customers, and a clear sign of this very special relationship is that customers were called ‘partners’ (this is still the case more than 25 years after the company was founded).

Another great benefit of these partnerships was that each semiconductor ‘partner’ could focus on a different set of applications or market segments, and integrate its own expertise and ‘secret sauce’ into the design of its products. This business model allowed the creation of a rich variety of products that no single company (even the largest) would have been able to put into its product catalog. It also made it increasingly difficult for processor manufacturers using other architectures to compete with Arm, because they had to compete with a whole ‘ecosystem.’ Many of them realized that it was much less expensive to license state-of-the-art IP from Arm, and progressively stopped spending money on developing their own processor architectures.

Another consequence of having several companies using the same processor IP cores was that tools,
software, and expertise could be reused from one chip to another. Indeed, a processor requires many
tools like code compilers or debuggers: having a larger market for these tools encouraged several
companies to start supporting the Arm architecture. Similarly, having a family of processors that could
execute the same instructions enabled software developers to offer many operating systems, libraries, frameworks, and other elements that could easily run on, or be adapted to, several devices.
Finally, this allowed engineers to avoid having to learn about a new processor every time they changed
their chip, which allowed them to build strong expertise and become more efficient.

All of these factors meant that Arm could bring more and more partners into the ecosystem, adding even greater value for every participant and making Arm-based solutions even more attractive. This virtuous circle has contributed significantly to the success of the Arm ecosystem.

SoftBank Acquisition
Even though the IP model has been copied many times, no other company has managed to be as successful. This propelled Arm into a very special position in the industry. Its long-term success required fairness toward each member of the industry, and careful management to keep the balance between all partners of the ecosystem.

2016 marked a significant milestone in Arm’s history: SoftBank Group agreed with Arm management to acquire the company, with the promise to continue promoting the same values of fairness and partnership while accelerating its development.

Market and Applications


Arm-based processors are used in virtually all applications requiring processing capability: as the
company says, “wherever computing happens.” Over the years, the company has developed a range
of products that address very different needs, from the tiniest processors for embedded applications
(the Arm Cortex-M processor portfolio) to the largest application processors that are used in high-
performance servers or that power 95% of the mobile phones in the world (the Cortex-A processor
portfolio). There is more than a factor of 100 in complexity and size between the smallest and the
highest performing cores.

However, central processing units are not the only IP offered by Arm: a diverse range of IP has been
developed or acquired by the company to address the needs of many applications. This is the case with what is called ‘System IP’: all the elements that enable processors to connect to the rest of the system,
transfer or store data between those elements, manage security, enable the debug of the software,
and manage power. Another very important line of products relates to media processing, and the Arm
Mali series is now the world’s ‘most shipped’ commercial GPU IP.

Enabling Future Technology Today


Even though the core business of Arm remains semiconductor IP, more and more software is being developed to complement hardware designs. This can be seen, for example, in products for IoT applications. With the Mbed software platform, Arm not only provides the software that is closest to the hardware, but also many standard functions needed in these devices: security management, connectivity, firmware updates, and connection to cloud services.

An entire division in Arm now focuses on building this embedded software foundation and on creating a cloud platform, called Pelion, to connect to and manage all these embedded devices, and to integrate the associated data into enterprise systems.

From providing the IP for the chip to delivering the Cloud services that allow organizations to manage
the deployment of products throughout their lifecycle securely, Arm delivers a pre-integrated IoT
solution for its partners, rooted in its deep understanding of the future of compute and security.

Arm technologies continuously evolve to ensure that intelligence is at the core of a secure and connected digital world. With a range of licensing options, such as Arm DesignStart and Arm Flexible Access, it has never been easier or faster to start working with Arm IP. Developed to facilitate the design of modern innovations, from the sensor to the smartphone to the supercomputer, Arm technologies are making smart possible.

Mike Eftimakis
Director of Business Innovation Strategy, Arm

Preface

In the past, apart from microprocessors and microcontrollers, not many chip designs had internal
embedded processors. This has changed significantly since Arm Cortex-M processors were released,
and many more device types have emerged that are part of the rapidly growing Internet of Things (IoT).
Today, Arm processors are being used in smart sensors, smart batteries (e.g., for battery health monitoring
systems), wireless communication chipsets, power electronics controllers, etc. This trend is driven by
the need for tighter system integration, additional functional features, better system reliability, and
reduction of supply chain dependency.

SoC design is an exciting industry with plenty of opportunities: the applications of Cortex-M based SoCs range from consumer products to industrial and automotive applications, communications, agriculture, transportation, and healthcare/medical devices. With the expanding IoT device market, the need for embedding processors into SoC designs continues to increase.

Cortex-M processors such as the Cortex-M0, Cortex-M0+, and Cortex-M3 are very small and can be integrated easily into a range of SoC designs. With Arm DesignStart lowering the cost barrier, many small businesses and start-ups are taking advantage of it to develop their own SoC solutions and offer better product differentiation. These developments have resulted in significant demand for SoC designers familiar with Arm DesignStart. Arm DesignStart has also received strong interest from academia, where some universities are interested in introducing SoC design topics into their courses.

In addition to the popular Armv6-M and Armv7-M processors, newly available SoCs/microcontrollers based on Armv8-M processors, such as the Cortex-M23 and Cortex-M33, deliver enhanced security with Arm TrustZone technology. In February 2019, Arm announced the new Armv8.1-M architecture with Arm Helium technology, which brings vector processing capability to Arm Cortex-M devices. These technology enhancements continue to enable Cortex-M processors to be used in an even wider range of applications.

While there are many technical resources on the internet about Arm software development, very limited information has been available on Arm-based SoC design, particularly on integrating Arm processors and on on-chip bus protocols. This book was written to fill this gap: to enable beginners in the field to understand a range of technical concepts in SoC design, and to provide detailed descriptions of design integration with several of the Arm Cortex-M processors. A range of other topics, including system component design, SoC design flow, and software development, are also covered.

If you are a beginner in SoC design, I hope that this book will help you gain SoC design knowledge and kickstart your SoC or FPGA design projects. For those of you who are experienced chip designers, I hope that you find this a useful reference source. Enjoy the book, and let your SoC design creativity run wild! There are always opportunities for new and fascinating Arm-based SoCs on the market.

Example Codes and Projects – Free to Download!

For readers of this book, Joseph Yiu has prepared a package of example codes and projects
to download that includes:

• An example Cortex-M3 system design based on Arm Cortex-M3 DesignStart Eval.

• A simulation setup for the example system.

• An FPGA project setup for the example system, for the Digilent Arty-S7-50T FPGA board and Xilinx Vivado 2019.1.

The package can be downloaded from the book section of Arm Education Media’s website at
https://pages.arm.com/socrefbook.html

Disclaimer
The Verilog design examples and related software files included in this book are created for educational purposes and are not validated to the same quality level as Arm IP products. Arm Education Media and the author make no warranties regarding these designs.

A note about the scope of this book

This book focuses on the concepts of system design based on the Cortex-M0 and Cortex-M3 processors. Since the DesignStart and DesignStart FPGA product offerings will change over time, the full details of using those packages are not covered here. However, the system design concepts and some of the technical details in this book are relevant to most Cortex-M system designs.

About the Author

Joseph Yiu
Distinguished Engineer, Embedded Technology at Arm

Joseph is a distinguished engineer in the Arm IoT/Embedded processors product marketing team.
His role is focused on technologies and products for embedded applications, including areas such as:

• Cortex-M processor products technical development

• Embedded product roadmaps

• Technical marketing

• Technical advisory for various internal and external projects, as well as Arm’s product support team

He also works with EEMBC (www.eembc.org) on benchmark development – for example, ULPMark.

Joseph started as an IP designer on accelerated 8-bit processors in 1998 before joining Arm in 2001,
where he worked on some of the first Arm-based SoC projects in the emerging System-on-Chip
group. In 2005, he moved to the processor division and worked on a range of Cortex-M processor
and design kit projects. After over 10 years in various senior engineering roles, he moved into the
product management team, while continuing his involvement in Arm embedded technology projects.
His technical specialisms include microcontroller and SoC system-level design with Arm Cortex-M
processors, applications and programming, ASIC/SoC design, verification, FPGA prototyping and
implementation areas such as low-power design and production tests (DFT), and RF circuit design.

Authorship
Joseph’s previous book titles include:

The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 1st to 3rd edition
(Elsevier, October 2013)

The Definitive Guide to the ARM Cortex-M3, 1st and 2nd edition
(Elsevier, January 2010)

Acknowledgments

A big thank you to the editor, Michael Shuff, for his efforts in proofreading and various useful
suggestions. I would also like to thank Christopher Seidl, Chris Shore, and Jon Marsh for contributing
materials, and the Arm marketing team for their support on this project.

CHAPTER 1
Introduction to Arm Cortex-M

1.1 Why learn Cortex-M system design?


1.1.1 Starting Cortex-M system design is easy
Arm Cortex-M processors represent one of the most popular architectures used today for Internet of Things (IoT) and embedded applications. For many digital system designers, the digital blocks they design need to interface with a processor in some way, for example, to use the processor for operation flow control. Having a small, easy-to-use Cortex-M processor integrated into the design makes it easier for them to provide a complete solution.

You may wonder, ‘Why not use a state machine to handle the control function?’ In the simplest digital applications, a finite state machine (FSM) implemented in Verilog or VHDL can handle all the required control functions, and in those cases, there is indeed no need for a processor in the system. However, as the application becomes more complex, the number of states in the control FSM grows, and when the system’s behavior also needs to be flexible, including a processor in the system becomes unavoidable. For better flexibility, complex control flows are handled by a processor running control software, which can be easily modified and debugged. As a result, processors are increasingly being embedded in FPGA designs. Although it is possible to use a separate microcontroller to control an FPGA-based digital system, this increases the component count of the complete system and can introduce issues in the connection between the processor and the FPGA, such as timing, PCB signal routing, noise, and reliability problems.

In general, the advantages of including a processor in the FPGA are:

• Ability to handle complex tasks like a Graphical User Interface (GUI) and data storage management (e.g., a file system);

• Application programs can be developed and updated separately from the hardware design, allowing better flexibility in product development;

• Reduces the total number of components in the system because there is no need for a separate processor chip;

• Signal routing between the processor and the functional logic is handled automatically by FPGA design tools;

• Debugging software on a well-established processor is much easier than debugging a complex state machine;

• Few limitations on the interface between the processor and the user-defined logic blocks, whereas a separate processor chip can be limited by its number of pins, choice of protocols, and electrical characteristics;

• Program code can be stored in the FPGA’s configuration flash, allowing the hardware design and the application code to be updated at the same time;

• Processor implementation features are now becoming part of FPGA development tools, making it easier to integrate a processor into an FPGA than to use a separate processor chip.

Other processor intellectual property (IP) products are, of course, available on the market. However, the Cortex-M processors provide:

• Good performance with a small area/power budget,

• Easy software development, and

• Well-proven technology.

Products based on Arm Cortex-M processors have been around since 2005. In recent years, Arm has
made Cortex processor IP more accessible to cost-constrained companies through easy-to-arrange,
fast, no/low-cost licensing. For example, Arm Flexible Access, introduced in 2019, offers a simple way
to evaluate and fully design system-on-chip (SoC) solutions with a wide-ranging mix of Arm IP before
committing to production, paying only for what is used at manufacture. There are also Arm DesignStart
programs that assist designers who are new to Cortex-M technology with a range of Arm IP to help
them get started on their designs instantly and risk-free. You can source various FPGA development
solutions, like affordable FPGA development boards, that can save you both time and money. Through
partnerships with FPGA vendors, Arm also offers DesignStart FPGA, which includes instant and free
access to the Cortex-M1 and Cortex-M3 processors as soft CPU IP for use on selected FPGA
platforms. Together with an industry-leading ecosystem of tools, software, and services, the Arm
Cortex-M processor portfolio offers some of the best embedded processors for digital system designs.

1.1.2 Cortex-M processor systems on FPGA


Since there are so many ready-to-use Cortex-M based microcontrollers and SoCs, why should someone
spend their time creating their own Cortex-M based system in an FPGA? There can be many different reasons:

• Education – for many universities teaching digital system design, FPGAs are perfect platforms.
Universities have been interested in using Arm processors in their digital design courses, for
example, to teach how to create a typical SoC design with a processor and develop applications for it.
However, real chip design is costly and takes a long time, making the FPGA platform much more suitable.

• Commercial product development – many digital designers are creating custom digital systems with
FPGAs and need a processor to control the operations of the digital systems they design. In other
applications, the digital functions needed are not available in off-the-shelf microcontroller
products, so using a Cortex-M processor in an FPGA provides an alternative solution.

• Prototyping for chip/SoC designs – many ASIC designers use FPGAs to prototype their chip/SoC
designs, including those that contain Cortex-M processors. FPGAs are also a useful way to prototype
new product ideas and to provide demonstrations/proofs of concept. With these systems, software
developers can reuse their Cortex-M programming knowledge to program such devices.

While there have been several FPGA vendor-specific processors available, most of those architectures
are proprietary and could be restricted to certain FPGA architectures. In contrast, the Cortex-M
processors are much more generic. Most of the Cortex-M processors (e.g., Cortex-M0 and Cortex-M3)
are optimized for ASIC/SoC applications. The Cortex-M1 processor was designed to be optimized for
most FPGA devices (it is small and allows a high operating frequency); at the same time, it is
portable between different FPGA types and upward-compatible with other Cortex-M processors.
For example, from a software point of view, Cortex-M1 uses the same instruction set (Armv6-M) as
the popular Cortex-M0 and Cortex-M0+ processors. Designers can also upgrade to a Cortex-M3 or
another Cortex-M processor if more instruction features are needed.
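Upward compatibility at the software level means that the same C source can be rebuilt for a different Cortex-M processor; only the generated instructions change. The sketch below illustrates this, assuming a compiler that implements the Arm C Language Extensions (ACLE) feature macros, such as GCC or Arm Compiler 6; the function names are hypothetical. On Armv6-M targets (Cortex-M0, Cortex-M0+, Cortex-M1), which have no divide instruction, the compiler typically implements the division with a library call, while on processors with hardware divide it can use a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Identical C source for every Cortex-M target. On Armv6-M the compiler
   typically emits a call to a runtime division helper; on Armv7-M and
   Armv8-M processors with hardware divide it can emit UDIV instead. */
uint32_t udiv32(uint32_t num, uint32_t den)
{
    assert(den != 0u);
    return num / den;
}

/* Compile-time feature check using the ACLE macro __ARM_FEATURE_IDIV
   (assumed to be provided by the compiler when hardware divide exists). */
int has_hw_divide(void)
{
#if defined(__ARM_FEATURE_IDIV)
    return 1;
#else
    return 0;   /* Armv6-M target, or a build for a non-Arm host */
#endif
}

/* Returns 6, 7, or 8 for Armv6-M, Armv7-M, or Armv8-M builds,
   and 0 when not compiling for an M-profile processor. */
int m_profile_arch_version(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

Because the feature checks are resolved at compile time, code written for a Cortex-M0 or Cortex-M1 does not need to be rewritten when moving to a Cortex-M3; it simply benefits from the extra instructions after recompilation.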

Now that Cortex-M processor IP is available in FPGA design tools, Cortex-M system
designs are no longer restricted to SoC design professionals. Even students, academic researchers,
and electronics enthusiasts now have access to the world of Cortex-M system design.

1.1.3 Security by design is made easier with Arm architecture


Securing connected devices requires a step-by-step approach to building in the right level of device
security, reducing risk around data reliability, and allowing businesses to innovate on new ideas to
reap the benefits of digital transformation. Arm has started an industry-wide initiative called Platform
Security Architecture (PSA) that is supported by a range of silicon vendors and ecosystem partners
who are seeking better collaboration and alignment of security standards.

Although the PSA framework was devised by Arm, it is ‘architecture agnostic’ in that it requires
that all compliant devices, regardless of architecture, are designed to meet a set of defined security
objectives. PSA resources include programming interfaces (APIs), best practices, threat models to
consider, and open-source reference firmware. You can find out more by visiting:
https://developer.arm.com/architectures/security-architectures/platform-security-architecture

1.2 Understanding different types of Arm processors


Arm processors are deployed in many different applications with very different needs, and to support
them, Arm has developed a broad portfolio of processors to help designers select the best-fit compute
for their device. For example, the application requirements for a smartphone are very different from
those of a motor controller. To address this wide variety of application requirements, Arm
provides a range of processor products in different profiles belonging to the Cortex processor families:

• The Cortex-A portfolio – Application processors for complex systems. An example of a processor
in this class is the Cortex-A53. It is developed to support applications like smartphones, PDAs, and
set-top boxes, which need high-performance processing and require OS support like Linux, Android,
Microsoft Windows, etc.

• The Cortex-R portfolio – Processors for real-time, high-performance systems. An example of
a processor in this class is the Cortex-R52. It is developed to provide high performance, low
latency, and robust characteristics. Typical applications include hard disk controllers and baseband
processing in communication devices.

• The Cortex-M portfolio – Processors for microcontroller applications. An example of a processor in
this class is the Cortex-M3 processor. It has been developed for deeply embedded and cost-sensitive
applications, and yet provides good performance and rapid interrupt response. Typical applications
include industrial controls and consumer products, like portable audio devices and digital cameras.

Key characteristics of these processors are summarized in Table 1.1.

Architecture type
  Cortex-A: Supports both 64-bit and 32-bit from Armv8-A; 32-bit in Armv7-A and older architectures.
  Cortex-R: Supports both 64-bit and 32-bit from Armv8-R; 32-bit in Armv7-R and older architectures.
  Cortex-M: 32-bit only.

Clock frequency range and pipeline
  Cortex-A: Longer pipeline, optimized for a high clock frequency range.
  Cortex-R: Medium-length pipeline (e.g., 8-stage in Cortex-R5).
  Cortex-M: Short to medium length pipeline (2 to 6 stages) for low-power systems.

Virtual memory support (required for Linux)
  Cortex-A: Yes.
  Cortex-R: No (it is permitted in Armv8-R, but not supported in current Cortex-R processors).
  Cortex-M: No.

Virtualization support
  Cortex-A: Yes.
  Cortex-R: Yes, from Armv8-R (e.g., Cortex-R52).
  Cortex-M: No.

Arm TrustZone security extension
  Cortex-A: Yes.
  Cortex-R: No.
  Cortex-M: Yes, from Armv8-M, but not in Armv6-M and Armv7-M architectures.

Interrupt handling
  Cortex-A: Based on Generic Interrupt Controller (GIC) with multi-core and virtualization support.
            Non-deterministic interrupt response speed.
  Cortex-R: Based on Generic Interrupt Controller with multi-core and virtualization support, or
            Vectored Interrupt Controller in older Cortex-R. Fast interrupt response.
  Cortex-M: Based on Nested Vectored Interrupt Controller (NVIC) internal to the processor.
            Low interrupt latency and easy to use.

ISA for DSP acceleration
  Cortex-A: Neon Advanced SIMD (128-bit vectored processing). Latest architecture from Armv8.3-A
            supports Scalable Vector Extension (SVE).
  Cortex-R: Neon Advanced SIMD support on Armv8-R. Also supports legacy SIMD (32-bit vector processing).
  Cortex-M: Supports legacy SIMD (32-bit vector processing) in Cortex-M4, Cortex-M7, Cortex-M33,
            and Cortex-M35P.

Table 1.1: Key characteristics of different Cortex processors.

If you are planning to use Linux in your applications, a Cortex-A processor would be needed. Both
Xilinx and Intel (previously Altera) have FPGA products with built-in Cortex-A processor subsystems.
On the other hand, the Cortex-M processors are ideal for smaller embedded systems, often with real-
time requirements.

There are also different types of Cortex-M processors. We can classify them into three product ranges:

High performance
  Armv6-M and Armv7-M architectures: Cortex-M7 (Armv7-M)
  Armv8-M architecture (supports TrustZone security extension): Coming soon

Mainstream processors
  Armv6-M and Armv7-M architectures: Cortex-M3 and Cortex-M4 processors (Armv7-M)
  Armv8-M architecture (supports TrustZone security extension): Cortex-M33 and Cortex-M35P processors

Processors for constrained systems
  Armv6-M and Armv7-M architectures: Cortex-M0, Cortex-M0+, and Cortex-M1 (all Armv6-M architecture)
  Armv8-M architecture (supports TrustZone security extension): Cortex-M23 processor

Table 1.2: Different Cortex-M processors.

For general data processing and control applications, Armv6-M processors are more than capable of
handling the requirements. The main features of each Cortex-M processor are:

• Cortex-M0 processor: the smallest Arm processor (only 12K gates in minimum configuration) with
a simple 3-stage pipeline, based on a Von Neumann bus architecture. No privilege level separation
and no memory protection unit (MPU).

• Cortex-M1 processor: similar to the Cortex-M0 processor, but optimized for FPGA applications.
It provides Tightly Coupled Memory (TCM) interfaces to simplify memory integration on FPGA and
delivers a higher clock frequency for FPGA implementations.

• Cortex-M0+ processor: also based on the Armv6-M architecture, with privilege level separation and
an optional memory protection unit (MPU). It also has an optional single-cycle I/O interface for
connecting peripheral registers that need low-latency accesses, and a low-cost instruction trace
feature called Micro Trace Buffer (MTB).

• Cortex-M23 processor: for constrained embedded systems that need advanced security, the
Cortex-M23 processor with the Arm TrustZone security extension is more suitable. In addition
to TrustZone support, the Cortex-M23 processor has many other enhancements compared to
Armv6-M processors:

  – Additional instructions (e.g., hardware divide and compare-and-branch);

  – Supports more interrupts (up to 240);

  – Real-time instruction trace using Embedded Trace Macrocell (ETM);

  – More configurability options.

• Cortex-M3 processor: for applications that need more complicated data processing, Armv7-M
processors could be more suitable. The instruction set in Armv7-M provides support for more
addressing modes, conditional execution, bit field processing, and multiply-accumulate (MAC).
So even with a relatively small Cortex-M3 processor, you can have a relatively high-performance
system.

• Cortex-M4 processor: if DSP-intensive processing or single-precision floating-point processing
is needed, the Cortex-M4 processor is more suitable than Cortex-M3 because it supports 32-bit
SIMD operations and an optional single-precision floating-point unit (FPU).

• Cortex-M7 processor: the highest-performance Cortex-M processor today, with a six-stage
pipeline and superscalar design, allowing execution of up to two instructions per cycle. Like
the Cortex-M4, it supports 32-bit SIMD operations and an optional FPU. The FPU in Cortex-M7
can be configured to support single-precision only, or both single- and double-precision,
floating-point operations. It is also designed to work with high-performance, complex memory
systems, supporting instruction and data caches and TCM.

• Cortex-M33 processor: a mid-range Armv8-M processor with a similar footprint to Cortex-M4, adding
TrustZone security extension support, a coprocessor interface, and a newer pipeline design to enable
higher performance.

• Cortex-M35P processor: similar to the Cortex-M33 processor, but with the enhancement of
anti-tampering features to prevent physical security attacks (e.g., side-channel and fault injection
attacks). It also includes an optional instruction cache.
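To make the idea of 32-bit SIMD more concrete, the following C sketch models what a dual 16-bit unsigned addition (the behavior of the UADD16 instruction) and a quad 8-bit addition (UADD8) compute: one 32-bit register is treated as independent lanes, and a carry out of one lane does not ripple into the next. It is a portable reference model for illustration only, with hypothetical function names; on processors with the DSP extension, such as the Cortex-M4 and Cortex-M7, each operation is a single instruction, normally reached through compiler intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Dual 16-bit add (UADD16-style): each half is added independently,
   and the carry out of the lower half is discarded, not propagated. */
uint32_t uadd16_model(uint32_t a, uint32_t b)
{
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Quad 8-bit add (UADD8-style): four independent byte lanes. */
uint32_t uadd8_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0u;
    for (unsigned lane = 0u; lane < 4u; lane++) {
        uint32_t shift = lane * 8u;
        uint32_t sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
        result |= (sum & 0xFFu) << shift;   /* keep only this lane's 8 bits */
    }
    return result;
}
```

For example, adding 0x0001FFFF and 0x00010001 as two 16-bit lanes gives 0x00020000, whereas an ordinary 32-bit addition gives 0x00030000. The real instructions also set the APSR.GE flags for each lane, which these models ignore.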

For beginners, Cortex-M0, Cortex-M1, and Cortex-M3 are good starting points for most projects.

1.3 Cortex-M deliverables


1.3.1 Licensing through Arm Flexible Access and Arm DesignStart
When this chapter was written, the following licensing options were available from Arm:

Find out more about various Arm licensing options


Arm provides a range of licensing options, including no or low upfront fees and free access for
academic purposes. Visit www.arm.com/licensing for more information.

Arm DesignStart
• Cortex-M0 and Cortex-M3 processors are available via the DesignStart program (Note: the Cortex-A5
processor is also available, but this book does not cover it).

• Cortex-M1 and Cortex-M3 processors are available at no cost as soft CPU IP, optimized for easy
integration with the toolchains of FPGA partners.

• The Cortex-M33 processor is available as DesignStart FPGA on Cloud:
(https://developer.arm.com/docs/101505/latest/designstart-fpga-on-cloud-cortex-m33-based-platform-technical-reference-manual)

There are different types of deliverables for each of these DesignStart programs. Currently, Cortex-M
DesignStart is divided into several types:

• DesignStart Eval(uation) – delivered as obfuscated Verilog with fixed configuration. Instant access
and free. Suitable for evaluation, research, and teaching.

• DesignStart Pro – delivered as full RTL source, configurable, and requires a simple license;
zero license fee and success-based royalty model.

• DesignStart for University – delivered as full RTL source, configurable, and requires a simple license.
Zero license fee.

• DesignStart FPGA – delivered as packages for FPGA development tools. Instant access and free.
Suitable for evaluation, research, teaching, and commercial use.


For the latest information and details of DesignStart (including licensing conditions), please visit the
Arm website: https://developer.arm.com/products/designstart

Cortex-M0 and Cortex-M3 DesignStart Eval and Pro contain the following offerings:

Processor
  Cortex-M0 DesignStart Eval: Cortex-M0 obfuscated model deliverable
  Cortex-M3 DesignStart Eval: Cortex-M3 obfuscated model deliverable
  Cortex-M0 DesignStart Pro: Full version of Cortex-M0
  Cortex-M3 DesignStart Pro: Full version of Cortex-M3

System IP
  Cortex-M0 DesignStart Eval: Cortex-M0 System Design Kit (CM0SDK)
  Cortex-M3 DesignStart Eval: Corstone-100 foundation IP including SSE-050 subsystem
  Cortex-M0 DesignStart Pro: Cortex-M0 System Design Kit (CM0SDK)
  Cortex-M3 DesignStart Pro: Cortex-M System Design Kit (CMSDK), Corstone-100 foundation IP
    including SSE-050 subsystem and several IP blocks including TRNG (True Random Number
    Generator) for security

Cycle model
  Cortex-M3 DesignStart Eval and Cortex-M3 DesignStart Pro: Cortex-M3 Cycle Model (1-year license)

FPGA project
  All four offerings: FPGA project for MPS2 FPGA board

Software tools
  All four offerings: Trial license of Keil MDK (time-limited license)

Design review
  Two of the four offerings: DesignStart RTL Review

Table 1.3: Offerings from Arm Cortex-M DesignStart Eval and Pro.

A trial license for IAR Embedded Workbench for Arm is also available from IAR Systems
(https://www.iar.com/designstart).

You can find out more about Flexible Access and DesignStart on the Arm website and request more
information: https://arm.com/why-arm/how-licensing-works

Disclaimer: The IP offering and commercial terms available through Arm DesignStart and Flexible
Access above are accurate as of July 2019 and are subject to change.

1.3.2 Obfuscated Verilog – DesignStart Eval


The Cortex-M0 and Cortex-M3 DesignStart Eval deliver the processors as obfuscated Verilog files.
These RTL files are not encrypted, but the internal logic is flattened, and the signal names replaced
with random names. You can simulate it with standard Verilog simulators and synthesize it for FPGA
testing (but the synthesis outcome will not be optimized due to the nature of the code). The top-
level signals of the processors are retained as clear un-obfuscated text. DesignStart Eval can be
implemented using any FPGA fabric.

The Cortex-M0 DesignStart Eval includes an example system based on the Cortex-M System Design
Kit (CMSDK) product. The example system is delivered as RTL sources, with example test code and
simulation scripts. An FPGA prototyping project for MPS2 (Microcontroller Prototyping System 2) is
also included.

The Cortex-M3 DesignStart Eval includes a system design based on the CoreLink System Design Kit SDK-
100 (a successor of CMSDK). It also has examples, simulation scripts, and FPGA projects for MPS2.

1.3.3 Verilog RTL sources – DesignStart Pro


The Cortex-M0 and Cortex-M3 DesignStart Pro packages deliver the RTL source code of the processor (not
obfuscated). This provides configuration options in the form of Verilog parameters, allowing designers
to select the features they need. Since the design is delivered as RTL source, the synthesis tools can
fully optimize it.

DesignStart Pro also includes the deliverables for the full CoreLink subsystem products.

1.3.4 FPGA Packages - DesignStart FPGA


Cortex-M1 and Cortex-M3 can be integrated into an FPGA vendor’s toolchain as an encrypted
component. The components will typically allow some configuration and already include TCM
integration. Some packages will convert the native AHB interface of the processor to an AXI bus.
These packages can only be used with the toolchain from the specific FPGA vendor, but support
a range of devices.

1.3.5 Documentation
There are several types of documents that you will come across when working on Arm system designs:

Architecture reference manuals: these documents specify the behavior of the architecture (e.g.,
instruction set, programmer’s model) but not the processor-specific implementation details (e.g.,
pipeline and interface). There are separate architecture reference manuals for Armv6-M, Armv7-M,
and Armv8-M, and you can download them from https://developer.arm.com (see Table 1.2 for the
architecture that each processor uses).

Technical reference manuals: Often known as TRMs, they describe the specification of the processors
or other system IP. These documents are public and can be found on https://developer.arm.com.

Integration and Implementation Manuals: Also known as IIMs, they describe the interface and
configuration options and explain how to use the deliverables, such as the execution testbenches. These
documents are confidential and are inside product bundles.

User guides: The details of the FPGA examples are documented in user guides.

Release notes: All of the deliverables from Arm are provided with a release note, which identifies
the versions of parts within a bundle, any known issues, and any changes since a previous release.
The release note also describes how to install and test the deliverable. These documents are
confidential and are inside product bundles.

Errata: The errata document describes known issues with Arm products, together with workarounds
if applicable.
CHAPTER 2
Introduction to system design with Cortex-M processors

2.1 Overview
One of the key advantages of using the Cortex-M processor is that, for small system designs, in
particular, it is not that difficult to get the system to work in a Verilog simulation or on FPGA. You will,
of course, need to acquire some knowledge beforehand, like a basic overview of the architecture used
in the Cortex-M processors. Also, if you are using a Verilog RTL version of the design, you will need an
understanding of the bus protocols used in the Cortex-M processors, such as AHB and APB protocols.

The first step of the project is to understand the requirements of the application. For example, you
will need to know:

• Which Cortex-M processor is the best fit for your needs?

• How much memory (ROM and SRAM) is needed?

• How fast does the system need to run (i.e., clock speed)?

• What peripherals are needed?

For ASIC designs, many additional areas should be investigated. For example, the following are
generic chip design considerations:

• What semiconductor process node should be used?

• What types of memory technologies are available (e.g., embedded flash memories are not available
for many small geometry process nodes)?

• How should non-volatile memory (NVM) programming be handled?

• What type of power management features should be used?

• What type of chip packaging should be used?

• What type of Design-for-Test (DFT) features are needed for device manufacturing testing?

In the IoT era, designers should also investigate security aspects and the many other challenges
of integrating wireless communication interfaces inside SoC designs.

To keep this document manageable, let us look into the processor system design areas only. To get
a simple Cortex-M processor system to work, typically we need to consider and, where appropriate,
define, the following (this is not a definitive list):

• Memory blocks – what types of memory are needed, and what sizes?

• Peripherals – what peripherals are needed, and do any of them need to be created?

Chapter 2 | Introduction to system design with Cortex-M processors

• Memory map.

• Bus system design.

• Processor configuration options.

• Interrupt assignments and interrupt types.

• Event interface integration.

• Clock and reset generation.

• Debug integration.

• Power management features of the system.

• Top-level pin assignment and pin multiplexing.

The rest of this chapter gives an overview of some of these areas.

2.2 What memories are needed?


2.2.1 Overview of memories
In a typical Cortex-M based system, there are at least two types of memories:

• Non-volatile memory (NVM), typically using embedded flash technologies or masked ROM, for
program storage;

• RAM, for read-write data including stack and heap.

In some systems, there can be additional memories for bootloader and other preloaded firmware.
Some low-power devices also have special retention static RAM (SRAM) for holding small amounts
of data while the rest of the device is shut down during sleep modes.

Most of the Cortex-M processors use 32-bit AHB for memory interfacing (except Cortex-M1, which
uses Tightly Coupled Memory (TCM) interfaces for connecting memories, and Cortex-M7, which
supports both TCM and AXI bus interfaces). Therefore, the memory system designs are normally
32 bits wide, but they also need to be byte-addressable, which means the RAM must support byte
(8-bit), half-word (16-bit), and word (32-bit) write operations.
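The byte-addressing requirement can be illustrated with a small C model, a sketch rather than RTL: the two lowest address bits and the transfer size select which byte lanes of a 32-bit memory word are updated, and the other lanes must keep their old contents. The size encoding (0 = byte, 1 = half-word, 2 = word) follows the AHB HSIZE signal. The sketch assumes aligned transfers, little-endian byte lanes, and write data already placed on the active lanes (as on AHB HWDATA); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-lane mask for an aligned transfer on a little-endian 32-bit bus.
   size: 0 = byte, 1 = half-word, 2 = word (same encoding as AHB HSIZE). */
uint32_t lane_mask(uint32_t addr, uint32_t size)
{
    assert(size <= 2u);
    assert((addr & ((1u << size) - 1u)) == 0u);      /* aligned transfer */
    uint32_t bytes = 1u << size;                      /* 1, 2, or 4 bytes */
    uint32_t ones  = (bytes == 4u) ? 0xFFFFFFFFu : ((1u << (8u * bytes)) - 1u);
    return ones << (8u * (addr & 3u));                /* move to the target lanes */
}

/* Merge a write into an existing word: bytes outside the mask keep their
   old value, which is why the RAM needs a write enable per byte lane. */
uint32_t write_merge(uint32_t old_word, uint32_t addr, uint32_t size, uint32_t wdata)
{
    uint32_t mask = lane_mask(addr, size);
    return (old_word & ~mask) | (wdata & mask);
}
```

For example, a byte write of 0xAB to address 0x1 uses lane mask 0x0000FF00, so a word holding 0x11223344 becomes 0x1122AB44.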

For FPGA-based projects, the SRAM inside the FPGA can be used for both program storage (most
FPGA initialization sequences can initialize SRAM contents at the same time) and read-write data.


Therefore, in theory, you could use just one SRAM block for a Cortex-M based FPGA system design.

[Diagram: a single SRAM block in the FPGA, whose initial content comes from the FPGA image, used both as program storage and for read/write (R/W) data.]

Figure 2.1: SRAM in FPGA can have initial values so that a single SRAM block can be used as both program ROM and RAM.

However, such an arrangement differs from ASIC/SoC system designs where SRAM cannot be
initialized in the same way. Also, doing so will impact performance on a Cortex-M3/M4-based system
as it will no longer be using a Harvard bus architecture. To avoid confusion, the rest of the examples in
this book use two memory blocks for separating program storage and data read-writes.

2.2.2 Memory declarations in FPGA design tools


If you are using DesignStart FPGA, the memory system for the Cortex-M1 or Cortex-M3 could be
generated for you by the FPGA design tools, so it is easy to do. However, if you are not using
DesignStart FPGA, you might need to handle the memory integration manually.

In the past, FPGA tools could not infer RAM blocks from behavioral Verilog code, and memories in FPGA
projects had to be declared by instantiating memory macros manually. This changed a few years ago,
but the RAM declarations might still need to be written in a specific way for the FPGA design tools
to recognize them correctly.

In the Cortex-M0 & Cortex-M3 DesignStart Eval, the file “logical\cmsdk_fpga_sram\verilog\cmsdk_fpga_sram.v”
provides a synthesizable SRAM model that works with most FPGA flows. You can
attach this SRAM model to an AHB bus using a bus wrapper (“cmsdk_ahb_to_sram.v”), as shown in
“logical\models\memories\cmsdk_ahb_ram.v” or “logical\models\memories\cmsdk_ahb_rom.v”.

[Diagram: the AHB interface connects to cmsdk_ahb_to_sram, which drives the SRAM interface of cmsdk_fpga_rom / cmsdk_fpga_ram.]

Figure 2.2: FPGA SRAM instantiation with an AHB interface.

This arrangement allows you to swap over the FPGA ROM/RAM with other memories easily
(e.g., when migrating to ASIC).


If you would like to simplify the design, it is possible to use a simple AHB block SRAM design (from
my paper at Embedded World 2014, “Arm Cortex-M Processor-based System Prototyping on FPGA”,
https://community.arm.com/processors/b/blog/posts/embedded-world-2014---arm-cortex--m-processor-based-system-prototyping-on-fpga):

module AHBBlockRam #(
  // --------------------------------------
  // Parameter Declarations
  // --------------------------------------
  parameter AWIDTH = 12                 // address width: memory size is 2^AWIDTH bytes
)
(
  // --------------------------------------
  // Port Definitions
  // --------------------------------------
  input               HCLK,       // system bus clock
  input               HRESETn,    // system bus reset
  input               HSEL,       // AHB peripheral select
  input               HREADY,     // AHB ready input
  input  [1:0]        HTRANS,     // AHB transfer type
  input  [1:0]        HSIZE,      // AHB hsize
  input               HWRITE,     // AHB hwrite
  input  [AWIDTH-1:0] HADDR,      // AHB address bus
  input  [31:0]       HWDATA,     // AHB write data bus
  output              HREADYOUT,  // AHB ready output to S->M mux
  output              HRESP,      // AHB response
  output [31:0]       HRDATA      // AHB read data bus
);

  localparam AWT = ((1<<(AWIDTH-2))-1);   // index max value (number of 32-bit words - 1)

  // --- Memory Array: one byte-wide array per byte lane ---
  reg [7:0] BRAM0 [0:AWT];
  reg [7:0] BRAM1 [0:AWT];
  reg [7:0] BRAM2 [0:AWT];
  reg [7:0] BRAM3 [0:AWT];

  // --- Internal signals ---
  reg  [AWIDTH-3:0] haddrQ;   // word index (HADDR[AWIDTH-1:2]) captured in the address phase
  wire              Valid;    // active transfer to this memory
  reg  [3:0]        WrEnQ;    // byte-lane write enables used in the data phase
  wire [3:0]        WrEnD;    // byte-lane write enables decoded in the address phase
  wire              WrEn;

  // --------------------------------------
  // Main body of code
  // --------------------------------------
  // HTRANS[1] is set for NONSEQ and SEQ transfers
  assign Valid = HSEL & HREADY & HTRANS[1];

  // --- RAM Write Interface ---
  assign WrEn = (Valid & HWRITE) | (|WrEnQ);

  // A lane is enabled by a byte write to that lane, a half-word write
  // to the half containing it, or any word write
  assign WrEnD[0] = (((HADDR[1:0]==2'b00) && (HSIZE[1:0]==2'b00)) ||
                     ((HADDR[1]==1'b0)    && (HSIZE[1:0]==2'b01)) ||
                     ((HSIZE[1:0]==2'b10))) ? Valid & HWRITE : 1'b0;
  assign WrEnD[1] = (((HADDR[1:0]==2'b01) && (HSIZE[1:0]==2'b00)) ||
                     ((HADDR[1]==1'b0)    && (HSIZE[1:0]==2'b01)) ||
                     ((HSIZE[1:0]==2'b10))) ? Valid & HWRITE : 1'b0;
  assign WrEnD[2] = (((HADDR[1:0]==2'b10) && (HSIZE[1:0]==2'b00)) ||
                     ((HADDR[1]==1'b1)    && (HSIZE[1:0]==2'b01)) ||
                     ((HSIZE[1:0]==2'b10))) ? Valid & HWRITE : 1'b0;
  assign WrEnD[3] = (((HADDR[1:0]==2'b11) && (HSIZE[1:0]==2'b00)) ||
                     ((HADDR[1]==1'b1)    && (HSIZE[1:0]==2'b01)) ||
                     ((HSIZE[1:0]==2'b10))) ? Valid & HWRITE : 1'b0;

  // Register the write enables so that the write happens in the
  // data phase, when HWDATA is valid
  always @ (negedge HRESETn or posedge HCLK)
    if (~HRESETn)
      WrEnQ <= 4'b0000;
    else if (WrEn)
      WrEnQ <= WrEnD;

  // --- Infer RAM ---
  always @ (posedge HCLK)
  begin
    if (WrEnQ[0])
      BRAM0[haddrQ] <= HWDATA[7:0];
    if (WrEnQ[1])
      BRAM1[haddrQ] <= HWDATA[15:8];
    if (WrEnQ[2])
      BRAM2[haddrQ] <= HWDATA[23:16];
    if (WrEnQ[3])
      BRAM3[haddrQ] <= HWDATA[31:24];
    // do not use enable on read interface.
    haddrQ <= HADDR[AWIDTH-1:2];
  end

`ifdef CM_SRAM_INIT
  // Preload each byte lane from its own hex file
  initial begin
    $readmemh("itcm3", BRAM3);
    $readmemh("itcm2", BRAM2);
    $readmemh("itcm1", BRAM1);
    $readmemh("itcm0", BRAM0);
  end
`endif

  // --- AHB Outputs ---
  assign HRESP     = 1'b0;   // OKAY
  assign HREADYOUT = 1'b1;   // always ready
  assign HRDATA    = {BRAM3[haddrQ], BRAM2[haddrQ], BRAM1[haddrQ], BRAM0[haddrQ]};

endmodule
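When verifying AHBBlockRam in simulation, a software reference model can help predict the expected read data. The following C sketch is a hypothetical model (the names bram_model_t, bram_reset, wr_en_decode, bram_cycle, and bram_hrdata are illustrative, not part of any Arm deliverable) that mirrors the RTL with AWIDTH = 12: in the address phase, the WrEnD lane decode is registered into WrEnQ together with the word index, and on the next clock edge, in the data phase, the enabled byte lanes are written from HWDATA. Read data is taken from the registered index, as HRDATA is in the RTL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AWIDTH 12u
#define WORDS  (1u << (AWIDTH - 2u))   /* AWT + 1: 1024 words, i.e., 4KB */

typedef struct {
    uint8_t  lane[4][WORDS];   /* BRAM0..BRAM3, one byte-wide array per lane */
    uint32_t haddrQ;           /* word index registered in the address phase */
    uint8_t  wrEnQ;            /* byte-lane write enables for the data phase */
} bram_model_t;

void bram_reset(bram_model_t *m)
{
    memset(m, 0, sizeof *m);   /* clears WrEnQ; memory contents start at zero here */
}

/* Same decode as the WrEnD assignments. hsize: 0 = byte, 1 = half-word, 2 = word. */
uint8_t wr_en_decode(bool valid, bool hwrite, uint32_t haddr, uint32_t hsize)
{
    if (!(valid && hwrite))
        return 0u;
    switch (hsize) {
    case 0u: return (uint8_t)(1u << (haddr & 3u));   /* a single byte lane */
    case 1u: return (haddr & 2u) ? 0xCu : 0x3u;       /* upper or lower half */
    case 2u: return 0xFu;                             /* all four lanes */
    default: return 0u;                               /* HSIZE 3 is not decoded */
    }
}

/* One rising edge of HCLK: complete the data phase of the previous transfer
   (write HWDATA to the enabled lanes), then capture the new address phase. */
void bram_cycle(bram_model_t *m, bool hsel, bool hready, uint32_t htrans,
                uint32_t hsize, bool hwrite, uint32_t haddr, uint32_t hwdata)
{
    bool valid = hsel && hready && ((htrans & 2u) != 0u);   /* NONSEQ or SEQ */

    for (unsigned i = 0u; i < 4u; i++)
        if (m->wrEnQ & (1u << i))
            m->lane[i][m->haddrQ] = (uint8_t)(hwdata >> (8u * i));

    m->wrEnQ  = wr_en_decode(valid, hwrite, haddr, hsize);
    m->haddrQ = (haddr >> 2) & (WORDS - 1u);
}

/* HRDATA: the four lanes concatenated at the registered word index. */
uint32_t bram_hrdata(const bram_model_t *m)
{
    uint32_t i = m->haddrQ;
    return ((uint32_t)m->lane[3][i] << 24) | ((uint32_t)m->lane[2][i] << 16) |
           ((uint32_t)m->lane[1][i] << 8)  |  (uint32_t)m->lane[0][i];
}
```

A word write to address 0x10, for example, only takes effect on the second call to bram_cycle(), when the write data is presented; this one-cycle offset between address and data phases is why the RTL registers WrEnQ and haddrQ.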

Using an Arm toolchain such as Keil MDK (Microcontroller Development Kit) or DS-5, we can create
a hex file that can be read by $readmemh for SRAM initialization, using the fromelf utility with the
following command-line:

$> fromelf --vhx --8x4 image.elf --output itcm

This generates four hex files (itcm0, itcm1, itcm2, and itcm3), one for each byte lane, which need to
be available during FPGA synthesis (with CM_SRAM_INIT defined so that the $readmemh calls are included).
The FPGA tool merges the data into the FPGA bitstream so that the SRAM content is set up during the
FPGA configuration stage.
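The --8x4 output arrangement can be understood as splitting the image into four 8-bit-wide banks: the byte at address 4n + k becomes entry n of the file for lane k, which corresponds to the BRAM0 to BRAM3 arrays in AHBBlockRam. The C sketch below reproduces only that split for an image already loaded into memory, writing one two-digit hex value per line, a format that $readmemh accepts. It illustrates the data layout and is not a replacement for fromelf, whose exact output format is not reproduced here; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format the bytes of one byte lane (0..3) of a little-endian image as
   "XX\n" lines, one per 32-bit word: lane k takes bytes k, k+4, k+8, ...
   Returns the number of characters written (excluding the terminating NUL),
   or 0 if buf is too small. */
size_t format_lane(const uint8_t *image, size_t len, unsigned lane,
                   char *buf, size_t bufsize)
{
    assert(lane < 4u);
    assert(bufsize > 0u);
    size_t pos = 0;
    buf[0] = '\0';
    for (size_t addr = lane; addr < len; addr += 4u) {
        if (pos + 4u > bufsize)               /* room for "XX\n" plus NUL */
            return 0;
        int n = snprintf(buf + pos, bufsize - pos, "%02X\n", (unsigned)image[addr]);
        pos += (size_t)n;
    }
    return pos;
}
```

If the image length is not a multiple of four, the lanes end up with different numbers of entries, so padding the image to a whole number of words keeps the four files aligned.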

2.2.3 Memory handling in ASIC designs


In ASIC designs, SRAM and NVM blocks cannot be generated from Verilog RTL in behavioral
synthesis. Typically, you need a specific memory generation tool (SRAM compiler) to create the SRAM,
and for embedded flash, you need to instantiate the flash macro manually.

In most cases, to connect an SRAM block to AHB, you can use the “cmsdk_ahb_to_sram” block, possibly
with a little bit of glue logic for signal protocol conversion. Additional considerations apply when
low-power support is a requirement, as SRAM macros usually have some low-power modes or even state
retention modes.


To connect embedded flash macros to AHB, you need a flash interface controller. The interface on the
flash macros is vendor and process node-specific. However, Arm has worked with multiple embedded
flash vendors to define a Generic Flash Bus protocol (GFB, https://developer.arm.com/docs/ihi0083/a),
so most parts of the flash controller are generic; only a smaller part of the interface is process-dependent.
Arm provides generic flash controller IP, which is licensable as a part of the Corstone-101 product.

Since embedded flash macros are often relatively slow (e.g., around 30MHz to 50MHz access speed)
and many Cortex-M designs run at over 100MHz, cache systems are often required to reach desired
performance levels. To address this need, Arm also offers cache units such as the AHB flash cache,
which is part of the Cortex-M3 DesignStart Pro.
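As a rough first estimate, the number of wait states that a flash controller must insert is the number of processor clock cycles needed to cover one flash access, minus one. The sketch below assumes that the flash access time is the only constraint (a real design also needs timing margin, and prefetch buffers or caches change the effective performance); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wait states so that one flash access fits in (1 + wait states) CPU cycles:
   cycles = ceil(cpu_hz / flash_hz), wait states = cycles - 1. */
uint32_t flash_wait_states(uint32_t cpu_hz, uint32_t flash_hz)
{
    assert(cpu_hz > 0u && flash_hz > 0u);
    uint32_t cycles = (uint32_t)(((uint64_t)cpu_hz + flash_hz - 1u) / flash_hz);
    return cycles - 1u;
}
```

With the figures above, a 100MHz processor reading from 30MHz flash needs three wait states, so each uncached flash access takes four clock cycles; this is the gap that a flash cache is intended to hide.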

2.2.4 Memory endianness


When designing memory systems, one of the considerations is endianness. Most Cortex-M systems
today are based on little-endian memory systems. However, it is possible to create big-endian
Cortex-M systems as these processors support big-endian configuration options. When doing this, it
is important to make sure that the software developers of the product are aware so that they can use
the correct compilation switches in their software projects.

Address       Bits [31:24]  [23:16]   [15:8]    [7:0]
0x00000008    Byte 0xB      Byte 0xA  Byte 9    Byte 8
0x00000004    Byte 7        Byte 6    Byte 5    Byte 4
0x00000000    Byte 3        Byte 2    Byte 1    Byte 0

Figure 2.3: Data arrangement in a Little-Endian system.

Address       Bits [31:24]  [23:16]   [15:8]    [7:0]
0x00000008    Byte 8        Byte 9    Byte 0xA  Byte 0xB
0x00000004    Byte 4        Byte 5    Byte 6    Byte 7
0x00000000    Byte 0        Byte 1    Byte 2    Byte 3

Figure 2.4: Data arrangement in a Big-Endian system.

Please note that the endianness configuration only affects data accesses (including read-only data).
Instructions are always encoded as little endian. Also, accesses to the Private Peripheral Bus (PPB)
are always little endian.
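The two arrangements in Figures 2.3 and 2.4 can be reproduced with a small C sketch that stores a 32-bit value into a byte array explicitly in each byte order, so the result does not depend on the endianness of the machine running it. The byte_swap32() function shows the conversion that software needs when data crosses between the two orders (the Cortex-M instruction set provides the REV instruction for this). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian: the least significant byte goes to the lowest address. */
void store_le32(uint8_t mem[4], uint32_t w)
{
    for (unsigned i = 0u; i < 4u; i++)
        mem[i] = (uint8_t)(w >> (8u * i));
}

/* Big-endian: the most significant byte goes to the lowest address. */
void store_be32(uint8_t mem[4], uint32_t w)
{
    for (unsigned i = 0u; i < 4u; i++)
        mem[i] = (uint8_t)(w >> (8u * (3u - i)));
}

/* Reverse the byte order of a word (the operation performed by REV). */
uint32_t byte_swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}
```

Storing 0x0A0B0C0D puts 0x0D at the lowest address in the little-endian case and 0x0A there in the big-endian case; byte_swap32(0x0A0B0C0D) returns 0x0D0C0B0A.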

2.3 Defining the peripherals


A microcontroller is not complete without a range of peripherals for various input/output and hardware
control functions such as timers. For the most basic Cortex-M based systems, we would expect to find
digital peripherals like:

• General-purpose input/output (GPIO);

• Timers;

• Pulse Width Modulator (PWM) – usually for motor or power electronic system control;

• UART for serial communication;

• SPI (Serial Peripheral Interface) for external hardware modules such as LCDs;

• I2C / I3C – commonly used for sensors.


In addition to these basic peripherals, a simple system might also integrate a group of registers for
various system control functions (e.g., clock source control, selection of low-power modes). These could
be integrated as part of the peripheral system, but additional care must be taken for security reasons:
typically, system management functions need to be restricted to privileged accesses only.
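One common way to enforce this restriction in hardware is to check the AHB HPROT[1] signal, which is 1 for a privileged access and 0 for an unprivileged one. The following behavioral C model (not RTL) sketches a hypothetical clock control register that only privileged software can change. Ignoring unprivileged writes and returning zero on unprivileged reads is an assumption; a design could instead return a bus error response.

```c
#include <stdint.h>
#include <stdbool.h>

/* On AHB, HPROT[1] = 1 marks a privileged access, HPROT[1] = 0 an
   unprivileged (user) access. */
#define HPROT_PRIVILEGED (1u << 1)

/* Hypothetical system control register block. */
typedef struct {
    uint32_t clk_ctrl;   /* hypothetical clock source control register */
} sysctrl_regs_t;

/* Returns true if the write was accepted. Unprivileged writes are ignored
   (assumption: a real slave might signal an error response instead). */
static bool sysctrl_write(sysctrl_regs_t *r, uint32_t hprot, uint32_t value)
{
    if ((hprot & HPROT_PRIVILEGED) == 0u)
        return false;
    r->clk_ctrl = value;
    return true;
}

/* Unprivileged reads return zero so that the settings are not exposed. */
static uint32_t sysctrl_read(const sysctrl_regs_t *r, uint32_t hprot)
{
    return (hprot & HPROT_PRIVILEGED) ? r->clk_ctrl : 0u;
}
```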

More information on digital peripheral designs is covered in Chapter 8 (page 171).

Microcontrollers also have analog interfaces like ADC (Analog to Digital Converter) and DAC (Digital
to Analog Converter). However, many FPGA devices do not support such peripherals. For ASIC
designs, typically the ADC and DAC IP need to be sourced from specialist IP providers.

2.4 Memory map definition


The architectures used in the Cortex-M processors define a memory map that divides the address space
into regions. This allows built-in peripherals such as the interrupt controller and debug components
to be accessed by simple memory access instructions, so that system features are accessible from
C program code. Having a predefined memory map also allows the Cortex-M processors to
be optimized for performance. For example, a memory region called CODE at the beginning of
the memory is dedicated to program memory, and a memory region called SRAM starting from
0x20000000 is dedicated to data memory. In the Cortex-M3 processor, the CODE and SRAM regions use
separate buses so that the system can benefit from the performance of a Harvard bus architecture.
It is possible to use the memory regions differently, but doing so might not give the best performance.

The general layout of the memory map is shown in the diagram below (Figure 2.5).

Address range              Region            Size    Typical use
0xE0000000 to 0xFFFFFFFF   System level      0.5GB   Private Peripheral Bus (PPB), 0xE0000000 to 0xE00FFFFF: private
                                                     peripherals including the built-in interrupt controller (NVIC) and
                                                     debug components. It contains the System Control Space (SCS),
                                                     0xE000E000 to 0xE000EFFF. The rest of the region is reserved.
0xA0000000 to 0xDFFFFFFF   External Device   1GB     Mainly used as external peripherals.
0x60000000 to 0x9FFFFFFF   External RAM      1GB     Mainly used as external memory.
0x40000000 to 0x5FFFFFFF   Peripherals       0.5GB   Mainly used as peripherals.
0x20000000 to 0x3FFFFFFF   SRAM              0.5GB   Mainly used as static RAM.
0x00000000 to 0x1FFFFFFF   CODE              0.5GB   Mainly used for program code. Also provides the exception vector
                                                     table after power-up.
Figure 2.5: Memory map overview.


The top 512MB of the memory map is the system-level region, which contains the Private Peripheral Bus
(PPB) and reserved areas. The PPB provides access to the built-in interrupt controller and various debug
components. Within the PPB memory range, a special range of memory is defined as the System Control
Space (SCS). It contains the interrupt control registers, system control registers, debug control registers,
and so on. The remaining system-level memory space from address 0xE0100000 is reserved.
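As an illustration of how fixed these boundaries are, the following C sketch classifies any 32-bit address into the regions of Figure 2.5. The enumeration names are hypothetical; the boundaries are those listed in the figure. For example, the SysTick registers at 0xE000E010 fall inside the SCS.

```c
#include <stdint.h>

/* Hypothetical region names for the Cortex-M memory map of Figure 2.5. */
typedef enum {
    RGN_CODE, RGN_SRAM, RGN_PERIPHERAL, RGN_EXT_RAM, RGN_EXT_DEVICE,
    RGN_PPB_SCS, RGN_PPB, RGN_RESERVED
} cm_region_t;

/* Classify an address using the region boundaries from Figure 2.5. */
static cm_region_t cm_region(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return RGN_CODE;
    if (addr <= 0x3FFFFFFFu) return RGN_SRAM;
    if (addr <= 0x5FFFFFFFu) return RGN_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return RGN_EXT_RAM;
    if (addr <= 0xDFFFFFFFu) return RGN_EXT_DEVICE;
    /* From here on, addr >= 0xE0000000 (system-level region). */
    if (addr >= 0xE000E000u && addr <= 0xE000EFFFu) return RGN_PPB_SCS;
    if (addr <= 0xE00FFFFFu) return RGN_PPB;
    return RGN_RESERVED;   /* 0xE0100000 and above */
}
```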

Having a predefined memory map makes porting applications easier, as all Cortex-M systems have a
similar look and feel, with identical address ranges for the NVIC, SysTick timer, etc. It also simplifies
the boot code, as there is no need to program the system to define the memory attributes for different
memory/device types.

There are some restrictions concerning what the memory maps look like:

1. In many Cortex-M processors, including Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, and
Cortex-M4, the initial vector table address must be zero after reset.

2. In Cortex-M3 and Cortex-M4 processors, there is an optional bit-band feature that makes the first
1MB of the SRAM region and the first 1MB of the Peripheral region bit addressable. When this feature is
enabled, accesses to the bit-band alias regions are remapped to the bit-band regions, and therefore the
bit-band alias address ranges cannot be used for data memory or peripherals.

3. In Cortex-M1 and Cortex-M7 processors, the instruction TCM and data TCM have fixed memory
addresses (TCM sizes are configurable). Both of these TCMs are optional.
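The bit-band remapping in item 2 follows a simple formula: each bit of the 1MB bit-band region maps to one 32-bit word in a 32MB alias region, so alias = alias_base + byte_offset × 32 + bit_number × 4. The C sketch below computes it using the Armv7-M bit-band regions (SRAM 0x20000000 to 0x200FFFFF aliased at 0x22000000, Peripheral 0x40000000 to 0x400FFFFF aliased at 0x42000000). The function name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Compute the bit-band alias word address for bit 'bit' (0..31) of the
   data at 'addr'. Returns false if the bit is outside a bit-band region. */
static bool bitband_alias(uint32_t addr, unsigned bit, uint32_t *alias)
{
    uint32_t base, alias_base;

    if (bit > 31u)
        return false;
    if (addr >= 0x20000000u && addr <= 0x200FFFFFu) {
        base = 0x20000000u; alias_base = 0x22000000u;   /* SRAM bit-band */
    } else if (addr >= 0x40000000u && addr <= 0x400FFFFFu) {
        base = 0x40000000u; alias_base = 0x42000000u;   /* Peripheral bit-band */
    } else {
        return false;
    }

    /* Byte that actually holds the requested bit. */
    uint32_t byte_offset = (addr - base) + bit / 8u;
    if (byte_offset > 0xFFFFFu)
        return false;

    /* One 32-bit alias word (4 bytes) per bit, 8 bits per byte. */
    *alias = alias_base + byte_offset * 32u + (bit % 8u) * 4u;
    return true;
}
```

On a target, writing 1 or 0 to the returned alias word sets or clears that single bit without a software read-modify-write sequence.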

CODE region (0x00000000 to 0x1FFFFFFF, 0.5GB):
  0x00000000   ITCM lower alias (1MB maximum)
  0x10000000   ITCM upper alias (1MB maximum)
  Other addresses: external to processor (system bus)

SRAM region (0x20000000 to 0x3FFFFFFF, 0.5GB):
  0x20000000   DTCM (1MB maximum)
  Other addresses: external to processor (system bus)
Figure 2.6: TCM memory map in Cortex-M1.

For example, in the Cortex-M1 processor, there are two TCM interfaces: the ITCM interface is
primarily for instruction memory (including literal data access inside a program), and the DTCM is
primarily for data transfers. If the TCM size is set to 0, the TCM interface is not used, and the transfers
are carried out on the system bus. The maximum size of the TCM supported on Cortex-M1 is 1MB for
each TCM interface.

The TCM interfaces on the Cortex-M1 processor are designed to be used with the typical RAM blocks in
modern FPGA architectures. The accesses are single-cycle (i.e., have no wait states), and each TCM is
limited to a maximum size of 1MB.

It is possible to add additional memory blocks on the system bus of the Cortex-M1 processor. The
original design of the Cortex-M1 system bus is based on the AHB-Lite protocol (AMBA 3); it is
generic and supports wait states and error responses. Please note that the Cortex-M1 design integrated
with an FPGA design tool might have been customized for a specific FPGA design environment and
might, therefore, have some differences in cycle timing.

The TCM interfaces on the Cortex-M7 processor are designed to be used with RAM blocks in ASIC
designs and support wait states. Each TCM can be up to 16MB, but in practice, the TCM sizes
used in Cortex-M7 based microcontrollers are likely to be in the range of 64KB to 512KB. A large TCM
increases the cost of the silicon due to its area and can reduce the maximum clock frequency that
can be achieved. For Cortex-M7, you can also add additional memories on the AXI master interface.

Peripherals are typically placed in the Peripheral region of the memory map (0x40000000 to
0x5FFFFFFF). In most designs, peripherals are grouped into address ranges based on the bus segment
that they are placed in. For example, a Cortex-M based system can have multiple AHB and APB
peripheral buses. Bus bridges can be used to allow these buses to run at different clock frequencies.

When using Cortex-M3 and Cortex-M4 processors, if the designer would like to take advantage of the
bit-band feature, which makes peripheral registers bit addressable (using the bit-band alias), then the
peripherals that use this feature need to be in the first 1MB of the Peripheral region. Similarly, to
support the bit-band feature for SRAM, the SRAM must be placed in the first 1MB of the SRAM region.

When using Cortex-M23 and Cortex-M33 processors with TrustZone security extension enabled, the
memory map design needs to divide memory spaces into Secure and Non-secure ranges. More details
on this topic are covered in Section 3.5 AHB5 TrustZone support.

2.5 Bus and memory system design


When designing the bus system for a Cortex-M processor system, many factors need to be considered:

• The bus interface on the Cortex-M processor being used – different Cortex-M processors can have
different bus interfaces (e.g., Harvard versus von Neumann bus architecture).

• The performance of memory blocks (e.g., if embedded flash memories are used for program storage
and the design needs to provide high performance, then a cache unit should be considered).

• The bus bandwidth of other bus masters in the system. For example, a USB controller is likely to
have a bus master interface and need high data bandwidth to SRAM. In such cases, you might
need to have multiple blocks of SRAM and design the bus system to allow the processor and the
USB controller to have concurrent access to SRAM blocks. Another common type of bus master is a
DMA controller – DMA operations enable high-performance data transfers and device-driven data
transfers without software intervention.

• The clock speed of peripheral buses – your design might have multiple peripheral buses with
different clock speeds, to enable low-power operation for some peripherals and higher performance
for peripherals that can benefit from lower access latency.

• Security – in TrustZone based systems with Cortex-M23 and Cortex-M33 processors, security
management is an important area of bus system design, to ensure that security measures cannot
be compromised. In Cortex-M systems without TrustZone, you might still want some level of
security management to handle the separation of privileged and unprivileged software components.

Later on in this book, we cover some of the processor-specific bus system design concepts in Chapter 4.

2.6 TCM integration


In system designs for the Cortex-M1 and Cortex-M7 processors, memory blocks can be
connected to the processor using the TCM (Tightly Coupled Memory) interfaces. In most designs,
SRAM macros generated by SRAM compilers can be connected to the processor via simple glue logic.

For microcontroller designs with the Cortex-M7 processor, it is unlikely that you will connect slow
memory blocks like an embedded flash to instruction TCM because accesses to TCM memories
bypass the caches. Therefore, for Cortex-M7 system designs, slow program memories are expected to
be connected via the AXI master interface.

For details of TCM integration, please refer to the Integration and Implementation Manual (IIM) in the
product bundle.

2.7 Cache integration


Caches are another type of memory that may need to be integrated. Currently, the following Cortex-M
processor products support caches:

• The Cortex-M7 processor supports optional built-in instruction and data caches.

• The Cortex-M35P processor supports an optional built-in program cache (sometimes referred to as an
instruction cache, but technically it is a unified cache that can cache both instructions and read-only
data).

For details of cache RAM integration on these processors, please refer to the IIM in the product
bundles.


2.8 Defining the processor’s configuration options


The source code of the Cortex-M processors is highly configurable. You can configure the options
using Verilog parameters in the module instantiation. Also, some of the newer Cortex-M processors
have configuration scripts to help set up the configuration of the product bundle.

System designers using the Cortex-M processor source code need to study the configuration options
documented in the Integration and Implementation Manual (IIM) carefully to select the right options
for their applications. Some other parts of the product bundles also need to be configured with
matching options. If the options of some parts of the deliverable are not configured correctly, items
like the execution testbench might not work correctly.

2.9 Interrupt signals and related areas


Assigning interrupt numbers and connecting interrupt signals from peripherals to the processor is
possibly one of the easiest parts of the system design task. Normally, you have several interrupt signals
from peripherals to connect to the processor. The allocation of interrupt signals affects the C
header files for software development, including the vector table definitions and interrupt numbers,
both of which are visible to software.
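In a CMSIS-style device header, this allocation usually appears as an IRQn_Type enumeration: processor exceptions have negative numbers, and the device interrupts start at 0 in the order they are wired to the processor's interrupt inputs. The sketch below uses hypothetical peripheral names (UART0, TIMER0, GPIO0) and shows that IRQ n occupies vector table entry 16 + n (entry 0 holds the initial stack pointer and entry 1 the reset vector).

```c
#include <stdint.h>

/* CMSIS-style interrupt numbers for a hypothetical device. */
typedef enum {
    NonMaskableInt_IRQn = -14,  /* exception 2  */
    HardFault_IRQn      = -13,  /* exception 3  */
    SVCall_IRQn         = -5,   /* exception 11 */
    PendSV_IRQn         = -2,   /* exception 14 */
    SysTick_IRQn        = -1,   /* exception 15 */
    UART0_IRQn          = 0,    /* hypothetical: UART0 wired to IRQ[0]  */
    TIMER0_IRQn         = 1,    /* hypothetical: TIMER0 wired to IRQ[1] */
    GPIO0_IRQn          = 2     /* hypothetical: GPIO0 wired to IRQ[2]  */
} IRQn_Type;

/* Vector table index (exception number) for an IRQn_Type value. */
static unsigned vector_index(IRQn_Type irqn)
{
    return (unsigned)((int)irqn + 16);
}
```

Reordering the wiring of interrupt signals in hardware therefore changes both this enumeration and the vector table, which is why the allocation needs to be agreed with the software team early.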

The maximum number of interrupts supported by each Cortex-M processor is listed in Table 2.1:

Processor Maximum number of interrupts

Cortex-M0, Cortex-M0+, Cortex-M1 32

Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23 240

Cortex-M33, Cortex-M35P 480

Table 2.1: Maximum number of interrupts in the Cortex-M processors.

If the number of interrupt signals exceeds the maximum number supported, multiple interrupt lines
can be merged into one, sharing a single interrupt service routine (ISR) in which software determines
which interrupt source needs to be serviced.
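The following host-side C sketch models that arrangement under stated assumptions: the merged IRQ is a simple OR of the source requests, and software can read a hypothetical status word with one bit per source. The handler names and the status word are illustrative only.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_SHARED_SOURCES 4u

typedef void (*source_handler_t)(void);

/* Hardware side (modelled): the merged IRQ is high if any source is high. */
static int merged_irq(uint32_t source_status)
{
    return (source_status & ((1u << NUM_SHARED_SOURCES) - 1u)) != 0u;
}

/* Software side: the shared ISR checks each status bit and calls the
   handler for every asserted source. Returns the number of handlers run. */
static unsigned shared_isr_dispatch(uint32_t source_status,
                                    const source_handler_t handlers[NUM_SHARED_SOURCES])
{
    unsigned calls = 0;
    for (unsigned i = 0; i < NUM_SHARED_SOURCES; i++) {
        if ((source_status & (1u << i)) && handlers[i] != NULL) {
            handlers[i]();   /* each handler clears its own peripheral request */
            calls++;
        }
    }
    return calls;
}

/* Example handlers for hypothetical UART and timer sources. */
static unsigned uart_hits, timer_hits;
static void uart_handler(void)  { uart_hits++; }
static void timer_handler(void) { timer_hits++; }
```

The cost of this approach is a few extra cycles in the ISR to identify the source, and the merged sources share one priority level.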

On all current Cortex-M processors, the interrupt signals:

• Are active high and must be synchronous to the processor’s system clock signal;

• Can be level triggered or pulse triggered. If using pulse triggered, the duration of the pulse must be
at least one clock cycle.

Unused or unimplemented interrupt input pins should be tied to 0 and must not be allowed
to enter the unknown state ‘X’ (e.g., if a peripheral outputs X on its interrupt line when the peripheral is
powered down, the signal level must be clamped to 0 before the power-down happens). Unknown (‘X’)
signal values mostly show up as simulation issues, but they represent possible unexpected values
in an ASIC or FPGA implementation.

If a peripheral interrupt is generated in a different clock domain, a synchronization circuit (such as
the example in Figure 2.7) is needed to remove potential metastability issues and to prevent transients
from forming unexpected pulses.
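[Figure: the interrupt input (from a different clock domain) passes through three D flip-flops (DFFs) in
series, all clocked by CLK, to the interrupt output. The first two DFFs form a double flip-flop
synchronizer to prevent metastability; the following stage removes pulses formed by glitches.]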
Figure 2.7: Interrupt synchronizer to convert interrupt signal from one clock domain to processor’s clock domain.
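A cycle-based C model helps show the latency and filtering behavior of such a synchronizer. It is a sketch, not the figure's exact circuit: the glitch filter is assumed to be an AND of the last two register stages, so the output only goes high after the synchronized input has been high for two consecutive processor clock cycles. Metastability itself is not modelled.

```c
#include <stdbool.h>

/* Three register stages, all clocked by the processor clock. */
typedef struct { bool q1, q2, q3; } irq_sync_t;

/* Apply one rising clock edge with the sampled asynchronous input and
   return the synchronized, glitch-filtered interrupt output.
   ASSUMPTION: output = q2 AND q3 (the glitch filter). */
static bool irq_sync_clock(irq_sync_t *s, bool async_in)
{
    /* All registers update on the same edge: shift from the oldest stage. */
    s->q3 = s->q2;
    s->q2 = s->q1;
    s->q1 = async_in;
    return s->q2 && s->q3;
}
```

In this model, a one-cycle glitch never reaches the output, while a steady request appears after three clock edges. That added latency is one reason why two closely spaced pulses can merge after synchronization.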

Cortex-M processors have a Non-Maskable Interrupt (NMI) input. In common embedded systems, the
NMI could be connected to:

• Voltage monitoring logic (also known as a brownout detector), to ensure that the system is shut down
correctly when the supply voltage drops to a certain level; or

• A watchdog timer, to carry out remedial actions if the system has stopped normal operation.

The NMI is unlikely to be used as the interrupt for normal peripherals. This is because the built-in
interrupt controller (NVIC) already provides interrupt prioritization, so any peripheral interrupt can
be given the highest priority using a normal IRQ connection. Also, a fault generated within the NMI
handler can cause the processor to enter the lockup state, which can be problematic for some
applications, whereas a fault in a normal interrupt handler triggers the Hard Fault handler (or another
configurable fault handler), which can then execute.

Another characteristic of the NVIC is that it can handle interrupt requests in the form of pulses as well
as level signals. If a peripheral generates an interrupt request as a pulse, the request is held as a
pending status within the NVIC until the interrupt is serviced or the pending status is cleared by
software. If a peripheral generates an interrupt request as a level signal, the interrupt handler must
clear the request at the peripheral.
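The difference can be sketched with a simplified C model of one NVIC pending bit. This is an assumption-laden illustration: it ignores the real NVIC rules for an interrupt that is already active and for re-pending on exception return. It only shows that a pulse is remembered as pending, and that a level left high by the peripheral pends the interrupt again.

```c
#include <stdbool.h>

/* Simplified model of the pending status of one IRQ in the NVIC. */
typedef struct {
    bool pending;
} nvic_irq_model_t;

/* Sample the IRQ input once per processor clock: a high input sets
   pending, so a pulse is remembered after the input returns low. */
static void nvic_sample(nvic_irq_model_t *n, bool irq_in)
{
    if (irq_in)
        n->pending = true;
}

/* The processor accepts the interrupt: pending is cleared on entry. */
static bool nvic_accept(nvic_irq_model_t *n)
{
    bool taken = n->pending;
    n->pending = false;
    return taken;
}

/* Software clears the pending status (cf. CMSIS NVIC_ClearPendingIRQ). */
static void nvic_clear_pending(nvic_irq_model_t *n)
{
    n->pending = false;
}
```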

The key advantage of a pulsed interrupt is that it saves a few clock cycles in the ISR, as there is no
need to clear the interrupt request at the peripheral. However, in many cases, a level-triggered
interrupt is preferred because:

• Cross-clock-domain synchronization of level-triggered interrupts is simpler than that of pulsed
interrupts. Where pulse interrupt synchronization logic is used, two successive interrupt request
pulses could be merged into one after the synchronizer due to its latency, which can be confusing.

• If a pulsed interrupt event occurs while the processor is being reset, the event could be lost.

• Level-triggered interrupts can remain high to indicate that the peripheral needs additional service
(e.g., when additional data is available in a receiver’s FIFO).

• They are easier to debug (e.g., in Verilog simulation, it is hard to tell whether there has been an
interrupt event unless the event information is kept by, for example, a waveform database).

• The peripheral design can be reused with other processors that do not support pulse-triggered
interrupts.

In addition to the number of interrupts, there are other configuration options related to interrupt handling:

Number of interrupt priority levels – In Armv7-M and Armv8-M Mainline processors, the
programmable interrupt priority level registers have a configurable width of 3 to 8 bits. Typically,
3-bit or 4-bit widths are used, and some devices support 5 bits. Most applications do not
need many interrupt priority levels, so eight levels (3-bit) are likely to be sufficient.
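With N implemented priority bits, the priority value occupies the most significant N bits of each 8-bit priority field, and the unimplemented low-order bits read as zero. CMSIS-Core handles this with the device's __NVIC_PRIO_BITS setting, shifting the priority left by (8 - __NVIC_PRIO_BITS). The host-side helpers below (hypothetical names) show the same arithmetic.

```c
#include <stdint.h>

/* Field value written to an 8-bit priority register for priority 'prio'
   when 'prio_bits' (3..8) bits are implemented; the value is left-aligned. */
static uint8_t prio_to_field(uint32_t prio, unsigned prio_bits)
{
    return (uint8_t)((prio << (8u - prio_bits)) & 0xFFu);
}

/* Number of distinct programmable priority levels. */
static unsigned prio_levels(unsigned prio_bits)
{
    return 1u << prio_bits;
}
```

For example, with 3 implemented bits there are 8 levels, and priority 5 is stored as 0xA0. Because the value is left-aligned, software written for 3 bits keeps the same relative priorities on a device that implements 4 bits.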

Wakeup Interrupt Controller (WIC) – An optional block to handle interrupt detection while the
processor is in State Retention Power Gating (SRPG) or when the processor’s clock is completely
stopped. If the WIC feature is implemented and enabled, the interrupt masking information is
transferred from the NVIC to the WIC automatically before entering sleep mode. The WIC then takes
over the role of interrupt event detection and can generate a wakeup request to the power management
blocks in the system when an enabled interrupt event is detected. While the processor is waking up, the
interrupt pending status is held in the WIC, which transfers the interrupt request to the NVIC when the
processor is back up. At the same time, the masking information inside the WIC is cleared
automatically by hardware, as the NVIC is running again.
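The hand-over sequence described above can be summarized as a small behavioral C model. It is a sketch only: the structure, function names, and one-bit-per-interrupt encoding are hypothetical and do not represent the ports of Arm's WIC design.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical behavioural model of the WIC hand-over sequence. */
typedef struct {
    uint32_t mask;        /* interrupt enable mask copied from the NVIC */
    uint32_t pending;     /* enabled events captured while asleep */
    bool     wakeup_req;  /* request to the power management blocks */
} wic_model_t;

/* Sleep entry: the masking information is transferred from NVIC to WIC. */
static void wic_enter_sleep(wic_model_t *w, uint32_t nvic_enable_mask)
{
    w->mask = nvic_enable_mask;
    w->pending = 0u;
    w->wakeup_req = false;
}

/* While asleep (processor clock stopped): detect enabled interrupt events. */
static void wic_detect(wic_model_t *w, uint32_t irq_inputs)
{
    w->pending |= irq_inputs & w->mask;
    if (w->pending != 0u)
        w->wakeup_req = true;
}

/* Processor running again: pass the held requests to the NVIC and clear
   the WIC state. Returns the requests transferred to the NVIC. */
static uint32_t wic_handover(wic_model_t *w)
{
    uint32_t to_nvic = w->pending;
    w->pending = 0u;
    w->mask = 0u;
    w->wakeup_req = false;
    return to_nvic;
}
```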

2.10 Event interface


Apart from the Cortex-M1 processor, all Cortex-M processors have an event input (typically
named RXEV – receive event) and an event output (typically named TXEV – transmit event). The RXEV
input is used to wake up a processor from a Wait-For-Event (WFE) sleep, and the TXEV output allows
a processor to send an event, using the SEV (Send Event) instruction, to another processor that is in
WFE sleep. These signals are active-high, single-cycle pulses.

The event interface is typically used in multi-core systems to allow one processor to wake up another
during spinlocks. For example, in an RTOS semaphore implementation, a processor waiting for a
spinlock can enter sleep mode using WFE to save power, and wakes up when there is an interrupt to
serve or an event from another processor. By crossing over the event interface signals (as shown in
Figure 2.8), processors in a dual-core system can wake each other up from WFE sleep using the SEV
instruction.

[Figure: two Cortex-M processors, with the TXEV output of each processor connected to the RXEV input
of the other.]
Figure 2.8: Example connection of event interface in a dual-core system.

Events could also be generated by peripherals or a DMA controller, but normally interrupts are more
suitable for that purpose, as software needs to react to those hardware events via ISRs.
For single-processor systems, it is fine to tie RXEV to 0 and leave TXEV unconnected.
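The cross-connected arrangement of Figure 2.8 can be sketched as a simplified C model of each processor's event register. Assumptions: interrupts and other wakeup sources are ignored, and any effect of SEV on the executing processor's own event register is not modelled. The names are hypothetical.

```c
#include <stdbool.h>

/* Simplified per-core state for the WFE/SEV mechanism. */
typedef struct {
    bool event_reg;  /* latched event, consumed by the next WFE */
    bool sleeping;   /* true while waiting in WFE */
} core_model_t;

/* WFE: if an event is already latched, clear it and continue;
   otherwise enter WFE sleep. */
static void core_wfe(core_model_t *c)
{
    if (c->event_reg)
        c->event_reg = false;
    else
        c->sleeping = true;
}

/* RXEV pulse: wakes a sleeping core, or latches the event for later. */
static void core_rxev(core_model_t *c)
{
    if (c->sleeping)
        c->sleeping = false;
    else
        c->event_reg = true;
}

/* SEV on one core pulses its TXEV, which drives the other core's RXEV. */
static void core_sev(core_model_t *other)
{
    core_rxev(other);
}
```

Latching the event when the receiving core is not yet asleep is what prevents a lost wakeup if SEV arrives just before WFE is executed.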

Please note: The event interface on the Cortex-M processor is unrelated to the definition of events in an
RTOS. In an RTOS, an application thread waiting for a certain operation X to be carried out can call an OS
API that waits for an event Y. This API call also takes the thread out of the ready task queue. When
operation X has been carried out (e.g., in another thread or an ISR), the thread or ISR that carried it out
can call another OS API to set the OS event Y. This puts all the threads that were waiting for operation X
back in the ready task queue so that they can resume operation.

2.11 Clock generation


There are several clock signals on the Cortex-M processors. Over the years there have been different
design approaches and therefore the clock and reset signal names vary between different processors.

Most of the existing Cortex-M processors provide:

• A free-running clock (if this is gated, all logic in the processor stops, and external logic blocks such
as the WIC are needed to handle interrupt detection and wakeup);

• A system clock (can be gated during sleep mode);

• Debug clock(s) – these include the JTAG or Serial Wire debug clock signals for the debug interface, and
also a clock signal for the internal debug components, which can be gated if there is no active debug
connection.

The free running clock, system clock and debug clock (except the clock for the debug interface and
DAP interface on Cortex-M3/M4 processors) must be synchronous and in the same phase. The
separation of clock signals is to allow the system power to be reduced by gating off some of the clock
signals when they are not needed.

• In Cortex-M0 and Cortex-M3 processors, the GATEHCLK output signal is asserted when
the processor is in sleep mode and there is no debug connection. This signal can be used to gate off
the system clock.

• In some of the Cortex-M processors, clock gating is done internally, so not all of these clock signals
are visible at the top level.

It is important not to gate off the system clock while the processor is running. In system-level designs,
there can be multiple clock sources, and a glitch-free clock switching circuit would be needed. The clock
switching circuit is outside of the processor and is normally application and process node dependent. In
FPGA designs, you can design an FSM that controls the PLLs (Phase-Locked Loops) and gates off the clock
signals to the processor subsystem during PLL configuration changes.

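[Figure: a reference clock source drives a Phase Locked Loop (PLL) set up by the PLL config inputs. The
generated clock passes through a clock gate and a clock buffer to the clock output. A Finite State
Machine (FSM) monitors the PLL lock status (is the clock stable?) and drives the enable control of the
clock gate.]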
Figure 2.9: Example clock generation arrangement in a FPGA system design.
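The FSM in Figure 2.9 can be sketched in C as follows. The state names and the one-cycle reconfiguration pulse are hypothetical, and the model assumes that the PLL lock output de-asserts while the new configuration takes effect. A real design should confirm this in the PLL documentation or wait for lock to drop first. The FSM itself must run on a clock that is not gated, such as the reference clock.

```c
#include <stdbool.h>

/* Hypothetical FSM: gate the processor clock off, reconfigure the PLL,
   wait for lock, then re-enable the clock. */
typedef enum { CLK_RUN, CLK_GATED, CLK_WAIT_LOCK } clk_state_t;

typedef struct {
    clk_state_t state;
    bool clk_enable;    /* drives the clock gate enable */
    bool pll_reconfig;  /* one-cycle pulse applying the new PLL settings */
} clk_fsm_t;

/* One FSM clock step. 'change_req' asks for a new PLL setting;
   'pll_locked' is the PLL lock status. */
static void clk_fsm_step(clk_fsm_t *f, bool change_req, bool pll_locked)
{
    f->pll_reconfig = false;
    switch (f->state) {
    case CLK_RUN:
        f->clk_enable = true;
        if (change_req) {
            f->clk_enable = false;       /* stop the clock before any change */
            f->state = CLK_GATED;
        }
        break;
    case CLK_GATED:
        f->pll_reconfig = true;          /* apply new settings while gated */
        f->state = CLK_WAIT_LOCK;
        break;
    case CLK_WAIT_LOCK:
        if (pll_locked) {                /* clock is stable again */
            f->clk_enable = true;
            f->state = CLK_RUN;
        }
        break;
    }
}
```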

Depending on the FPGA design tool being used, the system clock generation/control logic might be
generated by the tools. In this case, there is no need to develop your own clock generation/control logic.

In ASIC designs, you might have the following clock sources:

• External crystal oscillator for medium speed (e.g., 1MHz to 12MHz) – this might be turned off by
default after a reset to save power. Rather than using a higher-frequency crystal to generate higher-
frequency clocks, it is more common to use a PLL to generate a high clock speed only when needed,
which avoids running a high-frequency clock all the time and so saves power.

• Internal RC oscillator for medium speed (e.g., 1MHz to 12MHz). This uses less power than a crystal
oscillator, but does not provide an accurate frequency reference for timing or peripheral interfaces.

• External 32kHz oscillator for the real-time clock (might also be used for system management).

[Figure: the clock sources (an internal R-C oscillator and a fast crystal oscillator, each with on/off
control, and a PLL with PLL config) feed a glitch-free clock switch. The switch output drives a clock
buffer for the free-running clock output and, through a clock gate, a clock buffer for the gated system
clock output. A 32kHz crystal oscillator drives the real-time clock (RTC), and a power management
control block is also shown.]
Figure 2.10: Example clock generation arrangement in an ASIC system design.


In ASIC/SoC implementations, the system can boot up from the internal RC oscillator and switch
over to the external crystal oscillator or PLL as the clock source when needed. The PLL can provide a
higher clock frequency for high-performance operation.

2.12 Reset generation


In the Cortex-M processors, there are usually at least two reset signals, and in some cases three:

• System reset;

• Debug reset;

• Debug interface reset (e.g., nTRST) for the JTAG interface;

• Optionally, you might also find a power-on reset, which resets both the system and debug logic.

If a power-on reset is present, it resets both the system and the debug system. The reason for
separating the system and debug resets is to allow the processor to be reset without affecting the debug
system. Otherwise, debug settings such as breakpoints and watchpoints, and the debug connection from
the debugger to the core, would be lost each time the processor core is reset.

The processor also outputs a reset request signal called SYSRESETREQ. This is controlled by a register
bit in the Application Interrupt and Reset Control Register (AIRCR) inside the System Control Space.
This allows:

• Software to request a system reset, for example, during fault handling;

• The debugger to request a system reset. This is essential to allow the debugger to reset the
target processor.
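For reference, a software reset request is a single write to AIRCR (address 0xE000ED0C). The write must carry the key 0x05FA in bits [31:16], otherwise it is ignored. It sets SYSRESETREQ (bit 2), and software normally preserves the PRIGROUP field (bits [10:8] in Armv7-M and Armv8-M Mainline). The host-side helper below (hypothetical name) computes that value in the same way as CMSIS NVIC_SystemReset(). On a target, software would normally just call NVIC_SystemReset().

```c
#include <stdint.h>

#define AIRCR_VECTKEY        (0x05FAu << 16)  /* write key, bits [31:16] */
#define AIRCR_PRIGROUP_MASK  (0x7u << 8)      /* priority grouping, bits [10:8] */
#define AIRCR_SYSRESETREQ    (1u << 2)        /* drives the SYSRESETREQ output */

/* Value to write to AIRCR to request a system reset, keeping PRIGROUP.
   'current_aircr' is the value read from AIRCR; its read-back key in
   bits [31:16] is discarded. */
static uint32_t aircr_reset_value(uint32_t current_aircr)
{
    return AIRCR_VECTKEY | (current_aircr & AIRCR_PRIGROUP_MASK) | AIRCR_SYSRESETREQ;
}
```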

Designers must make sure that:

• SYSRESETREQ generates only a system reset, not a debug reset or power-on reset;

• SYSRESETREQ does not generate a system reset through a combinatorial path (in other words, it must
be registered by registers that are not affected by the system reset). This is because the SYSRESETREQ
output is itself cleared by a system reset, so using a combinatorial path for reset generation causes a
reset glitch.

All of the Cortex-M processors use asynchronous active-low reset signals, which must be de-asserted
synchronously to the system and debug clocks to prevent timing violations. Because reset assertion is
asynchronous, most of the registers can be reset even when the clock is not running. In addition, most
of the Cortex-M processors require the reset to last at least two clock cycles. This arrangement has the
following benefits:

• It enables the flip-flops in double-DFF synchronizers to be reset.

• If the assertion of reset causes timing violations that lead to metastability, the multi-cycle reset
ensures that the metastability is cleared up.

To ensure that reset de-assertion occurs at the correct time, a simple reset generator could be used for
a Cortex-M0/M0+/M1/M3/M4 processor. Figure 2.11 shows such an example.

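[Figure: the external power-on reset (active low) clears two D flip-flops clocked by HCLK, the first with
its D input tied to 1. These registers ensure that the external power-on reset is synchronized to HCLK,
and a buffer generates DBGRESETn from their output. SYSRESETREQ passes through two further D
flip-flops clocked by HCLK, which hold the reset for 2 cycles after SYSRESETREQ, and a buffer generates
SYSRESETn.]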
Figure 2.11: A simple reset generator for the Cortex-M processors.
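The timing of Figure 2.11 can be checked with a cycle-based C model. It is a sketch under a stated assumption: SYSRESETn is taken to be low while DBGRESETn is low or while either SYSRESETREQ hold register contains a request. The exact gating in a real design may differ, and the names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool por_q1, por_q2;   /* power-on reset synchronizer (D tied to 1) */
    bool req_q1, req_q2;   /* SYSRESETREQ hold registers, reset by DBGRESETn */
} rst_gen_t;

/* External power-on reset asserted: all registers are cleared
   asynchronously, without needing a clock edge. */
static void rst_gen_por(rst_gen_t *g)
{
    g->por_q1 = g->por_q2 = false;
    g->req_q1 = g->req_q2 = false;
}

static bool rst_gen_dbgresetn(const rst_gen_t *g) { return g->por_q2; }

/* ASSUMPTION: SYSRESETn is low during debug reset or while a request is held. */
static bool rst_gen_sysresetn(const rst_gen_t *g)
{
    return g->por_q2 && !g->req_q1 && !g->req_q2;
}

/* One rising edge of HCLK with the external power-on reset released. */
static void rst_gen_clock(rst_gen_t *g, bool sysresetreq)
{
    bool dbg_reset_active = !g->por_q2;   /* DBGRESETn low before the edge */
    g->por_q2 = g->por_q1;
    g->por_q1 = true;
    if (dbg_reset_active) {
        g->req_q1 = g->req_q2 = false;   /* held in reset by DBGRESETn */
    } else {
        g->req_q2 = g->req_q1;
        g->req_q1 = sysresetreq;
    }
}
```

In this model, DBGRESETn is released synchronously two HCLK edges after the power-on reset is released. A single-cycle SYSRESETREQ, which is cleared as soon as the system reset asserts, still holds SYSRESETn low for two full cycles.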

Assuming the Cortex-M1 is used (some other Cortex-M processors have different names for their reset
signals): the Cortex-M1 processor generates the SYSRESETREQ signal. Since the Cortex-M1 processor is
reset by SYSRESETn, the SYSRESETREQ signal must not drive SYSRESETn through a combinatorial
path. Otherwise, there could be a race condition in which SYSRESETREQ is cleared very shortly after it
is asserted, because it is cleared by the reset it generates. This could result in some parts of the
processor being reset and other parts not. For this reason, the SYSRESETREQ signal must be
registered by a separate flip-flop that is not affected by SYSRESETn before being used to generate
SYSRESETn. In the example above (Figure 2.11), the reset request from SYSRESETREQ is held in
two registers that are reset by DBGRESETn; if using Cortex-M3/M4, you can instead use the power-on
reset of the Cortex-M3/M4 processor.

We can also design the reset generator so that it can optionally reset the system if the processor enters
the lockup state. To make this behavior controllable, a programmable register is needed in your FPGA/
system design to specify whether the lockup state causes a reset. This register is not provided in the
Cortex-M processor core, as this requirement is application dependent. During software development,
this external reset control register can be set to 0 to disable the automatic reset. In a production
system, it can be set to 1 so that when the system enters the lockup state, SYSRESETn is activated
automatically.


[Figure: as in Figure 2.11, the external power-on reset (active low) is synchronized to HCLK by two D
flip-flops (the first with D tied to 1) to generate DBGRESETn through a buffer. The registers that hold
the reset for 2 cycles and generate SYSRESETn are now driven by SYSRESETREQ and by LOCKUP, with
LOCKUP qualified by a programmable reset control register that allows the system to be reset on a
lock-up.]
Figure 2.12: A reset generator to allow automatic reset at lockup state.
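The extra logic in Figure 2.12 amounts to a qualified OR in front of the hold registers. In the C sketch below (hypothetical names), 'lockup_reset_en' stands for the programmable reset control register bit, which is part of the system design, not the processor. The result replaces SYSRESETREQ as the D input of the first hold register. It must still pass through those registers, because LOCKUP, like SYSRESETREQ, is cleared by the system reset it causes.

```c
#include <stdbool.h>

/* Reset request fed to the SYSRESETn hold registers of Figure 2.12.
   lockup_reset_en: 0 during software development, 1 in production. */
static bool sys_reset_request(bool sysresetreq, bool lockup, bool lockup_reset_en)
{
    return sysresetreq || (lockup && lockup_reset_en);
}
```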

Depending on the FPGA design tool being used, the system reset controller might already be included.
In this case, there is no need to develop your own reset controller.

2.13 SysTick
The SysTick timers in the Cortex-M processors support an external reference “clock.” Technically, the reference
“clock” is not a clock signal, as it is sampled by D flip-flops inside the SysTick at the processor’s clock speed.

The SysTick interface also provides a calibration input, which is fed to the SysTick calibration value
register:

Signal           Field in SysTick calibration value register
STCALIB[25]      NOREF (bit 31): 0 – reference clock is implemented; 1 – reference clock is not implemented
STCALIB[24]      SKEW (bit 30): 0 – TENMS calibration value is exact; 1 – TENMS calibration value is skewed (inexact)
STCALIB[23:0]    TENMS (bits [23:0]): SysTick reload value for 10ms (100Hz)

Table 2.2: Signals for SysTick calibration value register.

Support for the SysTick reference clock and calibration value is optional.

• If no reference clock is provided, STCALIB[25] needs to be tied high.

• If TENMS is not used, STCALIB[23:0] should be tied low, and STCALIB[24] needs to be tied high.
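The tie-off rules above can be captured in a small C helper that builds the 26-bit STCALIB value and shows how it appears in the calibration register (NOREF at bit 31, SKEW at bit 30, TENMS in bits [23:0]). The function names are hypothetical. The TENMS helper subtracts 1 because SysTick counts down from the reload value to 0, so a period of N counter clocks needs a reload value of N - 1. Which clock TENMS refers to should be checked in the processor's TRM; here it is simply called the counter clock.

```c
#include <stdint.h>
#include <stdbool.h>

/* Build the STCALIB[25:0] tie-off value from Table 2.2. */
static uint32_t stcalib_value(bool has_ref_clock, bool tenms_exact, uint32_t tenms)
{
    uint32_t v = tenms & 0xFFFFFFu;        /* STCALIB[23:0] = TENMS */
    if (!tenms_exact)   v |= 1u << 24;     /* STCALIB[24]   = SKEW  */
    if (!has_ref_clock) v |= 1u << 25;     /* STCALIB[25]   = NOREF */
    return v;
}

/* Reload value for a 10ms period; assumes counter_clock_hz >= 100. */
static uint32_t tenms_for_clock(uint32_t counter_clock_hz)
{
    return counter_clock_hz / 100u - 1u;
}

/* Resulting SysTick calibration value register contents. */
static uint32_t syst_calib_from_stcalib(uint32_t stcalib)
{
    return (((stcalib >> 25) & 1u) << 31)   /* NOREF */
         | (((stcalib >> 24) & 1u) << 30)   /* SKEW  */
         | (stcalib & 0xFFFFFFu);           /* TENMS */
}
```

For a design with no reference clock and no calibration value, stcalib_value(false, false, 0) gives 0x3000000: STCALIB[25] and STCALIB[24] high, STCALIB[23:0] low.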


CMSIS-CORE also provides an alternative, software-based way to determine the system clock speed: the
SystemCoreClock variable holds the processor clock frequency, and it is initialized and then updated by
software whenever the clock settings change (for example, by calling SystemCoreClockUpdate()).
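Software then typically derives the SysTick reload value from SystemCoreClock, for example with SysTick_Config(SystemCoreClock / 1000) for a 1ms tick. CMSIS SysTick_Config(ticks) programs the reload register with ticks - 1 and rejects values that do not fit the 24-bit reload register. The host-side sketch below reproduces that calculation. SystemCoreClock is normally defined by the device's CMSIS system file; it is defined here with an example value only to keep the sketch self-contained.

```c
#include <stdint.h>

/* Normally provided by the CMSIS device system file; example value here. */
uint32_t SystemCoreClock = 48000000u;

#define SYSTICK_LOAD_MAX 0xFFFFFFu   /* SysTick reload register is 24 bits */

/* Reload value for a tick rate of 'tick_hz', computed from the current
   SystemCoreClock. Returns 0 if the value would not fit (or tick_hz is too
   high), mirroring the range check in SysTick_Config(). */
static uint32_t systick_reload_for(uint32_t tick_hz)
{
    uint32_t ticks = SystemCoreClock / tick_hz;
    if (ticks == 0u || ticks - 1u > SYSTICK_LOAD_MAX)
        return 0u;
    return ticks - 1u;
}
```

Because the result depends on SystemCoreClock, software that changes the clock settings must update the variable before recomputing the SysTick reload value.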

2.14 Debug integration


Debug integration typically involves several interfaces:

• Interface for debug connection (JTAG or Serial Wire Debug) – for connecting a debugger to the
hardware target to carry out halting, stepping, restart, resume, setting breakpoints/watchpoints,
and access to memories and peripherals. The debug connection is also used for downloading code and
flash programming.

• Interface for trace data (connecting the ATB from the processor and ETM to the trace port) – enables
the debugger to obtain real-time trace information, either using a trace port protocol with multiple
data bits (usually 4 bits) and a clock signal, or using a single-pin trace output protocol for lower-
bandwidth trace (e.g., instrumentation trace, event trace). The trace interface is optional and is not
available on the Cortex-M1, Cortex-M0, and Cortex-M0+ processors.

• CoreSight timestamp generation – the CoreSight timestamp feature integrates timing information
into the trace packets. Real-time trace operation can take advantage of this to allow the debugger
to reconstruct timing information. To allow this to work, some Cortex-M processors and the ETM
(Embedded Trace Macrocell) have a timestamp interface. Typically, a simple counter is used to
generate the timestamp value.

• Debug authentication control – Cortex-M processors provide hardware interface signals that
allow other hardware blocks in the system to control whether debug and trace operations are
allowed. Typically, debug authentication is controlled by security management IP blocks based on
certificate-based authentication methods. For Armv8-M processor systems, there are separate
debug authentication signals to define debug access permissions for the Secure and Non-secure
environments.

• Debug system clock and reset generation, and power management – depending on which
processor is used, the debug system can have its own clock and reset signals, and in some designs,
debug logic can be powered down or clock gated if not being used.

More details on the debug interface are covered in Chapter 5. Please note, here we only cover single-
core designs. In the case of multi-core designs, the debug integration should be handled by Arm
CoreSight SoC-400/600 products.


2.15 Power management features


Cortex-M processors (except Cortex-M1 which is designed for FPGA) support a range of low-power
features.

• Sleep modes – architecturally, the processor supports sleep and deep sleep. These modes can be extended with additional system-specific registers to provide finer granularity of sleep characteristics. The processors have sleep mode status output signals so that system designers can use these signals to control clock gating and other power management hardware.

• Sleep hold interface – if a system designer uses the sleep mode signals to turn off hardware resources (e.g., program ROM), the wake-up process can take a while (e.g., hundreds to thousands of clock cycles). In such cases, it is essential to be able to hold off the processor's program execution until those resources are ready, and the sleep hold interface is designed exactly for this purpose. To use this feature, the system designer needs to design a simple Finite State Machine (FSM) to handle the handshaking with the sleep hold interface.

• Wakeup Interrupt Controller (WIC) – as explained earlier in this chapter, the WIC is an optional feature that allows interrupts or other wake-up events to be detected when the processor is in a powered-down or retention state, or when the clock to the processor is gated off. The system designer can customize the example WIC design if needed.

• Debug power management – the debug interface modules provide handshaking signals to indicate whether a debugger is connected, which allows system designers to implement power management for the debug system of the processors if needed. For example, the Cortex-M0, Cortex-M0+, Cortex-M7, Cortex-M23, Cortex-M33, and Cortex-M35P processors have a separate debug power domain that can be powered down when there is no debug connection.
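The handshaking FSM mentioned under the sleep hold interface could look roughly like the cycle-based C model below. It is a sketch under several assumptions: the ROM is powered down only in deep sleep, a wake-up event indication (for example from the WIC) is available to the FSM, and the hold request and acknowledge are modelled as active-high booleans, whereas processors such as Cortex-M0 and Cortex-M3 use the active-low signals SLEEPHOLDREQn and SLEEPHOLDACKn. The state names and the ROM wake-up delay are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SH_RUN,         /* processor running, ROM powered                 */
    SH_WAIT_ACK,    /* hold requested, waiting for the acknowledge    */
    SH_ROM_OFF,     /* hold granted, ROM powered down                 */
    SH_ROM_WAKING,  /* ROM powering up, processor still held asleep   */
    SH_WAIT_EXIT    /* hold released, waiting for the CPU to wake     */
} sh_state_t;

typedef struct {
    sh_state_t state;
    uint32_t   wait;             /* remaining ROM power-up cycles     */
    uint32_t   rom_wake_cycles;  /* hypothetical ROM power-up time    */
    bool       hold_req;         /* hold request to the processor     */
    bool       rom_on;           /* ROM power enable                  */
} sleep_hold_fsm_t;

static void sh_init(sleep_hold_fsm_t *f, uint32_t rom_wake_cycles)
{
    f->state = SH_RUN;
    f->wait = 0;
    f->rom_wake_cycles = rom_wake_cycles;
    f->hold_req = false;
    f->rom_on = true;
}

/* Advance one clock cycle.
 *   sleepdeep  : processor deep-sleep status output
 *   hold_ack   : processor has accepted the hold request
 *   wake_event : a wake-up event is pending (e.g. from the WIC) */
static void sh_step(sleep_hold_fsm_t *f, bool sleepdeep, bool hold_ack,
                    bool wake_event)
{
    switch (f->state) {
    case SH_RUN:
        if (sleepdeep) {
            f->hold_req = true;          /* keep the CPU asleep for now   */
            f->state = SH_WAIT_ACK;
        }
        break;
    case SH_WAIT_ACK:
        if (hold_ack) {
            f->rom_on = false;           /* held CPU cannot fetch code    */
            f->state = SH_ROM_OFF;
        } else if (!sleepdeep) {
            f->hold_req = false;         /* CPU woke before the hold took */
            f->state = SH_RUN;
        }
        break;
    case SH_ROM_OFF:
        if (wake_event) {
            f->rom_on = true;            /* start ROM power-up            */
            f->wait = f->rom_wake_cycles;
            f->state = SH_ROM_WAKING;
        }
        break;
    case SH_ROM_WAKING:
        if (f->wait > 0) {
            f->wait--;
        } else {
            f->hold_req = false;         /* ROM ready: let the CPU wake   */
            f->state = SH_WAIT_EXIT;
        }
        break;
    case SH_WAIT_EXIT:
        if (!sleepdeep)                  /* avoid re-arming during exit   */
            f->state = SH_RUN;
        break;
    }
}
```

The extra SH_WAIT_EXIT state prevents the FSM from requesting a new hold in the few cycles where the processor is still reporting deep sleep after the hold has been released.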

System designers are also likely to integrate additional power management features for memory
blocks, clock generation and distribution systems, and some of the peripherals.
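As a hypothetical example of such system-level features, the C model below maps the processor's SLEEPING and SLEEPDEEP status outputs onto clock and power controls. The policy itself (gate the bus clock in sleep; additionally gate idle peripheral clocks, stop the PLL and power down the flash in deep sleep) is an illustrative assumption rather than a recommendation. It also assumes that the processor's free-running clock and wake-up logic stay active and that no other bus master needs the bus while the processor sleeps.

```c
#include <stdbool.h>

/* Hypothetical clock/power controls driven from the processor's
 * SLEEPING and SLEEPDEEP status outputs. */
typedef struct {
    bool hclk_gated;        /* bus clock to processor and memories gated */
    bool periph_clk_gated;  /* clocks of idle peripherals gated          */
    bool flash_powered;     /* program flash/ROM powered                 */
    bool pll_running;       /* PLL kept running                          */
} pwr_ctrl_t;

static pwr_ctrl_t sleep_policy(bool sleeping, bool sleepdeep)
{
    pwr_ctrl_t c = { false, false, true, true };  /* normal running state */

    if (sleeping)
        c.hclk_gated = true;           /* no instruction fetches in sleep */

    if (sleeping && sleepdeep) {
        c.periph_clk_gated = true;     /* more aggressive savings         */
        c.flash_powered = false;       /* wake-up then needs sleep hold   */
        c.pll_running = false;         /* restart cost paid on wake-up    */
    }
    return c;
}
```

On the software side, deep sleep is selected by setting the SLEEPDEEP bit (bit 2) of the System Control Register before executing WFI or WFE; with CMSIS-Core this is typically `SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;` followed by `__WFI();`. If the flash is switched off as in this policy, the sleep hold interface is needed on the way back out of deep sleep.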

2.16 Top-level pin assignment and pin multiplexing


One of the tasks that chip designers need to do is define the top-level signals of the device. Often, many of the pins on a chip carry multiple functions. For example, a pin might be configurable to work as a GPIO pin, a communication interface pin, or a debug/trace pin. You can find examples of pin multiplexing in the Cortex-M3 DesignStart Eval package.
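A pin multiplexer is essentially a per-pin selector controlled by a register. The C sketch below assumes a hypothetical layout with a 2-bit function field per pin, 16 pins per 32-bit register, and four functions (GPIO, two alternate functions, and debug); real devices, including the DesignStart example, may organize this differently.

```c
#include <stdint.h>

/* Hypothetical pin functions selectable by a 2-bit field. */
enum pin_func { PIN_GPIO = 0, PIN_ALT1 = 1, PIN_ALT2 = 2, PIN_DEBUG = 3 };

/* Return `reg` with the field for `pin` (0-15) set to function `f`. */
static uint32_t pinmux_set(uint32_t reg, unsigned pin, enum pin_func f)
{
    unsigned shift = 2u * (pin & 15u);
    reg &= ~(3u << shift);                 /* clear the old selection */
    reg |= ((uint32_t)f & 3u) << shift;    /* insert the new one      */
    return reg;
}

/* Read back the function currently selected for `pin` (0-15). */
static enum pin_func pinmux_get(uint32_t reg, unsigned pin)
{
    return (enum pin_func)((reg >> (2u * (pin & 15u))) & 3u);
}

/* Output data path of one pad: the selected function drives the pin. */
static int pad_output(enum pin_func f, int gpio_out, int alt1_out,
                      int alt2_out, int debug_out)
{
    switch (f) {
    case PIN_ALT1:  return alt1_out;
    case PIN_ALT2:  return alt2_out;
    case PIN_DEBUG: return debug_out;
    default:        return gpio_out;
    }
}
```

In a real design, the same selection would normally also drive the pad's output enable and route the pad input back to the selected block.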

Apart from the debug and trace signals, there is normally no need to expose other interfaces of the Cortex-M processors directly at the top level of the device. External interrupts are usually handled by GPIO blocks, so that external hardware can trigger interrupts via GPIO. In some cases, chip designers can also implement a signal path that allows off-chip hardware to generate an event pulse to the Cortex-M processor so that it can wake up from a WFE (Wait For Event) instruction; however, this is not essential for many systems.
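Where such an event path is implemented, the off-chip signal is asynchronous to the processor clock and should be synchronized before it drives the processor's event input (RXEV on the Cortex-M processors that provide it). The C model below is a hypothetical sketch of a two-flop synchronizer followed by a rising-edge detector that produces a one-cycle event pulse; the type and function names are invented for this example.

```c
#include <stdbool.h>

/* Hypothetical event-input conditioning, clocked by the processor clock. */
typedef struct {
    bool sync1;  /* first synchronizer flop                */
    bool sync2;  /* second synchronizer flop               */
    bool prev;   /* synchronized value from the last cycle */
} rxev_gen_t;

/* One clock cycle: returns true for exactly one cycle after each
 * low-to-high transition of the external pin. Outputs are computed
 * from the current flop values before the flops are updated. */
static bool rxev_step(rxev_gen_t *g, bool ext_pin)
{
    bool pulse = g->sync2 && !g->prev;   /* rising edge detected */

    g->prev  = g->sync2;
    g->sync2 = g->sync1;
    g->sync1 = ext_pin;
    return pulse;
}
```

Detecting the edge rather than passing the level through produces one event per transition instead of an event on every cycle the pin stays high.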


When designing top-level pins, several areas related to the Cortex-M processors should be
considered:

• In most cases, the debug interface pins (JTAG or Serial Wire Debug) need to be accessible at the device's top level by default. For Cortex-M3, Cortex-M4 and Cortex-M33 processors, the debug interface module supports dynamic protocol switching, so it is possible to expose just the two SWD pins by default. If there is a need to switch over to JTAG, you can program a device-specific pin multiplexer (mux) control register to expose the other JTAG pins, and then apply a switchover sequence to start JTAG operations.

• The SWD interface requires a tristate pin for the data connection (SWDIO), whose output driver is enabled when SWDOEN is high.

• If the debug interface pins are multiplexed with other peripheral I/O functions, peripheral I/O operations can cause the debug connection to be lost.

• The debug and trace interface provides a range of status signals that allow some of its signals to be multiplexed with functional pins. It is also possible to use device-specific programmable registers to help control the pin multiplexing. However, in such cases, the device vendor needs to provide details of the setup sequence so that various debug tools can work correctly with the device.

• When creating systems using Armv8-M processors with TrustZone, the debug connection might carry Secure information; therefore, the pin multiplexing logic needs to prevent Non-secure software from observing activity on the debug connection.
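For the dynamic protocol switching described in the first item of the list above, the debugger drives a defined bit pattern on the SWDIOTMS pin. The C sketch below builds the SWD-to-JTAG switching sequence as we understand it from the Arm Debug Interface (ADIv5) specification: at least 50 clock cycles with SWDIOTMS high, the 16-bit select value 0xE73C sent least significant bit first, then at least five cycles high so that the JTAG TAP enters Test-Logic-Reset. The reverse switch (JTAG to SWD) uses the select value 0xE79E followed by a line reset instead. The function name is hypothetical, and other selection mechanisms defined in later versions of the specification are not covered.

```c
#include <stddef.h>
#include <stdint.h>

#define SWJ_LINE_RESET_CYCLES 50u      /* minimum cycles with SWDIOTMS high */
#define SWJ_SWD_TO_JTAG       0xE73Cu  /* select value, sent LSB first      */
#define SWJ_TAP_RESET_CYCLES  5u       /* cycles high after the select      */

/* Fill `bits` with one SWDIOTMS value per SWCLKTCK cycle.
 * Returns the number of bits written, or 0 if `cap` is too small. */
static size_t swj_swd_to_jtag(uint8_t *bits, size_t cap)
{
    const size_t needed = SWJ_LINE_RESET_CYCLES + 16u + SWJ_TAP_RESET_CYCLES;
    size_t i = 0;

    if (cap < needed)
        return 0;
    for (unsigned k = 0; k < SWJ_LINE_RESET_CYCLES; k++)
        bits[i++] = 1;                                   /* line reset */
    for (unsigned k = 0; k < 16u; k++)
        bits[i++] = (uint8_t)((SWJ_SWD_TO_JTAG >> k) & 1u); /* LSB first */
    for (unsigned k = 0; k < SWJ_TAP_RESET_CYCLES; k++)
        bits[i++] = 1;                          /* TAP to Test-Logic-Reset */
    return i;
}
```

Because the select value is specific to the SWJ debug port, the sequence has no effect on a debug interface that is already in the requested mode, other than the reset it applies.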

2.17 Miscellaneous signals


Cortex-M processors provide various status signals that can be used by system designers. For example, Figure 2.12 shows that the LOCKUP status can be used to generate a system reset automatically. The availability of other status signals depends on the processor you use; please refer to the documentation in the product bundle for more information.
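One plausible way to turn the LOCKUP status into an automatic reset is sketched below as a C model of the reset-request logic. The enable bit, the debugger-attached qualifier and all names are hypothetical design choices: suppressing the reset while a debugger is connected lets the developer inspect the locked-up state instead of having the system reboot immediately.

```c
#include <stdbool.h>

/* Hypothetical reset-source combiner in front of a reset generator. */
typedef struct {
    bool sysresetreq;        /* processor SYSRESETREQ output              */
    bool watchdog;           /* watchdog timeout reset request            */
    bool lockup;             /* processor LOCKUP status output            */
    bool lockup_reset_en;    /* hypothetical control bit: reset on lockup */
    bool debugger_attached;  /* hypothetical: a debug session is active   */
} rst_inputs_t;

/* Returns true when a system reset should be requested. The reset
 * generator would then synchronize and stretch this request. */
static bool system_reset_request(rst_inputs_t in)
{
    /* No automatic lockup reset while a debugger is attached, so the
     * locked-up state can be examined. */
    bool lockup_reset = in.lockup && in.lockup_reset_en && !in.debugger_attached;

    return in.sysresetreq || in.watchdog || lockup_reset;
}
```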

Newer Cortex-M processors support a CPUWAIT signal, which is used to delay the start-up of the processor after it is released from reset. In most single-core systems, this pin can be tied low. In multi-core SoC designs where the Cortex-M subsystem runs its program from SRAM, the CPUWAIT signal can be used to delay the boot-up so that a different bus master can transfer the program image into the SRAM. After the program image is loaded, the CPUWAIT signal can be released, and the Cortex-M processor can start executing the program.
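The boot hand-over can be sketched as a small C model of the two masters. It assumes a system in which CPUWAIT is held high when reset is released and is later cleared through a hypothetical control register written by the other bus master; all names, the SRAM size and the image contents are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SRAM_WORDS 64u   /* hypothetical SRAM size for the model */

typedef struct {
    uint32_t sram[SRAM_WORDS];
    bool     cpuwait;      /* CPUWAIT input, held high after reset  */
    bool     cpu_started;  /* set once the processor begins booting */
} soc_t;

/* Processor side: once out of reset, it starts only when CPUWAIT is low. */
static void cpu_step(soc_t *s)
{
    if (!s->cpuwait)
        s->cpu_started = true;
}

/* Other bus master: copy the program image into SRAM, then release
 * CPUWAIT. In hardware, the final write must have completed before the
 * release; this sequential model gets that ordering for free. */
static bool boot_master_load(soc_t *s, const uint32_t *image, size_t words)
{
    if (words > SRAM_WORDS)
        return false;      /* image does not fit: keep the CPU waiting */
    memcpy(s->sram, image, words * sizeof *image);
    s->cpuwait = false;
    return true;
}
```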


2.18 Sign-off requirements


For designers using Cortex-M processors in ASIC/SoC design projects, please note that the Cortex-M family of products has sign-off requirements documented in the IIM (Integration and Implementation Manual) of the product bundle. This includes a checklist to help designers minimize the risk of incorrect implementations.
