Papers by Jordi Cortadella
2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2017
Voltage noise is the main source of dynamic variability in integrated circuits and a major concern for the design of Power Delivery Networks (PDNs). Ring Oscillator Clocks (ROCs) have been proposed as an alternative to mitigate the negative effects of voltage noise as technology scales down and power density increases. However, their effectiveness highly depends on the design parameters of the PDN, the power consumption patterns of the system and the spatial locality of the ROCs within the clock domains. This paper analyzes the impact of the PDN parameters and ROC location on the robustness to voltage noise. The capability of reacting instantaneously to unpredictable voltage droops makes ROCs an attractive solution, which makes it possible to reduce the amount of decoupling capacitance without degrading performance. Tolerance to voltage noise and the related benefits can be increased by using multiple ROCs and reducing the size of the clock domains. The analysis shows that up to 83% of the margins for voltage noise and up to 27% of the leakage power can be reduced by using local ROCs.
IEEE Access, 2020
Worst-case execution time (WCET) analysis of systems with data caches is one of the key challenges in real-time systems. Caches exploit the inherent reuse properties of programs, temporarily storing certain memory contents near the processor, so that further accesses to such contents do not require costly memory transfers. Current worst-case data cache analysis methods focus on specific cache organizations (LRU, locked, ACDC, etc.). In this article, we analyze data reuse (in the worst case) as a property of the program, and thus independent of the data cache. Our analysis method uses Abstract Interpretation on the compiled program to extract, for each static load/store instruction, a linear expression for the address pattern of its data accesses, according to the Loop Nest Data Reuse Theory. Each data access expression is compared to that of prior (dominant) memory instructions to verify whether it presents a guaranteed reuse. Our proposal manages references to scalars, arrays, and non-linear accesses, provides both temporal and spatial reuse information, and does not require the exploration of explicit data access sequences. As a proof of concept we analyze the TACLeBench benchmark suite, showing that most loads/stores present data reuse, and how compiler optimizations affect it. Using a simple hit/miss estimation on our reuse results, the time devoted to data accesses in the worst case is reduced to 27% compared to an always-miss system, equivalent to a data hit ratio of 81%. With compiler optimization, such time is reduced to 6.5%.
Springer Series in Advanced Microelectronics, 2002
Chapter 2 has already outlined the main concepts and techniques behind our approach to the design of asynchronous control circuits. The key stage in this approach is logic synthesis from Signal Transition Graphs (STGs), a model which offers important advantages to the asynchronous controller and interface designer. On one hand, STGs are very similar to Timing Diagrams, which can be
The main purpose of this book is to present a methodology to design asynchronous control circuits, i.e. those circuits that synchronize the operations performed by the functional units of the data-path through handshake protocols.
Integrated Computer-Aided Engineering, 1998
Decisions taken at the earliest steps of the design of an electronic circuit may have a significant impact on the characteristics of the final implementation. This paper illustrates how power consumption issues can be tackled at algorithmic and architectural level during the design of application-specific integrated circuits (ASICs) in an embedded system scenario. A set of RTL transformations aiming at reducing power consumption are proposed and the potential benefits evaluated. The common idea behind the transformations is to reduce the activity of the data-path functional units (e.g. adders, multipliers) by minimizing the switching activity of their input operands. Functional units contribute highly to the power consumption of the data-path. Preliminary evaluations obtained by simulation show that significant improvements can be achieved. Finally, the paper demonstrates how some of the presented transformations can be automated and incorporated in high-level synthesis tools.
Springer Series in Advanced Microelectronics, 2002
As we discussed in the previous chapter, some violations of STG and SG implementability conditions, such as output persistency and consistency, are considered to be essential for implementability, and thus if violated they must be fixed by the designer. Complete State Coding (CSC) is also required for implementability, however a specification that does not satisfy it can be
Proceedings - International Conference of the Chilean Computer Science Society, SCCC, 2008
Asynchronous data communication mechanisms (ACMs) have been extensively studied as data connectors between independently timed concurrent processes. In this work an automatic method for the synthesis of re-reading ACMs is introduced. This method is oriented to the generation of hardware artifacts. The behavior of re-reading ACMs is formally defined and the correctness properties are discussed. Then it is shown how to generate the ACM specifications and how they can be translated into a proper hardware implementation. Verilog has been used as the target language to describe the hardware being synthesized.
Abstract: An approach for cell routing using gridded design rules is proposed. It is technology-independent and parameterizable for different fabrics and design rules, including support for multiple-patterning lithography. The core contribution is a detailed-routing algorithm based on a Boolean formulation of the problem. The algorithm uses a novel encoding scheme, graph theory to support floating terminals, efficient heuristics to reduce the computational cost and minimization of the number of unconnected pins in case the cell is unroutable. The versatility of the algorithm is demonstrated by routing single- and double-height cells. The efficiency is ascertained by synthesizing a library with 127 cells in about one hour and a half of CPU time. The layouts derived by the implemented tool have also been compared with the ones from a commercial library, thus showing the competitiveness of the approach for gridded geometries. Index Terms: Detailed routing, cell generation, satisfiabilit...
The energy consumption due to I/O pins is a substantial part of the overall chip consumption. This paper gives an overview of the Working Zone Encoding (WZE) method for low-power encoding of the external address and data buses, based on the conjecture that programs favor a few working zones of their address space at each instant. In such cases, the method identifies these zones and sends through the address/data bus only the offset of this reference/data value with respect to the previous reference/data value to that zone, along with an identifier of the current working zone. This is combined with a one-hot encoding for the offset. The paper then focuses on preliminary work on two topics. The first is the reduction of the effect of the WZE delay on the bus access time by overlapping this delay with the virtual-to-physical address translation. Although the modification to allow this overlapping might increase the bus energy, simulations of the SPEC benchmarks indicate that for a page size of 1 KB or larger the effect is negligible. The second is the extension of the technique for the data bus to some multimedia applications which are characterized by having packed bytes in a word. For two typical applications, the data-only data bus and data-only address bus I/O activity is reduced by 74% and 51% with respect to the unencoded case, and by 68% and 33% with respect to the best of the rest of the encoding techniques.
This tutorial aims at motivating the audience to consider asynchronous circuits as a competitive alternative to solve some of the design problems inherent to submicron technologies. One of the main reasons why designers are reluctant to incorporate asynchrony in their systems is the difficulty of designing asynchronous circuits. Asynchronous circuits are promising to tackle problems such as electro-magnetic interference, power consumption, performance, and modularity of digital circuits. The tutorial will introduce state-of-the-art tools and methodologies for their design. It will cover aspects such as specification, architectural design and controller synthesis tools for asynchronous circuits. The tutorial will concentrate on a particular design methodology for control circuits based on specifications with Signal Transition Graphs. It will also cover design strategies for the microarchitecture, data-path and control circuits that have been successfully applied in the design of the asy...
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Bi-decomposition is a design technique widely used to realize logic functions by the composition of simpler components. It can be seen as a form of Boolean division, where a given function is split into a divisor and quotient (and a remainder, if needed). The key questions are how to find a good divisor and then how to compute the quotient. In this paper we choose as divisor an approximation of the given function, and characterize the incompletely specified function which describes the full flexibility for the quotient. We report at the end preliminary experiments for bi-decomposition based on two AND-like operators with a divisor approximation from 1 to 0, and discuss the impact of the approximation error rate on the final area of the components in the case of synthesis by three-level XOR-AND-OR forms.
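The idea of Boolean division with a flexible quotient can be illustrated with a minimal Python sketch. It deliberately uses a simpler setting than the one evaluated in the paper: the operator is a plain AND and the divisor g is assumed to cover f (g is 1 wherever f is 1). The function name quotient_flexibility and the example functions f, g and h are hypothetical and are not taken from the paper.

```python
from itertools import product

def quotient_flexibility(f, g, n):
    """For f = g AND h, classify every input point of the quotient h.

    Assumes g covers f (g is 1 wherever f is 1); this is an illustrative
    assumption, not the exact setting studied in the paper.
    Returns (on_set, off_set, dc_set) of the incompletely specified h.
    """
    on_set, off_set, dc_set = [], [], []
    for x in product((0, 1), repeat=n):
        fx, gx = bool(f(*x)), bool(g(*x))
        if fx and not gx:
            raise ValueError(f"divisor does not cover f at {x}")
        if fx:
            on_set.append(x)    # f = 1, g = 1: h must be 1
        elif gx:
            off_set.append(x)   # f = 0, g = 1: h must be 0
        else:
            dc_set.append(x)    # g = 0: h is free (don't care)
    return on_set, off_set, dc_set

# Hypothetical example: f = a.b.c, divisor g = a.b
f = lambda a, b, c: a and b and c
g = lambda a, b, c: a and b
on_set, off_set, dc_set = quotient_flexibility(f, g, 3)

# Any h that respects the on-set and off-set works; h = c is one choice.
h = lambda a, b, c: c
assert all(bool(f(*x)) == bool(g(*x) and h(*x))
           for x in product((0, 1), repeat=3))
```

In this toy case the quotient is constrained at only two input points and free at the other six, which is the kind of flexibility a synthesis tool can exploit when minimizing the quotient.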
Despite its development several decades ago and several very beneficial properties, asynchronous logic design, which is data driven and runs as fast as possible in all situations, is rarely used nowadays. Reasons are of course its disadvantageous properties, such as bad testability, but also the sophisticated knowledge required from designers and the lack of tools. In this paper we draw a path to tackle the latter points by suggesting a tool/way to generate multiple circuit implementations from a single description. We aim to convert specifications written in various input languages, e.g. C or VHDL, to a unified Internal Representation (IR). This IR is composed of building blocks (semantic vocabulary) specified through the Abstract State Machine (ASM) based formal method. The ASM artifact is then used to generate the circuit in the desired (a)synchronous design style. As a short-term goal we aim to train developers by reading synchronous descriptions and converting them to asynchronous de...
Current interest in elasticity is motivated by the difficulties with timing and communication in large synchronous designs in nanoscale technologies [1]. Elastic circuits promise novel methods for microarchitectural design that can use variable latency components and tolerate static and dynamic changes in communication latencies, while still employing standard synchronous design tools and methods. At Intel, there is an extensive investigation of hardware design based on a specific elastic protocol, called SELF (Synchronous Elastic Flow) [2]. Our aim here is to present theoretical foundations of SELF. Every elastic circuit E implements the behavior of an associated standard (non-elastic) circuit C, as in the adder example above. For each wire X of C, there are three in E: the data wire DX, and the single-bit control wires VX and SX (valid and stop). This triple of wires is a channel of E. A transfer along the channel occurs when VX = 1 and SX = 0, thus requiring cooperation of the ...
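The transfer condition on a channel can be written as a small Python sketch. Only the transfer case (VX = 1 and SX = 0) is stated in the abstract; the names "idle" and "retry" for the other two cases follow a common description of SELF and are an assumption here, as is the function name channel_state.

```python
def channel_state(valid: int, stop: int) -> str:
    """Classify one clock cycle of an elastic channel from its control bits.

    valid: the V bit of the channel (sender offers valid data)
    stop:  the S bit of the channel (receiver cannot accept data)
    """
    if valid and not stop:
        return "transfer"   # V = 1, S = 0: data moves along the channel
    if valid and stop:
        return "retry"      # V = 1, S = 1: assumed name, sender must hold the data
    return "idle"           # V = 0: assumed name, nothing is offered

# Enumerate all four combinations of the two control wires.
for v in (0, 1):
    for s in (0, 1):
        print(f"V={v} S={s} -> {channel_state(v, s)}")
```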
2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020
This paper presents a new transistor placement method applied to the ASTRAN EDA tool, an open-source solution for the automatic design of complex digital gates. Although it currently reaches an optimized solution through a Threshold Accepting approach, ASTRAN does not guarantee a minimum-width placement. In this paper, a method based on Boolean satisfiability is proposed, ensuring an optimal solution for the transistor placement task through modeling the problem into a set of Boolean variables and clauses aware of four design rule constraints. Experiments comparing the proposed method and the current ASTRAN placement technique have shown reductions in the layout area. Furthermore, our method achieved a significant improvement regarding runtime, an essential feature for designing digital circuits and systems on-demand.
Simultaneous switching noise has become an important issue due to its signal integrity and timing implications. Therefore a lot of time and resources are spent during the PDN design to minimize the supply voltage variation. This paper presents the self-adaptive clock as an alternative to tolerate the critical path delay variation due to supply noise thanks to its self-adaptable nature. A self-adaptive clock generation circuit is proposed in this paper and its benefits, in terms of clock period reduction, are assessed under a realistic supply noise obtained through simulation for different switching activities.
A novel approach for timing-driven logic decomposition is presented. It is based on the combination of two strategies: logic bi-decomposition of Boolean functions and tree-height reduction of Boolean expressions. This technology-independent approach makes it possible to find tree-like expressions with smaller depths than the ones obtained by state-of-the-art techniques. Experimental results show an average delay reduction of more than 20% with regard to speed up in SIS.
This chapter reviews the impact of Petri nets in one of the domains in which they have played a predominant role: asynchronous circuits. The author also discusses challenges and topics of interest for the future.