Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
COLUMBIA UNIVERSITY
2005
© 2005
This thesis concerns the use of an analog computer for the solution of differential equations. To investigate the capability of analog computing in a modern context, a large VLSI circuit (100 mm²) was designed and fabricated. It contains functional blocks, switches for their interconnection, and circuitry for the system's programming and control. This chip is controlled and programmed by a PC via a data acquisition card. This arrangement has been used to solve differential equations with moderate accuracy and high speed.

The utility of a VLSI analog computer has been demonstrated by solving stochastic differential equations, partial differential equations, and ordinary differential equations (ODEs). Additionally, techniques for using the digital computer to refine the solution from the analog computer are presented. Solutions from the analog computer have been used to accelerate a digital computer's solution of the periodic steady state of an ODE by more than an order of magnitude.

An analysis has been done showing that the analog computer dissipates 0.02 % to 20 % of the energy of a digital signal processor when solving the same differential equation.
Contents

List of Figures
Acknowledgments

Chapter 1  Introduction
  1.1  Motivation

Chapter 3  Design of the VLSI Circuit
  3.1.1  Overview
  3.3.2  Integrator
  3.3.5  Exponential
  3.3.6  Logarithm

Chapter 5  Circuit Measurements

Chapter 6  … Solutions
  6.1  Solving Partial Differential Equations on the Analog Computer Using …
  … puter

… niques

  8.3  Computation Speed

… puter
List of Figures

3.1  Architecture of the VLSI analog computer with expanded view of one Macroblock.
3.8  Input variable gain current mirror implementing 20:1 mirror ratio.
3.9  Composite device. The Composite device on the left can implement …
3.14  Digital to analog converter used to generate tuning currents for the integrator.
3.15  Iout vs. DAC word. Ideal and two nonideal characteristics.
3.22  Schematic of the fanout circuit in its smallest and middle signal range.
…  nonlinear block.
3.28  Minimum and maximum function.
5.12  Exponential blocks' input-to-output transfer characteristic from one chip.
…  shown.
…  menting the minimum and maximum functions. Outputs from the block.
5.23  A typical programmable nonlinear block's characteristic when imple- …
…  18 and 50. Coefficients scaled by 0.998 and 0.998². The most bowed …
6.5  Per discretization point block diagram of the heat equation. Implementation 1.
6.6  Per discretization point block diagram of the heat equation. Implementation 2.
6.8  First order nonlinear SDE. Small noise (σn(t) = 0.292). Time domain.
6.9  First order nonlinear SDE. Small noise (σn(t) = 0.292). Statistics.
6.10  First order nonlinear SDE. Medium noise (σn(t) = 0.462). Time domain.
6.11  First order nonlinear SDE. Medium noise (σn(t) = 0.462). Statistics.
6.12  First order nonlinear SDE. Larger noise (σn(t) = 0.922). Time domain.
6.13  First order nonlinear SDE. Larger noise (σn(t) = 0.922). Statistics.
…  computer.
…  0.4, γ = 0.67 and ω = 2π. The thick lines correspond to the analog computer's solution while the thin lines correspond to PSIM's solution.
List of Tables

3.1  Control for current steering switches in the integrator's variable-gain input current mirror.
5.2  Measured results for the VGA block. Largest input range.
5.3  Measured results for the VGA block. Middle input range.
5.4  Measured results for the VGA block. Smallest input range.
A.1  Truth table for GLOBAL ADDRESS DECODER's row_sel.
A.2  Truth table for GLOBAL ADDRESS DECODER's col_sel.
A.3  Truth table for GLOBAL ADDRESS DECODER's write enable signals.
Acknowledgments
This dissertation was made possible by my advisor, Yannis Tsividis. His unwaver-
ing enthusiasm for the project has been infectious and his steady confidence in my
abilities, though frightening in the early years, lays the foundation for the research
career ahead of me. He is also compelling proof that one need not choose between a …

I also thank the members of my defense committee: Drs. David Keyes, Bob Melville, Ken Shepard, and Charles Zukowski. Your
comments and time spent reading this work have been critical to its improvement.
Special thanks are due to Bob Melville who has helped greatly in making contacts
with mathematicians and in finding applications for the analog computer. To Ken
Shepard, through the four courses he taught me, I owe my ability to "talk the talk." … His help has been invaluable, and his enthusiasm and generosity with his time are greatly appreciated.
During my time in 422 Shapiro I have been fortunate to meet and work alongside many great students, namely former students Shanthi Pavan, Greg Ionis, Nagi
Krishnapura, Dandan Li, Yorgos Palaskas and Sanjeev Ranganathan and all of the
current students. Greg, Sanjeev, Tuku Chatterjee, George Patounakis and Nebojsa
Stanic have cajoled many a recalcitrant computer into behaving properly. Babak
Soltanian and Yorgos have been a delight with whom to share the confines of 422G.
I am grateful to Mehmet Ozgun, Tuku, and George for reading drafts of parts of this
thesis.
The late Marlene Mansfield’s no-nonsense style, her effectiveness (and “affectiveness”)
and her commitment to her students were greatly appreciated and are missed. Betsy
Arias, John Baldi, Stacey Miller, Jim Mitchell, Bilha Njuguna, Lourdes de la Paz,
Elsa Sanchez, Jodi Schneider, and Azlyn Smith have kindly helped on numerous
occasions to solve problems with my registration or other issues that are frequently
of my own causing.
The last five years would not have been spent at Columbia had it not been for the influence of Professor Ajoy Opal from the University of Waterloo, my fourth-year
project advisor. His suggestion that I apply to Columbia was the push that got the
ball rolling. I am glad to have had his support and I thank Professors Andrew Heunis,
George Kesidis and Tajinder Manku for their advice over the years at UW.
I am indebted to my parents for their love and support over the years. I thank my father for all the impromptu physics lessons over the years. He demonstrated the importance
of skepticism, intellectual curiosity, and precise language when discussing science.
Thanks are also due to my brother, Michael, an often quoted source of theories on …

There are many others who have helped steer this research and my academic interests before coming to Columbia. It is only by accident that they are omitted here.
Chapter 1
Introduction
1.1 Motivation
Analog computers of the 1960s were physically large, tedious to program and required significant user expertise. However, once programmed, they rapidly solved a variety of differential equations in continuous time, free of discretization artifacts, albeit with only moderate accuracy [1]. Analog computers were superseded by digital ones long ago; the technical community has hardly considered them since.
Digital computers can solve differential equations with very high accuracy.
However, they may suffer from a variety of convergence problems. Further, some simulations may take a long time. This can preclude repeatedly solving the equations, for example while exploring parameter changes. This thesis investigates how analog computers could be revived in a modern VLSI context [2], with significant advantages. Such an analog computer can operate in a symbiotic environment with a digital computer (Fig. 1.1), complementing the latter and acting as a co-processor to it. This combination makes possible several features:

• Rapid re-solution of a system of equations under parameter changes.

• Generation of approximate solutions faster than is possible with a digital computer, and with guaranteed convergence to a physical solution.
Historically, analog computers primarily solved the differential equations necessary for system simulation. Analog circuits can solve ordinary differential equations (ODEs) written in the form ẋ = f (x, u, t), where x is a vector of state variables of length n and u is a vector of inputs. Standard techniques exist (e.g., the method of lines [3]) for converting partial differential equations into systems of ODEs.
Interest in analog computers decreased in the 1960s and 1970s as digital computers became more advanced. One of the few remaining applications of analog computers is … Related to analog computation are some efforts in the area of reconfigurable analog hardware [4], sometimes referred to as field programmable analog arrays (FPAAs). These have gained the interest of researchers, and some products have been released, though they typically are for linear filtering applications. While implementing a filter is synonymous with solving a linear differential equation, these devices typically are not intended as general-purpose differential equation solvers.
Large analog VLSI systems have been designed to solve many problems by emulating neural systems [5]. These are special purpose circuits capable of performing only the computations for which they were designed.
For the purpose of solving differential equations, some discrete-time (DT) devices have been designed [6], [7]. These, however, may still add artifacts stemming from the discretization of time that occurs when continuous-time ODEs are mapped to discrete time.
While no general purpose VLSI analog computers exist today, researchers have
built custom analog computers to investigate solitons [8] and nonlinear oscillators [9].
The idea of coupling an analog computer to a digital computer has been studied
in detail over the years. One of the most interesting ways of pairing the two types of computers is …

Chapter 2 describes how analog and digital computers each solve differential equations, and their respective strengths and weaknesses. The design of a large VLSI analog computer is described in Chapter 3, and Chapter 5 presents circuit measurements. Chapter 6 presents examples of differential equations solved by the analog computer and Chapter 7 describes how the analog computer's solution can be used and refined by a digital computer in a way that speeds up the digital computer's solution. Chapter 8 compares this analog approach to the digital one in terms of energy dissipation and computation speed. Chapter 9 gives some suggestions for future work in this field.
Chapter 2
Computer Techniques
The functions that are solutions to ODEs are continuous in time; however, digital
computers calculate the solution at a series of time points. The differential equations
are mapped to a set of difference equations and solved. How this mapping is done
has important consequences on the speed of the digital computer’s solution and the
required spacing of the time steps, and determines some of the stability properties of
the solution.
For a complete treatment, the reader is referred to [11], [12] and [13]. What follows is an overview of some of the basic methods. Consider a system of equations of the form:
ẋ = f (x, t) (2.1)
y = g(x, u, t) (2.2)
and suppose that its solution is desired over the interval of time (t0 to tf ) be-
ginning with initial condition x(t0 ) = x0 . This discussion assumes that the functions in g are algebraic expressions whose values are easily computed once x is known. As such, the majority of the computational effort in solving for y is spent solving for x, and the discussion of equations in this form will center around the solution of x. The solution
to Eq. 2.1 at a time t1 near t0 can be calculated from x0 . The choice of the time step from t0 to t1 is based on a number of error estimations, and is beyond the scope of this discussion. The solution is advanced as:

x1 = x0 + m(t1 − t0 ) (2.3)
Possible values for the slope m in Eq. 2.3 are f (x0 , t0 ), f (x1 , t1 ) or some combination of the two. Suppose f (x0 , t0 ) is used. Since x1 is the only unknown, and it is isolated on the left side of the equation, it can be calculated explicitly and very quickly. Routines that are explicit are referred to as
predictor routines. Examples of predictor routines include the method above, known as Forward Euler, and Heun's method, which are one-step methods, since the solution at only one time step is used to calculate the solution at the next time step. Other predictor methods include various Runge-Kutta methods, which use the solution at intermediate points within each time step.
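As a concrete illustration of Eq. 2.3 with the slope taken at the known point (Forward Euler), here is a minimal Python sketch; the example ODE and step size are illustrative choices, not taken from the thesis.

```python
import numpy as np

def forward_euler(f, x0, t0, tf, h):
    """Integrate x' = f(x, t) from t0 to tf with a fixed step h.
    Implements Eq. 2.3 with m = f(x_k, t_k), the explicit (predictor) choice."""
    ts = np.arange(t0, tf + h, h)
    xs = np.empty(len(ts))
    xs[0] = x0
    for k in range(len(ts) - 1):
        xs[k + 1] = xs[k] + f(xs[k], ts[k]) * h  # x_{k+1} = x_k + m (t_{k+1} - t_k)
    return ts, xs

# Example: x' = -x, whose exact solution is exp(-t).
ts, xs = forward_euler(lambda x, t: -x, x0=1.0, t0=0.0, tf=5.0, h=0.01)
```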
Generally speaking, taking smaller time steps leads to higher accuracy. How-
ever, as the time step is lengthened, the degradation in accuracy may be far from
graceful for a method such as Forward Euler. Taking time steps longer than the
shortest time constant in the system can lead to terribly inaccurate results due to
numerical instability and floating point rounding error, even if the dynamics asso-
ciated with that time constant have long since decayed. That is, fast dynamics in
the system require short time steps for the entire simulation. This characteristic of
Forward Euler makes it inappropriate for the simulation of many types of systems,
most notably, stiff systems. A system is said to be stiff if it contains dynamics with widely separated time constants.
On the other hand, if f (x1 , t1 ) is used for the slope m in Eq. 2.3, larger time steps can be taken. Since x1 then appears inside f as well, it may not be possible to solve for x1 explicitly, and hence this type of method
is said to be implicit (and a corrector method). The solution for x1 is typically found
using some sort of iterative root finding scheme such as Newton-Raphson iterations.
These iterations are not guaranteed to converge if they begin with a starting value for
x1 that is separated from the correct value of x1 by a local extremum. The number of
iterations required to reach a given level of convergence will be smaller if the initial
value for x1 is closer to its correct value. As such, numerical routines often use a predictor-corrector arrangement: an estimate of x1 is first generated by an explicit method such as Forward Euler, and then refined (corrected) by a method such as Backward Euler.
Other examples of corrector/implicit routines are the trapezoidal method, and the
gear2 method.
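To make the implicit (corrector) step concrete, the following is a sketch of a single Backward Euler step solved by Newton-Raphson iterations, seeded with a Forward Euler prediction; the function names and the example rate are illustrative assumptions.

```python
def backward_euler_step(f, dfdx, x_k, t_next, h, tol=1e-12, max_iter=50):
    """Solve x_next = x_k + h*f(x_next, t_next) by Newton-Raphson.
    The starting value is the Forward Euler (predictor) estimate."""
    x = x_k + h * f(x_k, t_next)                  # predictor
    for _ in range(max_iter):
        r = x - x_k - h * f(x, t_next)            # residual of the implicit equation
        x_new = x - r / (1.0 - h * dfdx(x, t_next))  # Newton update
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x  # production codes would shrink h if this point is reached

# A step far longer than the 10 ms time constant of x' = -100 x stays stable:
x1 = backward_euler_step(lambda x, t: -100.0 * x, lambda x, t: -100.0,
                         x_k=1.0, t_next=0.5, h=0.5)
```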
Each method has its own unique characteristics. The trapezoidal method can
give rise to trapezoidal ringing, a phenomenon in which the state variables exhibit
false oscillatory behaviour when the system is excited by a step function. For all of the methods, the stability properties of the resulting discrete-time (DT) system differ from those of the original continuous-time (CT) system. Ideally the stability characteristics of the CT and DT systems would be the same; for example, a CT system that did not exhibit BIBO stability would be mapped to a DT system
that also did not exhibit BIBO stability. This is very important as one of the most
important reasons for doing a simulation may be to determine if the system is stable.
The above methods also apply to cases in which x and f are vectors. Little is
changed in the predictor method, but the corrector methods require even more effort, since a system of equations must be solved at each time step.
Waveform Relaxation
In the methods discussed in the previous section, the solution for every state variable
at the next time step is calculated. The solution, up to and including the current
time step, is known, and the solution at the next time step is an unknown. An
alternative to this technique is called waveform relaxation [14]. In this technique, the
solution over a time interval (t0 to tf ) for one state variable (say xi ) is calculated,
while the other state variables are treated as knowns, and then the solution for the
next state variable (say xi+1 ) is calculated over the same time interval, assuming the
others are known. This is repeated until the solution for all state variables has been
calculated, at which point the process repeats, recalculating the state variables, one
at a time, over the solution interval, assuming that the other variables are known.
This approach is used in large systems with only weak coupling between state variables; a minimal sketch follows.
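Here is a sketch of (Gauss-Seidel-style) waveform relaxation for a linear system ẋ = Ax, relaxing one state variable's waveform over the whole interval while the others are held fixed; the matrix and step size are illustrative.

```python
import numpy as np

def waveform_relaxation(A, x0, tf, h, sweeps=10):
    """Waveform relaxation for x' = A x: each inner pass integrates one
    state over all of [0, tf] (Forward Euler here) while the waveforms of
    the other states are treated as known."""
    n, ts = len(x0), np.arange(0.0, tf + h, h)
    X = np.tile(np.asarray(x0, float)[:, None], len(ts))  # initial guess
    for _ in range(sweeps):
        for i in range(n):
            for k in range(len(ts) - 1):
                coupling = sum(A[i, j] * X[j, k] for j in range(n) if j != i)
                X[i, k + 1] = X[i, k] + h * (A[i, i] * X[i, k] + coupling)
    return ts, X

# Weakly coupled pair (small off-diagonal terms), where relaxation converges fast:
A = np.array([[-1.0, 0.05], [0.05, -2.0]])
ts, X = waveform_relaxation(A, x0=[1.0, -1.0], tf=5.0, h=0.01)
```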
Periodic Steady State

Another frequent task is finding the steady-state response of a system to a periodic input. The system has reached this so-called periodic steady state (PSS) when:

x(t) = x(t + T ) (2.6)

where T is the period of the state variables. T frequently is also the period of the input. The condition in Eq. 2.6 allows us to consider the solution to the ODE over only one period, which is discretized into n points. The derivatives at the last point will depend on the value at the first point and vice versa, stemming from Eq. 2.6. If the system has m state variables, over the n points there are a total of m × n unknowns. The problem can also be posed in the frequency domain; although its components are complex numbers, the frequency domain technique has the same number of unknowns.
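A common way to impose the periodicity condition of Eq. 2.6 is the shooting method: treat the initial state as the unknown and find x0 such that integrating over one period returns to it. The sketch below does this for an assumed first-order driven system; the dynamics are illustrative, not an example from the thesis.

```python
import numpy as np
from scipy.optimize import fsolve

T = 1.0  # period of the input

def f(x, t):
    # Illustrative dynamics: first-order system with a periodic drive.
    return -x + np.sin(2 * np.pi * t / T)

def propagate(x0, steps=2000):
    """Return x(T) given x(0) = x0, integrating over exactly one period."""
    h, x, t = T / steps, float(x0), 0.0
    for _ in range(steps):
        x += h * f(x, t)
        t += h
    return x

# Periodic steady state: solve x(T; x0) - x0 = 0 (the condition of Eq. 2.6).
x0_pss = fsolve(lambda v: np.array([propagate(v[0]) - v[0]]), [0.0])[0]
```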
The following is a discussion of some of the strengths and weaknesses of the digital
approach.
Accuracy: To reap the benefits of the large number of bits with which numbers are represented, in some cases the solution must be computed at very finely-spaced time points. This can lead to long computation times.
Dynamic Range: Digital computers have very large dynamic range, owing to the floating-point representation of numbers.
Requisite User Expertise: While users must become familiar with the soft-
ware running on the computer, they can be successful with limited understanding of
system theory and limited knowledge of the inner workings of the computer.
Speed: Digital computers perform individual operations very quickly. However, the large number of computations needed to solve large systems can make simulations slow.

When differential equations are mapped to difference equations for solution on a digital computer, several problems can ensue, namely non-convergence, changes in stability properties and other artifacts stemming from the approximation of the derivatives. These problems primarily stem from the mapping to discrete-time, and not from the digital representation of numbers.

Consider again a system of the form:
ẋ = f (x, u, t) (2.7)
y = g(x, u, t) (2.8)
An analog computer implementation of these equations is realized by denoting the output of the ith integrator as the ith state variable (xi ). Therefore, the input to the ith integrator is ẋi . Circuitry necessary to implement the ith function in f , (fi ), is used, the output of which is applied to the input of the ith integrator. The circuits necessary to implement the ith function in g, (gi ), generate the ith output. As a simple example, consider:

ẋ = (1/τ )(−x + u) (2.9)

y = x (2.10)
Here, m, n and p are equal to 1. The result of the implementation procedure described
above is shown in Fig. 2.1. The mapping of this equation to an analog computer
requires an integrator, an amplifier that implements the gain 1/τ , and the means to sum signals.

Figure 2.1: Realization of a first-order, LTI ordinary differential equation.

To find the solution to this equation over the time interval from 0 to a final time point, tf , with initial condition x(0) = x0 , for a particular input u0 (t), the output of the integrator must be set to x0 , and u0 (t) over the interval from 0 to tf is applied to the input of the analog system. The output y, measured over the interval, is the desired solution.
The input u0 (t) can be generated by a digital computer and applied to the
analog system via a digital to analog converter (DAC). Historically, the outputs were
plotted directly onto paper, but in modern analog systems the outputs are typically digitized and recorded by the digital computer.
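For reference, a digital solution of the same system (Eq. 2.9, Eq. 2.10) takes only a few lines; the values of τ, x0 and the step input u0 below are arbitrary illustrative choices.

```python
import numpy as np

tau, x0, tf, h = 1.0, 0.0, 8.0, 1e-3
ts = np.arange(0.0, tf, h)
u0 = np.ones_like(ts)                 # a unit-step input u0(t)
x = np.empty_like(ts)
x[0] = x0                             # initial condition x(0) = x0
for k in range(len(ts) - 1):
    x[k + 1] = x[k] + h * (-x[k] + u0[k]) / tau   # Eq. 2.9
y = x                                 # Eq. 2.10
```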
Scaling
A correspondence must be established between the units of the equations' variables and the units of the electrical quantities in the analog computer. The need for scaling is illustrated by the following example. Assume that Eq. 2.9 and Eq. 2.10 reasonably model the
temperature of a barrel of hot water placed outside on a cool day, where x(t) is
the average temperature of the water in the barrel at any time instant, t, x0 is the
initial temperature of the water in the barrel, u0 (t) is the temperature of the outside
air and τ is the time constant with which the barrel cools. For argument's sake, assume the water starts at 70 C and the outside air is at a constant 10 C. In the block diagram in Fig. 2.1, when it is mapped to the analog computer's circuits, some correspondence between the equation's variables (x, u), which have units [C], and the electrical quantities in the analog computer must be made, as the equation's variables cannot be represented directly in their own units.

The temperature of the water will start at its initial temperature of 70 C and decay toward the outside temperature of 10 C as t → ∞. If the integrator being used is a voltage-mode circuit with a linear input and output range on the order of ±10 V, the equation's variables can be mapped to the electrical quantities by scaling them by the factor 1 V/10 C. With this scaling, the output of the integrator will vary from 7 V to 1 V. Clearly, at least some prediction of the ranges of the variables is needed before a scaling can be chosen.
Most analog computers have some means of detecting if signals go out of range,
allowing the user to rescale the equations, shrinking their range, and resimulate the
system. This detection is important since a user may not be able to predict the ranges of all of the variables beforehand. To guard against the variables going out of range, one could scale the equations so that they remain very small. However, this is unwise since the signals will be closer to the noise floor of the circuits.
A more subtle form of scaling is needed when the variables in the equation
do not change significantly relative to their average value. For example, consider
the case for the same ODE, where the initial temperature is 9 C and the outside
temperature is 8.9 C. There is no need to scale the variables for them to fit within the limits of the circuit. That is, the trivial ratio of 1 V/1 C could map temperature to voltage. However, the small change in x from 9 V to 8.9 V over the simulation time might lead to inaccurate results, since the noise of the circuits will be larger relative to the 0.1 V change in x than it was for the 6 V change in the earlier example.
For linear systems, the solution can easily be split into a constant part and a
time-varying part, allowing the time-varying part to be expanded into a larger part
of the range of the circuits. That is: x = XDC + xvar where the subscript DC denotes
the constant part of the variable and var denotes the time-varying part. Likewise,
y = YDC + yvar and u = UDC + uvar . We can set YDC = XDC and yvar = xvar . The
differential equation becomes: ẋvar = (1/τ )(−xvar + uvar ), with XDC = YDC = UDC = 9 and uvar (t) = −0.1 C. Now the time-varying parts of u, x, and y range from 0 to −0.1 C. A mapping of 90 V/1 C can be used, meaning that the electrical variable representing xvar will change by 9 V over the simulation and would be affected by noise to a much lesser extent than before.

Figure 2.2: Realization of a first-order, LTI ordinary differential equation. The integrator has an input offset.
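The arithmetic of both scalings in this example can be checked in a few lines; this is only bookkeeping for the numbers quoted above.

```python
# Case 1: x swings from 70 degC down to 10 degC; scale by 1 V per 10 degC.
k1 = 1.0 / 10.0                        # V per degC
print(70.0 * k1, 10.0 * k1)            # 7.0 V down to 1.0 V at the integrator

# Case 2: split x = X_DC + x_var; only x_var (0 to -0.1 degC) is mapped,
# using 90 V per degC, so the electrical variable swings by 9 V.
x_var_swing = 8.9 - 9.0                # degC
k2 = 90.0                              # V per degC
print(abs(x_var_swing) * k2)           # 9.0 V
```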
While this shifting in the variables is obvious and easy to do for simple linear
systems, it is much more difficult for nonlinear systems. Often, the shifting of variables changes the form of the equations.
In the example described by Eq. 2.9 and Eq. 2.10, when τ = 1800 s, the
amplifier in Fig. 2.1 has a gain of 1/τ = 1/1800 = 5.55 × 10⁻⁴ . This very low gain has several consequences:
• The output signal of the amplifier may be small relative to the amplifier's output offset and noise.
• The signal applied to the input of the integrator may be very small relative to the integrator's input-referred offset. Consequently, the y(t) that the analog computer computes will be the sum of the responses of the system to u(t) and to eof f set (t), where eof f set (t) is the integrator's input-referred offset, depicted in Fig. 2.2. The DC gain from eof f set to y is τ , which here is 1800.
• Implementing this system with τ = 1800 s means that the solution will be changing with the same, very long time constant that the actual system (barrel of water) has. The analog computer is not useful if it takes several hours to produce a solution.

These problems can be remedied by a technique referred to as time scaling. Consider the system of equations in Eq. 2.7 and Eq. 2.8,
with solution y(t). Suppose a new set of equations, with the same set of functions for
f and g is defined:
ẋs = Kf (xs , us , t) (2.11)

ys = g(xs , us , t) (2.12)
It can be shown that ys (t) = y(Kt). That is, if the input signal to every integrator is scaled by the factor K, the result is a time-scaled version of the original system, provided the inputs are time-scaled by the same factor.
having very small inputs to the integrators. Another aspect of time scaling stems
from the fact that the integrators in analog computers usually do not implement the dimensionless operation ẏ = x; each integrator has an associated time constant, usually much less than one second. While the term time constant normally refers to the parameter τ in a function of the form e^(−t/τ ), it will also be used to refer to the reciprocal of the unity-gain angular frequency of an integrator described by:

ẏ = (1/τ )x (2.13)

where y is the output of the integrator and x is the input of the integrator. τ is the time constant of the integrator. To map a system onto such integrators, the user does the following (a sketch of the bookkeeping appears after the list):
• Select a K to generate a new set of equations as shown in Eqs. 2.11 and 2.12. A good choice would be one that, based on any knowledge of the system, leads to signals that change on time scales the hardware can follow. For the example of Eq. 2.9 and Eq. 2.10, the best choice for K would be τ , thereby eliminating the amplifier with the very small gain 1/τ .
• To interpret the solution from the analog computer, yAC , scale its time values by the time constant of the analog computer's integrators (τAC ) in the following way: ys (t) = yAC (t/τAC ). Since ys (t) = y(Kt), y(t) = ys (t/K) = yAC (t/(KτAC )).
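The bookkeeping in the list above can be sketched as follows; the step size is illustrative, and K = τ is the choice suggested for the example of Eq. 2.9.

```python
import numpy as np

tau, tau_AC = 1800.0, 40e-6   # system time constant; integrator time constant (s)
K = tau                       # time-scale factor chosen as suggested above

# Scaled system (Eq. 2.11 with f from Eq. 2.9): x_s' = (K/tau) * (-x_s + u_s).
# With K = tau the gain K/tau is exactly 1, eliminating the small-gain amplifier.
h = 1e-3
ts_scaled = np.arange(0.0, 8.0, h)    # dimensionless "machine" time
x = np.zeros_like(ts_scaled)
x[0] = 7.0                            # scaled initial condition (in volts)
for k in range(len(ts_scaled) - 1):
    x[k + 1] = x[k] + h * (K / tau) * (-x[k] + 1.0)

# Undo the scaling: y(t) = y_s(t/K), so each scaled time t_s corresponds to
# original time t = K * t_s (on hardware, t_s itself is t_AC / tau_AC).
ts_original = K * ts_scaled
```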
In many physical phenomena, functions are defined in terms of more than one vari-
able (e.g., space and time). Frequently, rates of change are also specified for more
than one variable, giving rise to partial differential equations (PDEs), which cannot
immediately be solved on an analog computer in the way that ODEs can be.
To solve a PDE on the analog computer, one can discretize either space or time. The latter approach, which leaves space a continuous variable, is called the continuous-space, discrete-time (CSDT) technique; the former, which leaves time continuous, is called the discrete-space, continuous-time (DSCT) technique.
Method of Lines: In the DSCT technique, also known as the method of lines,
the spatial partial derivatives are approximated as finite differences, leaving a set of
coupled ODEs, which can be solved in the usual fashion on the analog computer.
Consider, as an example, the heat equation in one spatial dimension:

α (∂²T /∂x²) = Ṫ (2.14)

where T (x, t) is the temperature at a point x and time t along a uniform rod oriented along the x axis. The constant α is given by α = k/C, where k is the material's thermal conductivity and C is its heat capacity per unit volume.
The spatial derivative ∂T /∂x can be approximated by a difference in several ways, though for all, x is discretized into points xi for which xi − xi−1 = h. The following are three possibilities for the approximation:

∂T /∂x |xi = (Ti − Ti−1 )/h (2.15)
∂T /∂x |xi = (Ti+1 − Ti )/h (2.16)

∂T /∂x |xi = (Ti+1 − Ti−1 )/(2h) (2.17)
Eq. 2.15 is referred to as Backward Euler, Eq. 2.16 is referred to as Forward Euler,
and Eq. 2.17 is referred to as Central Differences. To approximate the second partial derivative, a first-derivative approximation is applied twice. If Forward Euler is used to approximate the first partial derivative, Backward Euler is typically used to approximate the second partial derivative from the first, giving:

∂²T /∂x² |xi = (Ti+1 − 2Ti + Ti−1 )/h² (2.18)
These two approximations have different properties that relate to the accuracy of the solution on an analog computer, which will be discussed in Sect. 6.1. Both techniques can be extended to PDEs of two and three dimensions in a similar fashion; a one-dimensional sketch follows.
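A minimal method-of-lines sketch for the heat equation (Eq. 2.14) using the second-difference approximation of Eq. 2.18 is shown below; the rod length, grid, boundary conditions and α are illustrative. On the analog computer each Ti would be the output of one integrator; here the resulting ODEs are stepped digitally.

```python
import numpy as np

alpha, rod_len, n = 1.0, 1.0, 21   # diffusivity, rod length, grid points
h = rod_len / (n - 1)              # spatial step, x_i - x_{i-1}
T = np.zeros(n)
T[n // 2] = 1.0                    # initial condition: a hot spot mid-rod
T_left = T_right = 0.0             # fixed-temperature ends

dt = 0.4 * h**2 / alpha            # step within the explicit stability limit
for _ in range(500):
    Tdot = np.empty(n)
    for i in range(n):
        Tm = T[i - 1] if i > 0 else T_left
        Tp = T[i + 1] if i < n - 1 else T_right
        # Eq. 2.14 with Eq. 2.18: dT_i/dt = alpha*(T_{i+1} - 2T_i + T_{i-1})/h^2
        Tdot[i] = alpha * (Tp - 2.0 * T[i] + Tm) / h**2
    T += dt * Tdot                 # one ODE per discretization point
```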
The technique is called the method of lines because the problem which is
defined over the (x, t) plane is solved along lines parallel to one of the axes. In the
DSCT technique, the solutions of the ODE are computed along lines in the (x, t)
plane spaced equally in x and parallel to the t axis. In the CSDT technique, the
solutions to the ODEs are computed along lines in the (x, t) plane equally spaced in t and parallel to the x axis. In a third technique, the method of characteristics, expressions for different lines in the (x, t) plane are found which trace level curves of the solution T (x, t). If the solution to the PDE at only one point in the (x, t) plane is desired, it may be possible to integrate only along one curve from an initial value, since these curves, for some PDEs, do not depend on one another.
For the following discussion of the analog computer's strengths and weaknesses, an attempt has been made to note which deficiencies are fundamental and which are practical.

Accuracy: Analog computers are not very accurate. Solutions are usually accurate to no better than 0.1 % - 1 %. These errors stem from the offsets, noise, mismatch and distortion of the constituent circuits, including the integrators.
Dynamic Range: Analog circuits have a limited range of signal levels over which accurate processing can be guaranteed. Due to noise in the analog circuits, small signals become unduly corrupted by noise. Large signals are processed with distortion. Generally speaking, analog computers have relatively poor dynamic range. Several techniques can improve the dynamic range of an analog computer, though not to the level of a digital computer:
• Dynamic bias schemes to reduce the noise when signals are small, but also to accommodate large signals when required.

• Class-AB operation.
Ease of Programming: Classical analog computers were programmed by the plugging and unplugging of patch cords and by turning potentiometers to tune gains and time constants. This could take a significant amount of time for large systems, and required significant user expertise. This problem does not apply to a VLSI analog computer, which is programmed by a digital computer that sets the states of electronic switches to control the connectivity of the blocks, and programs DACs that generate tuning signals to tune gains and time constants.
If a user wished to use a function not within a classical analog computer system, it had to be built. Often an approximation of the function is sufficient and, through the use of VLSI circuits, a large number of programmable function blocks can be provided.
Scaling: As described above, scaling requires user expertise to perform. However, because the interface to a modern analog computer is typically through a digital computer, much of the scaling can be automated.

Speed: The time an analog computer needs to produce a solution is set by the time constant of the integrators and the interval over which a solution is desired, and not by the complexity of the system being simulated. That is, doubling the number of state variables in an ODE does not increase the simulation duration. While some digital computers make use of parallelism, every analog computer is inherently parallel.
Cost: Classical analog computers were expensive, owing to their large number of parts and the significant mechanical assembly needed. Also, many components would be tuned at the factory, requiring the time of a technician. VLSI circuits can be inexpensive when manufactured in quantity and electronic assembly costs are low, making a modern analog computer potentially inexpensive.
On an analog computer, the equations whose solution is sought are solved directly. Firstly, an analog computer will never settle to a metastable equilibrium as a digital computer may, and it is less likely to suffer from the situation in which a solution exists, but the digital computer cannot find it. An example of this is the situation in which Newton-Raphson iterations fail because the starting point for the iterations was separated from the solution by a local extremum. Secondly, because the system is in continuous time, there can never be aliasing, nor can there ever be artifacts introduced by time discretization. The outputs may still be sampled by an ADC for use by a digital computer, and an anti-aliasing filter is needed at the input to the ADC. However, the bandwidth of the signal can be estimated from the frequencies of the input signals and the dynamics of the system.

In short, each type of computer is strong where the other is weak, and vice versa. As such, an attempt has been made to combine them in a way that best utilizes modern VLSI technology and best exploits their respective strengths.
Chapter 3

Design of the VLSI Circuit

3.1.1 Overview
The VLSI analog computer (AC) that is the subject of this thesis is composed of 416
functional blocks, a large number of signal routing switches, memory that holds the
states of the switches, memory that holds programming data for the functional blocks,
and circuitry enabling the programming of the chip and the control of simulations. Signals are represented by currents (the circuits are current-in/current-out), and hence signals are added by connecting wires together; an inverted copy of a signal can likewise be subtracted from another signal. The chip contains the following circuits:
• 80 integrators.
• 16 logarithms.
• 16 exponentials.

• Programmable nonlinear blocks. Each can implement:

• Sign.
• Absolute value.
• Saturation.
When two programmable nonlinear blocks are used together, the pair can
implement:
• Minimum.
• Maximum.
• Greater than.
• Less than.
• Track.
The chip contains 160 blocks (fanout blocks) that allow a signal to be fanned out to several destinations.

Figure 3.1: Architecture of the VLSI analog computer with expanded view of one Macroblock.
Fig. 3.1 shows the architecture of the chip with an expanded view of one MB.
Fig. 3.2 shows a detailed view of one Macroblock. Each block’s input is con-
nected to a wire running horizontally and each block’s output is connected to a wire
running vertically. These wires extend outside of the MB to allow for the connection
between blocks in different MBs.

Figure 3.2: Architecture of one Macroblock within the analog computer.

For simplicity, each block is shown with one input
and one output, though some blocks have more than one input or output. Each wire passes through switching grids: there are pass-transistor switches, and SRAM that holds their states, wherever two groups of wires cross one another. The switches can be closed to connect horizontal and vertical wires. The solid, bold line (within the expanded MB) shows how the output of block X is routed to the input of block Y.
Fig. 3.3 shows the interconnection of the Macroblocks with one another and
off-chip. To route a block’s output to the input of a block not in the same MB, shared,
global wires are used. The dotted, bold line shows how the output of a block in MB
W is routed to the input of a block in MB Z.

Figure 3.3: Architecture of the VLSI analog computer showing top-level connections.

Sixty-four analog signal inputs enter the
interconnection network at the top and bottom of the chip through 1:2 demultiplexers
depicted in Fig. 3.3. The solid line in Fig. 3.3 shows how a signal from off-chip can be routed to the input of a block. Signals applied to the horizontal wires can be routed off-chip through multiplexers on the left and right sides of the chip, for a total of 64 outputs. The bold, dashed/dotted line shows how the output of a block is routed to off-chip.
The chip was designed and fabricated in a 0.25 µm CMOS process from the Taiwan Semiconductor Manufacturing Company (TSMC). The process offers thin-oxide devices for a 2.5 V VDD and thick-oxide devices, capable of tolerating 3.3 V but with minimum lengths of 0.3 µm and 0.35 µm for the PMOS and NMOS devices, respectively. It is a multilevel-metal process with minimum metal line widths ranging upward from 0.32 µm for Metal-1.
All circuits have class-A inputs and outputs, with the exception of the logarithm
circuits, which have class-AB inputs and the exponential circuits, which have class-
AB output circuits. To accommodate larger dynamic range, most class-A signal ports
have 100 nA, 1 µA, and 20 µA signal ranges. For some ports, the largest signal range
is 10 µA.
The analog computer may be used to simulate a wide variety of systems, and as such, it is difficult to predict how the performance of one circuit will affect
the overall accuracy of the system being simulated. While one could respond to
this by designing circuits to meet extremely high performance standards, the usual
costs would be incurred, namely increased design time, complexity, area and power
consumption. Instead, some moderate performance targets were set such that all
nonidealities affect the instantaneous accuracy of a block equally. For example, the
integrated equivalent input noise specification was set to be the same, as a percent of
full-scale signal, as the nonlinear distortion. The targets are summarized below:
• Maximum deviation from linearity for linear blocks: 0.1 % at half of full scale.
• Matching of critical pairs of transistors: σIDS /IDS = 0.1 %.
• Output resistance: > 1000 x input resistance. Recall that the blocks are current-
in, current-out.
3.3.2 Integrator
A block diagram of the integrator is shown in Fig. 3.4. Wire labels follow the following convention: labels adjacent to arrows refer to the current flowing in the wire; labels not adjacent to arrows denote the voltage of the wire with respect to ground.

Figure 3.4: Block diagram of the integrator.

Signals iin+ and iin− form the circuit's class-A
differential input. Signals iout+ and iout− form the circuit’s class-A differential output.
The integration operation is performed by the block labeled “Integrator Core”. The
wires labeled “iin1+ + 1µA”, “iin2+ + 1µA”, “iin1− + 1µA” and “iin2− + 1µA” apply
class-A analog input signals to the core, along with 1 µA biases. The integrator core implements:

(d/dt)(ioutc+ − ioutc− ) = K (iin1+ + iin2+ − iin1− − iin2− ) (3.1)

where K is equal to half of the unity-gain angular frequency of the integrator.
The blocks labeled "A:B" are single-input, dual-output current mirrors with programmable gains, having the following input-output relationships:

iin1+ = iin2+ = −(B/A) iin− (3.2)

and

iin1− = iin2− = −(B/A) iin+ (3.3)
Along with composite devices COMP1 through COMP5, “A:B” blocks allow
the integrator to have multiple input signal ranges, while always supplying each of the
core’s inputs with a bias of 1 µA. The blocks labeled “B:A” are single-input, single-
output current mirrors with programmable gains, having the following input-output relationships:

iout+ = (A/B) ioutc+ (3.4)

and

iout− = (A/B) ioutc− (3.5)
The core of the integrator, through wires “ioutc+ + 1µA” and “ioutc− + 1µA”
applies signals and bias to the output mirrors, which allow for three different output
signal ranges. Input and output mirrors are adjusted so that the input signal limit
is equal to the output signal limit. This gives rise to the following input-output relationship:

(d/dt)(iout+ − iout− ) = 2K (iin+ − iin− ) (3.6)
While an open loop integrator does not, strictly speaking, have a time constant,
for the purpose of this thesis, the term “time constant” will refer to the time constant
that the integrator would have if it were placed in unity-gain negative feedback. If
the open-loop integrator has a transfer function of H(s) = 1/(τ s), the closed-loop system
will have a time constant of τ . For open loop integrators, the term time constant will
refer to this τ . It is seen that τ is the inverse of the unity-gain angular frequency of
the integrator.
The time constant of the integrator in Fig. 3.4 is dependent on the two copies
of the tuning current ITUNE , which are generated by the block labeled "10-bit DAC".
The block OVFL raises the digital signal OVFL when the integrator’s differential
output signal is near its limit. The block labeled CMFB regulates the common mode
level of the integrator’s differential output current. The two blocks labeled “Offset
Cancellation” perform dynamic cancellation of the integrator’s input and output off-
sets. The block labeled Memory stores the DAC's input word, range settings, and other configuration bits.
Control signal VCAP helps reset the integrator. SIN controls the offset cancel-
lation sequence. The signals data[0:15] specify the data to be programmed to the
block’s memory. The signal address[0] latches the data into the block if the address
lines, address[1:5], are all high. A particular block is identified by the address lines in the following way:
• Around the chip run five address signals (a[1:5]) and their complements (ā[1:5]).

• The ith address input of the Memory block is connected to either a[i] or ā[i].

• The particular block is activated whenever all of its five address inputs are high.

For example, if the five address lines of a block are connected to a[1:2], ā[3:4] and a[5], the block is activated only by the address a[1:5] = 11001, as sketched below.
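A small sketch of this address-matching scheme (the specific wiring in the example above, in particular the choice of a[5] for the last line, is an assumed illustration):

```python
def block_selected(a, wiring):
    """a: the five global address bits a[1:5].  wiring: per bit, True if the
    block's Memory input taps a[i], False if it taps the complement.  The
    block is activated only when all five of its address inputs are high."""
    inputs = [(bit if use_true else 1 - bit) for bit, use_true in zip(a, wiring)]
    return all(v == 1 for v in inputs)

# Wired to a[1], a[2], the complements of a[3] and a[4], and a[5]:
wiring = [True, True, False, False, True]
assert block_selected([1, 1, 0, 0, 1], wiring)       # address 11001 selects it
assert not block_selected([1, 1, 1, 0, 1], wiring)   # any other address does not
```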
Integrator Core
Figure 3.5: Schematic of the core of the integrator.
The schematic of the core of the integrator is shown in Fig. 3.5. It consists of two single-ended log-domain integrators, and reuses as much of an earlier design as possible. An earlier, smaller version of the chip used log-domain integrators without range-selecting input and output mirrors (blocks A:B and B:A), requiring that they operate over a wide range of bias currents. The externally-linear,
internally-nonlinear core was retained. When such an integrator is built with bipolar transistors, its usable range (linear, and not overly noisy) can cover many decades of bias current. However, when MOSFETs are used, the upper range of current must be kept small enough that the devices stay weakly inverted. Reducing the current too much leads to a poor maximum signal-to-noise ratio, since the signal range falls faster than the noise level does as the circuit's bias currents are reduced. When the integrator for the chip described in this thesis was designed, the log-domain core was kept, but the range-selectable mirrors were added. It was simpler to design them than to redesign the core for a wide bias range. The core consists of the two single-ended integrators found in the right and left halves of Fig. 3.5. Transistors M1 through M12
operate in weak inversion. Transistors M13 through M18 keep the impedance low at
the drain of M1, M3, M6, M7, M10 and M11, respectively, allowing the input and
tuning currents to enter the circuit at a low-impedance point. The transistor pairs
M19/M20 and M21/M22 form unity-gain current mirrors. We will perform an analysis of the left half of the circuit, which comprises M1 through M6, M13 through M15, M19, and M20 and the capacitor C on the left side of the figure, under the following assumptions:

• The following pairs of transistors are identical to one another: M1 and M6; M2 and M5.

• Each transistor operates in weak inversion, in which the drain-source current is described by [17]:
iDS = S IS exp(vGS /(nφt )) (3.7)

where S is the device's aspect ratio (i.e., W/L), IS is a constant of proportionality with units of current, vGS is the device's gate-to-source voltage, n is the subthreshold slope factor for the device and φt is the thermal voltage (kT /q). Eq. 3.7 can be rearranged to give:

vGS = nφt log(iDS /(S IS )) (3.8)
A loop formed by the gate-source voltages of weakly inverted MOSFETs (or BJTs) is called a translinear loop. The analysis of this circuit will proceed in a fashion very similar to that of other translinear circuits [18].

There are two translinear loops in the circuit, which are composed of: M1, M2, M3 and M4; and M6, M5, M3 and M4. Even though each of these loops starts and ends with a different element, they form electrical loops since the gates of the start and end devices are connected to the same voltage. Around each loop a Kirchoff's Voltage Law (KVL) equation can be written. When Eq. 3.8 is substituted for each of the gate-source voltages in Eq. 3.9 and Eq. 3.10, equations relating the drain currents result. Eq. 3.11 and Eq. 3.12 can be manipulated into the following form:

iDS1 iDS3 /(S1 S3 ) = iDS2 iDS4 /(S2 S4 ) (3.13)

iDS6 iDS3 /(S6 S3 ) = iDS5 iDS4 /(S5 S4 ) (3.14)

Eq. 3.13 and Eq. 3.14 will be used later. For now, consider the Kirchoff's Current Law (KCL) equation at the capacitor node.
Since M19 is a PMOS device, its drain current is defined upward from its drain to
VDD . Because M19 and M20 are identical, they act as a unity-gain mirror, mirroring
the drain current of M5 into the capacitor. Therefore M19 conducts the same current
as M5.
The output of the integrator's core is iDS4 . Since the circuit is an integrator, we are interested in an expression for the time derivative of the output variable. The time derivative of Eq. 3.7, written for M4, is:

i̇DS4 = S4 IS exp((vG4 − vS4 )/(nφt )) (v̇G4 − v̇S4 )/(nφt ) (3.18)

where vG4 is the voltage at the gate of M4, with respect to ground, and vS4 is the voltage at the source of M4, with respect to ground. Recognizing that the first part of the right-hand side is simply iDS4 , and that v̇G4 is zero, since the gate of M4 is connected to a fixed voltage, we have:

i̇DS4 = −(v̇S4 /(nφt )) iDS4 (3.19)
Note that vS4 and vS3 are equal to one another. Since iDS3 is kept constant by ITUNE , vGS3 is also a constant. Also, because the gate of M3 is connected to one terminal of the capacitor, the rate of change of vG3 will be the same as the rate of change of the capacitor voltage. Hence:

v̇S4 = v̇G3 = v̇C = (1/C) iC (3.20)

Substituting Eq. 3.17 into Eq. 3.20 and combining this result with Eq. 3.19 gives:

i̇DS4 = (1/(nφt C)) iDS4 (iDS2 − iDS5 ) (3.21)
Now we rearrange Eq. 3.13 and Eq. 3.14 and isolate iDS2 and iDS5 , respectively giving:
iDS2 = (iDS1 IDS3 /iDS4 )(S2 S4 /(S1 S3 )) (3.22)

iDS5 = (iDS6 IDS3 /iDS4 )(S5 S4 /(S6 S3 )) (3.23)

Eq. 3.22 and Eq. 3.23 can be substituted into Eq. 3.21 to give:

i̇DS4 = (S2 S4 /(S1 S3 ))(IDS3 /(nφt C))(iDS1 − iDS6 ) (3.25)
This equation describes the behaviour of the circuit in terms of total drain-source
currents and not the signal quantities labeled in Fig. 3.5. Eq. 3.25 can be cast in terms of the signal quantities by noting from Fig. 3.5 that:

iDS1 = iin1+ + 1 µA (3.26)

iDS6 = iin1− + 1 µA (3.27)

iDS4 = ioutc+ + 1 µA (3.28)

and therefore:

i̇DS4 = i̇outc+ (3.29)

When Eq. 3.26, Eq. 3.27, and Eq. 3.29 are substituted into Eq. 3.25, we get:

i̇outc+ = (S2 S4 /(S1 S3 ))(ITUNE /(nφt C))(iin1+ − iin1− ) (3.30)

In Eq. 3.30, ITUNE has replaced IDS3 . It can thus be seen that we have a differential-input, single-ended-output integrator whose time constant can be tuned through ITUNE .
A similar analysis can be carried out for the right-hand integrator, which can
be combined with Eq. 3.30, assuming that the following sets of transistors are identical to one another: M1, M6, M7 and M10; M2, M5, M8 and M9; M19 and M20; M21 and M22. The result is:

i̇outc− = (S2 S4 /(S1 S3 ))(ITUNE /(nφt C))(iin2− − iin2+ ) (3.31)

Subtracting Eq. 3.31 from Eq. 3.30 gives:

(d/dt)(ioutc+ − ioutc− ) = (ITUNE S2 S4 /(nφt C S1 S3 ))(iin1+ + iin2+ − iin1− − iin2− ) (3.32)
Eq. 3.32 will be related to the behaviour of the entire integrator once the
operation of the blocks labeled "A:B" and "B:A" in Fig. 3.4 is described. Eq. 3.32 is written in terms of a differential quantity, but this is not to suggest that the circuit, as described thus far, is fully differential. Rather, it is pseudo-differential.
The switch controlled by VCAP aids in resetting the integrator, and ensures that transistors M2, M5, M19, and M20 become conducting; otherwise it might not be possible for the integrator to reach a state in which the operation described above applies. For example, if vC = 0 and the drain voltage of M5, vD5 , is at VDD , transistors M2, M5, M19 and M20 are all off and, regardless of the current flowing in M1 and M6, the capacitor will stay discharged. This does not contradict the analysis above, since the analysis assumed that transistors M2, M5, M19 and M20 are each on and in saturation.
The integrators have a nominal time constant of 40 µs. Each integration capacitor occupies a large fraction of the area of the integrator, despite efforts to shrink the capacitor without changing the nominal time constant. From Eq. 3.32 it is clear that reducing S2 (and the aspect ratios of M5, M8 and M9, since M2, M5, M8 and M9 are assumed to be identical) proportionately with C keeps the capacitor area small without changing the transfer function; doing so, however, shrinks S2 relative to the aspect ratio of the other transistors in the translinear loops. This also reduces the current through M2, M5, M8 and M9 and increases their contributions to the integrator's output noise. How exactly this affects the noise of a given simulation is very
dependent on the details of the system, even for very small systems. For example,
consider the system shown in Fig. 3.6. Assume signals ni (t) and no (t) are uncorrelated noise sources with power spectral densities Ni (f ) and No (f ), respectively. Here, ni represents all noise sources in the integrator on the input side of the integration capacitor and no represents all noise sources on the output side of the capacitor. Strictly speaking, the noise analysis below assumes that the system is linear, which is a valid assumption when the signal the integrator is processing (input and output) is small. The total noise at the output of the system is:

x̄²out = ∫ (Ni (f )|HLP (f )|² + No (f )|HHP (f )|²) df (3.33)

where
HLP (f ) = 1/(j2πf + g) (3.34)

HHP (f ) = j2πf /(j2πf + g) (3.35)
HLP (f ) is the transfer function of this system from the input of the integrator
to the output of the system, whether the input is noise at the input of the integrator
or an input signal to the system. HHP (f ) is the transfer function from the output
of the integrator to the output of the system. This noise analysis assumes that the
gain block g is noiseless. Clearly, the relative contribution of input noise to the total output noise depends on the system's configuration, and hence the optimal allocation of the circuit's noise is dependent on the system being simulated.
The quartet of devices M2, M5, M8 and M9, the dominant source of input noise,
were sized such that they contributed approximately half of the core’s noise when
the integrator was in the configuration discussed here, with g = 1, and the noise was
integrated up to 1 MHz.
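The split can be checked numerically with Eqs. 3.34 and 3.35; the noise densities and the value used for g below are placeholders, not the thesis's design values.

```python
import numpy as np

def contributions(Ni, No, g, f_max=1e6, n=200_000):
    """Integrate Ni*|H_LP|^2 and No*|H_HP|^2 up to f_max (Eqs. 3.34, 3.35),
    assuming white densities Ni, No at the integrator input and output."""
    f = np.linspace(1.0, f_max, n)
    w = 2.0 * np.pi * f
    lp = 1.0 / (w**2 + g**2)        # |1/(j*2*pi*f + g)|^2
    hp = w**2 / (w**2 + g**2)       # |j*2*pi*f/(j*2*pi*f + g)|^2
    return np.trapz(Ni * lp, f), np.trapz(No * hp, f)

in_part, out_part = contributions(Ni=1e-24, No=1e-24, g=2 * np.pi * 4e3)
```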
Figure 3.6: System used for the noise analysis: an integrator with input and output noise sources in a feedback loop with gain g.
The core of the integrator uses weakly inverted MOSFETs whose relationship
between gate to source voltage (VGS ) and drain current (IDS ) is exponential. It is
this characteristic that makes the core externally linear. However, for larger drain
currents, the devices become moderately inverted and their current-voltage characteristics are no longer exponential. One could make the devices wider, extending the exponential range to higher currents, but this would be at the penalty of circuit area, since all capacitances would increase. Alterna-
tively, the length of the active devices could be decreased as their width is increased,
maintaining a fixed area. This would reduce their output resistance and their expo-
nential characteristics would be limited by short channel effects. Instead, to allow for
a larger range of input and output current, settable-gain input and output current
mirrors were used (labeled A:B and B:A, respectively, in Fig. 3.4). Fig. 3.7 shows a
simplified schematic of the input mirror. The numbers above the dashed boxes in the
figure indicate the number of unit devices of which each transistor is composed.
Each input mirror has one analog input current and two equal output currents
so that each polarity of the integrator’s input can be applied to each of the core’s two
differential to single-ended halves (Fig. 3.4). The input mirror consists of mirroring
devices M13-M18, a unity-gain buffer amplifier (M19-M24 and SBIAS1 through SBIAS4 ),
a by-pass to the amplifier (SAMP ), some devices for compensation (SCAP and M27),
and some control logic. By appropriately controlling the gates of M7-M18 (unit size is
W = 1 µm, L = 0.3 µm), the circuit achieves mirroring ratios of 20:1, 1:1, 1:10 from
input to each of its two outputs. Device M10, since its gate is connected to VDD ,
never conducts. It is included so that the capacitive loading at the drain of M4 is the
same as the loading at the drain of M1. The input bias to the mirror is adjusted (20 µA, 1 µA, 100 nA) so that the output bias is always 1 µA. Table 3.1 details how the current-steering switches are controlled.
Mirror Ratio   M1    M2    M3    M4    M5    M6
20:1           M13   M8    M9    M16   M11   M12
1:1            –     M8    M15   –     M17   –
1:10           M7    M14   M15   –     M17   M18

Table 3.1: Control for current steering switches in the integrator's variable-gain input current mirror. Rows two through four indicate the conducting device connected to the device in row one.
Each device listed in the first row of the table above is connected to two current
steering devices. The devices listed in rows 2 through 4 indicate which of the two
current steering devices associated with the device in the first row is on. The entry
“–” denotes that neither current steering device is on. What precisely is meant by
“ON” is explained below. For the moment, it can be assumed that the current-
steering devices act as switches. The scenario in which the circuit implements a
current mirroring ratio of 20:1 is depicted in Fig. 3.8. Transistors drawn with a bold
line are conducting while the others are not. M13 is on thereby connecting M1 to
the output terminal iOU T 1 . M2, M3, M5 and M6 are connected to the input iIN for
a total of 20 unit devices, while M4 is connected to iOU T 2 . This means that there
is one unit device supplying current to each output. M1 through M6 have the same
gate-source voltage. Assuming that they are in saturation, the connections described above realize the 20:1 mirroring ratio.
The block labeled “Control Logic” takes a two-bit signal, r, as its input and
generates the necessary control signals for the switches SBIAS1 through SBIAS4 , SCAP and the current-steering devices.

Figure 3.8: Input variable gain current mirror implementing the 20:1 mirror ratio. Devices drawn in bold are on.
The simplest way to operate the mirror would have been to directly connect
the input (iIN ) to the gates and drains of M1-M6. This, however, would load the
input with a large capacitance (∼ 11 pF), since the unit transistor of M1-M6 is large
(W = 10 µm, L = 10 µm). When put in parallel with the circuit’s input resistance,
the circuit's frequency response would suffer. The input resistance of the circuit is roughly the reciprocal of the transconductance of the diode-connected devices connected to the input. For the mirroring ratio of 1:10, the input current, and gm , is
the smallest and the input resistance is largest. This combination of input capacitance
and input resistance would result in a pole in the mirror’s frequency response near
40 kHz.
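A back-of-envelope check of that pole frequency, using the weak-inversion transconductance gm ≈ IDS/(nφt), with nφt = 40 mV (the value quoted later for this process) and the ~11 pF gate capacitance mentioned above:

```python
import math

I_in = 100e-9      # input bias on the 1:10 range (A)
n_phit = 40e-3     # n * phi_t (V)
C_in = 11e-12      # gate capacitance of M1-M6 (F)

gm = I_in / n_phit                       # ~2.5 uS in weak inversion
f_pole = gm / (2 * math.pi * C_in)       # pole of the R_in * C_in combination
print(f"{f_pole / 1e3:.0f} kHz")         # ~36 kHz, consistent with "near 40 kHz"
```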
To prevent the gate capacitance of the mirror’s large devices (M1-M6) from
limiting the bandwidth of the input mirror when the input resistance is high, the
input is not connected directly to the gates of M1-M6. For the 100 nA and 1 µA
ranges, switches SBIAS1 and SBIAS4 are on, switches SAMP , SBIAS2 and SBIAS3 are off,
and M19-M24 form a unity-gain buffer from the voltage at iIN to the gates of M1-M6.
Mismatch between M22 and M23 will cause the buffer to have an input offset and
affect the input voltage of the mirror, but will not change the mirroring ratio. Since
matching between M22 and M23 is relatively unimportant these devices can be made
much smaller than M1-M6 and therefore do not load the input. When the input is
shielded from M1-M6, the input is still loaded with a capacitor. This comes from the
wire that connects the input of the block to the switching grids, typically ∼2 pF. The
feedback loop, from the input, through M22, M23, M2, and M8 can be unstable on
the 1 µA range, unless SCAP connects a small compensation capacitor (M27, 0.4 pF)
to the input. For the largest input range, SBIAS1 and SBIAS4 are off, switches SAMP ,
SBIAS2 and SBIAS3 are on. This switches off the buffer, creating a simple current mirror.

The unit device is large enough so as to ensure good matching. For devices that are weakly inverted, the relative standard deviation in current for two equally sized devices is:
∆IDS /IDS = AVT /(nφt √(W L)) (3.36)

where AVT is a process-dependent constant, usually quoted in [mV·µm], and W and L are the dimensions of the transistor in [µm]. For the TSMC025 process, AVT ≈ 5 mV·µm and nφt = 40 mV, meaning that devices that are 100 µm² in gate area
match to about 1 %. When set to the 20:1 range, the mirror's matching of iOUT1 and iOUT2 to one another relies on the matching of single unit devices, which are 100 µm² in area.
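Evaluating Eq. 3.36 for the numbers just quoted:

```python
import math

A_VT = 5.0          # mV*um, process matching constant
n_phit = 40.0       # mV
W = L = 10.0        # um; one unit device is 100 um^2

sigma = A_VT / (n_phit * math.sqrt(W * L))   # Eq. 3.36
print(f"{100 * sigma:.2f} %")                # 1.25 %, i.e. "about 1 %"
```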
The gate voltages of M7, M8, M9, M11 and M12 are raised to VDD to shut
the devices off or lowered to gnd to allow them to connect the mirroring devices to
the input. Similarly, the gate voltages of M13-M18 are raised to VDD to turn them
off, but when they are connecting the mirroring devices to the output, the gates are
lowered to only VDD /2. This creates cascode pairs of devices and increases the output resistance of the mirror.

The output mirror circuits labeled "B:A" are similar to that in Fig. 3.7, with their mirroring ratios set to the reciprocals of the input mirrors' ratios.
For convenience, the equation describing the input-output behaviour of the integrator core is repeated here:

(d/dt)(ioutc+ − ioutc− ) = (ITUNE S2 S4 /(nφt C S1 S3 ))(iin1+ + iin2+ − iin1− − iin2− ) (3.37)
Recall that the input and output mirrors are adjusted so that their mirroring ratios
are the reciprocals of one another. For example, when the input mirrors are set to
have the ratio of 20:1, the output mirrors have the ratio 1:20. If "A:B" is the mirroring ratio of the input mirrors and "B:A" is the mirroring ratio of the output mirrors, then iout+ = (A/B) ioutc+ and iout− = (A/B) ioutc− .
When these relationships are substituted into Eq. 3.37, the input-output behaviour of the complete integrator is found:

(d/dt)(iout+ − iout− ) = 2 (ITUNE S2 S4 /(nφt C S1 S3 ))(iin+ − iin− ) (3.38)
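Eq. 3.38 also gives the tuning relationship: the unity-gain angular frequency is 2 ITUNE S2 S4 /(nφt C S1 S3 ), and the time constant is its reciprocal. The sketch below solves for the ITUNE needed for the nominal 40 µs time constant; C and the aspect-ratio factor are placeholder assumptions, not the actual sizing.

```python
n_phit = 40e-3          # V
C = 5e-12               # F (assumed value)
S_factor = 1.0          # S2*S4/(S1*S3) (assumed value)
tau_target = 40e-6      # s, the nominal integrator time constant

# tau = n_phit*C*S1*S3 / (2*ITUNE*S2*S4)  =>  solve for ITUNE:
I_tune = n_phit * C / (2.0 * S_factor * tau_target)
print(f"ITUNE = {I_tune * 1e9:.1f} nA")  # 2.5 nA for these assumed values
```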
Composite Devices
Composite devices COMP1 through COMP5 in Fig. 3.4 each have nine long channel
devices (W = 1 µm, L = 20 µm) and several short channel devices used as switches. The nine long-channel devices can be connected all in series, as three parallel branches of three in series, or all in parallel, depending on the levels of digital control signals. Fig. 3.9 shows the three configura-
tions without switches. Fig. 3.10 shows a detailed schematic of the composite device.
The circuit’s short channel devices are depicted by switches. Transistors M1 through
M9 are long channel devices. The label adjacent to each switch indicates the signal
that controls the switch. When the signal is high, the switch is closed. Table 3.2 lists the control signal levels for each configuration.

Figure 3.9: Composite device. The composite device on the left can implement the three configurations of nine devices shown in the figure.
These three configurations have aspect ratios of 1/180, 1/20 and 9/20 and are used for signal ranges of 100 nA, 1 µA and 20 µA, respectively. While the currents are not exactly proportional to the aspect ratio of the composite device, the level of inversion of the equivalent device changes by only 2.5 while the current changes by 200 (the arithmetic is sketched below).

Figure 3.10: Composite device. Switches are drawn in the place of MOSFETs.

This scheme offers:
• Efficient use of area, since every device is being used at all times.
• Nearly constant VDSSAT . VDSSAT would be constant if the aspect ratio changed in exact proportion to the current.

The net device is large (W L = 180 µm² ) so that one composite device matches another well.
Table 3.2: Control signal levels for each configuration of a composite device.
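The level-of-inversion figures quoted above follow from the three series/parallel arrangements; the assignment of configurations to current ranges below is inferred from the quoted numbers.

```python
# Inversion level is proportional to I/(W/L).  Aspect ratios for nine
# 1um/20um devices: all in series, three series-strings of three in
# parallel, and all in parallel.
configs = {
    "9 in series (100 nA)":  (100e-9, 1.0 / 180.0),
    "3x3 (1 uA)":            (1e-6,   1.0 / 20.0),
    "9 in parallel (20 uA)": (20e-6,  9.0 / 20.0),
}
levels = {name: I / S for name, (I, S) in configs.items()}
lo, hi = min(levels.values()), max(levels.values())
print(round(hi / lo, 2))   # ~2.5: inversion varies 2.5x while current varies 200x
```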
Common-mode Feedback
Common-mode feedback is needed to ensure that signals at the integrator's inputs only affect the differential output of the circuit. Without common-mode feedback the integrator, regardless of the differential-mode feedback around it, will not set its common-mode output level correctly and will saturate the input of the circuit to which it is connected.
The common-mode feedback system (enclosed in a dashed line) along with the
core of the integrator (enclosed in the box drawn with a dotted line) is shown in Fig. 3.11 below. The added circuitry comprises:
• M4-cmfb, M12-cmfb. These devices copy the output current of each side of the
core of the integrator, assuming they are in saturation, since the gate and source
voltages of M4-cmfb and M12-cmfb are the same as M4 and M12, respectively.
• MCM1, MCM2, MCM3. These devices compute the common output, and sub-
tract it from the input to the core. The drains of M4-cmfb and M12-cmfb are
connected together, thereby summing their drain currents. This sum is twice
the common-mode output of the integrator. Diode-connected MCM1 mirrors this sum to MCM2 and MCM3, scaled by a factor of 1/2. The difference between the mirrored currents and the current sources is applied to two of the core's inputs.

Figure 3.11: Core of the integrator with the common-mode feedback circuitry.
Transistors M4-cmfb and M12-cmfb do not alter the operation of the core of
the integrator as derived earlier. The operation was derived by writing a series of
KVL equations and KCL equations, none of which assumed that the drain-to-source
currents of M3, M4 and M14 summed to zero. The current through output device M4 is determined by its source voltage, since its gate is connected to a fixed voltage. The source voltage, in turn, is determined by ITUNE and the voltage across the capacitor. Connecting M4-cmfb does not alter M4's source voltage.
The common-mode level is regulated through high-gain, negative feedback. If the integrator had infinite DC gain (i.e., it is an ideal integrator) the error between the actual common-mode output and the desired common-mode output would be driven to zero in steady state by the integration operation. However, since the real integrators have finite DC gain, the error between the actual common-mode output of the integrator and the desired common-mode output (1 µA) is driven to a small, nonzero value.

The common output is sensed with a single pair of devices rather than by generating two copies of each output (by using two devices similar to M4-cmfb for each output), generating two sums, and mirroring one sum to M6 and the other to M7. The unavoidable mismatches between the two feedback paths will result in a differential error.
Offset Cancellation
The integrator has two modes of operation. In one, its input offset is dynamically
cancelled before a simulation is run while in the other, no such cancellation takes
place.
Fig. 3.12 shows a simplified block diagram of the integrator with an expanded
view of the offset cancellation circuitry.

Figure 3.12: Block diagram of the integrator highlighting offset cancellation circuitry.

In the mode in which no offset cancellation
takes place, signal inf mode is high (VDD ) and signal SIN is low (gnd). The signal
inf mode is short for “infinity mode”, indicating that the integrator could operate
in this mode indefinitely, while in the other mode, due to the dynamic nature of
the offset cancellation scheme, the integrator can operate properly for a finite time.
Raising inf mode and lowering SIN connects the gates of composite devices COMP1
through COMP4 to the bias voltage (VB ) generated by the diode-connected COMP5.
There is no limit to the duration over which integrators in this mode can be operated, hence the name.
To cancel the offset dynamically, the output of the integrator is not connected
to any other circuits, inf mode is lowered to ground and SIN is raised to VDD (see
Fig. 3.12). This connects the integrator in unity-gain, negative feedback. To see how
this configuration puts the integrator in negative feedback, consider the following
argument: Assume that with iin+ and iin− equal to zero, the integrator has reached
equilibrium. This assumption requires that the integrator is not in positive feedback.
We will see that in fact the integrator is in negative feedback, making this assumption
valid. Assume that iin− decreases by some ∆i. That is, more current is pulled from the lower A:B block. Therefore, iin1+ and iin2+ increase by (B/A)∆i. This decreases ioutc−, as predicted by Eq. 3.37. Since the output of the integrator is not connected to
another circuit during offset cancellation, the current flowing into COMP4 decreases,
thereby decreasing vGO− . Since the gate of COMP2 is connected to the gate of
COMP4 through M3, vGI− also decreases. This reduces the current COMP2 conducts,
reducing the current that is pulled from block A:B, and hence reducing the effect of
the disturbance at the input. Because the system responds to reduce the effect of an input disturbance, it is in negative feedback.
The above sequence ignored what happened to ioutc+ and the feedback through
the Offset Cancellation block in the upper portion of Fig. 3.12. Imagine that the
decrease in iin− is accompanied by an equal increase in iin+ , thereby making the input
disturbance differential. Similar reasoning shows that the upper feedback network will
tend to compensate for the increase in iin+ . Also, the integrator’s response to the
increase in iin+ will reduce the effect of the decrease in iin− and vice versa. In fact,
the offset cancellation scheme only responds to the differential component of the input.

The discussion above described the system's response to an input; a system with an input offset behaves like an ideal system with an input applied. When the system reaches steady-state, the necessary
input to cancel the offset will be applied by COMP1 and COMP2. To store this input,
SIN is lowered, and the voltage needed to apply this input is held on CHold1 , CHold2
and the capacitors inside the upper offset cancellation block, ignoring some nonideal
behavior discussed below. This procedure also cancels the integrator's output offset. That is, when the process is finished, the outputs (iout+ and iout− in Fig. 3.12) of the integrator are zero.
While this procedure should exactly cancel the integrator’s offsets, it does not
because the charge on CHold1 and CHold2 is changed as SIN is lowered by charge
injection from M3’s and M4’s channel charge and by capacitance division between
Cgd of M3 and M4 and the hold capacitors. To alleviate these problems dummy switches (MD3 and MD4), connected to an inverted version of SIN, are connected to the hold capacitors.
If this procedure is done when other blocks’ outputs (except for other integra-
tors) are applied to the input of the integrator, the output offsets of those blocks are
also canceled. For example, suppose that the output of an amplifier is connected to
the input of the integrator. If the system had no offset cancellation ability, the output
offset of the amplifier would degrade the simulation in the same way, and to the same extent, that the input offset of the integrator would. However, if the output of the amplifier is connected to the input of the integrator when the cancellation scheme is executed, the output offset of the amplifier is nulled in the same fashion that the integrator's own input offset is.
Because charge is stored dynamically, it will leak, changing the voltage on CHold1 and CHold2, and the performance of the integrator will deteriorate. However,
since the leakage from CHold1 and CHold2 in one offset cancellation block should happen
at a similar rate to that in the other offset cancellation block, the common-mode
feedback of the circuit can maintain adequate performance of the integrators in this
dynamic mode for longer than if the circuit were single ended.
The dynamic scheme can reduce the circuit’s output resistance since a capaci-
tive feedback path from the output back to the gate of COMP4 exists, through the Cgd
of the composite device. Even if the composite device has infinite output resistance, the output resistance in this mode is limited to

$$R_o = \frac{C_{Hold2} + C_{gd}}{g_m C_{gd}} \qquad (3.39)$$
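To get a feel for the magnitude implied by Eq. 3.39, the short sketch below evaluates it numerically. The component values are illustrative assumptions, not values from the design:

    # Rough magnitude estimate of the output resistance limit in Eq. 3.39.
    # All three component values are illustrative assumptions.
    C_hold = 1e-12    # hold capacitor C_Hold2, assumed 1 pF
    C_gd = 10e-15     # composite device gate-drain capacitance, assumed 10 fF
    g_m = 20e-6       # composite device transconductance, assumed 20 uS

    R_o = (C_hold + C_gd) / (g_m * C_gd)   # Eq. 3.39
    print(f"R_o ~ {R_o / 1e6:.1f} Mohm")   # ~5 Mohm for these assumptions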
Care was taken in the layout of transistors M1 through M4, capacitors CHold1
and CHold2 and the wires carrying SIN and SIN to ensure that the coupling capac-
itances between SIN /SIN and nodes vGI− and vGO− were minimized. A guard ring
consisting of densely spaced vias from the substrate up to Metal-5 surrounds CHold1
and CHold2 . A grounded layer of Metal-3 separates the wires carrying SIN /SIN from
Overflow Detection
Figure 3.13: Overflow detection circuitry and the relevant portion of the integrator's core.
An analog computer will give erroneous results if any of its signals exceed the
range over which an individual block can accurately process them. For every circuit
except the integrator, the size of the output is uniquely determined by the size of the
input. By ensuring that the input is limited, it can be guaranteed that the output
On the other hand, there is not a 1-1 correspondence between the input signal
level and the output signal level for the integrator, owing to the integration operation.
Regardless of how judiciously one limits the size of the input, the output will still
saturate if a small input is applied for a long enough time. Therefore, the block that
is in greatest need of circuitry to detect saturation (or overflow) at its output is the
integrator.
The circuit that does this is shown in Fig. 3.13. Enclosed in the dotted box
is the part of the integrator’s core that is relevant to overflow detection. Enclosed in
the dashed line is the circuitry that detects the integrator’s output saturation. This
is labeled “OVFL Detection” in Fig. 3.4. The connection between the core and the
overflow detection circuitry comes through the wires labeled vS+ and vS− in both
The gates of transistors M0 through M3 (in the block labeled OVFL Det.) are at a fixed bias voltage, so a fixed reference current flows out of the diode-connected PMOS transistor M5. The current mirror
consisting of M5-M7 would mirror this current, divided by 12.5, if M6 and M7 are in saturation. The mirroring ratio of 1/12.5 occurs because the aspect ratio of M5 is 12.5 times that of M6 and M7. M4-ovfl has the same gate and source voltages as M4 in
the core of the integrator. If M4-ovfl is in saturation, it conducts 1/10th the current
that M4 does, since the former is 1/10th the width of the latter. For the purpose
of this discussion, the term “saturation current” refers to the approximate current a
given transistor would conduct if it were in saturation for its present vG , vS , and vB .
The signal OVFL goes high if either (or both) of the drains of M6 and M7 are at a voltage near VDD. This will occur if the saturation current of M4-ovfl or M12-ovfl is less than the current sourced by M6 or M7. This occurs when the saturation current of either M4-ovfl or M12-ovfl is less than 20 nA, which corresponds to either M4 or M12 conducting less than 200 nA, or 20 % of its
full scale range. M4 or M12 conducting this little current means that the signal has
reached 80 % of its full scale range, since the bias current for M4 and M12 is 1 µA.
When M4 or M12 is conducting only 200 nA, the other device is conducting 1800 nA,
and therefore the overflow circuitry detects when the output is nearing saturation in
both the positive and negative direction. The currents processed by this circuit are small, and the OVFL flag may not toggle precisely at 80 % of full scale; however, it is somewhat unimportant when exactly the flag is raised, so long as it is raised before the output actually saturates.
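The threshold arithmetic above can be captured in a few lines. The following is a behavioral sketch (the function name and structure are mine; the 1/10 replica ratio and the 20 nA threshold are from the text):

    # Behavioral sketch of the overflow detector's decision.
    def ovfl(i_m4, i_m12):
        """True when the detector should flag a near-saturated output."""
        sat_m4, sat_m12 = i_m4 / 10, i_m12 / 10   # replica device currents
        threshold = 20e-9                         # set by the 1/12.5 mirror
        return sat_m4 < threshold or sat_m12 < threshold

    # M4 and M12 share 2 uA of bias; 200 nA in one device = 80 % of range.
    print(ovfl(200e-9, 1800e-9))   # True:  output near saturation
    print(ovfl(1e-6, 1e-6))        # False: output near zero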
The digital output OVFL is latched into a scan chain that can be read from the chip after a simulation has finished. A signal indicating whether an overflow has occurred anywhere on the chip is also available.

As noted in the discussion of the core of the integrator, the integrator's time constant is set by the tuning current ITUNE. Two DACs make up the block labeled DAC in Fig. 3.4, which supply the core's two copies of ITUNE. The DAC takes as its reference a current of about 1 µA and a 10-bit digital word. From these, it generates two currents ranging from near 0 to 3 µA, which are ideally equal to one another.
Figure 3.14: Digital to analog converter used to generate tuning currents for the
integrator.
The operation of the DAC is much like an R-2-R ladder, in that at each stage
the current is divided in two, with half coming from the next stage and the other half
either coming from the output or from a dump node. However, here the elements are
transistors. Fig. 3.14 shows three bits of the structure. Signals IOU T and IIN refer
to the output and input, respectively, of the DAC and should not be confused with
similarly named signals in Fig. 3.4. The arrow has been omitted from the symbol
for the NMOS transistors used in Fig. 3.14. All transistors in the figure are NMOS
devices. For the time being, assume that each NMOS transistor is the same size (W/L), and that nodes IOUT and IDUMP are at voltages high enough to keep all devices connected to them in saturation. The rightmost series pair (M13 and M14), with gates connected to VDD, form an equivalent device of W/2L. Depending on the state of b2, one of the two pairs of devices that it controls is on. The pair that is on has the same gate and source voltage as M13/M14. Since
it is assumed that nodes IDU M P and IOU T are at a voltage high enough to keep all
devices connected to them in saturation, the current through the pairs of devices is
determined almost exclusively by their gate and source voltages. Therefore, from the
point of view of the current flowing into Node 3, the b2-controlled pair and M13/14
act like two devices in parallel, forming an equivalent device of 2W/2L which will,
in this application, behave like a device of W/L. Now this equivalent device is in
series with M16 (W/L) forming a device equivalent to W/2L. This analysis continues
until we see that the pairs controlled by b0 will form a device of W/2L in “parallel”
with a collection of transistors to the right of node 1, which, regardless of the state
of signals b1 and b2, form a device of W/2L. Hence, IIN is split in two, with half
flowing from the right and half flowing from either IOU T or IDU M P . Therefore, the
state of b0 determines if IIN /2 flows from IOU T or IDU M P . This splitting occurs for
each successive bit. Bit b1 determines if IIN/4 flows from IOUT or IDUMP, and b2 determines if IIN/8 flows from IOUT or IDUMP. Two copies of this structure make up the block labeled DAC (Fig. 3.4), whose two outputs are applied to the integrator core as the two copies of ITUNE.
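A behavioral model makes the bit-by-bit splitting concrete. The sketch below assumes ideal, mismatch-free splitting (the deliberate skew added in the actual design is discussed next):

    # Ideal behavioral model of the transistor-ladder DAC: at each stage
    # the incoming current splits in half; the bit steers one half to
    # IOUT or IDUMP.  b0 is the most significant bit, as in the text.
    def dac_out(bits, i_in=1e-6):
        i_out, share = 0.0, i_in / 2
        for b in bits:          # b0 steers IIN/2, b1 steers IIN/4, ...
            if b:
                i_out += share
            share /= 2
        return i_out

    word = 512
    bits = [(word >> (9 - k)) & 1 for k in range(10)]   # b0 = MSB
    print(dac_out(bits))   # 5e-07: b0 alone steers IIN/2 to the output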
The actual DAC used differs somewhat from that described above in that the series
devices (those whose gates are always connected to VDD ; M15 and M16 in Fig. 3.14)
are slightly shorter than the shunt devices (M1-M14). This has the effect of skewing the DAC's IOUT vs. DAC word transfer characteristic, allowing it to be non-monotonic.

Figure 3.15: Iout vs. DAC word. Ideal and two nonideal characteristics.

When
the input of a 10-bit R-2-R DAC is 511, b0, the most significant bit is low, and the
9 less significant bits are high. That is, the b0 bit is directing its current (IIN /2)
from the “dump” node, while the others are directing their currents from the “out”
node. When the input is incremented to 512, b0 is high, and b1-b9 are low. Now, b0
directs its current from the “out” node, while the others direct their currents from
the “dump” node. In the absence of mismatch, the output current for an input of 512 is one step (IIN/1024) larger than the current for an input of 511. Fig. 3.15 A shows
this ideal case. However, if M15 is longer (less conductive) than the other transistors,
the characteristic in Fig. 3.15 B results. As shown in the figure, this larger step in
the IOU T Vs. DAC word characteristic means that a range of outputs cannot be
generated. In the context of the integrator this would mean that the integrator could
not be tuned to a range of time constants. On the other hand, if M15 is shorter than the other transistors the characteristic in Fig. 3.15 C results, which is non-monotonic. Ordinarily this would be a flaw, but here it is not, since the calibration scheme for the integrator measures the time constant vs. DAC word characteristic and stores the results in a look-up table. When a particular time constant for the integrator is desired, the DAC word that gives the time constant closest to the desired one is selected. This scheme does not rely on the measured time constants being in any particular order. Note that no range of unrealizable values of IOUT or time constants results from M15 being shorter than the rest. Since there will inevitably be mismatch, and mismatch in one direction is less troublesome than in the other, the length of the series devices (M15 and M16) was chosen to be shorter than the length of the shunt devices, so that in the presence of mismatch no output levels become unrealizable.
The DAC used in the integrator is the 10-bit version of the 3-bit DAC in
Fig. 3.14.
A control bit (MULT), stored in the circuit's memory, determines whether the circuit behaves as a 2-input multiplier or a variable gain amplifier (VGA).

Figure 3.16: Top level diagram of the VGA / 2-input multiplier circuit.

The depiction of the internals of the core in Fig. 3.16 is for conceptual purposes only; the core does not contain separate circuitry for the VGA and multiplier functions.
The input signals of the circuit are first processed by range-selectable current
mirrors, which have the same gain settings as those in the input of the integrator. IB1
and IB2 are set to 20 µA when CM1/2 have gains of 20:1, 1 µA when CM1/2 have
gains of 1:1 and 100 nA when CM1/2 have gains of 1:10. Hence, the bias component of the signal applied to the core of the circuit at Port 1 is always 1 µA.

Figure 3.17: Core of the VGA / two-input multiplier circuit.
Port 2 operates in a somewhat different fashion. CM3/4 assume the same gains as CM1/2, but IB3 and IB4 are set to (IT/1 µA) · 20 µA, (IT/1 µA) · 1 µA and (IT/1 µA) · 100 nA, where IT is a current generated by the DAC. For all settings, the bias component of the signal
applied to the core of the circuit at Port 2 is IT . This scheme, and that used for
Port 1, allows for the circuit to process signals over a wide range while keeping small
the range of currents over which the devices in the core of the circuit must operate.
This is desirable since the core has devices that must remain weakly inverted, and
as discussed in Sect. 3.3.2 this range is limited. The core of the VGA/two-input
multiplier circuit [20] (Fig. 3.17) uses weakly-inverted MOSFETs (M1 through M10) in translinear loops in a fashion similar to that of the integrator. When the signal
MULT is low, the drains of M4 and M5 are connected together, and so are M6 and
M7; the circuit acts as a VGA with a gain of 2IT /1 µA from i1 to io assuming i2
is zero. This is depicted in Fig. 3.18. When MULT is high, the circuit becomes a two-input multiplier, implementing:

$$i_o^+ - i_o^- = \frac{(i_1^+ - i_1^-)(i_2^+ - i_2^-)}{1\ \mu A} \qquad (3.40)$$
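The core's two behaviors can be summarized with a small numerical model of Eq. 3.40 and the stated VGA gain. The function below is an illustrative abstraction, not a circuit simulation:

    # Illustrative model of the core, per Eq. 3.40 and the VGA gain 2*IT/1uA.
    I_UNIT = 1e-6   # the 1 uA normalizing current in Eq. 3.40

    def core(i1, i2, i_t, mult):
        """i1, i2: differential inputs; i_t: tuning current IT."""
        if mult:
            return i1 * i2 / I_UNIT        # multiplier, Eq. 3.40
        return (2 * i_t / I_UNIT) * i1     # VGA: gain of 2*IT / 1 uA

    print(core(0.2e-6, 0.5e-6, i_t=0, mult=True))     # 1e-07 A
    print(core(0.2e-6, 0.0, i_t=1e-6, mult=False))    # 4e-07 A (gain of 2)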
The output current mirrors can be set to gains of 10:1, 1:1 and 1:20. IB5/6 are set to (IT/1 µA) · 100 nA, (IT/1 µA) · 1 µA and (IT/1 µA) · 20 µA for the three ranges, respectively.
The gain of the VGA is set through a combination of tuning IT, for fine adjustment, and stepping the gains of the input and output mirrors, for coarse adjustment.
Fanout blocks are needed to apply copies of one signal to the input of several
blocks. The schematic of one is shown in Fig. 3.20. Device names beginning with
the letter “C” denote composite devices similar to that in Fig. 3.10. When the signal
HIGH is low, the circuit’s input range is 18 µA, transistors M1, M4, M8, M11 and
M13 are off, M3 and M10 are on, and the circuit behaves like a simple current mirror.
Fig. 3.21 shows a simplified schematic of the fanout block in this mode. Raising
signal HIGH, turns on M1, M4, M8, M11 and M13, and turns off M3 and M10.
This activates the source-follower stages (M2 and M9). Fig. 3.22 shows a simplified
schematic of the fanout block in this mode. These stages are used to shield the input
of the circuit from the large gate capacitance of the four composite devices on each
Figure 3.21: Schematic of the fanout circuit in its largest signal range.
side of the circuit. M2 and M9 are much smaller in gate area than the composite
devices. When the input range is not its highest, the input resistance of the circuit
increases such that putting it in parallel with the large gate capacitance would slow the circuit's response.

Figure 3.22: Schematic of the fanout circuit in its smallest and middle signal ranges.
If composite devices CN1 to CN8 and CP1 to CP8 are all in the same configuration as one another, the circuit has a gain from input to each output of 1. By
configuring input composite devices differently from output composite devices non-
unity gain can be achieved. The composite devices are controlled in groups of four.
These groups are: CN4, CP4, CN5, and CP5; CN1, CP1, CN6, and CP6; CN2, CP2,
CN7, and CP7; and CN3, CP3, CN8, and CP8. Within each quartet, all composite
devices have the same configuration, however, each quartet may have a different con-
figuration than the others. Gain is possible since the gate-source voltage of the input
composite device (CN4) is the same as each of the output devices (CN1, CN2 and
CN3), but their W/L ratios may be different and assuming the voltage at the outputs
is high enough to keep the devices in saturation, current is proportional to the aspect
ratio of the composite device. The possible range and gain settings are summarized in Table 3.3, with “l” denoting the lowest signal range, “m” denoting the medium range and “h” the highest.
3.3.5 Exponential
Figure 3.23: Schematic of the exponential circuit, with a differential to single-ended current mirror at the input and a single-ended to differential current mirror at the output.

The schematic of the exponential circuit is shown in Fig. 3.23. The input-to-output behavior of the circuit is given by:
$$i_o^+ - i_o^- = 2 I_1 \exp\!\left(\frac{R\,(i_{in}^+ - i_{in}^-)}{n\,\phi_t}\right) \qquad (3.41)$$
where φt is the thermal voltage and n is the diodes’ slope factor, assuming that
the diodes match, and that the two current sources carrying I2 , which keep a min-
imum current flowing through M2, exactly cancel. The diodes are formed by the
source/drain to well junction of a PMOS transistor. Amplifier offsets scale the out-
put by a multiplicative factor only, and their effect can be canceled through adjusting I1.

The single-ended to differential conversion at the output of the circuit is performed by a class-AB circuit. Class-AB mirrors have the following advantages over a class-A mirror,
assuming that the class-A circuit’s bias has been selected to handle the largest signal
expected.
• lower power dissipation, both when no signal is applied, and when the largest
signal is applied.
• lower output current noise for small signals, since the current through the active
devices is smaller.
• graceful degradation should the input exceed the maximum expected signal.
Most discussion of class-A versus class-AB circuits centers around the first advantage, so the effect on offsets deserves elaboration. Consider a class-A mirror intended to handle a maximum current of IMAX. To do this, the devices are biased with that current, at least. Assume that the
input and output devices’ aspect ratio match one another’s to 1 % and the input and
output bias sources match exactly. Assume also that the only source of drain current
mismatch is mismatch in their aspect ratios. When no input signal is present, the
output offset of this mirror will be 0.01 × IM AX . Now, consider a class-AB mirror
intended to handle the same maximum current of IM AX , and assume that input and
output devices match in the same fashion. In this case, when there is no input present,
the devices are conducting a much smaller current, IST AN D . Now the output offset will
be 0.01 × ISTAND, which is smaller than for the class-A mirror since ISTAND ≪ IMAX.
Similar arguments can be made for the way in which output currents of a mirror will
match one another in the case of a mirror with multiple outputs. Offsets due to
mismatches between the threshold voltages of devices are also smaller for class-AB mirrors. The drawbacks of class-AB mirrors are their greater complexity and their potential for instability. The former stems from the more complicated biasing network needed to establish the minimum current and the latter stems from a combination of there being more stages in the feedback loop and the wider range of loop gain over which stability must be guaranteed. Even a simple, class-A mirror contains a feedback loop: the input current charges the diode-connected node, and the resulting
voltage is applied across the gate and source terminals of the device and induces a
current to flow from drain to source. However, this loop is stable for any impedance
connected to the diode node, with positive real part, assuming quasi-static operation
of the devices. This assumption is valid for all but the highest frequencies of operation.
On the other hand, the greater complexity of a class-AB circuit can lead to
instability. The schematic of the class-AB mirror used at the output of the exponential
circuit is shown in Fig. 3.24. It is based on a class-AB output stage, but with multiple outputs. One output, ioutfb, closes the feedback loop, and whatever current flows in or out of ioutfb is copied to the other outputs.

Figure 3.24: Schematic of the class-AB mirror used in the exponential circuit.

With no signal applied, devices M1 through M20 are biased with a current of ISTAND. Discussion will focus on the output
devices M9-M12, since they control the output current that is fed back to node vIN .
When vIN is increased, M9 and M10 conduct less current and M11 and M12 conduct
more. The range of output current is much larger than IST AN D . However, for the
circuit to sink a large output current, the input signal must propagate through C, M31,
M21 and M4. This path can have enough excess phase to make the circuit unstable
when connected as a mirror, even with the feed-forward capacitor C. The circuit was
stabilized by adding a grounded capacitor to the vIN/ioutfb node in Fig. 3.24.
3.3.6 Logarithm
The logarithm circuit (Fig 3.25) uses a class-AB current mirror at its input, similar
to that used at the output of the exponential circuit, that converts the differential
input to a single-ended signal. This current is forced through a diode formed by the source/drain-to-well junction of a PMOS transistor (M1). A transconductor, shown in Fig. 3.26, compares the voltage at the source/drain of M1 to the voltage at
the source/drain of M2 and generates a differential output current. The reference cur-
rent (IREF ) through M2 determines the input current that gives zero output current.
$$i_o^+ - i_o^- = G_m\, n\,\phi_t \log\!\left(\frac{i_{in}^+ - i_{in}^-}{I_{REF}}\right) \qquad (3.42)$$
The transconductor is similar to [21], with input devices M11 and M19 biased
in the triode region. Cross-coupled devices M10 and M18 cancel the charge injected
through the gate-drain capacitance of M11 and M19 and improve the circuit's high-frequency response. The transconductance is

$$G_m = \frac{\partial(i_o^+ - i_o^-)}{\partial(v_{IN}^+ - v_{IN}^-)} = \mu C_{OX}\frac{W}{L} V_{DS} \approx \frac{I_{DSM8}}{(V_{GS} - V_T)_0} \qquad (3.43)$$
where VT is the threshold voltage of the input devices, VGS and VDS apply to the
input devices when the differential input voltage is zero, and µ, W , L and COX are
parameters of the input devices. M1 through M7 generate the bias voltage for M8
and M16. This bias is not signal dependent, despite the gate of M5 being connected to vIN−, since this input is tied to a DC voltage in the logarithm circuit. The current through M8 will be the difference between that flowing through M6 and M1.
Eq. 3.42 is only valid for positive values of iin+ − iin−. Whereas the logarithm function has a definition for negative (and complex) arguments, this logarithm circuit will saturate as its input is reduced, before the input reaches zero. The value to which the output saturates depends on the circuit's nonidealities.
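Behavioral models of Eqs. 3.41 and 3.42 make the complementary nature of the two blocks explicit. Every constant below is an assumption, chosen here so that R·Gm = 1:

    import math

    # Behavioral models of the exponential (Eq. 3.41) and logarithm
    # (Eq. 3.42) blocks.  All component values are assumed.
    PHI_T = 0.0259    # thermal voltage, V
    N = 1.3           # diode slope factor (assumed)
    R = 100e3         # exponential block's resistor (assumed)
    I1 = 1e-6         # exponential block's bias current (assumed)
    GM = 10e-6        # logarithm transconductance Gm (assumed)
    IREF = 1e-6       # logarithm reference current (assumed)

    def exp_block(i_diff):
        return 2 * I1 * math.exp(R * i_diff / (N * PHI_T))    # Eq. 3.41

    def log_block(i_diff):                                    # i_diff > 0
        return GM * N * PHI_T * math.log(i_diff / IREF)       # Eq. 3.42

    # With R*GM = 1 the cascade reduces to 2*I1*(x/IREF):
    print(exp_block(log_block(0.2e-6)))   # 4e-07 A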
Figure 3.27: Nonlinear functions that can be implemented with one programmable
nonlinear block.
There are four programmable nonlinear blocks within each Macroblock. Each
can implement the following where x is the input and y is the output:
• sign: y = c1 for x < 0 and y = c2 for x > 0, where c1 and c2 are constants, programmable for each pair of blocks.
• saturation: y follows x between a lower and an upper limit and is clamped to the constants c1 and c2 below and above those limits, where c1 and c2 are constants, programmable for each pair of blocks. (Fig. 3.27 C)
Two neighboring blocks can be used together to implement the following, where x1 and x2 are the inputs and y1, y2 and y3 are the outputs:

• Minimum and maximum: y1 and y2 implement the minimum and maximum of x1 and x2 (Fig 3.28); y3 = c1 when x2 > 0 and y3 = c2 when x2 < 0.
• Sample and hold: y1 = x1* when x2 > 0; y1 = 0 when x2 < 0; x1* is the value of x1 when x2 last crossed zero; y2 implements one of the functions from the single block list; y3 = c1 when x2 > 0 and y3 = c2 when x2 < 0.
• Track and hold: y1 = x1* when x2 > 0; y1 = x1 when x2 < 0; x1* is the value of x1 when x2 last crossed zero; y2 implements one of the functions from the single block list; y3 = c1 when x2 > 0 and y3 = c2 when x2 < 0.
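In software terms these functions behave as in the sketch below; the hold-state handling is an illustrative analogue of the sampled current mirror described later in this section:

    # Software analogues of the nonlinear-block functions.  c1, c2 and the
    # limits are the programmable constants; the class mimics the sampled
    # mirror that holds x1.
    def sign_fn(x, c1, c2):
        return c1 if x < 0 else c2

    def saturation(x, lo, hi):
        return min(max(x, lo), hi)

    class TrackAndHold:
        def __init__(self):
            self.held = 0.0
        def step(self, x1, x2):
            if x2 < 0:
                self.held = x1    # track while x2 < 0
                return x1
            return self.held      # hold the last tracked value

    th = TrackAndHold()
    print([th.step(*p) for p in [(0.1, -1), (0.3, -1), (0.7, 1)]])
    # [0.1, 0.3, 0.3]: the output freezes once x2 goes positive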
The programmable nonlinear blocks are built from current-to-voltage (I to V) converters, comparators, switches and some combinational logic (Fig. 3.32). Two threshold voltages are generated by DACs which are similar to those used in the integrator. These voltages are compared against those corresponding to the input current. The outputs of the comparators are processed by the combinational logic to determine when the various switches should open and close. Wherever the block's transfer
characteristic has an abrupt slope change, the circuit is switching internally. For
example, when the input signal to a block implementing the saturation function is
between the lower and upper saturation limits a copy of the input signal is connected
to the output. When the input level reaches the upper saturation limit, the input
copy is disconnected from the output and replaced by a copy of the upper saturation
limit. Mismatch between the input and its copy, and between the upper limit and its copy, introduces errors in the realized characteristic. To limit this, an attempt was made to select sizes for critical devices that limit mismatch in characteristics to 1 %.
For the dual block functions, there are more comparators, more combinational
logic, and a mirror in which the gates of the output devices are separated from the
gates of the input devices by switches, allowing for the input current at particular
times to be sampled.
As mentioned in Sect. 3.1, connections between blocks are made through complementary pass-transistor switches (Fig. 3.33) whose state is stored in an adjacent SRAM cell. Fig. 3.33 shows a switch together with the SRAM cell which holds its state. Programming the cell, by raising wordline while bit is pulled to ground, closes the switch. Because the input voltage of blocks is near VDD/2, complementary transistors help keep the on-resistance of
the switch low. M7 and M8 have W = 0.9 µm and L = 0.24 µm and M9 and M10 have
W = 1.5 µm and L = 0.24 µm. These dimensions are a compromise between very
narrow devices which would reduce the capacitive loading on the inputs and outputs
of blocks and wider devices which have lower on-resistance. It is important to realize
that each path from the output of one block to the input of another is loaded by
almost 90 switches, each containing 2.4 µm of drain diffusion. Signals that must be routed from one macroblock to another are loaded by an additional 160 switches.
Simple complementary switches are used where collections of input wires cross collections of output wires. However, between the output of a given block and its connection to the wire that enters the switching matrix is the circuit in Fig. 3.34. An SRAM cell determines whether the block's output wire is connected to iout+ or to iout− (Fig. 3.34). The latter is done to invert the signal. The signal close is
a control signal that, when raised, connects the switch, with the polarity determined
by the SRAM-cell’s state. For reasons outlined in Sect. 3.4.3, the outputs of the
blocks are connected and disconnected from the switch arrays at various times during a simulation.
The SRAM cells are programmed using a standard scheme using clock (CLK)
and write enable (WR EN) signals. A wide fan-in NOR-based word line decoder converts 6 address bits to up to 64 word lines. When CLK is high, every bit and bit line is pre-charged to VDD. When CLK is lowered, if the particular column has been addressed, its word line is raised and the addressed cells are written.
The functional blocks are organized within the macroblock as shown in Fig. 3.2.
Within the macroblock, input wires for the blocks in a given row run above the row
of blocks with the exception of the wires for the row of nonlinear blocks, which run
below the row. This is different from how it is depicted in Fig. 3.2, as the latter is
intended only for conceptual purposes. Most output wires run to the right of a given
column of blocks with the exception of the wires connected to the outputs of the
nonlinear blocks. The three outputs for the pairs of nonlinear blocks (not including the log/exponential blocks) run in the column in between the pair. From bottom to
top, this scheme leads to the following count of input wires (pairs) in each row: 5, 10,
5, 11 since integrators each have one input, VGA/2-input multipliers each have two
inputs, fanouts each have one input and the nonlinear blocks each have one input.
From left to right the columns have the following numbers of output wires: 11, 8, 11,
8, 10. There are a total of 31 input wires and 48 output wires. Where each group of
input wires crosses a group of output wires, there is an array of complementary pass
transistor switches and SRAM cells which hold the state of the CMOS switches. A 3 by 3 array of these switches is shown in Fig. 3.35. To simplify the schematic, signals
in the figure are single-ended and switches are only NMOS. A schematic of the actual
SRAM-cell/switch used is in Fig. 3.33. The bit terminal of all of the cells (Fig. 3.33)
in a given row are connected together, as are the bit terminals. Collectively, all of
the signals that connect bit and bit terminals are known as bit lines. The wordline terminals of the cells in a given column are likewise connected together. The memory is written much as a conventional SRAM, with the notable difference between this and other memory applications that this is a write-only memory, as there is no need to read its state. Word lines run vertically, parallel to the wires connected to the outputs of blocks, and bit lines run horizontally, parallel to the wires connected to the inputs of blocks (Fig. 3.35). In terms of physical
layers, the horizontal input wires run on metal-5 (M5), output wires on M4, and the word and bit lines on lower metal layers. The programming sequence is:

• With all word lines low, all bit lines are pre-charged to VDD whenever the clock signal is high.
• When the clock signal falls, the switch's word line is raised, closing M5 and M6 (Fig. 3.33).
• Also when the clock signal falls, either the bit signal or its complement is pulled to ground, depending on the desired switch state.
• When the clock signal rises, the word line signal falls, disconnecting the cell from the bit lines.
There are 31 inputs within a macroblock and 31 pairs of bit lines running parallel.
The state of the switches is programmed by an 8-bit word, meaning that the 31 bit
lines are divided into 4 groups (3 groups of 8 and 1 group of 7), selected by two
address bits. The footer circuitry is such that writing a 0 to the cell pulls the bit side
low and closes the switch. Each word line corresponds to a particular output and typically each output is connected to only one input. This means that only one of the eight bits in a given programming word is typically zero.
If the bit lines ran parallel to the outputs and multiple outputs in the same
group of 8 were connected to one input, more than one bit in the eight bit word
would be a zero. To properly generate the programming data for the memory, multiple connections sharing a group would then have to be merged into a single word. In the scheme that is used, multiple outputs connected to one input requires that multiple word lines go high in separate programming words. This means that only one connection needs to be considered per programming word.
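Under stated assumptions about the bit ordering within a group (which the text does not specify), generating the programming words for a set of connections might look like the following sketch:

    # Hypothetical generator of switch-programming words: 31 bit-line
    # pairs in four groups (three of 8, one of 7) selected by two address
    # bits; a 0 bit closes a switch.  A connection is
    # (output word line, input bit line).
    def program_words(connections):
        words = []
        for out_line, in_line in connections:
            group, offset = divmod(in_line, 8)
            data = 0xFF & ~(1 << offset)   # a single zero closes one switch
            words.append((out_line, group, data))
        return words

    # Two outputs driving the same input take two separate words:
    for w in program_words([(3, 5), (9, 5)]):
        print(w)   # (word line, bit-line group, 8-bit data)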
Wherever a group of input wires (numbering 5, 10, or 11) crosses a group of output wires (numbering 8, 10, or 11) there is an array of switches, making for 9 possible array sizes. To reduce the number of blocks needing to be laid out, only
three array sizes were used: 5 by 11, 10 by 11 and 11 by 11, where the first number is
the number of rows (i.e. number of input wires to connect) and the second number is
the number of columns (i.e. number of output wires to connect). In several instances,
only 8 columns are needed, meaning that three columns are unused. Of 55 columns of cells in the memory arrays, only 48 are needed. The improper connection of the
unused columns of switches led to the most significant design flaw on the chip, which
is described below.
In addition to the arrays of switches wherever groups of input wires cross groups
of output wires, there are more complicated switches (Fig. 3.34) that separate each
output from its vertical wire in the grid. Each of these output switches can connect
its input side to its output side directly, or with cross-coupling. This allows signals to
be inverted, and possibly subtracted from others. These switches take a signal, close,
which, when high, closes the switch with the polarity determined by the state of the
SRAM cell. When close is low, the switch is open. These switches are connected such
that their word line is connected to the word line associated with the column in the
switch arrays corresponding to their output wire, so that the output switch can be set
whenever a connection is made. The bit lines of these output switches are connected
together and are controlled by a separate footer circuit. The complete programming
word is 9-bits, consisting of the 8-bits discussed above and the single bit that controls
the polarity of the output’s connection. A signal passes through two switches to get
from the output of one block to the input of another when the two blocks are in the
same macroblock.
The input and output wires within each macroblock extend outside the mac-
roblock, allowing for connections between blocks in different macroblocks. Below each
row of macroblocks are 16 pairs of wires over which the output wires of blocks within
the macroblocks extend. Each output can be connected to any of these 16 wires
through a 16-row, 48-column array of switches and memory. Beside each column of
macroblocks are 16 pairs of wires over which the input wires of blocks within the
macroblocks extend. Each input can be connected to any of these 16 wires through
a 31 row, 16 column array of switches and memory. Wherever the groups of 16 horizontal wires cross the groups of 16 vertical wires there is a 16 by 16 array of switches and memory. In total, the external switches span 64 columns and 47 rows. Together, they can be thought of as a 47 (31+16) row, 64 (48+16) column array, with a section (31 rows, 48 columns) removed. The thick, dashed line in Fig. 3.3 shows how a signal is routed from the output of a block in one macroblock to the input of a block in another. The 'X's indicate three switches that are closed to allow this connection. The signal must also pass through the output switch associated with the block inside macroblock W.
Each end of the vertical and horizontal wires between macroblocks can be
connected off chip. The horizontal wires are used to route signals from blocks on-chip
to off-chip. At each end of the 16 wires are 16 switches. Adjacent switches have
their output sides connected together. Each switch’s input side is connected to one
of the 16 wires. This scheme lets each pair of switches act as a 2:1 multiplexer. The
8 output sides are connected off-chip. Likewise, there are groups of 16 switches at
the top and bottom of the groups of vertical wires which allow connections from off-
chip to be made to the inputs of the chip’s blocks. These are connected such that
adjacent switches act as 1:2 demultiplexers, with input sides connected together and output sides connected to two of the vertical wires.
Two choices are possible for how the word lines could be controlled. In the first, all
55 word lines could be addressed and controlled, while in the other, the 7 word lines
connected to the extra, unused columns of cells could be disabled, leaving only 48
word lines. Both schemes require 6 bits to address, however, circuits for an earlier chip
that had 48 word lines had been designed and were used; thus the latter scheme was
used. The cells in the unused columns of switches/SRAM cells were each connected as follows:
• The input side of the switch was connected to the input wire, and hence to the input of a block.
• The output side of the switch was connected to ground.
• The bit lines of the SRAM cells were connected to the bit lines used for the other columns, but the word lines of the unused columns were disabled.
This approach has the flaw that the SRAM cells in the unused columns cannot
be altered. While it is not necessary to program these cells to close switches, this
scheme doesn’t allow them to be reset properly when the chip is powered on. An
SRAM-cell consists of two back-to-back inverters (M1/M2 and M3/M4 in Fig. 3.33),
and is a bistable system. If the circuit had perfect symmetry, the state into which a cell settles at power-on would be determined by noise, with a 0.5 probability for each state. If an unused SRAM cell/switch powers
up (when the chip is powered on) in the closed state, the input wire connected to the
switch is shorted to ground since the output side of the switch is connected to ground.
With 7 unused switches connected to each input wire, all 7 SRAM cells would have
to power up in the OFF state for an input to not be shorted to ground. If the two
states of a given SRAM cell were equally likely, this would occur with a probability of (1/2)^7 = 1/128. Clearly, if the power-on behaviour of the memory were this unfavourable, almost no block would be functional, since there would be a 127/128 probability that its input was shorted to ground.
In HSPICE and Nanosim simulations, all blocks were functional, and this error
was not detected. This suggests that some electrical condition exists that favours the
OFF state over the ON state when the chip is powered on. The capacitance between
the input/output wires of the switches and the bit side of the SRAM cell is larger than that to the complementary side. On power-on the inputs of the blocks will
start to charge up to a voltage determined by the input current mirrors of the block,
assuming that they are not shorted to ground by an SRAM cell. It is speculated that
this increase in voltage is capacitively coupled more strongly to the bit side and nudges it sufficiently that the bit side reaches VDD and the complementary side is pushed to gnd. The
coupling is stronger because the PMOS devices are wider than the NMOS devices.
For this trend to hold up in practice, the actual waveforms on start-up would have
to be the same as those in simulation, and this nudge needs to be larger than the
random noise in the system, absent from HSPICE and Nanosim simulations, which might offset this push. The prevalence of this problem and solutions to it are discussed
in Chapter 5.
Every functional block, with the exception of the logarithm blocks, has some memory
for holding range settings, DAC input words, and other parameters. The memory of
each block is a collection of level-sensitive gated latches. Input data come from the
same data lines as are used to program the switch memory. The output of a 6-input AND gate serves as the write enable for each block's latches. Five address lines, and their complements, are routed around each macroblock. Five inputs of the AND gate are connected to either the true or complement polarity of each address signal, while the sixth input is connected to a control signal, which is toggled to latch in the data.
This section describes control of the chip after the states of the switches and the block memory have been programmed. The following signals are involved in simulation
control:
• CAP CON: This corresponds to the signal VCAP in Fig. 3.5. During the preamble to a simulation, this signal is pulsed high to reset the integration capacitors.
• SIM: All of the close signals associated with the output switches for the integrators are connected to this signal. During the preamble to a simulation, this signal is low, and it is raised to start the simulation.
• CON INP: All of the close signals associated with the output switches for every block except the integrators are connected to this signal. During the preamble it is raised before offset cancellation begins, so that the connected blocks' offsets are cancelled as well.
• S IN: This corresponds to the signal V2 in Fig. 3.4. This connects the integrator in unity-gain feedback for offset cancellation.
Typical waveforms for the four simulation control signals are shown in Fig. 3.36.
• The signal CAP CON is raised T8 after CON INP falls, resetting the integrators.
• T1 is the duration over which CAP CON is high. This must be long enough to
reset the integrators. The time constant for resetting them is much smaller than
their integration time constants, since the resistance through which each capac-
itor is charged or discharged is much smaller than the 1/gm of the transistors in the core. T1 could therefore be short, but is likely longer than that because it is during this interval that the chip is programmed.
• T2 is the interval between CAP CON falling and CON IN P being raised. The
requirement on T2 is T2 ≥ 0.
• T3 is the interval between CON INP being raised and S IN being raised. The requirement on T3 is T3 ≥ 0.
• T4 is the duration of the offset cancellation procedure. During this interval, the
integrator may have drifted. T4 needs to be longer than a few time constants
of the integrator with the slowest time constant. Integrators have a nominal
time constant of 40 µs. If the output of the integrator had drifted (because of
an input offset) to the point of saturating the integrator to its fullscale output,
it would take 7 time constants for its output (and input since it is in unity-
gain feedback) to reach < 0.001 of full scale. For integrators with nominal
time constant, T4 ≥ 280 µs. This would reduce the input offset to be no more
than 0.1 % of fullscale. One could wait longer; however, the nonidealities of the
scheme (charge injection and capacitance division) prevent the offset from being
smaller than about 0.1 %. Lengthening T4 does not improve offset cancellation.
• T5 is the interval of time between the end of the offset cancellation phase and
the connection of the output of the integrators to the rest of the system. Ideally,
this interval of time should be small; this prevents the outputs of the integrators from drifting appreciably before they are connected.
• T6 is the duration of the actual simulation. Shortly after SIM is raised, the
inputs of the system are applied and the outputs are observed. In the case of
systems that are nominally low pass, it may be prudent to delay applying the
inputs to the system so that the response of the system to residual offsets can
be observed, allowing this effect to be subtracted out from the total response
of the system.
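As a check on the factor of seven quoted for T4, the settling requirement can be written out explicitly (a one-line restatement of exponential decay in unity-gain feedback):

$$e^{-T_4/\tau} < 10^{-3} \;\Longleftrightarrow\; T_4 > \tau \ln 10^{3} \approx 6.9\,\tau,$$

which, for the nominal τ = 40 µs, gives T4 ≥ 7τ = 280 µs, as stated above.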
Chapter 4
4.1 Overview
A brief overview of the analog computation environment can be found in Sect. 1.1
and in Fig. 1.1. Critical aspects of the computation environment are to allow a user to conveniently describe a system, program it onto the chip, and apply inputs and measure outputs. The rest of this chapter describes the implemented system. Suggestions for improvements are noted where appropriate.
In the current environment a user controls the analog computer in the following way:
• The user draws a block diagram in Matlab’s Simulink. Simulink creates a model
• Matlab programs interpret the “.mdl” file that Simulink produces and generate the data needed to program the chip.
• Matlab programs, via a data acquisition card and some interface circuits on a
PCB, program the connectivity and various parameters of the analog blocks.
• The data acquisition card generates inputs and measures outputs for the chip.
A user selects blocks from a separate Analog Computer Library, which have the appearance of standard Simulink blocks; they have the necessary behavioural workings to allow them to be used in Simulink
simulations. This is useful for the purpose of characterizing the chip, since a user
can draw a block diagram, simulate it using both Simulink and the analog computer,
and compare the results. To enable the simulation in Simulink, the Sum block must
be used to perform addition rather than simply connecting outputs together, since Simulink does not permit outputs to be wired directly together. To run the simulation on the AC, the user (at present) must explicitly use the Fanout block to
enable the output of one block to be connected to the inputs of multiple blocks. Each
Simulink block is mapped to its on-chip counterpart through its instance number.
• If there are N blocks of a particular type of block in each macroblock, the mac-
roblock in the top, left (0th row and 0th column) of the chip contains instances
0 to N − 1.
• Column indices increase moving from the left side of the chip to the right side.
• Row indices increase moving from the top side of the chip to the bottom.
• Block instances are counted by moving across the row before moving to the next
row.
For example, Fanout120 (there are 10 Fanout blocks per macroblock) is the
0th Fanout block in the macroblock in the 3rd row and 0th column (bottom, left
macroblock).
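The numbering rules translate directly into code. The 4-column arrangement of macroblocks in the sketch below is an assumption consistent with the Fanout120 example:

    # Mapping an instance number to its on-chip position, per the rules
    # above.  The 4-column macroblock array is an assumption.
    COLS = 4

    def locate(instance, blocks_per_macroblock):
        mb, idx = divmod(instance, blocks_per_macroblock)
        row, col = divmod(mb, COLS)
        return row, col, idx

    print(locate(120, 10))   # (3, 0, 0): 0th Fanout, row 3, column 0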
Matlab programs generate the necessary bit stream to program the connectivity
of the blocks and the parameters within each block. They remove the Sum blocks, replacing them with direct connections. The data acquisition card has eight analog inputs and two analog outputs with 12-bit resolution and 8 digital I/Os. Input sampling occurs at a max-
imum rate of 1.25 MSamples/s. However, the per-channel rate is, at most, 1.25
MSample/s divided by the number of active channels. When multiple input channels
are active, sampling takes place in a round-robin fashion, meaning that samples from different channels are not taken simultaneously. The analog outputs, in contrast, are updated on both channels, synchronously. The card is connected to the chip's printed circuit board.
The small number of digital outputs (8) of the data acquisition card, and
the large number of digital inputs of the chip (> 32) requires the use of a serial-to-parallel interface built from a shift register chain. Digital data (a<1:15>, d<0:15>, RST, WR EN) are clocked into the shift
register chain. The data acquisition card’s digital outputs are used for the following:
• a < 0 >: This is the a < 0 > signal for the chip’s block memory. See Chapter 3.
• S IN and S OUT: The signal for the offset cancellation scheme in the integrator is actually composed of two signals, one of which controls M3 and the other M4.
• close signals for the switch grids.
The signal SIM is controlled by one of the data acquisition card's analog outputs, and the other analog output generates VIN, which serves as the forcing function for the system
being simulated. Typical waveforms for the control signals are found in Fig. 3.36.
The chip's PCB has eight LM13700 transconductor chips, each of which has two transconductors. The output of each pair of transconductors is connected to an input port of the chip. The input to the transconductor-pair
can be connected to VIN , to ground, or left open (Fig. 4.1). Variable resistors allow
The two polarities of output current from seven output ports are converted to voltages by transresistance amplifiers. Seven inputs of the data acquisition card measure these differential voltages. The 8th input of the card can measure the output current of a block's DAC.
4.2.1 Calibration
Before a simulation has been run with a particular chip, the time constants of in-
tegrators and the gains of VGAs are measured for each integrator and VGA for a
finely spaced sequence of DAC settings, and the results are stored for future recall in
look-up tables. When a simulation is run, the gains of VGAs and the time constants
of integrators are set. When a particular gain or time constant is programmed, the DAC word whose measured value is closest to the request is retrieved from the look-up table. It was noted in Chapter 3 that the DACs were designed to be
non-monotonic. However, the amount by which the series transistors were scaled in-
troduces non-monotonicity only every 8 levels. For example, the DAC’s current may
be smaller for word 32 than 31, but its output increases monotonically between 32
and 39. For this reason, the behaviour for the VGAs and integrators was measured
for DAC words in the sequence 7, 8, 15, 16, 23, 24, . . . . Algorithms assume that both the τ of integrators and the gain of VGAs are linear over each 7-step range between measured words. Any deviation of the realized gain or time constant is included in the model, allowing a user to correct for it. That is, if a user wants a multiplier that implements y = x1 · x2 but the instantiated model implements y = (1/2)(x1 · x2), the Simulink block will indicate this, allowing the user to follow the block with a gain of 2.
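The look-up-table selection step is a nearest-value search, which is all the non-monotonic characteristic requires. The table below is made up for illustration:

    # Nearest-value look-up, as used when programming a time constant.
    # Keys are DAC words measured during calibration; values are the
    # measured time constants.  The data are fabricated for illustration.
    measured = {8: 42e-6, 15: 39e-6, 16: 44e-6, 23: 41e-6}

    def word_for_tau(tau_target, table):
        return min(table, key=lambda w: abs(table[w] - tau_target))

    print(word_for_tau(40.5e-6, measured))   # 23: closest measured tau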
Chapter 5
Circuit Measurements
5.1 Integrator
The various configurations used to test the integrators are shown in Fig. 5.1. These
diagrams are drawn taking into account that the blocks are current-mode. Therefore,
the connection of two lines at the input of a block indicates that two signals are added
and applied to the input. For example, in set-up E an output of F1 and an output
of F2 are added and applied to the input of the integrator under test (IUT). The “-”
sign below the feedback path in set-ups C, E and F indicates that the connection was
made with negative polarity. That is, in set-up E, the output of F2 is inverted before
being added to the output of F1 . Because the blocks are current-mode, the input of
the IUT in set-up B is not floating; it has a signal of 0 applied to it, analogous to
a voltage mode input being grounded. Blocks ADC and DAC are part of the data
acquisition card used in the test set-up. Transresistance amplifiers (TRA) are needed
because the ADC measures voltage while the chip’s output signals are currents. The
circuits on chip are all differential, meaning that there are actually two TRAs and
the ADC measures the difference between their two outputs. Similarly, there are
two transconductors connected to the DAC, applying opposite currents to the block's differential inputs. In set-ups C, E and F the IUT is connected in feedback. The input of F1 and F2's middle output are set on the high range. The
middle output of F1 and input of F2 are set on the same range as the integrator. In
set-ups C and E the bottom output of F2 is set on the same range as the IUT while
in set-up F it is set on the high range. The bottom output of F1 is set on the high range. This scheme allows the integrator to be tested over its different signal ranges,
while the test equipment always applies and measures large signals corresponding to
the high range. This is important because the off-chip capacitances are larger than
those on chip. Loading the input of a block with such a large capacitance can slow
the block’s response when the input is set to one of the lower signal ranges, owing to
the larger input resistance of the block on a lower signal range. An explanation of
Offsets
Fig. 5.2 shows a model of an integrator with an input offset, IOFFSET,IN. When iIN(t) = 0, and the output of the integrator is set to zero at t = 0, the output will grow as

$$i_{OUT}(t) = I_{OFFSET,IN}\,\frac{t}{\tau} \qquad (5.1)$$
where τ is the time constant of the integrator as defined in Sect. 3.3.2. When the integrator is measured in set-up B, the voltage at the output of the set-up is

$$V_{OB}(t) = I_{OFFSET,IN}\,\frac{t\,R\,A_{F_2}}{\tau} \qquad (5.2)$$
where AF2 is the gain of F2 from input to its middle output. The input offset can be calculated from Eq. 5.2 by differentiating with respect to time and solving for IOFFSET,IN.

The offset was measured when the integrator's offset cancellation scheme was used and when it was not. A line was fit to the measured output of the integrator over an interval of time following the connection of the integrator's output to the measurement set-up. The slope of this line was interpreted as an average value for dVOB/dt.
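The offset extraction amounts to a linear fit followed by the scaling implied by Eq. 5.2. The sketch below uses synthetic data and assumed values of τ, R and AF2:

    import numpy as np

    # Input-offset extraction per Eq. 5.2: fit a line to V_OB(t), then
    # I_OFFSET,IN = slope * tau / (R * A_F2).  All values are assumed.
    tau, R, A_F2 = 40e-6, 100e3, 1.0

    t = np.linspace(0, 10e-3, 1000)                   # 10 ms record
    v_ob = 2.5 * t + 1e-4 * np.random.randn(t.size)   # synthetic drift + noise

    slope, _ = np.polyfit(t, v_ob, 1)                 # average dV_OB/dt
    i_offset = slope * tau / (R * A_F2)
    print(f"I_OFFSET,IN ~ {i_offset * 1e9:.1f} nA")   # ~1 nA for this data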
Table 5.1 shows results for the input offset of the integrator when the integra-
tor’s offset cancellation scheme was used and when it was not used. The three columns
of data correspond to measurements of the integrator over its three input/output sig-
nal ranges. The numbers reported are the root mean squared average of the offsets of all the integrators measured.
DC Gain
When a step input of height u is applied to the input of an ideal, open loop integrator,
the integrator’s output will be a ramp (iOU T = τt u). However, most real integrators
are modeled by the diagram in Fig. 5.3, where the gain g (g ≪ 1) models loss in the
integrator. This system, from input to output has a transfer function given by:
$$H(s) = \frac{1}{g}\;\frac{1}{\frac{s\tau}{g} + 1} \qquad (5.4)$$

The DC gain of this system, 1/g, will be referred to as the DC gain of the integrator. In theory one could measure the DC gain of the integrator
by exciting an open loop integrator with a very small step input (small enough that u/g is less than the output limit of the integrator) and measuring the output. However,
open loop integrators are difficult to measure, since the input referred offset will also
induce an output. Instead a combination of set-up E and F are used. Recall that
Figure 5.4: Integrator with finite DC gain connected in unity-gain negative feedback.
the DC gain is the steady-state ratio of the output of the integrator to the input
of the integrator. This ratio can be measured whether or not the integrator is in
open loop. Fig. 5.4 shows a simplified block diagram of set-ups E and F when the
integrator is set to be on the high signal range, meaning that F1 and F2 have unity
gain. The main difference between set-ups E and F is the quantity routed off-chip for
transfer function from iTEST to iOUT, HCL−LP (the quantity measurable in set-up E), is given by:

$$H_{CL\text{-}LP}(s) = \frac{1}{s\tau + 1 + g} \qquad (5.5)$$
The subscript CL-LP is used because this is the low-pass transfer function of the closed loop system. The DC gain of HCL−LP is 1/(1 + g). The transfer function from iTEST to iIN, HCL−HP (the quantity measurable in set-up F), is given by:

$$H_{CL\text{-}HP}(s) = \frac{s\tau + g}{s\tau + 1 + g} \qquad (5.6)$$
The subscript CL-HP is used because this is the high-pass transfer function of the closed loop system, the DC gain of which is g/(1 + g). If a step of height u is applied to both set-up E and set-up F, the steady-state output of set-up E will be u · 1/(1 + g) while the output of set-up F will be u · g/(1 + g). A ratio of the steady-state outputs is 1/g, the DC gain of the integrator. In terms of the output voltages of the set-ups this ratio is expressed as:

$$A_{DC} = -\,\frac{V_{OE}(t_f) - V_{OE}(t_0^-)}{V_{OF}(t_f) - V_{OF}(t_0^-)} \qquad (5.7)$$
where t0− denotes a time shortly before t0, and tf denotes the time at which the output voltages can be considered to have reached steady state. A negative measured ADC comes from the loss term g being negative. For active-RC integrators, the DC gain is
a large positive number stemming from the finite gain of the opamp. In the case of
the integrator used in the AC, the finite DC gain stems from non-zero conductances
that are in parallel with the integration capacitors, and mismatch within the pairs
of current mirror devices M19/M20 and M21/M22 (Fig. 3.5). In the event that the mirror's gain is greater than unity, the integrator is modeled with g < 0. To maintain steady state in this case, a signal of opposite polarity must be applied to the input of the integrator, meaning that the numerator of Eq. 5.7 is
negative. This leads to the calculation of a negative DC gain. This, however, does
not mean that the output of an open loop integrator, when the input is excited by a
very small step input, will reach a steady state of some large negative output, as the linear DC analysis might suggest. With g < 0 there is no loss to offset the small amount of positive feedback, and the system is unstable. When a small
step is applied the integrator’s output grows exponentially in the positive direction
until it saturates.
Table 5.1 reports the smallest magnitude of DC gain (min(|ADC|)) for each signal range on the chip that was tested. In addition, it reports a type of average in the row labeled µ(|ADC|). This is intended to represent the average DC gain of the integrators. A simple arithmetic mean would be misleading: in the event that the current mirrors (M19/M20 and M21/M22) of one integrator had ratios such that they perfectly canceled the losses of the integrator, the integrator would have infinite DC gain, and the average of this integrator with all of the others would also be infinite. DC gain measurements serve to characterize the size of the loss term g, which we express as the reciprocal of g, namely DC gain. We are interested in a measure of the average size of the absolute value of this loss term, but expressed as a DC gain. Therefore, the approach taken is to take an average of the reciprocals of the DC gains, and invert this average. This is the harmonic mean of the measured DC gains.
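The averaging just described is a two-line computation; the sample values below are made up for illustration:

    import numpy as np

    # mu(|A_DC|) as described: average the reciprocals (the loss terms)
    # and invert -- the harmonic mean of |A_DC|.
    a_dc = np.array([400.0, 900.0, -1500.0, 650.0])
    mu = 1.0 / np.mean(1.0 / np.abs(a_dc))
    print(f"mu(|A_DC|) = {mu:.0f}")   # not dominated by any huge |A_DC|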
Noise
The mean squared value of VOA was subtracted from the mean squared value of VOC to compute the mean squared output noise voltage. This was converted to a mean squared current by dividing it by R² to get the mean squared output noise current of the integrator in feedback. The measurement was made in closed loop, in part, because it is difficult to take open loop measurements. Also, in many simulations the integrator will have feedback around it. As stated in Ch. 3, the output noise of the integrator is concentrated at low frequencies, and the integrator's low frequency output noise is canceled by the feedback. In feedback, the
input referred noise of the integrator is processed by the transfer function HCL(s) = 1/(τs + 1), whereas in open loop, the input noise is processed by the transfer function of the integrator alone, HOL(s) = 1/(τs). The latter transfer function has more gain at frequencies below f = 1/(2πτ).
The output noise in set-up C includes noise from F2 , while the noise in set-up
A does not. For some initial measurements the output noise of set-up A in Fig. 5.6
was subtracted from the noise in set-up C. In some instances, the calculated noise was
negative, meaning that the addition of the integrator and feedback path reduced the
overall noise of the system. This is possible since the input noise of F2 is reduced by
the loop gain of the system, which for low frequencies is high. Also, the connection of
the output of the IUT to the input of F2 loads the input of F2 with the capacitance of long connection wires, which filter the input referred noise of F2 at higher frequencies. This filtering is more pronounced when the input range of F2 is the smallest range, where the input resistance of F2 is largest.
In response to the measurement of negative noise, it was decided that the total
noise of the integrator and F2 would be reported, but referred to the output of the
integrator by scaling the mean squared current by $1/A_{F_2}^2$.
Table 5.1 reports the integrators' output referred RMS noise averaged across all of the integrators measured.
Nonlinearity
For any linear system, if the input u results in the output y, then the input 2u results
in the output 2y. If the output of a nonlinear system when 2u is the input is ŷ,
the nonlinearity of the system will be based on the difference between ŷ and 2y.
This statement is vague because many different calculations are possible. In some
initial measurements of the integrator, a staircase function (Fig. 5.7) was applied to
set-up E, with constant input levels lasting long enough that the integrator would
reach steady state. The output voltage for the sections of the treads (level sections)
during which the output can be considered to be equal to the steady-state value were
averaged to produce a series of points. A line was fit to these average values and the
nonlinearity of the system was inferred by the RMS deviation of the averaged points
from the straight line. However, the integrator itself could be woefully nonlinear but appear linear in such a steady-state test, since the feedback suppresses the nonlinearity. Instead, the dynamics of the step response were compared. To do this, the output of set-up E was measured for steps of two different heights, one
twice the other. If the system were perfectly linear, the step response for the larger
step would be exactly twice that of the smaller step. In other words, if a step of u led to the output y1 and a step of 2u led to y2, the nonlinearity of the system is:
δy = y2 − 2y1 (5.8)
over some interval of time. For this measurement, the two steps were compared over
the interval of time during which the output of the system was between 10 % and
90 % of its final value. In terms of the quantities in Fig. 5.1, the nonlinearity of the system is quantified by an error current i_err(t), derived from V_OE(t)|L − 2V_OE(t)|S (Eq. 5.9).
The subscript L denotes the output for the larger step input, while S denotes the
output for the half-sized step input. The mean squared nonlinearity was calculated
by the following:
i²_NL = (1/A²_F2) ( i²_err(t) − [V²_OE(tf − ∆t, tf)|L + 4V²_OE(tf − ∆t, tf)|S] / R² )   (5.10)

Scaling by 1/A²_F2 converts the measured mean squared output current from set-up E to a mean squared current at the output of the IUT. Eq. 5.10 is more complicated than
simply scaling i²_err, because i_err contains the effects of noise, in addition to nonlinearity. Even if the system were perfectly linear, i_err ≠ 0 because the two measurements at
the output of set-up E would contain noise and hence a perfectly linear system would
appear nonlinear if the mean-squared value of ierr were reported. To account for this,
typical values for the block's output noise are subtracted. The second term in Eq. 5.10
(fraction over R2 ) represents the total noise of the outputs from each of the two
measurements over an interval of time during which the output has reached steady-
state. This measurement appears in the table as “RMS Nonlinearity”. This row of
the table represents the results for each integrator combined in an RMS fashion.
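A minimal Matlab sketch of the doubling test of Eq. 5.8, omitting the noise correction of Eq. 5.10 (the vectors y1 and y2, sampled responses to steps of u and 2u, are assumed inputs):

    % RMS nonlinearity from step-response doubling (sketch).
    dy  = y2 - 2*y1;                            % Eq. 5.8
    yf  = y2(end);                              % final value of larger response
    sel = find(y2 >= 0.1*yf & y2 <= 0.9*yf);    % 10 % to 90 % of final value
    nl_rms = sqrt(mean(dy(sel).^2))             % RMS nonlinearity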
Time constant (τ )
µ(τ ) is the mean of τ for all integrators on the chip, when the DAC is set to 512.
σ(τ ) is the standard deviation of the measured τ across the integrators on the chip.
To remove the contribution of the surrounding circuitry, the time constant of the step response of set-up D was subtracted from the measured time constant.
5.1.2 Results
Measured results for a chip are summarized in Table 5.1. The three numeric columns give results for the three signal ranges.
DAC Characteristic
Fig. 5.5 shows τ vs. the DAC tuning word for a typical integrator.
5.2 VGA Mode

5.2.1 Measurement Set-ups and Reported Quantities
The various configurations used to test the VGAs are shown in Fig. 5.6. The amplifier
under test is denoted by the block labeled “AUT”. The input range of F2 and the
output range of F1 are set to be the same range as the output range of the AUT and
the input range of the AUT, respectively. The input of F1 and the output of F2 are connected to the off-chip test circuitry.
For many of the tests, a staircase input function was applied to the chain
(Fig. 5.7). Each tread of the function was long enough that the circuit was considered
to be in steady-state for the majority of the tread. The output of the circuit was averaged over most of each tread to compute a set of average values, shown with * signs in the figures. A line was fit to this set of average values using a least-squares fit.

[Figure 5.5: Typical time constant (s) vs. DAC setting for one integrator.]
Results for the VGA are summarized in Tables 5.2, 5.3, and 5.4. The
column labeled “Range” refers to the input and output range settings of the block.
The first letter denotes the input range and the second letter denotes the output
range. “h” refers to the largest signal range (20 µA), “m” to the middle range (1
µA) and “l” to the smallest signal range (100 nA). The column “DAC” refers to the
setting of the 10-bit DAC that controls the gain of the block. The other columns are
explained below.
Gain
The gain measurement is done by computing the slope of the input-output transfer
characteristic (using the staircase function) of set-up D (Fig. 5.6) and comparing this
slope to that of set-up C. µ(K) is the arithmetic mean of the gain for all VGAs on the chip for the particular combination of range and DAC settings. σ(K)/µ(K) is the standard deviation of the gain normalized to the average gain over all of the VGAs on the chip.
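A sketch of this slope-ratio gain computation in Matlab (uD, yD, uC, and yC — staircase tread averages from set-ups D and C — are assumed inputs):

    % Gain as the ratio of best-fit slopes from set-ups D and C (sketch).
    pD = polyfit(uD, yD, 1);   % least-squares line through set-up D averages
    pC = polyfit(uC, yC, 1);   % least-squares line through set-up C averages
    K  = pD(1) / pC(1)         % gain of the block under test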
Nonlinearity
The nonlinearity of the amplifier reported in the tables is the RMS deviation of the measured input-output transfer characteristic (computed using the staircase input) from a linear fit of the characteristic. NL FS and NL HS are the RMS deviations of the measured characteristic of the amplifier over 80 % and 40 % of its input range, respectively, from a linear fit. The reported numbers are normalized to the input range.

[Figure 5.7: Staircase input function; voltage (V) vs. time (s).]
Offset
The output offset of the amplifier is the difference between the average value of VOB
and VOA, divided by the product of R and the gain of F2. µ(O_OS) is the arithmetic mean of the output offsets for all gains on the chip. This indicates a deterministic shift in the input-output characteristic of the block. √µ(O²_OS) is the root-mean-squared output offset.
Range (in/out) | DAC | µ(K) (A/A) | σ(K)/µ(K) (%) | µ(O_OS) (nA) | √µ(O²_OS) (nA) | NL FS (%) | NL HS (%) | Noise (nA)
h/h | 127 | 1.00 | 2.3 | −94.7 | 219 | 0.020 | 0.019 | 14.9
h/h | 255 | 1.74 | 1.9 | −97.2 | 209 | 0.019 | 0.018 | 14.0
h/h | 511 | 2.78 | 2.5 | −104 | 210 | 0.19 | 0.032 | 13.7
h/m | 127 | 0.053 | 2.5 | −36.3 | 222 | 0.022 | 0.019 | 13.0
h/m | 255 | 0.092 | 2.1 | −32.2 | 209 | 0.020 | 0.017 | 10.8
h/m | 511 | 0.155 | 1.8 | −31.9 | 203 | 0.020 | 0.017 | 9.29

Table 5.2: Measured results for the VGA block. Largest input range.
Table 5.3: Measured results for the VGA Block. Middle input range.
Noise
The output noise is computed by subtracting the mean squared value of VOA from
the mean squared value of VOB. The voltage noise is converted to an input-referred current noise by dividing it by R², A²_F2, and A²_AUT. The Noise column in the table is the RMS value of this noise.
Range (in/out) | DAC | µ(K) (A/A) | σ(K)/µ(K) (%) | µ(O_OS) (nA) | √µ(O²_OS) (nA) | NL FS (%) | NL HS (%) | Noise (nA)
l/h | 127 | 190 | 3.8 | −1.9 | 2.5 | 0.053 | 0.053 | 0.12
l/h | 255 | 335 | 2.9 | −2.0 | 2.5 | 0.053 | 0.052 | 0.15
l/h | 511 | 542 | 3.5 | −2.0 | 2.6 | 0.22 | 0.072 | 0.18
l/m | 127 | 9.69 | 3.9 | −1.6 | 2.3 | 0.053 | 0.055 | 0.10
l/m | 255 | 17.1 | 2.9 | −1.6 | 2.3 | 0.052 | 0.054 | 0.098
l/m | 511 | 29.3 | 2.3 | −1.7 | 2.3 | 0.054 | 0.053 | 0.094
l/l | 127 | 0.987 | 4.1 | −1.5 | 2.2 | 0.053 | 0.054 | 0.057
l/l | 255 | 1.74 | 3.1 | −1.6 | 2.3 | 0.052 | 0.053 | 0.051
l/l | 511 | 2.99 | 2.4 | −1.7 | 2.3 | 0.054 | 0.053 | 0.047
Table 5.4: Measured results for the VGA block. Smallest input range.
5.2.2 Results
Measured results for the VGA are found in Tables 5.2, 5.3, and 5.4.
5.3 Multiplier Mode

5.3.1 Measurement Set-ups and Reported Quantities
Fig. 5.8 shows various measurement set-ups for characterizing the multiplier blocks.
The multiplier under test is denoted by the block labeled “MUT”. An ideal multiplier implements

i⁺_o − i⁻_o = K(i⁺₁ − i⁻₁)(i⁺₂ − i⁻₂)   (5.11)

where K has units of A⁻¹, i₁ and i₂ are differential input currents and i_o is a differential output current. Due to mismatch between devices and noise, a real multiplier may instead behave as

i⁺_o − i⁻_o = K(i⁺₁ − i⁻₁ + i_off1)(i⁺₂ − i⁻₂ + i_off2) + i_offO + i_n(t)   (5.12)
where iof f 1 , iof f 2 and iof f O are offset currents at the input of port 1, the input of port
2 and the output, respectively. The term in (t) is an output referred noise current.
Eq. 5.12 is a simplified model of a real multiplier, which may have signal-dependent
noise and may implement a higher order polynomial function. The following sections
discuss the measurement of the multiplication coefficient (K), the offset currents, and the noise. The reported measurements for the multiplier block, in Table 5.5, are for a typical chip.
Multiplication Constant
The measurement of the multiplier is limited by the fact that the data acquisition
system has only one free output. To determine K in Eq. 5.12, the same input signal was applied to both inputs and the multiplier's behaviour as a squarer was measured. Two
outputs of F1 (in set-up E) were each connected to the two inputs of the multiplier
with both positive and negative polarity for a total of four input combinations. That
is, each connection has two possibilities for its polarity meaning that there are four
ways to make the two connections. For each combination the system was excited
by the staircase function. A second-order polynomial was fit to the level sections’
average values, leading to four different polynomials. The four second-order terms
were averaged to give one value of K for the multiplier for a particular combination of input and output ranges.
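A sketch of this extraction for one polarity combination (u, the tread input levels, and io, the averaged outputs, are assumed inputs):

    % Fit a second-order polynomial to the squarer response; the quadratic
    % coefficient is K for this polarity combination (sketch).
    p = polyfit(u, io, 2);     % io ~ p(1)*u.^2 + p(2)*u + p(3)
    K = p(1)                   % averaged over the four combinations afterward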
In Table 5.5, µ(K) is the arithmetic mean of the multiplication constants across the chip. σ(K)/µ(K) is the standard deviation of the multiplication constant normalized to its mean.
Offsets
As Eq. 5.12 shows, the inputs and output each have offsets. The output offset of
the multiplier is the difference between the DC output voltage in set-up B and that
in set-up A, divided by R and the gain of F2. The input offset of the lower input is determined by comparing the DC output in set-up D to that in set-up C. To determine the offset of the upper input, a similar procedure
is used, but with F1 connected to the lower input. In Table 5.5, µ(O_OS), µ(I_OS1), and µ(I_OS2) are the arithmetic means for the given offsets. √µ(O²_OS), √µ(I²_OS1), and √µ(I²_OS2) are the root-mean-squared offsets for each port.
Noise
The output mean-squared current noise is the difference between the mean squared values of VOB and VOA, scaled by R² and by A²_F2. Table 5.5 reports the root-mean-squared value of this noise.
5.3.2 Results
Measured results for the multiplier block are summarized in Table 5.5.

Range (In & Out) | µ(K) (MA⁻¹) | σ(K)/µ(K) (%) | µ(O_OS) (nA) | √µ(O²_OS) (nA) | µ(I_OS1) (nA) | √µ(I²_OS1) (nA)
h | 0.08 | 2.07 | −26.0 | 225 | 89.3 | 183
m | 1.70 | 2.65 | −9.14 | 16.5 | 6.98 | 10.7
l | 17.0 | 2.54 | −0.87 | 2.23 | 1.58 | 2.09

Range (In & Out) | µ(I_OS2) (nA) | √µ(I²_OS2) (nA) | Noise (nA)
h | 52.1 | 168 | 19.6
m | 5.85 | 10.3 | 1.02
l | 1.62 | 2.29 | 0.08

Table 5.5: Measured results for the multiplier block.
5.4 Fanout
5.4.1 Measurement Set-ups and Reported Quantities
Various test configurations are shown in Fig. 5.9. The fanout under test (FUT) was
preceded and followed by fanout blocks (Fig. 5.9) whose ranges were selected in the
following way: F1 ’s input was set on its high range. Its output range was set to be
the same as the input range of the F U T . F2 ’s input range was set to be equal to
the output range of the F U T , while F2 ’s output range was set to be high. Table 5.6
summarizes the measurements of the Fanout blocks. The first column of Table 5.6
gives the input and output ranges of the block. “h” denotes the largest signal range
(9 or 18 µA), “m” denotes the middle signal range (1 or 2 µA) and “l” denotes the smallest signal range (0.111 or 0.222 µA).
Output Offsets
The offsets reported are the output offsets for each output of each fanout block,
computed by subtracting the average value of VOB from VOA. This DC voltage was converted to a current by dividing it by R and the gain of F2. The numbers in the left “Op. Offset” column are the RMS output offsets over all fanout outputs across the chip. The right “Op. Offset” column shows the output offsets normalized to the output range.
Noise
The mean squared value of VOB was subtracted from the mean squared value of VOA. To compute the RMS output noise current of the fanout, the square root of the difference was divided by the gain of F2 and by R. The reported numbers are averages over all fanout outputs on the chip.
Gain
For this, and other specifications, the staircase function was applied to set-up D and
to set-up C. The gain of the fanout is the ratio of the slope of the best fit line for
set-up D to the slope of the line measured using set-up C. The numbers in the “Gain”
column are the averages over the three paths for each fanout and over the chip. “RMS
Dev” refers to the standard deviation of the gains, normalized to the average gain.
Nonlinearity
The nonlinearity numbers reported are the RMS difference between the averages of
the treads and the line of best fit for an input that is 80 % of the fullscale range of
the F U T . RMS NL refers to the RMS nonlinearity referred to the input of the block.
The reported numbers in the left column are the RMS across all fanout outputs on
the chip, and the right RMS NL column has the results normalized to the input range.
Mismatch
This specification is a measure of the difference between the gain from the input of the fanout to one of its outputs and the gain to another of its outputs. It is measured using set-up E. Two outputs of a fanout block are subtracted from one another at the output of the chain.
Range (in / out) Gain RMS Dev Op. Offset Op. Offset
name, (µA) / name, (µA) (A/A) (%) (nA) (%)
h, 18 / h, 18 1.00 0.18 132.73 0.74
m, 1 / m, 1 1.00 0.22 11.36 1.14
l, 0.111 / l, 0.111 1.00 0.23 1.26 1.14
h, 18 / m, 2 0.11 0.83 15.67 0.78
h, 18 / l, 0.222 0.01 0.85 11.93 5.37
m, 1 / h, 9 9.00 0.22 101.18 1.12
m, 1 / l, 0.111 0.11 0.61 1.64 1.48
l, 0.111 / h, 9 80.92 0.39 129.05 1.43
l, 0.111 / m, 1 9.01 0.24 13.32 1.33
Range (in / out) RMS NL RMS NL Mismatch Noise
name, (µA) / name, (µA) (nA) (×10⁻⁶) (%) (nA)
h, 18 / h, 18 4.76 264.34 0.22 1.07
m, 1 / m, 1 0.54 544.45 0.25 0.20
l, 0.111 / l, 0.111 0.06 537.64 0.26 0.03
h, 18 / m, 2 4.87 270.38 0.20 0.16
h, 18 / l, 0.222 11.33 629.67 0.24 0.00
m, 1 / h, 9 0.55 547.58 0.26 2.02
m, 1 / l, 0.111 0.52 523.92 0.26 0.01
l, 0.111 / h, 9 0.06 553.96 0.26 3.26
l, 0.111 / m, 1 0.06 557.97 0.24 0.35
The mismatch is the ratio of the slope of the subtracted output's best fit line to the slope of a similar line measured using set-up D.
5.4.2 Results

Measured results for the fanout blocks are found in Table 5.6.
5.5 Exponential
Various configurations used to measure the exponential blocks are shown in Fig. 5.10. Set-ups A and B were used to measure the offset of the chain of blocks consisting of the DAC, the transconductor (Gm) and F1, allowing the offset to be cancelled. Set-up C was used to measure the input-output transfer characteristic of the exponential block.
The equation above is a modified version of Eq. 3.41; nφt/R has been replaced by I_REF.
The block’s transfer characteristic becomes the following when it has an input and
where IIN and IO are the block’s input- and output-offset currents, respectively.
From Eq. 5.15 it is clear that the input-offset current modifies the transfer characteristic only as a multiplicative scale factor on the output. Before measuring the blocks, an attempt was made to determine the size of the output offset current
so that it could be subtracted from measured results. This was done by measuring
the output in set-up C when a large negative input was applied, effectively eliminat-
ing the exponential term’s contribution to the output, leaving only an output due to
the output-offset current of the block. This offset was subtracted from the measured data. A typical measured characteristic is shown in Fig. 5.11. The vertical axis has a logarithmic scaling. Deviation from exponential
was computed in the following fashion, assuming the measured data, with the output offset removed, are available as a set of input-output points:
• The base-10 logarithm of the output current was computed for each data point.
• A line was fit to these data using a least-squares technique over the range of inputs from −4.3 µA to 6.0 µA. The points on the line will be denoted as y_fit.
• The deviation of the logarithm of the measured data from the fit line was computed. This deviation is the logarithm of the ratio of the measured data to the fit line, since a difference in logarithms corresponds to a ratio in linear units. Denote this y_ratio.

• The ratio was converted to a difference. y_ratio is the logarithm of a ratio; hence, 10^y_ratio is the ratio, in linear units, between the measured output and the fit line. Ideally, this ratio would be equal to one. We are interested in quantifying its difference from one. Therefore, we consider the error in the input-output characteristic to be y_diff = 10^y_ratio − 1.

• The reported error for the exponential block is the RMS average of y_diff.

[Figure 5.11: A typical exponential block's DC input-output characteristic; output differential current (A) vs. input differential current (A), logarithmic vertical axis.]
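A Matlab sketch of this procedure (iin and iout, the staircase averages with the output offset removed, are assumed inputs):

    % RMS deviation from exponential behaviour (sketch).
    y      = log10(iout);                          % log of measured output
    sel    = (iin >= -4.3e-6) & (iin <= 6.0e-6);   % fit range from the text
    p      = polyfit(iin(sel), y(sel), 1);         % least-squares line
    yfit   = polyval(p, iin(sel));
    yratio = y(sel) - yfit;                        % log of measured/fit ratio
    ydiff  = 10.^yratio - 1;                       % deviation of ratio from one
    err    = sqrt(mean(ydiff.^2))                  % reported RMS error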
Fig. 5.12 shows the input-to-output transfer characteristics of all of the exponential blocks from one chip.

[Figure 5.12: Exponential blocks' DC input-output characteristics from one chip; output differential current (A) vs. input differential current (A), logarithmic vertical axis.]
5.6 Logarithm
Various configurations used to measure the logarithm blocks are shown in Fig. 5.13. Set-ups A and B were again used to measure the offset of the chain of blocks consisting of the DAC, the transconductor (Gm) and F1, allowing the offset to be cancelled.
The equation above is a modified version of Eq. 3.42; Gm·nφt has been replaced by K. The block's transfer characteristic becomes the following when it has an input and an output offset:
where IIN and IO are the block’s input and output offset currents, respectively.
From Eq. 5.18 it is clear that the output-offset current modifies the transfer characteristic only as an input-side multiplicative scale factor 10^(I_O/K), while the input-offset current adds directly to the input. Before measuring the blocks, an attempt was made to determine the size of the input offset current so that it could be subtracted from the applied input to the circuit during actual measurements. When
i⁺_in − i⁻_in < I_IN, the output of the logarithmic block saturates to its maximum negative output. I_IN was estimated by finding the largest input for which the output was saturated to this large negative output. This was done by gradually increasing i⁺_in − i⁻_in from a negative value (larger in magnitude than the expected I_IN) until the output was no longer saturated to the block's negative maximum. The value causing the output to increase from the block's saturated output was taken to correspond to the input-offset current of the block.
[Figure: Log. block DC input-output characteristic; output differential current (A) vs. input differential current (A), logarithmic input axis.]

[Figure: Absolute value transfer characteristic; output differential current (A) vs. input differential current (A).]
An attempt was made to eliminate the output offset of the circuits connected to the blocks under test. Fig. 5.16 shows the input-output transfer characteristic of a typical programmable nonlinear block when it is implementing the saturation function. The output current to which this block saturates is programmable through a 10-bit DAC.

[Figure 5.16: Saturation transfer characteristic; output differential current (A) vs. input differential current (A).]

This figure shows results for two different saturation levels. Fig. 5.17 shows the input-output transfer characteristic of a typical programmable nonlinear block when
it is implementing the sign function. The output current to which this block sat-
urates is programmable through a 10-bit DAC. This figure shows results for two
different saturation levels. Fig. 5.18 shows the input-output transfer characteristic of a typical programmable nonlinear block when it is implementing the ramp function. The definition of this function is found in Sect. 3.3.7. The point on the input axis at which the break occurs is programmable through a 10-bit DAC. This figure shows results for five different break points.

The programmable nonlinear block also implements the minimum and maximum functions. Fig. 5.19 shows the block's inputs (the ramp input is applied to input-port 1
[Figure 5.17: Sign transfer characteristic; output differential current (A) vs. input differential current (A).]

[Figure 5.18: Ramp transfer characteristic; output differential current (A) vs. input differential current (A).]
[Figure 5.19: Min/max test; the block's inputs vs. time (s).]

[Figure 5.20: Min/max characteristic; outputs from the block vs. time (s).]

[Figure 5.21: Gate characteristic; input and comparator output, and the block's output, vs. time (s).]
and the sinusoid is applied to input-port 2). Fig. 5.20 shows the outputs of the block.
The lower portion of Fig. 5.20 shows a square wave that indicates which input is
greater. That is, the output signal shown is high when the signal at port 1 is greater
than the signal at port 2. The upper portion of Fig. 5.20 shows the results of the
minimum and maximum operations. The thick line shows the maximum, while the
The rightmost peak of the sinusoid in Fig. 5.19 is greater than the ramp.
However, the circuit behaves as though the ramp is greater than the sinusoid’s peak.
The remaining functions implemented by the programmable nonlinear block are: gate (or chopper), track-hold, and sample-hold. For these three functions, one of the block's inputs is compared with a reference current of 0 A. The output of this comparator determines when the chopper function chops, when the track-hold function tracks and holds, and when the sample-hold function samples and holds. The operation of these functions is described in more detail
in Sect. 3.3.7. The comparator to which the following paragraphs make reference is
this 0 A comparator.
Fig. 5.21 shows the behaviour of a typical programmable nonlinear block when it is implementing the gate function. The upper portion
of the figure shows the block’s input signal (ramp function) and the output of the
circuit’s “zero comparator”. The comparator’s output is high when the block’s other
input is greater than zero and low when the input is less than zero. The output in
the lower portion of Fig. 5.21 is equal to the input when the comparator's output is high and is zero when it is low. Fig. 5.22 shows the behaviour of a typical programmable nonlinear block when it is implementing the track-hold function. The upper
portion of the figure shows the block’s input signal (ramp function) and the output
of the circuit’s comparator. The output in the lower portion of Fig. 5.22 is equal to
the input when the comparator’s output is high and is held when the comparator’s
output is low.
Fig. 5.23 shows the behaviour of a typical programmable nonlinear block when it is implementing the sample-hold function. The upper
portion of the figure shows the block’s input signal (ramp function) and the output
[Figure 5.22: Track-hold characteristic; input and comparator output, and the block's output, vs. time (s).]

[Figure 5.23: Sample-hold characteristic; input and comparator output, and the block's output, vs. time (s).]
of the circuit’s comparator, described above. The output in the lower portion of
Fig. 5.23 is equal to zero when the comparator’s output is high. The input is sampled
when the comparator’s output falls. The sample is held until the comparator’s output
rises.
As described in Ch. 3, a design flaw that went undetected during simulation leaves
7 SRAM cells connected to each block’s input unprogrammable. If one or more of the
switches powers up in the ON state, that block's input is shorted to ground, rendering the block unusable. In practice, the fraction of blocks that fail in this way is far less than the 127/128 fraction of blocks that would fail if the state of the cell into which each fell on power-up were a random process with equal likelihood. The chip
has 2.5 V and 3.3 V power supplies. Since the 3.3 V controls the digital I/O and
powers up the Vdd to which the electrostatic discharge (ESD) diodes are connected,
it was powered up first, followed by the 2.5 V supply. Both were powered on by
first connecting the supplies’ cables, with the supplies off and then activating the
supplies, with the sources already set to their appropriate voltages. By accident, the
2.5 V wires were once connected when the source was on. When this connection was
done “hot”, the vast majority of the blocks worked. The “hot” connection ramps up
the supply voltage on the chip more quickly than when the source is activated while
connected to the chip. This faster slew on the supply resulted in more of the switches
powering up in the OFF state, leaving the vast majority of the blocks functional.
A double pole, single throw switch was put between the power supplies and
the chip’s PCB. This allows the supply voltages on the chip to be ramped up quickly
and simultaneously. With this scheme nearly all circuits are functional. Interestingly,
if many circuits’ inputs are shorted to ground, the current consumption of the chip
is higher, allowing a skilled user to observe this and power on the chip again until
As part of the calibration scheme, after the chip is powered on a routine mea-
sures the voltage at the input of every block, and determines whether the input has
been shorted to ground. If a block is not functional, the software keeps track of it, so
that a user cannot try to use it in a simulation. With such a large chip, even without
this design flaw, one would want the interface software to determine which blocks
meet specifications.
The chip, with all circuits active, typically draws 100 to 120 mA from a 2.5 V supply, meaning that its power dissipation is typically between 250 and 300 mW.
Chapter 6
Solutions
The class of PDEs whose solution is discussed here is the one-dimensional diffusion equation

∂T/∂t = α ∂²T/∂x²   (6.1)

where T(x, t) is a scalar function at a point x and time t. The coefficient α represents the rate of diffusion. Two physical examples of this equation are heat flow and current flow. In the former, T is the
temperature along a uniform rod (Fig. 6.1), oriented in the x direction from 0 to L.
The parameter α is equal to k/C, where k is the material's thermal conductivity and C
is the material’s specific heat capacity. It is assumed that heat flow is only in the x
direction. That is, the temperature across the rod in the y and z directions is uniform
and heat does not escape out the walls of the rod, except possibly at the ends of the rod. In the latter example, T is the voltage along a uniform, distributed RC line and α is equal to 1/(RC), where R is the per-unit-length resistance and C is the per-unit-length capacitance.

6.1 Solving Partial Differential Equations on the Analog Computer Using a Discretized Solution

The investigation that follows examines:

• The achievable accuracy and the largest system that the chip can simulate using each of the two discretization techniques.

• The effects on the overall solution accuracy of random errors in the coefficients of the ODEs.
[Rod from x = 0 to x = L, discretized with spacing h into nodes T0 = T(0, t), T1, …, T10 = T(L, t).]
Figure 6.1: Discretization of solid rod.
T (0, t) = 1, t ≥ 0 (6.4)
When the spatial partial derivative is approximated by the Forward and Backward Euler differences, the following set of ODEs results:
Ṫ = AE T + bE T0 (6.5)
The subscript E is used to denote that these are the terms associated with the Euler approximation for the spatial partial derivative. T0 is equal to T(0, t). T is a vector whose ith element, Ti, is the temperature at the ith cross-section of the rod. The rod extends from cross-section 0 to cross-section n + 1. Therefore, the distance between consecutive cross-sections, h, is L/(n + 1). The n × n matrix AE is tridiagonal, with −2 in each entry of its main diagonal and 1 in the entries above and below it, all scaled by α/h², and

bE = (α/h²) [1, 0, 0, · · · , 0]^T   (6.7)

where T denotes the transpose operation. According to Eq. 2.18, Ṫn depends on Tn+1, which does not appear in Eq. 6.5, since Tn+1 = T(L, t) = 0, as stated in Eq. 6.2.
When Central Differences is used to approximate the spatial partial derivative, Ṫi is a function of Ti−2, Ti, and Ti+2. This requires that the temperatures at T0 and Tn+1 be specified, as well as at T−1 and Tn+2. The simplest approach is to choose T−1 = T0 = 1 for t ≥ 0 and Tn+2 = Tn+1 = 0 for −∞ < t < ∞. Physically, this would correspond to there being portions of the rod extending beyond x = 0 and x = L, which are held at fixed temperatures. When the spatial partial derivative is approximated by using Central Differences twice (Eq. 2.19) and the assumptions above for T−1 and Tn+2 are made, the following set of ODEs results:
The subscript C is used to denote that these are the terms associated with the Central Differences approximation for the spatial partial derivative at every point along the rod. If Central Differences is used to approximate the partial at every point, the vectors multiplying the boundary terms are

bC0 = (α/h²) [0, 1, 0, · · · , 0]^T   (6.10)

bC1 = (α/h²) [1, 0, · · · , 0]^T   (6.11)
We expect that if the rod is uniform in shape and in physical properties the
temperature will reach a steady-state as t → ∞ which will linearly decrease along the rod when the boundary conditions of Eq. 6.2 to Eq. 6.4 are applied. This steady-state temperature can be found by solving Eq. 6.5 or Eq. 6.8 with the left side set to 0. That is, in steady-state, the temperature does not change and hence the time derivative of temperature is zero. Rearranging yields the steady-state temperatures, TE,final and TC,final (Eq. 6.12 and Eq. 6.13). TE,final is the steady-state temperature when the Euler discretization is used. Likewise, TC,final is that obtained when the Central Differences discretization is used. Solutions of Eq. 6.12 in Matlab have shown that it gives the expected linearly decreasing temperature as x increases; however, Eq. 6.13 does not; in the particular cases examined, its steady-state profile deviates from the expected straight line.
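This steady-state check is a short linear solve in Matlab; a sketch (α/h² = 1 and n = 10 interior nodes are illustrative choices):

    % Steady state of the Euler-discretized heat equation (sketch).
    n  = 10;
    AE = -2*eye(n) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
    bE = [1; zeros(n-1,1)];            % Eq. 6.7 with alpha/h^2 = 1
    Tss = -AE \ bE;                    % solve 0 = AE*T + bE*T0 with T0 = 1
    plot((1:n)/(n+1), Tss, '*')        % expected: linear decrease along the rod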
Many of the properties of the Central Differences technique, which will be discussed
later, can be retained, while forcing it to agree with the correct steady-state behaviour
by using the Euler approximation for the first and last nodes giving:
The subscript C ∗ is used to denote that these are the terms associated with the Central
Differences approximation for the spatial partial derivatives at all nodes except at T1 and Tn.
The top and bottom rows of AC∗ come from the use of the Euler approximation.
However, the rows have entries of 4 and 8 rather than 1 and 2, as is found in AE
because of the 4 in the denominator of the leading scaling factor in AC∗. The vector b∗C is

b∗C = (α/h²) [1, 1, 0, · · · , 0]^T   (6.18)
Under ideal circumstances, both the Euler and the modified Central Differences
approaches, when implemented on the analog computer, would accurately predict the
transient and steady-state response of the sets of ODEs. However, due to a variety of
nonidealities, neither will produce the exact answer. The degree to which the analog
computer’s solution differs from the exact answer is determined by the accuracy of
the circuits that implement the system and by the sensitivity of the system to those errors. To investigate the sensitivity of the systems to errors in the functional elements, the coefficients in the ODEs were varied and their effect on
the solution of Eq. 6.12 and Eq. 6.19 was examined. The approach taken for this
investigation is to change the coefficients in the ODEs and solve for the steady-state
temperature using Matlab. This gives a prediction of how these errors, if present
in the analog computer’s circuits, would change the analog computer’s steady-state
Fig. 6.2 shows steady-state temperature profiles from several randomly gener-
ated sets of ODEs resulting from the two discretizations. In the top section, each coefficient of AC∗ was scaled by a factor
Figure 6.2: Steady-state temperature profiles for Central Differences (top) and Euler
(bottom) with randomized coefficients.
φ = 1 + 0.002δ   (6.20)

where δ is a random variable with uniform distribution over the range −√12/4 < δ < √12/4, giving it a standard deviation of 1/2. In the bottom section, coefficients of AE
were scaled in the same way and the resulting steady-state solutions were calculated.
Clearly, the modified Central Differences technique is less sensitive to these random errors.

[Figure 6.3: Steady-state temperature profiles (temperature vs. position) with systemic errors in the fanout gains, for different numbers of discretization points.]
When the ODEs are implemented on the analog computer, signals associated
with the off-diagonal 1s in the A matrix pass through two fanout blocks whereas the
signals associated with diagonal 2s pass through only one fanout block. To predict the
effect of a systemic error in the gain of fanout blocks (for example: G=0.998 instead
of 1.000), the off-diagonal elements were scaled by 0.9982 and on-diagonal elements by
0.998. Curves in Fig. 6.3 show the steady-state temperature profiles for systems with
systemic errors in the fanout gains, for different numbers of points, using the Euler discretization. Nearly coinciding with the curve for n = 3 is the ideal curve for the steady-state temperature, assuming all gains are
correct (G = 1). Clearly, the larger the number of nodes into which the problem is
discretized, the larger the steady-state error, when the fanout blocks that implement
the coefficients have deterministic errors. The same can be said for the case when
the blocks have random errors, though some of these errors cancel each other out,
resulting in smaller deviations from the ideal solution. These errors occur also for the
Central Differences case, though the errors in steady-state temperature are smaller.
As seen in Fig. 6.3, when n is large, the relative error in the steady-state solution is
much greater than the error in the coefficients of the differential equation.
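A sketch of the systemic-error calculation just described (n = 18 and G = 0.998 are illustrative; the assumption that the boundary term also passes through two fanout blocks is mine):

    % Steady-state profile with a systemic fanout-gain error G (sketch).
    n  = 18;  G = 0.998;
    AE = -2*G*eye(n) + G^2*(diag(ones(n-1,1),1) + diag(ones(n-1,1),-1));
    bE = G^2*[1; zeros(n-1,1)];   % boundary path assumed to pass two fanouts
    Tss = -AE \ bE;               % steady state with T0 = 1
    plot((1:n)/(n+1), Tss)        % bows downward relative to the ideal line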
The largest problem size is determined by the number of nodes that can be simulated on this AC, and by the availability of global wiring resources
for directing signals off-chip. The architecture of the chip is detailed in Chapter 3
and the testing environment is described in Chapter 5; however, some aspects critical to this discussion are listed here:

• Below each row of macroblocks are 16 pairs of wires for routing signals between macroblocks and off-chip.

• Beside each column of macroblocks are 16 pairs of wires for routing signals between macroblocks and off-chip.

• The current measurement set-up has the capability to measure only seven analog outputs at a time. These are connected to two outputs below the upper three rows of macroblocks and one output below the lowest row of macroblocks.
To reduce the use of global wiring resources, the five integrators in a given macroblock
integrate the ODE for consecutive nodes along the rod. This corresponds to a mac-
roblock implementing five adjacent rows in the state-space description of the ODE,
that is a slice from Ti to Ti+4 . The derivatives of this slice depend on Ti to Ti+4 as
well as Ti−1 and Ti+5 , which means that each macroblock (with the exception of the
one implementing Tn ) requires two global inputs, one from the macroblock imple-
menting the slice from Ti−5 to Ti−1 and one from the macroblock implementing the
slice from Ti+5 to Ti+9 . Tn ’s macroblock needs only one input since it is assumed that
Tn+1 = 0. Likewise, each macroblock (with the exception of the macroblocks implementing Tn and T1) needs two global outputs to direct signals
to the macroblocks that implement adjacent slices in the ODE. The macroblocks im-
plementing Tn ’s and T1 ’s ODE must each output only one signal to a neighbouring
macroblock, since the former two macroblocks represent the ends of the rod.
With each of the four macroblocks in each row needing two outputs, eight of
the horizontal global wires below the row of macroblocks are used, leaving eight for
output to off-chip. The same numbers apply to the vertical global wires. All 80
integrators can be used and 32 output ports can be used. However, in the present
test environment only 7 of the output ports can be measured, due to limits in the
data acquisition card used. Even without this limit, to measure all 80 state variables
would require that the system be simulated a few times consecutively, and that a
160
In the case of the modified Central Differences technique, each slice of 5 rows
in the ODE, implementing Ti to Ti+4 requires inputs from Ti−2 , Ti−1 , Ti+5 and Ti+6 .
Therefore each macroblock needs four inputs and four outputs. This consumes sufficient global wiring that only 13 macroblocks can be used, for a total of 65 state variables. This requires between 10 and 14 global
horizontal wires for each set of 16, leaving a total of 18 outputs available.
Fig. 6.4 shows two such examples. The upper portion of the figure shows the lumped circuit that
corresponds to the Euler equations. The labels Vk denote the voltage at the k th node.
The kth row in Eq. 6.5 corresponds to a Kirchhoff Current Law equation written at the
k th node in the upper circuit, if each Ti in Eq. 6.5 is replaced by Vi . The lower part of
Fig. 6.4 is a circuit whose electrical behaviour corresponds to the Central Differences equations.

System Implementation
Figure 6.5: Per discretization point block diagram of the heat equation. Implemen-
tation 1.
The state-transition matrices of the above ODEs (AE , AC , and AC∗ ) have
obvious patterns to them, and as such, the block diagram implementations of these
systems have a high degree of regularity. For the case of the Euler method, one
implementation for one discretization point is shown in Fig. 6.5, for the case when α/h² = 1. Fig. 6.5 represents one row in the AE matrix, except the top or bottom row. Because the coefficients in AE are integer multiples of one another, this ODE can be implemented using only integrators and fanout blocks. The coefficient of −2 is realized with fanout blocks. By scaling Eq. 6.5 (see Ch. 2), the cases in which α/h² ≠ 1 can be handled with the same implementation.
As noted in the previous section, the signals that implement the −2 coefficients are processed by one fanout block whereas the signals that implement the unity coefficients are processed by two. With an error in the gain (G ≠ 1) of the fanout blocks, the signals implementing the unity coefficients are scaled by G² while those implementing −2 are scaled by G. With G = 1, the steady-state temperature at a node satisfies

Ti = (Ti+1 + Ti−1)/2   (6.22)
In words, the steady-state behaviour of the heat equation is as follows: The temper-
ature at the ith node reaches the average of its neighbours, as shown in Eq. 6.22.
However, when the gain of the fanout is G ≠ 1, and the per-node implementation of Fig. 6.5 is used,

Ti = G (Ti+1 + Ti−1)/2   (6.24)
In words, this means that the ith node reaches a temperature less than the average of its neighbours when G < 1. This equation predicts the downward bowing of the steady-state temperature profiles in Fig. 6.3. In the second implementation (Fig. 6.6), the signals implementing the −2 coefficients are scaled by G² while those implementing the unity coefficients are scaled by G. The equivalent expression for the steady-state temperature at a given node, in terms of its neighbours, becomes:

Ti = (Ti+1 + Ti−1)/(2G)   (6.25)
Figure 6.6: Per discretization point block diagram of the heat equation. Implemen-
tation 2.
When G < 1, this equation predicts an upward bowing of the steady-state tempera-
ture profile.
If the implementation for consecutive nodes alternates between the two, each
row in the ODE has all of its elements scaled by the same coefficient - either G or G2 .
If the ith row uses the first implementation, the -2 coefficient is scaled by G, while
the unity coefficients it supplies to rows i − 1 and i + 1 are scaled by G². Because the adjacent rows use the second implementation, they supply unity coefficients to row i that are scaled only by G, and they supply −2 coefficients to themselves that are scaled by G². Accordingly, the net scaling of each row divides out when one solves for the steady-state temperature, and the temperature at each node becomes the average of its neighbours, as desired. Implementing this alternation complicates the mapping of the problem onto the analog computer. However, this technique can be applied to higher dimensional
PDEs and could be built into more sophisticated simulation software, thereby making the compensation transparent to the user.

The effects of the deviations in fanout blocks' gains from 1 can be reduced by using gain blocks, with gains set to be the reciprocal of the fanout blocks' gains. For example, if one path of a fanout block has a gain of 0.998, it could be followed by a gain block with a gain of 1/0.998. However, there are twice as many fanout blocks as there are gain blocks, meaning that the number of gain blocks would limit the possible system size.
Measured Results
[Figure 6.7: Measured solution of the heat equation; temperature vs. position x (m) and time (s). Lower portion: solution error vs. time (s).]
Results from the analog computer's solution of the heat equation using the modified Central Differences method are shown in Fig. 6.7. A numerical solution to
the ODEs was computed using Matlab, against which the analog computer’s results
were compared. The lower portion of the figure shows the maximum error along the
rod, as a function of time as well as the root mean squared error as a function of time.
As seen, an RMS error of about 1 % results, with lower results as time increases.
The scaling is such that the 30 s of the solution in Fig. 6.7 takes 1.2 ms to compute on the analog computer.
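To make the scaling explicit: with the integrators' nominal 40 µs time constant standing in for a 1 s unit of problem time, 30 s of problem time requires 30 × 40 µs = 1.2 ms of wall-clock time.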
Prediction of the noise behaviour of a system can be made in several ways. The noise
in a linear system is processed linearly and predictions of the effects of input noise and
internally generated noise can be made quickly and accurately using frequency-domain techniques. For a nonlinear system, the system can be linearized around an operating point and the same analysis is carried out on the linearized system as is done for a
true linear system. This provides reasonably accurate results when the nonlinearities
are soft (i.e., free of discontinuities), when the operating point is relatively unaffected
by the input signal (i.e., the input signal is relatively small) and when the noise is
small enough so as to not affect the operating point greatly. When the operating
point changes significantly, the transfer functions by which the noise is processed also
change.
Techniques have been developed to handle the noise analysis of systems with time-varying operating points, in which the trajectory of the operating point is first computed. It is assumed that the noise does not affect this changing operating point, since the noise behaviour of the system is predicted by linearizing the system around this precomputed trajectory. This provides accurate results for many practical systems.
When the noise is large enough to influence the operating point, it is processed
in a nonlinear fashion. In these situations, accurate results are only achieved through transient simulation. Often the response to one particular noise signal is not of interest but rather the statistics of the solution, given the statistics of the noise. Mathematically ideal noise sources have spectral densities which are white and have infinite bandwidth. This is the same as saying that
the random variables have autocorrelation functions which are Dirac delta functions at the origin. This poses a problem for both numerical analysis and for simulation on the analog computer: in the former, exact treatment would require taking infinitely small time steps, and in the latter, the exact equation cannot be simulated, since no white noise source exists which has infinite bandwidth. However, for simulation on a digital computer, some estimation is made of the necessary bandwidth of noise that needs to
be accounted for, which in turn dictates how small the time steps in the simulation
must be. For analog simulation, the same bandwidth estimation can be done, which
dictates the specifications of the noise source to be used, in relation to the nominal
Stochastic differential equations (SDEs) appear in a wide range of fields from finance to material science. A discussion of more rigorous treatments of SDEs is beyond the scope of this work.

Particle in a Potential Well

The first example is an SDE of the form ẋ = −∇U(x) + n(t), where ∇ denotes the gradient operation. For this example, x is a scalar and U(x) is a double-well potential whose gradient is a cubic polynomial. The function n(t) is a random variable with zero mean and a Gaussian distribution. The equation describes the motion of a particle in a potential well: x is the horizontal displacement of the particle, while ∇U(x) is the steepness of
the well. For this example, the cubic gradient has roots at: -0.852, 0.796 and 0. The
first and second root give rise to stable equilibria. This means that in the absence of
large noise, if the particle is near one of these roots, it will stay near one of them, and
if the noise were reduced to zero, the particle would converge to one of these values of
x. The root at zero is an unstable equilibrium, and hence, infinitesimally small noise
will perturb the particle away from it. This is analogous to an inverted pendulum.
Mathematically, the inverted vertical state is a solution, but the smallest noise will topple the pendulum.
Measured Results
The function n(t) was generated by a Noisecom noise generator, whose noise was
amplified. The purpose of this experiment was to investigate the degree to which the
analog computer can simulate a noisy differential equation, and not to investigate the
degree to which the combination of the noise source and the amplifier produces white
noise. To best compare the analog computer to a digital computer, the noise function
used in the digital computer was a series of samples taken from the noise signal that was applied to the analog computer.
For a problem of this sort, mathematicians are usually interested in the sta-
tistics of the solution. Simulations using the analog computer and Matlab were con-
ducted with the variance of the Gaussian noise source, n(t), at 11 different levels.
The simulations were repeated 20 times for each noise level over the interval of time
(0 to 5000 s). A plot of x(t) vs. t resulting from the analog computer's solution, with small noise, is shown in Fig. 6.8.
When the noise is small, x remains close to one of the stable equilibria and
transitions from one well to the other are infrequent. This is visible in the time
Figure 6.8: First order nonlinear SDE. Small noise (σn(t) = 0.292). Time domain
solution computed by the analog computer.
domain plot of Fig. 6.8. The relatively small number of transitions over the simulation interval limits the accuracy of the computed statistics; if the statistics were based on a shorter simulation interval, they could falsely suggest that transitions between the wells do not occur. To compare the solutions, probability density functions (PDFs) of x(t) were computed. Strictly speaking, these PDFs are
histograms of x(t) over 50 bins. In Matlab, it is important that the samples of x(t)
are for equally spaced values of t. Equal temporal spacing in samples is achieved au-
tomatically for analog computer simulations, since the output of the analog computer is sampled at a uniform rate by the data acquisition card. The PDFs computed from a Matlab simulation and an analog computer simulation can be found
in Fig. 6.9.
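A sketch of this histogram-based PDF estimate in Matlab (x, a vector of equally spaced samples of x(t), is an assumed input):

    % Estimate the PDF of x(t) as a 50-bin histogram normalized to unit area.
    nbins = 50;
    [counts, centers] = hist(x, nbins);    % bin counts and bin centers
    binw = centers(2) - centers(1);        % uniform bin width
    pdf  = counts / (sum(counts)*binw);    % normalize so PDF integrates to 1
    plot(centers, pdf)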
Because x tends to be close to one of the two equilibria, the PDF is low near
the origin and higher near the equilibria (Fig. 6.9). Agreement between Matlab’s
Simulink and the analog computer is good, as shown by the near coincidence of the
two PDFs.
Figure 6.9: First order nonlinear SDE. Small noise (σn(t) = 0.292). Statistics.
Fig. 6.10 and Fig. 6.11 show results for the same SDE, but with somewhat larger noise.
Figure 6.10: First order nonlinear SDE. Medium noise (σn(t) = 0.462). Time domain
solution computed by the analog computer.
Fig. 6.12 and Fig. 6.13 show results for the same SDE, but with even larger
noise. Note that because transitions from one equilibrium to the other are so frequent, good statistics are gathered more quickly. Every ODE solver in Matlab was tried, and the speed numbers reported below are for the fastest
one. Tolerances were also relaxed to speed up Matlab, without introducing undue
errors. Speed in Matlab will be determined by the time step of the simulation, which
could be forced small by shortening the sampling interval of the output of the noise
Figure 6.11: First order nonlinear SDE. Medium noise (σn(t) = 0.462). Statistics.
block. The integrators on the analog computer had a nominal time constant of 40 µs
and the noise source was sampled at 1.25 MS/s, giving a sampling period of 0.8 µs.
This ratio of 50 noise samples per integration time constant was maintained on the
digital computer.
The analog computer was able to compute the solution significantly faster than
a digital computer running Matlab. The 20 simulations for a given noise level took a
total of 96 s (running on a Sun Blade 1000), whereas the analog computer took 4 s, or
less than 4 % of the time. However, this first order system used only the hardware in
one macroblock. If all macroblocks were used, 15 other simulations could take place
Figure 6.12: First order nonlinear SDE. Larger noise (σn(t) = 0.922). Time Domain
solution computed by the analog computer.
at the same time.
The preceding treatment of the heat equation models flow when the parameter α is constant as a function of time and space. A more
interesting problem arises when α is a random variable, varying with both space and
time. In this section the discretized model was changed to model random thermal
Figure 6.13: First order nonlinear SDE. Larger noise (σn(t) = 0.922). Statistics.
diffusivity between adjacent nodes. When α is no longer constant, each row of the set of ODEs takes the form

Ṫi = αi−1,i (Ti−1 − Ti) + αi,i+1 (Ti+1 − Ti)

assuming h = 1. αi−1,i is the thermal conductivity of the section of the rod between the (i − 1)st and ith nodes. Those familiar with circuit analysis will see that this is a Kirchhoff Current Law equation written at node i. See the electrical equivalent circuit shown in the upper portion of Fig. 6.4.
where N is the number of interior points in the discretization of the rod. T1,N is a vector containing the temperatures T1 through TN. Each αi,i+1 is the thermal diffusivity between nodes i and i + 1 in the discretized rod.
The subscript DU denotes “diagonal, upper”, since ADU has nonzero entries only on
the main diagonal and the diagonal above the main diagonal. This notation is used
for the matrix αDU to show the association between it and ADU. The N × N matrix ADL is defined analogously.
The subscript DL denotes “diagonal, lower”, since ADL has nonzero entries only on
the main diagonal and the diagonal below the main diagonal. This notation is used
for the matrix αDL to show the association between it and ADL. The column vector bE equals (1/h²) [1, 0, · · · , 0]^T.
Two sets of experiments were conducted. In the first, α3,4 was a random
variable for a system with 7 internal nodes, and in the second, six αs were random
variables. The boundary conditions were the same as were used in the deterministic case. In the deterministic case, each α corresponds to the gain of an amplifier that processes the difference between adjacent node temperatures. In the random case, each α is implemented using a two-input multiplier, with one input being α and the other the difference between adjacent node temperatures. The function used for α was of the form
α = 1 + n(t) (6.33)
where n(t) is a noise signal with zero mean. A diagram of the implementation of a random coefficient is shown in Fig. 6.14.
Measured Results
The following are the results for the first example, where α3,4 is random. Several
simulations of the transient response at the nodes on either side of the region with
random diffusivity are shown in Fig. 6.15. If α3,4 = 1, T3 and T4 would settle to 0.625
and 0.500, respectively, when there are 7 intermediate points. The nonlinear way in
which the noise affects the solution is clearly visible in that the noise pulls each of T3 and T4 away from these values.
Figure 6.14: Circuitry for implementing a random coefficient.
Statistics were generated for the solution at each of the nodes for many sim-
ulations over an interval of time beginning when the system’s response to the input
step had reached a steady state. The start point of this interval was selected, qualita-
tively, to begin at t = 40. Statistics are shown in Fig. 6.16. Agreement is acceptable
between the solutions generated by the analog computer and Matlab. For a linear system driven by noise, symmetrical noise sources will give rise to symmetrical distributions for the
state variables. The asymmetry in the distributions is clear in the analog computer’s
solution and is testament to the need to perform transient simulations, rather than linearized analyses, for this system. Statistics for the case with six random coefficients are shown in Fig. 6.17. These were generated for t > 40 s.
[Plot: Ten trajectories of the 3rd and 4th nodes in the presence of noise; temperature (fraction of step input) vs. normalized time.]
Figure 6.15: Transient response for T3 (upper) and T4 . Generated by the analog
computer.
180
Simulink
Analog Computer
3500
3000
2500
Frequency per bin
2000
1500
1000
500
0
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Temperature, fraction of step
2000
1500
1000
500
0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Chapter 7

Accelerating a Digital Computer's Algorithms
7.1 Motivation
Analog computers can find solutions to differential equations rapidly, albeit with only
moderate accuracy. On the other hand, digital computers have the ability to reach
arbitrarily high accuracy. However, if not used carefully, they may converge to a
non-physical solution, may not converge quickly, or may not converge at all. There
are ODEs that are particularly amenable to analog solution in that only a moderately
accurate solution is necessary, and those which require sufficiently high accuracy so
as to necessitate digital computation. One could solve the former with an analog
system and the latter with a digital system. However, the strengths of each approach
can be utilized to a much higher degree if the analog computer is used to provide
its solution to the digital computer, which will use the analog solution as a starting
point for its numerical routine. This approach has the potential to speed up the digital
computer’s solution of ODEs for which high accuracy is needed, while avoiding some
7.2 Periodic Steady-State Solvers

A stable system driven by a periodic input settles into a periodic response to that input. The condition for a system having reached this so-called periodic steady-state (PSS) is that all state variables at two times, separated by the period T of the input, are equal:

x(t + T) = x(t)   (7.1)
The condition in Eq. 7.1 means that the solution of the ODEs need only be
calculated over one period, subject to Eq. 7.1. One period is discretized into n points.
The derivatives at the last point will depend on the value at the first point, stemming
from Eq. 7.1. If the system has m state variables, over the n points there are a total of mn unknowns, and the resulting system of nonlinear equations can be expressed in the form f(x) = 0. Newton's method solves equations of this form. The technique iterates from an interim solution xk
of the equation toward the exact solution xe . When the interim solution is near the
|xk+1 − xe |
lim =K (7.2)
k→∞ |xk − xe |2
for some nonzero constant K, where xk+1 is the interim solution at iteration number k + 1.
The analog computer’s solution for the forced Duffing’s equation was used as
a vehicle for investigating the degree to which a PSS routine could be accelerated.
ẋ = y (7.3)
When R = 0, the system becomes an autonomous system that will either oscillate or will settle to a DC steady state, depending on how large γ is. For R ≠ 0 the system will either oscillate periodically, or it will
exhibit chaotic behaviour. Loosely speaking if the amplitude of the forcing function
is small enough that x doesn’t change sign, the solution approaches a stable limit
cycle.
For fixed R, the transition from a stable limit cycle to chaos can be observed by varying γ.
The digital computation of the PSS of the system proceeds in two steps. In
the first step, the DC steady-state solution of the differential equation is computed
assuming that the input source (R cos ωt) is equal to zero. This requires the use of
a root-finding algorithm, such as Newton’s method. This finds the solution to the
nonlinear equation f(z) = 0, where f(z) is the right-hand side of the state-space description of the system. A guess must be supplied as the starting point for this algorithm, which in this case leads to erroneous results since the system has more than one DC equilibrium. After computing a DC steady-state, the solution needs to be checked for stability, and if the routine has converged to an unstable equilibrium, another starting point must be tried.
Typically, the DC steady-state solution becomes the starting guess for the
actual PSS solver. In the case of Duffing's equation, the period, assuming its solution is periodic, is equal to 2π/ω. If this interval is discretized into 64 points, then the solution vector has a length of 128, since there are two variables. Newton's method is then applied; convergence to a non-physical PSS solution is also possible. Clearly, reasonable guesses are one way to avoid such outcomes. At each iteration, the solution is updated by xk+1 = xk − θ∆xk, where

∆xk = f(xk) / f′(xk)   (7.6)
In the simplest form of the method for each iteration, θ = 1 and the whole Newton
step is taken. In more sophisticated schemes, values of θ between 0 and 1 are tried
to see if an intermediate step gives a value of xk+1 that better satisfies f (xk+1 ) = 0.
If so, the intermediate step is taken. This has the benefit of preventing the routine
from skipping over the solution and entering into a region in which convergence is less
likely. These steps during which several values for θ are tried are computationally
more expensive.
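A scalar Matlab sketch of such a damped Newton iteration (the residual f, its derivative fp, and the starting guess are illustrative, not the PSS equations themselves):

    % Damped Newton iteration with a simple step search (sketch).
    f  = inline('x.^3 - x - 0.4');           % illustrative residual
    fp = inline('3*x.^2 - 1');               % its derivative
    x  = 1;                                  % starting guess
    for k = 1:20
      dx = f(x) / fp(x);                     % full Newton step (Eq. 7.6)
      for theta = [1 0.5 0.25]               % try progressively damped steps
        xt = x - theta*dx;
        if abs(f(xt)) < abs(f(x)), break; end
      end
      x = xt;                                % accept the chosen step
    end
    x                                        % approximate root of f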
A program called PSIM based on algorithms in [15] was used in this investiga-
tion. It is a PSS solver that uses the two-step process described above. The number
of iterations PSIM took varied based on the value of γ. In some cases, only 5 or 6
were needed to take the solution from the correct DC value to the PSS. However,
as γ was reduced, the number of iterations increased. For the set of parameters used here, the routine required substantially more iterations and computation time when started from the DC solution, running on a GHz Pentium IV. However, when the routine started at a solution given by the analog
computer, the number of iterations was reduced to 5 and the computation time to
0.76 s. The relative reduction in simulation time was greater than the reduction in the
number of necessary iterations. This is because some of the iterations taken when the
digital computer starts from a DC solution are the computationally more expensive
routines described in the earlier paragraph. The analog computer’s solution and the
final solution computed by PSIM are both shown in Fig. 7.1, with the latter drawn
Figure 7.1: One period of the steady-state solution of the Duffing equation. R = 0.4,
γ = 0.67 and ω = 2π. The thick lines correspond to the analog computer’s solution
while the thin lines correspond to PSIM’s solution.
with the thin lines. The state variable y is centered close to the time axis while x is centered above it.

An area for future work is to extend this technique to larger systems whose PSS is of practical interest and expensive to compute digitally.
Chapter 8
8.1 Introduction
This chapter outlines some theoretical comparisons between digital and analog com-
puters along two important performance criteria, namely power dissipation and com-
putation speed. While speed is perhaps the most obvious metric, power consumption becomes more important as the performance of digital computers increases. For portable applications, the consequence of increased power consumption is obvious: shorter battery life. However, even digital computers plugged
into a wall socket have problems stemming from too high a power dissipation, such
as overheating and voltage drop due to IR losses in their power distribution network.
• The analog computer’s accuracy is adequate and it adequately solves the dif-
one instruction.
• All of the processing work done by the digital computer is carrying out floating
point operations. That is, there is no overhead from instructions that are not
The last two assumptions are necessary simply because most data for the power
efficiency of digital systems quantify energy per instruction. However, we can more
readily gauge the number of floating point operations a routine performs. These
assumptions let us count FLOPs but use the power data for instructions.
On the analog side, the total energy needed for a given computation is simply:
W = PAC ∆t (8.1)
where W is the total energy dissipation, PAC is the power consumption of the analog
computer and ∆t is the duration of the computation. If the computation does not
use all of the analog computer’s blocks, PAC is replaced by the consumption of only
those blocks used. While power-down capability was not included in this design, a
future design could easily be equipped with the necessary circuitry to power-down
unused circuits.
An estimation could be made in a similar fashion for the digital computer, using the duration of the simulation and the computer's power consumption. Matlab can show the elapsed CPU time of an operation or simulation. However, a more portable approach is to estimate the number of floating point operations (FLOPs) required to carry out a simulation, and then scale this number by a given device's energy per FLOP.
For this discussion, FLOPs denotes the plural of FLOP; FLOPs per second is written FLOPs/s.
Matlab has a number of ODE solvers. All of the routines take a function
f (y, t) describing the ODE: ẏ = f (y, t). A Matlab function called FLOPS can be
used to determine the number of FLOPs a routine takes. However, this function is not supported in Matlab 6, because the inclusion of the linear algebra package LAPACK makes FLOP counting impractical. The rest of this investigation was done in Matlab 5.
For some large systems, the use of LAPACK may reduce the number of FLOPs
needed to invert matrices and perform other linear algebra functions. However, for
the simple examples considered here, this is not the case, and FLOP analysis using
Matlab 5 is appropriate.
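A minimal Matlab 5 sketch of this counting procedure (the one-dimensional ODE, interval, and tolerance are illustrative):

    % Count the FLOPs Matlab 5 uses to solve an ODE (sketch).
    f = inline('-y + cos(t)', 't', 'y');   % ydot = f(t, y)
    flops(0);                              % reset the FLOP counter
    opts = odeset('RelTol', 1e-3);
    [t, y] = ode45(f, [0 100], 0, opts);
    nflops = flops                         % FLOPs consumed by the solve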
In addition to the FLOP count, the ODE solvers give a count of the number of:

1. Successful steps: This is the number of time steps at which the solution was evaluated.

2. Failed attempts: This is the number of time steps at which the solution failed or did not meet convergence criteria, resulting in a shorter time step being taken.

3. Function evaluations: This is the number of times f(y, t) is evaluated.

4. Partial derivatives: This is the number of times the Jacobian, ∂f/∂y, is computed.

5. LU decompositions.

6. Solutions of linear systems.
The numbers of each of 3) through 6) per time step are influenced by the type
of ODE solver used. For example, the last three are never done when an explicit
routine is used. The time step is determined by tolerance requirements, the dynamics
of the system, and the way in which any noise signals are represented.
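In Matlab, the solvers can print these counts themselves; a brief sketch, reusing the hypothetical rhs.m from above:

    % Ask the solver to report the statistics enumerated above.
    opts = odeset('RelTol', 1e-3, 'Stats', 'on');
    [t, y] = ode15s('rhs', [0 100], [1; 0], opts);
    % An implicit routine such as ode15s reports successful steps, failed
    % attempts, function evaluations, partial derivatives (Jacobians),
    % LU decompositions, and solutions of linear systems; an explicit
    % routine such as ode45 omits the last three.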
When a noise signal is included, its samples must be generated at least
twice as frequently as the highest frequency of interest. This forces the time
step of the simulation to be approximately as long as the spacing of the noise samples,
when this spacing is much shorter than the system's shortest time constant. How the
noise samples are interpolated further influences the time steps of the ODE solvers.
For example, representing the noise as a zero-order hold (ZOH) of the noise samples
can force smaller time steps, since the noise changes abruptly at the steps in the ZOH.
Interpolating the samples linearly adds some computation at each
step, but makes for smoother noise, a more accurate representation of continuous-time
noise.
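A sketch of the two noise representations inside the right-hand-side function; the function name, the global variables, and the first-order dynamics are illustrative only (note that Matlab's solvers pass t first):

    function ydot = rhs_noise(t, y)
    % Right-hand side ydot = f(y, t) + n(t), with n(t) built from
    % pre-drawn noise samples NSAMP taken at times TSAMP.
    global TSAMP NSAMP
    % Zero-order hold: the most recent sample; its abrupt steps can
    % force the solver to take small time steps.
    n = NSAMP(max(find(TSAMP <= t)));
    % Linear interpolation: smoother, closer to continuous-time noise.
    % n = interp1(TSAMP, NSAMP, t);
    ydot = -y + n;                % an arbitrary first-order example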
The solution of this example (Eq. 6.26) was considered in Sect. 6.2.2. When solved
with the routine ode45, with a relative tolerance (relTol) of 10^-3, over the interval
t ∈ (0, 100), 1.6 × 10^6 FLOPs are needed. The interval t ∈ (0, 25000) can be simu-
lated in 1 s on the analog computer, when the integrator's time constant is 40 µs.
On the digital computer, this takes (25000/100) × 1.6 × 10^6 = 400 MFLOPs. The
analog computer's circuits that are used in this simulation have a power consumption
of approximately 7.8 mW. Therefore, the equivalent performance of the analog com-
puter is 7.8 mW / (400 MFLOPs/s) ≈ 20 pJ/FLOP, while a typical general purpose
digital computer requires considerably more energy per FLOP.
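This estimate, and the one in the next example, follow the same pattern, which can be written compactly as (E_FLOP and N_FLOP are symbols introduced here for convenience, not used elsewhere in the text):

    E_FLOP = P_AC ∆t / N_FLOP
           = (7.8 mW × 1 s) / ((25000/100) × 1.6 × 10^6 FLOPs)
           ≈ 20 pJ/FLOP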
The solution of this example (Eq. 6.28) was considered in Sect. 6.2.3. This example,
with one random coefficient, was solved using the suite of ODE solvers in Matlab. To
make a fair comparison, the right-hand side of Eq. 6.28 was coded explicitly, rather
than with matrix multiplication. Each row requires only 3 FLOPs this way, rather
than the 19 FLOPs required for each row of a 10 × 10 matrix multiplication. For
this investigation all ODE solvers were used with a variety of tolerances. The fewest
FLOPs among the solutions that agree with those obtained
when tighter tolerances are used was 2.8 MFLOPs (ode23 with relTol = 10^-3). This
was over the interval 0 to 100 s. On the analog computer, this simulation uses two
macroblocks (but only half the circuits in each macroblock), giving a power dissipation
of about 15.6 mW, and an equivalent performance of
15.6 mW / ((25000/100) × 2.8 MFLOPs/s) ≈ 22 pJ/FLOP.
The appropriate tolerance was determined in the following
way: for many ODE solvers, the solutions for relTol = 10^-3 and relTol = 10^-4
responded to the noise in a similar way. However, when the tolerance was relaxed to
10^-2, spikes due to the noise did not track those from the more accurate solutions.
When there are nine random coefficients, the smallest number of FLOPs in-
creased to 7.1 MFLOPs, improving the equivalent power efficiency of the analog com-
puter still further.
Digital signal processors (DSPs) have a typical power efficiency of 100 pJ/FLOP
to 1 nJ/FLOP [24]. Even custom digital ASICs, such as digital filters, have a power
efficiency in the 10 pJ/FLOP range [24]. However, this analog computer has more
programmability than a digital filter. The analog equivalent of a digital filter is, of
course, an analog filter, which can be made to consume much less power than this
device. This analog computer represents a first attempt at a large VLSI analog com-
puter, and it is expected that future iterations would consume less power. Table 8.1
summarizes these comparisons.
To achieve the necessary accuracy for the solution of these sample problems, a digital
computer may not need to perform computations in floating point. However, the
comparison remains meaningful, first
because the custom ASICs mentioned above, which have efficiencies in the range of 10
pJ/FLOP, are fixed point devices. Secondly, our comparisons with microprocessors
are biased in favour of the microprocessor due to the assumption that all FLOPs take
the same number of clock cycles. For example, many microprocessors can pipeline
multiplication operations such that one operation is performed each clock cycle; how-
ever, very few can complete a division operation each clock cycle. In this analysis, all
floating point operations are nonetheless treated as equal.
Using an energy per conversion per level of 1 pJ (a typical value for high-performance
data converters), the cost of converting signals between the analog and digital domains
can be estimated.
With the above taking place with only 8 bits, the resulting power dissipation is
small compared with that of the computation itself.
8.3 Computation Speed

Stochastic differential equations are a class of differential equations that are solved
efficiently on the analog computer. The inclusion of high frequency noise greatly
increases their computational load on a digital computer. However, the speed of the
analog computer is unchanged by it. Further, instead of an exact solution,
statistics of the solution are
usually the goal of the simulation. This means that the moderate accuracy of the
analog computer is often sufficient.
There are some limitations to the ability of the analog computer to predict
the effects of high-frequency noise. The finite bandwidth of the memoryless blocks,
and the presence of higher-frequency poles and zeros in the integrators’ frequency
response limit the range of frequencies over which the noise behaviour of the system can
accurately be simulated. One way to extend this range is to lengthen the time constant
of the integrators, thereby increasing the ratio of the bandwidth of the memoryless
blocks to that of the system being simulated. This comes at the cost of
lengthening the simulation duration and decreasing the power efficiency of the analog
computer. That is, once the input noise bandwidth has reached the bandwidth of the
analog computer's memoryless blocks, to double the relative frequency of noise that
can be simulated, the time constants of the integrators must also be doubled, causing
the simulation to take twice as long, and reducing the power efficiency of the analog
computer by half. This is analogous to the penalty
a digital computer suffers when it must take twice as many time steps, which is the
case when the bandwidth of the noise it must represent is doubled.

When only about half of the blocks on the analog computer are used, it solves differ-
ential equations at an equivalent rate of several hundred MFLOPs/s, as the examples
above show. Many microprocessors, though nominally capable of executing
more than 1 FLOP per clock cycle, do not perform operations at this rate.
Chapter 9
A revision to this analog computer chip should have blocks that meet more stringent
specifications, as well as additional types of blocks. In par-
ticular:
• Trigonometric functions.
• A digital logic block and flip-flops. This would allow for mixed-mode simulation.
• On-chip noise generators with the provision for controlling the frequency spec-
trum of the noise.
One of the most critical modifications to the present chip would be to correct
the problem of the unused SRAM cells powering up in the ON state and shorting the
inputs of blocks to ground. The simplest way to correct this would be to modify the
layout of the existing SRAM/switch cell by removing the vias that connect the M5
wires down to the actual input side of the CMOS switches. The unused cells could
then power up in either state without shorting any block's inputs.
• A larger number of analog inputs and analog outputs connecting the digital
computer to the analog computer.
• A more direct interface with the digital computer, perhaps incorporating the
AC chip with the above modifications on a PCB that can plug into the digital
computer's bus.

• More switches, allowing any global
wire to be used for a particular connection. This would increase the number of
connections that can be made simultaneously.
One of the most promising application areas for future work is the solution of stochas-
tic differential equations. Many interesting problems that are very time consuming
to solve digitally require the solution of low-order equations. Other application areas
include the simulation of chemical
reactions.
Appendix A
Each integrator accepts a word consisting of the following:

• b0: Infinity mode. 0 selects dynamic offset cancellation. 1 selects static behav-
ior.
• b1-b2: Input and output range: 00: 100 nA. 10: 1 µA. 01: 20 µA.
• b3-b4: Tuning current range: 00: 100 nA. 10: 1 µA. 01: 20 µA.
• b5: DAC out?. 1 routes the DAC’s output to off-chip for possible measurement.
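Assembling such a control word is mechanical. A Matlab-style sketch follows, assuming, purely as an illustration, that b0 is the least significant bit of the word (the text does not state the bit ordering):

    % Pack an integrator control word from the fields listed above.
    b = zeros(1, 6);             % b(1) holds b0, ..., b(6) holds b5
    b(1) = 0;                    % b0: dynamic offset cancellation
    b([2 3]) = [1 0];            % b1-b2 = "10": 1 uA input/output range
    b([4 5]) = [1 0];            % b3-b4 = "10": 1 uA tuning current range
    b(6) = 0;                    % b5: keep the DAC output on-chip
    word = sum(b .* 2.^(0:5));   % numeric value of the word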
Each VGA / 2-input multiplier accepts a 16-bit word consisting of the following:
• b0-b1: Input range: 00: 100 nA. 10: 1 µA. 01: 20 µA.
• b2-b3: Output range: 00: 100 nA. 10: 1 µA. 01: 20 µA.
• b5: DAC out?. 1 routes the DAC’s output to off-chip for possible measurement.
Each fanout block accepts a word that includes the following:

• b2-b3: Output range for output 1: 00: low. 10: medium. 01: high.
• b4-b5: Output range for output 2: 00: low. 10: medium. 01: high.
• b6-b7: Output range for output 3: 00: low. 10: medium. 01: high.
Each pair of programmable nonlinear blocks accepts several words (word0, word1,
word2) that include the following:

• word0, b0-b1: Input range for x1 : 00: low. 10: medium. 01: high.
• word0, b2-b3: Input range for x2 : 00: low. 10: medium. 01: high.
• word0, b4-b5: Output range for y1 : 00: low. 10: medium. 01: high.
• word0, b6-b7: Output range for y2 : 00: low. 10: medium. 01: high.
• word0, b8-b9: Output range for y3 : 00: low. 10: medium. 01: high.
• word1, b0: Combos? 1 indicates that the two nonlinear blocks will be used in
combination, in which case some of the remaining fields become
irrelevant.
• word1, b2-b3: Range for DAC0: 00: low. 10: medium. 01: high. DAC0 sets c1 .
• word1, b15 - word2, b0: Range for DAC1: 00: low. 10: medium. 01: high.
DAC1 sets c2 .
A.2 Programming the Switches and Interconnections

The following signals are used to program the state of the chip's switches, and the
memories of the functional blocks:
• CLK: Active low signal used for programming the states of the switches.
• a[0]: A signal which is pulsed to latch data into the memory of the functional
blocks.
• a[14] specifies whether the rest of the address word is referring to a location
in the memory of a functional block or in the switch memory ad-
jacent to it.

• a[10:11] encode the row (00=0, 01=1, 10=2, 11=3) and a[12:13]
encode the column of the macroblock being addressed.

• a[1:9] encode the address of a functional block to be programmed, and the row
and column address within switch memory. For the latter, a[1:3] determines the
index of the block of eight rows in the memory to be programmed and a[4:9]
determine the column address. The way in which these
bits are interpreted depends on the levels of the CLK, WR EN, and a[0] signals.
• a[15] is used to program the column of switches that contains the global outputs
of a macroblock.
Two blocks of decoding logic
generate an a[0] signal and WR EN signals for each macroblock. The first block,
GLOBAL ADDRESS DECODER, has one instance on the chip, while the second,
MB DECODER, has one instance per macroblock.
The GLOBAL ADDRESS DECODER block decodes a[10:13] into two sets
of four signals that, through one-hot encoding, specify the row and column of the
macroblock being addressed. The MB DECODER generates the signals applied
to, and internal to, a given macroblock. During a reset, if WR EN is high, both global
and local versions of the signal go high. The GLOBAL ADDRESS DECODER's
behaviour is summarized in Tables A.1 through A.3.
    a[10]  a[11]  |  row sel[0]  row sel[1]  row sel[2]  row sel[3]
      0      0    |      1           0           0           0
      0      1    |      0           1           0           0
      1      0    |      0           0           1           0
      1      1    |      0           0           0           1

Table A.1: Truth table for GLOBAL ADDRESS DECODER's row sel.
    a[12]  a[13]  |  col sel[0]  col sel[1]  col sel[2]  col sel[3]
      0      0    |      1           0           0           0
      0      1    |      0           1           0           0
      1      0    |      0           0           1           0
      1      1    |      0           0           0           1

Table A.2: Truth table for GLOBAL ADDRESS DECODER's col sel.
    WR EN  a[14]  RST  |  WR EN G  WR EN L
      0      X     X   |     0        0
      1      0     0   |     0        1
      1      1     0   |     1        0
      1      X     1   |     1        1

Table A.3: Truth table for GLOBAL ADDRESS DECODER's write enable signals.
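The three truth tables can be summarized by a small behavioural model; the following Matlab-style sketch is mine (function and variable names are not from the chip's documentation):

    function [row_sel, col_sel, wr_en_g, wr_en_l] = gad(a10, a11, a12, a13, a14, wr_en, rst)
    % Behavioural model of GLOBAL ADDRESS DECODER, per Tables A.1-A.3.
    row_sel = zeros(1, 4);  row_sel(2*a10 + a11 + 1) = 1;  % one-hot row
    col_sel = zeros(1, 4);  col_sel(2*a12 + a13 + 1) = 1;  % one-hot column
    if ~wr_en                 % writes disabled: both enables low
        wr_en_g = 0;  wr_en_l = 0;
    elseif rst                % reset with WR EN high: both enables high
        wr_en_g = 1;  wr_en_l = 1;
    elseif a14                % a[14] = 1: global write enable
        wr_en_g = 1;  wr_en_l = 0;
    else                      % a[14] = 0: local write enable
        wr_en_g = 0;  wr_en_l = 1;
    end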
Signal row sel[i] is applied to the MB DECODER blocks for all macroblocks
in the ith row, while signal col sel[i] is applied to the MB DECODER blocks for all
macroblocks in the ith column. WR EN G and WR EN L de-
note the global and local write enable signals, which are applied to every MB DECODER
block. The MB DECODER combines these signals in the fol-
lowing way:
Only when a particular macroblock is selected via a[10:13] can its write enable
signals and a[0] be raised. a[14] determines if the write enable signal controls the
memories of the functional blocks or the switch memory.
When a[15] is low, the word lines in the external memory of a macroblock can
be selected by a NOR-based decoder taking a[4:9] as its input. When a[15] is high, the
word line decoder for the external memory is disabled and a[15] controls the word line
for the SRAM cells controlling the states of the switches that connect global output
wires to off-chip.
A.2.1 Reset
When the RST signal is raised, the chip's input signals a[0] and WR EN are applied
to all macroblocks, and a
default data sequence is applied to the memory in all functional blocks. When a[0]
goes high, address lines a[1:5] and the lines that normally propagate their comple-
ments all go high, causing every block's memory to be programmed with the default
data. If CLK is lowered while RST and WR EN are high, all of the switch memory is
programmed as well.
Bibliography
[1] Granino Arthur Korn and Theresa M Korn. Electronic Analog and Hybrid Computers. McGraw-Hill, 1964.
[2] Glenn Cowan, Robert Melville, and Yannis Tsividis. A VLSI analog computer / math co-processor for a digital computer. In ISSCC Digest of Technical Papers, February 2005.
[3] Uri M Ascher and Linda R Petzold. Computer Methods for Ordinary Differential Equations and Differential Algebraic Equations. Society for Industrial and Applied Mathematics, 1998.
[4] Edward K F Lee and Glenn Gulak. A CMOS field-programmable analog array. IEEE Journal of Solid-State Circuits, December 1991.
[6] Piotr Dudek and Peter J Hicks. A CMOS general-purpose sampled-data analogue
[7] M I Sobhy and M Y Makkey. A new look at analog computing using switched
[8] Andrew Singer and Alan Oppenheim. Circuit implementations of soliton systems.
[10] Walter J Karplus and Richard A Russell. Increasing digital computer efficiency
[11] Kenneth Kundert. The Designer's Guide to SPICE and Spectre. Kluwer, 1995.
[12] John H. Mathews. Numerical Methods for Mathematics, Science, and Engineering. Prentice-Hall, 1992.
like circuit simulation algorithms. IEE Proceedings on Circuits, Devices, and Systems.
Steady-State Methods for Simulating Analog and Microwave Circuits. Kluwer, 1990.
[16] M Punzenberger and Christian Enz. A new 1.2 V BiCMOS log-domain integrator
for companding current-mode filters. In Proc. ISCAS, pages 125–128, May 1996.
[17] Yannis P Tsividis. Operation and Modeling of the MOS Transistor. Oxford University Press, 2003.
[18] Barrie Gilbert. A new wide-band amplifier technique. IEEE Journal of Solid-State Circuits, SC-3(4):353–365, December 1968.
1440, 1989.
[21] Roberto Alini, Andrea Baschirotto, and Rinaldo Castello. Tunable BiCMOS continuous-time filter for high-frequency applications. IEEE Journal of Solid-State Circuits, December 1992.
Company, 1964.