An Asynchronous Linear
Predictive Analyzer
Juha Plosila
University of Turku, Dep. of Applied Physics
Lab. of Electronics and Information Technology
FIN-20014 Turku, Finland
Tiberiu Seceleanu
Turku Centre for Computer Science (TUCS)
Lemminkäisenkatu 14, FIN-20520 Turku, Finland
Turku Centre for Computer Science
TUCS Technical Report No 142
November 1997
ISBN 952-12-0095-2
ISSN 1239-1891
Abstract
Linear predictive analysis is a standard technique in modern digital speech processing.
This makes it an interesting implementation area for asynchronous design. We present an
asynchronous speed-independent circuit implementation of a linear predictive analysis
system. The implementation is built around a program ROM into which the algorithm
is encoded. The design process is carried out using the action systems formalism as the
development tool. As a result we obtain an efficient and logically highly reliable system
with a potential for low power consumption. We present various block diagrams of the
resulting composition and show the details of a set of selected controllers.
Keywords: linear predictive analysis, speech compression, asynchronous circuits, action
systems, implementation
TUCS Research Group
Programming Methodology Research Group
1 Introduction
Linear predictive analysis [11, 16] is a powerful speech analysis technique with which
the basic speech parameters such as pitch, formants, spectra and vocal tract area functions
can be reliably and accurately estimated. Hence, linear prediction is the basic method
behind the modern speech coding and compression techniques used for instance in digital
mobile phones. This method provides extremely accurate estimates of the speech parameters at a good computational speed.
Linear predictive analysis is based on the idea that a speech sample can be approximated, or predicted, as a linear combination of past speech samples. A unique set of
predictor coefficients can be determined by minimizing the sum of the squared differences between the actual speech samples in a finite time frame and the corresponding
estimates obtained by linear prediction. This minimization problem can be solved using
several approaches. A common and reliable scheme is the autocorrelation method, where
the predictor coefficients are obtained by computing a set of autocorrelation coefficients.
What makes linear prediction an attractive target for asynchronous techniques is the
fact that it is a common factor in the present speech compression methods. In this paper,
we present an asynchronous implementation of a system that computes linear predictive
analysis using the autocorrelation method. As a specification language we use the action
systems formalism [2] which allows us to derive the target circuit in a stepwise manner
within a mathematical framework, the refinement calculus [1, 4]. The logical correctness
of the design is preserved throughout the derivation, from the initial specification to the
final detailed description which is implemented as a network of circuit elements. Consequently, the design process yields a logically highly reliable implementation. In this
paper, the emphasis is on the implementation itself rather than in the details of the derivation. Basically, the derivation flow follows the guidelines presented in our previous work
on a pipelined microprocessor [13]. We thereby provide more evidence that our approach
is suitable for asynchronous design.
The resulting circuit contains a 46-word program ROM (PROM) for the involved
algorithms and a 2-stage pipeline. The control logic is mainly speed-independent, but the
data-path completion signals for data-path components are generated via matched delays
[6]. This is a compromise that requires careful timing analysis of the data path but keeps
the hardware overhead reasonable.
We believe that asynchronous techniques, because of their potential for low power
consumption with relatively good performance, are well-suited for speech processing applications, especially for those used in battery-operated devices. Our estimations show
that our design, even though it contains only a minimal pipeline structure, has a high throughput capability, indicating that the idle periods of the system are long compared
to the active periods. This, in turn, indicates potential for a low-power behavior. Furthermore, our PROM-based system is easy to upgrade: the other algorithms needed in a
speech compression method can be merged into the system basically by expanding the
PROM and the data path resources without changing the control logic.
Overview of the paper We proceed as follows. In sections 2-3 we briefly introduce the
linear predictive analysis basics and the action systems framework. The initial specification of the circuit is given in section 4. The guidelines of the decomposition process are
discussed in section 5. In sections 6 and 7, we describe the operation of the different functional blocks of the final composition and show the program ROM codes of the involved
algorithms. The system performance issues are discussed in section 8. We end with some
concluding remarks in section 9.
2 Linear prediction
The following is based on the comprehensive study on linear prediction in [11, 16].
The speech production mechanism, including glottal excitation, vocal tract response,
and sound radiation, can be modelled by a time-varying digital filter whose system function H(z) is of the form

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \qquad (1)$$

where U(z) and S(z) are the z-transforms of the excitation u(n) and the speech samples s(n), respectively. The parameter G, in turn, is a gain factor. We can write for the sequences s(n) and u(n) the simple difference equation

$$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n) \qquad (2)$$
where

$$\hat{s}(n) = \sum_{k=1}^{p} a_k s(n-k) \qquad (3)$$
is called a linear predictor of order p with coefficients a_k. The prediction error e(n), also known as the residual, is defined as

$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) \qquad (4)$$
This error is an output from a system whose transfer function is
$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \qquad (5)$$
The prediction error filter A(z) is known as the inverse filter, because the synthesis filter H(z), defined in Eq. 1, can be written as

$$H(z) = \frac{G}{A(z)} \qquad (6)$$
The total squared prediction error, which represents the energy of the error sequence
e(n), is defined as
$$E = \sum_n e^2(n) = \sum_n \left( s(n) - \sum_{k=1}^{p} a_k s(n-k) \right)^{\!2} \qquad (7)$$
The predictor coefficients a_k are determined by minimizing E by setting

$$\frac{\partial E}{\partial a_i} = 0, \qquad 1 \le i \le p \qquad (8)$$
yielding the following set of equations, also known as the normal equations:
$$\sum_{k=1}^{p} a_k \sum_n s(n-k)\, s(n-i) = \sum_n s(n)\, s(n-i), \qquad 1 \le i \le p \qquad (9)$$

In other words, we have p equations from which the unknown coefficients a_k, 1 ≤ k ≤ p, can be solved.

Autocorrelation method   The autocorrelation function R(i) of the speech sequence s(n) is given as

$$R(i) = \sum_{n=-\infty}^{\infty} s(n)\, s(n+i) \qquad (10)$$
By assuming that the minimization interval is infinite, −∞ < n < ∞, and observing that R(i) is an even function, we can reduce the equations (9) to

$$\sum_{k=1}^{p} a_k R(i-k) = R(i), \qquad 1 \le i \le p \qquad (11)$$
In practice, however, we can process only finite segments of the sequence s(n), and hence s(n) is windowed using a window function w(n), yielding the new sequence s_w(n):

$$s_w(n) = \begin{cases} s(n)\, w(n), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$

where N is the width of the window, or the frame size. The windowing reduces Eq. 10 to

$$R(i) = \sum_{n=0}^{N-1-i} s_w(n)\, s_w(n+i), \qquad i \ge 0 \qquad (13)$$
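Equations (12) and (13) translate directly into a few lines of code. The following minimal Python sketch is ours (the 0.54/0.46 Hamming coefficients anticipate the window chosen in Sec. 4):

    import math

    def window(s):
        """Apply a Hamming window to one N-sample frame (Eq. 12)."""
        N = len(s)
        return [s[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
                for n in range(N)]

    def autocorr(sw, p):
        """R(i) = sum_{n=0}^{N-1-i} sw(n)*sw(n+i) for i = 0..p (Eq. 13)."""
        N = len(sw)
        return [sum(sw[n] * sw[n + i] for n in range(N - i))
                for i in range(p + 1)]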
A very efficient method for solving the coefficients a_i from Eq. 11 is Durbin's recursive procedure, which can be written as follows:

$$
\begin{aligned}
& E_0 := R(0) \\
& \text{for } i = 1 \text{ to } p: \\
& \quad \Big(\; k_i := \Big( R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \Big) \big/ E_{i-1} \\
& \quad\;\; a_i^{(i)} := k_i \\
& \quad\;\; \text{for } j = 1 \text{ to } i-1: \; a_j^{(i)} := a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)} \\
& \quad\;\; E_i := (1 - k_i^2)\, E_{i-1} \;\Big) \\
& \text{for } i = 1 \text{ to } p: \; a_i := a_i^{(p)}
\end{aligned} \qquad (14)
$$
Here the intermediate quantities k_i are called the reflection coefficients, which can be used to construct lattice-form versions of the direct-form filters H(z) and A(z) introduced above. In fact, the lattice form is preferable because of its better stability properties, even though it requires more computation than the simpler direct form.
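Procedure (14) admits an equally direct transcription; a minimal Python sketch of ours (index 0 of the arrays is padded so the subscripts match the 1-based notation above):

    def durbin(R, p):
        """Solve the normal equations (11) by Durbin's recursion (14); returns
        the predictor coefficients a[1..p] and reflection coefficients k[1..p]."""
        E = R[0]
        a = [0.0] * (p + 1)           # a[j] holds a_j of the current order
        k = [0.0] * (p + 1)
        for i in range(1, p + 1):
            k[i] = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
            a_prev = a[:]             # keep a^(i-1) for the in-place update
            a[i] = k[i]
            for j in range(1, i):
                a[j] = a_prev[j] - k[i] * a_prev[i - j]
            E = (1.0 - k[i] ** 2) * E
        return a[1:], k[1:]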
3 Action systems
The action systems formalism is based on an extended version of the guarded command
language of Dijkstra [7]. The statements of this language include assignment, sequential composition, assertion, conditional choice and iteration, and are defined using weakest-precondition predicate transformers. A comprehensive study of this formalism can be found for example in [2, 3]. The action systems framework in asynchronous design is treated in [13, 14, 15].
Actions   An action is a guarded command of the form < g → S >, where g, the guard, is a boolean condition, and S, the body, is any statement in our language. The action A is said to be enabled when the guard is true, disabled otherwise. If g is invariantly true, we often write the action A simply as < S >.
We also use the following constructs:

Choice: The action A1 [] A2 tries to choose an enabled action from A1 and A2, the choice being nondeterministic when both are enabled.

Sequential composition: A1 ; A2 first behaves as A1 if this is enabled, then as A2, which can be enabled by A1 or by another action outside the sequential composition. Sequencing forces A1 and A2 to be exclusive.

Parallel composition: A1 ‖ A2 first behaves as the choice A1 [] A2, then as A2, if A1 was selected, or as A1, if A2 was selected. Each action is executed once.

Quantified instantiation: The notation {∗ i = 1..n : Ai}, where ∗ is either [], ;, or ‖, is defined to be equivalent to A1 ∗ … ∗ An.

The scope of a constructor is indicated with parentheses, for example A ; ((B [] C) ‖ D).
Action systems   An action system A has the form:

  sys A ( g ) ::
  |[ var l ; init g, l := g0, l0 ;
     do A1 [] … [] Am od
  ]|

where g and l are lists of identifiers initialized to g0 and l0, respectively. The identifiers l are the local variables, visible only within A. The identifiers g, in turn, are the global variables of A, visible to other action systems as well. The local and global variables are assumed to be distinct. The actions Ai of A are allowed to refer to all of the state variables, consisting of the local and global variables.

The actions are considered atomic, i.e., if an action is selected for execution, it will be completed without interference. Therefore two actions that do not have any read-write conflicts can be executed in any order, or simultaneously. Hence, we can model parallel programs with action systems taking the view of interleaved action executions.
Parallel composition   Consider two action systems A and B:

  sys A ( gA ) ::
  |[ var lA ; init gA, lA := gA0, lA0 ;
     do A1 [] … [] Am od
  ]|

  sys B ( gB ) ::
  |[ var lB ; init gB, lB := gB0, lB0 ;
     do B1 [] … [] Bn od
  ]|

where lA ∩ lB = ∅, and the initializations of the global variables gA ∩ gB in the systems A and B are consistent with each other. The parallel composition of A and B, denoted A ‖ B, is the action system C:

  sys C ( gA ∪ gB ) ::
  |[ var lA, lB ; init gA, gB, lA, lB := gA0, gB0, lA0, lB0 ;
     do (A1 [] … [] Am) [] (B1 [] … [] Bn) od
  ]|

Thus, parallel composition combines the state spaces of the constituent action systems, keeping the local variables lA and lB distinct. The reactive components A and B interact with each other via the global variables that are referenced in both components. Termination of computation is a global property of the composed action system C.
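To make the interleaving view concrete, here is a toy Python sketch of ours (not part of the formalism itself): an action is a guard-body pair over a shared state, and as long as some guard holds, one enabled action is chosen nondeterministically and executed atomically. Parallel composition then amounts to merging the action lists of the component systems.

    import random

    def run(state, actions):
        """Interleaving execution: repeatedly pick one enabled action at
        random and run its body atomically; stop when none is enabled."""
        while True:
            enabled = [body for guard, body in actions if guard(state)]
            if not enabled:
                return state          # no action enabled: computation ends
            random.choice(enabled)(state)

    # Example: a single action that increments x while x < 10.
    final = run({"x": 0},
                [(lambda s: s["x"] < 10,
                  lambda s: s.__setitem__("x", s["x"] + 1))])
    assert final["x"] == 10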
4 Specification of the analyzer
In this section, we present the formal specification of an asynchronous linear predictive
analysis system which uses the autocorrelation method with Durbin's recursion. Note that the analyzer is viewed here as a standalone system with output operations of its own, but in a practical speech compression method the analysis is part of a larger whole.
The input for the analyzer is thought to be a continuous 8 kHz stream of speech
samples. The system outputs the windowed samples and the computed reflection coefficients for further processing.
System parameters In order to write the initial description for our linear predictive
analyzer, we must first select an appropriate window function w(n), window width N ,
and predictor order p.
For windowing we choose the commonly used Hamming window [17] :
$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right) \qquad (15)$$
The frame size N is set to 256, which is a somewhat more challenging value than the 160 of GSM [10]. Hence, the frame duration is 32 ms, assuming that the sample rate of the incoming speech sequence is 8 kHz.

The parameter p is set to 10, which can be considered an optimal value [11] and will be used for example in the future GSM [9]. In the conventional GSM system [10] the order of the predictor is only 8, but increasing it by 2 makes the analysis more accurate and the quality of the synthesized speech better.
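These choices are easy to sanity-check; a small sketch of ours that also generates the 256 window coefficients which the implementation later keeps in a coefficient ROM (Sec. 6):

    import math

    N, p, rate = 256, 10, 8000              # frame size, order, sample rate
    assert 1000.0 * N / rate == 32.0        # frame duration: 32 ms
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]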
Formal specification   First we define a set of array types needed in the specification:

  type sblk[0..255], kblk[0..9], rblk[0..10], ablk[0..19] : real
The initial specification of the analyzer chip A and its abstract environment Env is the parallel composition

  A ‖ Env

where A itself is a composition of three individual systems (see Fig. 1):

  A =^ Ip ‖ Lpa ‖ Op
  sys Ip ( ip : chan ; si : real ; s : sblk ) ::
  |[ var t : bool ; sa, sb : sblk ;
     init ip, lpa, t := ack, ack, false ;
     do {; i = 0..255 :
           < ip = req → if ¬t → sa[i] := si [] t → sb[i] := si fi ; ip := ack >}
        ; < lpa = ack → if ¬t → s := sa [] t → s := sb fi ; t, lpa := ¬t, req >
     od
  ]|
Figure 1: Block diagram of the specification
  sys Lpa ( lpa, o1, o2 : chan ; s : sblk ; k : kblk ) ::
  |[ var R : rblk ; a : ablk ; E : real ; abase0, abase1 : int ;
     init lpa, o1, o2 := ack, ack, ack ;
     do < lpa = req → WIN ; ACO >
        ; ( < o1 := req > ‖ < DUR > )
        ; < o1 = ack → o2 := req >
        ; < o2 = ack → lpa := ack >
     od
  ]|
  sys Op ( o1, o2, op : chan ; so : real ; s : sblk ; k : kblk ) ::
  |[ init o1, o2, op := ack, ack, ack ;
     do < o1 = req → skip >
        ; {; i = 0..255 :
             < so, op := s[i], req > ; < op = ack → skip >}
        ; < o1 := ack >
        [] < o2 = req → skip >
        ; {; i = 0..9 :
             < so, op := k[i], req > ; < op = ack → skip >}
        ; < o2 := ack >
     od
  ]|
with

  WIN =^  for j = 0 to 255 :
            s[j] := s[j] · (0.54 − 0.46 cos(2πj/255))

  ACO =^  for i = 0 to 10 :
            R[i] := Σ_{j=0}^{255−i} s[j] s[j+i]

  DUR =^  abase0, abase1, E := 0, 10, R[0] ;
          for i = 0 to 9 :
          ( k[i] := ( R[i+1] − Σ_{j=0}^{i−1} a[j+abase1] R[i−j] ) / E ;
            a[i+abase0] := k[i] ;
            for j = 0 to i − 1 :
              a[j+abase0] := a[j+abase1] − k[i] a[i−j−1+abase1] ;
            E := (1 − k[i]^2) E ;
            abase0, abase1 := abase1, abase0
          )
The environment Env is given as

  sys Env ( ip, op : chan ; si, so : real ) ::
  |[ var s : real ;
     init ip, op := ack, ack ;
     do < ip = ack → si := si′ . si′ ∈ real ; ip := req >
        [] < op = req → s, op := so, ack >
     od
  ]|
The operation of the above composition is the following. The environment outputs a sample si to A by sending a request through the channel ip. The input unit Ip then writes si into the array sa[0..255] (sb[0..255]), which models the first (second) input buffer, and sends an acknowledgement to Env. This procedure is performed 256 times to fill the input buffer. Then Ip activates computation of the windowed samples and reflection coefficients by sending a request to the computation unit Lpa through the channel lpa. At the same time it switches the input buffer from sa (sb) to sb (sa) by toggling the auxiliary boolean variable t, and starts to receive the next 256-sample frame. Consequently, receiving a new frame sb (sa) and processing the current frame sa (sb) take place in parallel. Because there is a continuous 8 kHz data stream at the input, the obvious real-time constraint is that Lpa must be idle and ready for a new round whenever Ip reaches the end of a frame and wants to switch the input buffer. This happens every 32 ms.

When Lpa receives a request from Ip, it starts to compute linear predictive analysis for the sample frame s, which in fact is a copy of either sa or sb depending on the state of the variable t in Ip. After the execution of the windowing and autocorrelation procedures WIN and ACO, the output operation of the 256 windowed samples in s is activated by sending a request to the output unit Op through the channel o1, in parallel with the computation of Durbin's procedure DUR. When both of these operations have been completed, Lpa sends a request to Op through the channel o2, activating the output procedure of the 10 reflection coefficients k[i]. Op sends an acknowledgement through o2 when this has been completed. Finally, after receiving an acknowledgement from Op, the computation unit Lpa sends an acknowledgement to Ip through lpa and is ready to receive the next frame s.

The output unit Op receives requests from Lpa through o1 and o2. It sends, when requested, the windowed samples s[j] and the reflection coefficients k[i] to Env by communicating through the channel op. The output procedures use the variable so as the common output buffer.
Note that the procedure DUR in Lpa is a modified version of algorithm (14). The main difference is that in DUR the need for storage has been minimized by (1) using a single variable E instead of an array and (2) using the one-dimensional array a, split into two swapping segments, instead of a two-dimensional array. Furthermore, because only the reflection coefficients k are needed, the final for-loop of (14) is omitted. Also the boundaries of the iteration counter i have been changed for convenience: in DUR we have i = 0..9 instead of i = 1..10 of procedure (14).
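The storage-minimized variant is perhaps easiest to see in executable form; a minimal Python transcription of DUR as specified above (ours), taking the 11-element autocorrelation array R and returning only the reflection coefficients:

    def dur(R):
        """DUR of Sec. 4: one scalar E and a 20-word array a[] whose two
        10-word segments swap roles on every iteration."""
        a = [0.0] * 20
        k = [0.0] * 10
        abase0, abase1, E = 0, 10, R[0]
        for i in range(10):
            k[i] = (R[i + 1]
                    - sum(a[j + abase1] * R[i - j] for j in range(i))) / E
            a[i + abase0] = k[i]
            for j in range(i):
                # order update of (14), all indices 0-based
                a[j + abase0] = a[j + abase1] - k[i] * a[i - j - 1 + abase1]
            E = (1.0 - k[i] ** 2) * E
            abase0, abase1 = abase1, abase0   # swap the two segments
        return k

For any frame this yields the same k values as the textbook recursion (14); only the bookkeeping differs.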
5 Decomposition
The initial specification given in the previous section is stepwise refined into a parallel
composition of more detailed and dedicated functional units. The control and data paths
are separated from each other in this process. The data path components that are extracted
include memory resources, a set of registers, and an arithmetic unit containing all basic functions, i.e., multiplication, division, addition, and subtraction. The control path, in turn, contains a set of controllers responsible for executing the involved algorithms, operating the data path components by asynchronous communication. The block diagram of
the final composition is shown in Fig. 2, where each block represents an action system, or
a parallel composition of several subsystems.
The refinement flow resembles the one presented for a pipelined microprocessor in
our previous work [13]. The reason for this is that we have chosen here an implementation where the windowing, autocorrelation, and Durbin's algorithms are encoded into a program ROM rather than directly into Tangram-style handshake logic [5], which would be possible in principle. For this, a 2-stage pipeline (fetch, execute) is constructed. Actually, the derivation is now quite straightforward, because we do not have any pipeline hazard situations to deal with as we did in the 5-stage pipeline derivation in [13].
The structure and operation of the composition in Fig. 2, as well as its circuit implementation, are discussed in the following sections 6 and 7.
6 Implementation
In this section, we explain how the system in Fig. 2 works, and how it is implemented as
a digital circuit.
The circuit implementation uses the 4-phase handshake protocol on the communication channels. Therefore, each channel variable of Fig. 2 must first be expanded by (at
least) two boolean variables implementing the request and acknowledgement signals (req ,
ack). This transformation, also known as the handshake expansion [12], can be performed
within the refinement calculus by using an appropriate abstraction relation [15] such as
  (c = req ≡ req_c ∧ ¬ack_c) ∧ (c = ack ≡ ¬req_c ∨ ack_c)
where c denotes any communication variable in the composition in question. However,
the details of this transformation process are out of the scope of this paper. Hence, we
give below the resulting diagrams directly, without presenting any formal proofs.
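As an informal illustration of this abstraction relation (a sketch of ours, not the formal refinement step), a channel variable c can be modelled by the wire pair (req_c, ack_c), with the abstract states c = req and c = ack recovered as the predicates above:

    from dataclasses import dataclass

    @dataclass
    class Channel:
        """Handshake expansion of a channel variable c into two wires."""
        req_c: bool = False
        ack_c: bool = False

        def is_req(self):   # c = req  ==  req_c and not ack_c
            return self.req_c and not self.ack_c

        def is_ack(self):   # c = ack  ==  not req_c or ack_c
            return not self.req_c or self.ack_c

    # One full 4-phase cycle on a channel c:
    c = Channel()
    c.req_c = True       # active side raises the request    -> c = req
    c.ack_c = True       # passive side acknowledges         -> c = ack
    c.req_c = False      # request withdrawn (return-to-zero phase)
    c.ack_c = False      # acknowledgement withdrawn; cycle complete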
Figure 2: Final block diagram of the analyzer
6.1 Input and output units
The input unit Ip, depicted in Fig. 3, contains two input buffers, the 256-word memory blocks Ram1a and Ram1b, corresponding to the arrays sa and sb of the initial specification in Sec. 4. Ip has its own counter for address generation and a toggle mechanism for switching between the memory blocks. The idea is that when one buffer is being filled during a 32 ms time frame, the other can be read and written freely by the system Lpa or Op, which contain address counters of their own.

Figure 3: Block diagram of Ip

The output unit Op awaits requests from the analysis unit Lpa via two separate communication channels, as already explained in Sec. 4. The first request activates the output procedure of the windowed samples in Ram1a or Ram1b. The second request, in turn, activates the output procedure of the reflection coefficients computed in Lpa.
6.2 Analysis unit
The system Lpa carries out the computation of linear predictive analysis on a 256-sample frame. The involved algorithms, windowing, autocorrelation, and Durbin's recursion, are encoded into the 46-word, 38-bit program ROM block Prom, which gets its addresses from the loadable 6-bit program counter unit Pc. The memory structure of Prom is shown in Fig. 4, and the elements of a 38-bit instruction word in Table 1. In Sec. 7, the program code for each algorithm is presented in more detail.
Figure 4: Memory allocation of Prom (46 words of 38 bits: WIN at offsets 0–7, ACO at 8–21, DUR at 22–45)
Table 1: Elements of a program instruction

  Name    Bits  Description                   Name    Bits  Description
  EA      1     Enable addr. cntr A           SELRAM  1     Select RAM block
  LCA     1     Load/count addr. cntr A       RW      1     Read/write RAM
  UDA     1     Inc/dec addr. cntr A          SELROM  1     Select ROM block
  EB      1     Enable addr. cntr B           EADD    1     Enable adder
  LCB     1     Load/count addr. cntr B       SELF    1     Select adder func.
  UDB     1     Inc/dec addr. cntr B          EMUL    1     Enable multiplier
  ETOG    1     Enable RAM2 toggle            EDIV    1     Enable divider
  ECB     1     Enable base addr. reg         EREG    4     Enable regs
  SELCB   2     Select base addr.             SELM    3     Select reg. inputs
  ERA     1     Enable offset reg.            CMP     1     Enable comparator
  SELFA   1     Select offset adder func.     SELCMP  2     Select cmp. inputs
  SELA    2     Select offset reg. input      SELJ    3     Select jump addr.
  ERO     1     Enable addr. reg.             EOC     1     Enable output control
  EMEM    1     Enable memory                 STOP    1     Stop computation
                                              Total   38
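Since minimal encoding is used, an instruction word is simply the concatenation of the fields above. A hypothetical Python decoder (field names and widths are those of Table 1; the packing order, MSB-first from EA to STOP, is our assumption for illustration only):

    FIELDS = [
        ("EA", 1), ("LCA", 1), ("UDA", 1), ("EB", 1), ("LCB", 1), ("UDB", 1),
        ("ETOG", 1), ("ECB", 1), ("SELCB", 2), ("ERA", 1), ("SELFA", 1),
        ("SELA", 2), ("ERO", 1), ("EMEM", 1), ("SELRAM", 1), ("RW", 1),
        ("SELROM", 1), ("EADD", 1), ("SELF", 1), ("EMUL", 1), ("EDIV", 1),
        ("EREG", 4), ("SELM", 3), ("CMP", 1), ("SELCMP", 2), ("SELJ", 3),
        ("EOC", 1), ("STOP", 1),
    ]
    assert sum(width for _, width in FIELDS) == 38

    def decode(word):
        """Split a 38-bit instruction into the Table 1 fields, MSB first."""
        fields, shift = {}, 38
        for name, width in FIELDS:
            shift -= width
            fields[name] = (word >> shift) & ((1 << width) - 1)
        return fields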
Main controllers   The system Lpa contains five control blocks. The main controllers are Launch, Fetch_Ctrl, and Exec_Ctrl. Their job is to control the overall program flow. The two other controllers, Addr_Ctrl and Comp_Ctrl, are slaves of Exec_Ctrl, driving the data path components by handshake channels.

Some of the building blocks needed in the circuit implementations of the controllers are introduced below in Fig. 5. The well-known C-element is not depicted, but the different asymmetric C-elements are shown, mainly because of their non-standard symbols.
The E- and R-elements are left-right devices which synchronize two 4-phase handshake
cycles in certain ways. The E-element enhances the involved cycles by partly parallelizing their down-going parts. The R-element, in turn, releases the left-channel cycle to
continue immediately after the right-channel request has been sent.
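For readers unfamiliar with the primitive, the symmetric Muller C-element drives its output high when both inputs are high, low when both are low, and otherwise holds its value. A one-line Python model of that next-state function (the asymmetric variants of Fig. 5 merely restrict which input can cause which transition; they are not modelled here):

    def c_element(a: bool, b: bool, c_prev: bool) -> bool:
        """Muller C-element: follow the inputs when they agree, else hold."""
        return a if a == b else c_prev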
Figure 5: Circuit elements (E-element, R-element, and the up- and down-asymmetric C-elements with their symbols)
The formal action system specifications of the three main controllers are given below.
The corresponding circuit diagrams are shown in Figs 6 – 8.
  sys Launch ( lpa, pc, fc, oc, o1, o2 : chan ; rstpc : bool ) ::
  |[ init lpa, fc, oc, o1, o2, rstpc := ack, …, ack, false ;
     do < lpa = req ∧ ¬rstpc → rstpc, fc := true, req >
        ; < oc = req → o1, oc := req, ack >
        ; < fc = ack ∧ o1 = ack → o2 := req >
        ; < o2 = ack → lpa := ack >
     od
  ]|

  sys Fetch_Ctrl ( fc, pc, prom, ec, oc : chan ; STOP, JUMP, EOC : bool ) ::
  |[ init fc, pc, prom, ec, oc := ack, …, ack ;
     do ( < fc = req ∧ ec = ack ∧ oc = ack ∧ ¬(STOP ∧ ¬JUMP) → pc := req >
          ; < pc = ack → prom := req >
          ; < prom = ack ∧ ¬bsy →
                if EOC → oc := req [] ¬EOC → skip fi ; ec := req > )
        [] < fc = req ∧ ec = ack ∧ oc = ack ∧ ¬bsy ∧ STOP ∧ ¬JUMP → fc := ack >
     od
  ]|

  sys Exec_Ctrl ( ec, preg, cmpr, ac, cc : chan ; CMP, bsy : bool ) ::
  |[ init ec, preg, cmpr, ac, cc := ack, …, ack ;
     do < ec = req → preg := req >
        ; < preg = ack → if CMP → cmpr := req [] ¬CMP → skip fi ;
                          ac, cc, bsy := req, req, true >
        ; < cmpr = ack ∧ bsy → ec := ack >
        ; < ac = ack ∧ cc = ack → bsy := false >
     od
  ]|
Figure 6: Circuit diagram of Launch

Figure 7: Circuit diagram of Fetch_Ctrl
Launch is the topmost controller in Lpa. It first enables initialization of Pc by setting the resettable flag rstpc, and then activates the fetch controller Fetch_Ctrl. After this, it awaits requests from Fetch_Ctrl to start the output unit Op at two separate times.

The 2-stage pipeline operation, containing the fetch and execute phases, is realized by one pipeline register Preg and two dedicated controllers, Fetch_Ctrl and Exec_Ctrl. Fetch_Ctrl, driven by Launch, takes care of sequentially activating Pc, fetching an instruction from Prom, and then starting the execution controller Exec_Ctrl. In the first round, when the flag rstpc has been set to true by Launch, Pc and the flag rstpc are reset. Otherwise Pc is either incremented or loaded with a jump address.
Figure 8: Circuit diagram of Exec_Ctrl
L
if the instruction bit EOC is true, etch Ctrl commands aunch to activate the output
procedure of the windowed samples in am1a or am1b. This happens in parallel
with the regular activation of xec Ctrl. The continuous fetch process is stopped, when
the instruction bit STOP in reg is true, the flag JUMP has been set to false by the
comparator, and xec Ctrl is nomore busy.
xec Ctrl first loads the output of rom into reg and activates then in parallel
the address and computation controllers ddr Ctrl and omp Ctrl sending an acknowledgement back to etch Ctrl which can then start the next instruction fetch. If the
instruction bit CMP is true, the comparator, which sets the flag JUMP according to the
result of the comparison between two values selected by dedicated bits of the instruction,
is activated as well. In this case, the acknowledgement to etch Ctrl is postponed until
the comparison has been completed. If JUMP is set to true by the comparator, a jump
address is loaded into c in the next fetch cycle. Otherwise c is incremented normally.
E
P
E
E
F
R
R
P
A
P
C
F
P
P
A
C
Address and computation control   Both Addr_Ctrl and Comp_Ctrl are operated by a set of instruction bits stored in Preg. These control bits determine which handshake cycles are generated by the controllers. If a bit is true, the involved controller activates the corresponding communication cycle. If a bit is false, the handshake in question is skipped and an immediate response is given.

The action system specifications of these controllers are given below. Note that the address controller is presented as the parallel composition of the control and data parts Addr_Ctrl.c and Addr_Ctrl.d. The block diagram of Addr_Ctrl and the circuit diagram of Comp_Ctrl are shown in Figs 9 and 10, respectively. We have that

  Addr_Ctrl =^ Addr_Ctrl.c ‖ Addr_Ctrl.d

where
  sys Addr_Ctrl.c ( ac, toggle, cnta, cntb, cb, rega, rego, ram, fwd : chan ;
                    ETOG, EA, EB, ECB, ERA, ERO, EMEM : bool ) ::
  |[ var b : bool ;
     init ac, toggle, cnta, cntb, cb, rega, rego, ram, fwd, b := ack, …, ack, false ;
     do ( < ac = req → skip >
          ; ( < if ETOG → toggle := req [] ¬ETOG → skip fi >
              ‖ < if EA → cnta := req [] ¬EA → skip fi >
              ‖ < if EB → cntb := req [] ¬EB → skip fi > )
          ; < toggle = ack ∧ cnta = ack ∧ cntb = ack → skip >
          ; ( < if ECB → cb := req [] ¬ECB → skip fi >
              ‖ < if ERA → rega := req [] ¬ERA → skip fi > )
          ; < cb = ack ∧ rega = ack → b := true > )
        []
        ( < (ac = req ∧ ¬ECB ∧ ¬ERA) ∨ b →
              if ERO → rego := req [] ¬ERO → skip fi >
          ; < rego = ack → if EMEM → ram := req [] ¬EMEM → skip fi >
          ; < ram = ack → fwd := req >
          ; < fwd = ack ∧ b → ac, b := ack, false > )
     od
  ]|
  sys Addr_Ctrl.d ( toggle, cnta, cntb, cb, rega, rego : chan ;
                    CLA, UDA, CLB, UDB, SELFA : bool ;
                    SELCB, SELA, i, j, REGA, ADR : int ) ::
  |[ var t : bool ; CB : int ;
     init toggle, cnta, cntb, cb, rega, rego, t := ack, …, ack, false ;
     do < toggle = req → t, toggle := ¬t, ack >
        [] < cnta = req →
             if CLA ∧ UDA → i := i + 1
             [] CLA ∧ ¬UDA → i := i − 1
             [] ¬CLA → i := 0
             fi ; cnta := ack >
        [] < cntb = req →
             if CLB ∧ UDB → j := j + 1
             [] CLB ∧ ¬UDB → j := j − 1
             [] ¬CLB → j := 0
             fi ; cntb := ack >
        [] < cb = req →
             if SELCB = 0 ∧ ¬t → CB := 0
             [] SELCB = 0 ∧ t → CB := 1
             [] SELCB = 1 → CB := 2
             [] SELCB = 2 → CB := 3
             fi ; cb := ack >
        [] < rega = req →
             if SELA = 0 → REGA := i
             [] SELA = 1 → REGA := j
             [] SELA = 2 →
                if SELFA → REGA := i + j [] ¬SELFA → REGA := i − j fi
             fi ; rega := ack >
        [] < rego = req → ADR, rego := 2^8 · CB + REGA, ack >
     od
  ]|

and
  sys Comp_Ctrl ( cc, fwd, mul, div, add, regy, rege, regs, regx, regz : chan ;
                  EMUL, EDIV, EADD, EREG0, EREG1, EREG2, EREG3 : bool ) ::
  |[ init cc, fwd, mul, div, add, regy, rege, regs, regx, regz := ack, …, ack ;
     do < cc = req → skip >
        ; ( < if EMUL → mul := req [] ¬EMUL → skip fi >
            ‖ < if EDIV → div := req [] ¬EDIV → skip fi >
            ‖ < if EADD → add := req [] ¬EADD → skip fi > )
        ; < fwd = req ∧ mul = ack ∧ div = ack ∧ add = ack → skip >
        ; ( < if EMUL → regy := req [] ¬EMUL → skip fi >
            ‖ < if EREG0 → rege := req [] ¬EREG0 → skip fi >
            ‖ < if EREG1 → regs := req [] ¬EREG1 → skip fi >
            ‖ < if EREG2 → regx := req [] ¬EREG2 → skip fi >
            ‖ < if EREG3 → regz := req [] ¬EREG3 → skip fi > )
        ; < regy = ack ∧ rege = ack ∧ regs = ack ∧ regx = ack ∧ regz = ack →
            cc, fwd := ack, ack >
     od
  ]|
Addr_Ctrl contains two 8-bit address counters and an adder-subtractor. Furthermore, it has three registers for the base address (2 bits), the offset (8 bits), and the effective address (10 bits). It generates the memory addresses and controls access to the memory blocks Ram1a or Ram1b, Ram2, and Rom required by the three computation phases. The address bit configuration for each memory block is presented in Fig. 11. The static mode and selection bits for these memory units come from Preg, but the operation requests are sent by Addr_Ctrl. An address can be generated either by a single instruction or by several separate instructions piece by piece. This makes it possible to compute a new address in parallel with accessing memories with the current address.

Note that the outputs of the other address counter and the offset register are also used by the comparator in program jump control.

Comp_Ctrl drives the functional unit composed of the register bank and the combinational arithmetic units, i.e., multiplier, divider, and adder-subtractor. The register bank contains five individual data registers used as input and output buffers by the arithmetic units and memory blocks. It can be loaded simultaneously from two different sources. Again, the static selection bits for the registers and the required multiplexers come from Preg, but the operations are activated by the requests sent by Comp_Ctrl. The functional unit is depicted in Fig. 12.
Data memory usage   The windowing algorithm WIN uses Ram1a or Ram1b for both input and output. Furthermore, Rom contains the 256 window coefficients (Eq. 15) by which the original samples are multiplied. The autocorrelation phase ACO, in turn, reads the windowed samples from Ram1a or Ram1b and writes the resulting 11 autocorrelation coefficients into Ram2. Durbin's recursion DUR uses the 41-word Ram2 for input and output. This memory block contains, as shown in Fig. 13, two swapping 10-word blocks a0 and a1 for the intermediate results, the final 10 reflection coefficients ki, and the mentioned autocorrelation coefficients Ri.
Figure 9: Block diagram of Addr_Ctrl

Figure 10: Circuit diagram of Comp_Ctrl

Figure 11: Address bits (the 2-bit base CB and the 8-bit offset REGA form the 10-bit ADR; 8 bits address Ram1a/Ram1b/Rom, 6 bits address Ram2)

Figure 12: Block diagram of the functional unit

Figure 13: Memory allocation of Ram2 (a0 at 0–9, a1 at 16–25, ki at 32–41, Ri at 48–58)
Observe that because of the addressing scheme selected for Ram2, where the 2-bit base address and the corresponding 4-bit offset are concatenated as shown in Fig. 11, the memory unit Ram2 actually contains more storage capacity than can be used. In other words, we have a 59-word memory array of which only 41 elements are effectively in use. We have preferred this straightforward approach even though it yields some unusable memory locations. A memory-saving but clumsier alternative would be to generate the effective address for Ram2 by summing the base and offset.
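The cost of the concatenation scheme is easy to verify; a small sketch of ours, taking the block placement from Figs. 11 and 13 (Ram2 address = 16·CB + offset):

    # Ram2 effective address: 2-bit base concatenated with a 4-bit offset.
    blocks = {0b00: 10, 0b01: 10, 0b10: 10, 0b11: 11}   # a0, a1, ki, Ri sizes
    used = {16 * cb + off for cb, words in blocks.items()
            for off in range(words)}
    print(max(used) + 1, len(used))   # -> 59 words allocated, 41 in use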
7 Program ROM codes
In this section, the PROM implementations of the procedures WIN, ACO, and DUR are
presented. We show the operations included in each instruction. The following variables
and special notations are used:
  REGS, REGX, REGY, REGZ, REGE : data registers in the register bank
  i, j :      address counters in Addr_Ctrl
  t :         toggling variable of type bit in Addr_Ctrl
              (initialized to '0' in the initial system reset)
  REGA :      address offset register in Addr_Ctrl
  CB :        address base register in Addr_Ctrl
  ADR :       effective address register in Addr_Ctrl
  RAM1 :      RAM array in Ram1 of Ip
  RAM2 :      RAM array in Ram2 of Lpa
  ROM :       ROM array in Rom of Lpa (window coefficients)
  x, y := z : assign the value of z to both x and y
  0b value :  define 'value' to be a binary number
  goto l :    jump to the program line l
  b1 | b2 :   concatenate the bit vectors b1 and b2 so that
              b1 (b2) is the most (least) significant block
  eoc :       enable the output process of the windowed samples
  stop :      exit the program
The mappings of the algorithms are shown in Tables 2– 4. The first column contains the PROM addresses and the second column the contents of the instructions. Each
instruction is a 38-bit word composed of the elements listed in Table 1.
Table 2: Mapping of the windowing procedure

  Offset  Instruction
  0   i, j := 0 ; REGA := j ; ADR := REGA ; REGS, REGE := RAM1[ADR], ROM[ADR]
  1   REGZ := REGE
  2   REGX := REGS · REGZ ‖ (i := i + 1 ; REGA := i ; ADR := REGA ; REGE := ROM[ADR])
  3   REGS, REGA := RAM1[ADR], j
  4   REGZ := REGE ‖ (i, ADR := i + 1, REGA ; RAM1[ADR] := REGX)
  5   REGX := REGS · REGZ ‖ (j := j + 1 ; REGA := i ; ADR := REGA ; REGE := ROM[ADR])
  6   REGS := RAM1[ADR] ‖ (i := (i + 1) mod 256 ; REGA := j ; ADR := REGA)
  7   REGZ, RAM1[ADR] := REGE, REGX ‖ if 1 < i → goto 5 [] 1 ≥ i → skip fi
  8   "First line of ACO"
Table 3: Mapping of the autocorrelation procedure

  Offset  Instruction
  8    i, j := 0 ; CB, REGA := 0b11, j ; ADR := REGA ; REGS, REGZ := RAM1[ADR]
  9    REGX := REGS · REGZ ‖ (j := (j + 1) mod 256 ; REGA := j ; ADR := REGA)
  10   REGS, REGZ := RAM1[ADR]
  11   REGY := REGS · REGZ ‖ (j := j + 1 ; REGA := j ; ADR := REGA)
  12   REGX := REGX + REGY ‖ REGS, REGZ := RAM1[ADR]
       ‖ if 1 < REGA → goto 11 [] 1 ≥ REGA → skip fi
  13   REGA := i ; ADR := REGA ; RAM2[ADR] := REGX
  14   i, j := i + 1, 0 ; REGA := j + i ; ADR := REGA ; REGZ := RAM1[ADR]
  15   REGA := j ; ADR := REGA ; REGS := RAM1[ADR]
  16   REGX := REGS · REGZ ‖ (j := j + 1 ; REGA := j + i ; ADR := REGA)
  17   REGZ := RAM1[ADR]
  18   REGA := j ; ADR := REGA ; REGS := RAM1[ADR]
  19   REGY := REGS · REGZ ‖ (j := j + 1 ; REGA := (j + i) mod 256 ; ADR := REGA)
  20   REGX, REGZ := REGX + REGY, RAM1[ADR]
       ‖ if 1 < REGA → goto 18 [] 1 ≥ REGA → skip fi
  21   (REGA := i ; ADR := REGA ; RAM2[ADR] := REGX)
       ‖ if i < 10 → goto 14 [] i ≥ 10 → skip fi
  22   "First line of DUR"
Table 4: Mapping of Durbin's procedure

  Offset  Instruction
  22   eoc ‖ (i, j := 0 ; CB, REGA := 0b11, i ; ADR := CB|REGA ; REGE := RAM2[ADR])
  23   j := j + 1 ; REGA := j ; ADR := CB|REGA ; REGX, REGS := RAM2[ADR]
  24   REGX, REGE := REGX / REGE ‖ (CB, REGA := 0b10, i ; ADR := CB|REGA)
  25   RAM2[ADR], REGZ, CB := REGX, REGE, 0b0|t
  26   REGY := REGS · REGZ ‖ (ADR := CB|REGA ; RAM2[ADR] := REGX)
  27   CB, REGA := 0b11, i ; ADR := CB|REGA ; REGX := RAM2[ADR]
  28   REGE := REGX − REGY ‖ (i, j := i + 1, j + 1 ; REGA := j ; ADR := CB|REGA)
  29   REGX, j := RAM2[ADR], 0
  30   CB, REGA := 0b0|t, j ; ADR := CB|REGA ; REGS := RAM2[ADR]
  31   CB, REGA := 0b11, i − j ; ADR := CB|REGA ; REGZ := RAM2[ADR]
  32   REGY := REGS · REGZ ‖ (j := j + 1 ; REGA := j)
  33   REGX := REGX − REGY ‖ if REGA < i → goto 30 [] REGA ≥ i → skip fi
  34   REGX := REGX / REGE
  35   t, j := ¬t, j − 1 ; CB, REGA := 0b0|t, j ; ADR := CB|REGA ; RAM2[ADR] := REGX
  36   j := 0 ; CB, REGA := 0b10, i ; ADR := CB|REGA ; RAM2[ADR] := REGX
  37   t := ¬t ; CB, REGA := 0b0|t, i − j ; ADR := CB|REGA ; REGS := RAM2[ADR]
  38   REGY := REGS · REGZ ‖ (REGA := j ; ADR := CB|REGA ; REGX := RAM2[ADR])
  39   REGX := REGX − REGY ‖ (t, j := ¬t, j + 1 ; REGA := j)
  40   RAM2[ADR] := REGX ‖ if REGA < i → goto 37 [] REGA ≥ i → skip fi
  41   REGX := 1.0 ‖ (i := i + 1 ; CB, REGA := 0b10, j ; ADR := CB|REGA ;
                      REGS, REGZ := RAM2[ADR])
  42   REGY := REGS · REGZ ‖ (i := i + 1 ; CB, REGA := 0b11, i ; ADR := CB|REGA)
  43   REGS, i := REGX − REGY, i − 1
  44   REGZ := REGE
  45   REGE := REGS · REGZ ‖ if i < 10 → goto 29 [] i ≥ 10 → stop fi
Sample program lines   As an example, consider the program lines 31 – 33 in Table 4:

  line 31:  CB, REGA := 0b11, i − j ; ADR := CB|REGA ; REGZ := RAM2[ADR]
  line 32:  REGY := REGS · REGZ ‖ (j := j + 1 ; REGA := j)
  line 33:  REGX := REGX − REGY ‖ if REGA < i → goto 30 [] REGA ≥ i → skip fi

The bit configurations of these three instructions are shown in Table 5, where the most essential bit values of each instruction are printed in bold, and 'X' denotes a 'don't care' value.
Table 5: Bit-map of the program lines 31 – 33

  Bit      31  32  33     Bit      31  32  33     Bit       31  32  33
  EA       0   0   0      SELA1    1   0   X      EREG3     1   0   0
  LCA      X   X   X      ERO      1   0   0      SELM0     0   X   0
  UDA      X   X   X      EMEM     1   0   0      SELM1     1   X   0
  EB       0   1   0      SELRAM   1   X   X      SELM2     X   X   0
  LCB      X   0   X      RW       1   X   X      CMP       0   0   1
  UDB      X   1   X      SELROM   0   X   X      SELCMP0   X   X   0
  ETOG     0   0   0      EADD     0   0   1      SELCMP1   X   X   1
  ECB      1   0   0      SELF     X   X   0      SELJ0     X   X   0
  SELCB0   0   X   X      EMUL     0   1   0      SELJ1     X   X   0
  SELCB1   1   X   X      EDIV     0   0   0      SELJ2     X   X   1
  ERA      1   1   0      EREG0    0   0   0      EOC       0   0   0
  SELFA    0   X   X      EREG1    0   0   0      STOP      0   0   0
  SELA0    0   1   X      EREG2    0   0   1
The meaning of the bit configurations in Table 5 is the following.

line 31: We have ECB = 0b1, and the selector SELCB1|SELCB0 is 0b10. This indicates that the constant 0b11 is loaded into the address base register CB of Addr_Ctrl. Since ERA = 0b1, the offset register REGA is loaded in parallel with the loading of CB. Because the function selector SELFA is 0b0, and the input selector SELA1|SELA0 of REGA is 0b10, the adder-subtractor of Addr_Ctrl is activated in the subtraction mode and the difference i − j is assigned to REGA. After these parallel register assignments, the address generation process is completed by loading the concatenation of CB and REGA into the effective address register ADR, enabled by the bit ERO = 0b1. Then, because EMEM = 0b1, indicating a memory access operation, and because the RAM block selector SELRAM and the mode selector RW are 0b1 and 0b0, respectively, the memory array RAM2 of the unit Ram2 is read using the value of ADR as the address. The result of this memory fetch operation is stored into the register REGZ of the register bank, since EREG3 = 0b1 ('enable REGZ') and the input selector SELM2|SELM1|SELM0 of the register bank has the value 0bX10 ('pass RAM2 to REGS and REGZ').

line 32: Now EB = 0b1, indicating that the counter j of Addr_Ctrl is activated. Because the mode selector LCB|UDB is 0b01 ('count up'), an incrementation takes place. The new value of j is stored into the address offset register REGA, controlled by the enabling bit ERA = 0b1 and the input selector SELA1|SELA0 = 0b01. Since EMUL = 0b1, the multiplier is activated in parallel with the operations in the address controller. The registers REGS and REGZ of the register bank are the operands of the multiplication. The result of the multiplication, in turn, is stored into the register REGY by default. Basically, also the register REGX or REGZ could be used as the output buffer by setting the bit EREG2 or EREG3 to 0b1 and the input selector SELM2|SELM1|SELM0 to 0b001.

line 33: Since EADD = 0b1 and the mode selector SELF = 0b0, the adder-subtractor of the functional unit of the system Lpa is activated in the subtraction mode. The registers REGX and REGY of the register bank are the operands of the subtraction. The comparison bit CMP is 0b1, indicating that an if-statement is executed. This takes place in parallel with the subtraction described above. The comparator input selector SELCMP1|SELCMP0 is 0b10, which means that the contents of the address offset register REGA of Addr_Ctrl and the value of the address counter i are compared. If REGA < i holds, the comparator sets the flag JUMP to 0b1. Because the jump address selector SELJ2|SELJ1|SELJ0 has the value 0b100, the constant 30 is loaded into the program counter Pc if the flag JUMP was set. If it was not set, i.e., if the comparator found the value of REGA to be greater than or equal to i, the program counter is not loaded but incremented normally.
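Read as register transfers, the three instructions behave as follows; a hypothetical Python sketch with made-up initial values (the 16·CB + REGA address form for Ram2 follows Fig. 11):

    RAM2 = [0.0] * 59                     # Lpa data memory (Fig. 13 layout)
    i, j, REGS, REGX = 4, 1, 0.5, 2.0     # example state on entering line 31

    # line 31: CB,REGA := 0b11, i-j ; ADR := CB|REGA ; REGZ := RAM2[ADR]
    CB, REGA = 0b11, i - j
    ADR = (CB << 4) + REGA                # concatenate base and offset
    REGZ = RAM2[ADR]                      # read R[i-j] from the Ri block

    # line 32: REGY := REGS*REGZ || (j := j+1 ; REGA := j)
    REGY = REGS * REGZ
    j += 1
    REGA = j

    # line 33: REGX := REGX-REGY || if REGA < i -> goto 30 [] REGA >= i -> skip
    REGX = REGX - REGY
    JUMP = REGA < i                       # if set, Pc is loaded with 30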
8 System performance
From the point of view of the involved algorithms themselves, the system Lpa is completely sequential, even though it would be quite possible to process the component procedures concurrently in principle. The sequential approach makes low power consumption with a moderate area cost possible, because the system architecture is relatively simple, i.e., it does not contain heavy pipelining or very complex control logic and arbitration.

However, in addition to the pipelined fetch-execute process, the system is capable of performing parallel operations at the instruction execution level as well. For example, computing the next memory address, accessing the memory with the current address, and computing some arithmetic operation can all take place in parallel. The comparator, in turn, operates in parallel with the controllers Addr_Ctrl and Comp_Ctrl. Furthermore, the register bank has two multiplexers at its input, which makes simultaneous loading of two distinct registers possible. The input values can come either from the same or different sources.

The above features require a wide instruction word (38 bits, see Table 1), but they considerably improve performance. In fact, we could decrease the program ROM width by coding the instructions more efficiently and designing a dedicated instruction decoder for them. However, we have not considered this an advantageous trade-off. Instead we have preferred a wider instruction word with minimal coding, because this makes instruction handling straightforward and efficient.
Performance estimation   Below we present a very simple performance estimation for the analysis unit Lpa, based on the timing parameters specified in the ES2 0.7 μm CMOS process data sheets [8]. For this we must naturally fix the word size of the data variables, which were viewed in the above formal descriptions as real-type entities with infinite precision. The word size plays an essential role also in the filter stability issues. Here we assume that the input and output values of the system Lpa are 16-bit entities, while the computations within Lpa use 32-bit arithmetic.

We get for the worst-case data path delay of Lpa about 0.533 ms. The autocorrelation algorithm clearly dominates: it alone covers about 86.3 % (0.460 ms) of the mentioned 0.533 ms. The windowing algorithm takes 8.8 % (0.047 ms), and Durbin's procedure 4.9 % (0.026 ms) of the total time.

The delay value 0.533 ms has been computed by assuming that the down-going parts of the 4-phase handshake cycles are interleaved in such a way that they do not effectively take any extra time in the data path. This is achieved by using E-elements in the control logic (see Figs 5 – 8 and 10) and asymmetric matched delays in the data path. Furthermore, we have taken into account that the program ROM fetch delay, which is about 50 ns/instruction, limits the maximum operation speed to 20 MIPS. The system cannot operate faster than this, no matter how quickly an instruction is actually executed.

The control logic has an intrinsic delay which should be added to the data path delay. We can roughly estimate that the total figure (data path delay + control path delay) is not more than 1 ms, which means that the potential throughput of Lpa is about 256 kword/s. Because the input stream is only 8 kword/s, there is a large enough margin to execute the other procedures of a speech compression system, including for example interpolation, filtering of the windowed samples, inverse filtering, and encoding of the reflection coefficients and the residual [17].

Taking into account that a new sample frame is processed every 32 ms, the system Lpa is idle for approximately 31 ms per frame. This fact, together with the properties of the asynchronous operation mode itself, indicates a potential for low power consumption.
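The figures above can be checked mechanically; a back-of-the-envelope Python sketch using only the numbers quoted in this section:

    frame_samples, frame_ms = 256, 32.0
    win, aco, dur = 0.047, 0.460, 0.026          # per-phase delays (ms)
    busy_ms = 0.533                              # worst-case data path delay
    assert abs((win + aco + dur) - busy_ms) < 1e-9
    total_ms = 1.0                               # data path + control bound
    throughput = frame_samples / (total_ms / 1000.0)   # = 256 000 word/s
    idle_ms = frame_ms - total_ms                # ~31 ms idle per frame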
9 Conclusions
We have presented here an asynchronous processor-like implementation of the linear predictive analysis algorithm. We used the action systems formalism as the correctness-preserving specification and development tool.

The resulting parallel composition of action systems, with separate control and data flows, enabled us to map the three involved algorithms, i.e., windowing, autocorrelation, and Durbin's recursion, onto a single hardware implementation. Although only some of the details were presented, the correspondence between the action system description and
its circuit implementation is clear. The PROM-based architecture we have selected has a
good performance and is relatively easy to manage: the system can be provided with a
new algorithm by expanding the PROM and adding, if needed, RAM resources for data
and constants for the arithmetic units, comparator, and program counter. The controllers
can basically remain the same.
The emphasis was on the control logic. We will continue the work by concentrating on the practical details of data path design. This includes for example the actual delay matching and the real-number representation and precision issues. Then we will be able to carry out a more accurate performance and power consumption analysis.
References
[1] R. J. R. Back. On the Correctness of Refinement Steps in Program Development.
PhD thesis, Department of Computer Science, University of Helsinki, Helsinki, Finland, 1978. Report A–1978–4.
[2] R. J. R. Back and R. Kurki-Suonio. Decentralization of process nets with centralized control. In Proc. of the 2nd ACM SIGACT–SIGOPS Symp. on Principles of
Distributed Computing, pages 131–142, 1983.
[3] R. J. R. Back and K. Sere. Stepwise refinement of action systems. Structured Programming, 12:17–30, 1991.
[4] R. J. R. Back and J. von Wright. Refinement calculus, part I: Sequential nondeterministic programs. In J. W. de Bakker, W.–P. de Roever, and G. Rozenberg,
editors, Stepwise Refinement of Distributed Systems: Models, Formalisms, Correctness. Proceedings. 1989, volume 430 of Lecture Notes in Computer Science, pages
42–66. Springer–Verlag, 1990.
[5] K. van Berkel. Handshake Circuits: an Asynchronous Architecture for VLSI Programming. International Series on Parallel Computation, Cambridge University
Press, 1993.
[6] A. Davis and S.M. Nowick. Asynchronous circuit design: motivation, background
and methods. In G. Birtwistle and A. Davis, editors, Asynchronous Digital Circuit
Design, pages 1-49. Springer, 1995.
[7] E. W. Dijkstra. A Discipline of Programming. Prentice–Hall International, 1976.
[8] ES2 0.7 μm CMOS. Technology and design kit documentation, Europractice, 1996.
[9] GSM Enhanced Full Rate (EFR) 06.10, Version 0.2.
[10] GSM Recommendation 06.10: Full Rate Speech Encoding and Decoding.
[11] J. Makhoul. Linear prediction: a tutorial review. Proceedings of the IEEE, 63(4):561–580, 1975.
[12] A. J. Martin. Compiling communicating processes into delay-insensitive VLSI circuits. Distributed computing, 1:226–234, 1986.
[13] J. Plosila and K. Sere. Action systems in pipelined processor design. In Proc. of the
3rd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, pages
156 – 166, 1997.
[14] J. Plosila, R.Rukšėnas, and K. Sere. Delay-Insensitive Circuits and Action Systems.
TUCS Technical Report No 60, November 1996.
[15] J. Plosila, R.Rukšėnas, and K. Sere.
Manuscript, 1997.
Action Systems Synthesis of DI Circuits.
[16] L. R. Rabiner and R. W. Schafer. Digital Processing of Speech Signals. Prentice-Hall, 1978.
[17] R. Steele, editor. Mobile Radio Communications, Pentech-Press, 1992.
Turku Centre for Computer Science
Lemminkäisenkatu 14
FIN-20520 Turku
Finland
http://www.tucs.abo.fi
University of Turku
Department of Mathematical Sciences
Åbo Akademi University
Department of Computer Science
Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration
Institute of Information Systems Science