Verilog HDL: Top.v Cpu.v Ram.v Io.v
top.v
author:
Dimitrije Janković
djankov@galeb.etf.bg.ac.yu
Introduction
Historical Facts
• Gateway Design Automation, 1983.
Proprietary language for mixed level design
representations
• Close partnerships, 1987-1989
Motorola, National and UTMC
• Synopsys, 1987.
Verilog based synthesis technology
• Cadence, 1990. – OVI
• IEEE 1364-1995 (last rev. 2001)
IEEE Standard Verilog® Hardware Description Language
Popularity of Verilog HDL
• General purpose HDL
Similar in syntax to C programming language
• Different levels of abstraction
RTL, behavioral, gate and switch levels
• Variety of tools and libraries
Postlogic synthesis simulation
• Programming Language Interface
Customizable simulation environment
Part 1
General Idea…
Module instantiation
Design Block
(Model Under Test)
Data Types…
• Value set
0, 1, x, z
• Strength values
supply, strong, pull, large, weak, medium, small, highz
strong0
strong0
weak1
…more data types…
• Variables
reg signed a; // signed is optional
integer b;
time t;
realtime rt;
real r;
• Vectors
wire[7:0] bus; // 8-bit bus
reg[31:0] pc; // 32-bit register
• Arrays
reg arB[7:0][0:255]; // 2-dimensional array
reg[7:0] memA[0:1023]; // 1K 8-bit words
wire waC[1:20]; // array of wires
…and more data types
• Parameters
parameter msb = 7; // declares msb as constant value 7
parameter r = 5.7; // declares r as real parameter
parameter sr = (r + f) / 2;
localparam a = 1;
specparam t_Rise_clk_q = 150;
• Strings
reg[8*11-1:0] str; // 88 bits, enough room for 11 characters
…
str = "Hello world";
• Special characters
\n, \t, %%, \\, \", \ooo
Operators…
• {} {{}} concatenation, replication
{a, b[3:0]} // {a, b[3], b[2], b[1], b[0]}
{4{w}} // equivalent to {w, w, w, w}
• + - * / ** arithmetic
integer intA;
reg[15:0] regA;
intA = -4'd12; // format is [-][[<bit_width>]'<base_format>]<value>
// <base_format> can be b, d, h, o
regA = intA / 3; // regA is 65532
• % modulus
• > >= < <= == != relational, equality, inequality
…more operators
• && || ! logical
• & | ~ ^ ^~ bit-wise
• & ~& | ~| ^ ~^ reduction
a = 4'b1010;
b = &a; // reduction and, ands all the bits of a
• << >> <<< >>> shift logical, arithmetic
reg signed [3:0] start, result;
start = 4'b1000;
result = start >>> 1;
• ? : conditional
min = (a > b) ? b : a;
• or event or
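The operator slides above can be exercised with a small simulation. This is a minimal sketch; the module name operators_demo and the signals all_ones and cat are hypothetical names chosen for illustration.

```verilog
module operators_demo;
  reg        [3:0] a;
  reg              all_ones;
  reg signed [3:0] start, result;
  reg        [7:0] cat;

  initial begin
    a        = 4'b1010;
    all_ones = &a;               // reduction AND of all four bits of a
    start    = 4'b1000;          // -8 as a signed 4-bit value
    result   = start >>> 1;      // arithmetic shift replicates the sign bit
    cat      = {a, {2{2'b01}}};  // concatenation with a replicated field
    $display("all_ones=%b result=%b cat=%b", all_ones, result, cat);
  end
endmodule
```

Under these assumptions the simulator prints all_ones=0 result=1100 cat=10100101. With an unsigned start, the >>> operator would fill with zeros instead.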
Operands
• Bit selects and part selects
reg[15:0] big_vect;
reg[0:15] little_vect;
big_vect[msb_base_expr -: width_expr]
little_vect[msb_base_expr +: width_expr]
• Array and memory addressing
reg[7:0] mem[0:1023];
mem[addr_expr] = 8'b11110000;
• String padding and potential problems
reg[8*5-1:0] str1, str2;
reg[8*10-1:0] str3;
str1 = "aaa"; str2 = "bbb"; // str1 is 40'h0000616161 (zero padded on the left)
str3 = {str1, str2}; // str3 is not equal to "aaabbb" because of the padding
Assignments…
• Continuous assignments
– net declaration assignments
wire a = b;
– assign statement
assign {c_out, sum_out} = in_a + in_b + c_in;
– delay
wire #10 a = b;
• Procedural assignments
reg[3:0] a = 4'b0101;
integer i = a;
Gate Level Modeling…
• Gate declaration syntax
– keyword that names the type of gate
– optional drive strength
– optional propagation delay
– optional identifier that names the gate
– optional range array of instances
– terminal connection list
• Gate types
n-input gates tri-state gates
and, nand, or, nor, xor, xnor bufif0, bufif1, notif0, notif1
n-output gates pull gates
buf, not pullup, pulldown
Example: four-bit tri-state buffer
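A minimal sketch of such a buffer, built from an array of four bufif0 primitives. The module name driver and its port names are assumptions made for illustration.

```verilog
module driver (in, out, en);
  input  [3:0] in;
  output [3:0] out;
  input        en;

  // Array of four tri-state buffers: bit i of out is driven from bit i
  // of in, and the scalar en is connected to every instance.
  // bufif0 drives its output while en is 0 and goes to z while en is 1.
  bufif0 ar[3:0] (out, in, en);
endmodule
```

The instance range [3:0] matches the width of the vector ports, so each instance receives one bit, while the one-bit enable is shared by all instances.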
Example: array of instances
module dffn (q, d, clk);
  parameter bits = 1;
  input [bits-1:0] d;
  output [bits-1:0] q;
  input clk;
  // create a row of D flip-flops
  DFF dff[bits-1:0] (q, d, clk);
endmodule

module MxN_pipeline (in, out, clk);
  // M=width, N=depth
  parameter M = 3, N = 4;
  input [M-1:0] in;
  output [M-1:0] out;
  input clk;
  wire [M*(N-1):1] t;
  // #(M) redefines the bits parameter for dffn
  // create p[1:N] columns of dffn rows
  dffn #(M) p[1:N] ({out, t}, {t, in}, clk);
endmodule
Gate Delays
• Rise, fall and turn-off delays
<gate_type> #(delay_time) <instance_name> (<connection_list>);
and #(rise_val, fall_val) a1 (out, i1, i2);
// a turn-off delay applies only to gates whose output can go to z
bufif1 #(rise_val, fall_val, turnoff_val) b1 (out, in, ctrl);
• Min/Typ/Max values
bufif1 #(2:3:4, 3:4:5, 4:5:6) b2 (out, in, ctrl);
• Switching delay at runtime
+maxdelays, +mindelays, +typdelays command line arguments
Switch level modeling…
• MOS switches
pmos nmos
rnmos rpmos
• Bidirectional pass switches
tran tranif1 tranif0
rtran rtranif1 rtranif0
• CMOS switches
cmos rcmos
User Defined Primitives
• UDP definition
– Two types of UDP headers
primitive myUdp (output out, input in1, input in2);
table
// in1 in2 out
0 0 : 0;
0 1 : 1;
1 ? : 0;
endtable
endprimitive
– Multiple input ports and exactly one output port
– No vector ports, only scalar
– Output port is first in the port list
Combinational UDPs
• Output state is a function of the input states
• State table
– One column for each input signal
– One column for the output signal
– Colon separates input columns from output column
– ? represents any value of that signal (0, 1 or X)
– Z value in table not allowed
– Z value on input signal interpreted as X
Sequential UDPs
• Level-sensitive sequential UDPs
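A minimal sketch of a level-sensitive sequential UDP, modeled on the usual transparent-latch example. The primitive name latch and its port names are illustrative assumptions. A sequential table has an extra field for the current state between the inputs and the next state.

```verilog
primitive latch (q, clock, data);
  output q;
  reg    q;          // sequential UDPs declare the output as reg
  input  clock, data;
  table
    // clock data : q : q+
       0     1    : ? : 1 ;   // transparent while clock is 0
       0     0    : ? : 0 ;
       1     ?    : ? : - ;   // "-" means no change: hold the state
  endtable
endprimitive
```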
Sequential UDPs
• Edge sensitive sequential UDPs
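A minimal sketch of an edge-sensitive sequential UDP: a rising-edge D flip-flop. The primitive name d_edge_ff and its port names are illustrative assumptions. Each row may contain at most one edge specification, written as a pair of values in parentheses.

```verilog
primitive d_edge_ff (q, clock, data);
  output q;
  reg    q;
  input  clock, data;
  table
    // clock data : q : q+
       (01)  0    : ? : 0 ;   // capture data on the rising edge
       (01)  1    : ? : 1 ;
       (0?)  1    : 1 : 1 ;   // possible edge, data equals state: keep it
       (0?)  0    : 0 : 0 ;
       (?0)  ?    : ? : - ;   // ignore the falling edge of clock
       ?     (??) : ? : - ;   // ignore data changes on a steady clock
  endtable
endprimitive
```

Input transitions that match no row drive the output to x, which is why the rows that ignore the falling edge and data changes are listed explicitly.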
4-to-1 Multiplexer
• Method 1: logical equation
module mux4_to_1 (out, i0, i1, i2, i3, s1, s0);
  output out;
  input i0, i1, i2, i3, s0, s1;
  assign out = (~s1 & ~s0 & i0) |
               (~s1 & s0 & i1) |
               (s1 & ~s0 & i2) |
               (s1 & s0 & i3);
endmodule
• Method 2: conditional operator
module mux4_to_1 (out, i0, i1, i2, i3, s1, s0);
  output out;
  input i0, i1, i2, i3, s0, s1;
  assign out = s1 ? ( s0 ? i3 : i2 )
                  : ( s0 ? i1 : i0 );
endmodule
Behavioral modeling…
• Behavioral model overview
– procedural statements
– initial and always constructs
– begin-end statement blocks
module behave;
  reg [1:0] a, b;
  initial begin
    a = 'b1;
    b = 'b0;
  end
  always begin
    #50 a = ~a;
  end
  always begin
    #100 b = ~b;
  end
endmodule
Procedural assignments
• Execute under the control of the procedural flow
• Blocking assignments
– Executes before the statements that follow
rega = 0;
rega[3] = 1; // a bit-select
rega[3:5] = 7; // a part-select
mema[address] = 8'hff; // assignment to a mem element
{carry, acc} = rega + regb; // a concatenation
• Nonblocking assignments
– Execute concurrently
a <= b;
b <= a;
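The swap above works because every right-hand side is sampled before any left-hand side is updated. A minimal sketch that shows this in simulation; the module name swap_demo is hypothetical.

```verilog
module swap_demo;
  reg a, b;

  initial begin
    a = 0;
    b = 1;
    // Both right-hand sides are evaluated first; the updates are applied
    // later in the same time step, so the two values are exchanged.
    a <= b;
    b <= a;
    #1 $display("a=%b b=%b", a, b);
  end
endmodule
```

The simulation prints a=1 b=0. With blocking assignments (a = b; b = a;) both registers would end up holding 1.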
27 / 126
Procedural continuous assignments
• assign, deassign
– executed in a procedural flow
– operate on regs
• force, release
– operate on nets or variables
reg a, b, c, d;
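A minimal sketch of assign and deassign in use: a D flip-flop whose asynchronous clear and preset override the clocked behavior. The module and port names are illustrative assumptions.

```verilog
module dff (q, d, clear, preset, clock);
  output q;
  input  d, clear, preset, clock;
  reg    q;

  // While clear or preset is active (low), a procedural continuous
  // assignment holds q and overrides the clocked procedural assignment.
  always @(clear or preset)
    if (!clear)
      assign q = 0;
    else if (!preset)
      assign q = 1;
    else
      deassign q;      // release q so the clocked assignment takes effect

  always @(posedge clock)
    q = d;
endmodule
```

force and release follow the same pattern but can also be applied to nets, which makes them useful mainly for debugging from a test bench.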
Conditional statement
• if-else statement
if (index > 0)
begin
  if (rega > regb)
    result = rega;
end
else
  result = regb;
• if-else-if statement
if (index < segment2)
begin
  instruction = segment_area [index + modify_seg1];
  index = index + inc_seg1;
end
else if (index < segment3)
begin
  instruction = segment_area [index + modify_seg2];
  index = index + inc_seg2;
end
else
  instruction = segment_area [index];
Case statement
• simple case statement
case (sig)
  1'b0: out = in1;
  1'b1: out = in2;
  default out = 1'bz;
endcase
• case with constant
case (1)
  encode[2] : …
  encode[1] : …
  encode[0] : …
  default: …
endcase
• case statements with don't-cares
casez (ir)
  8'b1???????: …
  8'b01??????: …
  8'b00010???: …
  8'b000001??: …
endcase
mask = 8'bx0x0x0x0;
casex (r ^ mask)
  8'b001100xx: stat1;
  8'b1100xx00: stat2;
  8'b00xx0011: stat3;
  8'bxx010100: stat4;
endcase
Looping Statements
• forever statement
– Continuously executes a statement
forever
begin
  …
end
• repeat statement
– Executes a statement a fixed number of times
repeat (size) begin
  if (shift_opb[1])
    result = result + shift_opa;
  shift_opa = shift_opa << 1;
  shift_opb = shift_opb >> 1;
end
• while statement
– Executes while a condition is met
while (tempreg) begin
  if (tempreg[0])
    count = count + 1;
  tempreg = tempreg >> 1;
end
• for statement
for ( initial_assignment; condition; step_assignment )
  statement;
Procedural Timing Control
• delay timing control, #
#d rega = regb; // d is defined as a parameter
#((d+e)/2) rega = regb;// delay is average of d and e
#regr regr = regr + 1; // delay is the value in regr
• event timing control, @
@r rega = regb; // controlled by any value change in the reg r
@(posedge clock) rega = regb; // on posedge on clock
forever @(negedge clock) rega = regb; // on negative edge
@(*) // equivalent to @(a, b, c, d, e)
y = (a & b) | (c & d) | myfunction(e);
event my_event;
-> my_event;
• wait statement
wait (!enable) #10 a = b;
• intra-assignment timing controls
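With an intra-assignment timing control the right-hand side is evaluated immediately and only the update of the left-hand side is delayed. A minimal sketch, with a hypothetical module name and hypothetical signals:

```verilog
module intra_assign_demo;
  reg a, b;

  initial begin
    b = 0;
    a = #5 b;          // b is sampled now (0); a is updated at time 5
  end

  initial #2 b = 1;    // b changes while the assignment above is pending

  initial #6 $display("a=%b b=%b", a, b);
endmodule
```

The simulation prints a=0 b=1. Writing #5 a = b; instead would sample b at time 5 and give a=1.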
32 / 126
Block Statements
• Sequential blocks
begin // a waveform controlled by sequential delay
  #d r = 'h35;
  #d r = 'hE2;
  #d r = 'h00;
  #d r = 'hF7;
  #d -> end_wave; // trigger an event called end_wave
end
• Parallel blocks
fork
  #50 r = 'h35;
  #100 r = 'hE2;
  #150 r = 'h00;
  #200 r = 'hF7;
  #250 -> end_wave;
join
• Block names
begin : BLOCK_1
  // …
end
Initial and Always Constructs
• Initial constructs
– Execute once, at time 0
initial begin
  areg = 0; // initialize a reg
  // initialize memory word
  for (index = 0; index < size; index = index + 1)
    memory[index] = 0;
end
• Always constructs
– Enabled at time 0
– Execute repetitively
always #half_period
  areg = ~areg;
always @(posedge clk)
  q = d;
Tasks and Functions…
Distinction between tasks and functions
Tasks
• Task declaration
task [ automatic ] task_identifier ( input in_arg, output out_arg ) ;
{ block_item_declaration } // variables
statement
endtask
• Task enabling
full_task_hierarchical_name ( tasks_arg_list );
• Task arguments
– input arguments of any type
– only variables as output and inout arguments
– arguments passed by value rather than by reference
• Static versus automatic tasks
Traffic Lights Example
module traffic_lights;
  reg clock, red, amber, green;
  parameter on = 1, off = 0,
            red_tics = 350,
            amber_tics = 30,
            green_tics = 200;
  // initialize colors.
  initial red = off;
  initial amber = off;
  initial green = off;
  // sequence to control the lights.
  always begin
    red = on;                  // turn red light on
    light(red, red_tics);      // and wait.
    green = on;                // turn green light on
    light(green, green_tics);  // and wait.
    amber = on;                // turn amber light on
    light(amber, amber_tics);  // and wait.
  end
  // task to wait for 'tics' positive edge clocks
  // before turning 'color' light off.
  task light;
    output color;
    input [31:0] tics;
    begin
      repeat (tics) @ (posedge clock);
      color = off; // turn light off.
    end
  endtask
  // waveform for the clock.
  always begin
    #100 clock = 0;
    #100 clock = 1;
  end
endmodule // traffic_lights.
Functions
• Function declaration
function [7:0] getbyte (input [15:0] address);
  begin
    // code to extract low-order byte from addressed word
    ...
    getbyte = result_expression;
  end
endfunction
• Setting the return value
function_name = return_value;
function automatic integer factorial;
  input [31:0] operand;
  integer i;
  if (operand >= 2)
    factorial = factorial (operand - 1) * operand;
  else
    factorial = 1;
endfunction
System Tasks and Functions
• Display $display, $monitor, $strobe, $write…
• File IO $fopen, $fclose, $fread, $fgets, $fwrite, $fmonitor, $fstrobe…
• Timescale $printtimescale, $timeformat
• Simulation control $finish, $stop
• PLA modeling $async$and$array, $sync$and$array,
$async$and$plane, $sync$and$plane…
• Stochastic analysis
• Simulation time $realtime, $time, $stime
• Conversion $bitstoreal, $itor, $signed…
• Probabilistic
• Command Line Input $test$plusargs, $value$plusargs
Display System Tasks
• $display • $monitor
reg [31:0] rval; – continuous monitoring
pulldown (pd);
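A minimal sketch contrasting $display, $strobe and $monitor. The module name display_demo and the register v are hypothetical. $display prints at the moment it executes, $strobe prints at the end of the current time step, and $monitor prints at the end of every time step in which one of its arguments changed.

```verilog
module display_demo;
  reg [3:0] v;

  initial begin
    $monitor("%0t monitor: v=%d", $time, v);  // continuous monitoring
    v = 4'd1;
    $display("display: v=%d", v);  // sees the value at this point: 1
    v <= 4'd2;                     // nonblocking update, applied later
    $strobe("strobe: v=%d", v);    // sees the settled value: 2
    #1 v = 4'd3;                   // triggers the monitor again
  end
endmodule
```

Only one $monitor list can be active at a time; $monitoroff and $monitoron suspend and resume it.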
Simulation Control
• $stop [(n)]
– Stops the simulation and goes to interactive mode
– Prints optional diagnostic message
• $finish [(n)]
– Ends the simulation
– Prints optional diagnostic message
initial
begin
  x = 4'b0000;
  #50
  x = 4'b0001;
  #50
  x = 4'b0010;
  #50
  x = 4'b0011;
  #50
  x = 4'b0100;
  #50
  x = 4'b0101;
  #50
  $finish;
end
Simulation Time
• $time
– Returns 64-bit value of current simulation time scaled to the
timescale of the scope from which it was invoked
• $realtime
– Returns real value of current time
• $stime
– Returns unsigned integer 32-bit value of current time
`timescale 10 ns / 1 ns
module test;
  reg set;
  parameter p = 1.55;
  initial begin
    $monitor ($time, $realtime, "set=", set);
    #p set = 0;
    #p set = 1;
  end
endmodule
// Output from the simulation
0 0 set=x
2 1.6 set=0
3 3.2 set=1
Disabling of Named Blocks
• disable statement
– Results of output and inout arguments
– Scheduled, but not executed, nonblocking assignments
– Procedural continuous assignments (assign and force
statements)
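A common use of disable is to leave a named block early, much like a break statement in C. A minimal sketch; the module, block and signal names are hypothetical.

```verilog
module disable_demo;
  reg [7:0] data;
  integer   i;

  initial begin
    data = 8'b0010_1000;
    begin : find_first_one
      for (i = 0; i < 8; i = i + 1)
        if (data[i])
          disable find_first_one;   // terminate the whole named block
    end
    // execution resumes here, after the disabled block
    $display("first set bit at index %0d", i);
  end
endmodule
```

The simulation prints first set bit at index 3, because the loop variable keeps the value it had when the block was disabled.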
Configuration
• Libraries
– Library mapping file – mapping source files to libraries
library rtlLib *.v;
library gateLib *.vg;
• Configurations
– config statements
// specify rtl adder for top.a1
// gate-level adder for top.a2
config cfg1;
  design rtlLib.top;
  default liblist rtlLib;
  instance top.a2 liblist gateLib;
  instance top.bot use rtlLib.bot:config;
endconfig
Types of Delay Models
• Distributed delay
– Specified on a per element basis
– Assigning delay value to individual gates
– Delay values in assign statements
module M (out, a, b, c, d);
  output out;
  input a, b, c, d;
  wire e, f;
  and #5 a1 (e, a, b);
  and #7 a2 (f, c, d);
  and #4 a3 (out, e, f);
endmodule
• Lumped delay
– Specified on per-module basis
– Single delay on output gate
• Pin-to-Pin delay (path delay)
– Delay from every input port to every output port
Path Delay Modeling
• Specify blocks
– Assign pin-to-pin delays
– Set up timing checks
– Define specparam constants
• Inside specify blocks
– Parallel connection
– Full connection
– specparam statements
– Conditional path delays
– Rise, fall and turn-off delays
– Min, max and typical delays
– Handling x transitions
// pin-to-pin delays
specify
  specparam a_out = 9;
  (a => out) = a_out; // parallel
  (b => out) = 9;
  (c => out) = 11;
  (d => out) = 11;
  (e, f) *> (g, h) = 12; // full
  if (i) (i => j) = 9;
  if (~i) (i => j) = 11;
  specparam t_rise = 9;
  specparam t_fall = 11;
  specparam t_turnoff = 8:9:10;
  (k => l) = (t_rise, t_fall, t_turnoff);
endspecify
Timing Checks
• $setup and $hold checks
specify
  $setup (data, posedge clk, 3); // data event first, then reference event
  $hold (posedge clk, data, 5);  // reference event first, then data event
endspecify
• $width checks
specify
  $width (posedge clk, 6);
endspecify
Delay Back-Annotation
• OVI Standard Delay File (SDF) Format
• Delay back-annotation design flow:
1. Designer writes RTL description
2. Conversion of RTL description to gate level netlist
3. Timing simulation with prelayout delay estimates
4. Placement and routing, postlayout delays
5. Postlayout delays back-annotated to modify the delay
values and simulation run again
6. If needed, optimize RTL description and go back to step 2
Compiler Directives
• `timescale
– s, ms, us, ns, ps, and fs
`timescale <time_unit> / <time_precision>
• `include
• `define and `undef
• `ifdef, `else, `endif, `elsif, `ifndef
• `celldefine, `endcelldefine
• `default_nettype
• `resetall
• `line
• `unconnected_drive, `nounconnected_drive
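A minimal sketch combining `timescale, `define and conditional compilation. The macro names WIDTH and DEBUG and the module name are hypothetical. DEBUG would normally be supplied from the simulator command line, and the option for doing so differs between tools.

```verilog
`timescale 1 ns / 100 ps     // time unit 1 ns, precision 100 ps
`define WIDTH 8

module directive_demo;
  reg [`WIDTH-1:0] count;

  initial begin
    count = 0;
`ifdef DEBUG
    $display("debug build, width = %0d", `WIDTH);
`else
    $display("normal build, width = %0d", `WIDTH);
`endif
  end
endmodule
```

Compiler directives stay in effect across file boundaries until they are overridden or `resetall is encountered, so the order in which source files are compiled matters.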
Components of a Simulation
• Bottom-up, top-down, combined design
• Testing
– Stimulus block (Test Bench)
Ripple Carry Counter Example
module D_FF (q, d, clk, reset);
  output q; input d, clk, reset;
  reg q;
  always @( posedge reset or negedge clk )
    if (reset)
      q = 1'b0;
    else
      q = d;
endmodule
module T_FF (q, clk, reset);
  output q; input clk, reset;
  wire d;
  D_FF dff0 (q, d, clk, reset);
  not n1 (d, q);
endmodule
module r_c_counter (q, clk, reset);
  output [3:0] q; input clk, reset;
  T_FF tff0 (q[0], clk, reset);
  T_FF tff1 (q[1], q[0], reset);
  T_FF tff2 (q[2], q[1], reset);
  T_FF tff3 (q[3], q[2], reset);
endmodule
module stimulus;
  reg clk, reset; wire [3:0] q;
  r_c_counter r1 (q, clk, reset);
  initial
    clk = 1'b0;
  always
    #5 clk = ~clk;
  initial
  begin
    reset = 1'b1;
    #15 reset = 1'b0;
    #180 reset = 1'b1;
    #20 $finish;
  end
  initial
    $monitor ($time, " Output q = %d", q);
endmodule
Logic Synthesis with Verilog
• From high-level description
into an optimized gate-level
representation
• Standard cell library
• Design constraints
– Timing
– Area
– Testability
– Power
• Computer aided logic
synthesis tools
Verilog HDL Synthesis
• Verilog constructs
– Ports – input, output, inout
– Parameters
– Signals and variables – wire, reg, tri, vector
– Instantiation – modules, primitive gates
– Functions and tasks – timing constructs ignored
– Procedural – always, if, else, case, no initial blocks
– begin, end, disable
– assign
– Loops – for, while, forever; while and forever must contain @(…)
• Verilog operators
– Arithmetic * / + - %
– Logical ! && ||
– Relational < > <= >=
– Equality == !=
– Bit-wise
– Reduction
– Shift
– Concatenation
– Conditional
Interpretation
• assign
assign out = (a & b) | c;
• if-else
if (s)
out = i1;
else
out = i0;
• always
always @(posedge clk)
q = d;
Synthesis Design Flow
• RTL design
• Translation creates
unoptimized internal
representation
• Internal logic optimization
• Technology dependent
optimization
• Cell characterization from
technology library
• Timing, area, power
constraints
• Verification – functional and
timing, regression test
Value Change Dump
• Contains value changes on selected variables
• Two types
– Four state – 0, 1, x, z, no strength information
– Extended – variable changes in all states and strengths
• $dumpfile – Specifies the VCD file name
• $dumpvars – Sets up dumping
• $dumpoff, $dumpon – Starts and stops dumping
• $dumpall – Creates a checkpoint in VCD file
• Drawback – size, could go up to 100MB
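A minimal sketch of the dump tasks listed above in a test bench. The module name vcd_demo and the file name waves.vcd are hypothetical. Dumping only the scopes and time windows of interest is the usual way to keep the file size under control.

```verilog
module vcd_demo;
  reg clk;

  initial begin
    $dumpfile("waves.vcd");   // name of the VCD file
    $dumpvars(0, vcd_demo);   // level 0: this module and everything below it
    clk = 0;
    #20 $dumpoff;             // suspend dumping
    #20 $dumpon;              // resume dumping
    #10 $dumpall;             // checkpoint: record all current values
    #10 $finish;
  end

  always #5 clk = ~clk;
endmodule
```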
Part 2
Event Simulation
• Discrete event execution model
• Connected threads of execution – processes
• Processes
– Can be evaluated, may have state, respond to changes on
inputs to provide output
• Update event – a change in value or named event
• Evaluation event – evaluation of a process
• Simulation time
• Event queue
Stratified Event Queue
• Five regions
1) Active events
2) Inactive events
3) Nonblocking assign update events
4) Monitor events
5) Future events
• Taken off active region
• Simulation cycle
– Processing of all active events
• Time step
• Explicit zero delay
• $monitor and $strobe
• Nonblocking assignment statements
Simulation Reference Model
while (there are events) {
if (no active events) {
if (there are inactive events) {
activate all inactive events;
} else if (there are nonblocking assign update events) {
activate all nonblocking assign update events;
} else if (there are monitor events) {
activate all monitor events;
} else {
advance T to the next event time;
activate all inactive events for time T;
}
}
E = any active event;
if (E is an update event) {
update the modified object;
add evaluation events for sensitive processes to event queue;
} else { /* shall be an evaluation event */
evaluate the process;
add update events to the event queue;
}
}
Determinism
• Guaranteed scheduling order
– Statements within begin … end blocks
– Nonblocking assignments
• Nondeterminism
– Active events processed in any order
– Statements with no time control don’t have to be executed
as one event
– Interleaving of the process execution
Race Conditions
assign p = q;
initial begin
  q = 1;
  #1 q = 0;
  $display(p);
end
Scenario 1
1. q = 0 executed, scheduling an assignment to p
2. current event suspended
3. p assignment event evaluated, now p equals 0
4. previous event continued, 0 is displayed
Scenario 2
1. q = 0 executed, scheduling an assignment to p
2. $display executed, 1 is displayed
3. p assignment executed, p now equals 0
Implications of Assignments
• Continuous assignments
– Processes, sensitive to value changes on rhs
• Procedural continuous assignments
– Processes sensitive to source value changes
• Blocking assignments
– Computes rhs value and schedules an update event
– If zero delay schedules an inactive event
• Nonblocking assignments
– Computes the value and schedules a nonblocking assignment
• Switch processing
– Bidirectional elements, computed with a relaxation technique
• Ports
– Implicit continuous assignments
Part 3
Case Study
Design of a Pipelined CPU
SISC Processor Example
SISC – Small Instruction
Set Computer
Instruction Set
Name Mnemonic Opcode Format (inst dst, src)
NOP NOP 0 NOP
BRANCH BRA 1 BRA mem, cc
LOAD LD 2 LD reg, mem1
STORE STR 3 STR mem, src
ADD ADD 4 ADD reg, src
MULTIPLY MUL 5 MUL reg, src
COMPLEMENT CMP 6 CMP reg, src
SHIFT SHF 7 SHF reg, cnt
ROTATE ROT 8 ROT reg, cnt
HALT HLT 9 HLT
Other Features
Condition codes
A Always 0
C Carry 1
E Even 2
P Parity 3
Z Zero 4
N Negative 5
Operand addressing
mem - Memory address
mem1 - Memory address or immediate value
reg - Any register index
src - Any register index or immediate value
cc - Condition code
cnt - Shift/rotate count, >0 = right, <0 = left, +/- 16
Instruction format
IR[31:28] Opcode
IR[27:24] cc
IR[27] source type 0 = reg(mem), 1 = imm
IR[26] destination type 0 = reg, 1 = imm
IR[23:12] source address
IR[23:12] shift/rotate count
IR[11:0] destination address
Processor status register
PSR[0] Carry
PSR[1] Even
PSR[2] Parity
PSR[3] Zero
PSR[4] Negative
Declarations
// Parameter Declaration
parameter WIDTH = 32;
parameter CYCLE = 10;
parameter ADDRSIZE = 12;
parameter MAXREGS = 16;
parameter MEMSIZE = (1 << ADDRSIZE);
// Register Declaration
reg [WIDTH - 1:0] MEM [0:MEMSIZE - 1],
RFILE [0:MAXREGS - 1],
ir,
src1,
src2;
reg [WIDTH:0] result;
reg [ADDRSIZE - 1:0] pc;
reg [4:0] psr; // one bit per condition code, PSR[0] to PSR[4]
reg dir;
reg reset;
integer i;
Main Process
Processor without pipeline
always
begin : main_process
if (!reset)
begin
#CYCLE fetch;
#CYCLE execute;
#CYCLE write_result;
end
else
#CYCLE;
end
System Initialization
task apply_reset;
begin
reset = 1;
#CYCLE
reset = 0;
pc = 0;
end
endtask
initial
begin : prog_load
$readmemb ("sisc.prog", MEM);
$monitor ("%d %d %h %h %h",
$time, pc, RFILE[0], RFILE[1], RFILE[2]);
apply_reset;
end
Functions and Tasks
Functions
• getsrc
• getdst
• checkcond
Tasks
• fetch
• execute
• write_result
• set_condcode
• clear_condcode
• apply_reset
task execute;
begin
  case (`OPCODE)
    `NOP : ;
    `BRA : begin
      if ( checkcond( `CCODE ) )
        pc = `DST;
    end
    `HLT : begin
      $display ( "Halt…" );
      $stop;
    end
    `ADD : begin
      clear_condcode;
      src1 = getsrc (ir); src2 = getdst (ir);
      result = src1 + src2;
      set_condcode (result);
    end
    // … The rest
  endcase
end
endtask
Testing the Model
• Test program
– Binary
– Loaded with $readmemb
• Invocation
– verilog sisc.v
• Debugging
– Verifying functionality
– Interactive debugging
– Helper tasks
task disprm;
  input rm; // Display register file or memory
  input [ADDRSIZE - 1:0] adr1, adr2;
  begin
    if (rm == `REGTYPE)
      while (adr2 >= adr1)
      begin
        $display ("REGFILE[%d] = %d\n", adr1, RFILE[adr1]);
        adr1 = adr1 + 1;
      end
    else
      while (adr2 >= adr1)
      begin
        $display ("MEM[%d] = %d\n", adr1, MEM[adr1]);
        adr1 = adr1 + 1;
      end
  end
endtask
Modeling Pipeline Control
• Three stage pipeline
– Fetch
– Execute
– Write
i     F E W
i+1     F E W
i+2       F E W
• Prefetching instructions
– When branch taken execution unit idle for one cycle
• Memory access reservation
– Fetch unit idle on load and store
– Multiported register files and memories keep the pipeline full
Functional Partitioning
• Synchronous pipeline execution
• Three events triggered on positive edge
– do_fetch, do_execute, do_write_results
• Data transfer between stages on negative edge
Additional Declarations
parameter QDEPTH = 3; // Instruction queue depth
// Various controls/flags
reg mem_access, branch_taken, halt_found;
reg result_ready;
reg executed, fetched;
wire queue_full;
Fetch Unit
• fptr shows the position in the queue where the next
instruction is to be stored
• qsize shows the number of prefetched instructions
• mem_access signals the fetch unit to stall a cycle
task fetch;
begin
  IR_Queue[fptr] = MEM[pc];
  fetched = 1;
end
endtask
Execution Unit
• One cycle instructions
• Two cycle load and store
• Arithmetic instructions
– Operands immediate or regs
• Three port register file
– Two outs and one in for result
• mem_access reserves memory access for next cycle
if (!mem_access) ir = IR_Queue[eptr];
`LD : begin
  if (mem_access == 0) // Reserve next
    mem_access = 1;
  else begin // Mem access
    ……… // in next cycle
  end
end
Execution unit stalls a cycle
• Branch taken
• Queue empty
task flush_queue;
begin
  // pc modified by branch execution
  fptr = 0;
  eptr = 0;
  qsize = 0;
  branch_taken = 0;
end
endtask
Write Unit
• wresult and wir registers between execute and write stages
• result_ready flag set by execute if the result must be written to register file
• Alternate approaches
– Modify result reg in negative clock cycle by execute and write it to register file in
positive cycle by write unit
– Remove write stage completely and write result from the ALU
Interlock
• Register interlock
– Two arithmetic instructions
I1: ADD R1, R2 // R1 = R1 + R2
I2: CMP R3, R1 // R3 = ~R1
– Arithmetic instruction followed by a store
I1: ADD R1, R2 // R1 = R1 + R2
I2: STR A, R1 // MEM[A] = R1
reg bypass;
function [31:0] getsrc;
  input [31:0] i;
  begin
    if (bypass) getsrc = result;
    else if (`SRCTYPE === `REGTYPE)
      getsrc = RFILE[`SRC];
    else getsrc = `SRC;
  end
endfunction
function [31:0] getdst;
  …
endfunction
always @(do_execute) begin : execute_block
  if (!mem_access) begin
    ir = IR_Queue[eptr];
    bypass = ( (`SRC == `WDST) ||
               (`DST == `WDST) );
  end
  execute;
  if (!mem_access) executed = 1;
end
Test Vector Generation
// Program to count the number of 1’s in a given binary number
//
0010_1000_0000_0000_0000_0000_0000_0001 // LD R1, #0
0010_0000_0000_0000_1001_0000_0000_0000 // LD R2, NMBR
0001_0010_0000_0000_0000_0000_0000_0100 // STRT : BRA L1
0100_1000_0000_0000_0001_0000_0000_0001 // ADD R1, #1
0111_1000_0000_0000_0001_0000_0000_0000 // L1 : SHF R2, #1
0001_0100_0000_0000_0000_0000_0000_0111 // BRA L2, ZERO
0001_0000_0000_0000_0000_0000_0000_0010 // BRA STRT ALW
0011_0000_0000_0000_0001_0000_0000_1010 // L2 : STR RSLT, R2
1001_1111_1111_1111_1111_1111_1111_1111 // HLT
0101_0101_0101_0101_1010_1010_1010_1010 // NMBR : 5555aaaa
0000_0000_0000_0000_0000_0000_0000_0000 // RSLT : 00000000
Complete Processor Model
module pipeline_control;

parameter CYCLE = 10;
parameter HALFCYCLE = (CYCLE / 2);
parameter WIDTH = 32;
parameter ADDRSIZE = 12;
parameter MEMSIZE = (1 << ADDRSIZE);
parameter MAXREGS = 16;
parameter SBITS = 5;
parameter QDEPTH = 3;

reg [WIDTH - 1:0] MEM [0:MEMSIZE - 1],
                  RFILE [0:MAXREGS - 1],
                  ir, src1, src2,
                  IR_Queue [0:QDEPTH - 1],
                  wir;
reg [WIDTH:0] result, wresult;
reg [SBITS - 1:0] psr;
reg [ADDRSIZE - 1:0] pc;
reg dir, reset, clock, mem_access, halt_found;
reg branch_taken, result_ready, executed;
reg fetched, debug, bypass;
integer i;
reg [2:0] eptr, fptr, qsize;
wire queue_full;
event do_fetch, do_execute, do_write_results;

`define TRUE 1
`define FALSE 0
`define DEBUG_ON debug = 1
`define DEBUG_OFF debug = 0
`define OPCODE ir[31:28]
`define SRC ir[23:12]
`define DST ir[11:0]
`define SRCTYPE ir[27]
`define DSTTYPE ir[26]
`define CCODE ir[27:24]
`define SRCNT ir[23:12]
`define WOPCODE wir[31:28]
`define WSRC wir[23:12]
`define WDST wir[11:0]
// WDSTTYPE is used by write_result but not defined in the original listing;
// wir[26] is assumed here by analogy with DSTTYPE
`define WDSTTYPE wir[26]
`define REGTYPE 0
`define IMMTYPE 1
`define NOP 4'b0000
`define BRA 4'b0001
`define LD 4'b0010
`define STR 4'b0011
`define ADD 4'b0100
`define MUL 4'b0101
`define CMP 4'b0110
`define SHF 4'b0111
`define ROT 4'b1000
`define HLT 4'b1001
`define CARRY psr[0]
`define EVEN psr[1]
`define PARITY psr[2]
`define ZERO psr[3]
`define NEG psr[4]
`define CCC 1
`define CCE 2
`define CCP 3
`define CCZ 4
`define CCN 5
`define CCA 0
`define RIGHT 0
`define LEFT 1

assign queue_full = (qsize == QDEPTH);

function [WIDTH - 1:0] getsrc;
  input [WIDTH - 1:0] in;
  begin
    if (bypass) getsrc = result;
    else if (`SRCTYPE === `REGTYPE)
      getsrc = RFILE[`SRC];
    else
      getsrc = `SRC;
  end
endfunction
function [WIDTH - 1:0] getdst;
  input [WIDTH - 1:0] in;
  begin
    if (bypass) getdst = result;
    else if (`DSTTYPE === `REGTYPE)
      getdst = RFILE[`DST];
    else
      $display ("Error: Immediate destination.");
  end
endfunction
function checkcond;
  input [4:0] ccode;
  begin
    case (ccode)
      `CCC : checkcond = `CARRY;
      `CCE : checkcond = `EVEN;
      `CCP : checkcond = `PARITY;
      `CCZ : checkcond = `ZERO;
      `CCN : checkcond = `NEG;
      `CCA : checkcond = 1;
    endcase
  end
endfunction
task clearcondcode;
  begin
    psr = 0;
  end
endtask
task setcondcode;
  input [WIDTH:0] res;
  begin
    `CARRY = res[WIDTH];
    `EVEN = ~res[0];
    `PARITY = ^res;
    `ZERO = ~( | res );
    `NEG = res[WIDTH - 1];
  end
endtask
task fetch;
  begin
    IR_Queue[fptr] = MEM[pc];
    fetched = 1;
  end
endtask
task execute;
  begin
    if (!mem_access) ir = IR_Queue[eptr];
    case (`OPCODE)
      `NOP : begin
        if (debug) $display ("Nop…");
      end
      `BRA : begin
        if (debug) $write ("Branch…");
        if ( checkcond(`CCODE) == 1 )
        begin
          pc = `DST;
          branch_taken = 1;
        end
      end
      `LD : begin
        if (mem_access == 0)
          mem_access = 1;
        else begin
          if (debug) $display ("Load…");
          clearcondcode;
          if (`SRCTYPE)
            RFILE[`DST] = `SRC;
          else
            RFILE[`DST] = MEM[`SRC];
          setcondcode( {1'b0, RFILE[`DST]} );
          mem_access = 0;
        end
      end
      `STR : begin
        if (mem_access == 0)
          mem_access = 1;
        else begin
          if (debug) $display ("Store…");
          clearcondcode;
          if (`SRCTYPE)
            MEM[`DST] = `SRC;
          else
            MEM[`DST] = RFILE[`SRC];
          mem_access = 0;
        end
      end
      `ADD : begin
        clearcondcode;
        src1 = getsrc (ir);
        src2 = getdst (ir);
        result = src1 + src2;
        setcondcode (result);
      end
      `MUL : begin
        clearcondcode;
        src1 = getsrc (ir);
        src2 = getdst (ir);
        result = src1 * src2;
        setcondcode (result);
      end
      `CMP : begin
        clearcondcode;
        src1 = getsrc (ir);
        result = ~src1;
        setcondcode (result);
      end
      `SHF : begin
        clearcondcode;
        src1 = getsrc (ir);
        src2 = getdst (ir);
        i = src1[ADDRSIZE - 1:0];
        result = (i >= 0) ? (src2 >> i) : (src2 << i);
        setcondcode (result);
      end
      `ROT : begin
        clearcondcode;
        src1 = getsrc (ir);
        src2 = getdst (ir);
        dir = ( src1[ADDRSIZE - 1:0] == 0 ) ? `RIGHT : `LEFT;
        i = ( src1[ADDRSIZE - 1:0] == 0 ) ?
            src1 : ~src1[ADDRSIZE - 1:0];
        while (i > 0) begin
          if (dir == `RIGHT) begin
            result = src2 >> 1;
            result [WIDTH - 1] = src2[0];
          end
          else begin
            result = src2 << 1;
            result[0] = src2 [WIDTH - 1];
          end
          i = i - 1;
          src2 = result;
        end
        setcondcode (result);
      end
      `HLT : begin
        $display ("Halt…");
        halt_found = 1;
      end
      default : $display ("Error: Wrong Opcode.");
    endcase
    if (!mem_access) executed = 1;
  end
endtask
task write_result;
  begin
    if ((`WOPCODE >= `ADD) && (`WOPCODE < `HLT))
    begin
      if (`WDSTTYPE == `REGTYPE)
        RFILE[`WDST] = wresult;
      else
        MEM[`WDST] = wresult;
      result_ready = 0;
    end
  end
endtask
task flush_queue;
  begin
    fptr = 0; eptr = 0; qsize = 0;
    branch_taken = 0;
  end
endtask
task copy_results;
  begin
    if ( (`OPCODE >= `ADD) &&
         (`OPCODE < `HLT) ) begin
      setcondcode (result);
      wresult = result;
      wir = ir;
      result_ready = 1;
    end
  end
endtask
task set_pointers;
  begin
    case ({fetched, executed})
      2'b00 : ;
      2'b01 : begin
        qsize = qsize - 1;
        eptr = (eptr + 1) % QDEPTH;
      end
      2'b10 : begin
        qsize = qsize + 1;
        fptr = (fptr + 1) % QDEPTH;
      end
      2'b11 : begin
        eptr = (eptr + 1) % QDEPTH;
        fptr = (fptr + 1) % QDEPTH;
      end
    endcase
  end
endtask

always @ (negedge clock) begin : phase_2_loop
  if (!reset) begin
    if (!mem_access && !branch_taken)
      copy_results;
    if (branch_taken) pc = `DST;
    else if (!mem_access) pc = pc + 1;
    if (branch_taken || halt_found)
      flush_queue;
    else set_pointers;
    if (halt_found) begin
      $stop;
      halt_found = 0;
    end
  end
end
always @ (do_execute) begin : execute_block
  if (!mem_access) begin
    ir = IR_Queue[eptr];
    bypass = ((`SRC == `WDST) ||
              (`DST == `WDST));
  end
  execute;
  if (!mem_access) executed = 1;
end
always @ (do_fetch) begin : fetch_block
  fetch;
end
always @ (do_write_results)
  write_result;
always @ (posedge clock) begin : phase_1_loop
  if (!reset) begin
    fetched = 0;
    executed = 0;
    if (!queue_full && !mem_access)
      -> do_fetch;
    if (qsize || mem_access)
      -> do_execute;
    if (result_ready)
      -> do_write_results;
  end
end
task apply_reset;
  begin
    reset = 1;
    #CYCLE reset = 0;
    pc = 0;
  end
endtask
initial begin : prog_load
  $readmemb ("sisc.prog", MEM);
  apply_reset;
end
endmodule
88 / 126
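The pointer bookkeeping in task set_pointers is ordinary circular-buffer arithmetic. As a rough illustration only, here is the same update written in C. The struct, the QDEPTH value of 3 and the C function signature are assumptions made for this sketch and are not part of the processor model.

```c
#include <assert.h>
#include <stddef.h>

#define QDEPTH 3  /* assumed depth; the slides do not show the real value */

/* Hypothetical C mirror of the queue state kept by the Verilog model. */
struct iqueue {
    int fptr;   /* slot the next fetched instruction goes into */
    int eptr;   /* slot the next executed instruction comes from */
    int qsize;  /* number of instructions currently queued */
};

/* Same effect as: case ({fetched, executed}) in task set_pointers. */
void set_pointers(struct iqueue *q, int fetched, int executed)
{
    assert(q != NULL);
    if (executed) {                      /* 2'b01 and 2'b11 */
        q->eptr = (q->eptr + 1) % QDEPTH;
        q->qsize -= 1;
    }
    if (fetched) {                       /* 2'b10 and 2'b11 */
        q->fptr = (q->fptr + 1) % QDEPTH;
        q->qsize += 1;
    }
    /* 2'b00: nothing fetched or executed, state unchanged.
     * 2'b11: both pointers advance and qsize nets out to no change. */
}
```

Both pointers wrap with the modulo, so the queue never needs to shift its contents.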
Part 4
89 / 126
PLI Purpose and History
• C language interface
• Provides means to access instantiated design dynamically
• Possible applications
– Dynamic delay calculators
– Reading stimulus vectors from files
– Custom graphical waveform and debugging environments
– Decompilers
– Simulation models written in C and dynamically linked to Verilog
– Interfaces to actual hardware
• Three generations of PLI
– TF routines
– ACC routines
– VPI
90 / 126
User-defined System Tasks
• Way of invoking PLI applications
• Invoked like normal Verilog system tasks and
functions
• User-defined system tasks, system functions and
real functions
• Overriding built-in system tasks and functions
– Custom functionality
– No vendor implementation
– Additional type safety
91 / 126
PLI Interface Mechanism
• PLI applications are C functions, compiled into a
library and linked into the software product
• Invoked for different reasons
– Syntax checking
– Performing the operation
– Miscellaneous reasons
– Callbacks
• PLI include files
– veriuser.h
– acc_user.h
– vpi_user.h
92 / 126
PLI 1.0
• Five classes of user-supplied PLI applications
– sizetf, checktf, calltf, misctf and consumer
• sizetf
– Returns the bit width of a system function's return value, invoked at compile time
• checktf
– Checks the correctness of arguments passed
• calltf
– Performs the operation
• misctf
– Called at various times in a simulation
• consumer
– VCL callbacks
93 / 126
Associating PLI applications
• Vendor specific
• Verilog-XL
– List the user-defined systfs in a static array in veriuser.c
– Compile veriuser.c into your PLI dynamic library (libpli.dll or libpli.so)
– Place the library in the simulator directory
• Synopsys VCS
– List the user-defined systfs in a separate TAB file
– Compile the PLI functions into a dynamic library
– Pass the TAB and library file names on command line
94 / 126
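The static-array style of association can be pictured as a table that maps each system task name to its C routines. The sketch below is a simplified stand-in, not the real table layout from veriuser.c. The struct, the `$hello` and `$pow32` names and the `find_systf` helper are all hypothetical and exist only to show the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*pli_fn)(void);

/* Hypothetical, simplified table entry; a real simulator's entry has more fields. */
struct systf_entry {
    const char *name;   /* name as written in Verilog source */
    pli_fn sizetf;      /* NULL when not needed (e.g. for a task) */
    pli_fn checktf;     /* NULL when no argument checking is done */
    pli_fn calltf;      /* performs the operation */
};

static int pow32_size(void)  { return 32; }  /* assumed 32-bit return value */
static int pow32_check(void) { return 0; }
static int pow32_call(void)  { return 0; }
static int hello_call(void)  { return 0; }

/* Static table, terminated by an entry whose name is NULL. */
static const struct systf_entry systf_table[] = {
    { "$pow32", pow32_size, pow32_check, pow32_call },
    { "$hello", NULL,       NULL,        hello_call },
    { NULL,     NULL,       NULL,        NULL       }
};

/* Look up a user-defined systf by name, as a simulator might at compile time. */
const struct systf_entry *find_systf(const char *name)
{
    const struct systf_entry *e;
    assert(name != NULL);
    for (e = systf_table; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}
```

A TAB-file flow such as the one described for VCS carries the same information, only in a separate text file instead of a compiled array.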
PLI Application Arguments
• Arguments from Verilog invocation stored in
static memory
• Accessed using TF and ACC routines
– tf_nump, tf_getp
• Arguments passed to PLI functions
– data argument – user defined identifier
– reason argument – reason of invocation
– paramvc argument – index of the argument whose value changed
95 / 126
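A misctf-style routine typically switches on the reason argument. The sketch below shows that shape in plain C. The enum values are hypothetical stand-ins for the reason constants that veriuser.h provides, and it assumes that paramvc carries the index of the argument whose value changed.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real reason constants come from veriuser.h. */
enum reason { R_ENDOFCOMPILE, R_PARAMVC, R_FINISH };

#define MAX_ARGS 8

int changes_seen[MAX_ARGS];  /* illustrative per-argument change counters */
int finished;                /* set once the simulation is finishing */

/* Shaped like a misctf routine: (data, reason, paramvc). */
int my_misctf(int data, int reason, int paramvc)
{
    int i;
    (void)data;  /* user-defined identifier, unused in this sketch */
    switch (reason) {
    case R_ENDOFCOMPILE:             /* start of simulation: reset state */
        for (i = 0; i < MAX_ARGS; i++)
            changes_seen[i] = 0;
        finished = 0;
        break;
    case R_PARAMVC:                  /* an argument changed value */
        if (paramvc >= 1 && paramvc < MAX_ARGS)
            changes_seen[paramvc]++;
        break;
    case R_FINISH:                   /* simulation is ending */
        finished = 1;
        break;
    default:
        break;
    }
    return 0;
}
```

The data argument lets one C routine serve several system tasks, since each table entry can pass a different identifier.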
TF Routines
• Reading and writing arguments
– tf_nump, tf_getp, tf_getrealp, tf_putrealp, tf_strgetp…
• Value change detection
– tf_asynchon, misctf function called when value changes
• Simulation time – tf_gettime, tf_getlongtime, tf_strgettime
• Simulation synchronization – tf_synchronize, tf_rosynchronize
• Saving information from one call to the next
– tf_setworkarea, tf_getworkarea
• Displaying output messages
– io_printf, tf_error, tf_message, tf_text
• Stopping and finishing
– tf_dostop, tf_dofinish
96 / 126
ACC Routines
• Read and write information to instantiated Verilog design
• handle data type
– unique identifier of an object
– acc_handle_...
• ACC routine types
– Fetch – acc_fetch_...
– Handle – acc_handle_...
– Next – acc_next_...
– Modify
– VCL – acc_vcl_add, acc_vcl_delete
– Miscellaneous
• Accessible objects
– Module instances
– Module ports
– Individual bits of ports
– Module or data paths
– Intermodule paths
– Top-level modules
– Primitive instances
– Primitive terminals
– Nets, regs and variables
– Named events
– Parameters
– Timing checks
– User-defined systf args
97 / 126
PLI Example 1 – Helper functions
/* Taken from Chris Spear's math library */
#include <math.h>
#include "veriuser.h"
#define ARG1 1 /* First argument */
#define ARG2 2 /* Second argument */
#define RETURNV 0 /* Return value */
int exp_call(void) /* calltf routine */
{
    tf_putrealp (RETURNV, exp(tf_getrealp(ARG1)));
    return 0;
}
int log_call(void)
{
    tf_putrealp (RETURNV, log(tf_getrealp(ARG1)));
    return 0;
}
int log10_call(void)
{
    tf_putrealp (RETURNV, log10(tf_getrealp(ARG1)));
    return 0;
}
int sin_call(void)
{
    tf_putrealp (RETURNV, sin(tf_getrealp(ARG1)));
    return 0;
}
int sqrt_call(void)
{
    tf_putrealp (RETURNV, sqrt(tf_getrealp(ARG1)));
    return 0;
}
98 / 126
PLI Example 2 - Iteration
#include "veriuser.h" /* io_printf */
#include "acc_user.h"
int display_net_names(void)
{
    handle module_handle;
    handle net_handle;
    /* initialize environment for access routines */
    acc_initialize();
    /* get handle for module */
    module_handle = acc_handle_tfarg(1);
    /* display names of all nets in the module */
    net_handle = null;
    while ( (net_handle = acc_next_net( module_handle, net_handle )) != null )
        io_printf( "Net name is: %s\n", acc_fetch_fullname(net_handle) );
    acc_close();
    return 0;
}
99 / 126
Verilog Procedural Interface
• Access to internal simulator structures
• Access to design elements
• C interface to object model
• Object data model diagrams
• VPI system tasks – vpi_register_systf
– compiletf
– sizetf
– calltf
100 / 126
VPI Object Classifications
one-to-many relationship
/* Global access */
vpiHandle net;
net = vpi_handle_by_name("top.m1.w1", NULL);
/* Module access */
vpiHandle net, mod;
net = vpi_handle_by_name("top.m1.w1", NULL);
mod = vpi_handle( vpiModule, net );
101 / 126
VPI Routines
• VPI callbacks – vpi_register_cb
– Simulation event
– Simulation time
– Simulation action/feature
– User-defined system task or function execution
• Access to simulation objects
– vpiHandle
– Object properties
– Iteration
– Global objects
• Function availability
• Traversing expressions
102 / 126
VPI Example
#include <string.h>
#include "vpi_user.h"
/* Print the value of every net in the module whose name matches target */
void display_certain_net_values(vpiHandle module, PLI_BYTE8 *target)
{
    static s_vpi_value value_s = {vpiBinStrVal};
    static p_vpi_value value_p = &value_s;
    vpiHandle net, itr;
    itr = vpi_iterate(vpiNet, module);
    if (itr == NULL) return; /* the module has no nets */
    while ((net = vpi_scan(itr)) != NULL)
    {
        PLI_BYTE8 *net_name = vpi_get_str(vpiName, net);
        if (strcmp(target, net_name) == 0)
        {
            vpi_get_value(net, value_p);
            vpi_printf("Value of net %s: %s\n",
                vpi_get_str(vpiFullName, net), value_p->value.str);
        }
    }
}
103 / 126
Part 5
104 / 126
Key Players on the Market
HDL
• Cadence
– Verilog-XL, NC
• Synopsys
– VCS, Scirocco
• Model Technology Inc.
– ModelSim
• SynaptiCAD Inc.
– Verilogger Pro.
HVL
• Languages
– Verilog, VHDL, SystemC, OpenVera, e
• Cadence
– TestBuilder, SignalScan
• SynaptiCAD Inc.
– TestBencher Pro.
• Novas
– Debussy
• Synopsys
– Simwave
• Verisity
– eCelerator
• Veritools
– Undertow
105 / 126
Latest
• Avant! executives pleaded no contest to stealing
place and route code from Cadence
• Cadence gets $195 million in criminal restitution
• Synopsys acquired Avant!
• Cadence agrees to settle the civil lawsuit for
code theft for another $265 million
106 / 126
Inside Information
Dan Notestein,
President and CEO at SynaptiCAD Inc.
“…I wouldn't say superlog is "hot" yet in terms of usage, but things
are looking well for its future. Synopsys acquired Co-Design
Automation, the company that created Superlog, so they've taken
over Superlog promotion. As far as I know, the plan is to slowly
release parts of Superlog into the public domain and add it to what
is being called "System Verilog" (this is the new industry standard,
it already contains new stuff not in Verilog 2002 I think). Other than
that not too much has changed in EDA I think. I think SUGAR is
rapidly becoming the dominant temporal expression language.
We're currently planning to add support for it to a new product.
OpenVERA and e are still in head-to-head competition, and there
is no clear sign which, if either, will be a winner. TestBuilder
technology for creating test benches is supposed to be folded into
an upcoming version of SystemC (we're looking forward to this, it
will expand our TestBencher market)…”
107 / 126
Simulator Types
• Interpreted
– Verilog-XL, Verilogger
– Slow
– Interpret Verilog commands at CLI
– One-pass
• Native Compiled
– NC
– Very fast
– Mixed language
– Compilation, elaboration, simulation
• Compiled
– VCS
– Fast
– Not very flexible, or scalable (in terms of multiple language support)
108 / 126
HVL
• Hardware Verification Language
– Verilog, VHDL, SystemC, OpenVera, e
• Verification of HDL hardware models
• Test bench generation
– Bus-functional models
• Graphical Representation
– Waveform viewers
• HDL debuggers
– Test bench + waveform + interactive debugging
109 / 126
The Future
• Verification
– Single design bottleneck
– 70% of design time
– Behavioral coverage
• Standardization
• Additional platform support
• Single design flow
– RTL to GDSII, Synopsys: $117K per CPU per year license
• Distributed simulation process
– Multiple platforms, multiple simulators
– One waveform
110 / 126
Part 6
Superlog
111 / 126
Overview
• SUPERLOG contains functionality of
– Verilog for hardware design
– C for software
– Direct structure access, eliminates the need for PLI
– Interfaces to encapsulate communication
– Sequential assertions for protocol checking
– State machines for designing control logic
– Dynamic processes for pipelines and real time software
– Dynamic arrays for queues, lookup tables and large memories
– Random selection and data generation for testbenches
112 / 126
Application
• Extension of Verilog 2001
• SUPERLOG streamlines the design flow
• One language for
– Systems
– Hardware
– Testbenches
– Software
113 / 126
Background
• Mixture
– System specification languages – SDL
– Hardware description languages – Verilog
– Testbench languages – Vera
– Programming languages – C
• Specifying hardware in C/C++ difficult
• VHDL problems
– Strict and complex data type system
– Limited inter-process communication
– Fixed number of processes
• Extended Synthesizable Subset (ESS), SUPERLOG
subset
114 / 126
Scalar Data Types
• logic and bit data types (from VHDL std_logic)
• string data type and operators
• User defined data types – typedef and import
• dynamic data type
• void
• Pointers
• Structures and unions, packed structures and unions
• Casting
• System data types (e.g. $vpiHandle)
115 / 126
Arrays
• Packed and unpacked
• Pointers
• Queues
• Sparse arrays
• Associative arrays
116 / 126
Attributes
• System attributes
• User defined attributes
• $get and $set methods
117 / 126
Pointers, Memory Management
• ref, deref
• $alloc, $free, $delete
• null
• pointers to structures, -> operator
• pointer arithmetic
118 / 126
Control Flow
• Dynamically weighted case – casew
• Selections – unique and priority
• Transitions
• Loops – for, foreach, forever, repeat, while, do-while
• Jumps – break, continue, return, goto
• Parallel blocks – priority and unique fork
• Processes
• Delay control – changed, written
119 / 126
State Machines
• State declaration – state
• State machine transitions
– always_comb
– transition
• Hierarchical and concurrent state machines
120 / 126
Protocols and Assertions
• Interface constraints in sequence and timing
• Sequential assertions – assert sequence
– Recognizing regular expressions of patterns and events
• Repetition
• Alternation
• Sequential and concurrent matching
• Generating error messages automatically
• Synchronous protocols - iff
121 / 126
Processes
• Dynamic processes – process
• Semaphores
• Pre-emption
– $pause, $suspend, $resume, $this_process
• final blocks
122 / 126
Foreign Languages
• import from C
• export to C
• Writing PLI as normal tasks and functions
• Interface to other simulators
123 / 126
Interfaces
• Changing the level of abstraction in modeling communication
• Low-level – a bundle of wires
• High-level – more like class templates
– Types, constants, variables, functions and tasks
– Processes and continuous assignments, not synthesizable
• Direction of ports seen from the module - modport
124 / 126
System Data
• Introspection
• C and Verilog system data
• Hierarchy and connectivity information
• Operating system environment variables
• VPI properties accessible from SUPERLOG
125 / 126
Materials
• IEEE 1364-2001
IEEE Standard Verilog® Hardware Description Language
• Verilog HDL: A Guide to Digital Design and Synthesis
S. Palnitkar
• Digital Design and Synthesis with Verilog HDL
E. Sternheim, R. Singh, R. Madhavan, Y. Trivedi
126 / 126