Module 3 Notes


CST 305 System Software

Module 3 (Assembler Features and Design Options)

Machine Dependent Assembler Features – Instruction Format and Addressing Modes, Program
Relocation.

Machine Independent Assembler Features – Literals, Symbol Defining Statements, Expressions,
Program Blocks, Control Sections and Program Linking.

Assembler Design Options – Algorithm for Single Pass Assembler, Multi Pass Assembler,
Implementation Example of MASM Assembler.

Machine - Dependent Assembler Features

Here we consider the design and implementation of an assembler for the more complex SIC/XE
machine, so that we can examine the effect of the extended hardware on the structure and
functions of the assembler. Many real machines have architectural features similar to those we
consider here.

Ms. Anna N Kurian, Asst. Prof., Dept. of CSE, SJCET, Palai 1



Indirect addressing is indicated by adding the prefix @ to the operand (line 70). Immediate
addressing is denoted by adding the prefix # to the operand. Instructions that refer to
memory are normally assembled using either program-counter relative or base relative mode.
BASE (line 13) is an assembler directive used in conjunction with base relative addressing. If the
displacements required for both program-counter relative and base relative addressing are too
large to fit into a 3-byte instruction, the 4-byte extended format (format 4) must be used. The
extended instruction format is specified with the prefix + added to the operation code in the
source statement (lines 15, 35 and 65). It is the programmer's responsibility to specify this form
of addressing when it is required.

• Main differences between the SIC and SIC/XE versions of the program:

– Register-to-register instructions are used in place of register-to-memory instructions
wherever possible.

– E.g., on line 150, COMP ZERO is changed to COMPR A,S.

– Similarly, on line 165, TIX MAXLEN is changed to TIXR T.

– Register-to-register instructions are faster than the corresponding register-to-memory
instructions because they are shorter and, more importantly, because they do not require
another memory reference.


– Fetching an operand from a register is much faster than retrieving it from main memory.

– When immediate addressing is used, the operand is already present as part of the
instruction and need not be fetched from anywhere (e.g., line 55).

– Some instructions have to be added:

– Changing COMP to COMPR (line 150) forces us to add the CLEAR instruction (line 132).

– The net result is still an improvement in execution speed:

– CLEAR is executed only once for each record read.

– COMPR is executed for every byte of data transferred.

Hand Assembly of SIC/XE


If you calculate the program-counter relative displacement that would be required for the
statement on line 175, you will see that it is too large to fit into the 12-bit displacement field,
so that statement is assembled using base relative mode. The statement on line 20 could also
have been assembled using base relative mode. In our assembler, however, we have arbitrarily
chosen to attempt program-counter relative assembly first.
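
The "program-counter relative first, base relative second" rule can be sketched in a few lines of Python. This is only an illustration, not the assembler's actual code: the function name assemble_format3 is hypothetical, and the addresses used in the calls (STL at 0000 with RETADR at 0030, LDB at 0003 and STX at 1056 with LENGTH at 0033) are assumed values in the style of the textbook's COPY program.

```python
def assemble_format3(opcode, target, pc, base=None, n=1, i=1, x=0):
    """Assemble one SIC/XE format 3 instruction and return it as 6 hex digits.

    Hypothetical helper: PC-relative addressing is attempted first and
    base relative addressing is used only as a fallback.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:             # signed 12-bit range (PC-relative)
        b, p = 0, 1
        disp &= 0xFFF                     # two's complement in 12 bits
    elif base is not None and 0 <= target - base <= 4095:
        b, p = 1, 0                       # unsigned 12-bit range (base relative)
        disp = target - base
    else:
        raise ValueError("displacement does not fit; format 4 (+) is required")

    first_byte = (opcode & 0xFC) | (n << 1) | i    # 6-bit opcode plus n and i
    flags = (x << 3) | (b << 2) | (p << 1)         # e = 0 for format 3
    return f"{(first_byte << 16) | (flags << 12) | disp:06X}"


# STL RETADR: PC-relative, displacement 02D (assumed addresses)
stl = assemble_format3(0x14, target=0x0030, pc=0x0003)
# STX LENGTH: too far for PC-relative, so base relative (B holds 0033)
stx = assemble_format3(0x10, target=0x0033, pc=0x1059, base=0x0033)
# LDB #LENGTH: immediate (n=0, i=1) combined with PC-relative
ldb = assemble_format3(0x68, target=0x0033, pc=0x0006, n=0, i=1)
```

With these assumed addresses the three calls give 17202D, 134000 and 69202D respectively.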


On line 12, the immediate operand is the symbol LENGTH. Since the value of a symbol is the
address assigned to it, this immediate instruction has the effect of loading register B with the
address of LENGTH. Note that here program-counter relative addressing is combined with
immediate addressing. In general, the target address calculation is performed first; then, if
immediate mode is specified, the target address (not the contents stored at that address)
becomes the operand.

Program Relocation


If we knew in advance exactly which programs were to be executed concurrently in this way,
we could assign addresses when the programs were assembled so that they would fit together
without overlap or wasted space. It is impractical, however, to plan program execution this
closely: we do not know exactly when jobs will be submitted, exactly how long they will run,
and so on. It is therefore desirable to be able to load a program into memory wherever there is
room for it. In such a situation, the actual starting address of the program is not known until
load time.

Suppose the program is loaded at address 2000 instead of the address for which it was
assembled. The address 102D will then not contain the value that we expect; in fact, it will
probably be part of some other user’s program. We need to make some change in the address
portion of this instruction so that we can load and execute the program at address 2000.


The only parts of the program that require modification at load time are those that specify
direct addresses. The rest of the instructions need not be modified, because their operands are
either

– not memory addresses at all (immediate addressing), or

– specified using PC-relative or base relative addressing.

Looking at the object program alone, it is not possible to distinguish the values that represent
addresses from those that represent constant data items. Since the assembler does not know the
actual location where the program will be loaded, it cannot make the necessary changes in the
addresses used by the program. However, the assembler can identify for the loader those parts
of the object program that need modification.


Note that no matter where the program is loaded, RDREC is always 1036 (hexadecimal) bytes
past the starting address of the program.

Please refer to slides 48 and 49 of the slide show for a better understanding of program
relocation in SIC and SIC/XE.

Relocatable Program

An object program that contains the information needed to perform address modification at
load time is called a relocatable program. Consider the JSUB instruction: when the assembler
generates the object code for it, it inserts the address of RDREC relative to the start of the
program (this is the reason we initialized the location counter to 0 for the assembly). The
assembler also produces a command for the loader, instructing it to add the beginning address
of the program to the address field in the JSUB instruction at load time. This command for the
loader must also be a part of the object program.

Format of Modification Record


The half-byte approach is closely tied to the SIC/XE instruction format and may not be
appropriate for other machines.


The record specifies that the beginning address of the program is to be added to a field that
begins at address 000007 (relative to the start of the program) and is 5 half-bytes in length.
Thus, in the assembled instruction 4B101036, the first 12 bits (4B1) remain unchanged. The
program load address is added to the last 20 bits (01036) to produce the correct operand
address.
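
What the loader does with such a record can be sketched as follows. This is a minimal illustration under stated assumptions: the function name apply_modification is hypothetical, the instruction is assumed to sit at relative address 000006 (so that its address field starts at 000007), and the load address 5000 is chosen only for the example.

```python
def apply_modification(memory, start, half_bytes, load_addr):
    """Add load_addr to a field of `half_bytes` half-bytes starting at `start`.

    For an odd length the field occupies the low-order half-bytes of the
    bytes involved; the leading half-byte is left untouched.
    """
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1             # e.g. 0xFFFFF for 5 half-bytes
    relocated = ((field & mask) + load_addr) & mask
    field = (field & ~mask) | relocated            # keep the untouched high bits
    memory[start:start + nbytes] = field.to_bytes(nbytes, "big")


# Six bytes of other code (assumed), then the instruction 4B101036
memory = bytearray(6) + bytearray.fromhex("4B101036")
apply_modification(memory, start=0x000007, half_bytes=5, load_addr=0x5000)
relocated_instruction = memory[6:10].hex().upper()
```

Under these assumptions the instruction becomes 4B106036: the leading 4B1 is unchanged and 01036 + 5000 = 06036.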

Exactly the same kind of relocation must be performed for the instructions on lines 35 and
65. The rest of the instructions in the program, however, need not be modified when the
program is loaded:

• In some cases the operand is not a memory address at all (e.g., CLEAR S or LDA #3).
• In other cases the operand is specified using program-counter relative or base relative
addressing.
• On line 10, STL RETADR is assembled using program-counter relative addressing
with displacement 02D.
• No matter where the program is loaded in memory, the word labelled RETADR will
always be 2D bytes away from the STL instruction, so no instruction modification is
needed.
• When STL is executed, the program counter will contain the (actual) address of the
next instruction.


• The target address calculation process will then produce the correct (actual) operand
address corresponding to RETADR.

Similarly, the distance between LENGTH and BUFFER will always be 3 bytes.

• Thus the displacement in the base relative instruction on line 160 will be correct without
modification. (The contents of the base register will, of course, depend upon where the
program is loaded. However, this is taken care of automatically when the program-counter
relative instruction LDB #LENGTH is executed.)

Machine-Independent Assembler Features


• Literals
• Symbol-Defining Statements
• Expressions
• Program Blocks
• Control Sections and Program Linking

Literals

It is convenient if a programmer can write the value of a constant operand as a part of the
instruction that uses it. This avoids having to define the constant elsewhere in the program
and make up a label for it. Such an operand is called a literal because the value is stated
literally in the instruction. In our assembler language notation, a literal is identified with the
prefix =, which is followed by a specification of the literal value, using the same notation as
in the BYTE statement.

• Eg 1 :
45 001A ENDFIL LDA =C’EOF’ 032010
The literal in this statement specifies a 3-byte operand whose value is the character string
EOF.


• Eg 2 :
215 1062 WLOOP TD =X’05’ E32011
This statement specifies a 1-byte literal with the hexadecimal value 05.

The notation used for literals varies from assembler to assembler; however, most assemblers
use some symbol (as we have used =) to make literal identification easier.


Literal Pools

All of the literal operands used in the program are gathered together into one or more literal
pools. Normally literals are placed into a pool at the end of the program. The assembly listing of
a program containing literals usually includes a listing of this literal pool, which shows the
assigned addresses and the generated data values. Such a literal pool listing is shown in the
figure immediately following the END statement; in this case the pool consists of the single
literal =X’05’.

In some cases, however, it is desirable to place literals into a pool at some other location in the
object program. The assembler directive LTORG (line 93) is provided for this purpose.


LTORG

When the assembler encounters an LTORG statement, it creates a literal pool that contains all of
the literal operands used since the previous LTORG (or since the beginning of the program).
This literal pool is placed in the object program at the location where the LTORG directive was
encountered. Note that literals placed in a pool by LTORG will not be repeated in the pool at
the end of the program. If we had not used the LTORG statement on line 93, the literal =C’EOF’
would be placed in the pool at the end of the program. This literal pool would begin at address
1073, which means that the literal operand would be placed too far away from the instruction
referencing it to allow program-counter relative addressing.


The problem is the large amount of storage reserved for BUFFER.

Placing the literal pool before this buffer avoids the need to use extended format instructions
when referring to the literals. In general, the need for an assembler directive such as LTORG
arises when it is desirable to keep the literal operand close to the instruction that uses it.

Duplicate literals: the same literal may be used in more than one place in the program. Most
assemblers recognize duplicate literals and store only one copy of the specified data value. For
example, the literal =X’05’ is used in the program on lines 215 and 230. However, only one data
area with this value is generated, and both instructions refer to the same address in the literal
pool for their operand.

The easiest way to recognize duplicate literals is by comparing the character strings that define
them (e.g., =X’05’). A more intelligent way is to look at the generated data value instead of the
defining expression. For example, =C’EOF’ and =X’454F46’ specify identical operand values, and
the assembler could avoid storing both literals if it recognised this equivalence, but at the cost
of increased assembler complexity.

If the defining character string is used to recognize duplicates, we must be careful with literals
whose value depends upon their location in the program. For example, there are literals that
refer to the current value of the location counter (often denoted by the symbol *). Such literals
are sometimes useful for loading base registers. For example, the statements BASE * and
LDB =* at the beginning of a program load the beginning address of the program into register
B, so that this value is available for base relative addressing. In detecting duplicate literals,
however, such a notation causes a problem. For example:


– If the literal =* appeared on line 13 of the program, it would specify an operand with
value 0003.

– If the same literal appeared on line 55, it would specify an operand with value 0020.

In such a case, the literal operands have identical names; however, they have different values,
and both must appear in the literal pool. The same problem arises if a literal refers to any other
item whose value changes between one point in the program and another.

How is a literal handled by an assembler?

The basic data structure needed is a literal table (LITTAB). For each literal used, this table
contains the literal name, the operand value and length, and the address assigned to the operand
when it is placed in a literal pool. LITTAB is often organised as a hash table, using the literal
name or value as the key.

• In Pass 1, each literal operand is recognised as it is encountered:

– The assembler searches LITTAB for the specified literal name (or value).

– If the literal is already present in LITTAB, no action is required.

– If the literal is not present, it is added to LITTAB (leaving the address unassigned).


– When Pass 1 encounters an LTORG statement or the end of the program, the
assembler makes a scan of the literal table.

– At this time, each literal currently in the table is assigned an address (unless such an
address has already been filled in).

– As these addresses are assigned, the location counter is updated to reflect the number
of bytes occupied by each literal.

• In Pass 2, the operand address used in generating object code is obtained by searching
LITTAB for each literal operand encountered.

– The data values specified by the literals in each literal pool are inserted at the
appropriate places in the object program, exactly as if these values had been generated
by BYTE or WORD statements.

– If a literal value represents an address in the program (e.g., a location counter value), the
assembler must also generate the appropriate Modification record.
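
The Pass 1 steps above can be sketched with a Python dictionary standing in for the hashed LITTAB. This is only an illustration: the function names are hypothetical, the table is keyed by literal name, and the pool addresses 002D and 1076 are assumed values. Because the key is the name, this sketch would wrongly merge two uses of a location-dependent literal such as =*.

```python
def literal_bytes(spec):
    """Return the data bytes generated for a literal such as =C'EOF' or =X'05'."""
    kind, body = spec[1], spec[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal: {spec}")


def note_literal(littab, spec):
    """Pass 1: add the literal to LITTAB if it is not already present."""
    if spec not in littab:
        value = literal_bytes(spec)
        littab[spec] = {"value": value, "length": len(value), "address": None}


def place_pool(littab, locctr):
    """LTORG or END: give every unplaced literal an address; return new LOCCTR."""
    for entry in littab.values():
        if entry["address"] is None:
            entry["address"] = locctr
            locctr += entry["length"]
    return locctr


littab = {}
note_literal(littab, "=C'EOF'")
after_ltorg = place_pool(littab, 0x002D)   # pool created at an LTORG (assumed address)
note_literal(littab, "=X'05'")
note_literal(littab, "=X'05'")             # duplicate: no second entry is made
after_end = place_pool(littab, 0x1076)     # pool created at END (assumed address)
```

Comparing literal_bytes("=C'EOF'") with literal_bytes("=X'454F46'") shows the value-based duplicate test mentioned earlier: both yield the same three bytes.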

Symbol-Defining Statements

So far we have dealt only with user-defined symbols that appear in the assembly program as
labels on instructions or data areas. The value of such a label is the address assigned to the
statement on which it appears. Most assemblers also provide an assembler directive that allows
the programmer to define symbols and specify their values.

EQU

EQU (for “equate”) is the assembler directive generally used to define symbols and specify their
values. Its general form is:


symbol EQU value

This statement defines the given symbol (i.e., enters it into SYMTAB) and assigns to it the value
specified. The value may be given as a constant or as any expression involving constants and
previously defined symbols. One common use of EQU is to establish symbolic names that can
be used for improved readability in place of numeric values. For example, line 133 of program
2.5 (as per the textbook) is:

+LDT #4096

This statement loads the value 4096 into register T. This value represents the maximum-length
record we could read with the subroutine RDREC.


If we include the statement

MAXLEN EQU 4096

in the program, then line 133 can be written as in program 2.9 (as per the textbook):

+LDT #MAXLEN

When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB (with
value 4096). During assembly of the LDT instruction, the assembler searches SYMTAB for the
symbol MAXLEN and uses its value as the operand in the instruction.

Another common use of EQU is to define mnemonic names for registers. We have assumed that
the assembler recognizes the standard mnemonics for registers (A, X, L, etc.). Suppose, however,
that the assembler expected register numbers instead of names in an instruction like RMO. This
would require the programmer to write RMO 0,1 instead of RMO A,X.

In such a case the programmer could include a sequence of EQU statements like:

A EQU 0
X EQU 1
L EQU 2

These statements cause the symbols A, X, L, ... to be entered into SYMTAB with their
corresponding values 0, 1, 2, ... The programmer can likewise establish and use names that
reflect the logical function of the registers in the program.

ORG

ORG (for “origin”) is an assembler directive used to indirectly assign values to symbols. It has
the form:

ORG value

where value is a constant or an expression involving constants and previously defined
symbols. When this statement is encountered during the assembly of a program, the assembler
resets its location counter (LOCCTR) to the specified value. Since the values of symbols used as
labels are taken from LOCCTR, the ORG statement affects the values of all labels defined until
the next ORG. The location counter is used to control the assignment of storage in the object
program, so in most cases altering its value would result in an incorrect assembly. However,
ORG is sometimes useful in label definition. Suppose that we are defining a symbol table with
the following structure:

• In this table,
– SYMBOL is a 6-byte field that contains a user-defined symbol
– VALUE is a one-word representation of the value assigned to the symbol
– FLAGS is a 2-byte field that specifies the symbol type and other information

We want to refer to entries in the table using indexed addressing (placing in the index register
the offset of the desired entry from the beginning of the table). We also want to refer to the
fields SYMBOL, VALUE and FLAGS individually, so we must define these labels.


The LDA instruction fetches the VALUE field from the table entry indicated by the contents of
register X. The same symbol definitions can be made using the ORG directive.

The first ORG statement resets the location counter to the value of STAB (i.e., the beginning
address of the table).

The label on the following RESB statement defines SYMBOL to have the current value in
LOCCTR; this is the same address that is assigned to STAB. LOCCTR is then advanced, so the
label on the RESW statement assigns to VALUE the address (STAB+6), and so on. The result is
a set of labels with the same values as those defined with the EQU statements above. With ORG,
however, the definition makes it clear that each entry in STAB consists of a 6-byte SYMBOL,
followed by a one-word VALUE, followed by a 2-byte FLAGS. The last ORG statement is very
important because:

• it sets the LOCCTR back to its previous value – the address of the next unassigned byte
of memory after the table STAB.

• any labels on subsequent statements, which do not represent the part of STAB, are
assigned proper addresses
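
The effect of ORG on label values can be traced with a small Pass 1 sketch. This is an illustration under assumptions, not the assembler's code: the function names are hypothetical, the table is assumed to hold 100 entries of 11 bytes each (so STAB reserves 1100 bytes), and only RESB, RESW and ORG are handled.

```python
def evaluate(expr, symtab):
    """Evaluate a constant, a symbol, or a sum such as STAB+1100."""
    total = 0
    for term in expr.split("+"):
        term = term.strip()
        # An undefined symbol raises KeyError, mirroring the rule that ORG
        # operands must already be defined.
        total += int(term) if term.isdigit() else symtab[term]
    return total


def pass1_labels(statements, start=0):
    """Assign label values from LOCCTR, letting ORG reset LOCCTR."""
    sizes = {"RESB": 1, "RESW": 3}
    symtab, locctr = {}, start
    for label, op, operand in statements:
        if op == "ORG":
            locctr = evaluate(operand, symtab)
            continue
        if label:
            symtab[label] = locctr
        locctr += sizes[op] * int(operand)
    return symtab, locctr


statements = [
    ("STAB",   "RESB", "1100"),      # assumed: 100 entries of 11 bytes
    (None,     "ORG",  "STAB"),
    ("SYMBOL", "RESB", "6"),
    ("VALUE",  "RESW", "1"),
    ("FLAGS",  "RESB", "2"),
    (None,     "ORG",  "STAB+1100"), # restore LOCCTR past the table
]
symtab, locctr = pass1_labels(statements, start=0x1000)
```

Here SYMBOL, VALUE and FLAGS end up at STAB, STAB+6 and STAB+9, and the final ORG leaves LOCCTR at the first byte after the table.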

In some assemblers, the previous value of LOCCTR is automatically remembered, so we can
simply write


ORG

with no value specified, to return to the normal use of LOCCTR.

Restrictions for EQU and ORG


There are restrictions that are common to all symbol-defining assembler directives. In the case
of EQU, all symbols used on the right-hand side of the statement (i.e., all terms used to specify
the value of the new symbol) must have been defined previously in the program.

The reason for this is the symbol definition process. In the second example, where BETA is
defined in terms of ALPHA before ALPHA itself has been defined, BETA cannot be assigned a
value when it is encountered during Pass 1 of the assembly, because ALPHA does not yet have
a value. However, our two-pass assembler design requires that all symbols be defined during
Pass 1. In the case of ORG, all symbols used to specify the new location counter value must
likewise have been previously defined. Consider the sequence:

The sequence cannot be processed. In this case, the assembler would not know (during Pass 1)
what value to assign to the location counter in response to the first ORG statement. As a
result, the symbols BYTE1, BYTE2 and BYTE3 could not be assigned addresses during Pass 1.
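
The EQU restriction can be demonstrated with a tiny Pass 1 sketch. The statement sequences below are assumptions written for illustration (a symbol ALPHA reserved with RESW and a symbol BETA equated to it), and the function name pass1 is hypothetical.

```python
def pass1(statements, start=0):
    """Build SYMTAB in one pass; EQU operands must already be defined."""
    symtab, locctr = {}, start
    for label, op, operand in statements:
        if op == "EQU":
            if operand.isdigit():
                symtab[label] = int(operand)
            elif operand in symtab:
                symtab[label] = symtab[operand]
            else:
                raise ValueError(f"EQU operand {operand} is not yet defined")
        elif op == "RESW":
            symtab[label] = locctr
            locctr += 3 * int(operand)
        else:
            raise ValueError(f"unsupported operation {op}")
    return symtab


allowed = [("ALPHA", "RESW", "1"), ("BETA", "EQU", "ALPHA")]
rejected = [("BETA", "EQU", "ALPHA"), ("ALPHA", "RESW", "1")]

ok_symtab = pass1(allowed)        # BETA receives ALPHA's address
try:
    pass1(rejected)
    forward_reference_error = False
except ValueError:
    forward_reference_error = True
```

The first sequence assembles, while the second fails in Pass 1 because ALPHA has no value when BETA is encountered.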


It might appear that this restriction is a result of the particular way in which we defined the
two passes of our assembler. In fact, it is a more general consequence of the forward reference
problem: some sequences cannot be resolved by an ordinary two-pass assembler regardless of
how the work is divided between the passes. Consider the sequence:

It cannot be resolved by an ordinary two-pass assembler; a more complex assembler structure
is needed.

Expressions

So far, the assembly language statements have used single terms (labels, literals, etc.) as
instruction operands. Most assemblers also allow the use of expressions as instruction operands.
Each such expression must be evaluated by the assembler to produce a single operand address
or value.

• Assemblers generally allow arithmetic expressions formed using the operators +, -, * and /.
• Division is usually defined to produce an integer result.
• Individual terms in the expression may be:
– Constants
– User-defined symbols
– Special terms
• The most common special term is the current value of the location counter (designated
by *), which represents the value of the next unassigned memory location.

For example, a statement of the form BUFFEND EQU * gives BUFFEND a value that is the
address of the next byte after the buffer area.


• We have already discussed the problem of program relocation:

– Some values in the object program are relative to the beginning of the program, while
others are absolute (independent of program location).

Similarly, the values of terms and expressions are either relative or absolute. A constant is an
absolute term. Labels on instructions and data areas, and references to the location counter
value, are relative terms. A symbol whose value is given by EQU (or a similar directive) may
be either an absolute term or a relative term, depending upon the expression used to define its
value.

• Expressions are classified as either

– absolute expressions, or

– relative expressions,

depending upon the type of value they produce.

• Absolute expression: an expression that contains only absolute terms. It may also contain
relative terms, provided that the relative terms occur in pairs and the terms in each such
pair have opposite signs.

• Relative expression: an expression in which all of the relative terms except one can be
paired as described above, and the remaining unpaired relative term has a positive sign.

• No relative term may enter into a multiplication or division operation, in either an
absolute or a relative expression.


To determine the type of an expression, the assembler must keep track of the types of all
symbols defined in the program. For this purpose we need a flag in the symbol table (SYMTAB)
that indicates whether each value is absolute or relative. Modification records are then
generated in the object program for relative values.
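
The pairing rule for absolute and relative expressions reduces to counting signed relative terms, as the sketch below shows. It is a simplified illustration: the function name classify is hypothetical, only + and - are handled (multiplication and division are left out), and the set of relative symbols stands in for the type flag kept in SYMTAB.

```python
def classify(terms, relative_symbols):
    """Classify an expression given as (sign, name) pairs, sign being +1 or -1.

    Names found in relative_symbols are relative terms; anything else is
    treated as an absolute term (a constant or an absolute symbol).
    """
    net = sum(sign for sign, name in terms if name in relative_symbols)
    if net == 0:
        return "absolute"      # every relative term is paired with opposite sign
    if net == 1:
        return "relative"      # exactly one unpaired, positive relative term
    raise ValueError("expression is neither absolute nor relative")


relative_symbols = {"BUFFER", "BUFFEND"}

difference = classify([(+1, "BUFFEND"), (-1, "BUFFER")], relative_symbols)
offset = classify([(+1, "BUFFER"), (+1, "3")], relative_symbols)
```

Here BUFFEND-BUFFER is absolute and BUFFER+3 is relative, whereas BUFFEND+BUFFER or 100-BUFFER would raise an error because they leave two positive relative terms or one negative one.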

Program Blocks

So far, the program being assembled has been treated as a unit. Even though the source program
logically contains subroutines, data areas, etc., the assembler handles it as one entity, resulting
in a single block of object code. Within this object program, the generated machine instructions
and data appear in the same order as they were written in the source program. Many
assemblers provide features that allow more flexible handling of the source and object programs:

• Some allow the generated machine instructions and data to appear in the object program in
a different order from the corresponding source statements.

• Others allow the creation of several independent parts of the object program. These parts
maintain their identity and are handled separately by the loader.

Program block: a segment of code that is rearranged within a single object program unit.

Control section: a segment that is translated into an independent object program unit.

Consider the program in the figure which is written using program blocks.

In this case there are three blocks:


1. (unnamed) program block contains the executable instructions of the program

2. (named CDATA) contains all data areas that are a few words or less in length

3. (named CBLKS) contains all data areas that consist of larger blocks of memory

The assembler directive USE indicates which portions of source program belong to various
blocks.

At the beginning, statements are assumed to be part of the unnamed (default) block. If no USE
statements are included, the entire program belongs to this single block. The USE statement on
line 92 signals the beginning of the block named CDATA. Source statements are associated with
this block until the USE statement on line 103, which begins the block named CBLKS. The USE
statement may also indicate the continuation of a previously begun block. Thus the statement
on line 123 resumes the default block, and the statement on line 183 resumes the block named
CDATA.

Each program block may actually contain several separate segments of the source program. The
assembler rearranges these segments to gather together the pieces of each block. The blocks are
then assigned addresses in the object program, appearing in the same order in which they were
first begun in the source program.

How does the assembler handle program blocks?

During Pass 1:

• The assembler accomplishes the logical rearrangement of code by maintaining a separate
location counter for each program block.
• The location counter for a block is initialized to 0 when the block is first begun.
• When switching to another block, the current value of the location counter is saved; when
resuming a previous block, the saved value is restored.
• Each label in the program is assigned an address that is relative to the start of the block that
contains it.
• When labels are entered into the symbol table, the block name or number is stored along with
the assigned relative address.
• At the end of Pass 1, the latest value of the location counter for each block indicates the
length of that block.
• The assembler can then assign to each block a starting address in the object program
(beginning with relative location 0).

During Pass 2:


• For code generation, the assembler needs the address for each symbol relative to the start of
the object program (not the start of an individual program block) which is easily found from
the information in SYMTAB.
• The assembler simply adds the location of the symbol, relative to the start of its block, to the
assigned block starting address
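
The two passes described above can be sketched as a pair of small functions. This is only an illustration: the function names are hypothetical, and the block lengths (66, 0B and 1000 hexadecimal) and the placement of LENGTH at relative address 0003 in CDATA are assumed values chosen for the example.

```python
def assign_block_starts(block_lengths):
    """End of Pass 1: lay the blocks out one after another from address 0.

    block_lengths maps block name to length, in the order in which the
    blocks were first begun (Python dicts preserve insertion order).
    """
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol, symtab, starts):
    """Pass 2: block starting address plus address relative to the block."""
    block, relative = symtab[symbol]
    return starts[block] + relative


# Assumed block lengths, in order of first use
block_lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
starts = assign_block_starts(block_lengths)

# SYMTAB entry: symbol -> (block name, address relative to that block)
symtab = {"LENGTH": ("CDATA", 0x0003)}
length_address = absolute_address("LENGTH", symtab, starts)
```

With these values the blocks start at 0000, 0066 and 0071, so LENGTH is assembled with the address 0066 + 0003 = 0069.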

Consider the program: the column headed Loc/Block shows the relative address (within a
program block) assigned to each source line and a block number indicating which program
block is involved. This is the information stored in SYMTAB for each symbol. The value of
MAXLEN on line 107 is shown without a block number, which indicates that it is an absolute
symbol whose value is not relative to the start of any program block. At the end of Pass 1, the
assembler constructs a table that contains the starting addresses and lengths of all blocks.


Separating the program into blocks has considerably reduced the addressing problems. Because
the large buffer area is moved to the end of the object program, we no longer need to use
extended format instructions on lines 15, 35 and 65, and the base register is no longer necessary.
The problem of placing literals is also solved: we simply place an LTORG statement in the
CDATA block to be sure that the literals are placed ahead of any large data areas.

It is not necessary to physically rearrange the generated code in the object program to place the
pieces of each program block together. The assembler simply writes the object code as it is
generated during Pass 2 and inserts the proper load address in each Text record. These load
addresses reflect the starting address of the block as well as the relative location of the code
within the block. For example, in the figure, the first two Text records are generated from lines
5 through 70. When the USE statement is recognized, the assembler writes out the current Text
record, even if there is still room left in it, and begins a new Text record for the new program
block.


Lines 95 through 105 generate no object code, so no new Text records are created for them. The
next Text records come from lines 125 through 180; this time the statements that belong to the
next program block do generate object code. The fifth Text record contains the single byte of
data from line 185. The sixth Text record resumes the default program block. It does not matter
that the Text records of the object program are not in sequence by address; the loader simply
loads the object code from each record at the indicated address.

Control Sections and Program Linking

A control section is a part of the program that maintains its identity after assembly. It can be
loaded and relocated independently of the other control sections. Different control sections are
most often used for subroutines or other logical subdivisions of a program. The programmer can
assemble, load and manipulate each of these control sections separately; the resulting flexibility
is the major benefit of using control sections.

When control sections form logically related parts of a program, there must be some means of
linking them together. For example, the instructions in one control section might need to refer
to instructions or data located in another section. Because control sections are independently
loaded and relocated, the assembler is unable to process these references in the usual way: it
has no idea where any other control section will be located at execution time. Such references
between control sections are called external references. The assembler generates information for
each external reference that will allow the loader to perform the required linking.

Consider the example program in the next slide, which is written using multiple control
sections. In this case there are three control sections:

• One for the main program

• Second for the subroutine RDREC

• Third for the subroutine WRREC

The START statement identifies the beginning of the assembly and gives the name COPY to the
first control section. The first section continues until the CSECT statement on line 109. CSECT
is an assembler directive that signals the start of a new control section, here named RDREC.

Similarly, the CSECT statement on line 193 begins the control section named WRREC. The assembler
establishes a separate location counter for each control section, just as it does for program
blocks.

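The assembler handles program blocks and control sections differently. The main difference is
that it is not necessary for all control sections in a program to be assembled at the same time.
Symbols that are defined in one control section may not be used directly by another control
section; they must be identified as external references for the loader to handle. Two assembler
directives identify such references: EXTDEF (external definition) names symbols that are defined
in this control section and may be used by other sections, and EXTREF (external reference) names
symbols that are used in this control section but defined elsewhere.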

Control section names (in this case COPY, RDREC and WRREC) do not need to be named in an EXTDEF
statement because they are automatically considered to be external symbols. The order in which
symbols are listed in EXTDEF and EXTREF statements is not significant.

How are external references handled by the assembler?

CASE 1: Consider the instruction:

The operand RDREC is named in the EXTREF statement for the control section, so this is an
external reference. The assembler has no idea where the control section containing RDREC will be
loaded, so it cannot assemble the address for this instruction. Instead, the assembler inserts an
address of zero and passes information to the loader, which will cause the proper address to be
inserted at load time. The address of RDREC will have no predictable relationship to anything in
this control section; therefore, relative addressing is not possible. Thus an extended format
instruction must be used to provide room for the actual address to be inserted. This is true of
any instruction whose operand involves an external reference.
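
The handling described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name assemble_external_ref is hypothetical,
the opcode values (48 for JSUB, 54 for STCH) are the standard SIC/XE ones, and the instruction
locations used in the usage note are chosen for illustration.

```python
def assemble_external_ref(opcode, loc, symbol, extref, indexed=False):
    """Assemble a format 4 instruction whose operand is an external symbol.

    Returns (object_code_hex, modification_record). Hypothetical helper,
    not taken from any real assembler.
    """
    if symbol not in extref:
        raise ValueError(f"{symbol} is not named in EXTREF")
    # First byte: 6-bit opcode with n = 1 and i = 1 (simple addressing).
    byte1 = (opcode & 0xFC) | 0b11
    # Flag nibble x b p e: no base or PC relative addressing, e = 1 (extended).
    xbpe = (0b1000 if indexed else 0) | 0b0001
    # The 20-bit address field is left as zero; the loader fills it in.
    obj = f"{byte1:02X}{xbpe:X}{0:05X}"
    # The address field occupies the last 5 half-bytes of the instruction,
    # so the modified field starts one byte after the instruction start.
    mrec = f"M{loc + 1:06X}05+{symbol}"
    return obj, mrec


if __name__ == "__main__":
    print(assemble_external_ref(0x48, 0x0003, "RDREC", {"RDREC", "WRREC"}))
```

With JSUB at relative location 0003, the sketch produces the object code 4B100000 together with
the record M00000405+RDREC. An indexed reference only changes the x bit in the flag nibble.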

CASE 2: Consider the instruction:

Here the value of the data word to be generated is specified by an expression involving two
external references: BUFEND and BUFFER. The assembler stores this value as zero. When the program
is loaded, the loader adds the address of BUFEND to this data area and subtracts the address of
BUFFER from it, which yields the desired value. Note the difference between the handling of the
expression on line 190 and the similar expression on line 107. On line 107, the symbols BUFEND
and BUFFER are defined in the same control section as the EQU statement, so the assembler can
calculate the value of the expression immediately. This cannot be done for line 190: there
BUFEND and BUFFER are defined in another control section, so their values are unknown at
assembly time.

CASE 3: Consider the instruction:

This makes an external reference to BUFFER. The instruction is assembled using extended format
with an address of zero. The x bit is set to 1 to indicate indexed addressing, as specified by
the instruction. The assembler must remember, via entries in SYMTAB, in which control section
each symbol is defined. Any attempt to refer to a symbol in another control section must be
flagged as an error unless the symbol is identified (using EXTREF) as an external reference. The
assembler must also allow the same symbol to be used in different control sections. For example,
the conflicting definitions of MAXLEN on lines 107 and 190 should cause no problem: a reference
to MAXLEN in the control section COPY would use the definition on line 107, whereas a reference
to MAXLEN in RDREC would use the definition on line 190.

The assembler thus leaves room in the object code for the values of external symbols. It must
also include information in the object program that will cause the loader to insert the proper
values where they are required. Two new record types in the object program serve this purpose:

• Define Record

• Refer Record

Define Record: gives information about external symbols that are defined in this control
section, i.e., the symbols named by EXTDEF.

Refer Record: lists symbols that are used as external references by the control section, i.e.,
the symbols named by EXTREF.

The other information needed for program linking is added to the Modification record type.

Modification record (revised)

• Col. 1 M
• Col. 2-7 Starting address of the field to be modified, relative to the beginning of the
control section (hex)
• Col. 8-9 Length of the field to be modified, in half-bytes (hex)
• Col. 10 Modification flag (+ or -)
• Col. 11-16 External symbol whose value is to be added to or subtracted from the indicated
field

For the Modification record, the first three items are the same as studied earlier. The two new
items specify the modification to be performed:

• Adding or subtracting the value of some external symbol

• The symbol used for modification may be defined either in this control section or in
another one

The figure shows the object program corresponding to the source program written using control
sections. Note that there is a separate set of object program records (from Header through End)
for each control section. The records for each control section are exactly the same as they
would be if the sections were assembled separately. The Define and Refer records for each
control section include the symbols named in the EXTDEF and EXTREF statements. In the case of
Define, the record also indicates the relative address of each external symbol within the
control section. For EXTREF symbols, no address information is available; these symbols are
simply named in the Refer record.
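
As a rough illustration, Define and Refer records could be built as follows. The column layout
is an assumption taken from the usual SIC/XE object program format, since these notes do not
spell it out: a Define record is D followed by repeated pairs of a 6-column name and a 6-digit
hex relative address, and a Refer record is R followed by 6-column names. The function names
and the sample addresses are hypothetical.

```python
def define_record(symbols):
    """Build a Define record from a mapping name -> relative address.

    Each entry is a 6-column, left-justified name followed by a
    6-digit hexadecimal address (assumed layout).
    """
    return "D" + "".join(f"{name:<6}{addr:06X}" for name, addr in symbols.items())


def refer_record(names):
    """Build a Refer record: names only, no address information."""
    return "R" + "".join(f"{name:<6}" for name in names)


if __name__ == "__main__":
    # Illustrative addresses, not read from the figure.
    print(define_record({"BUFFER": 0x33, "BUFEND": 0x1033, "LENGTH": 0x2D}))
    print(refer_record(["RDREC", "WRREC"]))
```

Note how the Refer record carries no addresses at all, matching the statement that no address
information is available for EXTREF symbols.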

Examples of Modification records:

• M00000405+RDREC

• M00000705+COPY

• M00001405+COPY

• M00002705+COPY
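
The following Python sketch shows how a loader might apply such records, assuming the column
layout of the revised Modification record given earlier. The function name apply_modification,
the use of a bytearray for memory, and all load addresses are assumptions made for
illustration.

```python
def apply_modification(memory, section_start, record, estab):
    """Apply one revised Modification record to loaded object code.

    memory        : bytearray holding the loaded program
    section_start : load address of the control section being modified
    estab         : mapping external symbol -> absolute address
    """
    offset = int(record[1:7], 16)     # Col. 2-7: relative start address
    length = int(record[7:9], 16)     # Col. 8-9: field length in half-bytes
    sign = record[9]                  # Col. 10: + or -
    symbol = record[10:16].strip()    # Col. 11-16: external symbol

    nbytes = (length + 1) // 2        # bytes that contain the field
    start = section_start + offset
    word = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * length)) - 1    # selects the low `length` half-bytes
    delta = estab[symbol] if sign == "+" else -estab[symbol]
    field = ((word & mask) + delta) & mask
    # Keep any bits above the field (e.g. the flag nibble) unchanged.
    word = (word & ~mask) | field
    memory[start:start + nbytes] = word.to_bytes(nbytes, "big")


if __name__ == "__main__":
    mem = bytearray(16)
    mem[3:7] = bytes.fromhex("4B100000")           # instruction with zero address
    apply_modification(mem, 0, "M00000405+RDREC", {"RDREC": 0x4040})
    print(mem[3:7].hex())
```

A data word defined by a difference of two external symbols is handled by two records on the
same field, one with + and one with -, applied in either order.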

The existence of multiple control sections that can be relocated independently of one another
makes the handling of expressions slightly more complicated. If two terms represent relative
locations in the same control section, their difference is an absolute value regardless of where
the control section is loaded. On the other hand, if they are in different control sections,
their difference has a value that is unpredictable.
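
The pairing rule can be sketched as a small check. This is a simplified illustration: the
function name and the section_of mapping are hypothetical, and the sketch assumes the control
section of every symbol is known, which in practice is only true for symbols defined in the
section being assembled.

```python
from collections import Counter


def unpaired_terms(terms, section_of):
    """Return the net count of relative terms per control section.

    terms      : list of (sign, symbol), sign being +1 or -1
    section_of : mapping symbol -> control section name
    An empty result means every relative term is paired with one of
    opposite sign in the same section, so the expression is absolute.
    """
    balance = Counter()
    for sign, symbol in terms:
        balance[section_of[symbol]] += sign
    return {section: n for section, n in balance.items() if n != 0}


if __name__ == "__main__":
    section_of = {"BUFEND": "COPY", "BUFFER": "COPY", "INPUT": "RDREC"}
    print(unpaired_terms([(+1, "BUFEND"), (-1, "BUFFER")], section_of))
    print(unpaired_terms([(+1, "INPUT"), (-1, "BUFFER")], section_of))
```

Any section left in the result contributes a load address that is unknown at assembly time,
which is why such an expression has to be completed by the loader.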

Assembler Design Options

• One – pass Assembler

• Multi - pass Assembler

One – Pass Assembler

A one-pass assembler is used when it is necessary or desirable to avoid a second pass over the
source program. The main problem in assembling a program in one pass involves forward
references: the operands of instructions are often symbols that have not yet been defined in the
source program.

• There are two main types of one pass assembler

1. Produces object code directly in memory for immediate execution

2. Produces the object program for later execution

Consider the program:

Load-and-go Assembler

A load-and-go assembler is a one-pass assembler that generates its object code in memory for
immediate execution. No object program is written out, and no loader is needed. Load-and-go
assemblers of this kind are useful in systems oriented toward program development and testing.
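
One common way for a load-and-go assembler to deal with forward references is to keep, for each
undefined symbol, a list of the operand locations that refer to it, and to patch those locations
when the symbol is finally defined. The Python sketch below illustrates that idea only; the
class name, the dictionary used as memory, and the addresses are assumptions.

```python
class LoadAndGo:
    """Minimal model of forward-reference handling in one pass."""

    def __init__(self):
        self.memory = {}   # operand address -> assembled address value
        self.symtab = {}   # symbol -> ("defined", value) or ("undefined", [addresses])

    def reference(self, symbol, operand_addr):
        """Record a use of `symbol` in the operand field at `operand_addr`."""
        state, data = self.symtab.setdefault(symbol, ("undefined", []))
        if state == "defined":
            self.memory[operand_addr] = data
        else:
            self.memory[operand_addr] = 0     # placeholder until defined
            data.append(operand_addr)         # remember where to patch

    def define(self, symbol, value):
        """Define `symbol` and patch every earlier forward reference."""
        state, data = self.symtab.get(symbol, ("undefined", []))
        if state == "defined":
            raise ValueError(f"duplicate symbol {symbol}")
        for addr in data:
            self.memory[addr] = value
        self.symtab[symbol] = ("defined", value)

    def undefined_symbols(self):
        """Symbols still undefined; at end of program these are errors."""
        return sorted(s for s, (state, _) in self.symtab.items() if state == "undefined")


if __name__ == "__main__":
    asm = LoadAndGo()
    asm.reference("RDREC", 0x2013)    # forward reference, patched later
    asm.define("RDREC", 0x203D)
    print(hex(asm.memory[0x2013]))
```

Because the object code already sits in memory, patching is a simple store; no second pass over
the source is required.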

Multi-Pass Assembler

In symbol-defining statements that use the EQU and ORG directives, any symbol used on the
right-hand side (that is, in the expression giving the new value) must already be defined
earlier in the source program.

As a result, ALPHA cannot be evaluated during the second pass. This means that any assembler
that makes only two sequential passes over the source program cannot resolve such a sequence of
definitions. The general solution is a multi-pass assembler that can make as many passes as are
needed to process the definitions of symbols. It is not necessary for such an assembler to make
more than two passes over the entire program. Instead, the portions of the program that involve
forward references in symbol definitions are saved during Pass 1, and additional passes through
these stored definitions are made as the assembly progresses. This process is then followed by a
normal Pass 2. These tasks can be accomplished in several ways. The method described here stores
the symbol definitions that involve forward references in the symbol table. The table also
indicates which symbols depend on the values of others, to facilitate symbol evaluation.

Consider the sequence of symbol defining statements that involve forward references:

Symbol Table entries resulting from Pass 1 processing of the statement

MAXLEN has not yet been defined, so no value for HALFSZ can be computed. The defining expression
for HALFSZ is stored in the symbol table in place of its value. The entry &1 indicates that one
symbol in the defining expression is undefined. In an actual implementation, this definition
might be stored at some other location; SYMTAB would then simply contain a pointer to the
defining expression. The symbol MAXLEN is also entered in the symbol table, with the flag *
identifying it as undefined. Associated with this entry is a list of the symbols whose values
depend on MAXLEN (here HALFSZ). This is similar to the handling of forward references in the
one-pass assembler seen earlier.

In this case there are two undefined symbols: BUFEND and BUFFER. Both are entered into SYMTAB
with lists indicating the dependence of MAXLEN upon them.

Similarly, the definition of PREVBT causes this symbol to be added to the list of dependencies
on BUFFER.

So far we have simply been saving symbol definitions for later processing. The definition of
BUFFER on line 4 begins the evaluation. Let us assume that when line 4 is read, the location
counter contains the hexadecimal value 1034. This address is stored as the value of BUFFER. The
assembler then examines the list of symbols that are dependent on BUFFER. The symbol table entry
for the first symbol in this list (MAXLEN) shows that it depends on two currently undefined
symbols, so MAXLEN cannot be evaluated immediately. Instead, &2 is changed to &1 to show that
only one symbol in the definition (BUFEND) remains undefined. The other symbol in the list,
PREVBT, depends only on BUFFER, so its value can be calculated and stored in SYMTAB.

When BUFEND is defined on line 5, its value is entered into the symbol table. The list
associated with BUFEND then directs the assembler to evaluate MAXLEN, and entering a value for
MAXLEN causes the evaluation of the symbol in its list (HALFSZ). This completes the symbol
definition process. If any symbols remained undefined at the end of the program, the assembler
would flag them as errors.
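
The dependency-list scheme can be sketched in Python as follows. This is a minimal model, not a
full assembler: the class and method names are hypothetical, defining expressions are passed as
Python functions, and the expressions used in the usage example (MAXLEN/2, BUFEND-BUFFER,
BUFFER-1, and a 4096-byte buffer) are assumptions, since the figure with the actual statements
is not reproduced in these notes.

```python
class SymbolTable:
    """Symbol table with dependency lists for forward references."""

    def __init__(self):
        self.value = {}        # symbol -> evaluated value
        self.pending = {}      # symbol -> [operands, func, undefined count (the &n entry)]
        self.dependents = {}   # undefined symbol -> symbols waiting on it

    def define_expr(self, name, operands, func):
        """Define `name` by func(*operands); defer if operands are undefined."""
        missing = [s for s in dict.fromkeys(operands) if s not in self.value]
        if not missing:
            self._set(name, func(*(self.value[s] for s in operands)))
            return
        self.pending[name] = [operands, func, len(missing)]
        for s in missing:
            self.dependents.setdefault(s, []).append(name)

    def define_value(self, name, value):
        """Define `name` directly, e.g. from the location counter."""
        self._set(name, value)

    def _set(self, name, value):
        self.value[name] = value
        # Walk the list of symbols that were waiting for `name`.
        for dep in self.dependents.pop(name, []):
            entry = self.pending[dep]
            entry[2] -= 1                      # e.g. &2 becomes &1
            if entry[2] == 0:
                operands, func, _ = self.pending.pop(dep)
                self._set(dep, func(*(self.value[s] for s in operands)))

    def undefined(self):
        """Symbols still unresolved; flagged as errors at end of program."""
        return sorted(self.pending)


if __name__ == "__main__":
    st = SymbolTable()
    st.define_expr("HALFSZ", ["MAXLEN"], lambda m: m // 2)
    st.define_expr("MAXLEN", ["BUFEND", "BUFFER"], lambda e, b: e - b)
    st.define_expr("PREVBT", ["BUFFER"], lambda b: b - 1)
    st.define_value("BUFFER", 0x1034)
    st.define_value("BUFEND", 0x1034 + 4096)
    print({k: hex(v) for k, v in st.value.items()})
```

Defining BUFFER resolves PREVBT at once but only reduces the count for MAXLEN; defining BUFEND
then triggers MAXLEN, which in turn triggers HALFSZ.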

Traditional CISC Machines

• CISC: Complex Instruction Set Computers

• Have :

– Large and complicated instruction set

– Different instruction formats and lengths

– Many different addressing modes

Pentium Pro Architecture

Introduced in 1995 as a member of the Intel x86 family.

Memory

Memory can be described in two ways. At the physical level, memory consists of 8-bit bytes, and
all addresses used are byte addresses. Two consecutive bytes form a word; four bytes form a
double word (also called a dword). At the programmer's level, x86 memory is viewed as a
collection of segments, so an address consists of two parts: a segment number and an offset that
points to a byte within the segment. Segments can be of different sizes and are used for
different purposes. Some segments contain executable instructions, while others are used to
store data. Some data segments may be treated as stacks that can be used to save register
contents, pass parameters to subroutines, and for other purposes. It is not necessary for all of
the segments used by a program to be in physical memory. In some cases a segment can also be
divided into pages; some pages of the segment may be in physical memory while others are stored
on disk. When an x86 instruction is executed, the hardware and the operating system make sure
that the needed byte of the segment is loaded into physical memory. The segment/offset address
specified by the programmer is automatically translated into a physical byte address by the x86
Memory Management Unit (MMU).

Registers

There are eight general-purpose registers:

EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP

Each general-purpose register is 32 bits long (i.e., one double word). Registers EAX, EBX, ECX
and EDX are generally used for data manipulation; it is also possible to access individual words
or bytes from these registers. The remaining four registers can also be used for data, but are
more commonly used to hold addresses.

• Special-purpose registers:

o EIP – a 32-bit register that contains a pointer to the next instruction to be executed
o FLAGS – a 32-bit register that contains many different bit flags (some indicate the status
of the processor, others record the results of comparisons and arithmetic operations)
o Segment registers – 16-bit registers that are used to locate segments in memory

• FPU: Floating Point Unit

– Contains eight 80-bit data registers and several other control and status registers

Data Formats

Integers are stored as 8-, 16-, or 32-bit binary numbers. Both signed and unsigned numbers
(ordinals) are supported, with 2's complement representation for negative numbers. Integers can
also be stored in binary coded decimal (BCD) format:

• Packed: each byte represents two decimal digits, with each digit encoded (in binary) using 4
bits of the byte

• Unpacked: each byte represents one decimal digit. The value of this digit is encoded (in
binary) in the low-order 4 bits of the byte; the high-order bits are normally zero
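
A short Python sketch makes the two BCD encodings concrete. The function names are hypothetical,
only non-negative integers are handled, and the most significant digit is placed first purely
for readability; the byte order used by actual x86 instructions is not modelled here.

```python
def to_unpacked_bcd(n):
    """One decimal digit per byte, stored in the low-order 4 bits."""
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n):
    """Two decimal digits per byte, 4 bits each."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits            # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


if __name__ == "__main__":
    print(to_unpacked_bcd(1995).hex())
    print(to_packed_bcd(1995).hex())
```

Packed BCD needs half as many bytes as unpacked BCD for the same number of digits.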

Characters are stored one per byte and are represented using 8-bit ASCII codes.

Strings may consist of bits, bytes, words or doublewords; there are special instructions to
handle each type of string. The FPU can also handle 64-bit signed integers.

Floating-point values are represented using three different formats:

– single-precision format (32 bits long → 24 significant bits, of which 23 are stored and one
is implicit; 8-bit exponent; 1 sign bit)

– double-precision format (64 bits long → 53 significant bits, of which 52 are stored; 11-bit
exponent; 1 sign bit)

– extended-precision format (80 bits long → 64 significant bits, 15-bit exponent, 1 sign bit)
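
The field layout of the first two formats can be inspected with Python's struct module. This
sketch assumes the IEEE 754 layout used by the x86 FPU: 1 sign bit, 8 exponent bits and 23
stored fraction bits for single precision, and 1, 11 and 52 bits for double precision. The
function names are hypothetical.

```python
import struct


def decode_single(x):
    """Split a single-precision value into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def decode_double(x):
    """Split a double-precision value into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


if __name__ == "__main__":
    print(decode_single(1.0))
    print(decode_double(1.0))
```

The exponent fields are biased (by 127 and 1023 respectively), which is why 1.0 shows a nonzero
exponent with a zero fraction.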

Instruction Formats

An instruction begins with optional prefixes containing flags that modify the operation of the
instruction.

• For example, some prefixes specify a repetition count for an instruction

• Others specify a segment register that is to be used for addressing an operand (overriding
the normal default assumptions made by the hardware)

Following the prefixes (if any) is an opcode (1 or 2 bytes). Some operations have several
different opcodes, each specifying a different variant of the operation. Following the opcode
are a number of bytes that specify the operands and addressing modes to be used. The opcode is
the only element that is present in every instruction. Other elements may or may not be present,
and may be of different lengths, depending on the operation and the operands involved. Thus
there is a large number of potential instruction formats, varying in length from 1 byte to 10
bytes or more.

Addressing modes

• Provides a large number of addressing modes

• Immediate mode: Operand value may be specified as part of the instruction itself

• Register mode: Operand value may be in register

• Operands stored in memory are often specified using variations of the general target address
calculation

• TA = (base register) + (index register) * (scale factor) + displacement

• Any general-purpose register may be used as a base register. Any general-purpose register
except ESP can be used as an index register. The scale factor may have the value 1, 2, 4, or 8.
The displacement may be an 8-, 16-, or 32-bit value

• The base and index register numbers, scale and displacement are encoded as parts of the
operand specifiers in the instruction.

• Various combinations of these items may be omitted, resulting in eight different addressing
modes

• Direct mode : The address of an operand in memory may also be specified as an absolute
location

• Relative mode: The address of an operand in memory may also be specified as a location
relative to the EIP register
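
The general target address calculation can be written out directly. In this Python sketch the
function name and parameter names are hypothetical, register contents are passed in as plain
integers, and the result is wrapped to 32 bits on the assumption of 32-bit addressing.

```python
def target_address(base=0, index=0, scale=1, displacement=0):
    """TA = (base register) + (index register) * (scale factor) + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


if __name__ == "__main__":
    # Element 3 of an array of 4-byte items, 8 bytes past the base address.
    print(hex(target_address(base=0x1000, index=3, scale=4, displacement=8)))
    # Omitting base and index leaves only the displacement.
    print(hex(target_address(displacement=0x2000)))
```

Leaving out different combinations of base, index and displacement is what gives rise to the
separate addressing modes mentioned in the list.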

Instruction Set

• There are more than 400 different instructions. An instruction may have zero, one, two or
three operands

• There are register-to-register instructions, register-to-memory instructions, and a few
memory-to-memory instructions

• In some cases, operands may also be specified in the instruction as immediate values

• Most data movement and integer arithmetic instructions can use operands that are 1, 2 or 4
bytes long

• String manipulation instructions, which use repetition prefixes, can deal directly with
variable-length strings of bytes, words or double words

• There are many instructions that perform logical and bit manipulations, and support control
of the processor and memory management systems

• The x86 architecture also includes special-purpose instructions to perform operations
frequently required in high-level programming languages, for example entering and leaving
procedures and checking subscript values against the bounds of an array

Input and Output

• Input is performed by instructions that transfer one byte, word or double word at a time
from an I/O port into register EAX

• Output instructions transfer one byte, word or double word from EAX to an I/O port

• Repetition prefixes allow these instructions to transfer an entire string in a single
operation

Implementation-MASM Assembler

The programmer of an x86 system views memory as a collection of segments, and a MASM assembler
language program is likewise written as a collection of segments. Each segment belongs to a
particular class, corresponding to its contents. Commonly used classes are CODE, DATA, CONST and
STACK. During program execution, segments are addressed via the x86 segment registers. In most
cases code segments are addressed using register CS, and stack segments are addressed using
register SS. These segment registers are automatically set by the system loader when a program
is loaded for execution: register CS is set to indicate the segment that contains the starting
label specified in the END statement of the program, and register SS is set to indicate the last
stack segment processed by the loader. Data segments (including constant segments) are normally
addressed using DS, ES, FS or GS. The segment register to be used can be specified explicitly by
the programmer (by writing it as part of the assembler language instruction). If the programmer
does not specify a segment register, one is selected by the assembler. By default, the assembler
assumes that all references to data segments use register DS. This assumption can be changed by
the assembler directive ASSUME.

ASSUME

ASSUME ES: DATASEG2

This directive tells the assembler that register ES indicates the segment DATASEG2. Thus, any
reference to labels that are defined in DATASEG2 will be assembled using register ES. It is also
possible to collect several segments into a group and use ASSUME to associate a segment register
with the group. Registers DS, ES, FS and GS must be loaded by the program before they can be
used to address data segments.

Eg: MOV AX, DATASEG2

MOV ES, AX

would set ES to indicate the data segment DATASEG2. ASSUME is similar to the BASE directive in
SIC/XE. BASE tells a SIC/XE assembler the contents of register B; the programmer must provide
the executable instructions to load this value into the register. Similarly, ASSUME tells MASM
the contents of a segment register; the programmer must provide instructions to load this
register when the program is executed.

Jump instructions are assembled in two ways, depending on whether the target of the jump is in
the same code segment as the jump instruction:

• Near jump: jump to a target address in the same code segment as the jump instruction

• Far jump: jump to a target address in a different code segment

A near jump is assembled using the current code segment register CS. The assembled machine
instruction for a near jump occupies 2 or 3 bytes. A far jump must be assembled using a
different segment register, which is specified in an instruction prefix. The assembled machine
instruction for a far jump occupies 5 bytes. Forward references to labels in the source program
can cause problems. For example, consider a jump instruction

JMP TARGET

If the definition of the label TARGET occurs in the program before the JMP instruction, the
assembler can tell whether this is a far jump or a near jump. If this is a forward reference to
TARGET, however, the assembler does not know how many bytes to reserve for the instruction. By
default, MASM assumes that a forward jump is a near jump. If the target is in another code
segment, the programmer must warn the assembler by writing:

JMP FAR PTR TARGET

If the jump address is within 128 bytes of the current instruction, the programmer can specify
the shorter (2-byte) near jump by writing:

JMP SHORT TARGET

If the JMP to TARGET is a far jump and the programmer does not specify FAR PTR, a problem
occurs. During Pass 1, the assembler reserves 3 bytes for the jump instruction, but the actual
assembled instruction requires 5 bytes. In earlier versions of MASM, this caused a phase error.
In later versions, the assembler can repeat Pass 1 to generate the correct location counter
values. This situation is similar to forward references in SIC/XE that require the use of
extended format instructions. There are also other situations in which the length of an
assembled instruction depends on the operands that are used. For example, the operands of an ADD
instruction may be registers, memory locations, or immediate operands. Immediate operands may
occupy from 1 to 4 bytes in the instruction. An operand that specifies a memory location may
take varying amounts of space in the instruction, depending upon the location of the operand.
For this reason, Pass 1 of an x86 assembler is more complex than Pass 1 of a SIC assembler.

Segments in a MASM source program can be written in more than one part. If a SEGMENT directive
specifies the same name as a previously defined segment, it is considered to be a continuation
of that segment.
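
The idea of repeating Pass 1 until the location counter values settle can be sketched in Python.
This is a simplified model and not how MASM is actually implemented: the statement tuples, the
function name layout, and the sample program are assumptions, the 2-byte short form is ignored,
and only the 3-byte near and 5-byte far sizes given above are used.

```python
NEAR, FAR = 3, 5   # instruction sizes in bytes, as given in the text


def layout(program, max_passes=10):
    """Assign offsets to labels, repeating Pass 1 until jump sizes are stable.

    program: list of (segment, label, op, arg). For op "JMP", arg is the
    target label; for op "DATA", arg is the size in bytes.
    Returns (labels, number_of_passes).
    """
    # Initial assumption: every jump is a near jump.
    sizes = [NEAR if op == "JMP" else arg for _, _, op, arg in program]
    for passes in range(1, max_passes + 1):
        locctr, labels = {}, {}
        for (seg, label, _, _), size in zip(program, sizes):
            offset = locctr.get(seg, 0)        # one location counter per segment
            if label:
                labels[label] = (seg, offset)
            locctr[seg] = offset + size
        # Now that all labels are known, recompute each jump's size.
        new_sizes = [
            (FAR if labels[arg][0] != seg else NEAR) if op == "JMP" else arg
            for seg, _, op, arg in program
        ]
        if new_sizes == sizes:
            return labels, passes
        sizes = new_sizes
    raise RuntimeError("layout did not stabilize")


if __name__ == "__main__":
    program = [
        ("CODE1", "START", "JMP", "TARGET"),   # forward jump to another segment
        ("CODE1", "NEXT", "DATA", 4),
        ("CODE2", "TARGET", "DATA", 1),
    ]
    print(layout(program))
```

In the sample program the first pass places NEXT at offset 3; once TARGET is found in a
different segment, the jump grows to 5 bytes and the second pass moves NEXT to offset 5.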

All the parts of a segment are gathered together by the assembly process. References between
segments that are assembled together are handled automatically by the assembler. External
references between separately assembled modules must be handled by the linker. The MASM
directive PUBLIC plays the role of EXTDEF in SIC/XE, and EXTRN plays the role of EXTREF. The
object program from the MASM assembler may be in several different formats. MASM can also
produce an instruction timing listing that shows the number of clock cycles required to execute
each machine instruction. This allows the programmer to exercise a great deal of control in
optimizing time-critical sections of code.
