CSA Complete
2/13/2012
Instruction Code
The organization of a computer is defined by its internal registers, the timing and control structure, and the set of instructions that it uses. The internal organization of a digital system is defined by the sequence of micro-operations it performs on data stored in its registers. A general-purpose digital computer is capable of executing various micro-operations and, in addition, can be instructed as to what specific sequence of operations it must perform.
2/13/2012 Er. KAPIL PRASHAR 2
Continued
- The user of a computer can control the process by means of a program.
- A program is a set of instructions that specify the operations, operands, and the sequence in which processing has to occur.
- An instruction is a binary code that specifies a sequence of micro-operations.
- Instructions and data are stored in memory.
- The ability to store and execute instructions, the stored-program concept (von Neumann architecture), is the most important property of a general-purpose computer.
Continued
An instruction code is a group of bits that instructs the computer to perform a specific operation (a set of micro-operations). The operation code is a basic part of an instruction code: a group of bits that defines such operations as add, subtract, multiply, shift, and complement. The operation code must consist of at least n bits for a given 2^n (or fewer) distinct operations. The control unit receives the instruction from memory and interprets the operation code bits. It then issues a sequence of control signals to initiate micro-operations in internal registers.
Continued
For every operation code, the control issues the sequence of micro-operations needed for the hardware implementation of that operation. An operation code is sometimes called a macro-operation because it specifies a set of micro-operations. An instruction code also specifies the registers or the memory words for operands and results:
- Memory words are specified by their address.
- Registers are specified by a binary code of k bits that selects one of 2^k possible registers.
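The relation between field width and the number of codes (n bits for up to 2^n operations, k bits for up to 2^k registers) can be checked with a small sketch. The function name `min_field_bits` is a hypothetical helper introduced here for illustration only.

```python
def min_field_bits(count: int) -> int:
    """Return the smallest n such that 2**n >= count.

    `count` is the number of distinct operations (or registers)
    that the field must be able to distinguish.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() is the number of bits needed to
    # represent the largest code, count - 1.
    return (count - 1).bit_length()


# 64 distinct operations fit in a 6-bit operation code,
# while 25 operations already need 5 bits.
print(min_field_bits(64), min_field_bits(25))
```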
Continued
A simple computer organization
One processor register (the accumulator, AC) and an instruction code format with two parts:
- Operation code
- An address: tells the control where to find an operand in memory. This operand is processed together with the data in the register.
As shown in the next figure, the control reads a 16-bit instruction from program memory. It uses the 12-bit address part of the instruction to read a 16-bit operand from data memory. It then executes the operation specified by the operation code. If an operation does not need an operand from memory, the address bits can be used for other purposes, e.g. to specify operations such as clear AC or complement AC (no address needed).
Continued
When the second part of an instruction code specifies an operand (not an address), the instruction is said to have an immediate operand.
When the second part specifies the address of an operand, the instruction is said to have a direct address.
Indirect address: the second part specifies a memory location where the address of the operand is found. Because the operand address is then taken from a full memory word, indirect addressing provides more bits for specifying operand addresses and so increases the addressable memory size.
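The three interpretations of the second part of an instruction can be sketched as follows. This is only an illustration: memory is modelled as a Python dictionary, and the function name and mode strings are assumptions, not part of the slides.

```python
def fetch_operand(memory: dict, second_part: int, mode: str) -> int:
    """Return the operand selected by the second part of an instruction."""
    if mode == "immediate":
        # The second part is the operand itself.
        return second_part
    if mode == "direct":
        # The second part is the address of the operand.
        return memory[second_part]
    if mode == "indirect":
        # The second part is the address of a word that holds
        # the address of the operand.
        return memory[memory[second_part]]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical memory contents used only for this example.
memory = {457: 35, 300: 1350, 1350: 77}
print(fetch_operand(memory, 457, "immediate"))
print(fetch_operand(memory, 457, "direct"))
print(fetch_operand(memory, 300, "indirect"))
```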
Computer Registers
Instructions are stored in consecutive memory locations and are executed sequentially one at a time. The control reads an instruction from a specific address in memory and executes it: after that next instruction is read and executed, and so on. Registers are needed for storing fetched instructions, and counters for computing the address of the next instruction.
Continued
The computer needs processor registers for manipulating data and for holding addresses (see next Fig. and Table). The program counter (PC) goes through a counting sequence and causes the computer to read sequential instructions from memory. Instructions are read and executed in sequence unless a branch instruction is encountered.
- A branch instruction calls for a transfer to a nonconsecutive instruction in the program.
- The address part of a branch instruction is loaded into PC and becomes the address of the next instruction.
- The next instruction is read from the location indicated by PC.
Continued
The basic computer has (see the figure on the previous slide):
- 8 registers
- 1 memory unit
- 1 control unit
- a common bus
The outputs of 7 registers and memory are connected to the common bus. The source connected to the bus lines is chosen by the selection lines S0, S1, and S2. The register that receives the bus data at the next clock pulse transition is selected by enabling its LD (load) input. Memory write and read are enabled with the write and read signals.
Continued
- INPR receives a character from an input device.
- OUTR receives a character from AC and delivers it to an output device.
- The bus receives data from 6 registers and the memory unit.
- 5 registers have three control inputs: LD (load), INR (increment), and CLR (clear). Each is equivalent to a binary counter with parallel load and synchronous clear.
- 2 registers have only a LD input.
- AR is used to specify the memory address, so no separate address bus is needed.
- The 16 inputs to AC come from an adder and logic circuit with three sets of inputs: the AC output, DR, and INPR.
Continued
The content of any register can be applied onto the bus, and an operation can be performed in the adder and logic circuit, during the same clock cycle. The clock transition at the end of the cycle transfers the content of the bus into the designated register and the output of the adder and logic circuit into AC. For example, the two microoperations
DR <- AC and AC <- DR
can be executed at the same time by placing AC onto the bus (S2S1S0 = 100), enabling the LD input of DR, transferring DR into AC through the adder and logic circuit, and enabling the LD input of AC, all during the same clock cycle. The two transfers occur upon the arrival of the clock pulse transition at the end of the clock cycle.
Computer Instructions
- The basic computer has three 16-bit instruction code formats (see next Fig.).
- The opcode contains 3 bits, and the meaning of the remaining 13 bits depends on the operation code encountered.
- A memory-reference instruction uses 12 bits to specify an address and one bit to specify the addressing mode I.
- Register-reference instructions are recognized by opcode 111 with 0 in bit 15.
- A register-reference instruction specifies an operation on, or a test of, the AC register. No memory operand is needed, so the remaining 12 bits specify the operation or test to be executed.
Continued
- An input-output instruction is recognized by opcode 111 with 1 in bit 15. The remaining 12 bits specify the type of input-output operation or test performed.
- Bits 12-15 are used to recognize the type of instruction.
- If bits 12-14 are not 111, the instruction is a memory-reference type and I (bit 15) is taken as the addressing mode.
- If bits 12-14 are 111, bit 15 is inspected for the type of instruction: 0 for a register-reference and 1 for an input-output instruction.
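The classification rule on bits 12-15 can be written as a short sketch. The function name `classify` and the returned tuple layout are assumptions made for this illustration; the bit positions follow the description above.

```python
def classify(instruction: int):
    """Split a 16-bit instruction word and name its type.

    Returns (kind, i_bit, opcode, low12), where low12 is the address
    for memory-reference instructions and the operation bits otherwise.
    """
    i_bit = (instruction >> 15) & 1       # bit 15
    opcode = (instruction >> 12) & 0b111  # bits 12-14
    low12 = instruction & 0x0FFF          # bits 0-11
    if opcode != 0b111:
        kind = "memory-reference"         # I is the addressing mode
    elif i_bit == 0:
        kind = "register-reference"
    else:
        kind = "input-output"
    return kind, i_bit, opcode, low12


# 0x9234: I = 1, opcode = 001, address = 0x234 (indirect memory reference).
print(classify(0x9234))
# 0xF800: opcode 111 with I = 1 and bit 11 set, an input-output instruction.
print(classify(0xF800))
```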
Continued
- Arithmetic, logical, and shift instructions provide computational capabilities for processing data.
- All computations are done in processor registers, so instructions for moving information between memory and registers are needed.
- Status checking (e.g. comparing the magnitudes of two numbers) and program control instructions (e.g. branch) are needed for altering the program flow.
- Input and output instructions are needed for human-computer interaction: programs must be transferred into memory, and the results of computations must be transferred to the user.
- The instructions in Table 5-2 constitute a minimum set.
Continued
- Addition and subtraction: ADD, CMA, INC.
- Shifts: CIR, CIL.
- Multiplication and division: programmed with addition, subtraction, and shift.
- Logic: AND, CMA, CLA => NAND => all logic operations with two variables.
- Moving information: LDA, STA.
- Branching and status checking: BUN, BSA, ISZ, and the skip operations.
- Input-output: INP, OUT.
Continued
The instruction set of the basic computer is complete, but not efficient. An efficient set of instructions includes separate instructions for frequently used operations so that they can be performed fast. Examples: OR, exclusive-OR, subtract, multiply, divide. In the basic computer, these operations must be programmed with the available instructions.
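As an illustration of how a missing operation can be programmed, the sketch below builds OR and subtract from 16-bit models of AND, CMA (complement), INC, and ADD. The Python function names are hypothetical stand-ins for those instructions; OR uses De Morgan's law and subtraction uses the 2's complement.

```python
MASK = 0xFFFF  # 16-bit word


def cma(x: int) -> int:
    """Complement (models CMA)."""
    return ~x & MASK


def and_(a: int, b: int) -> int:
    """Bitwise AND (models AND)."""
    return a & b


def inc(x: int) -> int:
    """Increment with 16-bit wraparound (models INC)."""
    return (x + 1) & MASK


def add(a: int, b: int) -> int:
    """16-bit addition, carry discarded (models ADD)."""
    return (a + b) & MASK


def or_(a: int, b: int) -> int:
    """A OR B = complement of (A' AND B'), by De Morgan's law."""
    return cma(and_(cma(a), cma(b)))


def sub(a: int, b: int) -> int:
    """A - B = A + (2's complement of B) = A + (B' + 1)."""
    return add(a, inc(cma(b)))


print(hex(or_(0x00F0, 0x0F00)))  # 0xff0
print(sub(5, 3))                 # 2
```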
Continued
There are two major types of control organization: hardwired control and microprogrammed control.
Hardwired organization (see next Fig.): the control logic is implemented with gates, flip-flops, decoders, and other digital circuits.
- Can be optimized to produce a fast mode of operation
- Requires changes in the wiring if the design has to be modified
A microprogram is a program consisting of microcode that controls the different parts of a computer's central processing unit (CPU). The memory in which it resides is called a control store.
Continued
The block diagram of the (hardwired) control unit is shown in the next figure (the control logic is derived later).
IR contains an instruction read from memory, divided into three parts: the I bit, the opcode, and bits 0-11.
- I is transferred to a flip-flop.
- A 4-bit sequence counter (SC) provides the sequence of 16 timing signals; it has synchronous clear and increment inputs.
When required, SC can be cleared (CLR signal enabled) by suitable control logic, e.g. (see Fig.):
D3T4: SC <- 0
Control outputs are a function of all incoming signals to the control logic gates. SC enables sequential control outputs.
Continued
Memory read and write are initiated by a rising clock edge. It is assumed that a memory access is completed in one clock cycle.
- This assumption is often not valid in real computers because the memory cycle is usually longer than the clock cycle, so wait cycles (states) must be provided until the memory word is available. There are no wait cycles in the basic computer introduced here.
The next rising edge loads the memory word into a register.
Continued
It is important to understand the timing relationship between the clock transition and the timing signals. For example, the register transfer statement
T0: AR <- PC
specifies a transfer of the content of PC into AR if the timing signal T0 is active. T0 is active during an entire clock cycle. During this time interval the content of PC is placed onto the bus and the LD input of AR is enabled. The actual transfer occurs at the end of the clock cycle, when the clock goes through a positive transition (which latches the inputs into the flip-flops). This same transition increments SC, so the next clock cycle has T1 active and T0 inactive.
Instruction Cycle
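A program consists of a sequence of instructions, and it resides in memory. Each instruction is executed in an instruction cycle made up of four phases:
1. Fetch an instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory if the instruction has an indirect address.
4. Execute the instruction.
After phase 4, the control jumps back to phase 1. This process continues until a HALT instruction is encountered.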
Initially the program counter PC is loaded with the address of the first instruction in the program. SC is cleared (i.e. timing signal T0 is active). SC is incremented after each clock pulse. The fetch and decode phases can be specified by the following register transfer statements:
T0: AR <- PC
T1: IR <- M[AR], PC <- PC + 1
T2: D0, ..., D7 <- decode IR(12-14), AR <- IR(0-11), I <- IR(15)
Continued
During T0:
1. Place the content of PC onto the bus (S2S1S0 = 010 => 2)
2. Transfer the content of the bus to AR (enable the LD input of AR)
The next clock transition initiates the transfer from PC to AR.
During T1:
1. Enable the read input of memory
2. Place the content of memory onto the bus (S2S1S0 = 111 => 7)
3. Transfer the content of the bus to IR (enable the LD input of IR)
4. Increment PC (enable the INR input of PC)
The next clock transition initiates the read and increment operations.
Continued
During T2:
1. The opcode is decoded by the 3 x 8 decoder
2. IR(0-11) is transferred to AR (address register)
3. IR(15) is latched into flip-flop I
Steps 2 and 3 occur at the end of the clock cycle.
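The three timing steps T0, T1, and T2 can be traced with a small register-level sketch. It is a simplification under stated assumptions: memory is a Python dictionary, each timing step is executed as one group of assignments, and the function name and returned dictionary are introduced here only for illustration.

```python
def fetch_and_decode(memory: dict, pc: int) -> dict:
    """Trace T0-T2 of the instruction cycle for the word at address pc."""
    # T0: AR <- PC
    ar = pc
    # T1: IR <- M[AR], PC <- PC + 1 (PC is 12 bits wide)
    ir = memory[ar]
    pc = (pc + 1) & 0x0FFF
    # T2: decode IR(12-14) with a 3 x 8 decoder: exactly one of D0-D7 is 1
    opcode = (ir >> 12) & 0b111
    d = [1 if k == opcode else 0 for k in range(8)]
    # T2: AR <- IR(0-11), I <- IR(15)
    ar = ir & 0x0FFF
    i = (ir >> 15) & 1
    return {"PC": pc, "AR": ar, "IR": ir, "I": i, "D": d}


# Hypothetical program word at address 0x100.
state = fetch_and_decode({0x100: 0x9234}, 0x100)
print(state)
```

After these steps PC already points at the next instruction, AR holds the address part, and the active decoder output (here D1) together with I selects the execution sequence.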
Register-Reference Instruction
Recognized by the control when D7 = 1 and I = 0. Uses bits 0-11 of the instruction code to specify one of 12 instructions. The 12 bits are available in IR(0-11) and they were transferred to AR during time T2. See Table for control functions and microoperations for the register-reference instructions.
Each control function shares the Boolean relation D7 I' T3 (denoted by r).
The particular control function is indicated by one of the bits in IR(0-11). The execution of a register-reference instruction is completed at time T3: the sequence counter is cleared to 0 and control goes back to fetch the next instruction with timing signal T0.
Memory-Reference Instructions
Table lists the seven memory-reference instructions: the execution of each instruction requires a sequence of microoperations because data is stored in memory and cannot be processed directly. The effective address resides in AR and was placed there during timing signal T2 when I = 0, and T3 when I = 1 (see Fig.).
Continued
AND to AC: bitwise AND of the bits in AC with the memory word specified by the effective address
D0T4: DR <- M[AR]
D0T5: AC <- AC AND DR, SC <- 0
Continued
ADD to AC: adds the content of the memory word specified by the effective address to the value of AC
D1T4: DR <- M[AR]
D1T5: AC <- AC + DR, E <- Cout, SC <- 0
Here D1 is the output of the operation decoder (equal to 1 for ADD), and E is the extended accumulator flip-flop, which receives the carry out.
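The effect of the ADD microoperation on AC and E can be sketched as a 16-bit addition whose carry out goes to E. The function name `add_to_ac` is hypothetical.

```python
def add_to_ac(ac: int, dr: int):
    """Model AC <- AC + DR, E <- Cout for 16-bit registers."""
    total = (ac & 0xFFFF) + (dr & 0xFFFF)
    new_ac = total & 0xFFFF   # low 16 bits go to AC
    e = (total >> 16) & 1     # carry out of bit 15 goes to E
    return new_ac, e


print(add_to_ac(0x1234, 0x0001))  # (4661, 0): no carry
print(add_to_ac(0xFFFF, 0x0001))  # (0, 1): carry into E
```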
Continued
LDA: loads the memory word at the specified effective address into AC
D2T4: DR <- M[AR]
D2T5: AC <- DR, SC <- 0
See Fig. on slide 26: there is no direct path from the bus to AC, so the memory word is first read into DR, whose content is then transferred into AC.
Continued
STA: stores the content of AC into the memory word specified by the effective address
D3T4: M[AR] <- AC, SC <- 0
Continued
BUN: Branch unconditionally. Transfers the program to the instruction specified by the effective address. The next instruction is fetched and executed from the memory address given by the new value in PC.
D4T4: PC <- AR, SC <- 0
BSA: Branch and save return address. This instruction is useful for branching to a portion of a program called a subroutine or procedure.
M[AR] <- PC, PC <- AR + 1
The value of PC stored in M[AR] is the address of the next instruction in sequence (the return address).
The BSA instruction performs the function usually referred to as a subroutine call. The indirect BUN instruction at the end of the subroutine performs the function referred to as a subroutine return. In most commercial computers, the return address associated with the subroutine is stored either in a processor register or in a portion of memory called a stack.
A stack is a data structure that works on the principle of Last In First Out (LIFO). This means that the last item put on the stack is the first item that can be taken off, like a physical stack of plates. A stack-based computer system is one that is based on the use of stacks, rather than being register based.
BSA instruction
Continued
The BSA instruction must be executed with a sequence of two microoperations:
D5T4: M[AR] <- PC, AR <- AR + 1
D5T5: PC <- AR, SC <- 0
Timing signal T4 initiates a memory write operation, places the content of PC onto the bus, and enables the INR input of AR.
The memory write operation is completed and AR is incremented by the time the next clock transition occurs. The bus is used at T5 to transfer the content of AR to PC.
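The two BSA microoperations, and the return through an indirect BUN, can be traced with the following sketch. The addresses 20 and 135 are example values chosen here, memory is modelled as a dictionary, and the function names are hypothetical.

```python
def bsa(memory: dict, pc: int, ar: int) -> int:
    """Model BSA; returns the new PC (first instruction of the subroutine)."""
    # D5T4: M[AR] <- PC, AR <- AR + 1
    memory[ar] = pc
    ar = (ar + 1) & 0x0FFF
    # D5T5: PC <- AR
    return ar


def bun_indirect(memory: dict, address: int) -> int:
    """Model an indirect BUN: the effective address is read from memory."""
    return memory[address] & 0x0FFF


memory = {}
# A BSA at address 20 with address part 135: PC is already 21 after the fetch.
pc = bsa(memory, pc=21, ar=135)
print(pc, memory[135])            # subroutine starts at 136, return address 21
pc = bun_indirect(memory, 135)    # subroutine return
print(pc)
```

Because the return address is kept in the first word of the subroutine, a second call to the same subroutine before it returns would overwrite that word.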
ISZ instruction
ISZ: Increment the word specified by the effective address, and if the incremented value is equal to 0, increment PC by 1
D6T4: DR <- M[AR]
D6T5: DR <- DR + 1
D6T6: M[AR] <- DR, if (DR = 0) then (PC <- PC + 1), SC <- 0
The programmer usually stores a negative number (in 2's complement) in the memory word. Repeated increments eventually bring the memory word to 0. At that time PC is incremented by one in order to skip the next instruction in the program, so ISZ can be used to create loops.
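A loop counter based on ISZ can be sketched as follows. The counter address 0x050, the starting value, and the function name are assumptions for this example; the counter starts at -3 in 16-bit 2's complement, so the skip happens on the third increment.

```python
def isz(memory: dict, ar: int, pc: int) -> int:
    """Model ISZ on M[AR]; returns the (possibly incremented) PC."""
    # D6T4: DR <- M[AR]; D6T5: DR <- DR + 1 (16-bit wraparound)
    dr = (memory[ar] + 1) & 0xFFFF
    # D6T6: M[AR] <- DR, if (DR = 0) then PC <- PC + 1
    memory[ar] = dr
    if dr == 0:
        pc = (pc + 1) & 0x0FFF
    return pc


memory = {0x050: 0xFFFD}   # -3 in 16-bit 2's complement
pc = 0x010
passes = 0
while True:
    passes += 1
    new_pc = isz(memory, 0x050, pc)
    if new_pc != pc:       # the skip ends the loop
        break
print(passes)              # 3
```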
Continued
Flow chart showing microoperations for the seven memory-reference instructions is shown in Fig in the next slide.
I/O Instructions
[Figure: input-output configuration. The keyboard sends characters through a transmitter interface into INPR (with flag FGI); AC sends characters to OUTR (with flag FGO), which feeds a receiver interface connected to the monitor.]
Input-output instructions (p is the control function common to all of them):
Symbol  Control  Microoperation                         Description
        p:       SC <- 0                                Clear SC
INP     pB11:    AC(0-7) <- INPR, FGI <- 0              Input character
OUT     pB10:    OUTR <- AC(0-7), FGO <- 0              Output character
SKI     pB9:     If (FGI = 1) then PC <- PC + 1         Skip on input flag
SKO     pB8:     If (FGO = 1) then PC <- PC + 1         Skip on output flag
ION     pB7:     IEN <- 1                               Interrupt enable on
IOF     pB6:     IEN <- 0                               Interrupt enable off
Interrupt Cycle
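[Figure: flowchart of the interrupt cycle. Flip-flop R selects the path: R = 0 leads to the instruction cycle and R = 1 leads to the interrupt cycle.]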
Stack Organization
A stack is a storage device that stores information in such a manner that the item stored last is the first item retrieved (LIFO: last-in, first-out). The stack is a memory unit with an address register called the stack pointer (SP), which always points at the top item in the stack. The two operations of a stack are the insertion (push) and deletion (pop) of items. The push operation increments SP and the pop operation decrements SP.
Operations on Stack
Push (performed if stack is not full i.e. if FULL = 0):
Operations on Stack
Pop (performed if stack is not empty i.e. if EMTY = 0):
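The push and pop conditions can be illustrated with a small sketch. The exact microoperations are not reproduced in these notes, so the details below are assumptions: a fixed-size stack whose SP wraps around, a push that increments SP before writing, a pop that reads before decrementing SP, and FULL and EMTY flags updated when SP returns to 0. The class name and default size are hypothetical.

```python
class RegisterStack:
    """Fixed-size stack with a wrapping SP and FULL / EMTY flags (a sketch)."""

    def __init__(self, size: int = 64):
        self.size = size
        self.mem = [0] * size
        self.sp = 0
        self.full = 0
        self.emty = 1

    def push(self, value: int) -> None:
        if self.full:                        # push only if FULL = 0
            raise OverflowError("stack is full")
        self.sp = (self.sp + 1) % self.size  # SP <- SP + 1
        self.mem[self.sp] = value            # M[SP] <- value
        if self.sp == 0:                     # SP wrapped: stack is now full
            self.full = 1
        self.emty = 0

    def pop(self) -> int:
        if self.emty:                        # pop only if EMTY = 0
            raise IndexError("stack is empty")
        value = self.mem[self.sp]            # value <- M[SP]
        self.sp = (self.sp - 1) % self.size  # SP <- SP - 1
        if self.sp == 0:                     # back at the start: stack is empty
            self.emty = 1
        self.full = 0
        return value


s = RegisterStack(size=4)
for item in (10, 20, 30, 40):
    s.push(item)
print(s.full)                        # 1
print([s.pop() for _ in range(4)])   # [40, 30, 20, 10]
print(s.emty)                        # 1
```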
there is no straightforward way to determine the next operation to be performed.
Continued
Arithmetic expressions can be written in prefix notation (also called Polish notation, after the Polish mathematician Lukasiewicz), in which operators are placed before the operands. Postfix notation (reverse Polish notation, RPN) places the operator after the operands. E.g.:
A + B   infix notation
+ A B   prefix notation
A B +   postfix notation (RPN)
Continued
Reverse Polish notation is well suited to stack manipulation. E.g. the expression A * B + C * D is written in RPN as A B * C D * + and is evaluated by scanning from left to right: when an operator is found, the operation is performed on the two operands to the left of the operator. The operator and operands are then replaced by the result of the operation. The scan continues and the procedure is repeated for every operator:
Continued
1. * is found
2. Take the two operands from the left: A and B
3. Compute P = A * B
4. Replace operands and operator with the result => P C D * +
5. Continue scan
6. * is found
7. Take the two operands from the left: C and D
8. Compute Q = C * D
9. Replace operands and operator with the result => P Q +
10. Continue scan
11. + is found
12. Take the two operands from the left: P and Q
13. Compute R = P + Q
14. Replace operands and operator with the result: R
15. Continue scan: no more operators => stop; R is the result of the evaluation.
Continued
RPN, combined with a stack, is a very efficient method for evaluating arithmetic expressions. It is used e.g. in electronic calculators. A stack is useful for evaluating arithmetic expressions in RPN:
- Operands are pushed onto the stack in the order of appearance (in RPN).
- When an operator is found, the topmost operands are popped from the stack and used for the operation.
- The result is pushed to replace the popped operands.
Most compilers convert all arithmetic expressions into Polish notation: efficient translation of arithmetic expressions into machine language instructions.
Continued
E.g.: (3 * 4) + (5 * 6) => 3 4 * 5 6 * +
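The stack-based evaluation of an RPN expression can be sketched in a few lines. This is a minimal illustration, assuming space-separated tokens, integer operands, and only the operators +, -, and *; the function name `eval_rpn` is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_rpn(expression: str) -> int:
    """Evaluate a space-separated RPN expression with integer operands."""
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()   # topmost operand
            left = stack.pop()    # operand below it
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("3 4 * 5 6 * +"))  # 42
```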
Instruction Formats
The typical fields found in instruction formats are:
1. An operation code specifying the operation: add, subtract, complement, etc.
2. An address field designating a memory address or a register
3. A mode field specifying how the effective address of an operand is determined
The number of address fields in the instruction format of a computer depends on the internal organization of its registers.
Continued
E.g.: MIPS (a RISC microprocessor architecture developed by MIPS Computer Systems Inc.) uses three instruction formats:
- Register-register
- Register-immediate
- Jump/call
Three-Address Instruction
Computers may have instructions of several different lengths containing a varying number of addresses. E.g. three-address instructions: this instruction format can use each address field to specify either a processor register or a memory word. Program to evaluate X = (A+B)*(C+D):
ADD R1, A, B     R1 <- M[A] + M[B]
ADD R2, C, D     R2 <- M[C] + M[D]
MUL X, R1, R2    M[X] <- R1 * R2
Advantage of this format is that it results in short programs when evaluating arithmetic expressions. Disadvantage is that the binary-coded instructions require too many bits to specify three addresses.
Two-Address Instruction
E.g. two-address instructions: here also each address field can specify either a processor register or a memory word. Program for the previous example:
MOV R1, A     R1 <- M[A]
ADD R1, B     R1 <- R1 + M[B]
MOV R2, C     R2 <- M[C]
ADD R2, D     R2 <- R2 + M[D]
MUL R1, R2    R1 <- R1 * R2
MOV X, R1     M[X] <- R1
The first symbol listed in an instruction is assumed to be both a source and the destination where the result of the operation is transferred.
One-Address Instruction
One-address instructions use an implied accumulator (AC). Here we assume that AC contains the result of the last operation.
LOAD A     AC <- M[A]
ADD B      AC <- AC + M[B]
STORE T    M[T] <- AC
LOAD C     AC <- M[C]
ADD D      AC <- AC + M[D]
MUL T      AC <- AC * M[T]
STORE X    M[X] <- AC
Zero-Address Instruction
A stack-organized computer does not use an address field for the instructions ADD and MUL. The PUSH and POP instructions, however, need an address field to specify the operand that communicates with the stack. The name zero-address is given to this type of instruction because of the absence of an address field in the computational instructions.
PUSH A    TOS <- A
PUSH B    TOS <- B
ADD       TOS <- (A + B)
PUSH C    TOS <- C
PUSH D    TOS <- D
ADD       TOS <- (C + D)
MUL       TOS <- (C + D) * (A + B)
POP X     M[X] <- TOS
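The zero-address program for X = (A+B)*(C+D) can be run on a tiny interpreter. This is a sketch under simple assumptions: memory is a dictionary keyed by symbolic names, instructions are text strings, and only PUSH, POP, ADD, and MUL are supported. The function name is hypothetical.

```python
def run_stack_program(program, memory: dict) -> dict:
    """Execute zero-address instructions against a dictionary memory."""
    stack = []
    for line in program:
        op, *operand = line.split()
        if op == "PUSH":
            stack.append(memory[operand[0]])   # TOS <- M[operand]
        elif op == "POP":
            memory[operand[0]] = stack.pop()   # M[operand] <- TOS
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)                # no address field needed
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {line}")
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = run_stack_program(program, {"A": 1, "B": 2, "C": 3, "D": 4})
print(memory["X"])  # 21
```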
Addressing Modes
The addressing mode specifies a rule for interpreting or modifying the address field of the instruction before the operand is actually referenced. Addressing modes are used:
1. To provide programming versatility for the user: pointers to memory, counters for loop control, indexing of data, etc.
2. To reduce the number of bits in the address field of the instruction.
The decoding phase of an instruction cycle determines the addressing mode(s) and the locations (registers and/or memory locations) of operands. Depending on the CPU, an instruction can have more than one address field, and each address field may be associated with its own particular addressing mode.
Continued
Direct/Absolute mode: The operand is in either a register of the register file (RF) or a main memory (MM) location, whose address is explicitly given in the instruction. I.e., the effective address (EA) of the operand is given in the instruction.
Continued
Indirect mode: The EA of the operand is in the register, or MM location, whose address is given in the instruction.
Continued
Index mode: The EA of the operand is generated by adding a constant value (given in the instruction) to the content of a register (specified in the instruction). This is used to address the elements of an array. The starting address of the array is the constant and the index is contained in the register. Different elements can be addressed in this mode by changing the index.
Continued
Relative mode: Similar to index mode except the register is the PC. This is used to address an operand in an MM location whose address is specified relative to the current instruction. The EA is obtained by adding a constant (offset, or the displacement from current position to the location of the operand, can be negative) and the content of PC. The constant is either explicitly given in the instruction by the assembly programmer, or calculated by the assembler based on the knowledge of the MM locations of the program and the desired operand.
Continued
Base register addressing mode: The content of a base register is added to the address part of the instruction to obtain the effective address. The address part of the instruction gives the displacement relative to the base address.
EA = address part of instruction + content of CPU register
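The index, relative, and base register modes all form the effective address by adding the address part of the instruction to a register; they differ only in which register is used. A minimal sketch follows, in which the function name, the mode strings, and the register dictionary are assumptions, and PC is taken to hold the value it has when the address is computed.

```python
def effective_address(mode: str, address_part: int, registers: dict) -> int:
    """EA = address part of the instruction + content of the selected register."""
    if mode == "index":
        return address_part + registers["XR"]   # constant + index register
    if mode == "relative":
        return address_part + registers["PC"]   # offset + program counter
    if mode == "base":
        return address_part + registers["BR"]   # displacement + base register
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical register contents used only for this example.
regs = {"XR": 100, "PC": 202, "BR": 1000}
print(effective_address("index", 500, regs))     # 600
print(effective_address("relative", -2, regs))   # 200 (negative offset)
print(effective_address("base", 500, regs))      # 1500
```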
Program Control
Program flow can be altered by instructions that modify the value of the program counter. This important feature of a digital computer provides control over the program flow and the capability of branching to different program segments. Typical program control instructions include branch, jump, compare, and test.
Continued
Branch and jump instructions may be conditional or unconditional
Continued
Compare and test instructions can be used to set conditions for subsequent conditional branch instructions.
Compare performs an arithmetic subtraction: the result is not saved, and only the status bit conditions are set as a result of the operation. Similarly, test performs a logical AND of two operands and updates certain status bits.
Continued
The status register stores the values of the status bits (the status register is composed of the status bits). Bits of the status register are modified as a result of an operation performed in the ALU. Status bits are also known as condition-code bits or flag bits.
Continued
E.g. (8-bit ALU with a 4-bit status register):
Continued
Status bits can be checked after an ALU operation to determine certain relationships that exist between the values of A and B.
- V indicates overflow, i.e. for an 8-bit ALU the result is greater than 127 or less than -128.
- If Z is set, the result is zero. For example, the XOR operation can be used to compare two numbers (the result is zero iff A = B), and Z then indicates the result of the comparison.
- A single bit in A can be checked by ANDing A with a mask that contains 1 in that particular bit position (the others being 0s).
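The way status bits follow from an 8-bit addition can be sketched as below. The figure with the status register is not reproduced in these notes, so the choice of the four bits C (carry), S (sign), Z (zero), and V (overflow) is an assumption, as is the function name. V is computed as the exclusive-OR of the carry into and the carry out of the sign bit.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and return (result, status bits)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    c = (total >> 8) & 1                      # carry out of bit 7
    s = (result >> 7) & 1                     # sign bit of the result
    z = 1 if result == 0 else 0               # result is zero
    carry_into_sign = (((a & 0x7F) + (b & 0x7F)) >> 7) & 1
    v = carry_into_sign ^ c                   # signed overflow
    return result, {"C": c, "S": s, "Z": z, "V": v}


print(add8_flags(0x7F, 0x01))  # 127 + 1 overflows: V = 1, S = 1
print(add8_flags(0xFF, 0x01))  # -1 + 1 = 0: Z = 1, C = 1, V = 0

# Checking a single bit with a mask: bit 3 of 0b01001000 is set.
print((0b01001000 & 0b00001000) != 0)  # True
```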
Continued
Conditional branch instructions use the status bits for checking conditions for branching:
Subroutine Call
For subroutine calls, different computers use different temporary locations for storing the return address:
- Some computers use the first memory location of the subroutine (like the basic computer).
- Some store the return address in a fixed memory location.
- Some computers use a processor register.
- Stack memory is yet another possibility (the most efficient way): when a succession of subroutines is called (nested calls), the sequential return addresses can be pushed onto the stack. The return-from-subroutine instruction pops the return address from the top of the stack and assigns it to the program counter, so the return address of the last called subroutine is always available.
Continued
Subroutine call (stack based) microoperations:
SP <- SP - 1               Decrement stack pointer
M[SP] <- PC                Push content of PC onto the stack
PC <- effective address    Transfer control to the subroutine
... and return:
PC <- M[SP]                Pop stack and transfer to PC
SP <- SP + 1               Increment stack pointer
By using a subroutine stack, each return address (in nested calls) can be pushed onto the stack without destroying any previous values.
In the basic computer, by contrast, a recursive subroutine call would destroy the previous return address stored in the first memory location of the subroutine.
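The stack-based call and return microoperations can be traced for nested calls with a short sketch. The addresses and the initial stack pointer are example values, memory is modelled as a dictionary, and the function names are hypothetical.

```python
def call(memory: dict, sp: int, pc: int, target: int):
    """Subroutine call; returns the new (SP, PC)."""
    sp -= 1              # SP <- SP - 1
    memory[sp] = pc      # M[SP] <- PC (return address)
    return sp, target    # PC <- effective address


def ret(memory: dict, sp: int):
    """Subroutine return; returns the new (SP, PC)."""
    pc = memory[sp]      # PC <- M[SP]
    return sp + 1, pc    # SP <- SP + 1


memory = {}
sp = 0x100
sp, pc = call(memory, sp, pc=21, target=135)    # main program calls A
sp, pc = call(memory, sp, pc=140, target=200)   # A calls B (nested)
sp, pc = ret(memory, sp)
print(pc)            # 140: back in A
sp, pc = ret(memory, sp)
print(pc, hex(sp))   # 21 0x100: back in the main program
```

Each nested call gets its own stack word, so the earlier return address survives until its own return is executed.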
Program Interrupt
Program interrupt refers to the transfer of program control from a currently running program to another service program as a result of an externally or internally generated request.
It is otherwise similar to a subroutine call, except that:
1. The interrupt is (usually) initiated by an internal or external signal rather than by the execution of an instruction (software interrupts are exceptions).
2. The address of the interrupt service program (routine) is determined by hardware rather than by the address field of an instruction: the CPU must possess some form of hardware procedure for selecting a branch address for servicing the interrupt.
3. The interrupt routine stores all the information (not just PC) necessary to recover the state of the CPU before the return from the interrupt routine.
Continued
After the interrupt routine, the CPU must return to exactly the same state that it was in when the interrupt occurred. The state of the CPU at the end of the execute cycle (the interrupt is recognized in this phase) is determined from:
1. The content of PC
2. The content of all processor registers
3. The content of the status conditions: the status bits (program status word, PSW) are stored in a separate status register. The PSW contains status information about the state of the CPU: for example, bits from ALU operations, interrupt enable bits, and the CPU operation mode (system mode, user mode).
Continued
Some computers store only the program counter (and PSW) before entering an interrupt routine.
- The interrupt routine must then take care of storing and restoring the rest of the CPU status.
The CPU does not respond to an interrupt until the end of an instruction execution.
- If an interrupt is pending, control goes to an interrupt cycle.
- The contents of PC and PSW are pushed onto the stack.
- The branch address is transferred to PC and a new PSW is loaded into the status register.
- The interrupt routine can now be executed starting from the branch address (which may contain a branch instruction to a user-defined service routine).
- The last instruction of the interrupt routine is a return from interrupt: the stack is popped to retrieve the PSW into the status register and the return address into PC => the CPU state is restored and the interrupted program can proceed as if nothing had happened.
Interrupt Types
1. External interrupts
- Come from I/O, timing, or any other external source.
- E.g.: an I/O device requesting new data, the elapsed time of an event, power failure, etc.
3. Software interrupts
- Initiated by an instruction (rather than by hardware signals): a special call instruction that behaves like an interrupt.
- Can be used by a programmer to initiate an interrupt routine at any desired point in the program.
- Can be used for accessing operating system services, for example.
Control Unit
The control unit is a part of the CPU. Its purpose is to issue control signals that provide control inputs for the multiplexers in the common bus, control inputs for the processor registers, and microoperations for the accumulator. There are two major types of control organization: hardwired control and microprogrammed control.
Continued
The control function that specifies a microoperation is a binary variable. When it is in the active state, the corresponding microoperation is executed. A control variable in the opposite binary state does not change the state of the registers in the system. The control variables at any given time can be represented by a string of 1s and 0s called the control word. In a bus-organized system, the control signals that specify microoperations are groups of bits that select the paths in multiplexers, decoders, and the ALU.
Continued
Microprogram: composed of a sequence of microinstructions corresponding to the sequence of steps in the execution of a given machine instruction. Since alterations of the microprogram are not needed once the control unit is in operation, the control memory can be a read-only memory (ROM).
Dynamic Microprogramming
Control Memory
A computer that employs a microprogrammed control unit will have two separate memories: a main memory and a control memory. The content of main memory may alter when the data are manipulated and every time that the program is changed. The control memory holds a fixed microprogram that cannot be altered by the occasional user.
Continued
Sequencer or next-address generator: used to generate the address of the next microinstruction to be retrieved from the control memory.
Control memory (CM): usually a ROM; holds the control words that make up the microprogram for the microprogrammed control unit (MCU).
Sequencing
Each machine instruction is executed through the application of a sequence of microinstructions. Clearly, we must be able to sequence these; the collection of microinstructions that implements a particular machine instruction is called a routine. The MCU typically determines the address of the first microinstruction that implements a machine instruction from that instruction's opcode. Upon machine power-up, the control address register (CAR) should contain the address of the first microinstruction to be executed. The MCU must be able to execute microinstructions sequentially (e.g., within routines), but must also be able to "branch" to other microinstructions as required; hence the need for a sequencer. The microinstructions executed in sequence can be found sequentially in the CM, or can be found by branching to another location within the CM. Sequential retrieval of microinstructions can be done by simply incrementing the current CAR contents; branching requires determining the address of the desired control word (CW) and loading it into the CAR.
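The two ways of obtaining the next microinstruction address (incrementing the CAR or loading a branch address into it) can be sketched with a toy control memory. Everything concrete here is an assumption for illustration: the control memory layout, the use of address 0 as the start of the fetch routine, the text strings standing in for control words, and the function name.

```python
FETCH = 0  # assumed address of the fetch routine in the control memory


def run_routine(control_memory: dict, routine_start: int, max_steps: int = 100):
    """Issue the control words of one routine until control returns to FETCH."""
    car = routine_start
    issued = []
    for _ in range(max_steps):
        control_word, branch_address = control_memory[car]
        issued.append(control_word)
        if branch_address is None:
            car += 1                 # sequential: increment the CAR
        else:
            car = branch_address     # branch: load the address into the CAR
        if car == FETCH:
            return issued
    raise RuntimeError("routine did not return to the fetch routine")


# Each entry: address -> (control word, branch address or None).
control_memory = {
    4: ("DR <- M[AR]", None),        # first microinstruction of an ADD routine
    5: ("AC <- AC + DR", FETCH),     # last one branches back to fetch
}
opcode_to_routine = {0b001: 4}       # mapping from opcode to routine start
print(run_routine(control_memory, opcode_to_routine[0b001]))
```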
Continued
Microprogrammed:
Design is simpler: the problem of timing each instruction is broken down.
The microinstruction cycle handles timing in a simple and systematic way.
Easier to modify.
Slower than hardwired control.
In microprogrammed control, any required changes or modifications can be made by updating the microprogram in control memory. Once the hardware configuration is established, there should be no need for further hardware or wiring changes.
RISC/CISC
The instruction set determines the way that machine language programs are constructed. Early computers had small and simple instruction sets in order to minimize the (expensive) hardware needed for their implementation. Today many computers have instruction sets that include:
100 to 200 instructions
a variety of data types
a large number of addressing modes
RISC/CISC
A complex instruction set computer (CISC) has complex hardware and a large instruction set: functions are moved from software to hardware. In contrast, a reduced instruction set computer (RISC) uses fewer and simpler instructions, which can be executed faster within the CPU. RISC chips require fewer transistors than CISC chips, which makes them cheaper to design and produce. There is still considerable controversy among experts about the ultimate value of RISC architectures. Proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future. Skeptics note that by making the hardware simpler, RISC architectures put a greater burden on the software. They argue that this is not worth the trouble because conventional microprocessors are becoming increasingly fast and cheap anyway.
CISC
One reason for the trend to provide a complex instruction set is to simplify the translation from high-level to machine language programs.
Variable-length instruction formats: register operands need fewer bits, whereas memory operands need more bits.
Memory manipulation
2. Use of overlapped register windows to speed up procedure call and return.
3. Efficient instruction pipeline.
4. Compiler support for efficient translation of high-level language programs into machine language programs.
A characteristic of some RISC processors is their use of overlapped register windows to pass parameters and avoid the need for saving and restoring register values; this speeds up procedure calls and returns.
Input/Output Devices
When using a computer the text of programs, commands to the computer and data for processing have to be entered. Also information has to be returned from the computer to the user. This interaction requires the use of input and output devices. The most common input devices used by the computer are the keyboard and the mouse. The keyboard allows the entry of textual information while the mouse allows the selection of a point on the screen by moving a screen cursor to the point and pressing a mouse button. Using the mouse in this way allows the selection from menus on the screen etc. and is the basic method of communicating with many current computing systems. Alternative devices to the mouse are tracker balls, light pens and touch sensitive screens. The most common output device is a monitor which is usually a Cathode Ray Tube device which can display text and graphics. If hardcopy output is required then some form of printer is used.
Keyboard
A computer keyboard is a peripheral modeled after the typewriter keyboard. Keyboards are designed for the input of text and characters, and also to control the operation of the computer. Physically, computer keyboards are an arrangement of rectangular or near-rectangular buttons, or "keys". Keyboards typically have characters engraved or printed on the keys; in most cases, each press of a key corresponds to a single written symbol. However, to produce some symbols requires pressing and holding several keys simultaneously, or in sequence; other keys do not produce any symbol, but instead affect the operation of the computer, or the keyboard itself.
Mouse
The mouse is a device that allows the user to control the movement of the insertion point on the screen. The operator places the palm of the hand over the mouse and moves it across a mouse pad, which provides traction for the rolling ball inside the device. Movement of the ball determines the location of the I-beam on the computer screen. When the operator clicks the mouse, the I-beam becomes an insertion point, which indicates the area of the screen being worked on. The operator can also click the mouse to activate icons, or drag to move objects and select text.
Monitor
The front of a monitor is called the screen; in a cathode ray tube (CRT) monitor, the tube is attached behind it. The CRT contains an electron gun that sends an electron beam to a phosphorescent screen at the front of the tube. To produce a pattern on the screen, a grid inside the CRT receives a variable voltage that causes the beam to hit the screen and make it glow at selected spots.
Printer
Printers provide a permanent record on paper of computer output data or text. There are three basic types of printers:
Daisywheel: contains a wheel with the characters placed along the circumference. To print a character, the wheel rotates to the proper position and an energized magnet then presses the letter against the ribbon.
Dot matrix: contains a set of dots along the printing mechanism. Each dot can be printed or not, depending on the specific characters that are printed on the line.
Laser printer: uses a rotating photographic drum to imprint the character images. The pattern is then transferred onto the paper in the same manner as in a copying machine.
Magnetic tape
Magnetic tapes are used mostly for storing files of data. Access is sequential: a file consists of records that can be accessed one after another as the tape moves past a stationary read-write mechanism. Tape is one of the cheapest and slowest storage methods, and has the advantage that tapes can be removed when not in use.
Magnetic disks
Magnetic disks have high-speed rotating surfaces coated with magnetic material. Access is achieved by moving a read-write mechanism to a track on the magnetized surface. Disks are used mainly for bulk storage of programs and data.
Input-Output Interface
There are three ways that computer buses can be used to communicate with memory and I/O:
1. Use two separate buses, one for memory and the other for I/O.
2. Use one common bus for both memory and I/O, but have separate control lines for each: isolated I/O configuration.
3. Use one common bus for memory and I/O with common control lines: memory-mapped I/O.
Continued
Some computers use one common bus to transfer information between memory or I/O devices and the CPU
Memory-mapped I/O
One set of read and write signals is used for both I/O and memory.
The hardware makes no distinction between a memory access and an I/O access, so memory and I/O devices share the available address space.
There are no distinct I/O instructions: the same instructions are used to manipulate memory words and I/O data.
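A minimal Python sketch of the shared address space follows. The class name MemoryMappedBus and the split point io_base are assumptions made for illustration; a real system fixes the address map in hardware.

```python
class MemoryMappedBus:
    """Toy bus: addresses below io_base select RAM, the rest select I/O registers."""

    def __init__(self, io_base):
        self.io_base = io_base          # hypothetical boundary between RAM and I/O addresses
        self.ram = [0] * io_base        # RAM occupies addresses 0 .. io_base - 1
        self.io_registers = {}          # I/O interface registers, keyed by address

    def write(self, address, value):
        # The same write operation serves memory and I/O; only the address differs.
        if address >= self.io_base:
            self.io_registers[address] = value
        else:
            self.ram[address] = value

    def read(self, address):
        # The same read operation serves memory and I/O as well.
        if address >= self.io_base:
            return self.io_registers.get(address, 0)
        return self.ram[address]


bus = MemoryMappedBus(io_base=0xF0)
bus.write(0x10, 7)        # ordinary memory word
bus.write(0xF1, 0x41)     # I/O register, reached with the same kind of instruction
print(bus.read(0x10), bus.read(0xF1))
```

There is no separate I/O instruction in the sketch: the address alone decides whether RAM or a device register responds.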
Continued
The interface registers communicate with the CPU through the bidirectional data bus. The address bus is used to select the interface unit and the register. External circuitry must be provided to generate the chip select (CS) signal (e.g. from the address). The register select inputs are usually connected to the two least significant lines of the address bus. The content of the selected register is transferred to the CPU when the I/O read signal is enabled. The CPU transfers data to a selected register when I/O write is enabled.
Continued
One way is to use a strobe signal:
one of the units indicates to the other unit when the transfer occurs.
Continued
Another way is to use handshaking:
each data item is acknowledged by the receiving unit, so the sender knows whether the data has been successfully received or not.
Continued
Destination-initiated transfer using handshaking
Continued
Handshaking allows arbitrary delays from one state to the next and permits each unit to respond at its own data rate.
If one unit is faulty, the data transfer cannot be completed.
A timeout can be used to detect this kind of error. An internal timer measures the elapsed time: if the other unit does not respond within a given time period, the unit assumes that an error has occurred.
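The timeout idea can be sketched in Python as below. This is a simplified model under stated assumptions: time is counted in abstract ticks, the function name handshake_transfer and its parameters are hypothetical, and a faulty destination is modelled by response_delay=None.

```python
def handshake_transfer(data, response_delay, timeout):
    """Source-initiated handshake with a timeout.

    response_delay: ticks the destination needs before it accepts the data,
                    or None to model a faulty unit that never responds.
    timeout:        ticks the source is willing to wait.
    """
    # The source places the data on the bus and raises "data valid" at tick 0.
    for tick in range(timeout):            # internal timer of the source
        if response_delay is not None and tick >= response_delay:
            # The destination raised "data accepted"; the source can drop "data valid".
            return ("ok", data, tick)
    # No acknowledgement within the time period: assume an error has occurred.
    return ("timeout", None, timeout)


print(handshake_transfer(0x5A, response_delay=3, timeout=10))     # ('ok', 90, 3)
print(handshake_transfer(0x5A, response_delay=None, timeout=10))  # ('timeout', None, 10)
```

A slow destination (large response_delay) is handled exactly like a fast one as long as it answers before the timeout, which is the "arbitrary delay" property of handshaking.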
Continued
The transfer of data can be parallel or serial
parallel transfer is fast but requires many wires: used for short distances and when speed is important. serial transfer is slower but requires only one pair of conductors.
A serial asynchronous data transmission technique used in many interactive terminals employs special bits that are inserted at both ends of the character code
Each character consists of three parts:
1. a start bit
2. the character bits (data)
3. one or more stop bits
When the transmitter is idle, the data line remains in the high state (logic 1).
Continued
The first bit, called the start bit, is always a 0 and is used to indicate the beginning of a character (data). The last bit, called the stop bit, is always a 1. A transmitted character can be detected by the receiver from knowledge of the transmission rules:
1. When data is not being sent, the line is kept in the 1-state.
2. The initiation of data transmission is detected from the start bit, which is always 0.
3. The data bits always follow the start bit.
4. After the last data bit is transmitted, a stop bit is detected when the line returns to the 1-state for at least one bit time.
The line then remains in the 1-state until the next start bit. The receiver knows the transfer rate and the number of data bits, so it can examine the line at the proper times and receive valid bits.
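The transmission rules above can be tried out with a short Python sketch. The function names are hypothetical, and sending the least significant data bit first is an assumption (it is the usual convention, but the slides do not state the bit order).

```python
def frame_character(code, data_bits=8, stop_bits=2):
    """Build the bit sequence for one character: start bit, data bits, stop bits."""
    bits = [0]                                            # start bit is always 0
    bits += [(code >> i) & 1 for i in range(data_bits)]   # data bits, LSB first (assumption)
    bits += [1] * stop_bits                               # stop bits are always 1
    return bits


def receive_character(line, data_bits=8):
    """Recover one character from sampled line levels (one sample per bit time)."""
    start = line.index(0)                                 # idle line is 1; the first 0 is the start bit
    data = line[start + 1:start + 1 + data_bits]          # data bits follow the start bit
    if line[start + 1 + data_bits] != 1:                  # line must return to 1 for the stop bit
        raise ValueError("framing error: stop bit not found")
    return sum(bit << i for i, bit in enumerate(data))


frame = frame_character(0x41)          # character code 0x41 ('A' in ASCII)
print(frame)                           # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1]
print(receive_character([1, 1, 1] + frame))
```

The receiver only needs the agreed number of data bits and one sample per bit time, which mirrors the statement that it knows the transfer rate and the character length in advance.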
Continued
E.g.: asynchronous serial transmission (8 data bits, 2 stop bits):
Continued
Baud rate is defined as the rate at which serial information is transmitted; with one bit per signal element, as here, it is equal to the data transfer rate in bits per second. For example, assume 10 characters per second with 11 bits per character (1 start + 8 data + 2 stop): the baud rate is 10 * 11 = 110.
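The arithmetic can be written as a small Python helper. The function name and its default parameters are hypothetical; they simply encode the frame format of the example (1 start bit, 8 data bits, 2 stop bits).

```python
def baud_rate(chars_per_second, data_bits=8, stop_bits=2, start_bits=1):
    """Bits per second on the line, counting start, data and stop bits."""
    bits_per_character = start_bits + data_bits + stop_bits
    return chars_per_second * bits_per_character


print(baud_rate(10))        # 110
print(10 * 8)               # 80 of those bits per second carry actual data
```

Only 8 of every 11 transmitted bits are data, so the useful data rate in this example is lower than the baud rate.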
Modes of Transfer
Data transfer between the central computer and I/O devices may be handled in a variety of modes:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)
Programmed I/O
Each data item transfer is initiated by an instruction in the program. Data moves from the peripheral to the CPU and from the CPU to memory, or vice versa. Transferring data under program control requires constant monitoring of the peripheral by the CPU. Once a data transfer is initiated, the CPU is required to monitor the interface to see when a transfer can again be made.
Interrupt-initiated I/O
Programmed I/O is a time-consuming method, since it keeps the CPU in a loop until the I/O unit indicates that it is ready for data transfer. This can be avoided by using an interrupt facility and special commands that tell the interface to issue an interrupt request signal when data are available from the device. In the meantime, the CPU can go on processing other programs. The interface keeps monitoring the device; when it finds the device ready, it generates an interrupt signal. The CPU then stops its current task, branches to a service program to process the I/O transfer, and afterwards returns to the task it was performing.
DMA
The CPU and the DMA controller cannot use the system bus at the same time, so some way must be found to share the bus between them. One of two methods is normally used.
1. Burst mode
The DMA controller transfers blocks of data by halting the CPU and controlling the system bus for the duration of the transfer. Since the data does not pass through the CPU, the transfer runs as fast as the slowest link in the I/O module/bus/memory chain allows, but the CPU must still be halted while the transfer takes place.
2. Cycle stealing
The DMA controller transfers data one word at a time, by using the bus during a part of an instruction cycle when the CPU is not using it, or by pausing the CPU for a single clock cycle on each instruction. This may slow the CPU down slightly overall, but will still be very efficient.
DMA Controller
The DMA controller is used to transfer data between memory and an I/O device. It needs the usual circuits to communicate with the CPU and the I/O device. In addition, it needs an address register and an address bus buffer. The address register contains the address of the desired location in memory. The word count register holds the number of words to be transferred. The control register specifies the mode of transfer. The DMA controller communicates with the I/O device through the DMA request and DMA acknowledge lines, and with the CPU through the data bus and control lines. The RD (read) and WR (write) signals are bidirectional. When the BG (bus grant) signal is 0, the CPU can communicate with the DMA registers through the data bus. When BG is 1, the CPU has relinquished the buses, and the DMA controller can communicate directly with memory.
The connection between the DMA controller and other components in a computer system for DMA transfer is shown in figure.
DMA Transfer
The DMA request line is used to request a DMA transfer. The bus request (BR) signal is used by the DMA controller to request the CPU to relinquish control of the buses. The CPU activates the bus grant (BG) output to inform the external DMA controller that its buses are in a high-impedance state, so that they can be used in the DMA transfer. The address bus is used to address the DMA controller and the memory at a given location. The device select (DS) and register select (RS) lines are activated by addressing the DMA controller. The RD and WR lines are used to specify either a read (RD) or write (WR) operation on the given memory location. The DMA acknowledge line is set when the system is ready to initiate data transfer. The data bus is used to transfer data between the I/O device and memory. When the last word of data in the DMA transfer has been transferred, the DMA controller informs the CPU of the termination of the transfer by means of the interrupt line.
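The role of the address register and the word count register during a transfer can be sketched in Python. This is a simplified model under stated assumptions: the bus request and bus grant exchange is taken as already completed (BG = 1), memory is a plain list, and the function name dma_transfer is hypothetical.

```python
def dma_transfer(device_words, memory, start_address):
    """Copy words from an I/O device into memory without passing them through the CPU."""
    address_register = start_address       # next memory location to be written
    word_count = len(device_words)         # number of words still to be transferred
    source = iter(device_words)
    while word_count > 0:                  # the controller owns the buses for this loop (BG = 1)
        memory[address_register] = next(source)   # one word over the data bus, straight to memory
        address_register += 1              # point to the next memory location
        word_count -= 1                    # one word fewer to go
    # Word count reached zero: signal termination to the CPU via the interrupt line.
    return {"interrupt": True, "next_address": address_register}


memory = [0] * 8
result = dma_transfer([10, 20, 30], memory, start_address=2)
print(memory)    # [0, 0, 10, 20, 30, 0, 0, 0]
print(result)    # {'interrupt': True, 'next_address': 5}
```

The loop touches only the controller's own registers and memory, which is the point of DMA: the CPU is involved when the registers are set up and when the final interrupt arrives.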
Channel I/O
This is a system traditionally used on mainframe computers, but is becoming more common on smaller systems. It is an extension of the DMA concept, where the DMA controller becomes a full-scale computer system itself which handles all communication with the I/O modules.
Memory System
Memory is the internal storage area of the computer. The term memory identifies data storage that comes in the form of chips, and the word storage is used for memory that exists on tapes or disks. Moreover, the term memory is usually used as shorthand for physical memory, which refers to the actual chips capable of holding data. Some computers also use virtual memory, which extends physical memory onto a hard disk.
Memory Hierarchy
Hierarchy diagram (blocks shown): Magnetic tapes, Magnetic disks, I/O processor, Main memory, Cache memory, CPU
Continued
The memory hierarchy system consists of all storage devices employed in a computer system from the slow but high capacity auxiliary memory to a relatively faster main memory, to an even smaller and faster cache memory accessible to the high speed processing logic. At the bottom of the hierarchy are the relatively slow magnetic tapes used to store removable files. Next are the magnetic disks used as backup storage.
Continued
The main memory occupies a central position by being able to communicate directly with the CPU and with auxiliary memory devices through an I/O processor. When programs not residing in main memory are needed by the CPU, they are brought in from auxiliary memory. Programs not currently needed in main memory are transferred into auxiliary memory to provide space for currently used programs and data. Cache is a very high-speed memory. It increases the speed of processing by making current programs and data available to the CPU at a rapid rate. Cache is employed in the computer system to compensate for the speed differential between main memory access time and processor logic, which is usually very fast.
Memory Management
A program's machine language code must be in the computer's main memory in order to execute. Ensuring that at least the portion of code to be executed is in memory when a processor is assigned to a process is the job of the memory manager of the operating system. This task is complicated by two other aspects of modern computing systems:
The first is multiprogramming. The second aspect is the need to allow the programmer to use a range of program addresses which may be larger, perhaps significantly larger, than the range of memory locations actually available.
Multiprogramming
Multiprogramming means that several (at least two) processes can be active within the system during any particular time interval. These multiple active processes result from various jobs entering and leaving the system in an unpredictable manner. Pieces, or blocks, of memory are allocated to these processes when they enter the system, and are subsequently freed when the process leaves the system. Therefore, at any given moment, the computer's memory, viewed as a whole, consists of a set of blocks, some allocated to processes active at that moment, and others free and available to a new process which may, at any time, enter the system. In general, then, programs designed to execute in this multiprogramming environment must be compiled so that they can execute from any block of storage available at the time of the program's execution. Such programs are called relocatable programs, and the idea of placing them into any currently available block of storage is called relocation.
2nd Aspect
The second aspect of modern computing systems affecting memory management is the need to allow the programmer to use a range of program addresses which may be larger, perhaps significantly larger than the range of memory locations actually available. That is, we want to provide the programmer with a virtual memory, with characteristics (especially size) different from actual memory, and provide it in a way that is invisible to the programmer. This is accomplished by extending the actual memory with secondary memory such as disk. Providing an efficiently operating virtual memory is another task for the memory management facility.
Relocation
Relocation of currently active programs is called dynamic relocation. If a currently executing process could be relocated, both the computer's response time and resource utilization could be improved. Actual implementation of dynamic relocation is not trivial. The compiler cannot possibly assign the correct addresses, because a program must be compiled before it can be loaded and executed. Thus program relocation, especially dynamic relocation (the moving around of currently active processes), must be done by the operating system's memory management facility. Given the needs for multiprogramming and virtual memory, and having the mechanism of dynamic relocation, it is time to take a serious look at how one might design the actual memory manager.
Actual memory management
Creating and maintaining an environment which will sustain both multiprogramming and virtual memory consists basically of designing a memory management program which will facilitate the timely movement of blocks of program code into portions of main memory when they are about to be executed, and out of main memory to secondary memory (disk) when they are no longer needed. There are basically three approaches to this problem. In the first approach, called swapping, all of the code for a particular process is transferred into main storage prior to dispatching the processor to the process. When the process becomes blocked or its time slice is used up, the entire block of code is swapped out again to secondary storage, to be replaced by the block of code representing the next process to assume control of the processor, and so on. This approach, while reasonable when the size of main memory is limited, causes a substantial execution delay during the swapping itself. This overhead can sometimes be reduced by alternative approaches, which move parts of the code for processes rather than the code for entire processes.
Continued
The other two approaches are segmentation and paging. Both recognize the fact that only the portion of a process's code which is about to execute actually needs to be in main storage at any particular time. These approaches have two major advantages over swapping.
First, if just a part of a currently executing process needs to be in main memory at a given time, then parts of more processes can be in main store simultaneously, and thus a greater degree of multiprogramming can be achieved in the system. The term degree of multiprogramming refers to the number of processes currently active within the system.
Continued
The second advantage of segmentation or paging is that the capability to move just parts of programs allows one part of a program to be loaded into memory and executed, and then be replaced by another part. A program requiring a very large amount of memory, perhaps more in total than the capacity of the computer's main store, can therefore still be executed. This is, in effect, an implementation of virtual memory as described above.
Continued
Segmentation and paging differ from one another primarily in the way the code for a particular process is divided. In segmentation, the program code is divided into a number of variable-sized blocks corresponding to the logical structure of the program, such as procedures, functions and data segments. Paging, on the other hand, divides the program code into fixed-size blocks, called pages. The more logical subdivision of segmentation makes program linking easier, while the fixed blocks of paging, being interchangeable with one another, make memory management easier. In either case, since portions of a program's code are being moved around during the program's execution, something like a hardware relocation register will be needed to compute actual addresses in order to avoid an unacceptable slowdown in program execution times.
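What a relocation register does can be shown with a few lines of Python. This is a sketch only: the function name relocate is hypothetical, and the limit check is an added assumption (commonly paired with a base register) that the slides do not mention.

```python
def relocate(logical_address, base, limit):
    """Compute the actual address of a program-relative (logical) address.

    base:  contents of the relocation register, i.e. where the block currently starts
    limit: size of the block; the bounds check is an assumption, not from the slides
    """
    if not 0 <= logical_address < limit:
        raise ValueError("address outside the block")
    return base + logical_address          # done in hardware on every memory reference


# The same program address maps to different actual addresses as the block is moved.
print(relocate(100, base=4000, limit=1024))   # 4100
print(relocate(100, base=9000, limit=1024))   # 9100 after the block has been relocated
```

Because the program itself only ever uses the logical address 100, moving the block requires changing nothing but the register contents.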
Dynamic RAM
Dynamic RAMs (DRAM) are designed for high capacity, moderate speed, and low power consumption. Their memory cells are basically charge-storage capacitors with driver transistors. The presence or absence of charge in a capacitor is interpreted by the sense line of the RAM as 1 or 0. Because the charge in a capacitor tends to leak away, dynamic RAMs require periodic recharging to maintain the stored data. This periodic recharging is called refreshing.
Static RAM
Static RAMs (SRAM) are made of flip-flops and logic gates. Since a flip-flop is a bistable element, it can be used to store a binary value. Flip-flops operate fast and do not require refreshing. However, because each cell is more complex, the capacity of static RAM is low compared with dynamic RAM. For the same reason, its power consumption and cost per unit of storage are higher.
Types of ROM
1. Standard ROMs: Standard ROMs are programmed by the manufacturer. Users can only read the data or execute programs in the ROM. Usually, standard ROMs store standard programs intended for general use.
2. Programmable ROMs: Programmable ROMs (PROM) can be programmed permanently by the user or distributor using special equipment. They can only be programmed once, so users should verify the correctness of the contents before the data is written into the PROM.
4. Electrically Erasable Programmable ROMs: Electrically erasable programmable ROMs (EEPROM) are similar to the EPROM. Instead of erasing the entire chip, the user can erase a selected portion of the contents electrically in one operation. Again, the operations require special equipment.
Auxiliary Memory
Auxiliary memory devices provide backup storage. When information not residing in main memory is required by the CPU, it is brought in from auxiliary memory. Programs not currently needed in main memory are transferred to auxiliary memory to provide space for currently used programs and data. The most common auxiliary memory devices used in computer systems are magnetic disks and tapes. Other components used, but not as frequently, are magnetic drums, magnetic bubble memory, and optical disks.
Magnetic disks
A magnetic disk is a circular plate constructed of metal or plastic and coated with magnetizable material. Often both sides of the disk are used, and several disks may be stacked on one spindle with read/write heads available on each surface. Bits are stored on the magnetized surface in spots along concentric circles called tracks. The tracks are commonly divided into sections called sectors. In most systems, the minimum quantity of information which can be transferred is a sector.
Continued
In units using single read/write head for each disk surface, the track address bits are used by a mechanical assembly to move the head into the specified track position before reading or writing. In other disk systems, separate read/write heads are provided for each track in each surface.
Magnetic Disk
Continued
A disk system is addressed by address bits that specify the disk number, the disk surface, the sector number and the track within the sector. After the read/write heads are positioned in the specified track, the system has to wait until the rotating disk reaches the specified sector under the read/write head. Information transfer is very fast once the beginning of a sector has been reached.
Continued
Disks that are permanently attached to the unit assembly and cannot be removed by the occasional user are called hard disks. A disk drive with removable disks is called a floppy disk drive.
Magnetic Tape
It is a strip of plastic coated with a magnetic recording medium. Bits are recorded as magnetic spots on the tape along several tracks. Read/write heads are mounted one in each track so that data can be recorded and read as a sequence of characters. These units can be stopped, started to move forward or in reverse, or can be rewound. They cannot be started or stopped fast enough between individual characters. Information is recorded in blocks referred to as records. Gaps of unrecorded tape are inserted between records where the tape can be stopped. Each record on tape has an identification bit pattern at the beginning and end.
Associative memory
The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content addressable memory (CAM). When a word is written into a CAM, no address is given. The memory is capable of finding an empty, unused location to store the word. When a word is read from the CAM, the content of the word, or part of the word, is specified. The memory locates all words which match the specified content and marks them for reading.
Continued
Associative memories are more expensive than a RAM because each cell must have storage capability as well as logic circuits for matching its content with an external argument.
Hardware Organization
Block diagram of associative memory (blocks shown): Argument register (A); Key register (K); Associative memory array and logic, m words, n bits per word; Match register (M); Input, Read and Write lines
Continued
The block diagram of the associative memory consists of a memory array and logic for m words with n bits per word. The argument register A and key register K each have n bits, one for each bit of a word. The match register M has m bits, one for each memory word. Each word in memory is compared in parallel with the content of A. The words that match the bits of A set a corresponding bit in M. Reading is accomplished by a sequential access to memory for those words whose corresponding bits in M have been set. K provides a mask for choosing a particular field or key in the argument word. The entire argument is compared with each memory word if K contains all 1's. Otherwise, only those bits in the argument that have 1's in their corresponding positions of K are compared.
Continued
Example:
A        101 111100
K        111 000000
Word 1   100 111100   no match
Word 2   101 000001   match
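The masked comparison can be checked with a short Python sketch that uses the values of the example above. The function name match_register is hypothetical, and the loop over the words stands in for what the hardware does for all words in parallel.

```python
def match_register(argument, key, words):
    """Return the match bits M: 1 where a word equals the argument in every key-selected bit."""
    # (word XOR argument) has a 1 wherever the two differ; AND with the key keeps
    # only the differences in bit positions where K is 1.
    return [1 if (word ^ argument) & key == 0 else 0 for word in words]


A = 0b101111100          # argument register
K = 0b111000000          # key register: compare only the three leftmost bits
words = [0b100111100,    # word 1
         0b101000001]    # word 2

print(match_register(A, K, words))   # [0, 1]: word 1 does not match, word 2 matches
```

With K set to all 1's the same function compares the entire argument, in which case neither word of the example would match.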
Continued
Match logic: The match logic for each word can be derived from the comparison algorithm.
Read operation: The bits of the match register are scanned one at a time, and the matched words are read in sequence by applying a read signal to each word line whose corresponding bit Mi is 1.
Write operation: A special register, called the tag register, is used to keep track of which word locations are in use.
Cache memory
Locality of Reference
During the course of the execution of a program, memory references tend to cluster, e.g. in loops.
Cache
Small amount of fast memory.
Sits between normal main memory and CPU.
May be located on CPU chip or module.
Cache Design
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Speed
More cache is faster (up to a point).
Checking the cache for data takes time.
The cache should be small enough that the average cost per bit is about that of RAM alone.
The cache should be large enough that the average access time is about that of the cache alone.
Write Policy
Must not overwrite a cache block unless main memory is up to date.
Multiple CPUs may have individual caches.
I/O may address main memory directly.
Write through
All writes go to main memory as well as cache.
Multiple CPUs can monitor main memory traffic to keep their local caches up to date.
Generates lots of traffic.
Slows down writes.
Write back
Updates are initially made in cache only.
The update bit for a cache slot is set when an update occurs.
If a block is to be replaced, it is written to main memory only if its update bit is set.
Other caches get out of sync.
I/O must access main memory through the cache.
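The update-bit mechanism can be sketched in Python as follows. The class WriteBackCache is a hypothetical toy model: each cache slot holds a single word and slots are keyed directly by address, which ignores mapping and block size.

```python
class WriteBackCache:
    """Toy write-back cache: one word per slot, slots keyed by memory address."""

    def __init__(self, main_memory):
        self.main_memory = main_memory   # dict: address -> value
        self.slots = {}                  # cached copies: address -> value
        self.update_bit = {}             # address -> True if the slot was modified

    def write(self, address, value):
        self.slots[address] = value      # update is made in the cache only
        self.update_bit[address] = True  # remember that main memory is now stale

    def replace(self, address):
        # On replacement, write to main memory only if the update bit is set.
        if self.update_bit.get(address):
            self.main_memory[address] = self.slots[address]
        self.slots.pop(address, None)
        self.update_bit.pop(address, None)


memory = {0: 1}
cache = WriteBackCache(memory)
cache.write(0, 99)
print(memory[0])     # 1: main memory is out of date until the slot is replaced
cache.replace(0)
print(memory[0])     # 99
```

Between the write and the replacement, main memory holds a stale value, which is why other caches get out of sync and why I/O has to go through the cache.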
Virtual Memory
Virtual memory is a system by which the machine or operating system fools processes running on the machine into thinking that they have a lot more memory to work with than the capacity of RAM would indicate. It does this by keeping the most recently used items in RAM, keeping the less used items in the slower disk memory, and interchanging data between the two whenever a disk access is made. In this way, memory appears to programs to be a full 32-bit address space, when in fact the physical memory is probably only a fraction of that.
The physical memory is broken down into groups of equal size called blocks, which may range from 64 to 4096 words each. The term page refers to groups of address space of the same size. Portions of programs are moved from auxiliary memory to main memory in records equal to the size of a page.
Continued
Mapping from address space to memory space is facilitated if each virtual address is considered to be represented by two numbers: a page number and a line address within the page. In computers with 2^p words per page, p bits are used to specify the line address, and the remaining high-order bits of the virtual address specify the page number.
Address space: Page 0, Page 1, Page 2, Page 3, Page 4, Page 5, Page 6, Page 7
Memory space: Block 0, Block 1, Block 2, Block 3
Continued
In the example, a virtual address has 13 bits. Let each page contain 1K words. Since 1K = 2^10 = 1024 words, the high-order three bits of a virtual address specify one of the eight pages and the low-order 10 bits give the line address within the page. The only mapping required here is from a page number to a block number.
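The split of a 13-bit virtual address into page number and line address can be sketched in Python. The page table contents used here (page 5 held in block 1) are hypothetical, as are the function names; only the 3-bit/10-bit split comes from the example.

```python
LINE_BITS = 10                                   # 1K = 2^10 words per page

def split_virtual_address(virtual_address):
    """Return (page number, line address) for a 13-bit virtual address."""
    page = virtual_address >> LINE_BITS          # high-order 3 bits
    line = virtual_address & ((1 << LINE_BITS) - 1)   # low-order 10 bits
    return page, line

def translate(virtual_address, page_table):
    """Map a virtual address to a main memory address via a page-to-block table."""
    page, line = split_virtual_address(virtual_address)
    block = page_table.get(page)
    if block is None:                            # page is not in main memory
        raise LookupError("page fault")
    return (block << LINE_BITS) | line           # only the page number is replaced


page_table = {5: 1}                              # hypothetical: page 5 resides in block 1
va = 0b101_0101010011                            # page 101 (5), line 0101010011 (339)
print(split_virtual_address(va))                 # (5, 339)
print(translate(va, page_table))                 # 1363, i.e. block 1, line 339
```

The line address passes through unchanged; only the 3-bit page number is replaced by a 2-bit block number, which is why the mapping table needs just one entry per page.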
Continued
The organization of the memory mapping table in a paged system is shown in figure below:
Page Replacement
Page replacement is required on a page fault when no free block is available. Common algorithms are first-in first-out (FIFO) and least recently used (LRU).
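The two algorithms can be compared with a short Python sketch that counts page faults. The function names and the reference string are illustrative assumptions; the number of blocks (3) is also chosen only for the example.

```python
from collections import OrderedDict, deque

def fifo_faults(references, blocks):
    """Count page faults when the page that has been resident longest is replaced."""
    resident = deque()                       # oldest page at the left
    faults = 0
    for page in references:
        if page not in resident:
            faults += 1
            if len(resident) == blocks:
                resident.popleft()           # remove the first page that came in
            resident.append(page)
    return faults

def lru_faults(references, blocks):
    """Count page faults when the least recently used page is replaced."""
    resident = OrderedDict()                 # least recently used page first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)       # a reference makes the page most recently used
        else:
            faults += 1
            if len(resident) == blocks:
                resident.popitem(last=False) # remove the least recently used page
            resident[page] = True
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]  # illustrative reference string
print(fifo_faults(refs, 3))                  # 9
print(lru_faults(refs, 3))                   # 10
```

Neither algorithm wins on every reference string; for this particular string FIFO happens to cause fewer faults than LRU.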