## Chapter 3 Topics

3.1 Machine characteristics and performance
3.2 RISC vs. CISC
3.3 A CISC microprocessor: The Motorola MC68000
3.4 The SPARC: a RISC architecture

## Practical Aspects of Machine Cost-Effectiveness

- Cost for useful work is the fundamental issue
- Mounting, case, keyboard, etc. now dominate the cost of a system, outweighing the cost of the integrated circuits
- Upward compatibility preserves software investment
  - Binary compatibility
  - Source compatibility
  - Emulation compatibility
- Performance: a strong function of the application


## Performance Measures

- MIPS: Millions of Instructions Per Second
  - Same job may take more instructions on one machine than on another
- MFLOPS: Millions of Floating-Point Operations Per Second
  - Other instructions counted as overhead for the floating point
- Whetstones: Synthetic benchmark
  - A program made up to test specific performance features
- Dhrystones: Synthetic competitor for Whetstone
  - Made up to "correct" Whetstone's emphasis on floating point
- SPEC: Selection of "real" programs
  - Taken from the C/Unix world


## Quantitative Performance Measurement

Consider two auto routes: the old one, which allowed an average speed of 34 mph, and the new one, which permits 46 mph. What is the speedup of the new one over the old one?

Conventionally the speedup is calculated as follows:

$$
\text { Speedup }=\frac{\text { SpeedOnNewRoute }}{\text { SpeedOnOldRoute }}=\frac{S_{\text {new }}}{S_{\text {old }}}=\frac{46}{34}=1.35
$$

This is a speedup of 0.35, or $35 \%$.

Alternatively, the \% speedup can be calculated directly:

$$
\% \text { Speedup }=\frac{S_{\text {new }}-S_{\text {old }}}{S_{\text {old }}} \times 100=\frac{46-34}{34} \times 100=\frac{12}{34} \times 100=35 \%
$$

## Quantitative Performance Measurement (cont'd.)

Many measurements are in terms of the time, $T$, it takes to accomplish some task. Recall that time, $T$, is the reciprocal of speed: $S=1 / T$. If the improvement is measured by recording travel time rather than travel speed, the equation changes as follows:

$$
\text { Speedup }=\frac{S_{\text {new }}}{S_{\text {old }}}=\frac{\frac{1}{T_{\text {new }}}}{\frac{1}{T_{\text {old }}}}=\frac{T_{\text {old }}}{T_{\text {new }}}=\frac{96}{71}=1.35, \text { or } 35 \%
$$

Once again, the \% speedup can be calculated directly:

$$
\% \text { Speedup }=\frac{T_{\text {old }}-T_{\text {new }}}{T_{\text {new }}} \times 100=\frac{96-71}{71} \times 100=\frac{25}{71} \times 100=35 \%
$$
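The two forms of the speedup calculation can be checked numerically. The following is a minimal Python sketch using the route figures from the slides; the function names are illustrative and not part of the text.

```python
def speedup_from_speeds(s_old, s_new):
    """Speedup as the ratio of the new speed to the old speed."""
    return s_new / s_old


def speedup_from_times(t_old, t_new):
    """Same speedup expressed with times, since time is the reciprocal of speed."""
    return t_old / t_new


def percent_speedup(speedup):
    """Convert a speedup ratio such as 1.35 into a percentage such as 35."""
    return (speedup - 1.0) * 100.0


route = speedup_from_speeds(34, 46)   # speeds in mph
trip = speedup_from_times(96, 71)     # travel times from the slide

print(f"{route:.2f}  {percent_speedup(route):.0f}%")
print(f"{trip:.2f}  {percent_speedup(trip):.0f}%")
```

Both calls give the same ratio to two decimal places, which is the point of the derivation: dividing speeds (new over old) and dividing times (old over new) are the same measurement.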

## A Classic Example

Estimating performance: A certain computer system takes 125 ms to render a certain graphic image, and this time is reduced to 100 ms when a graphics processor card is added to the system. What is the speedup?

$$
\text { Speedup }=\frac{T_{\text {old }}}{T_{\text {new }}}=\frac{125}{100}=1.25, \text { or a } 25 \% \text { speedup }
$$

## Getting Finer-Grained

- The execution time can be calculated from the count of how many instructions have executed, IC, the average number of clock cycles per instruction, CPI, and the clock period, $\tau$.
- This is an important equation that will be used throughout the text.

$$
\text { Execution time }=T=I C \times C P I \times \tau
$$

Example 3.1: Speedup Due to a Clock Frequency Increase. The master clock in a certain computer system is increased in frequency from 700 MHz to 1.2 GHz. What is the speedup due to this improvement if no other factors, such as memory access time, interfere with the improvement?

Since, according to the problem definition, neither IC nor CPI changed, and since the clock period, $\tau$, is proportional to the reciprocal of clock frequency,

$$
\text { Speedup }=\frac{(\mathrm{IC} \times \mathrm{CPI} \times \tau)_{\text {old }}}{(\mathrm{IC} \times \mathrm{CPI} \times \tau)_{\text {new }}}=\frac{1 / 700}{1 / 1200}=\frac{1200}{700}=1.71 \text {, or } 71 \% \text { speedup }
$$
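Example 3.1 can be reproduced with a few lines of Python. This is only a sketch: the instruction count and CPI below are hypothetical placeholders (the example says only that they do not change), and the function names are illustrative.

```python
def execution_time(ic, cpi, tau):
    """T = IC x CPI x tau, with tau the clock period in seconds."""
    return ic * cpi * tau


def clock_period(freq_hz):
    """The clock period is the reciprocal of the clock frequency."""
    return 1.0 / freq_hz


# Hypothetical workload: any IC and CPI give the same speedup,
# because both cancel in the ratio T_old / T_new.
ic, cpi = 1_000_000, 2.0

t_old = execution_time(ic, cpi, clock_period(700e6))   # 700 MHz
t_new = execution_time(ic, cpi, clock_period(1.2e9))   # 1.2 GHz
speedup = t_old / t_new

print(f"speedup = {speedup:.2f}")
```

Changing `ic` or `cpi` leaves `speedup` unchanged, which mirrors the cancellation in the equation above.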

## CISC Versus RISC Designs

- CISC: Complex Instruction Set Computer
  - Many complex instructions and addressing modes
  - Some instructions take many steps to execute
  - Not always easy to find best instruction for a task
- RISC: Reduced Instruction Set Computer
  - Few, simple instructions and addressing modes
  - Usually one word per instruction
  - May take several instructions to accomplish what CISC can do in one
  - Complex address calculations may take several instructions
  - Usually has load-store, general register ISA


## Design Characteristics of RISCs

- Simple instructions can be done in few clocks
- Simplicity may even allow a shorter clock period
- A pipelined design can allow an instruction to complete in every clock period
- Fixed length instructions simplify fetch \& decode
- The rules may allow starting next instruction without necessary results of the previous
- Unconditionally executing the instruction after a branch
- Starting next instruction before register load is complete


## Other RISC Characteristics

- Prefetching of instructions. (Similar to the Intel 8086.)
- Pipelining: beginning execution of an instruction before the previous instruction(s) have completed. (Covered in detail in Chapter 5.)
- Superscalar operation: issuing more than one instruction simultaneously. (Instruction-level parallelism; also covered in Chapter 5.)
- Delayed loads, stores, and branches: operands may not be available when an instruction attempts to access them.
- Register windows: the ability to switch to a different set of CPU registers with a single command. This alleviates procedure call/return overhead. (Discussed with the SPARC in this chapter.)


## Table 3.1 Developing an Instruction Set Architecture

- Memories: structure of data storage in the computer
- Processor state registers
- Main memory organization
- Formats and their interpretation: meanings of register fields
- Data types
- Instruction format
- Instruction address interpretation
- Instruction interpretation: things done for all instructions
- The fetch-execute cycle
- Exception handling (sometimes deferred)
- Instruction execution: behavior of individual instructions
- Grouping of instructions into classes
- Actions performed by individual instructions



## CISC: The Motorola MC68000

- Introduced in 1979
- One of the first 32-bit microprocessors
- Means that most operations are on 32 bit internal data
- Some operations may use different number of bits
- External data paths may not all be 32 bits wide
- MC68000 had a 24 bit address bus
- Complex Instruction Set Computer - CISC
- Large instruction set
- 14 addressing modes


## Fig. 3.1 MC68000 Programmer’s Model



## Features of the 68000 Processor State

- Distinction between 32 bit data registers and 32 bit address registers
- 16 bit instruction register
- Variable length instructions handled 16 bits at a time
- Stack pointer registers
- User stack pointer is one of the address registers
- System stack pointer is a separate single register
- Discuss: Why a separate system stack.
- Condition code register: System \& User bytes
- Arithmetic status ( $\mathrm{N}, \mathrm{Z}, \mathrm{V}, \mathrm{C}, \mathrm{X}$ ) is in user status byte
- System status has Supervisor \& Trace mode flags, as well as the Interrupt Mask


## RTN Processor State for the MC68000

| RTN declaration | Meaning |
| :--- | :--- |
| $\mathrm{D}[0..7]\langle 31..0\rangle:$ | General purpose data registers; |
| $\mathrm{A}[0..7]\langle 31..0\rangle:$ | Address registers; |
| $\mathrm{A7}^{\prime}\langle 31..0\rangle:$ | System stack pointer; |
| $\mathrm{PC}\langle 23..0\rangle:$ | Program counter in original MC68000; |
| $\mathrm{IR}\langle 15..0\rangle:$ | Instruction register; |
| $\mathrm{Status}\langle 15..0\rangle:$ | System status byte and user status byte; |
| $\mathrm{SP}:=\mathrm{A}[7]:$ | User stack pointer, also called USP; |
| $\mathrm{SSP}:=\mathrm{A7}^{\prime}:$ | System stack pointer; |
| $\mathrm{C}:=\mathrm{Status}\langle 0\rangle: \mathrm{V}:=\mathrm{Status}\langle 1\rangle:$ | Carry and oVerflow flags; |
| $\mathrm{Z}:=\mathrm{Status}\langle 2\rangle: \mathrm{N}:=\mathrm{Status}\langle 3\rangle:$ | Zero and Negative flags; |
| $\mathrm{X}:=\mathrm{Status}\langle 4\rangle:$ | Extend flag; |
| $\mathrm{INT}\langle 2..0\rangle:=\mathrm{Status}\langle 10..8\rangle:$ | Interrupt mask in system status byte; |
| $\mathrm{S}:=\mathrm{Status}\langle 13\rangle: \mathrm{T}:=\mathrm{Status}\langle 15\rangle:$ | Supervisor state and Trace mode flags; |

## Main Memory in the MC68000

Main memory:

$$
\begin{array}{ll}
\mathrm{Mb}\left[0 . .2^{24}-1\right]\langle 7 . .0\rangle: & \text { Memory as bytes } \\
\mathrm{Mw}[\mathrm{ad}]\langle 15 . .0\rangle:=\mathrm{Mb}[\mathrm{ad}] \# \mathrm{Mb}[\mathrm{ad}+1]: & \text { Memory as words } \\
\mathrm{Ml}[\mathrm{ad}]\langle 31 . .0\rangle:=\mathrm{Mw}[\mathrm{ad}] \# \mathrm{Mw}[\mathrm{ad}+2]: & \text { Memory as long words }
\end{array}
$$

- The word and longword forms are "big-endian"
- The lowest numbered byte contains the most significant bit (big end) of the word
- Words and longwords have "hard" alignment constraints not described in the above RTN
- Word addresses must end in one binary 0
- Longword addresses must end in two binary zeros
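The memory definitions can be tried out in Python. This is a minimal sketch, assuming a small `bytearray` stands in for the $2^{24}$-byte memory; the function names are illustrative, and the alignment checks follow the constraints exactly as stated in the list above.

```python
def read_word(mb, ad):
    """Mw[ad] := Mb[ad] # Mb[ad+1], big-endian, word aligned."""
    if ad & 0b1:
        raise ValueError("word address must end in one binary 0")
    return (mb[ad] << 8) | mb[ad + 1]


def read_long(mb, ad):
    """Longword := Mw[ad] # Mw[ad+2], aligned as stated in the list above."""
    if ad & 0b11:
        raise ValueError("longword address must end in two binary zeros")
    return (read_word(mb, ad) << 16) | read_word(mb, ad + 2)


# A tiny stand-in for main memory (the real array has 2**24 bytes).
mb = bytearray(8)
mb[0:4] = bytes([0x12, 0x34, 0x56, 0x78])

print(hex(read_word(mb, 0)))   # most significant byte comes from the lowest address
print(hex(read_long(mb, 0)))
```

The lowest-numbered byte, `0x12`, ends up in the most significant position of both results, which is what "big-endian" means here.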


## MC68000 Supports Several Operand Types

- Like many CISC machines, the 68000 allows one instruction to operate on several types
- MOVE.B for bytes, MOVE.W for words, and MOVE.L for longwords; also ADD.B, ADD.W, ADD.L, etc.
- With no suffix, the default is word operands: ADD, for example, operates on words.
- Operand length is encoded into the instruction word
- Bits coding operand type vary with instruction
- For use with RTN descriptions, we assume a function $\mathrm{d}:=$ datalen(IR) that returns 1, 2, or 4 for operand length


## Fig. 3.2 Some MC68000 Instruction Formats


(a) A 1-word move instruction

(b) A 2-word instruction

(c) A 3-word instruction

(d) Instruction with indexed address

| Word | Fields, bit 15 down to bit 0 |
| :--- | :--- |
| IR | ..., mode 110, Reg |
| Extra word | d/a, Index reg, w/l, 000, disp8 |

Copyright © 2004 Pearson Prentice Hall, Inc.


## General Form of Addressing Modes in the MC68000

- A general address of an operand or result is specified by a 6-bit field with mode and register numbers
- Bits 5..3 hold the mode and bits 2..0 hold the register number
- The field provides access paths to operands

- Not all operands and results can be specified by a general address: some must be in registers.
- Not all modes are legal in all parts of an inst.
- Exception: when specifying the destination of a MOVE instruction the mode and reg fields are reversed.


## MC68000 Addressing Modes

| Name | Mode | Reg. | Assembler syntax | Extra words | Brief description |
| :--- | :---: | :---: | :--- | :---: | :--- |
| Data reg. direct | 0 | 0-7 | Dn | 0 | Dn |
| Addr. reg. direct | 1 | 0-7 | An | 0 | An |
| Addr. reg. indirect | 2 | 0-7 | (An) | 0 | M[An] |
| Autoincrement | 3 | 0-7 | (An)+ | 0 | M[An]; An $\leftarrow$ An+d |
| Autodecrement | 4 | 0-7 | -(An) | 0 | An $\leftarrow$ An-d; M[An] |
| Based | 5 | 0-7 | disp16(An) | 1 | M[An+disp16] |
| Based indexed short | 6 | 0-7 | disp8(An,XnLo) | 1 | M[An+XnLo+disp8] |
| Based indexed long | 6 | 0-7 | disp8(An,Xn) | 1 | M[An+Xn+disp8] |
| Absolute short | 7 | 0 | addr16 | 1 | M[addr16] |
| Absolute long | 7 | 1 | addr32 | 2 | M[addr32] |
| Relative | 7 | 2 | disp16(PC) | 1 | M[PC+disp16] |
| Rel. indexed short | 7 | 3 | disp8(PC,XnLo) | 1 | M[PC+XnLo+disp8] |
| Rel. indexed long | 7 | 3 | disp8(PC,Xn) | 1 | M[PC+Xn+disp8] |
| Immediate | 7 | 4 | \#data | 1-2 | data |
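To see how the 6-bit field selects a mode, the following Python sketch splits the field into its mode and register parts and looks up the mode name. The function and dictionary names are illustrative, and the short and long index variants of mode 6 and mode 7-3 are grouped under one name because they are distinguished by the extra word, not by this field.

```python
MODE_NAMES = {
    0: "Data reg. direct",
    1: "Addr. reg. direct",
    2: "Addr. reg. indirect",
    3: "Autoincrement",
    4: "Autodecrement",
    5: "Based",
    6: "Based indexed",
}

# Mode 7 uses the reg field to choose among further modes.
MODE7_NAMES = {
    0: "Absolute short",
    1: "Absolute long",
    2: "Relative",
    3: "Rel. indexed",
    4: "Immediate",
}


def split_ea_field(field6):
    """Return (mode, reg) from a 6-bit field: mode in bits 5..3, reg in bits 2..0."""
    return (field6 >> 3) & 0b111, field6 & 0b111


def mode_name(md, rg):
    """Name of the addressing mode; mode 7 with reg 5-7 is not in the table."""
    if md < 7:
        return MODE_NAMES[md]
    return MODE7_NAMES.get(rg, "not listed in the table")


md, rg = split_ea_field(0b010011)      # mode 2, register 3, written (A3)
print(md, rg, mode_name(md, rg))
```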


- The addressing modes interpret many items
- The instruction: in the IR register
- The following 16 bit word: described as Mw[PC]
- The D and A registers in the CPU
- Many addressing modes calculate an effective memory address
- Some modes designate a register
- Some modes result in a constant operand
- There are restrictions on the use of some modes


## RTN Formatting for Effective Address Calculation

$$
\begin{array}{ll}
\mathrm{XR}[0..15]\langle 31..0\rangle := \mathrm{D}[0..7]\langle 31..0\rangle \# \mathrm{A}[0..7]\langle 31..0\rangle: & \text{Index register can be D or A;} \\
\mathrm{xr}\langle 3..0\rangle := \mathrm{Mw}[\mathrm{PC}]\langle 15..12\rangle: & \text{Index specifier for index mode;} \\
\mathrm{wl} := \mathrm{Mw}[\mathrm{PC}]\langle 11\rangle: & \text{Short or long index flag;} \\
\mathrm{dsp8}\langle 7..0\rangle := \mathrm{Mw}[\mathrm{PC}]\langle 7..0\rangle: & \text{Displacement for index mode;} \\
\text{index} := ((\mathrm{wl}=0) \rightarrow \mathrm{XR}[\mathrm{xr}]\langle 15..0\rangle: (\mathrm{wl}=1) \rightarrow \mathrm{XR}[\mathrm{xr}]\langle 31..0\rangle): & \text{Short or long index value;}
\end{array}
$$

- Either an A or a D register can be used as an index
- A 4-bit field in the 2nd instruction word specifies the index register
- Low order 8-bits of 2nd word are used as offset
- Either 16 or 32 bits of index register may be used


## Modes That Calculate a Memory Address

| Bits 5..3 (mode) | Bits 2..0 (reg) |
| :---: | :---: |
| 010-110 | 000-111 |

- md and rg are the 3-bit mode and reg. fields.
- ea stands for effective address


```
ea(md, rg) := (
    (md=2) → A[rg⟨2..0⟩]:                                     Mode 2 is register indirect;
    (md=3) → (A[rg⟨2..0⟩]; A[rg⟨2..0⟩] ← A[rg⟨2..0⟩] + d):    Mode 3 is autoincrement;
    (md=4) → (A[rg⟨2..0⟩] ← A[rg⟨2..0⟩] - d; A[rg⟨2..0⟩]):    Mode 4 is autodecrement;
    (md=5) → (A[rg⟨2..0⟩] + Mw[PC]; PC ← PC+2):               Mode 5 is based or offset addressing;
    (md=6) → (A[rg⟨2..0⟩] + index + dsp8; PC ← PC+2):         Mode 6 is based indexed addressing;
```
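The side effects in modes 3 and 4 are easy to get backwards, so here is a small Python sketch of modes 2 through 4 only. It assumes the address registers are held in a plain list `A` and that `d` is the operand length (1, 2, or 4); the function name is illustrative and the extra-word modes 5 and 6 are left out.

```python
MASK32 = 0xFFFFFFFF


def ea_simple(md, rg, A, d):
    """Effective address for modes 2-4; A is a list of 8 address registers."""
    if md == 2:                       # register indirect: no side effect
        return A[rg]
    if md == 3:                       # autoincrement: use the address, then add d
        addr = A[rg]
        A[rg] = (A[rg] + d) & MASK32
        return addr
    if md == 4:                       # autodecrement: subtract d, then use the address
        A[rg] = (A[rg] - d) & MASK32
        return A[rg]
    raise ValueError("only modes 2, 3, and 4 are sketched here")


A = [0x1000] * 8
first = ea_simple(3, 0, A, 2)    # returns the old A0, then A0 advances by 2
second = ea_simple(4, 0, A, 2)   # A0 steps back by 2, then that value is returned
print(hex(first), hex(second), hex(A[0]))
```

An autoincrement followed by an autodecrement of the same size returns the same address twice and leaves the register where it started, which is why the pair works as a stack push and pop.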

## Mode 7 Uses the reg Field to Expand the Number of Modes

| Bits 5..3 (mode) | Bits 2..0 (reg) |
| :---: | :---: |
| 111 | reg |

- These modes still calculate a memory address

```
ea(md, rg) :=
    (md=7 ∧ rg=0) → (Mw[PC] {sign extend to 32 bits}; PC ← PC+2):       Mode 7, register 0 is short absolute;
    (md=7 ∧ rg=1) → (Ml[PC]; PC ← PC+4):                                 Mode 7, register 1 is long absolute;
    (md=7 ∧ rg=2) → (PC + Mw[PC] {sign extend to 32 bits}; PC ← PC+2):   Mode 7, register 2 is program counter relative addressing;
    (md=7 ∧ rg=3) → (PC + index + dsp8; PC ← PC+2):                      Mode 7, register 3 is relative indexed.
```


- Same picture for autoincrement or decrement
- Address register incremented after address obtained in autoincrement
- Address register decremented before address obtained in autodecrement

## Fig. 3.4 Mode 6: Based Indexed Addressing

| Bits 5..3 (mode) | Bits 2..0 (reg) |
| :---: | :---: |
| 110 | reg |



- Three things are added to get the address

## Modes 7-0 and 7-1: Absolute Addressing



- Absolute addresses can be 16 or 32 bits


## Mode 7-3 Relative Indexed Addressing

| $\underline{5}$ | $\underline{4}$ | $\underline{3}$ | $\underline{2}$ | $\underline{1}$ | $\underline{0}$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 1 | 0 | 1 | 1 |



- Same as indexed mode but uses PC instead of A register as base


## Operands in Registers or Memory can Have Different Lengths

```
memval(md, rg) := ((md⟨2..1⟩=1) ∨ (md⟨2..1⟩=2) ∨ (md⟨2..0⟩=6) ∨
        ((md⟨2..0⟩=7) ∧ (rg⟨2⟩=0))):                A memory address is used with these modes only;
opnd(md, rg) := (
    (d=1) → opndb(md, rg): (d=2) → opndw(md, rg):
    (d=4) → opndl(md, rg) ):                         The operand length in the instruction tells which to use.
opndl(md, rg)⟨31..0⟩ := ( …                          A long operand can be …
opndw(md, rg)⟨15..0⟩ := (
    memval(md, rg) → Mw[ea(md, rg)]⟨15..0⟩:
    md = 0 → D[rg]⟨15..0⟩:
    md = 1 → A[rg]⟨15..0⟩:
    (md=7 ∧ rg=4) → (Mw[PC]⟨15..0⟩: PC ← PC+2) ):    A word operand is similar but needs only a 16 bit immediate following the instruction word;
opndb(md, rg)⟨7..0⟩ := ( …
    (md=7 ∧ rg=4) → (Mw[PC]⟨7..0⟩: PC ← PC+2) ):     Byte operands … instruction word.
```

## Modes 0 and 1: Register Direct Addressing

| Bits 5..3 (mode) | Bits 2..0 (reg) | Register set |
| :---: | :---: | :---: |
| 000 | reg | D |
| 001 | reg | A |



- The register itself provides a place to store a result or a place to get an operand
- There is no memory address with this mode


Instruction word and 1 or 2 following words:

| Operand | Mode/Reg field | Following word(s) | Example |
| :--- | :---: | :--- | :--- |
| Byte | 111 100 | 00000000 in bits 15..8, value8 in bits 7..0 | MOVE.B \#12, … |
| Word | 111 100 | value16 in bits 15..0 | MOVE.W \#1234, … |
| Longword | 111 100 | value16Hi, then value16Lo | MOVE.L \#12348678, … |

- Data length is specified by the opcode field, not the Mode/Reg field


## Not Every Addressing Mode Can Be Used for Results

$$
\text{rsltadr}(\mathrm{md}, \mathrm{rg}) := \text{memval}(\mathrm{md}, \mathrm{rg}) \wedge \neg(\mathrm{md}=7 \wedge (\mathrm{rg}=2 \vee \mathrm{rg}=3)):
$$

- The MC68000 disallows relative addressing ( md 7 rg 2 or 3 ) for results
- This is captured in RTN by defining a function that is true (=1) if the memory address specified by the mode is legal for results
- Register direct is also legal for results, but will be handled separately
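The two predicates translate directly into Python. This sketch follows the RTN definitions of memval and rsltadr term by term; the use of plain integers for the 3-bit `md` and `rg` fields is the only assumption.

```python
def memval(md, rg):
    """True when the mode/reg pair designates a memory address (modes 2-6, 7-0 to 7-3)."""
    return (
        (md >> 1) == 1                    # md<2..1> = 1: modes 2 and 3
        or (md >> 1) == 2                 # md<2..1> = 2: modes 4 and 5
        or md == 6
        or (md == 7 and (rg >> 2) == 0)   # mode 7 with rg<2> = 0: registers 0-3
    )


def rsltadr(md, rg):
    """True when the memory address may be used for a result (no PC-relative modes)."""
    return memval(md, rg) and not (md == 7 and rg in (2, 3))


for md, rg in [(2, 0), (5, 3), (7, 1), (7, 2), (7, 3), (7, 4), (0, 0)]:
    print(md, rg, memval(md, rg), rsltadr(md, rg))
```

The loop prints the cases where the two functions differ (mode 7, registers 2 and 3) alongside cases where they agree.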

## Result Modes Must Have a Place to Write Data: Memory or Register

```
rsltl(md, rg)⟨31..0⟩ := (                            32 bit result;
    rsltadr(md, rg) → Ml[ea(md, rg)]⟨31..0⟩:
    md = 0 → D[rg]⟨31..0⟩:
    md = 1 → A[rg]⟨31..0⟩ ):
rsltw(md, rg)⟨15..0⟩ := (                            16 bit result;
    rsltadr(md, rg) → Mw[ea(md, rg)]⟨15..0⟩:
    md = 0 → D[rg]⟨15..0⟩:
    md = 1 → A[rg]⟨15..0⟩ ):
rsltb(md, rg)⟨7..0⟩ := (                             8 bit result.
    rsltadr(md, rg) → Mb[ea(md, rg)]⟨7..0⟩:
    md = 0 → D[rg]⟨7..0⟩:
    md = 1 → A[rg]⟨7..0⟩ ):
rslt(md, rg) := (                                    The result length in the instruction tells which to use.
    (d=1) → rsltb(md, rg): (d=2) → rsltw(md, rg):
    (d=4) → rsltl(md, rg) ):
```


## MC68000 Instruction Interpretation

- Instruction interpretation is simple when exceptions are ignored

$$
\begin{aligned}
& \text{Instruction\_interpretation} := ( \\
& \quad \text{Run} \rightarrow ((\mathrm{IR}\langle 15..0\rangle \leftarrow \mathrm{Mw}[\mathrm{PC}]\langle 15..0\rangle: \mathrm{PC} \leftarrow \mathrm{PC}+2); \\
& \quad\quad \text{instruction\_execution})\ ):
\end{aligned}
$$

- Instructions are fetched 16 bits at a time
- PC is advanced by 2 as each 16-bit word is fetched
- Addressing mode may advance it a total of 2 or 4 or more words, under command from the control unit.

- The op code location and size depend on the instruction. (Compare to SRC.)


## RTN for a Typical MC68000 Move Instruction

- The instruction format for Move includes mode and register for source and destination addresses

$$
\begin{aligned}
& \operatorname{op}\langle 3 . .0\rangle:=\operatorname{IR}\langle 15 . .12\rangle: \operatorname{rg} 1\langle 2 . .0\rangle:=\operatorname{IR}\langle 2 . .0\rangle: \operatorname{md} 1\langle 2 . .0\rangle:=\operatorname{IR}\langle 5 . .3\rangle: \\
& \operatorname{rg} 2\langle 2 . .0\rangle:=\operatorname{IR}\langle 11 . .9\rangle: \operatorname{md} 2\langle 2 . .0\rangle:=\operatorname{IR}\langle 8 . .6\rangle:
\end{aligned}
$$

$$
\begin{aligned}
& \operatorname{tmp}\langle 31..0\rangle: \\
& \text{move } (:= \mathrm{op}\langle 3..2\rangle = 0) \rightarrow ( \\
& \quad \mathrm{tmp} \leftarrow \text{opnd}(\mathrm{md1}, \mathrm{rg1}); \\
& \quad (\mathrm{Z} \leftarrow (\mathrm{tmp}=0): \mathrm{N} \leftarrow (\mathrm{tmp}<0): \mathrm{V} \leftarrow 0: \mathrm{C} \leftarrow 0): \\
& \quad \text{rslt}(\mathrm{md2}, \mathrm{rg2}) \leftarrow \mathrm{tmp}\ ):
\end{aligned}
$$

- The temporary register tmp is used because every invocation of opnd() causes another fetch
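The field definitions for Move can be exercised on a concrete instruction word. This is a minimal Python sketch; the function names are illustrative, and the sample word `0x3401` is an assumption chosen for the example (it is the usual encoding of MOVE.W D1,D2).

```python
def decode_move(ir):
    """Extract the Move fields from a 16-bit instruction word, per the RTN above."""
    return {
        "op": (ir >> 12) & 0xF,    # op<3..0>  := IR<15..12>
        "rg1": ir & 0x7,           # rg1<2..0> := IR<2..0>   (source register)
        "md1": (ir >> 3) & 0x7,    # md1<2..0> := IR<5..3>   (source mode)
        "rg2": (ir >> 9) & 0x7,    # rg2<2..0> := IR<11..9>  (destination register)
        "md2": (ir >> 6) & 0x7,    # md2<2..0> := IR<8..6>   (destination mode)
    }


def is_move(ir):
    """Move is selected when op<3..2> = 0, that is, IR<15..14> = 0."""
    return ((ir >> 14) & 0x3) == 0


fields = decode_move(0x3401)
print(is_move(0x3401), fields)
```

Note that the destination fields appear as register then mode (bits 11..9, then 8..6), the reverse of the source field order, which is the exception mentioned earlier for the destination of a MOVE.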


## MC68000 Integer Arithmetic and Logic Instructions

| Op. | Operands | Inst. word | X N Z V C | Operation | Sizes |
| :---: | :---: | :---: | :---: | :---: | :---: |
| ADD | EA,Dn | 1101rrrmmmaaaaaa | x x x x x | dst $\leftarrow$ dst + src | b, w, l |
| SUB | EA,Dn | 1001rrrmmmaaaaaa | x x x x x | dst $\leftarrow$ dst - src | b, w, l |
| CMP | EA,Dn | 1011rrrxxxaaaaaa | - x x x x | dst - src | b, w, l |
| CMPI | \#dat,EA | 00001100wwaaaaaa | - x x x x | dst - imm. data | b, w, l |
| MULS | EA,Dn | 1100rrr111aaaaaa | - x x 0 0 | Dn $\leftarrow$ Dn * src | l $\leftarrow$ w * w |
| MULU | EA,Dn | 1100rrr011aaaaaa | - x x 0 0 | Dn $\leftarrow$ Dn * src | l $\leftarrow$ w * w |
| DIVS | EA,Dn | 1000rrr111aaaaaa | - x x x 0 | Dn $\leftarrow$ Dn / src | l $\leftarrow$ l / w |
| DIVU | EA,Dn | 1000rrr011aaaaaa | - x x x 0 | Dn $\leftarrow$ Dn / src | l $\leftarrow$ l / w |
| AND | EA,Dn | 1100rrrmmmaaaaaa | - x x 0 0 | dst $\leftarrow$ dst $\wedge$ src | b, w, l |
| OR | EA,Dn | 1000rrrmmmaaaaaa | - x x 0 0 | dst $\leftarrow$ dst $\vee$ src | b, w, l |
| EOR | EA,Dn | 1011rrrwwwaaaaaa | - x x 0 0 | dst $\leftarrow$ dst $\oplus$ src | b, w, l |
| CLR | EAs | 01000010wwaaaaaa | - 0 1 0 0 | dst $\leftarrow$ 0 | b, w, l |
| NEG | EAs | 01000100wwaaaaaa | x x x x x | dst $\leftarrow$ 0 - dst | b, w, l |
| TST | EAs | 01001010wwaaaaaa | - x x 0 0 | dst - 0 | b, w, l |
| NOT | EAs | 01000110wwaaaaaa | - x x 0 0 | dst $\leftarrow \neg$dst | b, w, l |

- aaaaaa is the 6-bit addressing mode specifier mmmrrr
- www: B-100, W-101, L-110
- xxx: B-000, W-001, L-010

## Notes on MC68000 Arithmetic and Logic Instructions

All 2-operand ALU instructions are either $\mathrm{D} \rightarrow$ EA or EA $\rightarrow \mathrm{D}$. Which is it?

- Only one operand uses EA
- The other operand is always accessed by Data register direct
- The 3-bit mmm field specifies whether $D$ is the source or destination, and whether it is $B, W$, or $L$

| Destination | Byte | Word | Long |
| :--- | :---: | :---: | :---: |
| Dn | 000 | 001 | 010 |
| EA | 100 | 101 | 110 |

Ex: SUB EA, Dn: 1001 rrr mmm aaaaaa


Note: There are several exceptions to the rule above. See text and Mfr. Data sheet.
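The role of the mmm field can be sketched in Python. This assumes the standard 68000 opmode encoding, in which bit 2 of mmm selects the destination (0 for the data register, 1 for the effective address) and the low two bits select the size, matching the xxx and www codes listed with the instruction table. The function name is illustrative, and the x11 patterns are rejected because they belong to the exceptions noted above.

```python
SIZE_NAMES = {0b00: "B", 0b01: "W", 0b10: "L"}


def decode_opmode(mmm):
    """Return (destination, size) for the 3-bit mmm field of a 2-operand ALU instruction."""
    size_code = mmm & 0b011
    if size_code == 0b011:
        raise ValueError("opmode x11 is one of the exceptions to the rule")
    destination = "EA" if mmm & 0b100 else "Dn"
    return destination, SIZE_NAMES[size_code]


for mmm in (0b000, 0b001, 0b010, 0b100, 0b101, 0b110):
    print(f"{mmm:03b}", decode_opmode(mmm))
```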

## RTN Description of a Typical MC68000 Arithmetic Instruction

- Subtract is a typical arithmetic instruction
- Need a temporary register to hold an address

$$
\begin{aligned}
& \operatorname{tmp}\langle 31..0\rangle \text{: temporary register for address} \\
& \text{sub } (:= \mathrm{op}=9) \rightarrow ( \\
& \quad (\mathrm{md2}\langle 2\rangle=0) \rightarrow \mathrm{D}[\mathrm{rg2}] \leftarrow \mathrm{D}[\mathrm{rg2}] - \text{opnd}(\mathrm{md1}, \mathrm{rg1}): \\
& \quad (\mathrm{md2}\langle 2\rangle=1) \rightarrow (\text{memval}(\mathrm{md1}, \mathrm{rg1}) \rightarrow (\mathrm{tmp} \leftarrow \mathrm{ea}(\mathrm{md1}, \mathrm{rg1}); \\
& \quad\quad \mathrm{M}[\mathrm{tmp}] \leftarrow \mathrm{M}[\mathrm{tmp}] - \mathrm{D}[\mathrm{rg2}]): \\
& \quad\quad \neg\text{memval}(\mathrm{md1}, \mathrm{rg1}) \rightarrow \text{rslt}(\mathrm{md1}, \mathrm{rg1}) \leftarrow \text{rslt}(\mathrm{md1}, \mathrm{rg1}) - \mathrm{D}[\mathrm{rg2}]) \\
& ):
\end{aligned}
$$

- This definition does not handle the condition codes


## MC68000 Arithmetic Shifts and Single Word Rotates

| Op. | Operands | Inst. word | XNZVC |
| :--- | :--- | :--- | :--- |
| ASd | EA | 1110000d11aaaaaa | xxxxx |
| ASd | \#cnt,Dn | 1110cccdww000rrr | xxxxx |
| ASd | Dm,Dn | 1110RRRdww100rrr | xxxxx |
| ROd | EA | 1110011d11aaaaaa | -xx0x |
| ROd | \#cnt,Dn | 1110cccdww011rrr | -xx0x |
| ROd | Dm,Dn | 1110RRRdww111rrr | -xx0x |

- d is L or R for left or right shift, respectively
- EA form has shift count of 1
- ww is word size: 00-Byte, 01-Word, 10-Long Word

## MC68000 Logical Shifts and Extended Rotates

| Op. | Operands | Inst. word | XNZVC |
| :---: | :---: | :---: | :---: |
| LSd | EA | 1110001d11aaaaaa | xxx0x |
| LSd | \#cnt,Dn | 1110cccdww001rrr | xxx0x |
| LSd | Dm,Dn | 1110RRRdww101rrr | xxx0x |
| ROXd | EA | 1110010d11aaaaaa | xxx0x |
| ROXd | \#cnt,Dn | 1110cccdww010rrr | xxx0x |
| ROXd | Dm,Dn | 1110RRRdww110rrr | xxx0x |

- Field ww specifies byte, word, or longword
- N \& Z set according to result, C= last bit shifted out


## MC68000 Conditional Branch and Test Instructions

| Op. | Operands | Inst. word | Operation |
| :--- | :--- | :--- | :--- |

- disp is dddddddd unless dddddddd $=0$, in which case it is contained in the extra word DDDDDDDDDDDDDDDD
- DBcc is used for counted loops with an optional end condition.
- "Decrement and branch until cond."
- Scc sets a byte to the outcome of a test



## Conditional Branches First Set Condition Codes, Then Branch

- EQ tests the right condition codes for $=0$, as above, or $A=B$ following a compare, CMP A,B

## MC68000 Unconditional Control Transfers

- Subroutine links push the return address onto the stack pointed to by A7 = SP


| Op. | Operands | Inst. word | Operation |
| :---: | :---: | :---: | :--- |
| RTR |  | 0100111001110111 | CC $\leftarrow$ (SP)+; PC $\leftarrow$ (SP)+ |
| RTS |  | 0100111001110101 | PC $\leftarrow$ (SP)+ |
| LINK | An,disp | 0100111001010rrr DDDDDDDDDDDDDDDD | -(SP) $\leftarrow$ An; An $\leftarrow$ SP; SP $\leftarrow$ SP + disp |
| UNLK | An | 0100111001011rrr | SP $\leftarrow$ An; An $\leftarrow$ (SP)+ |

- Subroutine linkage uses stack for return address
- LINK and UNLK allocate and de-allocate multiple word stack frames


## Figure 3.6 Example Program to Search an Array

```
CR      EQU     13              ;Define return character.
LEN     EQU     132             ;Define line length.
        ORG     $1000           ;Locate LINE at 1000H.
LINE    DS.B    LEN             ;Reserve LEN bytes of storage.
        MOVE.B  #LEN-1,D0       ;Initialize D0 to count-1.
        MOVEA.L #LINE,A0        ;A0 gets start address of array.
LOOP    CMPI.B  #CR,(A0)+       ;Make the comparison.
        DBEQ    D0,LOOP         ;Double test: if LINE[131-D0]≠13
                                ; then decr. D0; if D0≠-1 branch
                                ; to LOOP, else to next inst.
        <next instruction>
```

- Program searches an array of bytes to find the first carriage return, ASCII code 13
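The control flow of the CMPI/DBEQ pair can be traced with a Python model of the loop. This is a sketch of the loop's behavior, not of the processor: `d0` and `a0` stand in for the registers, `a0` is an index rather than an address, and the function name and return convention are assumptions made for the example.

```python
CR = 13  # ASCII carriage return


def find_cr(line):
    """Model of the search loop; returns the index of the first CR, or None."""
    d0 = len(line) - 1          # MOVE.B #LEN-1,D0 (assumes a non-empty line)
    a0 = 0                      # MOVEA.L #LINE,A0, modeled as an index into line
    while True:
        equal = line[a0] == CR  # CMPI.B #CR,(A0)+ : compare, then autoincrement
        a0 += 1
        if equal:               # DBEQ: condition true, fall through to next inst.
            break
        d0 -= 1                 # DBEQ: otherwise decrement the counter...
        if d0 == -1:            # ...and fall through when it reaches -1
            break
    return a0 - 1 if equal else None


print(find_cr(b"ab\rcd"))
print(find_cr(b"abc"))
```

The two exits from the `while` loop correspond to the "double test" in the comments: the loop ends either because the byte matched or because the count ran out.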


## Pseudo Operations in the MC68000 Assembler

- A Pseudo Operation is one that is performed by the assembler at assembly time, not by the CPU at run time.
- EQU defines a symbol to be equal to a constant. Substitution is made at assembly time. Example: `Pi EQU 3.14`
- DS.B (.W or .L) defines a block of storage
- Any label is associated with the first word of the block. Example: `Line DS.B 132`
- The program loader (part of the operating system) accomplishes this


## Pseudo Operations in the MC68000 Assembler (cont'd.)

- \#symbol indicates the value of the symbol instead of a location addressed by the symbol
  - `MOVE.L #1000, D0` moves 1000 to D0
  - `MOVE.L 1000, D0` moves the value at address 1000 to D0
- The assembler detects the difference and assembles the appropriate instruction.
- ORG specifies a memory address as the origin where the following code will be stored
  - `Start ORG $4000` means the next instruction or data will be loaded at address 4000H.
- The Motorola assembler uses \$ in front of a number to indicate hexadecimal
- Character constants are in single quotes: 'X'


## Review of Assembly, Link, Load, and Run

- At assemble time, assembly language text is converted to (binary) machine language
- The binary words may be generated by translating instructions, hexadecimal or decimal numbers, characters, etc.
- Addresses are translated by way of a symbol table
- Addresses are adjusted to allow for blocks of memory reserved for arrays, etc.
- At link time, separately assembled modules are combined \& absolute addresses assigned
- At load time, the binary words are loaded into memory
- At run time, the PC is set to the starting address of the loaded module. (Usually the O.S. makes a jump or procedure call to that address.)

## MC68000 Assembly Language Example: Clear a Block

```
MAIN    ...
        MOVE.L  #ARRAY,A0       ;Base of array
        MOVE.W  #COUNT,D0       ;Number of words to clear
        JSR     CLEARW          ;Make the call
        ...
CLEARW  BRA     LOOPE           ;Branch for init. decr.
LOOPS   CLR.W   (A0)+           ;Autoincrement by 2
LOOPE   DBF     D0,LOOPS        ;Dec. D0, fall through if -1
        RTS                     ;Finished.
```

- Subroutine expects block base in A0, count in D0
- Linkage uses the stack pointer, so A7 cannot be used for anything else


## Exceptions: Changes to Sequential Instruction Execution

- Exceptions, also called interrupts, cause next instruction fetch from other than PC location
- Address supplying next instruction called exception vector
- Exceptions can arise from instruction execution, hardware faults, and external conditions
- Externally generated exceptions usually called interrupts
- Arithmetic overflow, power failure, I/O operation completion, and out of range memory access are some causes
- A trace bit =1 causes an exception after every instruction
- Used for debugging purposes


## Steps in Handling MC68000 Exceptions

- 1) Status change
- Temporary copy of status register is made
- Supervisor mode bit $S$ is set, trace bit $T$ is reset
- 2) Exception vector address is obtained
- Small address made by shifting 8 bit vector number left 2
- Contents of the longword at this vector address is the address of the next instruction to be executed
- The exception handler or interrupt service routine starts there
- 3) Old PC and Status register are pushed onto supervisor stack, addressed by A7' = SSP
- 4) PC is loaded from exception vector address
- Return from handler is done by RTE
- Like RTR except restores Status reg. instead of CCs
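The four steps can be sketched in a few lines. The `Cpu` class, the helper names, and the byte-array memory are assumptions made for illustration; the status register bit positions used here (T in bit 15, S in bit 13) are those of the MC68000.

```python
from dataclasses import dataclass

T_BIT = 0x8000  # trace bit of the status register
S_BIT = 0x2000  # supervisor bit of the status register


@dataclass
class Cpu:
    memory: bytearray
    pc: int
    status: int
    ssp: int


def read_big(mem, addr, size):
    return int.from_bytes(mem[addr:addr + size], "big")


def write_big(mem, addr, value, size):
    mem[addr:addr + size] = value.to_bytes(size, "big")


def take_exception(cpu, vector_number):
    saved_status = cpu.status                      # 1) temporary copy of status
    cpu.status = (cpu.status | S_BIT) & ~T_BIT & 0xFFFF   # S set, T reset
    vector_address = (vector_number & 0xFF) << 2   # 2) 8-bit number shifted left 2
    cpu.ssp -= 4                                   # 3) push old PC ...
    write_big(cpu.memory, cpu.ssp, cpu.pc, 4)
    cpu.ssp -= 2                                   #    ... then old status
    write_big(cpu.memory, cpu.ssp, saved_status, 2)
    cpu.pc = read_big(cpu.memory, vector_address, 4)  # 4) PC from vector


def return_from_exception(cpu):
    """RTE: pop the status register, then the PC."""
    cpu.status = read_big(cpu.memory, cpu.ssp, 2)
    cpu.ssp += 2
    cpu.pc = read_big(cpu.memory, cpu.ssp, 4)
    cpu.ssp += 4
```

Because the whole status register is restored, the handler's supervisor state disappears on return and the interrupted program resumes in its original mode.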


## Exception Priorities

- When several exceptions occur at once, which exception vector is used?
- Exceptions have priorities, and highest priority exception supplies the vector
- MC68000 allows 7 levels of priority
- Status register contains current priority
- Exceptions with priority $\leq$ current are ignored


## Exceptions and Reset Both Affect Instruction Interpretation

- More processor state needed to describe reset and exception processing

$$
\begin{aligned}
& \text{Reset}: \\
& \text{exc\_req}: \\
& \text{exc\_lev}\langle 2..0\rangle: \\
& \text{vect}\langle 7..0\rangle: \\
& \text{exc} := \text{exc\_req} \wedge (\text{exc\_lev}\langle 2..0\rangle > \text{INT}\langle 2..0\rangle): && \text{There is a request, and the request level is > current mask in status reg.}
\end{aligned}
$$

- exc_lev is the highest priority of any pending exception
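The `exc` condition is a plain comparison of 3-bit levels. A minimal sketch, with function names chosen here for illustration only:

```python
def highest_pending_level(pending_levels):
    """exc_lev: the highest priority among pending requests (0 if none)."""
    return max(pending_levels, default=0)


def exception_accepted(exc_req, exc_lev, int_mask):
    """exc := exc_req AND (exc_lev<2..0> > INT<2..0>)."""
    return bool(exc_req) and (exc_lev & 0b111) > (int_mask & 0b111)
```

A request at the same level as the current mask is not accepted, which matches the rule that exceptions with priority less than or equal to the current one are ignored.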


## Exceptions are Sensed Before Fetching Next Instruction

```
Instruction_interpretation := (
  Run ∧ ¬(Reset ∨ exc) → (IR ← Mw[PC]: PC ← PC + 2);        Normal execution state
  Reset → (INT⟨2..0⟩ ← 7: S ← 1: T ← 0:                      Machine reset
           SSP ← Ml[0]: PC ← Ml[4]:
           Reset ← 0: Run ← 1);
  Run ∧ ¬Reset ∧ exc → (SSP ← SSP − 4; Ml[SSP] ← PC;         Exception handling
           SSP ← SSP − 2; Mw[SSP] ← Status;
           S ← 1: T ← 0: INT⟨2..0⟩ ← exc_lev⟨2..0⟩:
           PC ← Ml[vect⟨7..0⟩#00₂]);
  instruction_execution )
```

- Reset loads the stack pointer from location 0 and starts the computer at the address held in location 4
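The reset clause can be read as a function from memory contents to initial processor state. This is a sketch under the assumption that memory is a big-endian byte sequence; the dictionary keys are illustrative names, not part of the RTN.

```python
def reset_state(memory):
    """Initial state after Reset, following the RTN reset clause."""
    return {
        "INT": 7,                                   # mask all interrupt levels
        "S": 1,                                     # supervisor mode
        "T": 0,                                     # tracing off
        "SSP": int.from_bytes(memory[0:4], "big"),  # SSP <- Ml[0]
        "PC": int.from_bytes(memory[4:8], "big"),   # PC  <- Ml[4]
        "Run": 1,
    }
```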


## Memory Mapped I/O

- No separate I/O space. Part of the CPU memory space is reserved for I/O instead of RAM or ROM.
- Example: MC68000 has a total 24-bit address space. Suppose the top 32K is reserved for I/O:


Notice that the top 32K can be addressed by a negative 16-bit value.
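The reason a negative 16-bit value reaches the top 32K is sign extension: a 16-bit address word with its top bit set extends to all ones in the upper address bits. A small sketch (the function name is an assumption for illustration):

```python
def short_absolute_address(word16):
    """Sign-extend a 16-bit address word and keep the 24 address bits."""
    value = word16 & 0xFFFF
    if value & 0x8000:          # negative as a 16-bit two's complement value
        value -= 0x10000
    return value & 0xFFFFFF     # MC68000 address space is 24 bits wide
```

Words 0x8000 through 0xFFFF therefore map onto 0xFF8000 through 0xFFFFFF, the top 32K of the address space, while 0x0000 through 0x7FFF map onto the bottom 32K.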

## Memory Mapped I/O in the MC68000

- Memory mapped I/O allows the microprocessor chip to have one bus for both memory and I/O
- Multiple wires for both address and data
- I/O uses address space that could otherwise contain memory
- Not popular with machines having limited address bits
- Sizes of I/O \& memory "spaces" independent
- Many or few I/O devices may be installed
- Much or little memory may be installed
- Spaces are separated by putting I/O at top end of the address space


## The SPARC (Scalable Processor Architecture) as a RISC Microprocessor Architecture

- The SPARC is a general register, Load/Store architecture
- It has only two addressing modes. Address =
- (Reg + Reg), or (Reg + 13-bit constant)
- Instructions are all 32 bits in length
- SPARC has 69 basic instructions
- Separate floating point register set
- First implementation had a 4 stage pipeline
- Some important features not inherently RISC
- Register windows: separate but overlapping register sets available to calling and called routines
- 32 bit address, big-endian organization of memory

## Fig. 3.9 The SPARC Processor State


 Copyright © 2004 Pearson Prentice Hall, Inc.


## Fig. 3.10 Register Windows: an Important Concept in SPARC


[Figure: register windows, with the current window selected by $CWP = N$.]


## SPARC Memory

RTN for the SPARC memory:
$$
\begin{aligned}
& \mathrm{Mb}[0..2^{32}-1]\langle 7..0\rangle: && \text{Byte memory;} \\
& \mathrm{Mh}[a]\langle 15..0\rangle := \mathrm{Mb}[a]\langle 7..0\rangle \# \mathrm{Mb}[a+1]\langle 7..0\rangle: && \text{Halfword memory;} \\
& \mathrm{M}[a]\langle 31..0\rangle := \mathrm{Mh}[a]\langle 15..0\rangle \# \mathrm{Mh}[a+2]\langle 15..0\rangle: && \text{Word memory.}
\end{aligned}
$$


## Register Windows Format the General Registers

- 32 general integer and address registers are accessible at any one time
- Global registers G0..G7 are not in any window
- G0 is always zero: writes to G0 are ignored, reads return 0
- The other 24 are in a movable window from a total set of 120
- On subroutine call, the starting point changes so that 8-15 before call become 24-31 after
- Regs. 24-31 are used for incoming parameters
- Regs. 8-15 are for outgoing parameters
- Current Window Pointer CWP locates reg. 8
- Overflow of reg. space causes trap


## SAVE, RESTORE and the Current Window Pointer

- CWP selects the window whose registers are currently called R8..R31
- SAVE moves it so that the new window overlaps the old R8..R15
- This makes the old R8..R15 into the new R24..R31
- If parameters are placed in R8..R15 by the caller, the callee can get them from R24..R31
- When all windows are used, SAVE traps to a routine that saves registers to memory
- Windows wrap around in the available registers
- Window overflow "spills" the first window \& reuses its space
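Window overlap is easiest to see as a mapping from (CWP, register number) to a physical register. The sketch below assumes the SPARC convention that R8..R15 are the outgoing registers, R16..R23 the locals, and R24..R31 the incoming registers, and that SAVE decrements CWP modulo the number of windows. The function names and the `nwindows` parameter are illustrative assumptions.

```python
def save(cwp, nwindows):
    """SAVE: move to the next window (CWP decrements, wrapping around)."""
    return (cwp - 1) % nwindows


def restore(cwp, nwindows):
    """RESTORE: return to the caller's window."""
    return (cwp + 1) % nwindows


def physical_register(cwp, r, nwindows):
    """Map register number r (0..31) in window cwp to a physical index."""
    if r < 8:                       # globals are outside every window
        return r
    if r < 16:                      # outs: shared with the ins of window cwp-1
        return 8 + ((cwp - 1) % nwindows) * 16 + (r - 8)
    if r < 24:                      # locals: private to this window
        return 8 + cwp * 16 + 8 + (r - 16)
    return 8 + cwp * 16 + (r - 24)  # ins
```

Each window adds only 16 new registers, since 8 of its 24 are shared with a neighbor, and the modulo arithmetic is what makes the windows wrap around.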


## SPARC Operand Addressing

- One mode computes address as sum of 2 registers; G0 gives zero if used
- The other mode adds sign extended 13 bit constant to a register
- These can serve several purposes
- Indexed: base in one reg., index in another
- Register indirect: G0+Gn
- Displacement: Gn+const, $\mathrm{n} \neq 0$
- Absolute: G0+const.
- Absolute addressing can only reach the bottom or top 4K bytes of memory
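The 4K limit on absolute addressing follows from the width of the constant. A sketch of the address computation, with register 0 always reading as zero; the function names and the list used as a register file are assumptions for illustration:

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(regs, i, rs1, rs2=0, simm13=0):
    """i = 0: reg + reg.  i = 1: reg + sign-extended 13-bit constant."""
    base = 0 if rs1 == 0 else regs[rs1]          # register 0 always gives zero
    if i == 0:
        offset = 0 if rs2 == 0 else regs[rs2]
    else:
        offset = sign_extend(simm13, 13)         # range -4096 .. 4095
    return (base + offset) & MASK32
```

With register 0 as the base, non-negative constants reach addresses 0 through 0xFFF and negative ones reach 0xFFFFF000 through 0xFFFFFFFF, which are the bottom and top 4K bytes.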

## RTN for SPARC Instruction Formats

$$
\begin{aligned}
& \operatorname{op}\langle 1..0\rangle := \operatorname{IR}\langle 31..30\rangle: && \text{Instruction class, op code for format 1;} \\
& \operatorname{disp30}\langle 29..0\rangle := \operatorname{IR}\langle 29..0\rangle: && \text{Word displacement for call, format 1;} \\
& \mathrm{a} := \operatorname{IR}\langle 29\rangle: && \text{Annul bit for branches, format 2a;} \\
& \operatorname{cond}\langle 3..0\rangle := \operatorname{IR}\langle 28..25\rangle: && \text{Branch condition select, format 2a;} \\
& \operatorname{rd}\langle 4..0\rangle := \operatorname{IR}\langle 29..25\rangle: && \text{Destination register for formats 2b \& 3;} \\
& \operatorname{op2}\langle 2..0\rangle := \operatorname{IR}\langle 24..22\rangle: && \text{Op code for format 2;} \\
& \operatorname{disp22}\langle 21..0\rangle := \operatorname{IR}\langle 21..0\rangle: && \text{Constant for branch displacement or sethi;} \\
& \operatorname{op3}\langle 5..0\rangle := \operatorname{IR}\langle 24..19\rangle: && \text{Op code for format 3;} \\
& \operatorname{rs1}\langle 4..0\rangle := \operatorname{IR}\langle 18..14\rangle: && \text{Source register 1 for format 3;} \\
& \operatorname{opf}\langle 8..0\rangle := \operatorname{IR}\langle 13..5\rangle: && \text{Sub-op code for floating point, format 3a;} \\
& \mathrm{i} := \operatorname{IR}\langle 13\rangle: && \text{Immediate operand indicator, formats 3b \& c;} \\
& \operatorname{simm13}\langle 12..0\rangle := \operatorname{IR}\langle 12..0\rangle: && \text{Signed immediate operand for format 3c;} \\
& \operatorname{rs2}\langle 4..0\rangle := \operatorname{IR}\langle 4..0\rangle: && \text{Source register 2 for format 3b.}
\end{aligned}
$$
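Each RTN field is a bit slice of the 32-bit instruction register, so the whole set can be extracted with one helper. The names `bits` and `decode_fields` are assumptions for illustration; the bit ranges are the ones given in the RTN definitions.

```python
def bits(word, hi, lo):
    """Return word<hi..lo> as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(ir):
    """Extract every field; which ones are meaningful depends on the format."""
    return {
        "op": bits(ir, 31, 30),
        "disp30": bits(ir, 29, 0),
        "a": bits(ir, 29, 29),
        "cond": bits(ir, 28, 25),
        "rd": bits(ir, 29, 25),
        "op2": bits(ir, 24, 22),
        "disp22": bits(ir, 21, 0),
        "op3": bits(ir, 24, 19),
        "rs1": bits(ir, 18, 14),
        "opf": bits(ir, 13, 5),
        "i": bits(ir, 13, 13),
        "simm13": bits(ir, 12, 0),
        "rs2": bits(ir, 4, 0),
    }
```

Because the fields overlap (for example `a` and `cond` occupy the same bits as `rd`), the `op`, `op2`, and `op3` values must be examined first to decide which of the other fields apply.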

## Fig. 3.11 SPARC Instruction Formats

| Format | Bits 31..30 | Bits 29..0 |
| :--- | :--- | :--- |
| 1 | 01 | disp30 |

| Format | Bits 31..30 | Bits 29..25 | Bits 24..22 | Bits 21..0 |
| :--- | :--- | :--- | :--- | :--- |
| 2a | 00 | a, cond | op2 | disp22 |
| 2b | 00 | rd | op2 | disp22 |

Format 3 variations: 3a. Floating point; 3b. Data movement; 3c. ALU.

- Three basic formats with variations



## RTN For SPARC Addressing Modes

$$
\begin{aligned}
& \operatorname{adr}\langle 31..0\rangle := (\mathrm{i}=0 \rightarrow \mathrm{r}[\mathrm{rs1}]+\mathrm{r}[\mathrm{rs2}]: \\
& \qquad \mathrm{i}=1 \rightarrow \mathrm{r}[\mathrm{rs1}]+\operatorname{simm13}\langle 12..0\rangle\ \{\text{sign ext.}\}): && \text{Address for load, store, and jump;} \\
& \operatorname{calladr}\langle 31..0\rangle := \mathrm{PC}\langle 31..0\rangle+\operatorname{disp30}\langle 29..0\rangle \# 00_2: && \text{Call relative address;} \\
& \operatorname{bradr}\langle 31..0\rangle := \mathrm{PC}\langle 31..0\rangle+\operatorname{disp22}\langle 21..0\rangle \# 00_2\ \{\text{sign ext.}\}: && \text{Branch address.}
\end{aligned}
$$
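Appending two zero bits to a displacement is the same as multiplying it by 4, so both target addresses are word aligned. A sketch of the call and branch address computations, with function names chosen here for illustration:

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def call_address(pc, disp30):
    """calladr := PC + disp30#00: 30 + 2 bits fill the whole word."""
    return (pc + ((disp30 & 0x3FFFFFFF) << 2)) & MASK32


def branch_address(pc, disp22):
    """bradr := PC + sign-extended (disp22#00)."""
    return (pc + (sign_extend(disp22, 22) << 2)) & MASK32
```

A call needs no sign extension because the shifted 30-bit displacement already spans 32 bits and the addition wraps modulo 2^32, so any word-aligned address is reachable. A branch can reach only about 8 MB in either direction.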

## RTN For SPARC Instruction Interpretation

$$
\text{instruction\_interpretation} := (\mathrm{IR} \leftarrow \mathrm{M}[\mathrm{PC}];\ \text{instruction\_execution};\ \text{update\_PC\_and\_nPC};\ \text{instruction\_interpretation}):
$$

## Tbl. 3.8 SPARC Data Movement Instructions

| Inst. | Op. | OPCODE | Meaning |
| :---: | :---: | :---: | :--- |
| ldsb | 11 | 001001 | Load signed byte |
| ldsh | 11 | 001010 | Load signed halfword |
| ldsw | 11 | 001000 | Load signed word |
| ldub | 11 | 000001 | Load unsigned byte |
| lduh | 11 | 000010 | Load unsigned halfword |
| ldd | 11 | 000011 | Load doubleword |
| stb | 11 | 000101 | Store byte |
| sth | 11 | 000110 | Store halfword |
| stw | 11 | 000100 | Store word |
| std | 11 | 000111 | Store double word |
| swap | 11 | 001111 | Swap register with memory |
| or | 10 | 000010 | Rdst $\leftarrow$ Rsrc1 OR Rsrc2 (or immediate) |
| sethi | 00 | Op2=100 | High order 22 bits of Rdst $\leftarrow$ disp22 |

## Register and Immediate Moves in the SPARC

- OR is used with a G0 operand to do register to register moves
- To load a register with a 32 bit constant, a 2 instruction sequence is used
SETHI \#upper22, R17
OR R17, \#lower10, R17
- Double words are loaded into an even register and the next higher odd one
- Floating point instructions are not covered, but the 32 FP registers can hold 32 single length numbers, 16 64-bit FP numbers, or 8 128-bit FP numbers
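The two-instruction sequence works because 22 + 10 = 32: SETHI supplies the upper 22 bits and clears the lower 10, and OR fills in the rest. A sketch with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def split_constant(value):
    """Return (upper22, lower10) for a 32-bit constant."""
    value &= MASK32
    return value >> 10, value & 0x3FF


def sethi_then_or(upper22, lower10):
    reg = (upper22 << 10) & MASK32   # SETHI: high 22 bits set, low 10 cleared
    return reg | (lower10 & 0x3FF)   # OR: fill in the low 10 bits
```

The 10-bit low part always fits in the 13-bit signed immediate field as a non-negative value, so the OR never disturbs the upper bits.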


## Tbl. 3.9 Typical SPARC Arithmetic Instructions

| Inst. | OPCODE | Meaning |
| :---: | :---: | :---: |
| add | 0X0000 | Add or add and set condition codes |
| addc | 0X1000 | Add with carry: set CCs or not |
| sub | 0X0100 | Subtract: set CCs or not |
| subc | 0X1100 | Subtract with borrow: set CCs or not |
| mulscc | 100100 | Do one step of multiply |

- All are format 3, Op = 10
- CCs are set if $X=1$ and not if $X=0$
- Both register and immediate forms are available
- Multiply is done by software using MULSCC or using floating point instructions
- Multiply is hard to do in one clock but multiply step is not
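The idea of building a multiply from single-cycle steps can be illustrated with the classic shift-and-add scheme. This is a generic sketch, not the exact semantics of MULSCC (which also involves the condition codes and a separate multiplier register); the function names are assumptions.

```python
def multiply_step(acc, multiplier, multiplicand):
    """One step: conditionally add, then shift both operands by one bit."""
    if multiplier & 1:
        acc += multiplicand
    return acc, multiplier >> 1, multiplicand << 1


def multiply(a, b, steps=32):
    """Unsigned 32 x 32 -> 64-bit product from `steps` simple steps."""
    acc = 0
    multiplier = b & 0xFFFFFFFF
    multiplicand = a & 0xFFFFFFFF
    for _ in range(steps):
        acc, multiplier, multiplicand = multiply_step(acc, multiplier, multiplicand)
    return acc & 0xFFFFFFFFFFFFFFFF
```

Each step is only a test, an add, and a shift, which fits in one clock; the full multiply costs one step per multiplier bit.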


## C

## S Tbl. 3.10 SPARC Logical and Shift Instructions

| Inst. | OPCODE | Meaning |
| :--- | :--- | :--- |
| AND | 0S0001 | AND, set CCs if $S=1$ or not if $S=0$ |
| ANDN | 0S0101 | AND with complement of second operand, set CCs or not |
| OR | 0S0010 | OR, set CCs or not |
| ORN | 0S0110 | OR with complement of second operand, set CCs or not |
| XOR | 0S0011 | XOR, set CCs or not |
| SLL | 100101 | Shift left logical, count in RSRC2 or imm13 |
| SRL | 100110 | Shift right logical, count in RSRC2 or imm13 |
| SRA | 100111 | Shift right arithmetic, count as above |

- All instructions use format 3 with op=10
- Both register and immediate forms are available
- Condition codes set if $S=1$ \& undisturbed if $S=0$
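The less familiar operations can be pinned down on 32-bit values. In this sketch ANDN and ORN complement the second operand before combining, and the shifts use only the low 5 bits of the count, as in SPARC Version 8; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def andn(a, b):
    return a & ~b & MASK32            # a AND (NOT b)


def orn(a, b):
    return (a | ~b) & MASK32          # a OR (NOT b)


def sll(a, count):
    return (a << (count & 31)) & MASK32


def srl(a, count):
    return (a & MASK32) >> (count & 31)   # zeros shifted in from the left


def sra(a, count):
    a &= MASK32
    signed = a - (1 << 32) if a & 0x80000000 else a
    return (signed >> (count & 31)) & MASK32   # sign bit replicated
```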


## Tbl. 3.11 SPARC Branch and Control Instructions

| Inst. | Fmt. | Op | OPCODE or Op2 | Meaning |
| :---: | :---: | :---: | :---: | :---: |
| ba | 2 | 00 | 010 | Unconditional branch |
| bcc | 2 | 00 | 010 | Conditional branch |
| call | 1 | 01 |  | Call \& save PC in R15 |
| jmpl | 3 | 10 | 111000 | Jmp to EA, save PC in Rdst |
| save | 3 | 10 | 111100 | New register window, \& ADD |
| restore | 3 | 10 | 111101 | Restore reg. window, \& ADD |

Some condition fields:

| Inst. | COND | Inst. | COND | Inst. | COND | Inst. | COND |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ba | 1000 | bne | 1001 | be | 0001 | ble | 0010 |
| bcc | 1101 | bcs | 0101 | bneg | 0110 | bvc | 1111 |
| bvs | 0111 |  |  |  |  |  |  |
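Each condition field selects a test on the integer condition codes N, Z, V, and C. The sketch below covers only the nine encodings listed in the table; the dictionary and function names are assumptions for illustration.

```python
# cond field -> (mnemonic, predicate on the condition codes)
CONDITIONS = {
    0b1000: ("ba",   lambda n, z, v, c: True),
    0b1001: ("bne",  lambda n, z, v, c: not z),
    0b0001: ("be",   lambda n, z, v, c: z),
    0b0010: ("ble",  lambda n, z, v, c: z or (n != v)),
    0b1101: ("bcc",  lambda n, z, v, c: not c),
    0b0101: ("bcs",  lambda n, z, v, c: c),
    0b0110: ("bneg", lambda n, z, v, c: n),
    0b1111: ("bvc",  lambda n, z, v, c: not v),
    0b0111: ("bvs",  lambda n, z, v, c: v),
}


def branch_taken(cond, n, z, v, c):
    """Return True if the branch with this cond field is taken."""
    _name, predicate = CONDITIONS[cond]
    return bool(predicate(n, z, v, c))
```

Note that the top bit of the field inverts the sense of the test given by the lower three bits: `bne` is `be` with bit 3 set, `bcc` is `bcs` with bit 3 set, and `bvc` is `bvs` with bit 3 set.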



| Label | Instruction | Operands | Comment |
| :--- | :--- | :--- | :--- |
|  | .begin |  |  |
|  | .org |  |  |
| prog1: | ldw | [x], %r1 | ! load a word from M[x] into register %r1 |
|  | ldw | [y], %r2 | ! load a word from M[y] into register %r2 |
|  | addcc | %r1, %r2, %r3 | ! %r3 $\leftarrow$ %r1 + %r2; set CCs |
|  | st | %r3, [z] | ! store sum into M[z] |
|  | jmpl | %r15+8, %r0 | ! return to caller |
|  | nop |  | ! branch delay slot |
| x: | 15 |  | ! reserve storage for x, y, and z |
| y: | 9 |  |  |
| z: | 0 |  |  |

Note the different syntax for SPARC.
Note that r15 contains the return address, placed there by the OS in this case.
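The effect of `addcc` on the data above can be checked with a small model of a 32-bit add that also produces the four condition codes. The function name and the dictionary of flags are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def addcc(a, b):
    """32-bit add returning (result, condition codes)."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    flags = {
        "N": result >> 31,                            # sign bit of the result
        "Z": int(result == 0),                        # result is zero
        "V": ((a ^ result) & (b ^ result)) >> 31,     # signed overflow
        "C": total >> 32,                             # carry out of bit 31
    }
    return result, flags
```

For the program's values, 15 + 9 leaves 24 in the destination with all four condition codes clear.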


## Pipelining of the SPARC Architecture

- Many aspects of the SPARC design are in support of a pipelined implementation
- Simple addressing modes, simple instructions, delayed branches, load/store architecture
- Simplest form of pipelining is fetch/execute overlap: fetching the next inst. while executing the current inst.
- Pipelining breaks inst. processing into steps
- A step of one instruction overlaps different steps for others
- A new inst. is started (issued) before previously issued instructions are complete
- Instructions guaranteed to complete in order


## Fig. 3.14 The SPARC MB86900 Pipeline

Clock Cycle

|  | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Instr. 1 | Fetch | Dec. | Exec. | Write |  |  |  |
| Instr. 2 |  | Fetch | Dec. | Exec. | Write |  |  |
| Instr. 3 |  |  | Fetch | Dec. | Exec. | Write |  |
| Instr. 4 |  |  |  | Fetch | Dec. | Exec. | Write |

- 4 pipeline stages are Fetch, Decode, Execute, and Write
- Results are written to registers in Write stage
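The timing diagram follows one rule: with no stalls, instruction i (counting from 1) is in stage s (counting from 0) during clock cycle i + s. A sketch that regenerates the table; the function names are assumptions for illustration.

```python
STAGES = ["Fetch", "Dec.", "Exec.", "Write"]


def pipeline_schedule(n_instructions):
    """Map instruction number -> {clock cycle: stage name}, assuming no stalls."""
    return {
        i: {i + s: STAGES[s] for s in range(len(STAGES))}
        for i in range(1, n_instructions + 1)
    }


def total_cycles(n_instructions):
    """Cycles to finish n instructions: fill the pipeline once, then one per cycle."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1
```

Four instructions finish in 7 cycles rather than the 16 needed without overlap, and every additional instruction adds only one cycle.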


## Pipeline Hazards

- Will be discussed later, but main issue is:
- A branch or jump changes the PC as late as Exec. or Write, but the next inst. has already been fetched
- One solution is 'Delayed Branch'
- One (maybe 2) instruction following branch is always executed, regardless of whether branch is taken
- SPARC has a delayed branch with one 'delay slot', but also allows the delay slot instruction to be annulled (have no effect on the machine state) if the branch is not taken
- Registers to be written by one instruction may be needed by another already in the pipeline, before the update has happened (Data Hazard)


## CISC vs. RISC: Recap

- CISCs supply powerful instructions tailored to commonly used operations, stack operations, subroutine linkage, etc.
- RISCs require more instructions to do the same job
- CISC instructions take varying lengths of time
- RISC instructions can all be executed in the same few-cycle pipeline
- RISCs should be able to finish (nearly) one instruction per clock cycle


## Key Concepts: RISC vs. CISC

- While a RISC machine may possibly have fewer instructions than a CISC, the instructions are always simpler. Multi-step arithmetic operations are confined to special units.
- Like all RISCs, the SPARC is a load/store machine. Arithmetic operates only on values in registers.
- A few, regular, instruction formats and limited addressing modes make instruction decode and operand determination fast.
- Branch delays are quite typical of RISC machines and arise from the way a pipeline processes branch instructions.
- The SPARC does not have a load delay, which some RISCs do, and does have register windows, which many RISCs do not.


## Chapter Summary

- Machine price and performance are the driving forces.
- Performance can be measured in many ways: MIPS, execution time, Whetstone, Dhrystone, SPEC benchmarks.
- CISC machines need fewer instructions for a given job because each instruction does more.
- Instruction word length may vary widely
- Addressing modes encourage memory traffic
- CISC instructions are hard to map onto modern architectures
- RISC machines usually have
- One word per instruction
- Load/store memory access
- Simple instructions and addressing modes
- These allow higher clock rates, prefetching, etc.

