Microprocessors and Microcontrollers (part 2)

(cont from part 1)

7. Using Assembly Language

The size of the instruction set and the types of instructions depend on the category of microprocessor, accumulator based or register based.

7.1 Accumulator-Based Microprocessors

Accumulator-based microprocessors have an accumulator register that is involved in the majority of their internal operations. Therefore, most of the instructions in the instruction set refer to the accumulator. The 6502 is an example of an accumulator-based microprocessor. The purpose of each of the 6502's internal registers is predefined by the manufacturer; therefore, these are special purpose registers, in contrast to general purpose registers, whose use the programmer can define. Because many instructions perform their operations in relation to the accumulator, the instruction set of these microprocessors is smaller.

7.2 Register-Based Microprocessors

The internal register bank of register-based microprocessors consists of both general purpose and special purpose registers. The programmer is free to use general purpose registers as appropriate to the specific case. For example, the Z80's A, B, C, D, E, H, and L registers are all general purpose. B and C may be used individually or as the pair BC, as may D and E (the pair DE) and H and L (the pair HL).

In general, when the number of registers in a microprocessor increases, the instruction set grows in size. Therefore, the Z80, a register-based microprocessor, has a bigger instruction set than the 6502 or M6800, which are accumulator based.

7.3 Mnemonics and the Instruction Set

A mnemonic represents a long instruction. For example, the mnemonic LDA means "load accumulator with a specified data byte." The set of all commands to which the microprocessor responds is known as its instruction set, which consists of all the mnemonics and their operational codes. A typical instruction consists of two parts: the op code (operational code) and the operand. The op code specifies the operation to be carried out (e.g., LDA, TAX, INC). The operand specifies the location of the data to carry out this operation. Therefore, it depends on the addressing mode used in the instruction.

The instructions of any microprocessor can be broadly classified into three groups: data transfer instructions, arithmetic and logic instructions, and test and branch instructions.

7.3.1 Data Transfer Instructions

Data transfer instructions help the programmer to transfer data to or between internal registers, registers and memory, registers and input/output devices, or memory to memory. Data can be transferred between a microprocessor's internal registers at high speed. Short, single-byte instructions are available for this task.

Some examples are TAX (transfer accumulator to register X) of the 6502 and LD B, A (load register B from the accumulator) of the Z80. Memory to register instructions transfer data from a memory location to an internal register and vice versa. Table 3 provides a few examples from a 6502 environment.

In register-based microprocessors, I/O operations are isolated from memory operations and use special instructions. This is called the isolated I/O method.

Table 4 lists some examples.

===

TABLE 3 Examples of Memory to Register Transfers for the 6502 Processor

Register | Mnemonic | Typical Instruction | Remarks
A | LDA | LDA 50H | Loading A from memory location 0050H
X | LDX | LDX 80H | Loading X from memory location 0080H
Y | STY | STY F0H | Storing Y in memory location 00F0H

===

TABLE 4 Examples of Register and I/O Device Transfers with Isolated I/O

Register | Mnemonic | Typical Instruction | Remarks
A | OUT | OUT A, 02H | Sending the contents of A to port 02H
A | IN | IN A, 03H | Transferring a byte from port 03H to A

===

TABLE 5 Examples of Register and I/O Device Transfers with Memory Mapped I/O

Register | Mnemonic | Typical Instruction | Remarks
A | STA | STA 02H | Sending the contents of A to port 02H
A | LDA | LDA 03H | Transferring a byte from port 03H to A

===

Accumulator-based microprocessors, by contrast, use a memory-mapped I/O method, in which there is no differentiation between memory operations and I/O operations. The same instructions are used for both types of operations, and ports are treated as memory locations. Table 5 lists some examples.

In this format 0002 and 0003 may be memory locations or I/O ports, so a port and a memory location cannot have the same address, unlike in the isolated I/O method. Therefore, a portion of the address map needs to be reserved for I/O ports.
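The address-sharing idea behind memory-mapped I/O can be illustrated with a small Python model. This is only a sketch: the class name `MemoryMappedBus` and its methods are hypothetical, and a real system decodes addresses in hardware rather than in software.

```python
class MemoryMappedBus:
    """Hypothetical model of one address space shared by RAM and I/O ports."""

    def __init__(self, port_addresses):
        # Addresses reserved for ports; they can no longer be used as RAM.
        self.ports = {address: 0 for address in port_addresses}
        self.ram = {}

    def write(self, address, value):
        # The same store operation (e.g. STA) reaches a port or RAM,
        # depending only on the address.
        if address in self.ports:
            self.ports[address] = value & 0xFF
        else:
            self.ram[address] = value & 0xFF

    def read(self, address):
        # The same load operation (e.g. LDA) reads a port or RAM.
        if address in self.ports:
            return self.ports[address]
        return self.ram.get(address, 0)


bus = MemoryMappedBus({0x0002, 0x0003})
bus.write(0x0002, 0x55)   # behaves like STA 02H: goes to the port
bus.write(0x0050, 0x12)   # behaves like STA 50H: goes to RAM
```

Because the port addresses are carved out of the ordinary address map, a write to 0002H never lands in RAM, which is why that portion of the map has to be reserved.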

Memory to memory instructions transfer data bytes directly from one memory location to another. They take more time to execute. For instance, the MOV (Move) instruction of the Motorola CPU08 can transfer data directly from memory to memory.

7.3.2 Arithmetic and Logic Instructions

Five subcategories can be identified in this group of instructions: arithmetic instructions, logic instructions, shift and rotate instructions, increment/decrement instructions, and comparisons.

7.3.2.1 Arithmetic Instructions

To explain the arithmetic instructions, let us take some examples for 6502 and Z80 instruction sets.

The 6502 has no direct addition or subtraction, only ADC (add with carry) and SBC (subtract with carry). An ADC instruction adds the contents of register A to the contents of the selected memory location plus the value of the C flag. A simple method of expressing the steps of an instruction is to use a symbolic expression:

A + M + C flag → A

This indicates that the accumulator contents, the contents of the selected memory location, and the C flag are added to obtain the result; then the result is transferred back to the accumulator. Therefore, the programmer should make sure that the C flag is 0 prior to doing an 8-bit addition. For this purpose the CLC (clear carry flag) instruction is available. The ADC should follow the CLC when doing an 8-bit addition. When adding 16-bit numbers, CLC is needed only before the low-byte addition; the high-byte ADC must use the carry produced by the low bytes, so no CLC is placed before it.
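The effect of the carry flag on ADC can be checked with a short Python model. This is a sketch of binary-mode addition only; the function names `adc` and `add16` are illustrative and not part of any 6502 tool. In the 16-bit case the carry is cleared once before the low bytes and then left alone, so that it propagates into the high bytes.

```python
def adc(a, m, carry):
    """Model A + M + C -> A for 8-bit values; returns (result, carry_out)."""
    total = a + m + carry
    return total & 0xFF, 1 if total > 0xFF else 0


def add16(x, y):
    """Add two 16-bit values one byte at a time, low byte first."""
    lo, carry = adc(x & 0xFF, y & 0xFF, 0)    # CLC, then ADC on the low bytes
    hi, carry = adc(x >> 8, y >> 8, carry)    # ADC on the high bytes, carry kept
    return (hi << 8) | lo, carry


# A stale carry of 1 makes an 8-bit sum one too large, hence the CLC.
clean_sum, _ = adc(0x50, 0x30, 0)
stale_sum, _ = adc(0x50, 0x30, 1)
```

Here `clean_sum` is 80H while `stale_sum` is 81H, which is the error CLC guards against.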

Microprocessors like the 6800 and Z80 have a direct addition instruction (ADD, without carry) as well. In the Z80 and 8080 family, the register pair HL is used as a memory pointer: HL must point to the memory location on which the operation is to be carried out. In the Z80, the LD instruction (in the format LD HL, address) loads HL with the necessary 16-bit address:

LD HL, 2400H ; load HL with the word 2400H
ADD A, (HL) ; add the byte in the memory location pointed at by HL to the number in the accumulator

Similarly, when subtracting 8-bit numbers with the 6502, the programmer should remember to set the carry flag beforehand using the SEC (set carry flag) command. The SBC (subtract with carry) command treats the complement of the carry flag as a borrow, so its operation can be symbolized as A - M - (1 - C) → A. Therefore, the SEC should precede SBC when doing 8-bit subtraction, so that no borrow is taken. Table 6 shows how several microprocessors handle some common instructions.
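The role of SEC can be seen in a small Python model of SBC. This sketch assumes binary mode and relies on the fact that the 6502 subtracts the complement of the carry flag as a borrow; the function name `sbc` is illustrative only.

```python
def sbc(a, m, carry):
    """Model 6502 SBC: A - M - (1 - C) -> A; returns (result, carry_out).

    carry_out is 1 when no borrow was needed and 0 when a borrow occurred.
    """
    diff = a - m - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


with_sec, _ = sbc(0x50, 0x20, 1)      # carry set by SEC: plain subtraction
without_sec, _ = sbc(0x50, 0x20, 0)   # carry clear: result is one too small
```

With the carry set the result is 30H; with the carry clear it is 2FH, which is why SEC precedes a stand-alone SBC.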

7.3.2.2 Logic Instructions

Logic instructions include AND, OR, and Exclusive OR operations.

Table 7 summarizes different formats used by three typical microprocessors for these logic instructions.

TABLE 6 Comparison of Formats for Arithmetic Instructions

TABLE 7 Comparison of Formats in Logic Instructions

A bit-by-bit logic AND operation is performed by the processor on 2 bytes.

Furthermore, an AND operation can be used to mask unwanted bit positions in a byte. For example,

LDA 2400H
AND #00001111B
STA 2400H
BRK

The byte at address 2400H is not known. But, after this operation, the upper 4 bits become 0. These positions get masked:

Assume (2400H) = 11011110
Mask           = 00001111
Result         = 00001110

A bit-by-bit logic OR operation is performed by the processor on 2 bytes.

Furthermore, an OR operation can be used to "force" logic ones into certain bit positions in a byte. For example,

LDA PORTA
ORA #00000001B
STA PORTA
BRK

This will set the LSB in the byte read from PORTA without affecting other bit positions.

A bit-by-bit exclusive OR (EOR or XOR) operation is performed on 2 bytes, which is useful for comparisons. If the 2 bits in a position are not equal, the result bit is set. This operation can be used for checking bit positions in ports or memory locations. In the 6502, it also can be used to complement a number (an exclusive OR with 11111111B inverts every bit), since the 6502 has no direct complement or negate command.
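The three uses of the logic instructions described above (masking with AND, forcing bits with OR, and comparing or complementing with exclusive OR) can be reproduced with Python's bitwise operators. The function names are illustrative only.

```python
def mask_upper_nibble(value):
    """AND with 00001111B: clears the upper 4 bits, keeps the lower 4."""
    return value & 0b00001111


def force_lsb(value):
    """OR with 00000001B: sets the LSB, leaves the other bits unchanged."""
    return value | 0b00000001


def differing_bits(a, b):
    """Exclusive OR: a result bit is 1 wherever the two bytes differ."""
    return a ^ b


def complement(value):
    """Exclusive OR with 11111111B inverts every bit of the byte."""
    return value ^ 0xFF


masked = mask_upper_nibble(0b11011110)   # the worked example for address 2400H
```

The value of `masked` is 00001110B, matching the worked example, and `differing_bits` returns 0 only when the two bytes are identical.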

7.3.2.3 Shift and Rotate Instructions

Shift and rotate instructions include shift left, shift right, rotate left, and rotate right. In shift operations, all bits are shifted to the left or right by one bit position.

The bit falling out is sent to the carry flag. Table 8 summarizes different formats used by three typical microprocessors for these shift and rotate instructions.

In arithmetic shift operations, the sign bit is preserved during the shift. In logical shift operations, all the bits, including the sign bit, are shifted.
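The difference between shifts, rotates, and the arithmetic shift can be made concrete with a Python sketch on 8-bit values. Each function returns the new byte and the bit that falls out into the carry flag. The names are generic and are not tied to one processor's mnemonics; rotation through the carry is assumed.

```python
def shift_left(value):
    """Shift left: bit 7 falls into carry, 0 enters at bit 0."""
    return (value << 1) & 0xFF, (value >> 7) & 1


def logical_shift_right(value):
    """Logical shift right: bit 0 falls into carry, 0 enters at bit 7."""
    return value >> 1, value & 1


def arithmetic_shift_right(value):
    """Arithmetic shift right: bit 0 falls into carry, sign bit is kept."""
    return (value >> 1) | (value & 0x80), value & 1


def rotate_left(value, carry):
    """Rotate left through carry: old carry enters at bit 0."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1


def rotate_right(value, carry):
    """Rotate right through carry: old carry enters at bit 7."""
    return (value >> 1) | (carry << 7), value & 1
```

For a negative two's-complement byte such as 10000010B, the logical shift right gives 01000001B (sign lost) while the arithmetic shift right gives 11000001B (sign preserved).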

TABLE 8 Comparison of Formats in Shift and Rotate Instructions

TABLE 9 Comparison of Formats in Increment and Decrement Instructions

7.3.2.4 Increment and Decrement Instructions

Increment and decrement instructions allow us to increase or decrease the value of a selected register or a memory location by 1 (see Table 9).

TABLE 10 Comparison of Formats in Compare Instructions

7.3.2.5 Compare Instructions

The compare operation has no arithmetic result. It compares two given bytes and decides whether the first byte is less than, equal to, or greater than the second byte. The result of the comparison is stored in the flag register. In the 6502, the CMP (compare) instruction is used. For example,

LDA #50H ; load the accumulator with 50H
CMP 50H ; compare A with the value in address 0050H

After the operation, read the C and Z flags to find the result:

Z = 1 (set) ; the 2 numbers are equal
C = 1 ; A ≥ other number
C = 0 ; A < other number

But, to check the A > other number condition, we need to check two flags in sequence. First check the Z flag and then the C flag (i.e., Z = 0 and C = 1). In addition, the CPX (compare X register) and CPY (compare Y register) instructions compare the X and Y register contents with another number (see Table 10). For example,

To compare X register with the value in address 2401H : CPX 2401H
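How the flags encode the outcome of a 6502-style compare can be sketched in Python. The model below treats both bytes as unsigned, as the 6502 compare instructions do; the function names `compare` and `relation` are illustrative.

```python
def compare(register, operand):
    """Model CMP/CPX/CPY: set flags from register - operand, keep no result."""
    diff = (register - operand) & 0xFF
    return {
        "Z": int(register == operand),   # set when the two bytes are equal
        "C": int(register >= operand),   # set when register >= operand
        "N": diff >> 7,                  # bit 7 of the discarded difference
    }


def relation(flags):
    """Read Z first, then C, to tell equal, greater, and less apart."""
    if flags["Z"]:
        return "equal"
    return "greater" if flags["C"] else "less"
```

Checking Z before C is what separates the strictly greater case (Z = 0 and C = 1) from equality, where C is also 1.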

7.3.3 Test and Branch Instructions


FIG. 10 The typical flow of a conditional jump


FIG. 11 Flowchart for checking N flag to make the jump

TABLE 11 Comparison of Formats in Jump and Branch Instructions

The normal sequential flow of the program can be changed using jump or branch instructions. Jumps can be unconditional or conditional (sometimes called branches). Unconditional jumps can be considered "jump always" instructions. For example, assume that, at address 2400H, we write an instruction to jump unconditionally to address 3600H. The command in a 6502 will be JMP 3600H. The machine code for JMP 3600H is 4C,00,36.

A conditional jump checks a flag to decide whether or not to make the jump (see FIG. 10). When this condition is satisfied (i.e., a flag is set or cleared), the jump occurs.

These instructions allow us to select one of two alternative courses of action, depending on the result of the test. The most important flags and the instructions available to test them and make a branching decision in a 6502 microprocessor follow:

Flag | Instruction | Remarks
Z | BEQ | Branch if equal to zero (Z = 1)
Z | BNE | Branch if not equal to zero (Z = 0)
C | BCS | Branch if carry set (C = 1)
C | BCC | Branch if carry clear (C = 0)
N | BPL | Branch if plus (N = 0)
N | BMI | Branch if minus (N = 1) (see FIG. 11)

7.4 Addressing Modes

A microprocessor could have many different modes of addressing. The format of an instruction follows the pattern:

Instruction = OPCODE followed by OPERAND

The op code specifies the operation to be carried out. The operand indicates to the microprocessor where the data can be found in memory. The different ways of specifying the location of the data byte are called addressing modes. Some typical addressing modes are implied addressing (inherent/implicit addressing), immediate addressing, short addressing (zero page addressing/direct addressing), absolute addressing (extended addressing), indexed addressing, relative addressing, indirect addressing, or some combinations of these (e.g., indirect indexed addressing).

7.4.1 Implied Addressing

In implied addressing, the op code itself implies the location of the data.

These generally are single-byte instructions that operate on the internal registers of the microprocessor. TAX (transfer A to X) in the 6502 implies that the data byte is in register A. In INX (increment X register), it’s clear that the data byte is in the X register.

7.4.2 Immediate Addressing

In immediate addressing, the data byte immediately follows the op code. In the 6502, for example, the instruction LDA #24H is coded as

Code = | A9 | 24 |

To take another example, ADC #08H (see FIG. 12) is coded as

Code = | 69 | 08 |

This addressing mode can be used only if the data byte is known at the time of writing the program.


FIG. 12 Implementation of the ADC #08H instruction

7.4.3 Zero Page Addressing

In zero page addressing (short/direct), the op code is followed by an 8-bit address (short address):

Code = | Op code | Location |

Only the location is given, without a page number. The microprocessor then seeks the data in the given location on page 00. For example, ADC 25H is coded as

Code = | 65 | 25 |

The instruction takes the data byte in location 25H of page 0 and adds it to the accumulator. As another example, LDA 25H loads the accumulator with the data byte from the same location.

7.4.4 Absolute Addressing

In absolute (extended) addressing, the absolute address of the data byte is given in the instruction. The format is as follows:

Code = | Op code | Location | Page |

Consider an example instruction from the 6502 (see FIG. 13), ADC 0250H:

Code = | 6D | 50 | 02 |

In this example, the microprocessor searches location 50H of page 02 for the data. Therefore, in absolute addressing, the full address of the data should be specified by the programmer.

Consider the following further examples from the 6502. To rotate the content of address 20FFH by one bit position to the right, the mnemonic form is ROR 20FFH:

Code = | 6E | FF | 20 |

To compare the value of the X register with the value at address 2050H, the mnemonic form is CPX 2050H:

Code = | EC | 50 | 20 |
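The byte order in these codes (location first, then page) can be reproduced with a short Python helper. The function name `encode_absolute` is illustrative; the op code values are the ones quoted in the text.

```python
def encode_absolute(opcode, address):
    """Return the 3 bytes of a 6502 absolute instruction.

    The low byte of the address (the location) comes first and the
    high byte (the page) comes last.
    """
    location = address & 0xFF
    page = (address >> 8) & 0xFF
    return bytes([opcode, location, page])


adc_code = encode_absolute(0x6D, 0x0250)   # ADC 0250H
ror_code = encode_absolute(0x6E, 0x20FF)   # ROR 20FFH
cpx_code = encode_absolute(0xEC, 0x2050)   # CPX 2050H
```

Calling `adc_code.hex(" ")` gives the bytes in the same left-to-right order as the code boxes in the text.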


FIG. 13 Implementation of absolute addressing

7.4.5 Relative Addressing

Relative addressing is used to make a relative jump with respect to the current position of the program counter. It is used only by test and branch instructions. In a test and branch instruction, if the test fails, the program counter (PC) simply advances to the next instruction and no jump is made. If the test is successful and the condition is satisfied, the displacement (jump length) value is added to the PC and the jump is made. The syntax of the instruction is

| Op code | Displacement |

As an example, for the M6800, the code for the instruction BNE 12H is

| 26 | 12 |

First, the Z flag must be checked. Then, depending on the result, one of the two courses of action will be taken. FIG. 14 depicts relative addressing (a forward jump).

OFFSET = EFFECTIVE ADDRESS - PROGRAM COUNTER
= 0039H - 0027H
= 0012H


FIG. 14 Relative addressing, forward jump

Here, when the jump is made, the microprocessor expects to find a new instruction at the destination. The preceding example is a forward jump of 12H (18 decimal) locations. Next, consider a backward jump in the M6800 (see FIG. 15): check the Z flag and, if it is not set, jump backward 45 locations.

(45 Decimal = 2D Hex)
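The offset calculation, for forward and backward jumps alike, can be sketched in Python. The sketch assumes the displacement is a signed 8-bit two's-complement value measured from the program counter after the branch instruction has been fetched, which is the usual convention on 8-bit processors; the function names and the example addresses for the backward jump are illustrative.

```python
def displacement(target, pc):
    """Encode target - pc as an 8-bit two's-complement displacement byte."""
    offset = target - pc
    if not -128 <= offset <= 127:
        raise ValueError("target is out of range for a relative branch")
    return offset & 0xFF


def branch_target(pc, disp):
    """Decode a displacement byte and add it to the program counter."""
    signed = disp - 0x100 if disp & 0x80 else disp
    return (pc + signed) & 0xFFFF


forward = displacement(0x0039, 0x0027)    # the forward jump in the text
backward = displacement(0x00D3, 0x0100)   # 45 locations back from 0100H
```

The forward case yields 12H. A jump of 45 (2DH) locations backward is stored as the two's complement of 2DH, which is D3H.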

7.4.6 Indexed Addressing

Normally, the base address is stored in the index register and the offset is given in the instruction. But the 6502 does it another way. It gives the base address in the instruction and keeps the offset in the index register, since its index registers are only 8 bits wide. For example, LDA 6000, X is coded as

Code = | BD | 00 | 60 |

In the following sample program, indexed addressing is used to access successive memory locations in a memory table:


FIG. 15 Relative addressing, a backward jump

PROG1  LDX #16D
LOOP1  LDA 6000, X
       STA PORTA
       DEX
       BNE LOOP1
       BRK

In this program, 16 data bytes of a memory table are sent to port A in sequence. The offset in the X register is decremented in each pass through the loop, so the table is read from 6010H down to 6001H.

In the 6502 microprocessor, zero page addressing and indexed addressing can be combined if the base address is on page 00. For example, for LDA 60, X with X = 02,

effective address = 0060 + 02 = 0062

This is similar to normal indexed addressing except that the base address is on page 00. But zero page addressing saves 1 byte on the instruction, which can save a significant amount of time when program loops are running.
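The effective-address arithmetic and the table loop can be traced with a short Python sketch. Memory is modeled as a plain dictionary and the bytes sent to the port are collected in a list; `effective_address` and `send_table` are illustrative names, and the loop mirrors the LDX, LDA, STA, DEX, BNE sequence of the sample program.

```python
def effective_address(base, index):
    """Base address from the instruction plus the offset in the index register."""
    return (base + index) & 0xFFFF


def send_table(memory, base=0x6000, count=16):
    """Trace the sample loop and return the bytes sent to the port, in order."""
    sent = []
    x = count                                    # LDX #16D
    while True:
        value = memory[effective_address(base, x)]   # LDA 6000, X
        sent.append(value)                       # STA PORTA
        x -= 1                                   # DEX
        if x == 0:                               # BNE LOOP1 falls through at zero
            break
    return sent


# A made-up table in which each location holds its own offset from the base.
table = {0x6000 + offset: offset for offset in range(17)}
order = send_table(table)
```

With this table, `order` starts with the byte at offset 16 and ends with the byte at offset 1, which shows that a loop counting down with DEX reads the table from the top and never touches the base address itself.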

7.4.7 Indirect Addressing

Indirect addressing uses the following syntax:

| Op code | Location number | Page number |

The address given in the instruction is not the address of the data; it is the address of the location that holds the address of the data. The microprocessor reads the address of the data from the specified location and uses it to locate the data byte. Consider the following indirect jump in a 6502: JMP (6000H)

Code = | 6C | 00 | 60 |

Now the microprocessor reads the address stored at memory location 6000H (low byte) and the following location (high byte). Once this address is read, the destination can be located. This method is called memory indirection. In register-based microprocessors, another method, register indirection, commonly is available. A memory pointer register (e.g., the HL pair in the Z80) is preloaded with the address of the data byte and the instruction refers only to this pointer register, for example, LD A, (HL) in the Z80. In this case, the microprocessor reads the data byte pointed to by the address in the memory pointer register HL.
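The two kinds of indirection can be contrasted in a small Python sketch. Memory is a dictionary, addresses are assumed to be stored low byte first as on the 6502, and the function names and memory contents are made up for illustration.

```python
def read_word(memory, address):
    """Read a 16-bit address stored low byte first at address, address + 1."""
    return memory[address] | (memory[address + 1] << 8)


def jump_indirect(memory, pointer):
    """Memory indirection: the instruction names where the address is kept."""
    return read_word(memory, pointer)


def load_register_indirect(memory, hl):
    """Register indirection: a pointer register already holds the address."""
    return memory[hl]


# Made-up contents: locations 6000H/6001H hold the address 1234H.
memory = {0x6000: 0x34, 0x6001: 0x12, 0x1234: 0xAB}
destination = jump_indirect(memory, 0x6000)
data = load_register_indirect(memory, 0x1234)
```

Memory indirection costs extra memory reads to fetch the pointer, whereas register indirection has the pointer ready inside the processor.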

7.4.8 Combinations of Basic Addressing Modes

Some combinations of the basic addressing modes provide sophisticated addressing methods. Common combinations are indexed indirect addressing and indirect indexed addressing. The details of such cases are given in the user manuals of microprocessors.

7.5 Program Execution

7.5.1 Stages of Execution

The microprocessor must go through two distinct steps when carrying out an instruction: a fetch cycle and an execute cycle.

The fetch cycle brings the op code of the instruction from the memory to the microprocessor, where it is loaded into the instruction register. The execute cycle decodes and carries out the instruction. At the start of the fetch cycle, the microprocessor deposits the contents of the program counter on the address bus, then it issues a read command to the memory. The memory now searches for the location. Meanwhile, the microprocessor increments the program counter in preparation for the next cycle. After the short access time, the memory deposits the op code on the data bus. Now the microprocessor loads the op code into the instruction register.

The fetch cycle follows these steps:

1. PC → address bus.

2. R/W = 1 (read command).

3. PC + 1 → PC.

4. The op code is deposited on the data bus by the memory subsystem.

5. The microprocessor collects the op code and loads it into the instruction register.

The execute cycle is different for each instruction. The control unit collects the instruction from the instruction register (IR) and decodes it, using its microcode decoder. By decoding the op code, it finds the following information:

What operation is requested.

What addressing mode is used.

Where to store the results.

Where the next instruction will be.

The decoded information is used by the control unit to generate a set of system control signals to carry out the operation.

7.5.2 Execution of Typical Instructions

Consider the following examples of an M6800 microprocessor executing some common instructions. In the first example, the load accumulator instruction uses the immediate addressing mode: LDA #30H. The fetch cycle steps are

1. PC → address bus.

2. R/W = 1 (read operation).

3. PC + 1 → PC.

4. The op code comes on the data bus; the microprocessor collects and loads it to the IR.

The execute cycle steps are:

1. Decode the op code and identify the operation.

2. Deposit the address in PC on address bus (PC now contains the address of the data byte)

3. Issue the read command.

4. Search the memory for the data (meanwhile, the program counter is incremented).

5. The memory subsystem deposits the data byte on the data bus.

6. The microprocessor collects it and loads it into the accumulator.

Table 12 shows the steps taken in executing an instruction using immediate addressing.
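The fetch and execute steps for a load with immediate addressing can be walked through with a toy Python simulator. Everything here is a sketch: the `Cpu` class is hypothetical, the op code value 86H is an assumed stand-in for the load accumulator immediate instruction, and bus timing is reduced to a list of the addresses placed on the address bus.

```python
OP_LDA_IMMEDIATE = 0x86  # assumed op code value, used only for illustration


class Cpu:
    """Toy model with a program counter, instruction register, accumulator."""

    def __init__(self, memory, pc):
        self.memory = memory
        self.pc = pc
        self.ir = 0
        self.a = 0
        self.address_bus_trace = []

    def read_at_pc(self):
        # PC -> address bus, R/W = 1, PC + 1 -> PC, then collect the byte.
        address = self.pc
        self.address_bus_trace.append(address)
        self.pc = (self.pc + 1) & 0xFFFF
        return self.memory[address]

    def fetch(self):
        # Fetch cycle: the op code is loaded into the instruction register.
        self.ir = self.read_at_pc()

    def execute(self):
        # Execute cycle: decode the op code, then read the operand byte that
        # the PC now points at and load it into the accumulator.
        if self.ir == OP_LDA_IMMEDIATE:
            self.a = self.read_at_pc()
        else:
            raise NotImplementedError("op code not modeled in this sketch")


cpu = Cpu({0x0100: OP_LDA_IMMEDIATE, 0x0101: 0x30}, pc=0x0100)
cpu.fetch()
cpu.execute()
```

After the two cycles the accumulator holds 30H and the program counter has advanced by 2, once for the op code and once for the data byte.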

In the second example, the load accumulator instruction uses extended (absolute) addressing. The instruction, LDA 0020, has 3 bytes. Table 13 shows the steps taken in executing an instruction using absolute addressing.

TABLE 12 Steps in the Execution of an Instruction-- Using the Immediate Addressing Mode

TABLE 13 Steps in the Execution of an Instruction Using the Absolute Addressing Mode

For the third example, the microprocessor executes the store accumulator instruction using extended (absolute) addressing: STA 2015H. Table 14 shows the steps taken. During cycle no. 4, the microprocessor is internally preparing for the write operation.

In the fourth example, the microprocessor executes an add instruction using the immediate addressing mode. Table 15 shows the steps taken in executing the instruction ADD #30.

In the fifth example, the microprocessor executes a subtract instruction using zero page (direct) addressing. Table 16 shows the steps taken in executing the instruction SUBA 30:

| Op code | Address (zero page) |

7.6 Program Creation

A programmer follows five steps in creating a program: defining the problem, specifying an algorithm, preparing a flowchart of the operations, writing the program in code, and testing and debugging the results.

The requirements are clearly defined at the start to ensure that the final program fulfills all the needs. The following information is established at this stage:

TABLE 14 Steps in the Execution of a Memory Store Instruction

TABLE 15 Steps in the Execution of an Addition Instruction Using the Immediate Addressing Mode

TABLE 16 Steps in the Execution of a Subtraction Instruction Using the Direct Addressing Mode

1. The exact function of the program.

2. The number and types of program input and output.

3. The execution time restrictions.

4. The level of accuracy expected in the result.

5. What memory will be occupied.

6. What action will be taken if operator errors occur.

A program that handles mistakes very well and provides the user guidelines is considered a robust program.

The algorithm is a sequence of steps that detail the procedure to solve the problem. A problem can have more than one algorithm.

The flowchart is a graphical representation of the algorithm. Preparing a flowchart minimizes logic errors and duplications and supports documentation.

Coding is the process of translating the steps of the flowchart to instructions in the computer language.

The coded program is entered into the system, and a test run is carried out.

It’s difficult to write a completely error-free program in one attempt. Program errors (bugs) occur even to very experienced programmers. Finding them and fixing them are called debugging. Debugging tools are available to assist in this process.

8. Single-Chip Microcontrollers and Embedded Processor Core Applications

8.1 Single-Chip Microcontrollers

A microcontroller is a single-chip microcomputer. The advancement of IC manufacturing technology has made possible the integration of all three subsystems of the central controller on one silicon chip. As a result, the IC count of a system becomes smaller, reducing its cost, size, and weight and making it more compact and reliable. Like microprocessors, microcontrollers appear in the market as families. Different members of a family differ in the provision of memory, input and output ports, the speed of operation, and computing power.

Therefore, it is easy to find a family member that closely fits the requirements of a system designer. Microcontrollers are popular in dedicated microelectronic systems (e.g., industrial control systems, domestic appliances). In a microcontroller, the following components generally are integrated on the same semiconductor chip: a processor, input/output interface circuits, clock generators, and memory circuits. Some common examples include the Intel 8035, 8048, and 8051 and the National COP420L.

Consider the internal architecture of the popular 8051 microcontroller (FIG. 16), an accumulator-based microcontroller. The programmer has 255 instructions available to command the processor. A basic instruction cycle takes 12 clocks; however, Dallas Semiconductor has designed the instruction-execution circuitry to reduce the instruction cycle to 4 clocks.

It has four banks of 8-bit registers in its on-chip RAM, which helps context switching. These registers reside within the lower 128 bytes of internal RAM, along with a bit-operation area and scratch pad RAM. These lower bytes can be addressed directly or indirectly using an 8-bit address. The upper 128 bytes of on-chip data RAM encompass two overlapping address spaces. One space is for directly addressed special function registers (SFRs) and the other space is for indirectly addressed RAM or stack. The SFRs define peripheral operations and configurations. The 8051 also has 16 bit-addressable bytes of on-chip RAM for flags or variables. The 8051 processor can directly address 64 kB of data memory.

However, software tools with an external latch can extend it to any multiple of 64 kB pages. The software seamlessly handles all page transitions. Register indirection uses an 8-bit register for on-chip RAM addresses. Off-chip addresses need a 16-bit data-pointer register (DPTR). The DPTR cannot be indexed but can be incremented.

The 8051 performs extensive bit manipulation via special instructions, such as set, clear, complement, and jump on bit set or jump on bit clear, only for a 16-byte area of RAM and some SFRs. It also can execute AND or OR instructions with a carry bit. Dallas versions have variable-length move external data instructions. Math functions include add, subtract, increment, decrement, multiply, divide, complement, rotate, and swap nibbles. Some of the Siemens devices have a hardware multiplier/divider for 16-bit multiplication and 32-bit division (see FIG. 17). Microcontrollers offer a low-cost solution, especially to microelectronic system designers who design dedicated systems.

8.2 Embedded Processor Core Applications

Microcontrollers may not have the right mix of internal components to perfectly match the system designer's requirements; therefore, they may not be a good fit for some applications. To solve this problem, many suppliers have gone back and stripped off all functional support blocks from the integrated CPU. This results in a derivative called the processor core. Then the suppliers placed the CPU and support functions in megacell libraries. Those libraries can be used to assemble the optimum single-chip microcontroller for a particular customer. The product is customized, and in many cases provides a unique solution. Furthermore, suppliers can design application-specific integrated circuits (ASIC) using the processor cores and megacell libraries. The technology has evolved into the construction of application-specific cores as well. A core microprocessor may be added to an ASIC. Reduced instruction set computing cores are popularly used to couple with ASICs. A core microprocessor contains only those logic elements needed to execute its instruction set, such as microcode ROM, microcode state machines, a program counter, special function registers, ALU, and interrupt logic.


FIG. 16 Intel 8051 microcontroller. (of Intel Corporation.)




FIG. 17 An Industrial version of 8051 by Siemens Corporation. (a) Pin Assignment (Source: Siemens Microcontroller Databook) (b) Block Diagram


FIG. 18 Intel 80386. (of Intel Corporation.)

An embedded application uses a processor to perform a specific task. Both microcontrollers and microprocessors can be used as embedded controllers.

Microcontrollers are designed specifically to perform embedded tasks. Today, it has become easier and more convenient for a new system designer to embed a standard processor core into a new product design. An embedded processor core is a feature or component of an application. It provides standardized mechanisms for feeding data into a system and displaying data from it. It can provide hardware and software flexibility in the system being developed. Buying an embedded processor core is cheaper and much more convenient than designing that part, and compatibility with standard desktop computers is then automatically available.

Before embedding a processor core into a product, the designer should decide the degree of personal computer compatibility required. The processor (and core logic chip set) chosen by the designer determines the level of personal computer compatibility in the embedded processor core. To be fully compatible, the embedded processor core must support the complete PC-AT specification.

For example, the Intel 80386 is now in the embedded processor core market. A block diagram is shown in FIG. 18.

The 386 developed a strong presence in embedded processor core applications. It has a register-based architecture with four general purpose registers and four index/pointer registers, supplemented by six 16-bit segment registers and two 32-bit status and control registers.

The main features of 80386 include:

1. Segmentation, 64 kB segments to extend addressing to 1 MB.

2. Segment limits extended to the full 4 GB addressing range.

3. Protected mode addressing, the base address adds to the 32-bit effective address, producing a 32-bit linear address, which is used as a physical address or a linear-page address.

4. Six debugging registers, four code/data breakpoint registers, and two control registers. The breakpoint registers can be set with addresses for halting execution on a program or data access.

5. Power management, in which a system management mode and a power management mechanism are available. System management operating modes reduce chip power dissipation. Integrated versions of the 386 (for example, Intel's 386EX) have idle and power down modes. The idle mode discontinues CPU processing but keeps the peripherals active. The power down mode shuts down the entire chip.

6. The 386 instruction set is a superset of the 8086/186. The 386 has seven additional instructions to support the system management mode.

9. RISC vs. CISC Microprocessor Architecture

Microprocessors fall into two main categories, according to the way they process instructions: reduced instruction set computing (RISC) and complex instruction set computing (CISC). The PowerPC is a RISC processor and the Pentium is a CISC processor. By reducing the number of instructions that a CPU supports, and thereby the complexity of the chip, it is possible to make individual instructions execute faster and achieve a net gain in performance even though more instructions might be required to accomplish a task.

When a program runs, the microprocessor reads, or fetches, the instructions one by one and executes them. For a 233 MHz Pentium, one clock is less than 5 ns. Programs can be made to run faster by increasing the clock speed or decreasing the number of clock cycles required to execute an instruction.
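The clock arithmetic behind these statements is simple enough to check in Python. This sketch only converts a clock frequency to a clock period and multiplies by a clock count; the function names are illustrative, and the 13-clock figure in the usage line is just a sample input.

```python
def clock_period_ns(frequency_mhz):
    """One clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / frequency_mhz


def execution_time_ns(clocks, frequency_mhz):
    """Time taken by an instruction that needs the given number of clocks."""
    return clocks * clock_period_ns(frequency_mhz)


period = clock_period_ns(233)             # a 233 MHz clock
sample_time = execution_time_ns(13, 233)  # a 13-clock instruction at 233 MHz
```

The period for 233 MHz comes out at roughly 4.3 ns, consistent with the "less than 5 ns" figure, and both ways of speeding up a program appear directly in `execution_time_ns`: raise the frequency or lower the clock count.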

Traditionally, CISC microprocessors like the x86 have required anywhere from 1 to more than 100 clocks to complete an instruction. For example, the multiply instruction on the 8088 that powered the original IBM personal computer required up to 133 clocks. A multiply on a 486 required as few as 13 clocks, and a register-to-register transfer requires just 1.

The original goal of RISC was to limit the number of instructions in the instruction set. More instructions are needed to carry out a job, but if the instruction timings are low enough, the RISC chip can complete the task earlier than a CISC chip. There is a trade-off between instruction set complexity and instruction execution time. RISC seeks to strike a better balance between the two to produce a faster microprocessor. The transistors it saves on large instructions can be used for cache, pipelines, and more registers.

Today's RISC chips often have richer and more complex instruction sets than CISC chips. The PowerPC 601, for example, supports more instructions than the Pentium. Yet the 601 is considered a RISC chip, while the Pentium is definitely CISC. In reality, the instruction sets are not really reduced; the difference is no longer in the size of the instruction set. Furthermore, the number of transistors that can be packed into a silicon chip has steadily increased. A Pentium contains more than 3 million transistors in an area roughly a half-inch square. It executes two instructions per clock cycle, not one instruction in half a clock cycle. Actually, many instructions still require multiple clock cycles.

But the use of pipelines allows instructions to overlap, so that with five instructions in flight, each requiring five clock cycles, one instruction completes every clock, for an average of one instruction per clock cycle. Pentium's two independent execution units qualify it as a superscalar microprocessor. They permit two pipelines to run in parallel. What really distinguishes RISC from CISC is more deeply rooted in the chip architecture.
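The effect of overlapping instructions can be sketched with a small cycle count. The sketch assumes an idealized pipeline with no stalls or hazards, in which every instruction passes through the same number of one-clock stages; the function names are hypothetical. Counting from an empty pipeline, the first instruction needs all of its stages, and each later instruction completes one clock after the one before it.

```python
def unpipelined_cycles(instructions: int, stages: int) -> int:
    """Cycles needed when each instruction finishes before the next starts."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """Cycles needed in an ideal pipeline, counted from an empty pipeline."""
    if instructions == 0:
        return 0
    # Fill the pipeline once, then one instruction completes per clock.
    return stages + (instructions - 1)


def average_cycles_per_instruction(instructions: int, stages: int) -> float:
    """Average clocks per instruction in the ideal pipeline."""
    return pipelined_cycles(instructions, stages) / instructions


if __name__ == "__main__":
    print(unpipelined_cycles(5, 5))   # 25
    print(pipelined_cycles(5, 5))     # 9
    print(average_cycles_per_instruction(1000, 5))  # 1.004
```

For long instruction streams the fill time becomes negligible and the average approaches one instruction per clock cycle.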

9.1 Architectural Differences

RISC microprocessors have more general purpose registers (e.g., Pentium has just 8 general purpose registers but a PowerPC chip has 32), which help to minimize the number of times data stored in memory is accessed. Accessing registers is much faster than accessing memory addresses.

RISC microprocessors use load and store architectures. CPU instructions that operate on data in memory consume more time. RISC designs minimize the number of instructions that access memory in favor of load and store operations.

For instance, to add two numbers held in memory, the operands are first loaded into registers, the addition is carried out register to register, and the result is then stored back to memory.
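A minimal sketch of the load and store pattern follows. The `LoadStoreMachine` class, its method names, and the register numbers are hypothetical and chosen only for illustration; they do not model any particular processor. Only `load` and `store` touch memory, and the arithmetic operates purely on registers.

```python
class LoadStoreMachine:
    """Toy model: only load and store access memory (illustrative assumption)."""

    def __init__(self, memory, num_registers=32):
        self.memory = dict(memory)
        self.registers = [0] * num_registers
        self.memory_accesses = 0

    def load(self, reg, address):
        """Copy a value from memory into a register."""
        self.registers[reg] = self.memory[address]
        self.memory_accesses += 1

    def store(self, reg, address):
        """Copy a value from a register back to memory."""
        self.memory[address] = self.registers[reg]
        self.memory_accesses += 1

    def add(self, dest, src1, src2):
        """Register-to-register add; no memory access."""
        self.registers[dest] = self.registers[src1] + self.registers[src2]


def add_in_memory(machine, a, b, result):
    """Add two memory operands using the load, operate, store sequence."""
    machine.load(1, a)
    machine.load(2, b)
    machine.add(3, 1, 2)
    machine.store(3, result)


if __name__ == "__main__":
    m = LoadStoreMachine({"a": 7, "b": 5, "sum": 0})
    add_in_memory(m, "a", "b", "sum")
    print(m.memory["sum"], m.memory_accesses)  # 12 3
```

If the sum were needed again, it would already be in register 3, so no further memory access would be required; this is where a large register file pays off.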

RISC microprocessors use uniform instruction lengths. On a Pentium, the length of one instruction can vary from as few as 1 byte to as many as 15. RISC favors making all instructions the same length, usually 32 bits. This simplifies the instruction fetching and decoding logic. Furthermore, an instruction can be fetched with one 32-bit memory access.
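One way to see why uniform lengths simplify fetching is to compare how the address of the n-th instruction is found. The sketch below is illustrative only: the function names, the base address, and the list of instruction lengths are made-up assumptions, not taken from any real program.

```python
def fixed_length_address(base: int, index: int, width_bytes: int = 4) -> int:
    """With uniform 32-bit instructions the address is a simple multiply-add."""
    return base + index * width_bytes


def variable_length_address(base: int, lengths: list, index: int) -> int:
    """With variable lengths, every earlier instruction's length must be known."""
    return base + sum(lengths[:index])


if __name__ == "__main__":
    base = 0x1000
    lengths = [1, 3, 2, 7, 1]  # hypothetical byte lengths of consecutive instructions
    print(hex(fixed_length_address(base, 3)))              # 0x100c
    print(hex(variable_length_address(base, lengths, 3)))  # 0x1006
```

In the fixed-length case the fetch unit can compute any instruction boundary directly, while in the variable-length case it has to decode at least the length of each preceding instruction first.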

RISC microprocessors emphasize floating point performance, with high-performance floating point units built in. In general, RISC designers have been quick to adopt cutting-edge technologies such as on-chip code and data caches, superscalar designs, instruction pipelines, and branch prediction logic. But now these are available in CISC as well, so it's difficult to distinguish RISC from CISC on the basis of those features. For example, the MIPS RX000 family combines reduced instruction sets with huge internal caches, while AMD, Cyrix, and NexGen incorporate RISC-like features into their Pentium clones.

The problem with CISC processors is the lack of registers. With only 16 or so registers, CISC processors allow for little flexibility. CISC processors generally dissipate much more heat and consume more power than RISC processors. This is a barrier to achieving high clock speeds with CISC-based processors. Table 17 lists additional differences between RISC and CISC.

TABLE 4: 7 Advantages of CISC vs. RISC

====

CISC

Microprogramming is easier to implement and much less expensive than hardwiring a control unit.

The ease of microcoding new instructions allows CISC machines to be upwardly compatible: A new computer could run the same programs as earlier computers because the new computer would contain a superset of the instructions of the earlier computers.

As each instruction became more capable, fewer instructions could be used to implement a given task. This made more efficient use of the relatively slow main memory.

Because microprogram instruction sets can be written to match the constructs of high-level languages, the compiler need not be very complicated.

RISC

Since a simplified instruction set allows for a pipelined, superscalar design, RISC processors often achieve two to four times the performance of CISC processors using comparable semiconductor technology and the same clock rates.

Because the instruction set of a RISC processor is simpler, it uses up much less chip space than a CISC processor. Extra functions, such as memory management units or floating point arithmetic units, can be placed on the same chip.

Since RISC processors can be designed more quickly, they can take advantage of new technological developments sooner than corresponding CISC designs.

====

9.2 The Intel Pentium: A CISC Example

The Pentium has achieved code compatibility with earlier x86 CPUs while attaining third-generation RISC performance. It implements the complex x86 instruction set and emphasizes simple instruction execution over the more complex ones. In the Pentium, the simple, RISC-like register-to-register instructions drive the implementation, keeping the microcoded complex instructions as second priority.

The Pentium achieves a two-instruction issue peak and has two five-stage pipelines (U and V). These pipelines are not symmetric. The U pipeline has precedence over the V one. If the first instruction does not cause data interlocks, then the second instruction is scheduled for the V pipe. The U and V pipelines are fed from a common instruction fetch/align stage that fetches multiple instructions from the cache. The CPU fetches and passes a full line (256 bits) to the instruction decoder. Each pipeline has two decoder stages to decode simple and complex instructions. The wide cache-to-decoder path, coupled with a two-stage decode, enables the Pentium to decode the x86's variable-length instructions and deliver competitive performance.

It carries out superscalar dual-instruction load and store operations. Both data and instructions are cached. Pentium's floating point unit (FPU) features an eight-stage pipeline, which shares the first five stages of the U and V pipeline.

Data transfers to or from the FPU use a wide 64-bit data path to the data cache to keep the FPU pipeline fed. The Pentium uses burst reads to fill its 256-bit-wide cache line. It also has burst write-back writes. The memory interface uses a pipeline, allowing a second bus cycle to set up while the first bus cycle completes.

The Pentium reads or writes a 64-bit quadword each cycle in burst mode.
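The figures above imply how many transfers a burst needs to fill one cache line. The sketch below only does that arithmetic; the function name is hypothetical, and it assumes one bus-width transfer per burst cycle with no wait states.

```python
def burst_transfers(line_bits: int = 256, bus_bits: int = 64) -> int:
    """Number of bus transfers needed to move one full cache line."""
    if line_bits % bus_bits != 0:
        raise ValueError("cache line must be a whole number of bus transfers")
    return line_bits // bus_bits


if __name__ == "__main__":
    print(burst_transfers())   # 4 transfers of 64 bits fill a 256-bit line
    print(256 // 8)            # 32 bytes per cache line
```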

The following key issues must be addressed when moving code from a CISC processor to a RISC processor:

1. The quality of the code. The performance of a RISC application depends critically on the quality of the code generated by the compiler. Therefore, developers have to choose the compiler carefully, based on the quality of the generated code. If a compiler does not schedule instructions properly, execution slows down significantly.

2. Debugging. Instruction scheduling makes debugging more difficult. When scheduled for execution, assembly language instructions are arranged in a different order. Therefore, finding bugs and tracing the execution are difficult unless debugging is done on the unscheduled form.

3. Code expansion. Code expansion is the increase in size when a program written for a CISC processor is converted for use in a RISC machine. CISC machines perform complex actions with a single instruction. RISC machines require multiple instructions for the same action; therefore, the code expands with the conversion.

4. System design. RISC machines require instructions at a faster rate; therefore, their chips contain large first-level memory caches.


FIG. 19 The internal architecture of the MIPS R10000 RISC processor.

9.3 The MIPS R10000: A RISC Example

Although the architecture of the R10000 (FIG. 19) is new, its designers decided that it would run R4000 code with no performance degradation. After instructions pass through a two-stage fetch-and-decode unit, the R10000 can dispatch them simultaneously to five functional units: floating point add, floating point multiply, two integer operations, and a load and store. The integer pipeline consists of six stages: fetch, decode, issue, execute, cache access, and write back. Floating point instructions use a seventh stage attached to the integer pipe.

During sequencing, the R10000 maintains an instruction status table to determine the instructions waiting to graduate and to put the instructions in order.

When handling out-of-order execution, a completed instruction may never graduate because an exception or branch could invalidate the results. For this reason, an instruction may complete, but until graduation, its results are tentative and may be discarded.
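The idea of completing out of order but graduating in order can be sketched as follows. This is a simplified illustration, not the R10000's actual mechanism: the `InstructionStatusTable` class, its methods, and the instruction names are hypothetical, and the flush is a stand-in for whatever an exception or mispredicted branch would trigger.

```python
class InstructionStatusTable:
    """Toy table that tracks instructions in program order (illustrative only)."""

    def __init__(self):
        self.entries = []  # oldest instruction first

    def issue(self, name):
        """Record a newly issued instruction at the tail, in program order."""
        self.entries.append({"name": name, "done": False})

    def complete(self, name):
        """Mark an instruction as completed; its result is still tentative."""
        for entry in self.entries:
            if entry["name"] == name:
                entry["done"] = True
                return
        raise KeyError(name)

    def graduate(self):
        """Retire completed instructions from the head, strictly in order."""
        graduated = []
        while self.entries and self.entries[0]["done"]:
            graduated.append(self.entries.pop(0)["name"])
        return graduated

    def flush_after(self, name):
        """Discard every instruction younger than `name`, completed or not."""
        for i, entry in enumerate(self.entries):
            if entry["name"] == name:
                discarded = [e["name"] for e in self.entries[i + 1:]]
                del self.entries[i + 1:]
                return discarded
        raise KeyError(name)


if __name__ == "__main__":
    table = InstructionStatusTable()
    for name in ("i1", "branch", "i3"):
        table.issue(name)
    table.complete("i3")              # finishes early, out of order
    print(table.graduate())           # [] because i1 has not completed
    table.complete("i1")
    print(table.graduate())           # ['i1']
    print(table.flush_after("branch"))  # ['i3'] completed but never graduates
```

In this sketch `i3` completes first, yet its result is thrown away when the older branch invalidates it, which is the sense in which a completed instruction may never graduate.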

The R10000 also facilitates designing tightly coupled multiprocessing systems. To accomplish this goal, the CPU has a 64-bit cluster bus configuration that allows direct connection of four R10000 processors. Attaching the R10000 to an external agent, or cluster coordinator, creates a cluster bus that manages the flow of data within the cluster.
