Cocojunk

Instruction set

Published: Sat May 03 2025 19:14:06 GMT+0000 (Coordinated Universal Time) Last Updated: 5/3/2025, 7:14:06 PM

Understanding Instruction Sets: The Language of Your Computer

When embarking on the journey of "The Lost Art of Building a Computer from Scratch," one of the most fundamental concepts you must grasp is the Instruction Set Architecture (ISA), or simply the instruction set. Think of it as the vocabulary and grammar of the language that your computer's central processing unit (CPU) understands. Without this defined language, software could not communicate with the hardware, and the hardware would not know what operations to perform.

This resource dives into the core components and concepts of instruction sets, providing the necessary foundation for anyone looking to understand or design the brain of a computer.

What is an Instruction Set?

At its heart, a computer is a machine that executes a sequence of operations. These operations are dictated by programs, which are essentially lists of instructions. The instruction set defines what these instructions are.

An Instruction Set (IS), also known as an Instruction Set Architecture (ISA), is the set of commands (instructions) that a particular processor understands and can execute. It defines the complete list of operations (such as addition, data movement, branching, etc.) and the format of these operations, acting as the fundamental interface between software and hardware.

Every CPU design is built around a specific instruction set. Software (like operating systems, applications, and even compilers) is written or compiled to target a particular ISA. This means that software written for a processor with one ISA (e.g., ARM) will generally not run directly on a processor with a different ISA (e.g., x86) without some form of translation or emulation.

For someone building a computer from scratch, defining the instruction set is often one of the very first steps. It dictates the hardware components you'll need (e.g., what operations your Arithmetic Logic Unit must support) and the control logic required to fetch, decode, and execute instructions.

Anatomy of an Instruction

Each instruction in the set is a command that tells the CPU to perform a specific, atomic task. An instruction, in its binary form, typically consists of two main parts:

Opcode (Operation Code): This part specifies the operation to be performed (e.g., add, subtract, move data, jump).
Operands: These specify the data or locations where the operation should be performed. Operands can represent:
- Data values themselves (immediate values)
- Registers within the CPU
- Memory addresses

Example: Consider a hypothetical instruction in a simple ISA: ADD R1, R2, R3

Opcode: ADD (Indicates an addition operation)
Operands: R1, R2, R3 (Indicates that the contents of Register 2 and Register 3 should be added, and the result stored in Register 1).

In the computer's memory, this instruction is stored as a sequence of bits. The opcode would be represented by a unique binary code (e.g., 0010), and the registers would also have binary encodings.
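To make the bit-level view concrete, here is a minimal sketch of how ADD R1, R2, R3 could be packed into a single word. The 16-bit layout (4 bits each for opcode, destination, and two sources) and the helper names encode and decode are assumptions made for illustration; only the opcode value 0010 comes from the text above.

```python
# Hypothetical 16-bit format: [opcode:4][rd:4][rs1:4][rs2:4]
OPCODES = {"ADD": 0b0010}  # 0010 is the example opcode used in the text


def encode(mnemonic: str, rd: int, rs1: int, rs2: int) -> int:
    """Pack one register-register instruction into a 16-bit word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 16:
            raise ValueError(f"register index out of range: {reg}")
    return (OPCODES[mnemonic] << 12) | (rd << 8) | (rs1 << 4) | rs2


def decode(word: int) -> tuple[int, int, int, int]:
    """Split a 16-bit word back into (opcode, rd, rs1, rs2)."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


word = encode("ADD", 1, 2, 3)
print(f"{word:016b}")  # 0010000100100011
print(decode(word))    # (2, 1, 2, 3)
```

With four bits per register field, this layout can name 16 registers and 16 opcodes; widening one field means narrowing another or lengthening the instruction.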

Instruction Format

The instruction format defines how the bits of an instruction are arranged – how many bits are allocated for the opcode, how many for operands, and how the operands are interpreted. The format significantly impacts the complexity of both the hardware required to decode instructions and the compiler/assembler software.

There are two primary approaches to instruction format:

Fixed-Length Instructions: Every instruction occupies the same number of bits (e.g., all instructions are 32 bits long).
- Pros: Simpler instruction fetching and decoding hardware. Easier to implement techniques like pipelining (overlapping the fetch, decode, and execute stages of consecutive instructions).
- Cons: Can be less code-dense, potentially wasting bits for instructions that don't require many operands.
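- Examples: Most RISC architectures in their base encodings (ARM's A32 and A64 instruction sets, MIPS, and the RISC-V base ISA all use 32-bit instructions). Optional compressed encodings, such as ARM Thumb and the RISC-V "C" extension, add 16-bit instructions to improve code density.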
Variable-Length Instructions: Instructions can have different lengths depending on the operation and the number/type of operands.
- Pros: Can achieve higher code density, using only the bits necessary for each instruction.
- Cons: More complex instruction fetching and decoding hardware, as the CPU needs to determine the length of the current instruction before fetching the next one. More challenging for pipelining.
- Examples: Most CISC architectures (like x86).

When building a simple computer, a fixed-length instruction format is often chosen due to its relative simplicity in hardware implementation.
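The difference in fetch complexity can be sketched in a few lines. With a fixed width, every instruction boundary is known in advance; with a variable width, each boundary depends on decoding the previous instruction. The opcode-to-length table below is invented for illustration and does not correspond to any real ISA.

```python
def fixed_boundaries(code: bytes, width: int = 2) -> list[int]:
    """Start offsets when every instruction is exactly `width` bytes long."""
    return list(range(0, len(code), width))


# Hypothetical variable-length scheme: the first byte of each instruction
# is its opcode, and the opcode alone determines the total length in bytes.
LENGTH_BY_OPCODE = {0x01: 1, 0x02: 2, 0x03: 4}


def variable_boundaries(code: bytes) -> list[int]:
    """Start offsets found by walking the stream one instruction at a time."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        pc += LENGTH_BY_OPCODE[code[pc]]  # must decode before moving on
    return offsets


fixed = fixed_boundaries(bytes(8))
stream = bytes([0x02, 0xAA, 0x01, 0x03, 0x00, 0x00, 0x00, 0x01])
variable = variable_boundaries(stream)
print(fixed)     # [0, 2, 4, 6]
print(variable)  # [0, 2, 3, 7]
```

The fixed-width boundaries can be computed without looking at the bytes at all, which is what lets simple hardware fetch the next instruction while the current one is still being decoded.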

Types of Instructions

Instruction sets typically include instructions that fall into several categories, enabling a wide range of computational tasks:

Data Movement Instructions: Move data between registers, between registers and memory, or load immediate values into registers. Examples: LOAD, STORE, MOVE, PUSH, POP.
Arithmetic Instructions: Perform mathematical operations. Examples: ADD, SUBTRACT, MULTIPLY, DIVIDE, INCREMENT, DECREMENT.
Logical Instructions: Perform bitwise logical operations. Examples: AND, OR, NOT, XOR, SHIFT, ROTATE.
Control Flow Instructions: Alter the sequence of instruction execution based on conditions or explicitly jump to different parts of the program. Examples: JUMP (unconditional), BRANCH (conditional, e.g., BEQ - Branch if Equal), CALL (subroutine call), RETURN.
Input/Output Instructions: Communicate with peripheral devices. (Often handled through memory-mapped I/O in modern systems, but some ISAs have dedicated I/O instructions).
System Instructions: Operations related to the system state, privileged operations, handling interrupts, etc. Examples: SYSCALL, HALT. NOP (No Operation) is often grouped here as well, although it changes no system state.

A well-designed instruction set provides a balance of these types to allow for efficient execution of common programming constructs.

Addressing Modes

Addressing modes define how the operand part of an instruction is interpreted to find the actual data (the effective address) that the instruction will operate on. Different addressing modes offer flexibility and efficiency in accessing various types of data structures.

Here are some common addressing modes:

Immediate Addressing: The operand itself is the data value.
- Example: ADD R1, #10 (Add the value 10 to the content of Register 1).
- Use Case: Loading constants.
Register Addressing: The operand specifies a register containing the data.
- Example: ADD R1, R2 (Add the content of Register 2 to Register 1).
- Use Case: Fast operations on data already in registers.
Direct (Absolute) Addressing: The operand specifies the exact memory address where the data is located.
- Example: LOAD R1, 2000 (Load the data from memory address 2000 into Register 1).
- Use Case: Accessing fixed memory locations (less common in modern ISAs for general data due to lack of flexibility).
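Indirect Addressing: The operand names a location that holds the memory address of the data rather than the data itself. In the memory-indirect form, that location is a memory word, so the CPU reads memory twice: once for the address and once for the data.
- Example: LOAD R1, [[2000]] (illustrative syntax: read the address stored at memory address 2000, then load the data at that address into Register 1).
- Use Case: Following pointers that are stored in memory.
Register Indirect Addressing: The special case of indirect addressing in which a register holds the address, so only one memory read is needed.
- Example: LOAD R1, (R2) or LOAD R1, [R2] (syntax varies by ISA): load the data from the memory address stored in Register 2 into Register 1.
- Use Case: Dereferencing a pointer held in a register.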
Indexed Addressing: The effective address is calculated by adding an offset value (often an immediate value) to the content of a base register.
- Example: LOAD R1, [R2 + #20] (Load the data from the memory address calculated by adding 20 to the content of Register 2).
- Use Case: Accessing elements in arrays or structures.
Based Addressing: Similar to indexed, but often implies the base register holds the start address of a segment, and the offset is within that segment.
PC-Relative Addressing: The effective address is calculated based on the current value of the Program Counter (PC).
- Example: BEQ #50 (If the equality condition is true, jump 50 instructions forward from the current PC. Real ISAs often express the offset in bytes instead.)
- Use Case: Implementing position-independent code, branches within a small range.

Choosing which addressing modes to include is another design decision when creating an ISA. More modes provide flexibility but add complexity to the instruction decoder hardware.
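The modes listed above differ only in how the operand field is turned into a value or an address. A minimal sketch follows, assuming a toy machine whose registers are a Python list and whose memory is a dictionary; the function name resolve and the mode strings are hypothetical.

```python
def resolve(mode, operand, regs, mem, pc=0):
    """Return the value (or branch target) an operand refers to."""
    if mode == "immediate":
        return operand                    # the operand is the data
    if mode == "register":
        return regs[operand]              # data sits in a register
    if mode == "direct":
        return mem[operand]               # operand is the memory address
    if mode == "register_indirect":
        return mem[regs[operand]]         # register holds the address
    if mode == "indexed":
        base, offset = operand
        return mem[regs[base] + offset]   # base register + offset
    if mode == "pc_relative":
        return pc + operand               # branch target relative to PC
    raise ValueError(f"unknown addressing mode: {mode}")


regs = [0, 0, 100]                 # R2 holds the address 100
mem = {100: 7, 120: 9, 2000: 42}

print(resolve("immediate", 10, regs, mem))          # 10
print(resolve("register", 2, regs, mem))            # 100
print(resolve("direct", 2000, regs, mem))           # 42
print(resolve("register_indirect", 2, regs, mem))   # 7
print(resolve("indexed", (2, 20), regs, mem))       # 9
print(resolve("pc_relative", 50, regs, mem, pc=400))  # 450
```

Each extra branch in this function corresponds to extra decode and address-generation logic in hardware, which is the cost the design decision has to weigh.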

Instruction Set Architectures (ISAs): Philosophies of Design

Over the history of computing, different philosophies have emerged regarding what constitutes a good instruction set. The most famous distinction is between Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC).

CISC (Complex Instruction Set Computing)

Philosophy: Provide a rich set of complex instructions, some of which may perform multiple low-level operations (like memory access, arithmetic, and loop control) in a single instruction. Instructions often vary in length.
Goal: Reduce the number of instructions needed to perform a task, potentially leading to smaller program size (code density). Ease compiler development by providing high-level instructions that map closely to high-level language constructs (though this benefit became less significant as compiler technology advanced).
Characteristics:
- Many instructions, often specialized.
- Complex addressing modes.
- Variable instruction lengths.
- Operations can often directly manipulate memory.
Implications for Hardware: Requires complex control logic to decode and execute the various intricate instructions. Pipelining is more challenging due to variable instruction lengths and complex instruction execution stages.
Examples: x86 (used in most desktop/laptop computers), Motorola 68k.

RISC (Reduced Instruction Set Computing)

Philosophy: Use a smaller, highly optimized set of simple instructions. Each instruction performs only one basic operation. Complex tasks are achieved by combining sequences of these simple instructions. Instructions are typically fixed-length.
Goal: Simplify instruction decoding and execution hardware, enabling faster clock speeds and more efficient pipelining. Focus on load/store architecture, meaning arithmetic/logical operations only work on data in registers; data must be explicitly loaded from memory into registers before computation and stored back to memory afterwards.
Characteristics:
- Few instructions, generally simple and atomic.
- Few addressing modes, often simple ones like register, immediate, and indexed.
- Fixed instruction length.
- Load/Store architecture: Memory access is separate from computation.
Implications for Hardware: Simpler control logic. Highly conducive to efficient pipelining, leading to high throughput. Easier to design and implement.
Examples: ARM (dominates mobile devices, growing in servers/desktops), MIPS (historically popular in embedded systems and research), RISC-V (open standard, gaining popularity), PowerPC.
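The load/store distinction can be illustrated with one task done both ways: adding one memory word into another. The sketch below is a simplified model, not a real ISA; the memory-to-memory ADD and the four-instruction sequence are both hypothetical.

```python
def cisc_style_add(mem, dst, src):
    """One hypothetical memory-to-memory instruction: ADD [dst], [src]."""
    mem[dst] = mem[dst] + mem[src]
    return 1  # number of instructions executed


def risc_style_add(mem, dst, src):
    """The same work on a load/store machine: only LOAD and STORE touch memory."""
    regs = [0] * 4
    program = [
        ("LOAD", 1, dst),    # R1 <- mem[dst]
        ("LOAD", 2, src),    # R2 <- mem[src]
        ("ADD", 1, 1, 2),    # R1 <- R1 + R2 (registers only)
        ("STORE", 1, dst),   # mem[dst] <- R1
    ]
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = mem[args[1]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
    return len(program)


mem_a = {2000: 5, 2004: 7}
mem_b = dict(mem_a)
cisc_count = cisc_style_add(mem_a, 2000, 2004)
risc_count = risc_style_add(mem_b, 2000, 2004)
print(mem_a == mem_b, cisc_count, risc_count)  # True 1 4
```

Both versions leave memory in the same state. The trade-off is one complex instruction against four simple ones, each of which is easier to decode and pipeline.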

Choosing Between CISC and RISC for a Custom Build

For someone building a computer from scratch, a RISC-like approach is generally much easier to design and implement in hardware. The fixed instruction length simplifies fetching and decoding, and the limited number of simple operations requires less complex control circuitry compared to a CISC design. This is why many educational or hobbyist CPU designs are based on RISC principles.

ISA vs. Microarchitecture

It's important to distinguish between the Instruction Set Architecture (ISA) and the Microarchitecture.

ISA: The abstract definition of the instruction set – what instructions exist, their format, and what they do logically. This is the programmer's view of the CPU.
Microarchitecture: The specific hardware implementation of the ISA – how the data path, control unit, cache memory, and pipelines are designed to execute the instructions defined by the ISA. This is the hardware designer's view.

Multiple different microarchitectures can implement the same ISA. For example, Intel's "Core i7" and "Pentium" processors both implement the x86 ISA, but they have vastly different internal designs (microarchitectures) resulting in different performance characteristics.

When building from scratch, you first define your ISA (the language) and then design the microarchitecture (the circuitry) that can understand and execute that language efficiently.
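One way to see the split is to model the ISA as "what result a program must produce" and the microarchitecture as "how many cycles it takes to get there". The sketch below assumes a toy two-instruction ISA and two idealized implementations: a multi-cycle design that finishes each instruction before starting the next, and a pipeline with no stalls. All names and the three-stage depth are illustrative assumptions.

```python
PROGRAM = [("LOADI", 2, 5), ("LOADI", 3, 7), ("ADD", 1, 2, 3)]


def execute(program):
    """Architectural behaviour: what the (hypothetical) ISA says must happen."""
    regs = [0] * 4
    for op, rd, *args in program:
        if op == "LOADI":
            regs[rd] = args[0]
        elif op == "ADD":
            regs[rd] = regs[args[0]] + regs[args[1]]
    return regs


def cycles_multicycle(program, stages=3):
    """Each instruction passes through every stage before the next one starts."""
    return stages * len(program)


def cycles_pipelined(program, stages=3):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    return stages + len(program) - 1


print(execute(PROGRAM))            # [0, 12, 5, 7]
print(cycles_multicycle(PROGRAM))  # 9
print(cycles_pipelined(PROGRAM))   # 5
```

The register contents are identical under either implementation; only the cycle count changes. Real pipelines also suffer stalls from hazards, which this idealized count ignores.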

The Fetch-Decode-Execute Cycle

The instruction set is brought to life by the CPU through a continuous process called the Fetch-Decode-Execute cycle, also known as the Instruction Cycle. This cycle is the fundamental operation of any stored-program computer.

Fetch: The CPU retrieves the next instruction from memory. The memory address of the instruction is held in a special register called the Program Counter (PC). After fetching the instruction, the PC is typically incremented to point to the next instruction in sequence. The fetched instruction is placed in the Instruction Register (IR).
Decode: The CPU's control unit analyzes the instruction in the IR to determine the operation to be performed and the operands involved. This step involves interpreting the opcode and operand fields according to the ISA's defined format.
Execute: The CPU performs the operation specified by the instruction. This might involve using the Arithmetic Logic Unit (ALU) for calculations, moving data between registers, accessing memory, or changing the flow of execution by updating the PC (e.g., for jumps or branches).

This cycle repeats continuously as the CPU runs a program. Understanding this cycle is essential for building a CPU, as you must design hardware for each stage.
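The three stages map directly onto a small simulator loop. The sketch below assumes a hypothetical 16-bit, word-addressed machine with three instructions (LOADI, ADD, HALT); the opcode values other than ADD's 0010 and the field layout are invented for illustration.

```python
# Hypothetical encodings: [opcode:4][rd:4][rs1:4][rs2:4] or [opcode:4][rd:4][imm:8]
LOADI, ADD, HALT = 0b0001, 0b0010, 0b1111


def run(memory, max_steps=1000):
    """Run a program held in `memory` (a list of 16-bit words) until HALT."""
    regs = [0] * 16
    pc = 0
    for _ in range(max_steps):
        # Fetch: read the instruction at PC into IR, then advance PC.
        ir = memory[pc]
        pc += 1

        # Decode: split IR into its fields.
        opcode = (ir >> 12) & 0xF
        rd = (ir >> 8) & 0xF

        # Execute: perform the operation the opcode selects.
        if opcode == LOADI:
            regs[rd] = ir & 0xFF
        elif opcode == ADD:
            rs1, rs2 = (ir >> 4) & 0xF, ir & 0xF
            regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFF
        elif opcode == HALT:
            return regs
        else:
            raise ValueError(f"unknown opcode {opcode:04b} at address {pc - 1}")
    raise RuntimeError("step limit reached without HALT")


program = [
    0x1205,  # LOADI R2, 5
    0x1307,  # LOADI R3, 7
    0x2123,  # ADD R1, R2, R3
    0xF000,  # HALT
]
result = run(program)
print(result[1])  # 12
```

In hardware, the fetch step becomes a memory read port and a PC incrementer, the decode step becomes wiring plus a control unit, and the if/elif chain becomes the ALU and its control signals.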

Conclusion: The Bedrock of Computation

The instruction set is not just a list of commands; it's the foundational contract between software and hardware. Defining or understanding an ISA is a critical step in comprehending how computers work at their lowest level. For anyone undertaking the challenge of building a computer from scratch, designing or selecting an ISA is the very definition of the machine they are about to create – determining its capabilities, complexity, and performance potential. It is, truly, the lost art of teaching silicon to speak.
