16-bit x86 assembly programming
From Wikipedia, the free encyclopedia
This article applies when the default operand size is 16-bit.
Registers
Each register is specialized for a certain task, and operations that deal with that task are often run more efficiently if the right register is used.
16-bit registers include:
- data registers
  - AX, the accumulator
  - BX, the base register
  - CX, the counter register
  - DX, the data register
- address registers
  - SI, the source index register
  - DI, the destination index register
  - SP, the stack pointer register
  - BP, the stack base pointer register
Each data register can be split into two 8-bit registers: the upper eight bits and the lower eight bits of the 16-bit register can each be addressed as a register in its own right. For example, AH addresses the upper eight bits of the AX register, and AL addresses the lower eight bits. The other data registers are addressed in the same way by changing the suffix: "X" for extended, "H" for high, and "L" for low.
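The high/low split can be illustrated with a small model. The following is a minimal Python sketch, not assembly; the helper names `split_register` and `join_register` are hypothetical and only mimic how AH and AL overlay AX.

```python
def split_register(value):
    """Return the (high, low) bytes of a 16-bit register value, e.g. (AH, AL) for AX."""
    value &= 0xFFFF                      # keep only 16 bits
    return (value >> 8) & 0xFF, value & 0xFF

def join_register(high, low):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

ax = 0x12AB
ah, al = split_register(ax)
print(hex(ah), hex(al))                  # 0x12 0xab

# Writing a new value to AL alone leaves AH untouched.
ax = join_register(ah, 0xFF)
print(hex(ax))                           # 0x12ff
```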
Collectively the data and address registers are called the general registers.
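In addition to the general registers, there are:
- segment registers
  - CS, the code segment register
  - DS, the data segment register
  - ES, the extra segment register
  - SS, the stack segment register
- other registers
  - IP, the instruction pointer register
  - FLAGS, the flags register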
The IP register points to where in the program the processor is currently executing its code. The IP register cannot be accessed by the programmer directly.
The FLAGS register contains the current state of the processor. Each bit in this register is called a flag, and each flag is either 1 (set) or 0 (not set). The flags in the FLAGS register include carry, overflow, zero and single step (trap).
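As an illustration of flags as individual bits, the sketch below decodes the four flags named above from a 16-bit FLAGS value. It is a Python model rather than assembly; the bit positions used (carry 0, zero 6, trap 8, overflow 11) are those of the 8086 FLAGS layout, and the name `decode_flags` is hypothetical.

```python
# Bit positions of a few flags in the 16-bit FLAGS register.
FLAG_BITS = {"CF": 0, "ZF": 6, "TF": 8, "OF": 11}

def decode_flags(flags):
    """Return a dict mapping each flag name to 0 or 1."""
    return {name: (flags >> bit) & 1 for name, bit in FLAG_BITS.items()}

# 0x0841 has bits 0, 6 and 11 set: carry, zero and overflow.
print(decode_flags(0x0841))   # {'CF': 1, 'ZF': 1, 'TF': 0, 'OF': 1}
```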
Flags are notably used in the x86 architecture for comparisons. A comparison is made between two operands (two registers, for example) by computing their difference, and flags are set according to the result. A jump instruction then checks the relevant flag and jumps depending on its state. For example,
    cmp ax, bx
    jne do_something
first compares the AX and BX registers, and if they are unequal, the code branches off to the do_something label.
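The compare-then-jump pattern can be modelled outside the processor. The Python sketch below is a simplified illustration that tracks only the zero and carry flags of a 16-bit comparison; `cmp16` and `jne_taken` are hypothetical helper names, not instructions.

```python
def cmp16(a, b):
    """Model the ZF and CF results of `cmp a, b` on 16-bit operands.

    cmp computes a - b, discards the result and keeps only the flags.
    """
    a &= 0xFFFF
    b &= 0xFFFF
    diff = (a - b) & 0xFFFF
    return {
        "ZF": int(diff == 0),   # zero flag: operands are equal
        "CF": int(a < b),       # carry flag: unsigned borrow occurred
    }

def jne_taken(flags):
    """jne (jump if not equal) branches when the zero flag is clear."""
    return flags["ZF"] == 0

print(jne_taken(cmp16(5, 7)))            # True: unequal, so the jump is taken
print(jne_taken(cmp16(0x1234, 0x1234)))  # False: equal, so execution falls through
```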
Mnemonics for opcodes
In real mode, the following mnemonics are available: aaa, aad, aam, aas, adc, add, and, call, cbw, clc, cld, cli, cmc, cmp, cmpsb, cmpsw, cwd, daa, das, dec, div, esc, hlt, idiv, imul, in, inc, int, into, iret, ja, jae, jb, jbe, jc, jcxz, je, jg, jge, jl, jle, jmp, jna, jnae, jnb, jnbe, jnc, jne, jng, jnge, jnl, jnle, jno, jnp, jns, jnz, jo, jp, jpe, jpo, js, jz, lahf, lds, lea, les, lock, lodsb, lodsw, loop, loope, loopne, loopnz, loopz, mov, movsb, movsw, mul, neg, nop, not, or, out, pop, popf, push, pushf, rcl, rcr, rep, repe, repne, repnz, repz, ret, rol, ror, sahf, sal, sar, sbb, scasb, scasw, shl, shr, stc, std, sti, stosb, stosw, sub, test, wait, xchg, xlat, xor
There are also some undocumented opcodes that have no mnemonic assigned to them. For example, the opcode 0x0F is executed by most 8086 processors as "POP CS". Other processors in the x86 family may not interpret undocumented opcodes as earlier processors do, so a program that relies on undocumented opcodes may not work on later x86 processors. On the 80186 and later, invalid opcodes generate an exception.
The segmented addressing model
The x86 architecture uses a scheme known as segmentation to address memory, rather than the linear method used in other architectures. Segmentation involves decomposing a linear address into two parts: a segment and an offset. The segment points to the beginning of a 64K group of addresses, and the offset gives the position of the desired address relative to the base of that segment. In real mode, to translate back into a linear address, the segment is shifted 4 bits left and then added to the offset. The formula looks like this: segment*0x10+offset.
Two registers are used for a memory address: one to hold the segment, and one to hold the offset.
For example, in real mode only, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, together they point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE.
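The real-mode calculation can be checked with a short sketch. This is a Python model of the formula segment*0x10+offset; the function name `linear_address` is hypothetical, and the 20-bit wraparound of the 8086 address bus is deliberately not modelled.

```python
def linear_address(segment, offset):
    """Real-mode translation: shift the segment left 4 bits and add the offset.

    Both inputs are truncated to 16 bits. No 20-bit wraparound is applied.
    """
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)

print(hex(linear_address(0xDEAD, 0xCAFE)))   # 0xeb5ce

# Different segment:offset pairs can name the same linear address.
print(hex(linear_address(0xEB5C, 0x000E)))   # 0xeb5ce
```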
In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a TI (table indicator) bit that indicates whether the entry is in the GDT or the LDT, and a 2-bit RPL (requested privilege level). When the selector is loaded, the indexed table entry is looked up to obtain the segment base. See Memory segment.
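The three fields of a selector can be extracted with shifts and masks. The Python sketch below assumes the standard layout, with the RPL in bits 0 and 1, the TI bit in bit 2 and the index in bits 3 to 15; `decode_selector` is a hypothetical name, and the descriptor table lookup itself is not modelled.

```python
def decode_selector(selector):
    """Split a 16-bit protected-mode selector into index, TI and RPL."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,       # 13-bit index into the descriptor table
        "ti": (selector >> 2) & 1,    # 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,       # 2-bit requested privilege level
    }

print(decode_selector(0x0008))   # {'index': 1, 'ti': 0, 'rpl': 0}
print(decode_selector(0x002B))   # {'index': 5, 'ti': 0, 'rpl': 3}
```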
An address given as a segment and an offset is written in the notation segment:offset. In the above example (for real mode only), the linear address 0xEB5CE can be written as 0xDEAD:0xCAFE or, if the segment and offset are held in a register pair, as DS:DX.
There are some special combinations of segment registers and general registers that point to important addresses:
- CS:IP points to the address where the processor will fetch the next byte of code.
- SS:SP points to the location of the last item pushed onto the stack.
- DS:SI is often used to point to data that is about to be copied to ES:DI.
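The DS:SI to ES:DI copy is what a repeated byte-move string instruction (rep movsb, both listed among the mnemonics above) performs. The following Python sketch models that behaviour on a 1 MiB byte array, assuming the direction flag is clear so that SI and DI count upwards; the function `rep_movsb` and the chosen segment values are illustrative only.

```python
def rep_movsb(memory, ds, si, es, di, cx):
    """Copy cx bytes from DS:SI to ES:DI, assuming the direction flag is clear.

    Returns the final (si, di, cx). Addresses wrap at 20 bits so that they
    always fall inside the 1 MiB model memory.
    """
    while cx:
        src = ((ds << 4) + si) & 0xFFFFF
        dst = ((es << 4) + di) & 0xFFFFF
        memory[dst] = memory[src]
        si = (si + 1) & 0xFFFF    # offsets are 16-bit and wrap within the segment
        di = (di + 1) & 0xFFFF
        cx -= 1
    return si, di, cx

memory = bytearray(1 << 20)                 # 1 MiB of real-mode address space
memory[0x10000:0x10005] = b"hello"          # source data at 0x1000:0x0000
si, di, cx = rep_movsb(memory, ds=0x1000, si=0, es=0x2000, di=0x10, cx=5)
print(bytes(memory[0x20010:0x20015]))       # b'hello'
```

After the copy, SI and DI have each advanced by the number of bytes moved and CX is zero, which matches how the counter register is used by the repeat prefix.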