General-Purpose Register

Cortex-M3 Basics

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

3.1 Registers

As we've seen, the Cortex™-M3 processor has registers R0 through R15 and a number of special registers. R0 through R12 are general purpose, but some of the 16-bit Thumb® instructions can only access R0 through R7 (low registers), whereas 32-bit Thumb-2 instructions can access all these registers. Special registers have predefined functions and can only be accessed by special register access instructions.

3.1.1 General Purpose Registers R0 through R7

The R0 through R7 general purpose registers are also called low registers. They can be accessed by all 16-bit Thumb instructions and all 32-bit Thumb-2 instructions. They are all 32 bits; the reset value is unpredictable.

3.1.2 General Purpose Registers R8 through R12

The R8 through R12 registers are also called high registers. They are accessible by all Thumb-2 instructions but not by all 16-bit Thumb instructions. These registers are all 32 bits; the reset value is unpredictable (see Figure 3.1).

FIGURE 3.1. Registers in the Cortex-M3.

3.1.3 Stack Pointer R13

R13 is the stack pointer (SP). In the Cortex-M3 processor, there are two SPs. This duality allows two separate stack memories to be set up. When using the register name R13, you can only access the current SP; the other one is inaccessible unless you use the special instructions move to special register from general-purpose register (MSR) and move special register to general-purpose register (MRS). The two SPs are as follows:

Main Stack Pointer (MSP) or SP_main in ARM documentation: This is the default SP; it is used by the operating system (OS) kernel, exception handlers, and all application codes that require privileged access.

Process Stack Pointer (PSP) or SP_process in ARM documentation: This is used by the base-level application code (when not running an exception handler).

Stack Push and Pop

A stack is a memory usage model. It is simply part of the system memory, and a pointer register (inside the processor) is used to make it work as a first-in/last-out buffer. The common use of a stack is to save register contents before some data processing and then restore those contents from the stack after the processing task is done.

FIGURE 3.2. Basic Concept of Stack Memory.

When doing PUSH and POP operations, the pointer register, commonly called the stack pointer, is adjusted automatically to prevent subsequent stack operations from corrupting previously stacked data. More details on stack operations are provided in a later part of this chapter.

It is not necessary to use both SPs. Simple applications can rely purely on the MSP. The SPs are used for accessing stack memory processes such as PUSH and POP.

In the Cortex-M3, the instructions for accessing stack memory are PUSH and POP. The assembly language syntax is as follows (text after each semicolon [;] is a comment):

PUSH   {R0}   ; R13=R13-4, then Memory[R13] = R0

POP   {R0}   ; R0 = Memory[R13], then R13 = R13 + 4

The Cortex-M3 uses a full-descending stack arrangement. (More detail on this subject can be found in the "Stack Memory Operations" section of this chapter.) Therefore, the SP decrements when new data is stored in the stack. PUSH and POP are usually used to save register contents to stack memory at the start of a subroutine and then restore the registers from stack at the end of the subroutine. You can PUSH or POP multiple registers in one instruction:

subroutine_1

  PUSH   {R0-R7, R12, R14} ; Save registers

  ...   ; Do your processing

  POP   {R0-R7, R12, R14} ; Restore registers

  BX   R14   ; Return to calling function

Instead of using R13, you can use SP (for SP) in your program codes. It means the same thing. Inside program code, both the MSP and the PSP can be called R13/SP. However, you can access a particular one using special register access instructions (MRS/MSR).

The MSP, also called SP_main in ARM documentation, is the default SP after power-up; it is used by kernel code and exception handlers. The PSP, or SP_process in ARM documentation, is typically used by thread processes in systems with an embedded OS running.

Because register PUSH and POP operations are always word aligned (their addresses must be 0x0, 0x4, 0x8, ...), the SP/R13 bit 0 and bit 1 are hardwired to 0 and always read as zero (RAZ).
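The PUSH and POP behavior described in this section can be modeled in a few lines of Python. This is only an illustrative sketch, not code for the processor itself: the class name DescendingStack, the dictionary standing in for memory, and the RAM address used are all assumptions made for the example.

```python
class DescendingStack:
    """Toy model of a full-descending stack: SP points at the last pushed word."""

    def __init__(self, top_address):
        # Bits 0 and 1 of the SP always read as zero, so force word alignment.
        self.sp = top_address & ~0x3
        self.memory = {}  # address -> 32-bit word (stand-in for RAM)

    def push(self, value):
        self.sp -= 4                                # R13 = R13 - 4
        self.memory[self.sp] = value & 0xFFFFFFFF   # then Memory[R13] = value

    def pop(self):
        value = self.memory.pop(self.sp)            # value = Memory[R13]
        self.sp += 4                                # then R13 = R13 + 4
        return value


# Hypothetical initial address; the two low bits are dropped on construction.
stack = DescendingStack(0x20001003)
stack.push(0x11111111)
stack.push(0x22222222)
sp_after_pushes = stack.sp
first_popped = stack.pop()
second_popped = stack.pop()
```

The last value pushed is the first one popped, and the SP returns to its starting value once every PUSH has been matched by a POP.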

3.1.4 Link Register R14

R14 is the link register (LR). Inside an assembly program, you can write it as either R14 or LR. LR is used to store the return program counter (PC) when a subroutine or function is called, for example, when you're using the branch and link (BL) instruction:

main   ; Main program

  ...

  BL function1 ; Call function1 using Branch with Link instruction.

  ; PC = function1 and

  ; LR = the next instruction in main

  ...

function1

  ...   ; Program code for function 1

  BX LR   ; Return

Despite the fact that bit 0 of the PC is always 0 (because instructions are word aligned or half word aligned), the LR bit 0 is readable and writable. This is because in the Thumb instruction set, bit 0 is often used to indicate ARM/Thumb states. To allow the Thumb-2 program for the Cortex-M3 to work with other ARM processors that support the Thumb-2 technology, this least significant bit (LSB) is writable and readable.

3.1.5 Program Counter R15

R15 is the PC. You can access it in assembler code by either R15 or PC. Because of the pipelined nature of the Cortex-M3 processor, when you read this register, you will find that the value is different than the location of the executing instruction, normally by 4. For example:

0x1000 :   MOV   R0, PC   ; R0 = 0x1004

In other instructions like literal load (reading of a memory location related to current PC value), the effective value of PC might not be instruction address plus 4 due to alignment in address calculation. But the PC value is still at least 2 bytes ahead of the instruction address during execution.

Writing to the PC will cause a branch (but LRs do not get updated). Because an instruction address must be half word aligned, the LSB (bit 0) of the PC read value is always 0. However, in branching, either by writing to PC or using branch instructions, the LSB of the target address should be set to 1 because it is used to indicate the Thumb state operations. If it is 0, it can imply trying to switch to the ARM state and will result in a fault exception in the Cortex-M3.
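The two rules above (a PC read returns the instruction address plus 4, and a branch target needs bit 0 set) can be sketched as a small Python model. The function names and the FaultException class are hypothetical stand-ins chosen for this illustration; the literal-load alignment case mentioned above is deliberately left out.

```python
class FaultException(Exception):
    """Stand-in for the fault raised on an attempted switch to ARM state."""


def branch_to(target):
    """Model a branch made by writing `target` to the PC on a Thumb-only core."""
    if target & 1 == 0:
        # A clear LSB would request ARM state, which the Cortex-M3 cannot enter.
        raise FaultException(f"branch target {target:#x} has bit 0 clear")
    return target & ~1  # the PC itself always reads back with bit 0 equal to 0


def read_pc(instruction_address):
    """Value seen by `MOV R0, PC`: the instruction address plus 4."""
    return instruction_address + 4


new_pc = branch_to(0x2001)   # LSB set: execution continues at 0x2000
pc_value = read_pc(0x1000)   # matches the MOV R0, PC example above
try:
    branch_to(0x2000)        # LSB clear: faults
    faulted = False
except FaultException:
    faulted = True
```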


URL:

https://www.sciencedirect.com/science/article/pii/B9781856179638000065

INTRODUCTION TO THE ARM INSTRUCTION SET

ANDREW N. SLOSS, ... CHRIS WRIGHT, in ARM System Developer's Guide, 2004

3.5 PROGRAM STATUS REGISTER INSTRUCTIONS

The ARM instruction set provides two instructions to directly control a program status register (psr). The MRS instruction transfers the contents of either the cpsr or spsr into a register; in the reverse direction, the MSR instruction transfers the contents of a register into the cpsr or spsr. Together these instructions are used to read and write the cpsr and spsr.

In the syntax you can see a label called fields. This can be any combination of control (c), extension (x), status (s), and flags (f). These fields relate to particular byte regions in a psr, as shown in Figure 3.9.

Figure 3.9. psr byte fields.

MRS copy program status register to a general-purpose register Rd = psr
MSR move a general-purpose register to a program status register psr[field] = Rm
MSR move an immediate value to a program status register psr[field] = immediate

The c field controls the interrupt masks, Thumb state, and processor mode. Example 3.26 shows how to enable IRQ interrupts by clearing the I mask. This operation involves using both the MRS and MSR instructions to read from and then write to the cpsr.

Example 3.26

The MRS first copies the cpsr into register r1. The BIC instruction clears bit 7 of r1. Register r1 is then copied back into the cpsr, which enables IRQ interrupts. You can see from this example that this code preserves all the other settings in the cpsr and only modifies the I bit in the control field.

This example is in SVC mode. In user mode you can read all cpsr bits, but you can only update the condition flag field f.
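The read-modify-write sequence described for Example 3.26 can be mimicked in Python to show why the other cpsr settings survive. This is a sketch of the bit manipulation only, not the example's assembly listing; the function name enable_irq and the starting cpsr value are assumptions made for illustration.

```python
I_BIT = 1 << 7  # IRQ mask: bit 7 of the cpsr, inside the control field


def enable_irq(cpsr):
    """Return a cpsr value with the I mask cleared and every other bit kept."""
    r1 = cpsr       # MRS: copy the cpsr into a general-purpose register
    r1 &= ~I_BIT    # BIC: clear bit 7
    return r1       # MSR: write the register back into the cpsr


# Hypothetical starting value: I and F masks set, mode bits 0b10011 (SVC).
cpsr_before = 0xD3
cpsr_after = enable_irq(cpsr_before)
```

Only bit 7 differs between the two values; the F mask and the mode bits are carried through unchanged.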

3.5.1 COPROCESSOR INSTRUCTIONS

Coprocessor instructions are used to extend the instruction set. A coprocessor can either provide additional computation capability or be used to control the memory subsystem including caches and memory management. The coprocessor instructions include data processing, register transfer, and memory transfer instructions. We will provide only a short overview since these instructions are coprocessor specific. Note that these instructions are only used by cores with a coprocessor.

CDP coprocessor data processing: perform an operation in a coprocessor
MRC MCR coprocessor register transfer: move data to/from coprocessor registers
LDC STC coprocessor memory transfer: load and store blocks of memory to/from a coprocessor

In the syntax of the coprocessor instructions, the cp field represents the coprocessor number between p0 and p15. The opcode fields describe the operation to take place on the coprocessor. The Cn, Cm, and Cd fields describe registers within the coprocessor. The coprocessor operations and registers depend on the specific coprocessor you are using. Coprocessor 15 (CP15) is reserved for system control purposes, such as memory management, write buffer control, cache control, and identification registers.

Example 3.27

This example shows a CP15 register being copied into a general-purpose register.

Here CP15 register-0 contains the processor identification number. This register is copied into the general-purpose register r10.

3.5.2 COPROCESSOR 15 INSTRUCTION SYNTAX

CP15 configures the processor core and has a set of dedicated registers to store configuration information, as shown in Example 3.27. A value written into a register sets a configuration attribute, for example, switching on the cache.

CP15 is called the system control coprocessor. Both MRC and MCR instructions are used to read and write to CP15, where register Rd is the core destination register, Cn is the primary register, Cm is the secondary register, and opcode2 is a secondary register modifier. You may occasionally hear secondary registers called "extended registers."

As an example, here is the instruction to move the contents of CP15 control register c1 into register r1 of the processor core:

We use a shorthand notation for CP15 reference that makes referring to configuration registers easier to follow. The reference notation uses the following format:

The first term, CP15, defines it as coprocessor 15. The second term, after the separating colon, is the primary register. The primary register X can have a value between 0 and 15. The third term is the secondary or extended register. The secondary register Y can have a value between 0 and 15. The last term, opcode2, is an instruction modifier and can have a value between 0 and 7. Some operations may also use a nonzero value w of opcode1. We write these as CP15:w:cX:cY:Z.
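To make the shorthand concrete, the following Python sketch expands a reference into an MRC instruction string. It assumes the four-term form is written CP15:cX:cY:Z (inferred from the five-term form given above), that opcode1 defaults to 0 when omitted, and that MRC operands follow the order coprocessor, opcode1, Rd, Cn, Cm, opcode2. The helper name mrc_from_shorthand is hypothetical.

```python
def mrc_from_shorthand(shorthand, rd):
    """Expand 'CP15:cX:cY:Z' or 'CP15:w:cX:cY:Z' into an MRC instruction string."""
    parts = shorthand.split(":")
    if parts[0] != "CP15" or len(parts) not in (4, 5):
        raise ValueError(f"not a CP15 reference: {shorthand!r}")
    # opcode1 (w) is only written out when nonzero; assume 0 otherwise.
    opcode1 = int(parts[1]) if len(parts) == 5 else 0
    primary, secondary, opcode2 = parts[-3], parts[-2], int(parts[-1])
    for reg in (primary, secondary):
        # Primary and secondary registers are c0 through c15.
        if not (reg.startswith("c") and 0 <= int(reg[1:]) <= 15):
            raise ValueError(f"register out of range: {reg}")
    if not 0 <= opcode2 <= 7:
        raise ValueError(f"opcode2 out of range: {opcode2}")
    return f"MRC p15, {opcode1}, {rd}, {primary}, {secondary}, {opcode2}"


# Reading control register c1 into r1, and the ID register c0 into r10.
control_read = mrc_from_shorthand("CP15:c1:c0:0", "r1")
id_read = mrc_from_shorthand("CP15:c0:c0:0", "r10")
```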


URL:

https://www.sciencedirect.com/science/article/pii/B9781558608740500046

Overview of the Cortex-M3

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

2.2 Registers

The Cortex-M3 processor has registers R0 through R15 (see Figure 2.2). R13 (the stack pointer) is banked, with only one copy of the R13 visible at a time.

FIGURE 2.2. Registers in the Cortex-M3.

2.2.1 R0–R12: General-Purpose Registers

R0–R12 are 32-bit general-purpose registers for data operations. Some 16-bit Thumb® instructions can only access a subset of these registers (low registers, R0–R7).

2.2.2 R13: Stack Pointers

The Cortex-M3 contains two stack pointers (R13). They are banked so that only one is visible at a time. The two stack pointers are as follows:

Main Stack Pointer (MSP): The default stack pointer, used by the operating system (OS) kernel and exception handlers

Process Stack Pointer (PSP): Used by user application code

The lowest 2 bits of the stack pointers are always 0, which means they are always word aligned.

2.2.3 R14: The Link Register

When a subroutine is called, the return address is stored in the link register.

2.2.4 R15: The Program Counter

The program counter is the current program address. This register can be written to control the program flow.

2.2.5 Special Registers

The Cortex-M3 processor also has a number of special registers (see Figure 2.3). They are as follows:

Program Status registers (PSRs)

Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)

Control register (CONTROL)

FIGURE 2.3. Special Registers in the Cortex-M3.

These registers have special functions and can be accessed only by special instructions. They cannot be used for normal data processing (see Table 2.1).

Table 2.1. Special Registers and Their Functions

Register    Function
xPSR        Provide arithmetic and logic processing flags (zero flag and carry flag), execution status, and current executing interrupt number
PRIMASK     Disable all interrupts except the nonmaskable interrupt (NMI) and hard fault
FAULTMASK   Disable all interrupts except the NMI
BASEPRI     Disable all interrupts of specific priority level or lower priority level
CONTROL     Define privileged status and stack pointer selection

For more information on these registers, see Chapter 3.


URL:

https://www.sciencedirect.com/science/article/pii/B9781856179638000053

Early Intel® Architecture

In Power and Performance, 2015

1.1.2 Registers

Aside from the four segment registers introduced in the previous section, the 8086 has eight general purpose registers, and two status registers.

The general purpose registers are divided into two categories. Four registers, AX, BX, CX, and DX, are classified as data registers. These data registers are accessible as either the full 16-bit register, represented with the X suffix, the low byte of the full 16-bit register, designated with an L suffix, or the high byte of the 16-bit register, delineated with an H suffix. For example, AX would access the full 16-bit register, whereas AL and AH would access the register's low and high bytes, respectively.
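The relationship between the X, L, and H views can be illustrated with a short Python model. The class DataRegister and its attribute names are invented for this sketch; it only demonstrates that the low and high bytes are windows onto the same 16-bit value.

```python
class DataRegister:
    """A 16-bit data register whose low and high bytes are separately addressable."""

    def __init__(self, value=0):
        self.x = value & 0xFFFF  # full 16-bit view, e.g. AX

    @property
    def l(self):
        """Low byte view, e.g. AL."""
        return self.x & 0xFF

    @l.setter
    def l(self, byte):
        self.x = (self.x & 0xFF00) | (byte & 0xFF)

    @property
    def h(self):
        """High byte view, e.g. AH."""
        return self.x >> 8

    @h.setter
    def h(self, byte):
        self.x = ((byte & 0xFF) << 8) | (self.x & 0x00FF)


ax = DataRegister(0x1234)
low, high = ax.l, ax.h   # the two halves of 0x1234
ax.h = 0xAB              # writing AH leaves AL untouched
```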

The second classification of registers is the pointer/index registers. This includes the following four registers: SP, BP, SI, and DI. The SP register, the stack pointer, is reserved for usage as a pointer to the top of the stack. The SI and DI registers are typically used implicitly as the source and destination pointers, respectively. Unlike the data registers, the pointer/index registers are only accessible as full 16-bit registers.

As this categorization may indicate, the general purpose registers come with some guidance for their intended usage. This guidance is reflected in the instruction forms with implicit operands. Instructions with implicit operands, that is, operands which are assumed to be a certain register and therefore don't require that operand to be encoded, allow for shorter encodings for common usages. For convenience, instructions with implicit forms typically also have explicit forms, which require more bytes to encode. The recommended uses for the registers are as follows:

AX Accumulator

BX Data (relative to DS)

CX Loop counter

DX Data

SI Source pointer (relative to DS)

DI Destination pointer (relative to ES)

SP Stack pointer (relative to SS)

BP Base pointer of stack frame (relative to SS)

Aside from allowing for shorter instruction encodings, this guidance is also an aid to the programmer who, once familiar with the various register meanings, will be able to deduce the meaning of assembly, assuming it conforms to the guidelines, much faster. This parallels, to some degree, how variable names help the programmer reason about their contents. It's important to note that these are just suggestions, not rules.

Additionally, there are two status registers, the instruction pointer and the flags register.

The instruction pointer, IP, is also often referred to as the program counter. This register contains the memory address of the next instruction to be executed. Until 64-bit mode was introduced, the instruction pointer was not directly accessible to the programmer, that is, it wasn't possible to access it like the other general purpose registers. Despite this, the instruction pointer was indirectly accessible. Whereas the instruction pointer couldn't be modified through a MOV instruction, it could be modified by any instruction that alters the program flow, such as the CALL or JMP instructions.

Reading the contents of the instruction pointer was also possible by taking advantage of how x86 handles function calls. Transfer from one function to another occurs through the CALL and RET instructions. The CALL instruction preserves the current value of the instruction pointer, pushing it onto the stack in order to support nested function calls, and then loads the instruction pointer with the new address, provided as an operand to the instruction. This value on the stack is referred to as the return address. Whenever the function has finished executing, the RET instruction pops the return address off of the stack and restores it into the instruction pointer, thus transferring control back to the function that initiated the function call. Leveraging this, the programmer can create a special thunk function that would simply copy the return address off of the stack, load it into one of the registers, and then return. For example, when compiling Position-Independent-Code (PIC), which is discussed in Chapter 12, the compiler will automatically add functions that use this technique to obtain the instruction pointer. These functions are usually called __x86.get_pc_thunk.bx(), __x86.get_pc_thunk.cx(), __x86.get_pc_thunk.dx(), and so on, depending on which register the instruction pointer is loaded into.
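The thunk technique can be traced with a toy Python model of CALL and RET. Everything here is an assumption made for illustration: the Cpu class, the addresses, and the 5-byte length given to the CALL instruction. It is not the compiler-generated thunk itself, only a sketch of the mechanism described above.

```python
class Cpu:
    """Minimal model with an instruction pointer, a stack, and one register."""

    def __init__(self, ip):
        self.ip = ip
        self.stack = []
        self.bx = 0

    def call(self, target, instruction_length):
        # CALL pushes the address of the instruction after itself, then jumps.
        self.stack.append(self.ip + instruction_length)
        self.ip = target

    def ret(self):
        # RET pops the return address back into the instruction pointer.
        self.ip = self.stack.pop()


def get_pc_thunk_bx(cpu):
    """Copy the return address from the top of the stack into BX, then return."""
    cpu.bx = cpu.stack[-1]  # read without popping
    cpu.ret()


cpu = Cpu(ip=0x1000)                       # hypothetical address of the CALL
cpu.call(0x2000, instruction_length=5)     # hypothetical thunk address and length
get_pc_thunk_bx(cpu)
```

After the thunk returns, BX holds the address of the instruction following the CALL, which is exactly where execution resumes.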

The second status register, the EFLAGS register, is comprised of 1-bit status and control flags. These bits are set by various instructions, typically arithmetic or logic instructions, to signal certain conditions. These condition flags can then be checked in order to make decisions. For a list of the flags modified by each instruction, see the Intel SDM. The 8086 defined the following status and control bits in EFLAGS:

Zero Flag (ZF) Set if the result of the instruction is zero.

Sign Flag (SF) Set if the result of the instruction is negative.

Overflow Flag (OF) Set if the result of the instruction overflowed.

Parity Flag (PF) Set if the result has an even number of bits set.

Carry Flag (CF) Used for storing the carry bit in instructions that perform arithmetic with carry (for implementing extended precision).

Adjust Flag (AF) Similar to the Carry Flag. In the parlance of the 8086 documentation, this was referred to as the Auxiliary Carry Flag.

Direction Flag (DF) For instructions that either autoincrement or autodecrement a pointer, this flag chooses which to perform. If set, autodecrement, otherwise autoincrement.

Interrupt Enable Flag (IF) Determines whether maskable interrupts are enabled.

Trap Flag (TF) If set, the CPU operates in single-step debugging mode.
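As a rough illustration of how the arithmetic status flags follow from a result, the Python sketch below computes ZF, SF, CF, OF, and PF for a 16-bit addition. The function add16_flags is hypothetical, and AF and the control flags are omitted. One detail is not stated in the list above: on x86 the parity flag is computed from the low byte of the result only, which the code reflects.

```python
def add16_flags(a, b):
    """Add two 16-bit values; return (result, flags) with 8086-style status bits."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    result = full & 0xFFFF

    def sign(v):
        return (v >> 15) & 1

    flags = {
        "ZF": result == 0,
        "SF": sign(result) == 1,
        "CF": full > 0xFFFF,  # carry out of bit 15
        # Signed overflow: both operands share a sign that the result lacks.
        "OF": sign(a) == sign(b) and sign(result) != sign(a),
        # Parity looks only at the low byte of the result.
        "PF": bin(result & 0xFF).count("1") % 2 == 0,
    }
    return result, flags


signed_overflow = add16_flags(0x7FFF, 0x0001)   # largest positive value plus one
unsigned_wrap = add16_flags(0xFFFF, 0x0001)     # wraps around to zero
```

The first case overflows as a signed addition without producing a carry, while the second produces a carry without signed overflow, which is why the two flags are tracked separately.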


URL:

https://www.sciencedirect.com/science/article/pii/B978012800726600001X

Intel® Pentium® Processors

In Power and Performance, 2015

2.2.3 Out-of-Order Execution

As discussed in Section 2.1.1, prior to the 80486, the processor handled one instruction at a time. As a result, the processor's resources remained idle while the currently executing instruction was not utilizing them. With the introduction of pipelining, the pipeline was partitioned to allow multiple instructions to coexist simultaneously. Therefore, when the currently executing instruction had finished with some of the processor's resources, the next instruction could begin utilizing them before the first instruction had completely finished executing. The introduction of μops expanded significantly on this concept, splitting instruction execution into smaller steps.

Each type of μop has a corresponding type of execution unit. The Pentium Pro has five execution units: two for handling integer μops, two for handling floating point μops, and one for handling memory μops. Therefore, up to five μops can execute in parallel. An instruction, divided into one or more μops, is not done executing until all of its corresponding μops have finished. Obviously, μops from the same instruction have dependencies upon one another so they can't all execute simultaneously. Therefore, μops from multiple instructions are dispatched to the execution units.

Taking advantage of the fine granularity of μops, out-of-order execution significantly improves utilization of the execution units. Up until the Pentium Pro, Intel processors executed in-order, meaning that instructions were executed in the same sequence as they were organized in memory. With out-of-order execution, μops are scheduled based on the available resources, as opposed to their ordering. As instructions are fetched and decoded, the resulting μops are stored in the Reorder Buffer. As execution units and other resources become available, the Reservation Station dispatches the corresponding μop to one of the execution units. Once the μop has finished executing, the result is stored back into the Reorder Buffer. Once all of the μops associated with an instruction have completed execution, the μops retire, that is, they are removed from the Reorder Buffer and any results or side-effects are made visible to the rest of the system. While instructions can execute in any order, instructions always retire in-order, ensuring that the programmer does not need to worry about handling out-of-order execution.

To illustrate the problem with in-order execution and the benefit of out-of-order execution, consider the following hypothetical situation. Assume that a processor has two execution units capable of handling integer μops and one capable of handling floating point μops. With in-order scheduling, the most efficient usage of this processor would be to intermix integer and floating point instructions following the two-to-one ratio. This would involve carefully scheduling instructions based on their instruction latencies, along with the latencies for fetching any memory resources, to ensure that when an execution unit becomes available, the next μop in the queue would be executable with that unit.

For example, consider four instructions scheduled on this example processor, three integer instructions followed by a floating point instruction. Assume that each instruction corresponds to one μop, that these instructions have no interdependencies, and that all three execution units are currently available. The first two integer instructions would be dispatched to the two available integer execution units, but the floating point instruction would not be dispatched, even though the floating point execution unit was available. This is because the third integer instruction, waiting for one of the two integer execution units to become available, must be issued first. This underutilizes the processor's resources. With out-of-order execution, the first two integer instructions and the floating point instruction would be dispatched together.

In other words, out-of-order execution improves the utilization of the processor's resources. Additionally, because μops are scheduled based on available resources, some instruction latencies, such as an expensive load from memory, may be partially or completely masked if other work can be scheduled instead.
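The four-instruction scenario above can be reproduced with a small Python simulation of a single dispatch cycle. It is a sketch under the same simplifying assumptions as the text (one μop per instruction, no interdependencies, all units free); the function name and data layout are invented for the example and do not describe any real scheduler.

```python
def dispatch_first_cycle(instructions, free_units, in_order):
    """Return the indices of the instructions dispatched in one cycle.

    instructions: list of unit kinds in program order, e.g. ["int", "fp"]
    free_units:   dict mapping unit kind to the number of free execution units
    """
    free = dict(free_units)
    dispatched = []
    for index, kind in enumerate(instructions):
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            dispatched.append(index)
        elif in_order:
            break  # an older instruction is stalled, so younger ones must wait
    return dispatched


program = ["int", "int", "int", "fp"]   # three integer ops, then one floating point op
units = {"int": 2, "fp": 1}             # the example processor's execution units
in_order = dispatch_first_cycle(program, units, in_order=True)
out_of_order = dispatch_first_cycle(program, units, in_order=False)
```

With in-order dispatch the floating point unit sits idle behind the stalled third integer instruction; with out-of-order dispatch all three units are busy in the first cycle.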

Register Renaming

From the instruction set perspective, Intel processors have eight general purpose registers in 32-bit mode, and sixteen general purpose registers in 64-bit mode; however, from the internal hardware perspective, Intel processors have many more registers. For example, the Pentium Pro has 40 registers, organized in a structure referred to as a Physical Register File.

While this many extra registers might seem like a performance boon, especially if the reader is familiar with the performance gain received from the eight extra registers in 64-bit mode, these registers serve a different purpose. Rather than providing the process with more registers, these extra registers serve to handle data dependencies in the out-of-order execution engine.

When a value is stored into a register, a new register file entry is assigned to contain that value. Once another value is stored into that register, a different register file entry is assigned to contain this new value. Internal to the processor core, each data dependency on the first value will reference the first entry, and each data dependency on the second value will reference the second entry. Therefore, the out-of-order engine is able to execute instructions in an order that would otherwise be impossible due to false data dependencies.
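The renaming idea can be sketched in Python as a table that hands out a fresh physical entry on every write. The RenameTable class is a hypothetical simplification: it uses the 40-entry figure quoted above, but it never recycles entries at retirement, which real hardware must do.

```python
class RenameTable:
    """Maps architectural register names to physical register file entries."""

    def __init__(self, physical_count):
        self.free = list(range(physical_count))  # unused physical entries
        self.mapping = {}                        # architectural name -> entry

    def write(self, arch_reg):
        """A store to a register is assigned a new physical entry."""
        entry = self.free.pop(0)
        self.mapping[arch_reg] = entry
        return entry

    def read(self, arch_reg):
        """A dependent instruction references the entry current at rename time."""
        return self.mapping[arch_reg]


table = RenameTable(physical_count=40)
first_write = table.write("eax")    # first value stored into the register
reader_of_first = table.read("eax")
second_write = table.write("eax")   # second value gets a different entry
reader_of_second = table.read("eax")
```

Because the two readers reference different physical entries, the second write does not have to wait for the first reader to finish, even though both use the same architectural register name.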


URL:

https://www.sciencedirect.com/science/article/pii/B9780128007266000021

Load/store and branch instructions

Larry D. Pyeatt, William Ughetta, in ARM 64-Bit Assembly Language, 2020

iii.2 AArch64 user registers

As shown in Fig. 3.2, the AArch64 ISA provides 31 general-purpose registers, which are called [Image 2] through [Image 3]. These registers can each store 64 bits of data. To use all 64 bits, they are referred to as [Image 4] through [Image 5] (capitalization is optional). To use only the lower (least significant) 32 bits, they are referred to as [Image 6]. Since each register has a 64-bit name and a 32-bit name, we use [Image 7] through [Image 8] to specify a register without specifying the number of bits. For example, when we refer to [Image 9], we are really referring to either [Image 10] or [Image 11].

Figure 3.2. AArch64 general purpose registers ([Image 1]) and special registers.

3.2.1 General purpose registers

The general-purpose registers are each used according to specific conventions. These rules are defined in the application binary interface (ABI). The AArch64 ABI is called AAPCS64. The difference between callee saved and caller saved registers will also be explained in Section 5.4.4.

Registers [Image 12] are used for passing arguments when calling a procedure or function. Registers [Image 13] are scratch registers and can be used at any time because no assumptions are made about what they contain. They are called scratch registers because they are useful for holding temporary results of calculations. Registers [Image 14] can also be used as scratch registers, but their contents must be saved before they are used, and restored to their original contents before the procedure exits.

Some of the registers have alternate names. For example, [Image 15] is also known as [Image 16]. Most of these alternate names are only of interest to people writing compilers and operating systems. However, two of these registers are of interest to all AArch64 programmers.

3.2.2 Frame pointer

The frame pointer, [Image 17], is used by high-level language compilers to track the current stack frame. This register can be helpful when the program is running under a debugger, and can sometimes help the compiler to generate more efficient code for returning from a subroutine. The GNU C compiler can be instructed to use [Image 17] as a general-purpose register by using the -fomit-frame-pointer command line option. The use of [Image 17] as the frame pointer is a programming convention. Some instructions (e.g. branches) implicitly modify the program counter, the link register, and even the stack pointer, so they are considered to be hardware special registers. As far as the hardware is concerned, the frame pointer is exactly the same as the other general-purpose registers, but AArch64 programmers use it for the frame pointer because of the ABI.

3.2.3 PSTATE register

The PSTATE register contains bits that indicate the status of the current process, including information about the results of previous operations. Fig. 3.3 shows all of its bits. The dashed lines indicate unused space that may be reserved for future AArch64 architectural extensions. The PSTATE register is actually a collection of independent fields, most of which are only used by the operating system. User programs make use of the first four bits, N, Z, C, and V. These are referred to as the condition flags field. Most instructions can modify these flags, and later instructions can use the flags to control their operation. Their meaning is as follows:

Negative:

This bit is set to one if the signed result of an operation is negative, and set to zero if the result is positive or zero.

Zero:

This bit is set to one if the result of an operation is zero, and set to zero if the result is non-zero.

Carry:

This bit is set to one if an add operation results in a carry out of the most significant bit, or if a subtract operation does not result in a borrow. For shift operations, this flag is set to the last bit shifted out by the shifter.

oVerflow:

For addition and subtraction, this flag is set if a signed overflow occurred.


Figure 3.3. Fields in the PSTATE register.

3.2.4 Link register

The procedure link register, [Image 5], is used to hold the return address for subroutines. Certain instructions cause the program counter to be copied to the link register, then the program counter is loaded with a new address. These branch-and-link instructions are briefly covered in Section 3.5 and in more detail in Section 5.4. The link register could theoretically be used as a scratch register, but its contents are modified by hardware when a subroutine is called, in order to save the correct return address. Using [Image 5] as a general-purpose register is dangerous and is strongly discouraged.

3.2.5 Stack pointer

The program stack was introduced in Section 1.4. The stack pointer, [Image 19], is used to hold the address where the stack ends. This is commonly referred to as the top of the stack, although on most systems the stack grows downwards and the stack pointer really refers to the lowest address in the stack. The address where the stack ends may change when registers are pushed onto the stack, or when temporary local variables (automatic variables) are allocated or deleted. The use of the stack for storing automatic variables is described in Chapter 5. The stack pointer can only be modified or read by a small set of instructions.

3.two.6 Cipher register

The zero register can be referred to as a 64-bit register, xzr, or a 32-bit register, wzr. It always has the value zero. Most instructions can use the zero register as an operand, even as a destination register. If this is the case, the instruction will not change the destination register. However, it can still have side effects, including updating the PSTATE flags based on the ALU operation and incrementing a register in pre-indexed or post-indexed addressing. The zero register cannot always be used as an operand. It shares the same binary encoding with the stack pointer register, sp, which is the value 31. Some instructions can access the zero register, while others can access the stack pointer.
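A minimal sketch of this behavior follows. It assumes 31 ordinary registers plus register number 31 treated as the zero register; the `RegisterFile` class and `subs` helper are hypothetical names, and only the negative and zero flags are modeled. The stack-pointer interpretation of the same encoding is deliberately left out.

```python
MASK64 = (1 << 64) - 1
ZR = 31  # register number shared by the zero register and the stack pointer


class RegisterFile:
    """Hypothetical register file where register 31 reads as zero."""

    def __init__(self):
        self.x = [0] * 31  # ordinary registers 0 through 30

    def read(self, n):
        return 0 if n == ZR else self.x[n]

    def write(self, n, value):
        # Writes to the zero register are silently discarded.
        if n != ZR:
            self.x[n] = value & MASK64


def subs(regs, rd, rn, rm):
    """Subtract and return the negative and zero flags (other flags omitted)."""
    result = (regs.read(rn) - regs.read(rm)) & MASK64
    regs.write(rd, result)
    return {"N": result >> 63, "Z": int(result == 0)}


regs = RegisterFile()
regs.write(1, 5)
regs.write(2, 5)
flags = subs(regs, ZR, 1, 2)  # result is discarded, flags are still produced
regs.write(ZR, 99)            # has no effect
```

Using the zero register as the destination turns the subtraction into a pure comparison: the result is thrown away, but the flags still record that the two operands were equal.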

3.2.7 Program counter

The program counter, pc, always contains the address of the next instruction that will be executed. The processor increments this register by 4, automatically, after each instruction is fetched from memory. By moving an address into this register, the programmer can cause the processor to fetch the next instruction from the new address. This gives the programmer the ability to jump to any address and begin executing code there. Only a small number of instructions can access the pc directly. For example, instructions that create a PC-relative address, such as adr, and instructions which load a register, such as ldr, are able to access the program counter directly.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128192214000109

Knights Landing architecture

Jim Jeffers, ... Avinash Sodani, in Intel Xeon Phi Processor High Performance Programming (2nd Edition), 2016

Integer execution unit

The IEU executes integer μops, which are defined as those that operate on general-purpose registers R0–R15 (i.e., RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8…R15). There are two IEUs in the core. Each IEU contains a 12-entry RS that issues one μop per cycle. The integer RSes are fully out-of-order in their scheduling. Most operations have 1-cycle latency and are supported by both IEUs, but a few operations have 3- or 5-cycle latency (e.g., multiplies) and are only supported by one of the IEUs.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128091944000041

Computer Data Processing Hardware Architecture

Paul J. Fortier, Howard E. Michel, in Computer Systems Performance Evaluation and Prediction, 2003

2.3.1 Instruction types

Based on the number of registers available and the configuration of these registers, several types of instruction are possible. For example, if many registers are available, as would be the case in a stack computer, no address computations are needed and the instruction, therefore, can be much shorter both in format and execution time required. On the other hand, if there are no general registers and all computations are performed by memory movements of data, then instructions will be longer and require more time due to operand fetching and storage. The following are representative of instruction types:

0-address instructions: This type of instruction is found in machines where many general-purpose registers are available. This is the case in stack machines and in some reduced instruction set machines. Instructions of this type perform their function totally using registers. If we have three general registers, A, B, and C, a typical format would have the form:

(2.1) R[A] ← R[B] operator R[C]

which indicates that the contents of registers B and C have the operator (such as add, subtract, multiply, etc.) performed on them, with the result stored in general register A. Similarly, we could describe instructions that use just one or two registers as follows:

(2.2) R[B] ← R[B] operator R[C]

or

(2.3) operator R[C]

which represent two-register and one-register instructions, respectively. In the two-register case one of the operand registers is also used as the result register. In the single-register case the operand register is also the result register. The increment instruction is an example of a one-register instruction. This type of instruction is found in all machines.

1-address instructions: In this type of instruction a single memory address is found in the instruction. If another operand is used, it is typically an accumulator or the top of a stack in a stack computer. The typical format of these instructions has the form:

(2.4) operator M[address]

where the contents of the named memory address have the named operator performed on them in conjunction with an implied special register. An example of such an instruction could be as follows:

(2.5) Move M[100]

or

(2.6) Add M[100]

which moves the contents of memory location 100 into the ALU's accumulator or adds the contents of memory address 100 with the accumulator and stores the result in the accumulator. If the result must be stored in memory, we would need a store instruction:

(2.7) Store M[100]

1-and-1/2-address instructions: Once we have an architecture that has some general-purpose registers, we can provide more advanced operations combining memory contents and the general registers. The typical instruction performs an operation on a memory location's contents with that of a general register. For example, we could add the contents of a memory location with the contents of a general register, A, as shown:

(2.8) Add R[A], M[100]

This instruction typically stores the result in the first named location or register in the instruction. In this example it is register A.

2-address instructions: Two-address instructions utilize two memory locations to perform an instruction, for instance, a block move of N words from one location in memory to another, or a block add. The move may appear as follows:

(2.9) Move N, M[100], M[1000]

2-and-1/2-address instructions: This format uses two memory locations and a general register in the instruction. Typical of this type of instruction is an operation involving two memory locations storing the result in a register, or an operation with a general register and a memory location storing the result in another memory location, as shown:

(2.10) R[A] ← M[100] operator M[1000]
M[1000] ← M[100] operator R[A]

3-address instructions: Another less common form of instruction format is the 3-address instruction. These instructions involve three memory locations: two used for operands and one as the results location. A typical format is shown:

(2.11) M[200] ← M[100] operator M[300]
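To compare the formats, the sketch below performs the same addition twice: once as a sequence of 1-address instructions working through an implied accumulator, and once as a single 3-address instruction. The function names, opcode strings, and memory contents are illustrative assumptions, not part of any real instruction set.

```python
import operator


def run_one_address(memory, program):
    """Execute 1-address instructions against an implied accumulator."""
    acc = 0
    for opcode, address in program:
        if opcode == "MOVE":
            acc = memory[address]       # load the accumulator from memory
        elif opcode == "ADD":
            acc += memory[address]      # combine memory with the accumulator
        elif opcode == "STORE":
            memory[address] = acc       # write the accumulator back to memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


def run_three_address(memory, dest, src1, op, src2):
    """Execute one 3-address instruction: M[dest] <- M[src1] op M[src2]."""
    memory[dest] = op(memory[src1], memory[src2])


# Memory is modeled as a dictionary from address to value.
mem_one = {100: 7, 300: 5, 200: 0}
one_address_program = [("MOVE", 100), ("ADD", 300), ("STORE", 200)]
run_one_address(mem_one, one_address_program)

mem_three = {100: 7, 300: 5, 200: 0}
run_three_address(mem_three, 200, 100, operator.add, 300)
```

Both runs leave the same sum at address 200. The 1-address version needs three short instructions, while the 3-address version needs one instruction that must encode three memory addresses, which is the trade-off between instruction count and instruction length described above.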

URL:

https://www.sciencedirect.com/science/article/pii/B9781555582609500023

Advanced Encryption Standard

Tom St Denis, Simon Johnson, in Cryptography for Developers, 2007

x86 Performance

The AMD Opteron achieves a nice boost due to the addition of the eight new general-purpose registers. If we examine the GCC output for x86_64 and x86_32 platforms, we can see a nice difference between the two (Table 4.2).

Table 4.2. First Quarter of an AES Round

Both snippets accomplish (at least) the first MixColumns step of the first round in the loop. Note that the compiler has scheduled part of the second MixColumns during the first to achieve higher parallelism. Even though in Table 4.2 the x86_64 code looks longer, it executes faster, partially because it processes more of the second MixColumns in roughly the same time and makes good use of the extra registers.

From the x86_32 side, we can clearly see various spills to the stack (in bold). Each of those costs us three cycles (at a minimum) on the AMD processors (two cycles on most Intel processors). The 64-bit code was compiled to have zero stack spills during the main loop of rounds. The 32-bit code has about 15 stack spills during each round, which incurs a penalty of at least 45 cycles per round or 405 cycles over the course of the 9 full rounds.

Of course, we do not see the full penalty of 405 cycles, as more than one opcode is being executed at the same time. The penalty is also masked by parallel loads that are also on the critical path (such as loads from the Te tables or round key). Those delays occur anyway, so the fact that we are also loading from (or storing to) the stack at the same time does not add to the cycle count.

In either case, we can improve upon the code that GCC (4.1.1 in this case) emits. In the 64-bit code, we see a pairing of "shrq $24, %rdx" and "andl $255,%edx". The andl operation is not required since only the lower 32 bits of %rdx are guaranteed to have anything in them. This potentially saves up to 36 cycles over the course of 9 rounds (depending on how the andl operation pairs up with other opcodes).
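The redundancy of the masking step can be checked with a short sketch. It models the register as a plain integer and assumes, as the text does, that only the lower 32 bits are populated; the helper name `mask_is_redundant` is hypothetical.

```python
def mask_is_redundant(value):
    """Return True if masking with 255 after a 24-bit right shift changes nothing."""
    shifted = value >> 24             # models the 24-bit right shift
    return (shifted & 255) == shifted  # models the andl $255 mask


# Values that fit in 32 bits leave at most 8 bits after the shift,
# so the mask has nothing left to clear.
samples_32bit = [0x00000000, 0x00000001, 0x12345678, 0xFF000000, 0xFFFFFFFF]
all_redundant = all(mask_is_redundant(v) for v in samples_32bit)

# Counterexample: if the upper 32 bits held data, the mask would matter.
upper_bits_set = mask_is_redundant(1 << 40)
```

The check holds for every 32-bit input because shifting right by 24 leaves only bits 0 through 7. It fails once the upper half of the register carries data, which is why the optimization depends on the stated guarantee.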

With the 32-bit code, the double loads from (%esp) (lines 2 and 3) incur a needless three-cycle penalty. In the case of the AMD Athlon (and Opterons), the load store unit will short the load operation (in certain circumstances), but the load will always take at least three cycles. Changing the second load to "movl %edx,%ebx" means that we stall waiting for %edx, but the penalty is only one cycle, not three. That change alone will free up at most 9*2*4 = 72 cycles from the nine rounds.

URL:

https://www.sciencedirect.com/science/article/pii/B9781597491044500078

Embedded Processor Architecture

Peter Barry, Patrick Crowley, in Modern Embedded Computing, 2012

Register Operands

Source and destination operands can be any of the following registers, depending on the instruction being executed:

32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)

16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP)

8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL)

Segment registers

EFLAGS register

MMX

Control (CR0 through CR4)

System Table registers (such as the Interrupt Descriptor Table register)

Debug registers

Machine-specific registers

On RISC embedded processors, there are generally fewer limitations on the registers that can be used by instructions. IA-32 often restricts the registers that can be used as operands for certain instructions.

URL:

https://www.sciencedirect.com/science/article/pii/B9780123914903000059