The ARM64 (AArch64) stack
I was doing some reading on ARM64 assembly and ran across the following code snippet:
STP w3, w2, [sp, #-16]! // push first pair, create space for second
STP w1, w0, [sp, #8] // store second pair in the upper 8 bytes, SP unchanged
What exactly is going on? Why does the first instruction seem to move the stack pointer one way and the second instruction the other way? How are the arguments actually placed on the stack?
Let’s take a look, starting with the stack as it stands before the above instructions execute.
The iOS ABI Function Call Guide specifies that the stack grows downwards, toward lower addresses, and that the Stack Pointer (SP) points to the lowest address currently in use, which is the bottom of the stack when higher addresses are drawn at the top. In technical terms the iOS stack is full-descending, where full means that the SP points to the location in which the last item was stored rather than to the next free slot. The SP can hold any address, but according to the Procedure Call Standard for the ARM 64-bit Architecture it must be 16-byte aligned (that is, SP mod 16 = 0) whenever it is used to access memory.
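The alignment rule is simple arithmetic, and a few lines of Python can make it concrete. This is only an illustrative sketch: the helper names is_sp_aligned and align_down and the address 0x16FDFF000 are made up for the example and do not come from either guide.

```python
def is_sp_aligned(sp):
    """Return True when sp satisfies the 16-byte rule (sp mod 16 == 0)."""
    return sp % 16 == 0


def align_down(addr, alignment=16):
    """Round addr down to a multiple of alignment (must be a power of two)."""
    return addr & ~(alignment - 1)


sp = 0x16FDFF000                  # hypothetical SP value, already aligned
print(is_sp_aligned(sp))          # True
print(is_sp_aligned(sp - 8))      # False: moving the SP by 8 would break the rule
print(hex(align_down(sp - 8)))    # 0x16fdfeff0
```

Under this rule a stack adjustment that is a multiple of 16 bytes, such as the -16 in the snippet, keeps an aligned SP aligned.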
The Red Zone, a 128-byte area immediately below the SP, can be used for local variables but is otherwise not relevant to this discussion.
Both of the subject instructions are Store Pair of Registers (STP), which stores two 32-bit words or two 64-bit doublewords from two registers to consecutive memory locations starting at a computed address. In this case, the first instruction stores the contents of W3 and W2 (the W designation means we are dealing with the lower 32 bits of the corresponding X register) at the address held in the SP after it has been adjusted.
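To make the W/X relationship concrete, here is a small sketch that treats a register as a plain integer. The helper w_view and the sample value are hypothetical, chosen only so the two halves are easy to tell apart.

```python
def w_view(x_value):
    """Return the lower 32 bits of a 64-bit X register value (its W view)."""
    return x_value & 0xFFFFFFFF


x3 = 0x1122334455667788    # made-up contents of X3
print(hex(w_view(x3)))     # 0x55667788: the 4 bytes an STP of W3 would store
```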
First, the SP is adjusted by -16 bytes (i.e. we add 16 bytes to the bottom of the stack), and since the addressing mode is pre-index, the resulting address is written back to the SP. If we could set a breakpoint inside this instruction after the address is calculated and the SP is updated, we would see the SP sitting 16 bytes lower than before, with 16 bytes of not-yet-written space between the new SP and the old one.
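The difference between the two addressing modes used in the snippet comes down to whether the computed address is written back to the base register. The following is a rough model of that address arithmetic only, with hypothetical function names and a made-up starting SP of 0x1000.

```python
def pre_index(base, offset):
    """Model [base, #offset]! : access base + offset and write it back to base."""
    address = base + offset
    return address, address          # (address accessed, new base value)


def offset_only(base, offset):
    """Model [base, #offset] : access base + offset, leave base untouched."""
    return base + offset, base       # (address accessed, unchanged base)


sp = 0x1000                          # hypothetical 16-byte aligned SP
addr1, sp = pre_index(sp, -16)       # STP w3, w2, [sp, #-16]!
print(hex(addr1), hex(sp))           # 0xff0 0xff0

addr2, sp = offset_only(sp, 8)       # STP w1, w0, [sp, #8]
print(hex(addr2), hex(sp))           # 0xff8 0xff0
```

Only the first form changes the SP. The second merely computes an address relative to it.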
After the SP is adjusted, the first register of the pair (W3) is stored at the address held in the SP, occupying bytes SP through SP + 3.
W2 is then stored at SP + sizeof W3, that is SP + 4, occupying bytes SP + 4 through SP + 7.
W3 and W2 are each 4 bytes, so they only fill the bottom 8 bytes of the newly created 16 bytes.
The second instruction uses the offset addressing mode: it calculates the starting address by adding 8 to the SP, without modifying the SP this time, and stores W1 and W0 there. First, W1 is stored at SP + 8, occupying bytes SP + 8 through SP + 11.
Then W0 is written to the address SP + 8 + sizeof W1, that is SP + 12, and we wind up with all 16 bytes filled: reading upward from the SP, they hold W3, W2, W1 and W0.
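The whole sequence can be replayed on a toy model of memory. This is a sketch under several assumptions rather than a faithful emulator: the Machine class, the 64-byte stack and the register values 0xA0 to 0xA3 are all invented for illustration, and words are stored little-endian, which is the usual data endianness on iOS.

```python
import struct

STACK_SIZE = 64  # hypothetical tiny stack, big enough for the example


class Machine:
    """Toy model: a byte array for memory, an SP, and registers w0..w3."""

    def __init__(self):
        self.mem = bytearray(STACK_SIZE)
        self.sp = STACK_SIZE                  # empty full-descending stack, 16-byte aligned
        self.w = [0xA0, 0xA1, 0xA2, 0xA3]     # made-up values for w0, w1, w2, w3

    def stp_w(self, rt, rt2, offset, writeback=False):
        """Store w[rt] at sp + offset and w[rt2] 4 bytes above it.

        With writeback=True the computed address also becomes the new SP,
        which models the pre-index form [sp, #offset]!.
        """
        address = self.sp + offset
        if writeback:
            self.sp = address
        # "<II" packs two unsigned 32-bit words, little-endian, back to back.
        struct.pack_into("<II", self.mem, address, self.w[rt], self.w[rt2])


m = Machine()
m.stp_w(3, 2, -16, writeback=True)   # STP w3, w2, [sp, #-16]!
m.stp_w(1, 0, 8)                     # STP w1, w0, [sp, #8]

# Read the four words back, starting at the SP and moving to higher addresses.
layout = struct.unpack_from("<IIII", m.mem, m.sp)
for off, value in zip((0, 4, 8, 12), layout):
    print("SP+%d: %#x" % (off, value))
```

Reading upward from the SP, the model yields the values of w3, w2, w1 and w0 in that order, and the SP ends up 16 bytes below where it started.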
References:
- Procedure Call Standard for the ARM 64-bit Architecture
- iOS ABI Function Call Guide