calling-convention – Make Me Engineer

Why is RCX not used for passing parameters to system calls, being replaced with R10?

June 3, 2023 by Tarik

x86-64 system calls use the syscall instruction. This instruction saves the return address in rcx and then loads rip from the IA32_LSTAR MSR, so rcx is destroyed as soon as syscall executes. This is the reason why rcx had to be replaced in the system call ABI. The same syscall instruction also saves rflags into r11, and then masks … Read more

Why does the x86-64 System V calling convention pass args in registers instead of just the stack?

May 19, 2023 by Tarik

instead of putting the first 6 arguments in registers just to move them onto the stack in the function prologue? I was looking at some code that gcc generated and that’s what it always did. Then you forgot to enable optimization. gcc -O0 spills everything to memory so you can modify them with a debugger … Read more

Is garbage allowed in high bits of parameter and return value registers in x86-64 SysV ABI?

May 17, 2023 by Tarik

It looks like you have two questions here: (1) Do the high bits of a return value need to be zeroed before returning? (And do the high bits of arguments need to be zeroed before calling?) (2) What are the costs/benefits associated with this decision? The answer to the first question is no, there can be garbage … Read more

Why can a T* be passed in register, but a unique_ptr cannot?

May 8, 2023 by Tarik

Is this actually an ABI requirement, or maybe it’s just some pessimization in certain scenarios? One example is System V Application Binary Interface AMD64 Architecture Processor Supplement. This ABI is for 64-bit x86-compatible CPUs (Linux x86_64 architecture). It is followed on Solaris, Linux, FreeBSD, macOS, Windows Subsystem for Linux: If a C++ object has either … Read more

Segmentation fault on printf – NASM 64bit Linux

May 6, 2023 by Tarik

The problem is with your stack usage. First, the ABI docs mandate that rsp be 16-byte aligned before a call. Since a call will push an 8-byte return address on the stack, you need to adjust rsp by a multiple of 16 plus 8 to get back to 16-byte alignment. The 16 * n … Read more
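The "multiple of 16 plus 8" arithmetic can be checked with a small model in C. This is only a sketch: frame_adjust and rsp_at_call are hypothetical names, and the model assumes the function changes rsp with a single subtraction (no push rbp or other pushes in the prologue).

```c
#include <assert.h>
#include <stdint.h>

/* Smallest adjustment of the form 16*n + 8 that is >= the bytes needed. */
static uint64_t frame_adjust(uint64_t needed)
{
    uint64_t rounded = (needed + 15) & ~(uint64_t)15; /* next multiple of 16 */
    return (rounded >= needed + 8) ? rounded - 8 : rounded + 8;
}

/* Modelled rsp at a nested call site, given rsp on entry to the function.
 * On entry rsp is 8 past a 16-byte boundary because the caller's call
 * instruction pushed an 8-byte return address. */
static uint64_t rsp_at_call(uint64_t rsp_on_entry, uint64_t needed)
{
    assert(rsp_on_entry % 16 == 8);
    return rsp_on_entry - frame_adjust(needed);
}
```

For example, a function that needs 32 bytes of locals would subtract 40 under this model, so that rsp is again a multiple of 16 when it calls printf.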

Does each PUSH instruction push a multiple of 8 bytes on x64?

May 5, 2023 by Tarik

PUSH operand size in 64-bit mode: The size of the value pushed on the stack and the amount that the stack pointer is adjusted by depend on the operand size of the PUSH instruction. In 64-bit mode the operand size can only be 16-bit or 64-bit. It’s not possible to encode a 32-bit PUSH instruction … Read more

C++ on x86-64: when are structs/classes passed and returned in registers?

April 16, 2023 by Tarik

The ABI specification is defined here. A newer version is available here. I assume the reader is accustomed to the terminology of the document and that they can classify the primitive types. If the object size is larger than two eight-bytes, it is passed in memory: struct foo { unsigned long long a; unsigned long … Read more
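The size threshold mentioned in this excerpt can be sketched in C. The structs below are hypothetical examples, not a reconstruction of the truncated struct foo, and exceeds_two_eightbytes covers only the size test. The full classification in the ABI also depends on field types and, for C++, on whether the object has non-trivial copy constructors or destructors.

```c
#include <assert.h>
#include <stddef.h>

#define EIGHTBYTE 8u /* the ABI classifies aggregates in 8-byte chunks */

/* Hypothetical examples: two and three 64-bit integer fields. */
struct pair_example   { unsigned long long a, b; };
struct triple_example { unsigned long long a, b, c; };

/* Size test only: anything larger than two eightbytes goes in memory. */
static int exceeds_two_eightbytes(size_t size)
{
    return size > 2 * EIGHTBYTE;
}

static_assert(sizeof(struct pair_example) == 16, "fits in two eightbytes");
static_assert(sizeof(struct triple_example) == 24, "needs three eightbytes");
```

Under this rule pair_example remains a candidate for register passing, while triple_example is passed in memory.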

How do C compilers implement functions that return large structures?

December 2, 2022 by Tarik

None; no copies are done. The address of the caller’s Data return value is actually passed as a hidden argument to the function, and the createData function simply writes into the caller’s stack frame. This is known as the named return value optimisation. Also see the C++ FAQ on this topic. commercial-grade C++ compilers implement … Read more
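The hidden-argument mechanism can be approximated in plain C by writing the lowered form by hand. This is a sketch under assumptions: the layout of Data is hypothetical (the excerpt does not show it), and createData_lowered is an invented name for roughly what a compiler arranges behind the scenes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout, large enough that it is returned via memory. */
struct Data { int values[64]; };

/* What the programmer writes: return by value. */
static struct Data createData(void)
{
    struct Data d;
    memset(&d, 0, sizeof d);
    d.values[0] = 42;
    return d;
}

/* Roughly what the compiler arranges: the caller passes the address of
 * its own result slot and the function writes straight into it. */
static void createData_lowered(struct Data *result)
{
    assert(result != NULL);
    memset(result, 0, sizeof *result);
    result->values[0] = 42;
}
```

A caller would declare struct Data d; and call createData_lowered(&d), which leaves d with the same contents as struct Data d = createData(); without an intermediate copy in the source.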

Why not store function parameters in XMM vector registers?

November 22, 2022 by Tarik

Most functions don’t have more than 6 integer parameters, so this is really a corner case. Passing some excess integer params in xmm registers would make the rules for where to find floating point args more complicated, for little to no benefit, and it probably wouldn’t make code any faster. A further … Read more

Why does gcc use movl instead of push to pass function args?

November 21, 2022 by Tarik

Here is what the gcc manual has to say about it: -mpush-args -mno-push-args Use PUSH operations to store outgoing parameters. This method is shorter and usually equally fast as method using SUB/MOV operations and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies. -maccumulate-outgoing-args If … Read more