The ABI specification is defined here.
A newer version is available here.
I assume the reader is familiar with the terminology of the document and can classify the primitive types.
If the object size is larger than two eight-bytes, it is passed in memory:
struct foo
{
unsigned long long a;
unsigned long long b;
unsigned long long c; //Commenting this gives mov rax, rdi
};
unsigned long long foo(struct foo f)
{
return f.a; //mov rax, QWORD PTR [rsp+8]
}
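The size rule can be checked at compile or run time. Here is a minimal sketch, assuming a target where unsigned long long is 8 bytes wide (as on x86-64); the names three_words, two_words, exceeds_two_eightbytes and check_size_rule are made up for this illustration and do not come from the ABI document:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types: the struct above with and without the third member.
struct three_words { unsigned long long a, b, c; };
struct two_words   { unsigned long long a, b; };

// True when an object of `size` bytes needs more than two eight-bytes.
constexpr bool exceeds_two_eightbytes(std::size_t size)
{
    return size > 2 * 8;
}

// Assumes unsigned long long is 8 bytes wide.
void check_size_rule()
{
    assert(sizeof(three_words) == 24);
    assert(exceeds_two_eightbytes(sizeof(three_words)));   // passed in memory
    assert(!exceeds_two_eightbytes(sizeof(two_words)));    // still a register candidate
}
```

Dropping the third member brings the object back to exactly two 8Bs, which matches the comment in the struct above about the load turning into mov rax, rdi.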
If it is non-POD, it is passed in memory:
struct foo
{
unsigned long long a;
foo(const struct foo& rhs){} //Commenting this gives mov rax, rdi
};
unsigned long long foo(struct foo f)
{
return f.a; //mov rax, QWORD PTR [rdi]
}
Copy elision is at work here: the argument is passed by reference, with its address in rdi.
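Whether a type falls under this rule can be probed with the standard type traits. The sketch below approximates the rule as "non-trivial copy constructor or non-trivial destructor"; that approximation is my assumption about how to express it in code. The names plain, with_copy_ctor, passed_in_memory_as_non_pod and check_non_pod_rule are hypothetical:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types for illustration.
struct plain { unsigned long long a; };

struct with_copy_ctor
{
    unsigned long long a;
    with_copy_ctor(const with_copy_ctor&) {}   // user-provided, hence non-trivial
};

// Approximation of the rule: a non-trivial copy constructor or destructor
// forces the object to be passed in memory.
template <typename T>
constexpr bool passed_in_memory_as_non_pod()
{
    return !std::is_trivially_copy_constructible<T>::value
        || !std::is_trivially_destructible<T>::value;
}

void check_non_pod_rule()
{
    assert(!passed_in_memory_as_non_pod<plain>());
    assert(passed_in_memory_as_non_pod<with_copy_ctor>());
}
```

Commenting out the user-provided copy constructor makes with_copy_ctor behave like plain, which is the case where the example above loads from rdi directly.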
If it contains unaligned fields, it is passed in memory:
struct __attribute__((packed)) foo //Removing packed gives mov rax, rsi
{
char b;
unsigned long long a;
};
unsigned long long foo(struct foo f)
{
return f.a; //mov rax, QWORD PTR [rsp+9]
}
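The misalignment can be made visible with offsetof. This is a sketch that assumes GCC or Clang (for __attribute__((packed))) and an 8-byte aligned unsigned long long; packed_foo, natural_foo, is_unaligned and check_packed_rule are names invented for the illustration:

```cpp
#include <cassert>
#include <cstddef>

// Same members as above, with and without the packed attribute.
struct __attribute__((packed)) packed_foo { char b; unsigned long long a; };
struct natural_foo { char b; unsigned long long a; };

// A field is unaligned when its offset is not a multiple of its alignment.
constexpr bool is_unaligned(std::size_t offset, std::size_t alignment)
{
    return offset % alignment != 0;
}

void check_packed_rule()
{
    // Packed: `a` starts right after the single char.
    assert(offsetof(packed_foo, a) == 1);
    assert(is_unaligned(offsetof(packed_foo, a), alignof(unsigned long long)));
    // Natural layout: padding after `b` keeps `a` aligned.
    assert(!is_unaligned(offsetof(natural_foo, a), alignof(unsigned long long)));
}
```

The offset of 1 is why the load above reads from [rsp+9]: the object starts at [rsp+8], just past the return address.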
If none of the above is true, the fields of the object are considered.
If one of the fields is itself a struct/class, the procedure is applied recursively.
The goal is to classify each of the two eight-bytes (8B) in the object.
The classes of the fields in each 8B are considered.
Note that a whole number of fields always occupies each 8B, i.e. no field straddles two 8Bs, thanks to the alignment requirement above.
Let C be the class of the 8B and D be the class of the field under consideration.
Let new_class be pseudo-defined as
cls new_class(cls D, cls C)
{
if (D == NO_CLASS)
return C;
if (D == MEMORY || C == MEMORY)
return MEMORY;
if (D == INTEGER || C == INTEGER)
return INTEGER;
if (D == X87 || C == X87 || D == X87UP || C == X87UP)
return MEMORY;
return SSE;
}
Then the class of the 8B is computed as follows:
C = NO_CLASS;
for (field f : fields)
{
D = get_field_class(f); //Note this may recursively call this proc
C = new_class(D, C);
}
Once we have the class of each 8B, say C1 and C2, then:
if (C1 == MEMORY || C2 == MEMORY)
C1 = C2 = MEMORY;
if (C2 == SSEUP && C1 != SSE)
C2 = SSE;
Note: this is my interpretation of the algorithm given in the ABI document.
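For those who want to experiment, the steps above can be transcribed into compilable C++ roughly as follows. This mirrors the simplified interpretation given in this answer rather than the exact wording of the ABI document. The enumerator list and the names classify_eightbyte, finalize and check_classification are my own choices for the sketch, and the class of each field is supplied by hand instead of being computed from a type:

```cpp
#include <cassert>
#include <vector>

enum cls { NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, MEMORY };

// Transcription of the pseudo-definition of new_class.
cls new_class(cls D, cls C)
{
    if (D == NO_CLASS)
        return C;
    if (D == MEMORY || C == MEMORY)
        return MEMORY;
    if (D == INTEGER || C == INTEGER)
        return INTEGER;
    if (D == X87 || C == X87 || D == X87UP || C == X87UP)
        return MEMORY;
    return SSE;
}

// Class of one 8B, given the classes of the fields that occupy it.
cls classify_eightbyte(const std::vector<cls>& field_classes)
{
    cls C = NO_CLASS;
    for (cls D : field_classes)
        C = new_class(D, C);
    return C;
}

// Final adjustment once both 8Bs have been classified.
void finalize(cls& C1, cls& C2)
{
    if (C1 == MEMORY || C2 == MEMORY)
        C1 = C2 = MEMORY;
    if (C2 == SSEUP && C1 != SSE)
        C2 = SSE;
}

void check_classification()
{
    // An 8B holding a float followed by an int: INTEGER wins over SSE.
    assert(classify_eightbyte({SSE, INTEGER}) == INTEGER);

    // One 8B classified MEMORY drags the other one along.
    cls C1 = INTEGER, C2 = MEMORY;
    finalize(C1, C2);
    assert(C1 == MEMORY && C2 == MEMORY);
}
```

Feeding it {SSE} and {INTEGER} for the two 8Bs gives SSE and INTEGER, and finalize leaves both unchanged.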
Example
struct foo
{
unsigned long long a;
long double b;
};
unsigned long long foo(struct foo f)
{
return f.a;
}
The 8Bs and their fields
First 8B: a
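Second 8B: padding, since b is 16-byte aligned
Third and fourth 8Bs: b
a is INTEGER, so the first 8B is INTEGER.
b is X87 and X87UP, so its 8Bs are MEMORY.
The object is also larger than two 8Bs, so the final class is MEMORY for the whole object.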
Example
struct foo
{
double a;
long long b;
};
long long foo(struct foo f)
{
return f.b; //mov rax, rdi
}
The 8Bs and their fields
First 8B: a
Second 8B: b
a is SSE, so the first 8B is SSE.
b is INTEGER, so the second 8B is INTEGER.
The final classes are the ones calculated.
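The assignment of fields to 8Bs in this example can be double-checked from the member offsets. This is a small sketch assuming 8-byte double and long long; sse_then_int, eightbyte_index and check_example_layout are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

// Same members as the struct in the example above.
struct sse_then_int { double a; long long b; };

// Index of the 8B that contains the byte at `offset`.
constexpr std::size_t eightbyte_index(std::size_t offset)
{
    return offset / 8;
}

void check_example_layout()
{
    assert(sizeof(sse_then_int) == 16);                       // exactly two 8Bs
    assert(eightbyte_index(offsetof(sse_then_int, a)) == 0);  // first 8B
    assert(eightbyte_index(offsetof(sse_then_int, b)) == 1);  // second 8B
}
```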
Return values
The values are returned according to their classes:
- MEMORY
The caller passes a hidden first argument to the function for it to store the result into.
In C++ this often involves copy elision/return value optimisation.
This address must be returned in rax, thereby returning MEMORY classes “by reference” to a hidden, caller-allocated buffer.
If the type has class MEMORY, then the caller provides space for the return value and passes the address of this storage in %rdi as if it were the first argument to the function. In effect, this address becomes a “hidden” first argument.
On return %rax will contain the address that has been passed in by the caller in %rdi.
- INTEGER and POINTER
The registers rax and rdx as needed.
- SSE and SSEUP
The registers xmm0 and xmm1 as needed.
- X87 and X87UP
The register st0.
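The MEMORY case can be mimicked by hand to see what the hidden argument amounts to. In this sketch make_big is an ordinary function returning a 24-byte struct, and make_big_lowered is a hand-written stand-in for what the calling convention does behind the scenes; all names are invented for the illustration and the compiler's actual output is not shown:

```cpp
#include <cassert>

struct big { unsigned long long a, b, c; };   // 24 bytes, larger than two 8Bs

// What the programmer writes.
big make_big()
{
    big r = {1, 2, 3};
    return r;
}

// Hand-written equivalent of the calling sequence: the caller supplies the
// buffer (its address travels in rdi) and gets the same address back (in rax).
big* make_big_lowered(big* result)
{
    *result = big{1, 2, 3};
    return result;
}

void check_hidden_argument()
{
    big buffer;
    big* returned = make_big_lowered(&buffer);
    assert(returned == &buffer);        // the address comes back unchanged
    assert(buffer.c == make_big().c);   // same value either way
}
```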
PODs
The technical definition is here.
The definition from the ABI is reported below.
A de/constructor is trivial if it is an implicitly-declared default de/constructor and if:
• its class has no virtual functions and no virtual base classes, and
• all the direct base classes of its class have trivial de/constructors, and
• for all the nonstatic data members of its class that are of class type (or array thereof), each such class has a trivial de/constructor.
Note that each 8B is classified independently so that each one can be passed accordingly.
In particular, they may end up on the stack if there are no more parameter registers left.