CPU and Data alignment

CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32 bits, 64 bits, etc.) per address strobe, where the bottom two (or more) address lines are generally don't-care bits. Intel CPUs can perform accesses on non-word boundaries for many instructions; however, there is a performance penalty as …

Determine word size of my processor

Your assumption about sizeof(int) is untrue; see this. Since you must know the processor, OS and compiler at compilation time, the word size can be inferred from the predefined architecture/OS/compiler macros provided by the compiler. However, while on simpler processors and most RISC processors, word size, bus width, register size and memory organisation are often consistently one …

How do cache lines work?

If the cache line containing the byte or word you’re loading is not already present in the cache, your CPU will request the 64 bytes that begin at the cache line boundary (the largest address at or below the one you need that is a multiple of 64). Modern PC memory modules transfer 64 bits (8 bytes) at …

How to determine whether a given Linux is 32 bit or 64 bit?

Try uname -m, which is short for uname --machine. It outputs:

x86_64 ==> 64-bit kernel
i686 ==> 32-bit kernel

Otherwise, to check the CPU rather than the Linux kernel, type:

cat /proc/cpuinfo

or:

grep flags /proc/cpuinfo

Under the “flags” parameter you will see various values; see “What do the flags in /proc/cpuinfo mean?”. Among …

How to set CPU affinity of a particular pthread?

This is a wrapper I’ve made to make my life easier. Its effect is that the calling thread gets “stuck” to the core with id core_id:

    // core_id = 0, 1, … n-1, where n is the system’s number of cores
    int stick_this_thread_to_core(int core_id) {
        int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
        if (core_id < 0 || core_id …

Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?

L1 is very tightly coupled to the CPU core and is accessed on every memory access (very frequently). Thus, it needs to return data very quickly (usually within a few clock cycles). Latency and throughput (bandwidth) are both performance-critical for the L1 data cache. (e.g. four-cycle latency, and supporting two reads and one write by …