How to get the number of CPUs in Linux using C?
#include <unistd.h>
long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
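For context, a minimal sketch of how that call might be wrapped. The function names online_cpu_count and configured_cpu_count are hypothetical, not from the answer above. _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF are not required by POSIX but are available with glibc on Linux; sysconf returns -1 when it cannot report a value.

```c
#include <unistd.h>

/* Hypothetical helper: processors currently online,
   or -1 if sysconf() cannot report the value. */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Hypothetical helper: processors configured on the system,
   including any that are currently offline. */
long configured_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}
```

A caller would typically check for a result below 1 and fall back to a single worker in that case.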
CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32 bits, 64 bits, etc.) per address strobe, and the bottom two (or more) address lines are generally don’t-care bits. Intel CPUs can perform accesses on non-word boundaries for many instructions; however, there is a performance penalty as …
Your assumption about sizeof(int) is untrue; see this. Since you must know the processor, OS and compiler at compilation time, the word size can be inferred using predefined architecture/OS/compiler macros provided by the compiler. However, while on simpler and most RISC processors, word size, bus width, register size and memory organisation are often consistently one …
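As a rough illustration of the macro approach, the sketch below checks a few widely used predefined macros (GCC/Clang and MSVC spellings) and falls back to pointer width. TARGET_WORD_BITS and target_word_bits are hypothetical names, and treating pointer width as the word size in the fallback is an assumption that does not hold on every platform.

```c
#include <stdint.h>

/* Known 64-bit targets first (GCC/Clang spellings, then MSVC). */
#if defined(__x86_64__) || defined(__aarch64__) || defined(_M_X64) || defined(_M_ARM64)
#  define TARGET_WORD_BITS 64
/* Known 32-bit targets. */
#elif defined(__i386__) || defined(__arm__) || defined(_M_IX86) || defined(_M_ARM)
#  define TARGET_WORD_BITS 32
/* Fallback (assumption): use pointer width as a proxy for word size. */
#elif UINTPTR_MAX > 0xFFFFFFFFu
#  define TARGET_WORD_BITS 64
#else
#  define TARGET_WORD_BITS 32
#endif

/* Hypothetical accessor so the value can also be read at run time. */
int target_word_bits(void)
{
    return TARGET_WORD_BITS;
}
```

The list of architectures here is deliberately short; a real project would extend it for the targets it actually supports.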
If the cache line containing the byte or word you’re loading is not already present in the cache, your CPU will request the 64 bytes that begin at the cache line boundary (the largest address at or below the one you need that is a multiple of 64). Modern PC memory modules transfer 64 bits (8 bytes) at …
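The boundary calculation described above can be sketched with a bit mask. CACHE_LINE_SIZE and cache_line_base are hypothetical names, and the 64-byte line size is an assumption taken from the text; on glibc the actual value can often be queried with sysconf(_SC_LEVEL1_DCACHE_LINESIZE), which may return 0 if it is unknown.

```c
#include <stdint.h>

/* Assumption: 64-byte cache lines, as in the text above.
   The mask trick below requires a power of two. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: round an address down to the start of the
   cache line that contains it by clearing the low six bits. */
uintptr_t cache_line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
}
```

For example, address 0x1234 falls in the line that starts at 0x1200, and an address that is already a multiple of 64 maps to itself.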
Try uname -m, which is short for uname --machine. It outputs:
x86_64 ==> 64-bit kernel
i686 ==> 32-bit kernel
Otherwise, to check the CPU rather than the Linux kernel, run:
cat /proc/cpuinfo
or:
grep flags /proc/cpuinfo
Under the “flags” parameter, you will see various values: see “What do the flags in /proc/cpuinfo mean?” Among …
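The same machine string can be read from C through the POSIX uname() call. The sketch below is only an illustration; machine_name is a hypothetical helper name, not something from the answer above.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: copy the machine hardware name (the string
   that "uname -m" prints, e.g. x86_64) into buf.
   Returns 0 on success, -1 on failure or if buflen is 0. */
int machine_name(char *buf, size_t buflen)
{
    struct utsname info;

    if (buflen == 0 || uname(&info) < 0)
        return -1;

    /* strncpy does not terminate on truncation, so do it explicitly. */
    strncpy(buf, info.machine, buflen - 1);
    buf[buflen - 1] = '\0';
    return 0;
}
```

As with the command, this reports the kernel's architecture, which is not necessarily everything the CPU itself is capable of.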
The platform.processor() function returns the processor name as a string.
>>> import platform
>>> platform.processor()
'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel'
This is a wrapper I’ve made to make my life easier. Its effect is that the calling thread gets “stuck” to the core with id core_id:
// core_id = 0, 1, ..., n-1, where n is the system's number of cores
int stick_this_thread_to_core(int core_id) {
    int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (core_id < 0 || core_id …
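Since the excerpt is cut off, here is a separate, minimal sketch of one way to pin the calling thread on Linux. It is not the author's wrapper: pin_calling_thread_to_core is a hypothetical name, and this version uses sched_setaffinity() with pid 0 (which on Linux applies to the calling thread), a choice made here for illustration. It assumes glibc, where _GNU_SOURCE must be defined before any include.

```c
#define _GNU_SOURCE /* exposes CPU_SET and sched_setaffinity in glibc */
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: restrict the calling thread to one core.
   Returns 0 on success or an errno value on failure. */
int pin_calling_thread_to_core(int core_id)
{
    long num_cores = sysconf(_SC_NPROCESSORS_ONLN);

    /* Reject ids outside 0..n-1 and ids a fixed-size cpu_set_t cannot hold. */
    if (core_id < 0 || core_id >= num_cores || core_id >= CPU_SETSIZE)
        return EINVAL;

    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core_id, &cpuset);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
        return errno;
    return 0;
}
```

The call can still fail for a valid id, for example when the process runs inside a cpuset that excludes that core, so the return value is worth checking.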
L1 is very tightly coupled to the CPU core, and is accessed on every memory access (very frequent). Thus, it needs to return the data really fast (usually within one clock cycle). Latency and throughput (bandwidth) are both performance-critical for L1 data cache. (e.g. four cycle latency, and supporting two reads and one write by …