Process affinity (or CPU pinning) means binding each MPI process to a CPU or a range of CPUs on the node. It is important to spread MPI processes evenly onto different NUMA nodes. Thread affinity means mapping threads onto a particular subset of CPUs (called "places") that belong to the parent process (such as an MPI process), and binding them to those places so the OS cannot migrate them elsewhere. This takes advantage of local process state and achieves better memory locality.

Memory locality is the degree to which data resides in memory that is close to the processors/threads working with that data. Modern processors have multiple sockets and NUMA (Non-Uniform Memory Access) domains, and a thread accessing memory in a remote NUMA domain is slower than one accessing memory in its local NUMA domain. Improper process and thread affinity can slow down code performance significantly. A combination of OpenMP environment variables and runtime flags is needed, and the details differ across compilers and across the batch scheduler used on the system.

## Node architecture

### Cori Haswell

Figure 1 below illustrates a Haswell compute node. Each node contains 2 processors; there is 1 socket per processor, thus 2 sockets per node. Each processor has 16 cores, and each core has 2 hyperthreads. Socket 0 has physical cores 0 to 15, and socket 1 has physical cores 16 to 31. Core 0 has 2 hyperthreads, with the logical CPUs numbered 0 and 32; core 1 has logical CPUs 1 and 33, and so on.

Fig 1: CPUs, cores, and sockets on a Cori Haswell node.

Below is the `numactl -H` result from a Haswell compute node:

When OMP_PLACES is set to "cores", each OpenMP thread binds to one core, and when OMP_PLACES is set to "threads", each OpenMP thread binds to one hyperthread.
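As a minimal sketch of the binding modes described above, the snippet below sets the standard OpenMP affinity variables before launching a hybrid MPI/OpenMP job. The thread counts and the commented `srun` line are illustrative assumptions for a Slurm system with Haswell-like nodes (2 sockets, 16 cores each, 2 hyperthreads per core), not a prescription from this document:

```shell
# Each OpenMP thread gets its own hyperthread; use OMP_PLACES=cores
# to give each thread a whole physical core instead.
export OMP_PLACES=threads
export OMP_PROC_BIND=spread   # spread threads evenly over the available places
export OMP_NUM_THREADS=16     # illustrative: 16 threads per MPI rank

# Hypothetical Slurm launch: 2 MPI ranks (one per socket), each rank
# given 32 logical CPUs (16 cores x 2 hyperthreads). Flags are assumptions.
# srun -n 2 -c 32 --cpu-bind=cores ./a.out

echo "OMP_PLACES=$OMP_PLACES OMP_PROC_BIND=$OMP_PROC_BIND"
```

With `OMP_PROC_BIND=spread`, consecutive threads are placed as far apart as possible across the place list, which helps balance threads across NUMA domains; `close` would pack them onto adjacent places instead.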