I have a problem when programming with OpenMP on NUMA machines. My code does not scale well because memory is allocated on some NUMA nodes and then accessed by threads running on other nodes.
I have tried many things, but none of the tricks worked as expected. It would be great if someone could point me to a good article or book for beginners.
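For context, the pattern usually tried first for this kind of remote-access problem is "first touch" initialization. The sketch below assumes Linux's default first-touch page placement, where a page is placed on the NUMA node of the thread that first writes it. It also assumes the threads are pinned, for example with OMP_PROC_BIND=close and OMP_PLACES=cores. The array is initialized in parallel with the same static schedule that the compute loop uses later, so each thread mostly works on pages that live on its own node. The function names alloc_first_touch and scale_add are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel.
 * For large arrays, malloc typically returns pages that are not yet
 * physically backed, so the parallel loop below decides their placement:
 * under first-touch, each page lands on the node of the thread that writes it. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Same schedule(static) split as the compute loop, so thread t
     * first-touches exactly the chunk it will work on later. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;

    return a;
}

/* Compute phase: y += alpha * x with the same loop bounds and static
 * schedule, so (with pinned threads) accesses stay mostly node-local. */
void scale_add(double *restrict y, const double *restrict x,
               double alpha, size_t n)
{
    assert(x != NULL && y != NULL);

    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

The key design choice is that both loops use identical bounds and schedule(static). If the data were instead initialized serially by the master thread, or the two loops used different schedules, most pages would end up on one node and the other nodes would pay remote-access costs. When no single access pattern fits, running the program under numactl --interleave=all spreads pages across nodes as a coarser alternative.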