There are several implementations of BLAS: Intel MKL, IBM ESSL, GotoBLAS, and its more recent forks, OpenBLAS (which I use myself) and Survive GotoBLAS.

But they are all designed for a single (albeit multi-core) machine.

What about distributed-memory environments? AFAIK, MKL contains an implementation of PBLAS, but what else is there? Only the reference Netlib implementation?

What can you suggest?
