There are several implementations of BLAS: Intel MKL, ESSL, GotoBLAS, and its more recent variants, OpenBLAS (which I use myself) and Survive GotoBLAS.
But they are all designed for a single (albeit multi-core!) computer.
What about distributed environments? AFAIK, MKL includes an implementation of PBLAS, but what else is there? Only the reference Netlib implementation?
What can you suggest?