However, you could take a look at the LLVM sub-project "Polly". It seems that the Polly optimizer can automatically generate OpenMP code from C code to achieve multithreading.
http://polly.llvm.org/
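To give an idea of what kind of code this targets, here is a minimal sketch of a loop nest that a polyhedral optimizer like Polly is designed to analyze: constant loop bounds and array indices that are simple linear functions of the loop counters. The names `N`, `init_matrices` and `matmul` are made up for illustration, and I have not verified that Polly actually parallelizes this exact code.

```c
#include <stddef.h>

#define N 256  /* hypothetical problem size, chosen only for illustration */

/* Fill A with reproducible values, make B the identity matrix, clear C. */
void init_matrices(double A[N][N], double B[N][N], double C[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            A[i][j] = (double)(i + j);
            B[i][j] = (i == j) ? 1.0 : 0.0;
            C[i][j] = 0.0;
        }
    }
}

/* Plain matrix multiply, C += A * B. The loop bounds are constant and
 * every array subscript is a loop counter, which is the regular shape
 * that automatic parallelizers look for. Note that there is no OpenMP
 * pragma here: the point is that the optimizer would add the threading. */
void matmul(double A[N][N], double B[N][N], double C[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}
```

As far as I can tell from the Polly documentation, it is enabled through clang with something like `clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp matmul.c`. Please check the exact flags against the documentation for your LLVM version, since they may have changed.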
KernelGen is a prototype for automatic generation of GPU code (NVIDIA CUDA and OpenCL). The slides include a comparison with the automatic parallelism detection of the commercial PGI compiler (pgroup.com).
As mentioned on their website, both Polly and KernelGen are still experimental:
"It is expected to crash, produce invalid code or to hang in complex calculations even for simple examples."
Maybe it is possible to translate your binary (or just one specific function of it) into the LLVM intermediate representation (LLVM IR) and then run an optimizer that accepts LLVM IR as input on the result. But I really doubt that this will lead to success.