The Coppersmith–Winograd algorithm can compute the product of two n×n matrices using O(n^2.373) arithmetic operations. However, we cannot perform each operation in constant time: let's assume that each element of the matrices has at most k digits.
Using the naive algorithm, each multiplication requires O(k^2) time. If we use the Schönhage–Strassen algorithm, based on the Fast Fourier Transform, we can perform each multiplication in O(k log(k) log(log(k))) time.
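Putting the two together (a rough back-of-the-envelope bound, which ignores the extra digits that intermediate sums can accumulate), the total bit complexity is just the operation count times the cost per multiplication:

```latex
\[
  \underbrace{O\!\left(n^{2.373}\right)}_{\text{arithmetic ops}}
  \cdot
  \underbrace{O\!\left(k \log k \log\log k\right)}_{\text{cost per multiplication}}
  \;=\;
  O\!\left(n^{2.373}\, k \log k \log\log k\right)
\]
```

With the naive k-digit multiplication the same reasoning gives O(n^2.373 k^2) instead.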
Thanks for your reply. I mean: what algorithm is best to implement for the fastest multiplication in practical situations? There are actually several algorithms for matrix multiplication, and I am trying to find the best one in terms of runtime complexity.
The existing algorithms are mostly specialised for certain conditions (sparse matrices, band matrices, huge/small matrices, etc.). You have to try them out and experiment with them before you go into the "big production" phase. If you know special features of your matrices, you can certainly tune a given algorithm for your purposes. Furthermore, if you have access to special hardware features (e.g. a large number of fast registers), you could try implementing them as special assembler routines that make powerful use of those features.
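For the "try out and experiment" part, a minimal timing harness like the following sketch can help (in C; the function names, the baseline routine, and the test sizes are placeholders of my own, not part of the answer):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Signature every candidate routine shares: A = B * C, n x n, row-major. */
typedef void (*matmul_fn)(int n, const double *B, const double *C, double *A);

/* Baseline textbook triple loop; swap in blocked, Strassen, a BLAS wrapper... */
static void mul_baseline(int n, const double *B, const double *C, double *A) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int l = 0; l < n; l++)
                sum += B[i * n + l] * C[l * n + j];
            A[i * n + j] = sum;
        }
}

/* Fill B and C with random data, run `mul` once, return elapsed seconds. */
static double time_matmul(matmul_fn mul, int n) {
    double *A = malloc(sizeof *A * n * n);
    double *B = malloc(sizeof *B * n * n);
    double *C = malloc(sizeof *C * n * n);
    for (int i = 0; i < n * n; i++) {
        B[i] = rand() / (double)RAND_MAX;
        C[i] = rand() / (double)RAND_MAX;
    }
    clock_t t0 = clock();
    mul(n, B, C, A);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    free(A); free(B); free(C);
    return secs;
}

int main(void) {
    /* Placeholder sizes; measure the shapes your application actually needs. */
    for (int n = 128; n <= 1024; n *= 2)
        printf("n=%4d: %.3f s\n", n, time_matmul(mul_baseline, n));
    return 0;
}
```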
To me, this sounds like quite an exciting project: finding the best solution for a special-purpose matrix multiplication. Good luck!
I suggest you take a look at the following book.
Ilse C. F. Ipsen, Numerical Matrix Analysis: Linear Systems and Least Squares, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 2009, p. 9.
To evaluate A = B x C, first transpose C, then compute A(i,j) = sum over l of B(i,l) * Ct(j,l), i.e. you trivially interchange the indexes of Ct. This is about 20x faster than the textbook definition, because in both B and Ct you then follow the cache lines instead of accessing data across them. (This assumes your programming language stores a matrix by row (row-major), like C and Java. If you use Fortran, transpose B instead, because Fortran is a column-major language.)
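A minimal sketch of the trick in C, row-major (the function names and the naive baseline are my own illustration, not from the answer):

```c
#include <stdlib.h>

/* Textbook product: the inner loop walks C down a column, jumping
   n doubles per step, so each access lands on a different cache line. */
void mul_naive(int n, const double *B, const double *C, double *A) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int l = 0; l < n; l++)
                sum += B[i * n + l] * C[l * n + j];
            A[i * n + j] = sum;
        }
}

/* Transpose C once; then both B and Ct are scanned row by row,
   so the inner loop streams along cache lines in both operands. */
void mul_transposed(int n, const double *B, const double *C, double *A) {
    double *Ct = malloc(sizeof *Ct * n * n);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            Ct[j * n + i] = C[i * n + j];       /* Ct[j][i] = C[i][j] */

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int l = 0; l < n; l++)
                sum += B[i * n + l] * Ct[j * n + l];  /* both contiguous */
            A[i * n + j] = sum;
        }
    free(Ct);
}
```

The transpose costs O(n^2) extra time and memory, which is negligible next to the O(n^3) multiplication, and it turns the strided column walk through C into a contiguous row scan.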