Matrix Multiplication Algorithm


A Matrix Multiplication Algorithm is a matrix processing algorithm that can be implemented by a matrix multiplication system (to solve a matrix multiplication task).
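
A minimal sketch of the textbook (naïve) algorithm helps ground the running-time discussion in the references below; the function name matmul_naive is illustrative, not from any particular library.

 def matmul_naive(A, B):
     # Textbook triple loop: for two n x n inputs this performs n^3
     # scalar multiply-adds, the O(n^3) running time cited below.
     n, m, p = len(A), len(B), len(B[0])
     assert len(A[0]) == m, "inner dimensions must agree"
     C = [[0] * p for _ in range(n)]
     for i in range(n):
         for k in range(m):
             a_ik = A[i][k]
             for j in range(p):
                 C[i][j] += a_ik * B[k][j]
     return C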



References

2014

  • (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Matrix_multiplication#Algorithms_for_efficient_matrix_multiplication Retrieved:2014-5-4.
    • The running time of square matrix multiplication, if carried out naïvely, is [math]\displaystyle{ O(n^3) }[/math]. The running time for multiplying rectangular matrices (one [math]\displaystyle{ m \times n }[/math]-matrix with one [math]\displaystyle{ n \times p }[/math]-matrix) is [math]\displaystyle{ O(mnp) }[/math]; however, more efficient algorithms exist, such as Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication". It is based on a way of multiplying two [math]\displaystyle{ 2 \times 2 }[/math]-matrices which requires only 7 multiplications (instead of the usual 8), at the expense of several additional addition and subtraction operations. Applying this recursively gives an algorithm with a multiplicative cost of [math]\displaystyle{ O( n^{\log_{2}7}) \approx O(n^{2.807}) }[/math]. Strassen's algorithm is more complex, and its numerical stability is reduced compared to the naïve algorithm. Nevertheless, it appears in several libraries, such as BLAS, where it is significantly more efficient for matrices with dimensions n > 100, [1] and is very useful for large matrices over exact domains such as finite fields, where numerical stability is not an issue.

The current algorithm with the lowest known exponent is a generalization of the Coppersmith–Winograd algorithm; it has an asymptotic complexity of [math]\displaystyle{ O(n^{2.3729}) }[/math], thanks to Vassilevska Williams. [2] This algorithm, and the Coppersmith–Winograd algorithm on which it is based, are similar to Strassen's algorithm: a way is devised for multiplying two [math]\displaystyle{ k \times k }[/math]-matrices with fewer than [math]\displaystyle{ k^3 }[/math] multiplications, and this technique is applied recursively. However, the constant coefficient hidden by the Big O notation is so large that these algorithms are only worthwhile for matrices that are too large to handle on present-day computers. Since any algorithm for multiplying two [math]\displaystyle{ n \times n }[/math]-matrices has to process all [math]\displaystyle{ 2n^2 }[/math] entries, there is an asymptotic lower bound of [math]\displaystyle{ \Omega(n^2) }[/math] operations. Raz (2002) proves a lower bound of [math]\displaystyle{ \Omega(n^2 \log n) }[/math] for bounded coefficient arithmetic circuits over the real or complex numbers.

Cohn et al. (2003, 2005) put methods such as the Strassen and Coppersmith–Winograd algorithms in an entirely different group-theoretic context, by utilising triples of subsets of finite groups which satisfy a disjointness property called the triple product property (TPP). They show that if families of wreath products of Abelian groups with symmetric groups realise families of subset triples with a simultaneous version of the TPP, then there are matrix multiplication algorithms with essentially quadratic complexity. Most researchers believe that this is indeed the case. [3] However, Alon, Shpilka and Umans have recently shown that some of these conjectures implying fast matrix multiplication are incompatible with another plausible conjecture, the sunflower conjecture. [4]

Because of the nature of matrix operations and the layout of matrices in memory, it is typically possible to obtain substantial performance gains through parallelization and vectorization. It should therefore be noted that some algorithms with lower time complexity on paper may incur hidden costs on real machines. Freivalds' algorithm is a simple Monte Carlo algorithm that, given matrices [math]\displaystyle{ A, B, C }[/math], verifies in [math]\displaystyle{ \Theta(n^2) }[/math] time whether [math]\displaystyle{ AB = C }[/math].

On architectures with hierarchical memory, the cost of moving matrix elements between slow and fast memory tends to dominate the cost of arithmetic. Cannon's algorithm, also known as the 2D algorithm, partitions each input matrix into a block matrix whose elements are submatrices of size [math]\displaystyle{ \sqrt{M/3} \times \sqrt{M/3} }[/math], where [math]\displaystyle{ M }[/math] is the size of fast memory. [5] The naïve algorithm is then used over the block matrices, computing products of submatrices entirely in fast memory. This reduces communication bandwidth to [math]\displaystyle{ O(n^3/\sqrt{M}) }[/math], which is asymptotically optimal. In a distributed setting with [math]\displaystyle{ p }[/math] processors arranged in a [math]\displaystyle{ \sqrt{p} \times \sqrt{p} }[/math] 2D mesh, one submatrix of the result can be assigned to each processor, and the product can be computed with each processor transmitting [math]\displaystyle{ O(n^2/\sqrt{p}) }[/math] words, which is asymptotically optimal assuming that each node stores the minimum [math]\displaystyle{ O(n^2/p) }[/math] elements.
This can be improved by the 3D algorithm, which arranges the processors in a 3D cube mesh, assigning every product of two input submatrices to a single processor. The result submatrices are then generated by performing a reduction over each row. This algorithm transmits [math]\displaystyle{ O(n^2/p^{2/3}) }[/math] words per processor, which is asymptotically optimal. However, this requires replicating each input matrix element [math]\displaystyle{ p^{1/3} }[/math] times, and so requires a factor of [math]\displaystyle{ p^{1/3} }[/math] more memory than is needed to store the inputs. This algorithm can be combined with Strassen to further reduce runtime. "2.5D" algorithms provide a continuous tradeoff between memory usage and communication bandwidth. (Illustrative sketches of the Strassen, Freivalds, blocked, and 2D-mesh schemes follow the footnotes below.)
  1. Press 2007, p. 108.
  2. The original algorithm was presented by Don Coppersmith and Shmuel Winograd in 1990; it has an asymptotic complexity of [math]\displaystyle{ O(n^{2.376}) }[/math].
  3. Robinson, 2005.
  4. Alon, Shpilka, Umans, On Sunflowers and Matrix Multiplication
  5. Lynn Elliot Cannon, A cellular computer to implement the Kalman Filter Algorithm, Technical report, Ph.D. Thesis, Montana State University, 14 July 1969.
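
The 7-multiplication scheme described above can be sketched as follows, assuming square inputs whose size is a power of two; the cutoff of 64 is an arbitrary tuning choice, and the function name strassen is ours.

 import numpy as np
 def strassen(A, B):
     # Strassen's recursion: 7 half-size products instead of 8.
     n = A.shape[0]
     if n <= 64:  # below the cutoff, fall back to the ordinary product
         return A @ B
     h = n // 2
     A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
     B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
     M1 = strassen(A11 + A22, B11 + B22)
     M2 = strassen(A21 + A22, B11)
     M3 = strassen(A11, B12 - B22)
     M4 = strassen(A22, B21 - B11)
     M5 = strassen(A11 + A12, B22)
     M6 = strassen(A21 - A11, B11 + B12)
     M7 = strassen(A12 - A22, B21 + B22)
     # Recombination uses only additions and subtractions.
     return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                      [M2 + M4, M1 - M2 + M3 + M6]])

The extra additions and subtractions are why the scheme only beats the naïve product above a fairly large crossover size.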
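Freivalds' verification idea admits an equally short sketch, assuming integer (exact-arithmetic) matrices; for floating point one would compare with a tolerance such as np.allclose instead of exact equality. The name freivalds_verify and the 10-round default are illustrative choices.

 import numpy as np
 def freivalds_verify(A, B, C, rounds=10):
     # Each round checks A(Br) == Cr for a random 0/1 vector r, using
     # three matrix-vector products, i.e. O(n^2) work per round.
     # A wrong C survives each round with probability at most 1/2.
     n = C.shape[1]
     for _ in range(rounds):
         r = np.random.randint(0, 2, size=n)
         if not np.array_equal(A @ (B @ r), C @ r):
             return False  # certainly A @ B != C
     return True  # A @ B == C with probability >= 1 - 2**(-rounds)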
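The fast-memory blocking idea can be sketched sequentially; the block size of 64 below is an arbitrary stand-in for sqrt(M/3), and blocked_matmul is our name.

 import numpy as np
 def blocked_matmul(A, B, block=64):
     # Naive algorithm over block x block submatrices: each triple of
     # submatrices is meant to fit in fast memory, cutting slow-memory
     # traffic to O(n^3 / sqrt(M)).
     n = A.shape[0]
     C = np.zeros((n, n), dtype=A.dtype)
     for i in range(0, n, block):
         for k in range(0, n, block):
             for j in range(0, n, block):
                 C[i:i+block, j:j+block] += (A[i:i+block, k:k+block]
                                             @ B[k:k+block, j:j+block])
     return C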
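Finally, the 2D mesh layout can be simulated sequentially, with grid standing in for sqrt(p) and assumed to divide n evenly; this is a sketch only, since a real implementation exchanges these blocks over a network rather than in shared lists.

 import numpy as np
 def cannon_matmul(A, B, grid):
     # Block (i, j) lives on processor (i, j) of a grid x grid mesh;
     # each cyclic shift below models one round of communication.
     n = A.shape[0]
     b = n // grid
     Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(grid)] for i in range(grid)]
     Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(grid)] for i in range(grid)]
     Cb = [[np.zeros((b, b), dtype=A.dtype) for _ in range(grid)] for _ in range(grid)]
     # Initial skew: shift row i of A left by i, column j of B up by j.
     Ab = [[Ab[i][(j + i) % grid] for j in range(grid)] for i in range(grid)]
     Bb = [[Bb[(i + j) % grid][j] for j in range(grid)] for i in range(grid)]
     for _ in range(grid):
         for i in range(grid):
             for j in range(grid):
                 Cb[i][j] = Cb[i][j] + Ab[i][j] @ Bb[i][j]
         # Shift every A block one step left and every B block one step up.
         Ab = [[Ab[i][(j + 1) % grid] for j in range(grid)] for i in range(grid)]
         Bb = [[Bb[(i + 1) % grid][j] for j in range(grid)] for i in range(grid)]
     return np.block(Cb)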