Knowledge

Matrix multiplication algorithm

Source 📝

3977:, even in commutative field. On 4x4 matrices, AlphaTensor unexpectedly discovered a solution with 47 multiplication steps, an improvement over the 49 required with Strassen’s algorithm of 1969, albeit restricted to mod 2 arithmetic. Similarly, AlphaTensor solved 5x5 matrices with 96 rather than Strassen's 98 steps. Based on the surprising discovery that such improvements exist, other researchers were quickly able to find a similar independent 4x4 algorithm, and separately tweaked Deepmind's 96-step 5x5 algorithm down to 95 steps in mod 2 arithmetic and to 97 in normal arithmetic. Some algorithms were never discovered before, e.g. (4, 5, 5) got improved to 76 from 80 in normal and mod 2 arithmetics. 5287: 5068: 1653: 448: 2584: 1247: 3697:
are represented in the same basis. Karstadt and Schwartz computed in different bases and traded 3 additions for less expensive basis transformations. They also proved that one cannot go below 12 additions per step using different bases. In subsequent work Beniamini et el. applied this base change trick to more general decompositions than 2x2-block matrices and improved the leading constant for their run times.
1228: 1648:{\displaystyle {\begin{pmatrix}C_{11}&C_{12}\\C_{21}&C_{22}\\\end{pmatrix}}={\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\\\end{pmatrix}}{\begin{pmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\\\end{pmatrix}}={\begin{pmatrix}A_{11}B_{11}+A_{12}B_{21}&A_{11}B_{12}+A_{12}B_{22}\\A_{21}B_{11}+A_{22}B_{21}&A_{21}B_{12}+A_{22}B_{22}\\\end{pmatrix}}} 3689: 3421: 4216: 995: 3696:
It is known that a Strassen-like algorithm with a 2x2-block matrix step requires at least 7 block matrix multiplications. In 1976 Probert showed that such an algorithm requires at least 15 additions (including subtractions), however, a hidden assumption was that the blocks and the 2x2-block matrix
5092:
On modern architectures with hierarchical memory, the cost of loading and storing input matrix elements tends to dominate the cost of arithmetic. On a single machine this is the amount of data transferred between RAM and cache, while on a distributed memory multi-node machine it is the amount
1863:
A variant of this algorithm that works for matrices of arbitrary shapes and is faster in practice splits matrices in two instead of four submatrices, as follows. Splitting a matrix now means dividing it into two parts of equal size, or as close to equal sizes as possible in the case of odd
5273:
more memory than is needed to store the inputs. This algorithm can be combined with Strassen to further reduce runtime. "2.5D" algorithms provide a continuous tradeoff between memory usage and communication bandwidth. On modern distributed computing environments such as
2417: 6095:
Fawzi, Alhussein; Balog, Matej; Huang, Aja; Hubert, Thomas; Romera-Paredes, Bernardino; Barekatain, Mohammadamin; Novikov, Alexander; R. Ruiz, Francisco J.; Schrittwieser, Julian; Swirszcz, Grzegorz; Silver, David; Hassabis, Demis; Kohli, Pushmeet (October 2022).
4221:
can be performed independently of each other, as can the four summations (although the algorithm needs to "join" the multiplications before doing the summations). Exploiting the full parallelism of the problem, one obtains an algorithm that can be expressed in
3973:(mod 2 arithmetic). The best "practical" (explicit low-rank decomposition of a matrix multiplication tensor) algorithm found ran in O(n). Finding low-rank decompositions of such tensors (and beyond) is NP-hard; optimal multiplication even for 3x3 matrices 2227: 3934:
that used a single-player game analogy to invent thousands of matrix multiplication algorithms, including some previously discovered by humans and some that were not. Operations were restricted to the non-commutative ground field (normal arithmetic) and
2077: 2443:
environment where cache sizes are effectively dynamic due to other processes taking up cache space. (The simple iterative algorithm is cache-oblivious as well, but much slower in practice if the matrix layout is not adapted to the algorithm.)
458:
The three loops in iterative matrix multiplication can be arbitrarily swapped with each other without an effect on correctness or asymptotic running time. However, the order can have a considerable impact on practical performance due to the
3153: 2978: 3533: 2567: 1223:{\displaystyle C={\begin{pmatrix}C_{11}&C_{12}\\C_{21}&C_{22}\\\end{pmatrix}},\,A={\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\\\end{pmatrix}},\,B={\begin{pmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\\\end{pmatrix}},} 3265: 4008: 5245:
which arranges the processors in a 3D cube mesh, assigning every product of two input submatrices to a single processor. The result submatrices are then generated by performing a reduction over each row. This algorithm transmits
5321:-1 steps are needed. The performance improves further for repeated computations leading to 100% efficiency. The cross-wired mesh array may be seen as a special case of a non-planar (i.e. multilayered) processing structure. 5587: 2643:-matrices which require only 7 multiplications (instead of the usual 8), at the expense of several additional addition and subtraction operations. Applying this recursively gives an algorithm with a multiplicative cost of 5314:-2 steps although this is reduced to half this number for repeated computations. The standard array is inefficient because the data from the two matrices does not arrive simultaneously and it must be padded with zeroes. 560:
cache misses in the worst case. As of 2010, the speed of memories compared to that of processors is such that the cache misses, rather than the actual calculations, dominate the running time for sizable matrices.
6055:—the wreath product approach is unlikely to be applicable to the large matrix problems that arise in practice. the input matrices must be astronomically large for the difference in time to be apparent. 3527: 3259: 2275: 2712: 273: 1823: 3974: 5149:
is the size of fast memory. The naïve algorithm is then used over the block matrices, computing products of submatrices entirely in fast memory. This reduces communication bandwidth to
3970: 2740:
has its merits. A table that compares key aspects of the improved version based on recursive multiplication of 2x2-block matrices via 7 block matrix multiplications follows. As usual
2121: 3052: 2877: 3844: 1965: 3798: 1742: 3850:, Xu, Xu, and Zhou. This algorithm, like all other recent algorithms in this line of research, is a generalization of the Coppersmith–Winograd algorithm, which was given by 2625: 3756: 5964:"Worst-case complexity bounds on algorithms for computing the canonical structure of finite abelian groups and the Hermite and Smith normal forms of an integer matrix" 3818: 3730: 5963: 1658:
which consists of eight multiplications of pairs of submatrices, followed by an addition step. The divide-and-conquer algorithm computes the smaller multiplications
3684:{\displaystyle 4\left({\frac {{\sqrt {3}}n}{\sqrt {M}}}\right)^{\log _{2}7}\cdot M-12n^{2}+1.5n^{2}\cdot \log _{2}\left({\frac {{\sqrt {2}}n}{\sqrt {M}}}\right)+5M} 3058: 2883: 3416:{\displaystyle 4\left({\frac {{\sqrt {3}}n}{\sqrt {M}}}\right)^{\log _{2}7}\cdot M-12n^{2}+3n^{2}\cdot \log _{2}\left({\frac {{\sqrt {2}}n}{\sqrt {M}}}\right)+5M} 2461: 4211:{\displaystyle {\begin{pmatrix}A_{11}B_{11}+A_{12}B_{21}&A_{11}B_{12}+A_{12}B_{22}\\A_{21}B_{11}+A_{22}B_{21}&A_{21}B_{12}+A_{22}B_{22}\\\end{pmatrix}}} 2778: 2758: 5906:
Beniamini, Gal; Cheng, Nathan; Holtz, Olga; Karstadt, Elaye; Schwartz, Oded (2020). "Sparsifying the Operators of Fast Matrix Multiplication Algorithms".
6346:
Agarwal, R.C.; Balle, S. M.; Gustavson, F. G.; Joshi, M.; Palkar, P. (September 1995). "A three-dimensional approach to parallel matrix multiplication".
5330: 6019:
The Coppersmith–Winograd algorithm is not practical, due to the very large hidden constant in the upper bound on the number of multiplications required.
6736: 5335: 2578: 94: 974:
amounts to several orders of magnitude on modern machines, so that the actual calculations dominate the running time, rather than the cache misses.
2736:
Since Strassen's algorithm is actually used in practical numerical software and computer algebra systems improving on the constants hidden in the
6306:
Irony, Dror; Toledo, Sivan; Tiskin, Alexander (September 2004). "Communication lower bounds for distributed-memory matrix multiplication".
5052:
on any real computer. The algorithm isn't practical due to the communication cost inherent in moving data to and from the temporary matrix
6857: 5206:
2D mesh, one submatrix of the result can be assigned to each processor, and the product can be computed with each processor transmitting
5036:
steps, meaning it takes that much time on an ideal machine with an infinite number of processors; therefore, it has a maximum possible
6913: 6882: 5506: 6284: 6183: 5350: 1844: 6877: 6729: 3441: 2412:{\displaystyle C={\begin{pmatrix}A_{1}&A_{2}\end{pmatrix}}{\begin{pmatrix}B_{1}\\B_{2}\end{pmatrix}}=A_{1}B_{1}+A_{2}B_{2}} 6407: 5741: 5630: 5409: 3173: 3858:
in 1990. The conceptual idea of these algorithms are similar to Strassen's algorithm: a way is devised for multiplying two
2646: 6004: 3878:
is so large that these algorithms are only worthwhile for matrices that are too large to handle on present-day computers.
195: 6903: 6836: 6722: 6680:
Van Zee, Field G.; van de Geijn, Robert A. (2015). "BLIS: A Framework for Rapidly Instantiating BLAS Functionality".
5561: 5480: 6867: 5261:
words per processor, which is asymptotically optimal. However, this requires replicating each input matrix element
5116: 2726: 1748: 43:. Many different algorithms have been designed for multiplying matrices on different types of hardware, including 2631:
Algorithms exist that provide better running times than the straightforward ones. The first to be discovered was
3940: 2222:{\displaystyle C=A{\begin{pmatrix}B_{1}&B_{2}\end{pmatrix}}={\begin{pmatrix}AB_{1}&AB_{2}\end{pmatrix}}} 5479:
Alman, Josh; Williams, Virginia Vassilevska (2020), "A Refined Laser Method and Faster Matrix Multiplication",
6067: 5286: 31:
efficient. Applications of matrix multiplication in computational problems are found in many fields including
3847: 110: 5617:. ASPLOS91: 4th Int'l Conference on Architecture Support for Programming Languages & Operating Systems. 3874:
multiplications, and this technique is applied recursively. However, the constant coefficient hidden by the
2998: 2823: 2072:{\displaystyle C={\begin{pmatrix}A_{1}\\A_{2}\end{pmatrix}}{B}={\begin{pmatrix}A_{1}B\\A_{2}B\end{pmatrix}}} 6826: 3999: 3701: 983: 6467: 5345: 3823: 468: 452: 5013:
is a keyword that signal a computation may be run in parallel with the rest of the function call, while
134: 6780: 5729: 5551: 2639:
in 1969 and often referred to as "fast matrix multiplication". It is based on a way of multiplying two
6641:
Goto, Kazushige; van de Geijn, Robert A. (2008). "Anatomy of high-performance matrix multiplication".
3761: 1700: 6831: 5340: 5071:
Block matrix multiplication. In the 2D algorithm, each processor is responsible for one submatrix of
2436: 6655: 5983: 5696: 2594: 6745: 6571: 6513: 6360: 6320: 5733: 2439:: there is no tuning parameter required to get optimal cache performance, and it behaves well in a 51:
systems, where the computational work is spread over multiple processors (perhaps over a network).
5393: 5355: 3881: 89:). Better asymptotic bounds on the time required to multiply matrices have been known since the 6908: 6650: 6566: 6508: 6384: 6355: 6315: 5978: 5691: 3735: 418: 55: 54:
Directly applying the mathematical definition of matrix multiplication gives an algorithm that
6212: 5928: 3148:{\displaystyle 5\left({\frac {{\sqrt {3}}n}{\sqrt {M}}}\right)^{\log _{2}7}\cdot M-15n^{2}+3M} 2973:{\displaystyle 6\left({\frac {{\sqrt {3}}n}{\sqrt {M}}}\right)^{\log _{2}7}\cdot M-18n^{2}+3M} 6790: 5307: 5119:
that partitions each input matrix into a block matrix whose elements are submatrices of size
5108: 3885: 3803: 3715: 1663: 1233:
which works for all square matrices whose dimensions are powers of two, i.e., the shapes are
467:
use of the algorithm; which order is best also depends on whether the matrices are stored in
460: 48: 20: 5385: 6785: 6602:(2009). "A class of parallel tiled linear algebra algorithms for multicore architectures". 6441: 6385:"Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms" 6109: 6000: 5539: 5025: 2562:{\displaystyle \Theta \left(m+n+p+{\frac {mn+np+mp}{b}}+{\frac {mnp}{b{\sqrt {M}}}}\right)} 40: 32: 5290:
Matrix multiplication completed in 2n-1 steps for two n×n matrices on a cross-wired mesh.
8: 6764: 5386: 4223: 2715: 414: 64: 36: 24: 6445: 6257: 6161:
Kauers, Manuel; Moosbauer, Jakob (2022-12-02). "Flip Graphs for Matrix Multiplication".
6113: 6697: 6668: 6629: 6611: 6537: 6431: 6276: 6162: 6138: 6097: 5907: 5774: 5486: 5457: 5433: 3995: 2763: 2743: 2632: 122: 90: 44: 5948: 6580: 6536:
Kak, S. (2014). "Efficiency of matrix multiplication on the cross-wired mesh array".
6522: 6403: 6143: 6125: 5810: 5793: 5778: 5737: 5723: 5655: 5626: 5557: 5405: 6265:
Proceedings of the thirteenth annual ACM symposium on Theory of computing - STOC '81
5831:
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures
6800: 6701: 6689: 6672: 6660: 6621: 6576: 6518: 6479: 6395: 6365: 6325: 6280: 6268: 6221: 6133: 6117: 6048:
Even if someone manages to prove one of the conjectures—thereby demonstrating that
5988: 5943: 5888: 5861: 5834: 5805: 5766: 5701: 5664: 5618: 5535: 5397: 2440: 475: 5879:
Probert, Robert L. (1976). "On the additive complexity of matrix multiplication".
5226:
words, which is asymptotically optimal assuming that each node stores the minimum
6841: 6399: 6098:"Discovering faster matrix multiplication algorithms with reinforcement learning" 5996: 5826: 5295: 4002:. These are based on the fact that the eight recursive matrix multiplications in 3855: 3851: 3705: 2636: 403: 98: 6708: 6625: 6484: 6424: 5854:"Pebbling Game and Alternative Basis for High Performance Matrix Multiplication" 5401: 2431:
The cache miss rate of recursive matrix multiplication is the same as that of a
6759: 6633: 6329: 6121: 5719: 5547: 3931: 3875: 2737: 1918: 86: 5428:
Williams, Virginia Vassilevska; Xu, Yinzhan; Xu, Zixuan; Zhou, Renfei (2024),
5067: 278:
From this, a simple algorithm can be constructed which loops over the indices
6897: 6805: 6599: 6129: 5651: 5381: 1843:
to sum the four pairs of resulting matrices element-wise. Application of the
447: 6664: 6240: 6032: 5838: 2718:
is reduced compared to the naïve algorithm, but it is faster in cases where
6714: 6147: 2730: 2583: 987: 540:, every iteration of the inner loop (a simultaneous sweep through a row of 6272: 5622: 578:
version, where the matrix is implicitly divided into square tiles of size
97:) remains unknown. As of April 2024, the best announced bound on the 5682:
Miller, Webb (1975), "Computational complexity and numerical stability",
5317:
The result is even faster on a two-layered cross-wired mesh, where only 2
2447:
The number of cache misses incurred by this algorithm, on a machine with
2435:
iterative version, but unlike that algorithm, the recursive algorithm is
2432: 574: 39:
and in seemingly unrelated problems such as counting the paths through a
6369: 6225: 5668: 6468:"A faster parallel algorithm for matrix multiplication on a mesh array" 6392:
Proceedings of the 17th International Conference on Parallel Processing
5865: 5770: 5543: 4226: 6821: 6499:
Kak, S (1988). "A two-layered mesh array for matrix multiplication".
5853: 5275: 2784:
Progress for Strassen-like recursive 2x2-block matrix multiplication
1659: 464: 6693: 5992: 5892: 5705: 5087: 2591:
over time for the computational complexity of matrix multiplication
6167: 5912: 5491: 5462: 5438: 3927: 125:
because of the large constants and cannot be realized practically.
6616: 6542: 6436: 2729:. It is very useful for large matrices over exact domains such as 5482:
32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2021)
5037: 6220:(Ph.D.). Massachusetts Institute of Technology. pp. 54–57. 5757:
Strassen, Volker (1969). "Gaussian Elimination is not Optimal".
5717: 121:
time, given by Alman and Williams. However, this algorithm is a
6872: 6862: 5613:
Lam, Monica S.; Rothberg, Edward E.; Wolf, Michael E. (1991).
5588:"6.172 Performance Engineering of Software Systems, Lecture 8" 5615:
The Cache Performance and Optimizations of Blocked Algorithms
5278:, specialized multiplication algorithms have been developed. 5169:, which is asymptotically optimal (for algorithms performing 3936: 1828:
accounting for the eight recursive calls on matrices of size
421:
is to assume that the inputs are all square matrices of size
6242:
A cellular computer to implement the Kalman Filter Algorithm
5017:
waits for all previously "forked" computations to complete.
6597: 5905: 5556:(3rd ed.). MIT Press and McGraw-Hill. pp. 75–79. 5093:
transferred between nodes; in either case it is called the
6345: 5534: 3704:
how well Strassen's algorithm can be improved in terms of
5430:
New Bounds for Matrix Multiplication: from Alpha to Omega
3522:{\displaystyle 5n^{\log _{2}7}-4n^{2}+1.5n^{2}\log _{2}n} 928:
In the idealized cache model, this algorithm incurs only
5294:
There are a variety of algorithms for multiplication on
6184:"AI Reveals New Possibilities in Matrix Multiplication" 6033:"Toward an Optimal Algorithm for Matrix Multiplication" 3254:{\displaystyle 5n^{\log _{2}7}-4n^{2}+3n^{2}\log _{2}n} 5075:. In the 3D algorithm, every pair of submatrices from 4017: 3980: 2328: 2290: 2180: 2139: 2028: 1980: 1454: 1387: 1323: 1256: 1158: 1084: 1010: 6425:"Dimension Independent Matrix Square Using MapReduce" 6094: 4011: 3943: 3826: 3806: 3764: 3758:
matrix over a field can be multiplied together using
3738: 3718: 3536: 3444: 3268: 3176: 3061: 3001: 2886: 2826: 2766: 2746: 2707:{\displaystyle O(n^{\log _{2}7})\approx O(n^{2.807})} 2649: 2597: 2464: 2278: 2124: 1968: 1751: 1703: 1250: 998: 506:
cache lines), the above algorithm is sub-optimal for
198: 6422: 5585: 5507:"Matrix Multiplication Inches Closer to Mythic Goal" 5097:. The naïve algorithm using three nested loops uses 6679: 5929:"Matrix multiplication via arithmetic progressions" 5926: 5454:
Faster Matrix Multiplication via Asymmetric Hashing
5331:
Computational complexity of mathematical operations
564:The optimal variant of the iterative algorithm for 548:) incurs a cache miss when accessing an element of 268:{\displaystyle c_{ij}=\sum _{k=1}^{m}a_{ik}b_{kj}.} 113:, Xu, Xu, and Zhou. This improves on the bound of 5725:Numerical Recipes: The Art of Scientific Computing 4210: 3964: 3838: 3812: 3792: 3750: 3724: 3683: 3521: 3415: 3253: 3147: 3046: 2972: 2871: 2772: 2752: 2706: 2619: 2561: 2411: 2221: 2071: 1817: 1736: 1690:The complexity of this algorithm as a function of 1647: 1222: 267: 6598:Buttari, Alfredo; Langou, Julien; Kurzak, Jakub; 6305: 5336:Computational complexity of matrix multiplication 5088:Communication-avoiding and distributed algorithms 3991: 2579:Computational complexity of matrix multiplication 1845:master theorem for divide-and-conquer recurrences 982:An alternative to the iterative algorithm is the 95:computational complexity of matrix multiplication 93:in the 1960s, but the optimal time (that is, the 6895: 6640: 5824: 5427: 5306:on a standard two-dimensional mesh using the 2D 5083:that is multiplied is assigned to one processor. 5021:achieves its goal by pointer manipulation only. 2725:or so and appears in several libraries, such as 2714:. Strassen's algorithm is more complex, and the 6557:Kak, S (1988). "Multilayered array computing". 6465: 6382: 6160: 6068:"Discovering novel algorithms with AlphaTensor" 5612: 5586:Amarasinghe, Saman; Leiserson, Charles (2010). 977: 6090: 6088: 5851: 986:for matrix multiplication. This relies on the 417:). A common simplification for the purpose of 6730: 6466:Bae, S.E.; Shinn, T.-W.; Takaoka, T. (2014). 6423:Bosagh Zadeh, Reza; Carlsson, Gunnar (2013). 5825:Karstadt, Elaye; Schwartz, Oded (July 2017). 5478: 5452:Duan, Ran; Wu, Hongxun; Zhou, Renfei (2022), 2733:, where numerical stability is not an issue. 6744: 3985: 3800:field operations. The current best bound on 3732:, is the smallest real number for which any 1818:{\displaystyle T(n)=8T(n/2)+\Theta (n^{2}),} 439:, i.e., cubic in the size of the dimension. 6085: 5927:Coppersmith, Don; Winograd, Shmuel (1990), 5920: 5064:speedup, without using a temporary matrix. 4289:Otherwise, allocate space for a new matrix 294:, computing the above using a nested loop: 6737: 6723: 6341: 6339: 6258:"I/O complexity: The red-blue pebble game" 5961: 5451: 5310:, one can complete the multiplication in 3 3965:{\displaystyle \mathbb {Z} /2\mathbb {Z} } 1847:shows this recursion to have the solution 474:In particular, in the idealized case of a 6682:ACM Transactions on Mathematical Software 6654: 6643:ACM Transactions on Mathematical Software 6615: 6570: 6541: 6512: 6483: 6435: 6359: 6319: 6166: 6137: 5982: 5947: 5911: 5809: 5695: 5490: 5461: 5437: 3958: 3945: 1146: 1072: 6383:Solomonik, Edgar; Demmel, James (2011). 6301: 6299: 6297: 6255: 6181: 6030: 6024: 5827:"Matrix Multiplication, a Little Faster" 5791: 5756: 5646: 5644: 5642: 5504: 5423: 5421: 5376: 5374: 5372: 5370: 5285: 5281: 5066: 5056:, but a more practical variant achieves 2582: 446: 101:of a matrix multiplication algorithm is 27:, much work has been invested in making 6868:Basic Linear Algebra Subprograms (BLAS) 6336: 6290:from the original on December 15, 2019. 6214:Cilk: Efficient Multithreaded Computing 6210: 6206: 6204: 5878: 5718:Press, William H.; Flannery, Brian P.; 5594:. Massachusetts Institute of Technology 4809:(or do a short loop, perhaps unrolled). 2760:gives the dimensions of the matrix and 2572: 1855:, the same as the iterative algorithm. 620:be a new matrix of the appropriate size 552:. This means that the algorithm incurs 314:be a new matrix of the appropriate size 6896: 6394:. Vol. Part II. pp. 90–109. 6238: 5681: 5650: 5530: 5528: 5526: 5474: 5472: 5380: 5241:elements. This can be improved by the 4726:(wait for parallel forks to complete). 3047:{\displaystyle 6n^{\log _{2}7}-5n^{2}} 2872:{\displaystyle 7n^{\log _{2}7}-6n^{2}} 1858: 128: 6718: 6294: 5639: 5581: 5579: 5577: 5575: 5573: 5498: 5418: 5367: 2587:Improvement of estimates of exponent 6239:Cannon, Lynn Elliot (14 July 1969). 6201: 5858:SIAM Journal on Scientific Computing 5852:Schwartz, Oded; Vaknin, Noa (2023). 5445: 431:, in which case the running time is 23:is such a central operation in many 6556: 6535: 6498: 5798:Linear Algebra and Its Applications 5794:"On multiplication of 2×2 matrices" 5523: 5469: 5351:Sparse matrix–vector multiplication 5341:CYK algorithm § Valiant's algorithm 5267:times, and so requires a factor of 4286:(or multiply a small block matrix). 3981:Parallel and distributed algorithms 3839:{\displaystyle \omega <2.371552} 2451:lines of ideal cache, each of size 2426: 1921:version of the iterative algorithm. 469:row-major order, column-major order 135:definition of matrix multiplication 13: 6590: 6245:(Ph.D.). Montana State University. 6182:Brubaker, Ben (23 November 2022). 5570: 2465: 1793: 1719: 14: 6925: 6256:Hong, J. W.; Kung, H. T. (1981). 5722:; Vetterling, William T. (2007). 5505:Hartnett, Kevin (23 March 2021). 5384:(2012). "Sorting and Searching". 442: 6914:Matrix multiplication algorithms 6031:Robinson, Sara (November 2005), 5117:communication-avoiding algorithm 3793:{\displaystyle n^{\omega +o(1)}} 2795:#matrix multiplications per step 1917:is below some threshold, use an 1737:{\displaystyle T(1)=\Theta (1);} 514:stored in row-major order. When 29:matrix multiplication algorithms 6550: 6529: 6492: 6459: 6416: 6376: 6249: 6232: 6175: 6154: 6060: 5955: 5936:Journal of Symbolic Computation 5899: 5872: 5845: 5818: 5785: 5750: 5711: 5675: 5962:Iliopoulos, Costas S. (1989), 5833:. SPAA '17. pp. 101–110. 5606: 5180:In a distributed setting with 3921: 3785: 3779: 3710:matrix multiplication exponent 2701: 2688: 2679: 2653: 2620:{\displaystyle O(n^{\omega })} 2614: 2601: 1809: 1796: 1787: 1773: 1761: 1755: 1728: 1722: 1713: 1707: 16:Algorithm to multiply matrices 1: 5949:10.1016/S0747-7171(08)80013-2 5361: 4000:shared-memory multiprocessors 6581:10.1016/0020-0255(88)90010-2 6523:10.1016/0167-8191(88)90078-6 6400:10.1007/978-3-642-23397-5_10 5811:10.1016/0024-3795(71)90009-7 5298:. For multiplication of two 3992:divide-and-conquer algorithm 3702:theoretical computer science 2780:designates the memory size. 1241:. The matrix product is now 984:divide-and-conquer algorithm 978:Divide-and-conquer algorithm 7: 6626:10.1016/j.parco.2008.10.002 6485:10.1016/j.procs.2014.05.208 6308:J. Parallel Distrib. Comput 5402:10.1007/978-1-84800-070-4_4 5388:The Algorithm Design Manual 5346:Matrix chain multiplication 5324: 2801:total arithmetic operations 1694:is given by the recurrence 486:bytes per cache line (i.e. 453:row- and column-major order 67:operations to multiply two 10: 6930: 6781:System of linear equations 6330:10.1016/j.jpdc.2004.03.021 6211:Randall, Keith H. (1998). 6122:10.1038/s41586-022-05172-4 5730:Cambridge University Press 5657:Cache-Oblivious Algorithms 5553:Introduction to Algorithms 3930:introduced AlphaTensor, a 3868:-matrices with fewer than 3700:It is an open question in 2798:#matrix additions per step 2576: 960:cache misses; the divisor 77:matrices over that field ( 6850: 6832:Cache-oblivious algorithm 6814: 6773: 6752: 6472:Procedia Computer Science 5971:SIAM Journal on Computing 5792:Winograd, Shmuel (1971). 5184:processors arranged in a 5105:communication bandwidth. 3986:Shared-memory parallelism 3751:{\displaystyle n\times n} 572:in row-major layout is a 6904:Numerical linear algebra 6883:General purpose software 6746:Numerical linear algebra 3994:sketched earlier can be 6665:10.1145/1356052.1356053 5839:10.1145/3087556.3087579 5356:Method of Four Russians 5095:communication bandwidth 3813:{\displaystyle \omega } 3725:{\displaystyle \omega } 476:fully associative cache 5860:. pp. C277–C303. 5291: 5084: 4212: 3966: 3840: 3814: 3794: 3752: 3726: 3685: 3523: 3417: 3255: 3149: 3048: 2974: 2873: 2774: 2754: 2708: 2628: 2621: 2563: 2413: 2223: 2073: 1819: 1738: 1649: 1224: 461:memory access patterns 455: 269: 235: 6878:Specialized libraries 6791:Matrix multiplication 6786:Matrix decompositions 6273:10.1145/800076.802486 5623:10.1145/106972.106981 5540:Leiserson, Charles E. 5392:. Springer. pp.  5289: 5282:Algorithms for meshes 5070: 5024:This algorithm has a 4213: 3967: 3888:that, given matrices 3886:Monte Carlo algorithm 3841: 3815: 3795: 3753: 3727: 3706:asymptotic complexity 3686: 3524: 3418: 3256: 3150: 3049: 2975: 2874: 2804:total I/O-complexity 2775: 2755: 2709: 2622: 2586: 2577:Further information: 2564: 2455:bytes, is bounded by 2414: 2224: 2074: 1820: 1739: 1664:scalar multiplication 1650: 1225: 450: 402:This algorithm takes 270: 215: 99:asymptotic complexity 21:matrix multiplication 6709:How To Optimize GEMM 6559:Information Sciences 6267:. pp. 326–333. 5111:, also known as the 5026:critical path length 4478:Parallel execution: 4009: 3941: 3882:Freivalds' algorithm 3824: 3804: 3762: 3736: 3716: 3534: 3442: 3266: 3174: 3059: 2999: 2884: 2824: 2764: 2744: 2647: 2633:Strassen's algorithm 2595: 2573:Sub-cubic algorithms 2462: 2276: 2122: 1966: 1749: 1701: 1248: 996: 471:, or a mix of both. 196: 189:matrix with entries 91:Strassen's algorithm 33:scientific computing 25:numerical algorithms 6765:Numerical stability 6446:2013arXiv1304.1467B 6370:10.1147/rd.395.0575 6114:2022Natur.610...47F 2785: 2716:numerical stability 1859:Non-square matrices 415:asymptotic notation 129:Iterative algorithm 37:pattern recognition 6604:Parallel Computing 6501:Parallel Computing 5866:10.1137/22M1502719 5771:10.1007/BF02165411 5720:Teukolsky, Saul A. 5592:MIT OpenCourseWare 5308:Cannon's algorithm 5292: 5109:Cannon's algorithm 5085: 4208: 4202: 3962: 3836: 3810: 3790: 3748: 3722: 3712:, usually denoted 3681: 3519: 3413: 3251: 3162:Karstadt, Schwartz 3145: 3044: 2970: 2869: 2783: 2770: 2750: 2704: 2629: 2617: 2559: 2409: 2357: 2317: 2219: 2213: 2166: 2069: 2063: 2009: 1815: 1734: 1687:as its base case. 1645: 1639: 1440: 1376: 1309: 1220: 1211: 1137: 1063: 988:block partitioning 456: 419:algorithm analysis 265: 123:galactic algorithm 6891: 6890: 6409:978-3-642-23397-5 5743:978-0-521-88068-8 5663:(Master's). MIT. 5632:978-0-89791-380-5 5544:Rivest, Ronald L. 5536:Cormen, Thomas H. 5411:978-1-84800-069-8 3694: 3693: 3666: 3665: 3655: 3564: 3563: 3553: 3398: 3397: 3387: 3296: 3295: 3285: 3089: 3088: 3078: 2914: 2913: 2903: 2773:{\displaystyle M} 2753:{\displaystyle n} 2552: 2549: 2522: 1870:Inputs: matrices 623:Pick a tile size 6921: 6801:Matrix splitting 6739: 6732: 6725: 6716: 6715: 6705: 6676: 6658: 6637: 6619: 6585: 6584: 6574: 6554: 6548: 6547: 6545: 6533: 6527: 6526: 6516: 6496: 6490: 6489: 6487: 6463: 6457: 6456: 6454: 6452: 6439: 6429: 6420: 6414: 6413: 6389: 6380: 6374: 6373: 6363: 6343: 6334: 6333: 6323: 6303: 6292: 6291: 6289: 6262: 6253: 6247: 6246: 6236: 6230: 6229: 6219: 6208: 6199: 6198: 6196: 6194: 6179: 6173: 6172: 6170: 6158: 6152: 6151: 6141: 6092: 6083: 6082: 6080: 6079: 6074:. 5 October 2022 6072:www.deepmind.com 6064: 6058: 6057: 6054: 6037: 6028: 6022: 6021: 6016: 6015: 6009: 6003:, archived from 5986: 5968: 5959: 5953: 5952: 5951: 5933: 5924: 5918: 5917: 5915: 5903: 5897: 5896: 5876: 5870: 5869: 5849: 5843: 5842: 5822: 5816: 5815: 5813: 5789: 5783: 5782: 5754: 5748: 5747: 5728:(3rd ed.). 5715: 5709: 5708: 5699: 5679: 5673: 5672: 5662: 5648: 5637: 5636: 5610: 5604: 5603: 5601: 5599: 5583: 5568: 5567: 5532: 5521: 5520: 5518: 5517: 5502: 5496: 5495: 5494: 5476: 5467: 5466: 5465: 5449: 5443: 5442: 5441: 5425: 5416: 5415: 5391: 5378: 5272: 5266: 5260: 5240: 5225: 5223: 5222: 5205: 5204: 5203: 5194: 5193: 5192: 5183: 5176: 5168: 5166: 5165: 5148: 5142: 5141: 5140: 5130: 5129: 5128: 5104: 5082: 5078: 5074: 5063: 5055: 5051: 5035: 5020: 4992: 4969: 4946: 4923: 4897: 4888: 4879: 4870: 4861: 4854: 4845: 4836: 4827: 4818: 4808: 4785: 4775:, element-wise: 4774: 4770: 4766: 4746: 4739: 4717: 4687: 4657: 4627: 4597: 4567: 4537: 4507: 4474: 4465: 4456: 4447: 4438: 4431: 4422: 4413: 4404: 4395: 4388: 4379: 4370: 4361: 4352: 4345: 4336: 4327: 4318: 4309: 4302: 4292: 4285: 4262: 4251: 4217: 4215: 4214: 4209: 4207: 4206: 4199: 4198: 4189: 4188: 4176: 4175: 4166: 4165: 4154: 4153: 4144: 4143: 4131: 4130: 4121: 4120: 4107: 4106: 4097: 4096: 4084: 4083: 4074: 4073: 4062: 4061: 4052: 4051: 4039: 4038: 4029: 4028: 3998:in two ways for 3971: 3969: 3968: 3963: 3961: 3953: 3948: 3917: 3907: 3899: 3895: 3891: 3873: 3867: 3845: 3843: 3842: 3837: 3819: 3817: 3816: 3811: 3799: 3797: 3796: 3791: 3789: 3788: 3757: 3755: 3754: 3749: 3731: 3729: 3728: 3723: 3690: 3688: 3687: 3682: 3671: 3667: 3661: 3660: 3656: 3651: 3648: 3639: 3638: 3626: 3625: 3610: 3609: 3588: 3587: 3580: 3579: 3569: 3565: 3559: 3558: 3554: 3549: 3546: 3528: 3526: 3525: 3520: 3512: 3511: 3502: 3501: 3486: 3485: 3470: 3469: 3462: 3461: 3430:Schwartz, Vaknin 3422: 3420: 3419: 3414: 3403: 3399: 3393: 3392: 3388: 3383: 3380: 3371: 3370: 3358: 3357: 3342: 3341: 3320: 3319: 3312: 3311: 3301: 3297: 3291: 3290: 3286: 3281: 3278: 3260: 3258: 3257: 3252: 3244: 3243: 3234: 3233: 3218: 3217: 3202: 3201: 3194: 3193: 3154: 3152: 3151: 3146: 3135: 3134: 3113: 3112: 3105: 3104: 3094: 3090: 3084: 3083: 3079: 3074: 3071: 3053: 3051: 3050: 3045: 3043: 3042: 3027: 3026: 3019: 3018: 2979: 2977: 2976: 2971: 2960: 2959: 2938: 2937: 2930: 2929: 2919: 2915: 2909: 2908: 2904: 2899: 2896: 2878: 2876: 2875: 2870: 2868: 2867: 2852: 2851: 2844: 2843: 2786: 2782: 2779: 2777: 2776: 2771: 2759: 2757: 2756: 2751: 2724: 2713: 2711: 2710: 2705: 2700: 2699: 2678: 2677: 2670: 2669: 2642: 2626: 2624: 2623: 2618: 2613: 2612: 2590: 2568: 2566: 2565: 2560: 2558: 2554: 2553: 2551: 2550: 2545: 2539: 2528: 2523: 2518: 2492: 2454: 2450: 2441:multiprogramming 2418: 2416: 2415: 2410: 2408: 2407: 2398: 2397: 2385: 2384: 2375: 2374: 2362: 2361: 2354: 2353: 2340: 2339: 2322: 2321: 2314: 2313: 2302: 2301: 2263: 2259: 2255: 2228: 2226: 2225: 2220: 2218: 2217: 2210: 2209: 2195: 2194: 2171: 2170: 2163: 2162: 2151: 2150: 2109: 2105: 2078: 2076: 2075: 2070: 2068: 2067: 2057: 2056: 2040: 2039: 2019: 2014: 2013: 2006: 2005: 1992: 1991: 1953: 1949: 1924:Recursive cases: 1916: 1897: 1887: 1883: 1873: 1854: 1842: 1834: 1824: 1822: 1821: 1816: 1808: 1807: 1783: 1743: 1741: 1740: 1735: 1693: 1686: 1654: 1652: 1651: 1646: 1644: 1643: 1636: 1635: 1626: 1625: 1613: 1612: 1603: 1602: 1591: 1590: 1581: 1580: 1568: 1567: 1558: 1557: 1544: 1543: 1534: 1533: 1521: 1520: 1511: 1510: 1499: 1498: 1489: 1488: 1476: 1475: 1466: 1465: 1445: 1444: 1437: 1436: 1425: 1424: 1411: 1410: 1399: 1398: 1381: 1380: 1373: 1372: 1361: 1360: 1347: 1346: 1335: 1334: 1314: 1313: 1306: 1305: 1294: 1293: 1280: 1279: 1268: 1267: 1240: 1236: 1229: 1227: 1226: 1221: 1216: 1215: 1208: 1207: 1196: 1195: 1182: 1181: 1170: 1169: 1142: 1141: 1134: 1133: 1122: 1121: 1108: 1107: 1096: 1095: 1068: 1067: 1060: 1059: 1048: 1047: 1034: 1033: 1022: 1021: 973: 972: 971: 959: 957: 955: 954: 953: 952: 941: 938: 922: 906: 885: 865: 849: 845: 839: 832: 816: 812: 805: 789: 785: 778: 747: 716: 682: 678: 674: 667: 663: 659: 652: 648: 644: 638: 636: 635: 619: 613: 609: 606:Input: matrices 599: 598: 597: 588: 587: 586: 571: 567: 559: 551: 547: 544:and a column of 543: 539: 538: 536: 535: 530: 527: 513: 509: 505: 503: 502: 497: 494: 485: 481: 451:Illustration of 438: 430: 412: 396: 386: 372: 352: 348: 342: 335: 331: 324: 320: 313: 307: 303: 300:Input: matrices 293: 289: 285: 281: 274: 272: 271: 266: 261: 260: 248: 247: 234: 229: 211: 210: 188: 178: 174: 170: 160: 156: 146: 120: 108: 84: 76: 63: 58:on the order of 6929: 6928: 6924: 6923: 6922: 6920: 6919: 6918: 6894: 6893: 6892: 6887: 6846: 6842:Multiprocessing 6810: 6806:Sparse problems 6769: 6748: 6743: 6713: 6694:10.1145/2764454 6656:10.1.1.140.3583 6593: 6591:Further reading 6588: 6555: 6551: 6534: 6530: 6497: 6493: 6464: 6460: 6450: 6448: 6427: 6421: 6417: 6410: 6387: 6381: 6377: 6348:IBM J. Res. Dev 6344: 6337: 6304: 6295: 6287: 6260: 6254: 6250: 6237: 6233: 6217: 6209: 6202: 6192: 6190: 6188:Quanta Magazine 6180: 6176: 6159: 6155: 6108:(7930): 47–53. 6093: 6086: 6077: 6075: 6066: 6065: 6061: 6049: 6035: 6029: 6025: 6013: 6011: 6007: 5993:10.1137/0218045 5984:10.1.1.531.9309 5966: 5960: 5956: 5931: 5925: 5921: 5904: 5900: 5893:10.1137/0205016 5877: 5873: 5850: 5846: 5823: 5819: 5790: 5786: 5755: 5751: 5744: 5716: 5712: 5706:10.1137/0204009 5697:10.1.1.148.9947 5680: 5676: 5660: 5649: 5640: 5633: 5611: 5607: 5597: 5595: 5584: 5571: 5564: 5548:Stein, Clifford 5533: 5524: 5515: 5513: 5511:Quanta Magazine 5503: 5499: 5477: 5470: 5450: 5446: 5426: 5419: 5412: 5379: 5368: 5364: 5327: 5284: 5268: 5262: 5247: 5227: 5218: 5216: 5207: 5199: 5197: 5196: 5188: 5186: 5185: 5181: 5170: 5161: 5159: 5150: 5144: 5135: 5133: 5132: 5123: 5121: 5120: 5098: 5090: 5080: 5076: 5072: 5057: 5053: 5041: 5029: 5018: 5007: 5006: 4990: 4983: 4976: 4967: 4960: 4953: 4944: 4937: 4930: 4921: 4914: 4907: 4896: 4890: 4887: 4881: 4878: 4872: 4869: 4863: 4859: 4853: 4847: 4844: 4838: 4835: 4829: 4826: 4820: 4816: 4807: 4800: 4793: 4787: 4780: 4772: 4768: 4756: 4744: 4729: 4715: 4708: 4701: 4694: 4685: 4678: 4671: 4664: 4655: 4648: 4641: 4634: 4625: 4618: 4611: 4604: 4595: 4588: 4581: 4574: 4565: 4558: 4551: 4544: 4535: 4528: 4521: 4514: 4505: 4498: 4491: 4484: 4473: 4467: 4464: 4458: 4455: 4449: 4446: 4440: 4436: 4430: 4424: 4421: 4415: 4412: 4406: 4403: 4397: 4393: 4387: 4381: 4378: 4372: 4369: 4363: 4360: 4354: 4350: 4344: 4338: 4335: 4329: 4326: 4320: 4317: 4311: 4307: 4294: 4290: 4284: 4277: 4270: 4264: 4257: 4237: 4224:fork–join style 4201: 4200: 4194: 4190: 4184: 4180: 4171: 4167: 4161: 4157: 4155: 4149: 4145: 4139: 4135: 4126: 4122: 4116: 4112: 4109: 4108: 4102: 4098: 4092: 4088: 4079: 4075: 4069: 4065: 4063: 4057: 4053: 4047: 4043: 4034: 4030: 4024: 4020: 4013: 4012: 4010: 4007: 4006: 3988: 3983: 3975:remains unknown 3957: 3949: 3944: 3942: 3939: 3938: 3924: 3909: 3901: 3897: 3893: 3889: 3869: 3859: 3856:Shmuel Winograd 3852:Don Coppersmith 3825: 3822: 3821: 3805: 3802: 3801: 3769: 3765: 3763: 3760: 3759: 3737: 3734: 3733: 3717: 3714: 3713: 3650: 3649: 3647: 3643: 3634: 3630: 3621: 3617: 3605: 3601: 3575: 3571: 3570: 3548: 3547: 3545: 3541: 3540: 3535: 3532: 3531: 3507: 3503: 3497: 3493: 3481: 3477: 3457: 3453: 3452: 3448: 3443: 3440: 3439: 3382: 3381: 3379: 3375: 3366: 3362: 3353: 3349: 3337: 3333: 3307: 3303: 3302: 3280: 3279: 3277: 3273: 3272: 3267: 3264: 3263: 3239: 3235: 3229: 3225: 3213: 3209: 3189: 3185: 3184: 3180: 3175: 3172: 3171: 3130: 3126: 3100: 3096: 3095: 3073: 3072: 3070: 3066: 3065: 3060: 3057: 3056: 3038: 3034: 3014: 3010: 3009: 3005: 3000: 2997: 2996: 2955: 2951: 2925: 2921: 2920: 2898: 2897: 2895: 2891: 2890: 2885: 2882: 2881: 2863: 2859: 2839: 2835: 2834: 2830: 2825: 2822: 2821: 2765: 2762: 2761: 2745: 2742: 2741: 2719: 2695: 2691: 2665: 2661: 2660: 2656: 2648: 2645: 2644: 2640: 2637:Volker Strassen 2608: 2604: 2596: 2593: 2592: 2588: 2581: 2575: 2544: 2540: 2529: 2527: 2493: 2491: 2472: 2468: 2463: 2460: 2459: 2452: 2448: 2437:cache-oblivious 2429: 2424: 2423: 2403: 2399: 2393: 2389: 2380: 2376: 2370: 2366: 2356: 2355: 2349: 2345: 2342: 2341: 2335: 2331: 2324: 2323: 2316: 2315: 2309: 2305: 2303: 2297: 2293: 2286: 2285: 2277: 2274: 2273: 2261: 2260:vertically and 2257: 2238: 2212: 2211: 2205: 2201: 2196: 2190: 2186: 2176: 2175: 2165: 2164: 2158: 2154: 2152: 2146: 2142: 2135: 2134: 2123: 2120: 2119: 2107: 2088: 2062: 2061: 2052: 2048: 2045: 2044: 2035: 2031: 2024: 2023: 2015: 2008: 2007: 2001: 1997: 1994: 1993: 1987: 1983: 1976: 1975: 1967: 1964: 1963: 1951: 1932: 1902: 1889: 1885: 1875: 1871: 1861: 1848: 1836: 1829: 1803: 1799: 1779: 1750: 1747: 1746: 1702: 1699: 1698: 1691: 1685: 1679: 1672: 1666: 1638: 1637: 1631: 1627: 1621: 1617: 1608: 1604: 1598: 1594: 1592: 1586: 1582: 1576: 1572: 1563: 1559: 1553: 1549: 1546: 1545: 1539: 1535: 1529: 1525: 1516: 1512: 1506: 1502: 1500: 1494: 1490: 1484: 1480: 1471: 1467: 1461: 1457: 1450: 1449: 1439: 1438: 1432: 1428: 1426: 1420: 1416: 1413: 1412: 1406: 1402: 1400: 1394: 1390: 1383: 1382: 1375: 1374: 1368: 1364: 1362: 1356: 1352: 1349: 1348: 1342: 1338: 1336: 1330: 1326: 1319: 1318: 1308: 1307: 1301: 1297: 1295: 1289: 1285: 1282: 1281: 1275: 1271: 1269: 1263: 1259: 1252: 1251: 1249: 1246: 1245: 1238: 1234: 1210: 1209: 1203: 1199: 1197: 1191: 1187: 1184: 1183: 1177: 1173: 1171: 1165: 1161: 1154: 1153: 1136: 1135: 1129: 1125: 1123: 1117: 1113: 1110: 1109: 1103: 1099: 1097: 1091: 1087: 1080: 1079: 1062: 1061: 1055: 1051: 1049: 1043: 1039: 1036: 1035: 1029: 1025: 1023: 1017: 1013: 1006: 1005: 997: 994: 993: 980: 967: 965: 961: 948: 946: 942: 939: 934: 933: 931: 929: 926: 925: 920: 903: 896: 891: 883: 876: 870: 851: 847: 843: 837: 818: 814: 810: 791: 787: 783: 777: 749: 746: 718: 715: 687: 680: 676: 672: 665: 661: 657: 650: 646: 642: 631: 629: 624: 617: 611: 607: 593: 591: 590: 582: 580: 579: 569: 565: 553: 549: 545: 541: 531: 528: 523: 522: 520: 515: 511: 507: 498: 495: 490: 489: 487: 483: 479: 445: 432: 422: 406: 400: 399: 394: 383: 378: 370: 363: 357: 350: 346: 340: 333: 329: 322: 318: 311: 305: 301: 291: 290:from 1 through 287: 283: 282:from 1 through 279: 253: 249: 240: 236: 230: 219: 203: 199: 197: 194: 193: 180: 176: 172: 162: 158: 148: 138: 131: 114: 109:time, given by 102: 78: 68: 59: 17: 12: 11: 5: 6927: 6917: 6916: 6911: 6906: 6889: 6888: 6886: 6885: 6880: 6875: 6870: 6865: 6860: 6854: 6852: 6848: 6847: 6845: 6844: 6839: 6834: 6829: 6824: 6818: 6816: 6812: 6811: 6809: 6808: 6803: 6798: 6788: 6783: 6777: 6775: 6771: 6770: 6768: 6767: 6762: 6760:Floating point 6756: 6754: 6750: 6749: 6742: 6741: 6734: 6727: 6719: 6712: 6711: 6706: 6677: 6638: 6600:Dongarra, Jack 6594: 6592: 6589: 6587: 6586: 6572:10.1.1.90.4753 6565:(3): 347–365. 6549: 6528: 6514:10.1.1.88.8527 6491: 6458: 6415: 6408: 6375: 6361:10.1.1.44.3404 6354:(5): 575–582. 6335: 6321:10.1.1.20.7034 6314:(9): 1017–26. 6293: 6248: 6231: 6200: 6174: 6153: 6084: 6059: 6023: 5977:(4): 658–669, 5954: 5919: 5898: 5887:(2): 187–203. 5881:SIAM J. Comput 5871: 5844: 5817: 5804:(4): 381–388. 5784: 5765:(4): 354–356. 5749: 5742: 5710: 5674: 5652:Prokop, Harald 5638: 5631: 5605: 5569: 5562: 5522: 5497: 5468: 5444: 5417: 5410: 5382:Skiena, Steven 5365: 5363: 5360: 5359: 5358: 5353: 5348: 5343: 5338: 5333: 5326: 5323: 5283: 5280: 5177:computation). 5089: 5086: 5005: 5004: 5003: 5002: 4996: 4995: 4994: 4988: 4981: 4971: 4965: 4958: 4948: 4942: 4935: 4925: 4919: 4912: 4899: 4894: 4885: 4876: 4867: 4856: 4851: 4842: 4833: 4824: 4810: 4805: 4798: 4791: 4779:Base case: if 4751: 4750: 4749: 4748: 4741: 4727: 4721: 4720: 4719: 4713: 4706: 4699: 4689: 4683: 4676: 4669: 4659: 4653: 4646: 4639: 4629: 4623: 4616: 4609: 4599: 4593: 4586: 4579: 4569: 4563: 4556: 4549: 4539: 4533: 4526: 4519: 4509: 4503: 4496: 4489: 4476: 4471: 4462: 4453: 4444: 4433: 4428: 4419: 4410: 4401: 4390: 4385: 4376: 4367: 4358: 4347: 4342: 4333: 4324: 4315: 4287: 4282: 4275: 4268: 4256:Base case: if 4232: 4231: 4219: 4218: 4205: 4197: 4193: 4187: 4183: 4179: 4174: 4170: 4164: 4160: 4156: 4152: 4148: 4142: 4138: 4134: 4129: 4125: 4119: 4115: 4111: 4110: 4105: 4101: 4095: 4091: 4087: 4082: 4078: 4072: 4068: 4064: 4060: 4056: 4050: 4046: 4042: 4037: 4033: 4027: 4023: 4019: 4018: 4016: 3987: 3984: 3982: 3979: 3960: 3956: 3952: 3947: 3932:neural network 3923: 3920: 3900:, verifies in 3876:Big O notation 3835: 3832: 3829: 3809: 3787: 3784: 3781: 3778: 3775: 3772: 3768: 3747: 3744: 3741: 3721: 3692: 3691: 3680: 3677: 3674: 3670: 3664: 3659: 3654: 3646: 3642: 3637: 3633: 3629: 3624: 3620: 3616: 3613: 3608: 3604: 3600: 3597: 3594: 3591: 3586: 3583: 3578: 3574: 3568: 3562: 3557: 3552: 3544: 3539: 3529: 3518: 3515: 3510: 3506: 3500: 3496: 3492: 3489: 3484: 3480: 3476: 3473: 3468: 3465: 3460: 3456: 3451: 3447: 3437: 3434: 3431: 3428: 3424: 3423: 3412: 3409: 3406: 3402: 3396: 3391: 3386: 3378: 3374: 3369: 3365: 3361: 3356: 3352: 3348: 3345: 3340: 3336: 3332: 3329: 3326: 3323: 3318: 3315: 3310: 3306: 3300: 3294: 3289: 3284: 3276: 3271: 3261: 3250: 3247: 3242: 3238: 3232: 3228: 3224: 3221: 3216: 3212: 3208: 3205: 3200: 3197: 3192: 3188: 3183: 3179: 3169: 3166: 3163: 3160: 3156: 3155: 3144: 3141: 3138: 3133: 3129: 3125: 3122: 3119: 3116: 3111: 3108: 3103: 3099: 3093: 3087: 3082: 3077: 3069: 3064: 3054: 3041: 3037: 3033: 3030: 3025: 3022: 3017: 3013: 3008: 3004: 2994: 2991: 2988: 2985: 2981: 2980: 2969: 2966: 2963: 2958: 2954: 2950: 2947: 2944: 2941: 2936: 2933: 2928: 2924: 2918: 2912: 2907: 2902: 2894: 2889: 2879: 2866: 2862: 2858: 2855: 2850: 2847: 2842: 2838: 2833: 2829: 2819: 2816: 2813: 2810: 2806: 2805: 2802: 2799: 2796: 2793: 2790: 2769: 2749: 2738:big O notation 2703: 2698: 2694: 2690: 2687: 2684: 2681: 2676: 2673: 2668: 2664: 2659: 2655: 2652: 2616: 2611: 2607: 2603: 2600: 2574: 2571: 2570: 2569: 2557: 2548: 2543: 2538: 2535: 2532: 2526: 2521: 2517: 2514: 2511: 2508: 2505: 2502: 2499: 2496: 2490: 2487: 2484: 2481: 2478: 2475: 2471: 2467: 2428: 2427:Cache behavior 2425: 2422: 2421: 2420: 2419: 2406: 2402: 2396: 2392: 2388: 2383: 2379: 2373: 2369: 2365: 2360: 2352: 2348: 2344: 2343: 2338: 2334: 2330: 2329: 2327: 2320: 2312: 2308: 2304: 2300: 2296: 2292: 2291: 2289: 2284: 2281: 2268: 2267: 2266: 2265: 2232: 2231: 2230: 2229: 2216: 2208: 2204: 2200: 2197: 2193: 2189: 2185: 2182: 2181: 2179: 2174: 2169: 2161: 2157: 2153: 2149: 2145: 2141: 2140: 2138: 2133: 2130: 2127: 2114: 2113: 2112: 2111: 2082: 2081: 2080: 2079: 2066: 2060: 2055: 2051: 2047: 2046: 2043: 2038: 2034: 2030: 2029: 2027: 2022: 2018: 2012: 2004: 2000: 1996: 1995: 1990: 1986: 1982: 1981: 1979: 1974: 1971: 1958: 1957: 1956: 1955: 1926: 1925: 1922: 1901:Base case: if 1899: 1867: 1866: 1860: 1857: 1826: 1825: 1814: 1811: 1806: 1802: 1798: 1795: 1792: 1789: 1786: 1782: 1778: 1775: 1772: 1769: 1766: 1763: 1760: 1757: 1754: 1744: 1733: 1730: 1727: 1724: 1721: 1718: 1715: 1712: 1709: 1706: 1683: 1677: 1670: 1656: 1655: 1642: 1634: 1630: 1624: 1620: 1616: 1611: 1607: 1601: 1597: 1593: 1589: 1585: 1579: 1575: 1571: 1566: 1562: 1556: 1552: 1548: 1547: 1542: 1538: 1532: 1528: 1524: 1519: 1515: 1509: 1505: 1501: 1497: 1493: 1487: 1483: 1479: 1474: 1470: 1464: 1460: 1456: 1455: 1453: 1448: 1443: 1435: 1431: 1427: 1423: 1419: 1415: 1414: 1409: 1405: 1401: 1397: 1393: 1389: 1388: 1386: 1379: 1371: 1367: 1363: 1359: 1355: 1351: 1350: 1345: 1341: 1337: 1333: 1329: 1325: 1324: 1322: 1317: 1312: 1304: 1300: 1296: 1292: 1288: 1284: 1283: 1278: 1274: 1270: 1266: 1262: 1258: 1257: 1255: 1231: 1230: 1219: 1214: 1206: 1202: 1198: 1194: 1190: 1186: 1185: 1180: 1176: 1172: 1168: 1164: 1160: 1159: 1157: 1152: 1149: 1145: 1140: 1132: 1128: 1124: 1120: 1116: 1112: 1111: 1106: 1102: 1098: 1094: 1090: 1086: 1085: 1083: 1078: 1075: 1071: 1066: 1058: 1054: 1050: 1046: 1042: 1038: 1037: 1032: 1028: 1024: 1020: 1016: 1012: 1011: 1009: 1004: 1001: 979: 976: 924: 923: 917: 916: 915: 914: 913: 912: 911: 910: 909: 908: 907: 901: 894: 888: 887: 886: 881: 874: 840: 780: 753: 722: 691: 639: 621: 614: 603: 602: 478:consisting of 444: 443:Cache behavior 441: 398: 397: 391: 390: 389: 388: 387: 381: 375: 374: 373: 368: 361: 343: 315: 308: 297: 296: 276: 275: 264: 259: 256: 252: 246: 243: 239: 233: 228: 225: 222: 218: 214: 209: 206: 202: 130: 127: 87:big O notation 15: 9: 6: 4: 3: 2: 6926: 6915: 6912: 6910: 6909:Matrix theory 6907: 6905: 6902: 6901: 6899: 6884: 6881: 6879: 6876: 6874: 6871: 6869: 6866: 6864: 6861: 6859: 6856: 6855: 6853: 6849: 6843: 6840: 6838: 6835: 6833: 6830: 6828: 6825: 6823: 6820: 6819: 6817: 6813: 6807: 6804: 6802: 6799: 6796: 6792: 6789: 6787: 6784: 6782: 6779: 6778: 6776: 6772: 6766: 6763: 6761: 6758: 6757: 6755: 6751: 6747: 6740: 6735: 6733: 6728: 6726: 6721: 6720: 6717: 6710: 6707: 6703: 6699: 6695: 6691: 6687: 6683: 6678: 6674: 6670: 6666: 6662: 6657: 6652: 6648: 6644: 6639: 6635: 6631: 6627: 6623: 6618: 6613: 6609: 6605: 6601: 6596: 6595: 6582: 6578: 6573: 6568: 6564: 6560: 6553: 6544: 6539: 6532: 6524: 6520: 6515: 6510: 6506: 6502: 6495: 6486: 6481: 6477: 6473: 6469: 6462: 6447: 6443: 6438: 6433: 6426: 6419: 6411: 6405: 6401: 6397: 6393: 6386: 6379: 6371: 6367: 6362: 6357: 6353: 6349: 6342: 6340: 6331: 6327: 6322: 6317: 6313: 6309: 6302: 6300: 6298: 6286: 6282: 6278: 6274: 6270: 6266: 6259: 6252: 6244: 6243: 6235: 6227: 6223: 6216: 6215: 6207: 6205: 6189: 6185: 6178: 6169: 6164: 6157: 6149: 6145: 6140: 6135: 6131: 6127: 6123: 6119: 6115: 6111: 6107: 6103: 6099: 6091: 6089: 6073: 6069: 6063: 6056: 6052: 6045: 6041: 6034: 6027: 6020: 6010:on 2014-03-05 6006: 6002: 5998: 5994: 5990: 5985: 5980: 5976: 5972: 5965: 5958: 5950: 5945: 5941: 5937: 5930: 5923: 5914: 5909: 5902: 5894: 5890: 5886: 5882: 5875: 5867: 5863: 5859: 5855: 5848: 5840: 5836: 5832: 5828: 5821: 5812: 5807: 5803: 5799: 5795: 5788: 5780: 5776: 5772: 5768: 5764: 5760: 5753: 5745: 5739: 5735: 5731: 5727: 5726: 5721: 5714: 5707: 5703: 5698: 5693: 5690:(2): 97–107, 5689: 5685: 5678: 5670: 5666: 5659: 5658: 5653: 5647: 5645: 5643: 5634: 5628: 5624: 5620: 5616: 5609: 5593: 5589: 5582: 5580: 5578: 5576: 5574: 5565: 5563:0-262-03384-4 5559: 5555: 5554: 5549: 5545: 5541: 5537: 5531: 5529: 5527: 5512: 5508: 5501: 5493: 5488: 5484: 5483: 5475: 5473: 5464: 5459: 5455: 5448: 5440: 5435: 5431: 5424: 5422: 5413: 5407: 5403: 5399: 5395: 5390: 5389: 5383: 5377: 5375: 5373: 5371: 5366: 5357: 5354: 5352: 5349: 5347: 5344: 5342: 5339: 5337: 5334: 5332: 5329: 5328: 5322: 5320: 5315: 5313: 5309: 5305: 5301: 5297: 5288: 5279: 5277: 5271: 5265: 5258: 5254: 5250: 5244: 5243:3D algorithm, 5238: 5234: 5230: 5221: 5214: 5210: 5202: 5191: 5178: 5174: 5164: 5157: 5153: 5147: 5138: 5126: 5118: 5114: 5110: 5106: 5102: 5096: 5069: 5065: 5061: 5049: 5045: 5039: 5033: 5027: 5022: 5016: 5012: 5000: 4997: 4987: 4980: 4975: 4972: 4964: 4957: 4952: 4949: 4941: 4934: 4929: 4926: 4918: 4911: 4906: 4903: 4902: 4901:In parallel: 4900: 4893: 4884: 4875: 4866: 4857: 4850: 4841: 4832: 4823: 4814: 4813: 4811: 4804: 4797: 4790: 4783: 4778: 4777: 4776: 4764: 4760: 4755: 4742: 4737: 4733: 4728: 4725: 4722: 4712: 4705: 4698: 4693: 4690: 4682: 4675: 4668: 4663: 4660: 4652: 4645: 4638: 4633: 4630: 4622: 4615: 4608: 4603: 4600: 4592: 4585: 4578: 4573: 4570: 4562: 4555: 4548: 4543: 4540: 4532: 4525: 4518: 4513: 4510: 4502: 4495: 4488: 4483: 4480: 4479: 4477: 4470: 4461: 4452: 4443: 4434: 4427: 4418: 4409: 4400: 4391: 4384: 4375: 4366: 4357: 4348: 4341: 4332: 4323: 4314: 4305: 4304: 4301: 4297: 4288: 4281: 4274: 4267: 4260: 4255: 4254: 4253: 4249: 4245: 4241: 4236: 4230: 4228: 4225: 4203: 4195: 4191: 4185: 4181: 4177: 4172: 4168: 4162: 4158: 4150: 4146: 4140: 4136: 4132: 4127: 4123: 4117: 4113: 4103: 4099: 4093: 4089: 4085: 4080: 4076: 4070: 4066: 4058: 4054: 4048: 4044: 4040: 4035: 4031: 4025: 4021: 4014: 4005: 4004: 4003: 4001: 3997: 3993: 3978: 3976: 3972: 3954: 3950: 3937:finite field 3933: 3929: 3919: 3916: 3912: 3905: 3887: 3883: 3879: 3877: 3872: 3866: 3862: 3857: 3853: 3849: 3833: 3830: 3827: 3807: 3782: 3776: 3773: 3770: 3766: 3745: 3742: 3739: 3719: 3711: 3707: 3703: 3698: 3678: 3675: 3672: 3668: 3662: 3657: 3652: 3644: 3640: 3635: 3631: 3627: 3622: 3618: 3614: 3611: 3606: 3602: 3598: 3595: 3592: 3589: 3584: 3581: 3576: 3572: 3566: 3560: 3555: 3550: 3542: 3537: 3530: 3516: 3513: 3508: 3504: 3498: 3494: 3490: 3487: 3482: 3478: 3474: 3471: 3466: 3463: 3458: 3454: 3449: 3445: 3438: 3435: 3432: 3429: 3426: 3425: 3410: 3407: 3404: 3400: 3394: 3389: 3384: 3376: 3372: 3367: 3363: 3359: 3354: 3350: 3346: 3343: 3338: 3334: 3330: 3327: 3324: 3321: 3316: 3313: 3308: 3304: 3298: 3292: 3287: 3282: 3274: 3269: 3262: 3248: 3245: 3240: 3236: 3230: 3226: 3222: 3219: 3214: 3210: 3206: 3203: 3198: 3195: 3190: 3186: 3181: 3177: 3170: 3167: 3164: 3161: 3158: 3157: 3142: 3139: 3136: 3131: 3127: 3123: 3120: 3117: 3114: 3109: 3106: 3101: 3097: 3091: 3085: 3080: 3075: 3067: 3062: 3055: 3039: 3035: 3031: 3028: 3023: 3020: 3015: 3011: 3006: 3002: 2995: 2992: 2989: 2986: 2983: 2982: 2967: 2964: 2961: 2956: 2952: 2948: 2945: 2942: 2939: 2934: 2931: 2926: 2922: 2916: 2910: 2905: 2900: 2892: 2887: 2880: 2864: 2860: 2856: 2853: 2848: 2845: 2840: 2836: 2831: 2827: 2820: 2817: 2814: 2811: 2808: 2807: 2803: 2800: 2797: 2794: 2791: 2788: 2787: 2781: 2767: 2747: 2739: 2734: 2732: 2731:finite fields 2728: 2722: 2717: 2696: 2692: 2685: 2682: 2674: 2671: 2666: 2662: 2657: 2650: 2638: 2635:, devised by 2634: 2609: 2605: 2598: 2585: 2580: 2555: 2546: 2541: 2536: 2533: 2530: 2524: 2519: 2515: 2512: 2509: 2506: 2503: 2500: 2497: 2494: 2488: 2485: 2482: 2479: 2476: 2473: 2469: 2458: 2457: 2456: 2445: 2442: 2438: 2434: 2404: 2400: 2394: 2390: 2386: 2381: 2377: 2371: 2367: 2363: 2358: 2350: 2346: 2336: 2332: 2325: 2318: 2310: 2306: 2298: 2294: 2287: 2282: 2279: 2272: 2271: 2270: 2269: 2264:horizontally: 2254: 2250: 2246: 2242: 2236: 2235: 2234: 2233: 2214: 2206: 2202: 2198: 2191: 2187: 2183: 2177: 2172: 2167: 2159: 2155: 2147: 2143: 2136: 2131: 2128: 2125: 2118: 2117: 2116: 2115: 2104: 2100: 2096: 2092: 2086: 2085: 2084: 2083: 2064: 2058: 2053: 2049: 2041: 2036: 2032: 2025: 2020: 2016: 2010: 2002: 1998: 1988: 1984: 1977: 1972: 1969: 1962: 1961: 1960: 1959: 1954:horizontally: 1948: 1944: 1940: 1936: 1930: 1929: 1928: 1927: 1923: 1920: 1914: 1910: 1906: 1900: 1896: 1892: 1882: 1878: 1869: 1868: 1865: 1856: 1852: 1846: 1840: 1832: 1812: 1804: 1800: 1790: 1784: 1780: 1776: 1770: 1767: 1764: 1758: 1752: 1745: 1731: 1725: 1716: 1710: 1704: 1697: 1696: 1695: 1688: 1682: 1676: 1669: 1665: 1661: 1640: 1632: 1628: 1622: 1618: 1614: 1609: 1605: 1599: 1595: 1587: 1583: 1577: 1573: 1569: 1564: 1560: 1554: 1550: 1540: 1536: 1530: 1526: 1522: 1517: 1513: 1507: 1503: 1495: 1491: 1485: 1481: 1477: 1472: 1468: 1462: 1458: 1451: 1446: 1441: 1433: 1429: 1421: 1417: 1407: 1403: 1395: 1391: 1384: 1377: 1369: 1365: 1357: 1353: 1343: 1339: 1331: 1327: 1320: 1315: 1310: 1302: 1298: 1290: 1286: 1276: 1272: 1264: 1260: 1253: 1244: 1243: 1242: 1217: 1212: 1204: 1200: 1192: 1188: 1178: 1174: 1166: 1162: 1155: 1150: 1147: 1143: 1138: 1130: 1126: 1118: 1114: 1104: 1100: 1092: 1088: 1081: 1076: 1073: 1069: 1064: 1056: 1052: 1044: 1040: 1030: 1026: 1018: 1014: 1007: 1002: 999: 992: 991: 990: 989: 985: 975: 970: 964: 951: 945: 937: 918: 904: 897: 889: 884: 877: 868: 867: 863: 859: 855: 841: 835: 834: 830: 826: 822: 808: 807: 803: 799: 795: 781: 776: 772: 768: 764: 760: 756: 752: 745: 741: 737: 733: 729: 725: 721: 714: 710: 706: 702: 698: 694: 690: 685: 684: 670: 669: 655: 654: 640: 634: 627: 622: 615: 605: 604: 601: 596: 585: 577: 576: 562: 557: 534: 526: 518: 501: 493: 477: 472: 470: 466: 462: 454: 449: 440: 436: 429: 425: 420: 416: 410: 405: 392: 384: 376: 371: 364: 355: 354: 344: 338: 337: 327: 326: 316: 309: 299: 298: 295: 262: 257: 254: 250: 244: 241: 237: 231: 226: 223: 220: 216: 212: 207: 204: 200: 192: 191: 190: 187: 183: 169: 165: 155: 151: 145: 141: 136: 126: 124: 118: 112: 106: 100: 96: 92: 88: 82: 75: 71: 66: 62: 57: 52: 50: 46: 42: 38: 34: 30: 26: 22: 6794: 6753:Key concepts 6685: 6681: 6646: 6642: 6607: 6603: 6562: 6558: 6552: 6531: 6507:(3): 383–5. 6504: 6500: 6494: 6475: 6471: 6461: 6449:. Retrieved 6418: 6391: 6378: 6351: 6347: 6311: 6307: 6264: 6251: 6241: 6234: 6226:1721.1/47519 6213: 6191:. Retrieved 6187: 6177: 6156: 6105: 6101: 6076:. Retrieved 6071: 6062: 6050: 6047: 6043: 6039: 6026: 6018: 6012:, retrieved 6005:the original 5974: 5970: 5957: 5939: 5935: 5922: 5901: 5884: 5880: 5874: 5857: 5847: 5830: 5820: 5801: 5797: 5787: 5762: 5758: 5752: 5724: 5713: 5687: 5683: 5677: 5669:1721.1/80568 5656: 5614: 5608: 5596:. Retrieved 5591: 5552: 5514:. Retrieved 5510: 5500: 5481: 5453: 5447: 5429: 5396:–46, 401–3. 5387: 5318: 5316: 5311: 5303: 5299: 5293: 5269: 5263: 5256: 5252: 5248: 5242: 5236: 5232: 5228: 5219: 5212: 5208: 5200: 5189: 5179: 5172: 5162: 5155: 5151: 5145: 5136: 5124: 5113:2D algorithm 5112: 5107: 5100: 5094: 5091: 5059: 5047: 5043: 5031: 5023: 5014: 5010: 5008: 4998: 4985: 4978: 4973: 4962: 4955: 4950: 4939: 4932: 4927: 4916: 4909: 4904: 4891: 4882: 4873: 4864: 4848: 4839: 4830: 4821: 4802: 4795: 4788: 4781: 4762: 4758: 4753: 4752: 4735: 4731: 4723: 4710: 4703: 4696: 4691: 4680: 4673: 4666: 4661: 4650: 4643: 4636: 4631: 4620: 4613: 4606: 4601: 4590: 4583: 4576: 4571: 4560: 4553: 4546: 4541: 4530: 4523: 4516: 4511: 4500: 4493: 4486: 4481: 4468: 4459: 4450: 4441: 4425: 4416: 4407: 4398: 4382: 4373: 4364: 4355: 4339: 4330: 4321: 4312: 4299: 4295: 4279: 4272: 4265: 4258: 4247: 4243: 4239: 4234: 4233: 4220: 3996:parallelized 3989: 3925: 3914: 3910: 3903: 3884:is a simple 3880: 3870: 3864: 3860: 3709: 3699: 3695: 2735: 2720: 2630: 2446: 2430: 2252: 2248: 2244: 2240: 2102: 2098: 2094: 2090: 1946: 1942: 1938: 1934: 1912: 1908: 1904: 1894: 1890: 1880: 1876: 1864:dimensions. 1862: 1850: 1838: 1830: 1827: 1689: 1680: 1674: 1667: 1662:, using the 1657: 1232: 981: 968: 962: 949: 943: 935: 927: 899: 892: 879: 872: 871:sum ← sum + 861: 857: 853: 828: 824: 820: 801: 797: 793: 774: 770: 766: 762: 758: 754: 750: 743: 739: 735: 731: 727: 723: 719: 712: 708: 704: 700: 696: 692: 688: 679:in steps of 664:in steps of 649:in steps of 632: 625: 594: 583: 573: 563: 555: 532: 524: 516: 499: 491: 473: 457: 434: 427: 423: 408: 401: 379: 366: 359: 358:sum ← sum + 277: 185: 181: 167: 163: 153: 149: 143: 139: 132: 116: 104: 80: 73: 69: 60: 53: 28: 18: 6688:(3): 1–33. 6649:(3): 1–25. 6478:: 2230–40. 6193:26 November 5759:Numer. Math 4812:Otherwise: 4743:Deallocate 3922:AlphaTensor 2237:Otherwise, 2110:vertically: 1660:recursively 137:is that if 49:distributed 6898:Categories 6795:algorithms 6168:2212.01175 6078:2022-11-01 6014:2015-01-16 5942:(3): 251, 5913:2008.03759 5732:. p.  5598:27 January 5516:2021-04-01 5492:2010.05846 5463:2210.10173 5439:2307.07970 5362:References 4858:Partition 4815:Partition 4435:Partition 4392:Partition 4349:Partition 4306:Partition 4227:pseudocode 779:, that is: 675:from 1 to 660:from 1 to 645:from 1 to 482:bytes and 349:from 1 to 332:from 1 to 321:from 1 to 56:takes time 6822:CPU cache 6651:CiteSeerX 6617:0709.1272 6610:: 38–53. 6567:CiteSeerX 6543:1411.3273 6509:CiteSeerX 6437:1304.1467 6356:CiteSeerX 6316:CiteSeerX 6130:1476-4687 6040:SIAM News 5979:CiteSeerX 5779:121656251 5692:CiteSeerX 5684:SIAM News 5550:(2009) . 5276:MapReduce 5019:partition 4754:Procedure 4695:multiply( 4665:multiply( 4635:multiply( 4605:multiply( 4575:multiply( 4545:multiply( 4515:multiply( 4485:multiply( 4293:of shape 4238:multiply( 4235:Procedure 3926:In 2022, 3828:ω 3808:ω 3771:ω 3743:× 3720:ω 3641:⁡ 3628:⋅ 3596:− 3590:⋅ 3582:⁡ 3514:⁡ 3472:− 3464:⁡ 3373:⁡ 3360:⋅ 3328:− 3322:⋅ 3314:⁡ 3246:⁡ 3204:− 3196:⁡ 3121:− 3115:⋅ 3107:⁡ 3029:− 3021:⁡ 2946:− 2940:⋅ 2932:⁡ 2854:− 2846:⁡ 2792:Reference 2683:≈ 2672:⁡ 2610:ω 2466:Θ 2087:Else, if 1794:Θ 1720:Θ 1237:for some 686:Multiply 217:∑ 6851:Software 6815:Hardware 6774:Problems 6285:Archived 6148:36198780 5654:(1999). 5325:See also 5143:, where 4303:, then: 3928:DeepMind 3908:time if 3848:Williams 3834:2.371552 2987:Winograd 2812:Strassen 2723:> 100 2256:. Split 2106:, split 1950:, split 1919:unrolled 1888:of size 1874:of size 111:Williams 45:parallel 19:Because 6702:1242360 6673:9359223 6451:12 July 6442:Bibcode 6281:8410593 6139:9534758 6110:Bibcode 6001:1004789 5217:√ 5198:√ 5187:√ 5160:√ 5134:√ 5122:√ 5115:, is a 5038:speedup 966:√ 956:⁠ 947:√ 932:⁠ 919:Return 838:sum = 0 630:√ 592:√ 581:√ 537:⁠ 521:⁠ 504:⁠ 488:⁠ 393:Return 341:sum = 0 175:, then 171:matrix 161:and an 157:matrix 147:for an 6873:LAPACK 6863:MATLAB 6700:  6671:  6653:  6632:  6569:  6511:  6406:  6358:  6318:  6279:  6146:  6136:  6128:  6102:Nature 5999:  5981:  5777:  5740:  5694:  5629:  5560:  5408:  5296:meshes 5030:Θ(log 5009:Here, 4786:, set 4263:, set 3708:. The 179:is an 6858:ATLAS 6698:S2CID 6669:S2CID 6630:S2CID 6612:arXiv 6538:arXiv 6432:arXiv 6428:(PDF) 6388:(PDF) 6288:(PDF) 6277:S2CID 6261:(PDF) 6218:(PDF) 6163:arXiv 6046:(9), 6036:(PDF) 6008:(PDF) 5967:(PDF) 5932:(PDF) 5908:arXiv 5775:S2CID 5661:(PDF) 5487:arXiv 5458:arXiv 5434:arXiv 5046:/log 4862:into 4819:into 4771:into 4767:adds 4439:into 4396:into 4353:into 4310:into 3846:, by 2697:2.807 2641:2 × 2 2433:tiled 1235:2 × 2 905:+ sum 846:from 813:from 786:from 748:into 575:tiled 519:> 465:cache 385:← sum 65:field 41:graph 6837:SIMD 6453:2014 6404:ISBN 6195:2022 6144:PMID 6126:ISSN 5738:ISBN 5627:ISBN 5600:2015 5558:ISBN 5406:ISBN 5079:and 5015:join 5011:fork 4999:Join 4977:add( 4974:Fork 4954:add( 4951:Fork 4931:add( 4928:Fork 4908:add( 4905:Fork 4757:add( 4730:add( 4724:Join 4692:Fork 4662:Fork 4632:Fork 4602:Fork 4572:Fork 4542:Fork 4512:Fork 4482:Fork 3990:The 3896:and 3854:and 3831:< 3427:2023 3159:2017 2984:1971 2809:1969 2789:Year 2727:BLAS 2251:) = 2239:max( 2101:) = 2089:max( 1945:) = 1933:max( 1903:max( 1835:and 890:Set 869:Set 852:min( 842:For 836:Let 819:min( 809:For 792:min( 782:For 717:and 671:For 656:For 641:For 628:= Θ( 616:Let 610:and 568:and 510:and 463:and 413:(in 404:time 377:Set 356:Set 345:For 339:Let 328:For 317:For 310:Let 304:and 286:and 133:The 47:and 35:and 6827:TLB 6690:doi 6661:doi 6634:955 6622:doi 6577:doi 6519:doi 6480:doi 6396:doi 6366:doi 6326:doi 6269:doi 6222:hdl 6134:PMC 6118:doi 6106:610 6053:= 2 5989:doi 5944:doi 5889:doi 5862:doi 5835:doi 5806:doi 5767:doi 5734:108 5702:doi 5665:hdl 5619:doi 5398:doi 5195:by 5131:by 5040:of 5028:of 4784:= 1 4261:= 1 3820:is 3632:log 3615:1.5 3573:log 3505:log 3491:1.5 3455:log 3364:log 3305:log 3237:log 3187:log 3098:log 3012:log 2923:log 2837:log 2663:log 1931:If 850:to 817:to 790:to 589:by 409:nmp 85:in 6900:: 6696:. 6686:41 6684:. 6667:. 6659:. 6647:34 6645:. 6628:. 6620:. 6608:35 6606:. 6575:. 6563:45 6561:. 6517:. 6503:. 6476:29 6474:. 6470:. 6440:. 6430:. 6402:. 6390:. 6364:. 6352:39 6350:. 6338:^ 6324:. 6312:64 6310:. 6296:^ 6283:. 6275:. 6263:. 6203:^ 6186:. 6142:. 6132:. 6124:. 6116:. 6104:. 6100:. 6087:^ 6070:. 6044:38 6042:, 6038:, 6017:, 5997:MR 5995:, 5987:, 5975:18 5973:, 5969:, 5938:, 5934:, 5883:. 5856:. 5829:. 5800:. 5796:. 5773:. 5763:13 5761:. 5736:. 5700:, 5686:, 5641:^ 5625:. 5590:. 5572:^ 5546:; 5542:; 5538:; 5525:^ 5509:. 5485:, 5471:^ 5456:, 5432:, 5420:^ 5404:. 5394:45 5369:^ 5171:Ω( 5139:/3 5127:/3 5099:Ω( 5058:Θ( 5042:Θ( 4989:22 4984:, 4982:22 4966:21 4961:, 4959:21 4943:12 4938:, 4936:12 4920:11 4915:, 4913:11 4895:22 4889:, 4886:21 4880:, 4877:12 4871:, 4868:11 4852:22 4846:, 4843:21 4837:, 4834:12 4828:, 4825:11 4806:11 4801:+ 4799:11 4794:← 4792:11 4761:, 4734:, 4714:22 4709:, 4707:22 4702:, 4700:22 4684:21 4679:, 4677:22 4672:, 4670:21 4654:22 4649:, 4647:12 4642:, 4640:12 4624:21 4619:, 4617:12 4612:, 4610:11 4594:12 4589:, 4587:21 4582:, 4580:22 4564:11 4559:, 4557:21 4552:, 4550:21 4534:12 4529:, 4527:11 4522:, 4520:12 4504:11 4499:, 4497:11 4492:, 4490:11 4472:22 4466:, 4463:21 4457:, 4454:12 4448:, 4445:11 4429:22 4423:, 4420:21 4414:, 4411:12 4405:, 4402:11 4386:22 4380:, 4377:21 4371:, 4368:12 4362:, 4359:11 4343:22 4337:, 4334:21 4328:, 4325:12 4319:, 4316:11 4298:× 4283:11 4278:× 4276:11 4271:← 4269:11 4252:: 4246:, 4242:, 4229:: 4196:22 4186:22 4173:12 4163:21 4151:21 4141:22 4128:11 4118:21 4104:22 4094:12 4081:12 4071:11 4059:21 4049:12 4036:11 4026:11 3918:. 3913:= 3911:AB 3902:Θ( 3892:, 3863:× 3599:12 3436:12 3331:12 3168:12 3124:15 2993:15 2949:18 2818:18 2247:, 2243:, 2097:, 2093:, 1941:, 1937:, 1911:, 1907:, 1893:× 1884:, 1879:× 1849:Θ( 1837:Θ( 1833:/2 1684:11 1678:11 1673:= 1671:11 1633:22 1623:22 1610:12 1600:21 1588:21 1578:22 1565:11 1555:21 1541:22 1531:12 1518:12 1508:11 1496:21 1486:12 1473:11 1463:11 1434:22 1422:21 1408:12 1396:11 1370:22 1358:21 1344:12 1332:11 1303:22 1291:21 1277:12 1265:11 1205:22 1193:21 1179:12 1167:11 1131:22 1119:21 1105:12 1093:11 1057:22 1045:21 1031:12 1019:11 930:Θ( 902:ij 898:← 895:ij 882:kj 878:× 875:ik 866:: 860:, 856:+ 833:: 827:, 823:+ 806:: 800:, 796:+ 765:, 734:, 703:, 683:: 668:: 653:: 600:: 554:Θ( 433:Θ( 426:× 407:Θ( 382:ij 369:kj 365:× 362:ik 353:: 336:: 325:: 184:× 166:× 152:× 144:AB 142:= 115:O( 103:O( 79:Θ( 72:× 6797:) 6793:( 6738:e 6731:t 6724:v 6704:. 6692:: 6675:. 6663:: 6636:. 6624:: 6614:: 6583:. 6579:: 6546:. 6540:: 6525:. 6521:: 6505:6 6488:. 6482:: 6455:. 6444:: 6434:: 6412:. 6398:: 6372:. 6368:: 6332:. 6328:: 6271:: 6228:. 6224:: 6197:. 6171:. 6165:: 6150:. 6120:: 6112:: 6081:. 6051:ω 5991:: 5946:: 5940:9 5916:. 5910:: 5895:. 5891:: 5885:5 5868:. 5864:: 5841:. 5837:: 5814:. 5808:: 5802:4 5781:. 5769:: 5746:. 5704:: 5688:4 5671:. 5667:: 5635:. 5621:: 5602:. 5566:. 5519:. 5489:: 5460:: 5436:: 5414:. 5400:: 5319:n 5312:n 5304:n 5302:× 5300:n 5270:p 5264:p 5259:) 5257:p 5255:/ 5253:n 5251:( 5249:O 5239:) 5237:p 5235:/ 5233:n 5231:( 5229:O 5224:) 5220:p 5215:/ 5213:n 5211:( 5209:O 5201:p 5190:p 5182:p 5175:) 5173:n 5167:) 5163:M 5158:/ 5156:n 5154:( 5152:O 5146:M 5137:M 5125:M 5103:) 5101:n 5081:B 5077:A 5073:C 5062:) 5060:n 5054:T 5050:) 5048:n 5044:n 5034:) 5032:n 5001:. 4993:. 4991:) 4986:T 4979:C 4970:. 4968:) 4963:T 4956:C 4947:. 4945:) 4940:T 4933:C 4924:. 4922:) 4917:T 4910:C 4898:. 4892:T 4883:T 4874:T 4865:T 4860:T 4855:. 4849:C 4840:C 4831:C 4822:C 4817:C 4803:t 4796:c 4789:c 4782:n 4773:C 4769:T 4765:) 4763:T 4759:C 4747:. 4745:T 4740:. 4738:) 4736:T 4732:C 4718:. 4716:) 4711:B 4704:A 4697:T 4688:. 4686:) 4681:B 4674:A 4667:T 4658:. 4656:) 4651:B 4644:A 4637:T 4628:. 4626:) 4621:B 4614:A 4607:T 4598:. 4596:) 4591:B 4584:A 4577:C 4568:. 4566:) 4561:B 4554:A 4547:C 4538:. 4536:) 4531:B 4524:A 4517:C 4508:. 4506:) 4501:B 4494:A 4487:C 4475:. 4469:T 4460:T 4451:T 4442:T 4437:T 4432:. 4426:C 4417:C 4408:C 4399:C 4394:C 4389:. 4383:B 4374:B 4365:B 4356:B 4351:B 4346:. 4340:A 4331:A 4322:A 4313:A 4308:A 4300:n 4296:n 4291:T 4280:b 4273:a 4266:c 4259:n 4250:) 4248:B 4244:A 4240:C 4204:) 4192:B 4182:A 4178:+ 4169:B 4159:A 4147:B 4137:A 4133:+ 4124:B 4114:A 4100:B 4090:A 4086:+ 4077:B 4067:A 4055:B 4045:A 4041:+ 4032:B 4022:A 4015:( 3959:Z 3955:2 3951:/ 3946:Z 3915:C 3906:) 3904:n 3898:C 3894:B 3890:A 3871:k 3865:k 3861:k 3786:) 3783:1 3780:( 3777:o 3774:+ 3767:n 3746:n 3740:n 3679:M 3676:5 3673:+ 3669:) 3663:M 3658:n 3653:2 3645:( 3636:2 3623:2 3619:n 3612:+ 3607:2 3603:n 3593:M 3585:7 3577:2 3567:) 3561:M 3556:n 3551:3 3543:( 3538:4 3517:n 3509:2 3499:2 3495:n 3488:+ 3483:2 3479:n 3475:4 3467:7 3459:2 3450:n 3446:5 3433:7 3411:M 3408:5 3405:+ 3401:) 3395:M 3390:n 3385:2 3377:( 3368:2 3355:2 3351:n 3347:3 3344:+ 3339:2 3335:n 3325:M 3317:7 3309:2 3299:) 3293:M 3288:n 3283:3 3275:( 3270:4 3249:n 3241:2 3231:2 3227:n 3223:3 3220:+ 3215:2 3211:n 3207:4 3199:7 3191:2 3182:n 3178:5 3165:7 3143:M 3140:3 3137:+ 3132:2 3128:n 3118:M 3110:7 3102:2 3092:) 3086:M 3081:n 3076:3 3068:( 3063:5 3040:2 3036:n 3032:5 3024:7 3016:2 3007:n 3003:6 2990:7 2968:M 2965:3 2962:+ 2957:2 2953:n 2943:M 2935:7 2927:2 2917:) 2911:M 2906:n 2901:3 2893:( 2888:6 2865:2 2861:n 2857:6 2849:7 2841:2 2832:n 2828:7 2815:7 2768:M 2748:n 2721:n 2702:) 2693:n 2689:( 2686:O 2680:) 2675:7 2667:2 2658:n 2654:( 2651:O 2627:. 2615:) 2606:n 2602:( 2599:O 2589:ω 2556:) 2547:M 2542:b 2537:p 2534:n 2531:m 2525:+ 2520:b 2516:p 2513:m 2510:+ 2507:p 2504:n 2501:+ 2498:n 2495:m 2489:+ 2486:p 2483:+ 2480:n 2477:+ 2474:m 2470:( 2453:b 2449:M 2405:2 2401:B 2395:2 2391:A 2387:+ 2382:1 2378:B 2372:1 2368:A 2364:= 2359:) 2351:2 2347:B 2337:1 2333:B 2326:( 2319:) 2311:2 2307:A 2299:1 2295:A 2288:( 2283:= 2280:C 2262:B 2258:A 2253:m 2249:p 2245:m 2241:n 2215:) 2207:2 2203:B 2199:A 2192:1 2188:B 2184:A 2178:( 2173:= 2168:) 2160:2 2156:B 2148:1 2144:B 2137:( 2132:A 2129:= 2126:C 2108:B 2103:p 2099:p 2095:m 2091:n 2065:) 2059:B 2054:2 2050:A 2042:B 2037:1 2033:A 2026:( 2021:= 2017:B 2011:) 2003:2 1999:A 1989:1 1985:A 1978:( 1973:= 1970:C 1952:A 1947:n 1943:p 1939:m 1935:n 1915:) 1913:p 1909:m 1905:n 1898:. 1895:p 1891:m 1886:B 1881:m 1877:n 1872:A 1853:) 1851:n 1841:) 1839:n 1831:n 1813:, 1810:) 1805:2 1801:n 1797:( 1791:+ 1788:) 1785:2 1781:/ 1777:n 1774:( 1771:T 1768:8 1765:= 1762:) 1759:n 1756:( 1753:T 1732:; 1729:) 1726:1 1723:( 1717:= 1714:) 1711:1 1708:( 1705:T 1692:n 1681:b 1675:a 1668:c 1641:) 1629:B 1619:A 1615:+ 1606:B 1596:A 1584:B 1574:A 1570:+ 1561:B 1551:A 1537:B 1527:A 1523:+ 1514:B 1504:A 1492:B 1482:A 1478:+ 1469:B 1459:A 1452:( 1447:= 1442:) 1430:B 1418:B 1404:B 1392:B 1385:( 1378:) 1366:A 1354:A 1340:A 1328:A 1321:( 1316:= 1311:) 1299:C 1287:C 1273:C 1261:C 1254:( 1239:n 1218:, 1213:) 1201:B 1189:B 1175:B 1163:B 1156:( 1151:= 1148:B 1144:, 1139:) 1127:A 1115:A 1101:A 1089:A 1082:( 1077:= 1074:A 1070:, 1065:) 1053:C 1041:C 1027:C 1015:C 1008:( 1003:= 1000:C 969:M 963:b 958:) 950:M 944:b 940:/ 936:n 921:C 900:C 893:C 880:B 873:A 864:) 862:m 858:T 854:K 848:K 844:k 831:) 829:p 825:T 821:J 815:J 811:j 804:) 802:n 798:T 794:I 788:I 784:i 775:T 773:+ 771:J 769:: 767:J 763:T 761:+ 759:I 757:: 755:I 751:C 744:T 742:+ 740:J 738:: 736:J 732:T 730:+ 728:K 726:: 724:K 720:B 713:T 711:+ 709:K 707:: 705:K 701:T 699:+ 697:I 695:: 693:I 689:A 681:T 677:m 673:K 666:T 662:p 658:J 651:T 647:n 643:I 637:) 633:M 626:T 618:C 612:B 608:A 595:M 584:M 570:B 566:A 558:) 556:n 550:B 546:B 542:A 533:b 529:/ 525:M 517:n 512:B 508:A 500:b 496:/ 492:M 484:b 480:M 437:) 435:n 428:n 424:n 411:) 395:C 380:C 367:B 360:A 351:m 347:k 334:p 330:j 323:n 319:i 312:C 306:B 302:A 292:p 288:j 284:n 280:i 263:. 258:j 255:k 251:b 245:k 242:i 238:a 232:m 227:1 224:= 221:k 213:= 208:j 205:i 201:c 186:p 182:n 177:C 173:B 168:p 164:m 159:A 154:m 150:n 140:C 119:) 117:n 107:) 105:n 83:) 81:n 74:n 70:n 61:n

Index

matrix multiplication
numerical algorithms
scientific computing
pattern recognition
graph
parallel
distributed
takes time
field
big O notation
Strassen's algorithm
computational complexity of matrix multiplication
asymptotic complexity
Williams
galactic algorithm
definition of matrix multiplication
time
asymptotic notation
algorithm analysis

row- and column-major order
memory access patterns
cache
row-major order, column-major order
fully associative cache
tiled
divide-and-conquer algorithm
block partitioning
recursively
scalar multiplication

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.