I was trying to compare different algorithms based on input size. I have implemented all these algorithms in C. Following table gives you how the order of growth is affecting practical implementation.
n
|
Bubble Sort
|
Insertion Sort
|
Merge Sort
|
Selection Sort
|
100
|
0 ms
|
0 ms
|
0 ms
|
0 ms
|
1000
|
0 ms
|
0 ms
|
0 ms
|
0 ms
|
10000
|
250 ms
|
62 ms
|
0 ms
|
94 ms
|
50000
|
6546 ms
|
1563 ms
|
16 ms
|
2328 ms
|
100000
|
26687 ms
|
6266 ms
|
47 ms
|
9291 ms
|
0 means it has taken very less or negligible time.
So it looks like Merge sort is clear winner. Followed by insertion sort, selection sort and finally Bubble sort.
Same thing I applied for Matrix Multiplication.
N
|
Brute Force (n^3)
|
Strassen
|
128
|
16 ms
|
1391 ms
|
512
|
656 ms
|
68078 ms
|
1024
|
7297 ms
|
474781 ms
|
10000
|
3.64 hours
|
Did not stop even after 24 hours
|
(Actually even if it had stopped answer would be wrong as 10000 is not power of 2)
|
As you can see it looks like Strassen is not yet all beating brute force in any way. Then I came across following points in Wikipedia (https://en.wikipedia.org/wiki/Strassen_algorithm)
Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which those algorithms are more efficient. The particular crossover point for which Strassen's algorithm is more efficient depends on the specific implementation and hardware. Earlier authors had estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations.[1]However, it has been observed that this crossover point has been increasing in recent years, and a 2010 study found that even a single step of Strassen's algorithm is often not beneficial on current architectures, compared to a highly optimized traditional multiplication, until matrix sizes exceed 1000 or more, and even for matrix sizes of several thousand the benefit is typically marginal at best (around 10% or less)
Similar things are discussed in some of the coding forums -> http://stackoverflow.com/questions/13559928/why-is-my-strassens-matrix-multiplication-slow?rq=1
So based on above comments I changed implementation. I kept Crossover value as 32. That is if sub matrix size is 32x32 than Strassen's algorithm instead of calling its own algorithm with smaller size, it will call Brute force algorithm to do the lower size matrix multiplication.
N
|
Brute Force (n^3)
|
Strassen
|
64
|
0 ms
|
16 ms
|
128
|
16 ms
|
16 ms
|
256
|
125 ms
|
110 ms
|
512
|
1078 ms
|
937 ms
|
1024
|
14468 ms
|
6282 ms
|
Clearly Strassen is overtaking brute force. Now I tried with crossover value as 16. Again Strassen was very close to Brute force but could not beat it. For value 128, Strassen was able to beat Brute force only after N size of 512.
In my case Strassen algorithm takes 2-3 times more memory than Brute Force like for n = 1024 i.e. 1024*1024 matrix brute force took 21 MB where as Strassens algorithm took 48 MB.