FASTMMW
Updated 464 days ago
If we apply the definition, we have an M27 algorithm; if we apply a fast algorithm, we have M23. The matrices are now smaller, and we have to handle matrix operations of sizes . If we have a single GPU, we save 4/27 (about 15%), which is better than before. If we have 2 GPUs, we need as temporary space , which is slightly more than we need for M7 with one GPU. However, with 2 GPUs M23 requires 12 steps and M27 requires 14; thus we gain 2/14 (about 14%). That is still ahead, but not as good as 4/27. With 3 GPUs, M27 requires 9 steps and M23 requires 8, so we come out ahead by 1/9 (about 11%). The relative gain is worse than M7 with one GPU, and we need more temporary space as well. With 4 GPUs, M23 requires 6 steps (4+4+4+4+4+3) and M27 requires 7; we are back to saving 1/7. I do not have more GPUs ...

This year I could finally show that 3x3x3 can be used in between M/2 and M. Thus, we have a hierarchical algorithm that changes strategy as a function of the problem size and of the architecture. To show this practical performance advantage and the existence..
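The step counts quoted for 2, 3, and 4 GPUs can be reproduced with a few lines of Python. This is only a minimal sketch under assumptions that are mine, not measured results: every product costs the same, each GPU computes one product per step, and matrix additions and data transfers are ignored. The helper names `steps` and `relative_gain` are hypothetical, and the gain is read as (M27 steps - M23 steps) / M27 steps.

```python
from fractions import Fraction
from math import ceil


def steps(products: int, gpus: int) -> int:
    """Rounds needed when each GPU computes one product per round (assumption)."""
    return ceil(products / gpus)


def relative_gain(gpus: int, fast: int = 23, classic: int = 27) -> Fraction:
    """Fraction of rounds saved by the fast algorithm over the classic one."""
    classic_steps = steps(classic, gpus)
    fast_steps = steps(fast, gpus)
    return Fraction(classic_steps - fast_steps, classic_steps)


if __name__ == "__main__":
    for gpus in (1, 2, 3, 4):
        print(
            f"{gpus} GPU(s): M23 = {steps(23, gpus)} steps, "
            f"M27 = {steps(27, gpus)} steps, gain = {relative_gain(gpus)}"
        )
```

With these assumptions the gain is not monotone in the number of GPUs, because it depends on how 23 and 27 round up when divided by the GPU count. That is the reason the strategy has to change with the architecture.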