version 1: use a single loop (1.10 ms)
version 2: use cuBLAS (0.77 ms)
version 3: use shared memory, based on v1 (0.98 ms)
version 4: use prefetching, based on v3 (0.59 ms)
...