Fast matrix multiplication
--------
Austin R. Benson and Grey Ballard
This software contains implementations of fast matrix multiplication algorithms for
sequential and shared-memory parallel environments.
To cite this work, please use:
Austin R. Benson and Grey Ballard. "A framework for practical parallel fast matrix multiplication". In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2015.
An extended version of the paper is available on [arxiv](http://arxiv.org/pdf/1409.2908v1.pdf).
For references to APA algorithms, see
Grey Ballard, Jack Weissenberger, and Luoping Zhang. "Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms". In Proceedings of the 50th International Conference on Parallel Processing Workshops (ICPP-W), 2021.
License
--------
Copyright 2014 Sandia Corporation. Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains certain rights in this software.
This software is released under the BSD 2-Clause license. Please see the LICENSE file.
Setup
--------
The code requires:
* Intel MKL
* Compiler supporting C++11 and OpenMP
The Makefile depends on an included file that specifies the compiler and the run-time mode.
You must specify this file in the first line of the Makefile.
For an example, see the file `make.incs/make.inc.edison`, which contains the information for running
on NERSC's Edison machine.
The `MODE` variable specifies sequential or parallel mode.
The `DEFINES` variable can specify the type of parallelism if running in parallel mode.
The `DEFINES` variable also controls the naming convention for BLAS routines.
By default, names are assumed to look like "dgemm".
Defining BLAS_POST switches to names with a trailing underscore, like "dgemm_".
The `MKL_ROOT` variable must be set for your machine.
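Putting these together, a make include file might look like the following sketch (the compiler choice and MKL path are illustrative; adjust them for your machine, and see `make.incs/make.inc.edison` for a real example):

```make
# Illustrative make.inc fragment -- paths and compiler are assumptions.
CXX = icpc                     # C++11 compiler with OpenMP support
MODE = openmp                  # sequential or parallel mode
DEFINES += -D_PARALLEL_=1      # DFS parallelism (omit for sequential builds)
DEFINES += -DBLAS_POST         # BLAS routines have a trailing underscore (dgemm_)
MKL_ROOT = /opt/intel/mkl      # path to your MKL installation
```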
We did most of our testing with the Intel compiler (icpc).
Depending on the g++ version, the OpenMP task constructs can behave differently, and the HYBRID shared-memory
parallel code may crash. Sequential mode, DFS parallel, and BFS parallel should work with g++.
Building examples
--------
First, use the code generator to generate the algorithms:
    cd codegen
    bash gen_all_algorithms.sh 0
Some simple codes that use the fast algorithms are in the `examples` directory.
For example, you can build and run the (4, 3, 3) algorithm:
    make fast433
    ./build/fast433
Building tests
--------
We now assume that all of the algorithms have been generated with the code generator (see above).
The tests are built and run with:
    make matmul_tests
    ./build/matmul_tests -all 1
The tests are just for correctness of the algorithms, not for performance.
You should see output like:
    STRASSEN_1: 257, 500, 55
    Maximum relative difference: 3.87253e-15
This test runs one step of Strassen's algorithm, multiplying a 257 x 500 matrix with a 500 x 55 matrix.
The maximum relative difference is an error measure:
    max_{ij} |C_{ij} - D_{ij}| / |C_{ij}|,
where C is the result computed with the fast algorithm and D is the result computed with the classical algorithm.
For all of the exact fast algorithms, the error should be around 1e-14 or 1e-15.
The approximate algorithms (e.g., Bini's) have larger error.
Typically, additional recursive steps lead to larger error.
Building with different parallel methods
--------
The BFS, DFS, and HYBRID parallel algorithms are compile-time options.
To use DFS, set the following in your make include file in the `make.incs` directory:

    MODE=openmp
    DEFINES += -D_PARALLEL_=1
Switch the `_PARALLEL_` define to 2 for BFS or 3 for HYBRID.
For an example, run
    make fast424
    ./build/fast424
DGEMM curve benchmarks
--------
Build and run the benchmark for the dgemm curves:
    make dgemm_curves
    ./build/dgemm_curves 1 # N x N x N
    ./build/dgemm_curves 2 # N x 800 x N
    ./build/dgemm_curves 3 # N x 800 x 800
The output is a semicolon-separated list, where each item looks like:

    M K N num_trials time;
The M, K, and N terms specify the matrix dimensions: M x K multiplied by K x N.
The time is in milliseconds and is the total time to run num_trials multiplies.
For example,
    1200 800 1200 5 104.87;
means that it took 104.87 milliseconds total to multiply a 1200 x 800 matrix by an 800 x 1200 matrix five times.
To build with parallelism enabled, you need to define `_PARALLEL_` (see `make.incs/make.inc.linux`).
To run without dynamic threads (i.e., mkl_set_dynamic(0)), append a second argument, e.g.:
    ./build/dgemm_curves 1 1 # Square timings without dynamic thread allocation
Fast algorithms benchmarks
--------
Build benchmarking code for all of the fast algorithms:
    make matmul_benchmarks
The build takes a while because we are compiling all of the algorithms.
To run a small test to benchmark MKL against Strassen with one, two, and three levels of recursion:
    ./build/matmul_benchmarks -square_test 1
The output format is specified in `data/README.md`.
To run all of the benchmarks for the tall-and-skinny matrix multiplied by a small square matrix (N x k x k for fixed k):
    ./build/matmul_benchmarks -ts_square_like 1