【免费】中山大学2020并行式与分布式计算课程作业.zip资源-CSDN文库

共64个文件

png：28个

cpp：11个

md：7个

项目

毕业设计

需积分: 0 83 浏览量 2024-02-10 16:13:44 上传评论收藏 5.88MB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

中山大学2020并行式与分布式计算课程作业.zip （64个子文件）

资料总结

作业6

jacobi.cu 7KB

18340052-何泽-并行分布式计算作业6-v1.pdf 281KB

error-test.cu 2KB

图片

1.PNG 12KB

2.PNG 10KB

5.PNG 64KB

8.PNG 3KB

6.PNG 3KB

7.PNG 3KB

4.PNG 31KB

11.PNG 3KB

10.PNG 3KB

3.PNG 14KB

9.PNG 3KB

jacobi.h 2KB

error_checks.h 886B

README.md 6KB

error_check_1.h 886B

作业5

1138_bus.txt 74KB

生产者消费者.cpp 1KB

mpitest.cpp 4KB

矩阵大小不同线程数相同.cpp 5KB

662_bus.txt 44KB

18340052-何泽-并行分布式计算作业5-v1.pdf 1.28MB

图片

1.PNG 6KB

2.PNG 142KB

8.PNG 29KB

7.PNG 24KB

13.PNG 533KB

10.PNG 43KB

11.png 59KB

6.png 53KB

5.png 99KB

3.PNG 85KB

4.png 73KB

9.PNG 80KB

12.PNG 213KB

14.png 24KB

494_bus.txt 30KB

685_bus.txt 55KB

矩阵大小相同线程数不同.cpp 2KB

README.md 13KB

作业4

18340052-何泽-并行分布式计算作业4-v1.pdf 135KB

main.cpp 602B

README.md 2KB

作业2

18340052-何泽-并行分布式计算作业2-v1.pdf 1017KB

2.jpg 109KB

par3.cpp 7KB

1.jpg 247KB

par2.cpp 7KB

3.jpg 119KB

README.md 15KB

par1.cpp 7KB

作业3

图片

1.PNG 91KB

2.PNG 13KB

3.png 255KB

18340052-何泽-并行分布式计算作业3-v1.pdf 581KB

test.cpp 367B

README.md 2KB

作业1

单次运行.cpp 2KB

18340052-何泽-并行分布式计算作业1-v1.pdf 634KB

运行1000次取平均.cpp 2KB

README.md 7KB

README.md 91B

# 一、问题描述 > 1. 分别采用不同的算法(非分布式算法)例如一般算法、分治算法和Strassen算法等计算计算矩阵两个300x300的矩阵乘积，并通过Perf工具分别观察cache miss、CPI、 mem_load 等性能指标 > > 2. Consider a memory system with a level 1 cache of 32KB and DRAM of 512 MB with the processor operating at 1 GHz. The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles. In each memory cycle, the processes fetches four words (cache line size is four words ). What is the peak achievable performance of a dot product of two vectors? Note: Where necessary, assume an optimal cache placement policy. > > ```c++ > /* dot product loop */ > for (i = 0; i < dim; i++) > dot_prod += a[i] * b[i]; > ``` > > 3. Now consider the problem of multiplying a dense matrix with a vector using a two-loop dot-product formulation. The matrix is of dimension 4K x 4K. (Each row of the matrix takes 16KB of storage.) What is the peak achievable performance of this technique using a two-loop dot-product based matrix-vector product? > > ```c++ > /* matrix-vector product loop */ > for (i = 0; i < dim; i++) > for (j = 0; i < dim; j++) > c[i] += a[i][j] * b[j]; > ``` # 二、解决方案 ## Ⅰ、问题一 ### 1. 算法描述 #### ①一般算法最直白、直接的算法，即用一个矩阵的每一行的每个元素分别乘另一个矩阵每一列的每个元素，各个乘积相加得到结果矩阵的一个元素。具体实现即三重循环。 #### ②分治算法不断地将矩阵分为四个矩阵分别计算，直到变为单个元素. - 三个矩阵可以写成下面的格式： $$ A= \left[ \begin{matrix}{A_{11}}&A_{12}\\A_{21}&A_{22} \end{matrix} \right], B= \left[ \begin{matrix}{B_{11}}&B_{12}\\B_{21}&B_{22} \end{matrix} \right], C= \left[ \begin{matrix}{C_{11}}&C_{12}\\C_{21}&A_{22} \end{matrix} \right] $$ - 那么相关的计算可以写成 $$ \left[ \begin{matrix}{C_{11}}&C_{12}\\C_{21}&A_{22} \end{matrix} \right]= \left[ \begin{matrix}{A_{11}}&A_{12}\\A_{21}&A_{22} \end{matrix} \right]\times \left[ \begin{matrix}{B_{11}}&B_{12}\\B_{21}&B_{22} \end{matrix} \right]\\ C_{11}=A_{11}\times B_{11}+ A_{12}\times B_{21}\\ C_{12}=A_{11}\times B_{12}+ A_{12}\times B_{22}\\ C_{21}=A_{21}\times B_{11}+ A_{22}\times B_{21}\\ C_{22}=A_{21}\times B_{12}+ A_{22}\times B_{22} $$ - 同理A~11~等一些子矩阵也可以写成相关的子矩阵，就这样将矩阵不断分解为小矩阵进行计算，最后归并为一个矩阵。同时，这种方法要求矩阵的行列必须为2的n次方。 #### ③Strassen算法 Strassen算法同样是使用分治的思想解决问题，只不过，不同的是当矩阵的阶很大时就会采取一个递推式进行计算相关递推式。计算如下表达式： $$ S_1=B_{12}-B_{22}\\ S_2=A_{11}+A_{12}\\ S_3=A_{21}+A_{22}\\ S_4=B_{21}-B_{11}\\ S_5=A_{11}+A_{22}\\ S_6=B_{11}+B_{22}\\ S_7=A_{12}-A_{22}\\ S_8=B_{21}+B_{22}\\ S_9=A_{11}-A_{21}\\ S_10=B_{11}+B_{12}\\ $$ $$ P_1=A_{11}\times S_1\\ P_2=S_2\times B_{22}\\ P_3=S_3\times B_{11}\\ P_4=A_{22}\times S_4\\ P_5=S_5\times S_6\\ P_6=S_7\times S_8\\ P_7=S_9\times S_{10}\\ $$ 那么最终的矩阵结果为： $$ C_{11}=P_5+P_4-P_2+P_6\\ C_{12}=P_1+P_2\\ C_{21}=P_3+P_4\\ C_{22}=P_5+P_1-P_3-P_7 $$ 其中A11，A12，A21，A22和B11，B12，B21，B22分别为两个乘数A和B矩阵的四个子矩阵。C11，C12，C21，C22为最终的结果C矩阵的四个子矩阵。那么只需要将这四个矩阵合并就是最终结果。 ### 2. 算法实现 > **为了之后测试三种算法的公平性，我准备将三种算法全都写到一个类里，在测试用不同算法计算时，只需要调用类里的不同函数即可。** #### ① 类的定义与构造函数实现其中，构造函数就是传入行数、列数并随机赋值 ```c++ class Matrix{ private: int row; int col; vector<vector<int>> data; public: Matrix(int row, int col) :data(row),row(row),col(col){ for (int i = 0; i < row; i++){ data[i].resize(col); } srand(time(0)); for (int i = 0; i < row; i++){ for (int j = 0; j < col; j++){ data[i][j] = rand(); } } } Matrix(int row1, int col1, int row2, int col2, const Matrix& a) :row(row2 - row1), col(col2 - col1),data(row){ for (int i = 0; i < row; i++){ data[i].resize(col); } for (int i = 0; i < row; i++){ for (int j = 0; j < col; j++){ data[i][j] = a.get(col1 + i, row1 + j); } } } ~Matrix(){ this->row = 0; this->col = 0; } Matrix(const Matrix& a){ *this = a; } Matrix* cal1(const Matrix&); Matrix* cal2(const Matrix&); Matrix* cal3(const Matrix&); Matrix* operator+(const Matrix&); Matrix* operator-(const Matrix&); Matrix* operator*(const Matrix&); static Matrix* add(const Matrix*, const Matrix*); static Matrix* sub(const Matrix*, const Matrix*); static Matrix* plus(const Matrix*, const Matrix*,const Matrix*,const Matrix*); vector<int> operator[](const int); int get(const int,const int) const; void set(const int, const int, int); bool isSimilar(const Matrix& x); void clear(int); }; ``` #### ② 除计算乘积函数之外其他函数的实现 - 与另一个矩阵加减乘定义： ```c++ Matrix* operator+(const Matrix&); Matrix* operator-(const Matrix&); Matrix* operator*(const Matrix&); ``` 实现： ```c++ Matrix* Matrix::operator+(const Matrix& a) { if (!isSimilar(a)){ return nullptr; } Matrix *ptr = new Matrix(a.row, a.col); for (int i = 0; i < row; i++){ for (int j = 0; j < col; j++){ ptr->set(i, j, this->get(i, j) + a.get(i, j)); } } return ptr; } Matrix* Matrix::operator-(const Matrix& a) { if (!isSimilar(a)){ return nullptr; } Matrix *ptr = new Matrix(a.row, a.col); for (int i = 0; i < row; i++){ for (int j = 0; j < col; j++){ ptr->set(i, j, this->get(i, j) - a.get(i, j)); } } return ptr; } ``` 由于不同方法乘法方法不同，乘法函数后面再说 - 两矩阵相加减 ```c++ Matrix* Matrix::add(const Matrix* p1, const Matrix* p2) { if (!(p1->col == p2->col && p1->row == p2->row)){ return nullptr; } Matrix *ptr = new Matrix(p1->row, p1->col); for (int i = 0; i < p1->row; i++){ for (int j = 0; j < p1->col; j++){ ptr->set(i, j, (p1->get(i, j) + p2->get(i, j))); } } return ptr; } Matrix* Matrix::sub(const Matrix* p1, const Matrix* p2) { if (!(p1->col == p2->col && p1->row == p2->row)){ return nullptr; } Matrix *ptr = new Matrix(p1->row, p1->col); for (int i = 0; i < p1->row; i++){ for (int j = 0; j < p1->col; j++){ ptr->set(i, j, (p1->get(i, j) - p2->get(i, j))); } } return ptr; } ``` - 将四个矩阵合并为一个矩阵 ```c++ Matrix* Matrix::plus(const Matrix* p1, const Matrix* p2,const Matrix* p3, const Matrix* p4) { //不符合可以进行合并的条件 if (!(p1->row == p2->row && p2->col == p4->col && p4->row == p3->row && p1->col == p3->col)){ return nullptr; } Matrix* ptr = new Matrix(p1->row + p3->row, p2->col + p1->col); ptr->clear(0); for (int i = 0; i < p1->row; i++){ for (int j = 0; j < p1->col; j++){ ptr->set(i, j, p1->get(i, j)); } } for (int i = 0; i < p2->row; i++){ for (in

评论收藏

内容反馈