# AlgEng - Matrix Multiplication
This C++ project is build as a part of the Algorithm
Engineering course at Aarhus University.
It includes different includes implementations of
different alrogithms for multiplying matrices.
## Algorithms
The project contains the following algorithms.
Algorithm | Description
--- | ---
**Naive** | This algorithm is simply three simple for-loops iterating through the two matrices.
**Naive:x** | This algorithm is equal to **Naive** but usees OpenMP to parallelize
**Naive:T** | This algorithm takes advantage of transposing B in A * B before multiplying the two to lower cache misses from $n^3$ to $\frac{n^3}{cache-line-size}$.
**Naive:T:x** | This algorithm is equal to **Naive:T** but uses OpenMP to parallelize
**Obl** | This algorithm uses a cache oblivious approach to lower the cache misses by keeping on recursing on smaller sub problems.
**Obl:x** | This algorithm uses the same recursion as **Obl** but stops when the problem size is below a threshold of size *x* where it switches to the **Naive** algorihm.
**Obl:T:x** | This algorithm combines the benefits of **Naive:T** and **Obl:x** by transposing B and recursing until some threshold *x* and then switching to **Naive:T**.
**Tile:x** | This algorithm uses a tile based approach where it divides A and B into tiles that fits into cache and then calculates every sub problem using **Naive**. This algorithm is thus a cache aware algorithm.
**Tile:T:x** | This algorithm uses the same algorithm as **Tile:x** but uses **Naive:T** for the sub problems.
## Building
To build the project make a folder called output
in the root of the project and run cmake and make
to build it. **Note** this project depends on
Intels PCM library as well as PAPI. Both are listed
under dependencies.
The build will generate four executables:
- `testing`: Used to run unittests of all the implementations.
- `datagen`: Used to generate data for the benchmark programs.
- `benchmark`: Benchmarks algorithms using the PAPI library
- `benchmark2`: Benchmarks algorithms using the PCM library
## Testing
The testing executable takes no arguments and runs all tests
in the test folder
```commandline
> ./testing
```
## Datagen
The datagen executable takes four arguments and generates a
datafile for two matrices A of size m x n and B og size n x p
filled with random numbers between -1000 and 1000.
```commandline
> ./datagen <m> <n> <p> <output-file>
```
## Benchmark
The benchmark executable takes different arguments and benchmarks
all the algorithms using PAPI.
**Arguments**:
- `-r:<refresh>`: is an unsigned int telling how many refresh
iterations to run before measuring. Default 2.
- `-l:<loop>`: is an unsigned int telling how many iterations
to measure and average over. Default 5.
- `-i:<input-file>`: specifies a file generated by datagen
to benchmark algorithms against. Multiple files can be specified.
- `-o:<output-file>`: specifies the prefix of the resulting output files.
Example:
```commandline
> sudo taskset -c 0 nice -n -20 ./benchmark -i:data/42_42_42.data -r:5 -l:10 -o:my_run
```
*Note* that all benchmark result data will be located in the *output* folder in the root of the project.
## Benchmark2
The benchmark executable takes different arguments and benchmarks
all the algorithms using PCM.
Arguments are the same as benchmark but with an additional
- `-c:<core>`: specifies the core to measure events from (should
be the same as set in taskset for linux).
Example:
```commandline
> sudo taskset -c 0 nice -n -20 ./benchmark -i:data/42_42_42.data -r:5 -l:10 -o:my_run -c:0
```
*Note* that all benchmark result data will be located in the *output* folder in the root of the project.
## Extra tools
The project includes `print_result_cols.py` that can be used to print a single column from the result files
in the following way.
**Arguments**:
- `<col-idx>`: 0 based index of the col to print from every file
- `<data-file>`: One or more files to print data from.
```commandline
> python print_result_cols.py 0 my_run.nai1.data my_run.obl.160.data
```
Will print the first column of `my_run.nai1.data` and `my_run.obl.160.data` in a table like style.
## Dependencies
- [Catch test framework](https://github.com/philsquared/Catch):
Catch is included in the project and should work out of the box.
- [Intel Performance Counter Monitoring (PCM)](https://github.com/opcm/pcm):
For PCM to work it has to be cloned from its repository and build.
Afterwards it can be linked in the `CMakeLists.txt` file in this project.
Note the the project is setup to work with Intel® Core™ i7-5600U Processor
and not guaranteed to work with others.
- [Performance API (PAPI)](http://icl.utk.edu/papi/):
PAPI works only on Linux and it has to be installed such that `/usr/local/lib/libpapi.a`
is present at this exact location. Installation guide can be found on the webpage.