diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
deleted file mode 100644
index c38bf93..0000000
--- a/CONTRIBUTING.md
+++ /dev/null
@@ -1,34 +0,0 @@
-# Come riprodurre i risultati
-
-Perchè tutto il team possa collaborare al progetto, è importante che tutti sappiano come abbiamo fatto a ottenere un certo risultato.
-
-## Come compilare
-
-Per compilare il codice a noi assegnato, è necessario:
-
-1. Accedere alla cartella in cui è contenuto:
-   ```console
-   $ cd ./atax
-   ```
-
-2. Eseguire il Makefile:
-   ```console
-   $ make atax.elf
-   ```
-
-## Come debuggare e profilare
-
-Ho configurato il [Makefile](OpenMP/linear-algebra/kernels/atax/Makefile) con un phony target che esegue il programma 25 volte e calcola il tempo di esecuzione medio:
-
-1. Accedere alla cartella in cui è contenuto:
-   ```console
-   $ cd ./atax
-   ```
-
-2. Eseguire il Makefile:
-   ```console
-   $ make bench
-   ```
-
-> Nota: funziona solo su sistemi UNIX-like!
-> Nota2: ricordarsi di fare module load cuda e assegnare poi il giusto path a nvcc (sia su .vscode/c_cpp_properties.json sia nel Makefile)
diff --git a/README.md b/README.md
index bb72cc2..ca8ad37 100644
--- a/README.md
+++ b/README.md
@@ -17,37 +17,86 @@
 
 ## Results
 
-Results can be read in the ex.txt file where we stored all the
-experiments done.
-
+```
+Flags: -DMINI_DATASET
+CB*** Average of 3 runs: 1.03e-05 seconds
+Flags: -DMINI_DATASET -DHPC_INCLUDE_INIT
+CB*** Average of 3 runs: 1.27e-05 seconds
+Flags: -DMINI_DATASET -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.00123 seconds
+Flags: -DMINI_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.00161 seconds
+Flags: -DSMALL_DATASET
+CB*** Average of 3 runs: 0.0014 seconds
+Flags: -DSMALL_DATASET -DHPC_INCLUDE_INIT
+CB*** Average of 3 runs: 0.00344 seconds
+Flags: -DSMALL_DATASET -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.00971 seconds
+Flags: -DSMALL_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.0112 seconds
+Flags: -DSTANDARD_DATASET
+CB*** Average of 3 runs: 0.0876 seconds
+Flags: -DSTANDARD_DATASET -DHPC_INCLUDE_INIT
+CB*** Average of 3 runs: 0.188 seconds
+Flags: -DSTANDARD_DATASET -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.201 seconds
+Flags: -DSTANDARD_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.0647 seconds
+Flags: -DLARGE_DATASET
+CB*** Average of 3 runs: 0.35 seconds
+Flags: -DLARGE_DATASET -DHPC_INCLUDE_INIT
+CB*** Average of 3 runs: 0.746 seconds
+Flags: -DLARGE_DATASET -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.26 seconds
+Flags: -DLARGE_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.278 seconds
+Flags: -DEXTRALARGE_DATASET
+CB*** Average of 3 runs: 0.789 seconds
+Flags: -DEXTRALARGE_DATASET -DHPC_INCLUDE_INIT
+CB*** Average of 3 runs: 1.68 seconds
+Flags: -DEXTRALARGE_DATASET -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.647 seconds
+Flags: -DEXTRALARGE_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
+CB*** Average of 3 runs: 0.665 seconds
+```
 
 ### Validation
 
-* Compiler used: **nvcc**
-* Device used: `JETSON NANO DEVELOPER KIT`
-* Built on: Mon_Mar_11_22:13:24_CDT_2019 Cuda compilation tools, release 10.0, V10.0.326
+> Compiler used: **nvcc**
+> ```
+> Built on Mon_Mar_11_22:13:24_CDT_2019
+> Cuda compilation tools, release 10.0, V10.0.326
+> ```
+>
+> Device used: **Unimore Jetson Nano #8**
 
 To reproduce the obtained results:
 
-1. Clone the repository on @Steffo99's computer:
+1. Load the CUDA module:
+
+   ```console
+   $ module load cuda
+   ```
+
+2. Clone the repository on @Steffo99's computer:
 
    ```console
    $ git clone https://github.com/Steffo99/unimore-hpc-assignments
    ```
 
-2. Checkout the exact commit the tests were executed on:
+3. Checkout the exact commit the tests were executed on:
 
    ```console
    $ git checkout d13a9b786a53d5195ae17ef7afa776e2600ce8e0
    ```
 
-3. Access our group's assigned folder:
+4. Access our group's assigned folder:
 
    ```console
    $ cd unimore-hpc-assignments/atax
    ```
 
-4. Run the benchmarking script:
+5. Run the benchmarking script:
 
    ```console
    $ make bench
diff --git a/ex.txt b/ex.txt
deleted file mode 100644
index d177bf6..0000000
--- a/ex.txt
+++ /dev/null
@@ -1,90 +0,0 @@
-2d6448e5aa3707370b837a37db4eb880ca06ddb7
-Performed on GTX 1070 driver 525.60.11 with atomic add on double number.
-
-Flags: -DMINI_DATASET
-CB*** Average of 3 runs: 3.33e-06 seconds
-Flags: -DMINI_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 8.33e-06 seconds
-Flags: -DMINI_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 6.8e-05 seconds
-Flags: -DMINI_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 7.2e-05 seconds
-Flags: -DSMALL_DATASET
-CB*** Average of 3 runs: 0.000563 seconds
-Flags: -DSMALL_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.00139 seconds
-Flags: -DSMALL_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.000229 seconds
-Flags: -DSMALL_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.000309 seconds
-Flags: -DSTANDARD_DATASET
-CB*** Average of 3 runs: 0.0276 seconds
-Flags: -DSTANDARD_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.0664 seconds
-Flags: -DSTANDARD_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.00938 seconds
-Flags: -DSTANDARD_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.0128 seconds
-Flags: -DLARGE_DATASET
-CB*** Average of 3 runs: 0.109 seconds
-Flags: -DLARGE_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.243 seconds
-Flags: -DLARGE_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.0449 seconds
-Flags: -DLARGE_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.0459 seconds
-Flags: -DEXTRALARGE_DATASET
-CB*** Average of 3 runs: 0.248 seconds
-Flags: -DEXTRALARGE_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.584 seconds
-Flags: -DEXTRALARGE_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.0971 seconds
-Flags: -DEXTRALARGE_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.108 seconds
-
---------------------------------------------------------
-d13a9b786a53d5195ae17ef7afa776e2600ce8e0
-Experiment after changing a index of the vector Y
-nothing special changed but i place it here.
-Performed on jetson nano with atomic add on float number.
-
-Flags: -DMINI_DATASET
-CB*** Average of 3 runs: 1.03e-05 seconds
-Flags: -DMINI_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 1.27e-05 seconds
-Flags: -DMINI_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.00123 seconds
-Flags: -DMINI_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.00161 seconds
-Flags: -DSMALL_DATASET
-CB*** Average of 3 runs: 0.0014 seconds
-Flags: -DSMALL_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.00344 seconds
-Flags: -DSMALL_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.00971 seconds
-Flags: -DSMALL_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.0112 seconds
-Flags: -DSTANDARD_DATASET
-CB*** Average of 3 runs: 0.0876 seconds
-Flags: -DSTANDARD_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.188 seconds
-Flags: -DSTANDARD_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.201 seconds
-Flags: -DSTANDARD_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.0647 seconds
-Flags: -DLARGE_DATASET
-CB*** Average of 3 runs: 0.35 seconds
-Flags: -DLARGE_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 0.746 seconds
-Flags: -DLARGE_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.26 seconds
-Flags: -DLARGE_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.278 seconds
-Flags: -DEXTRALARGE_DATASET
-CB*** Average of 3 runs: 0.789 seconds
-Flags: -DEXTRALARGE_DATASET -DHPC_INCLUDE_INIT
-CB*** Average of 3 runs: 1.68 seconds
-Flags: -DEXTRALARGE_DATASET -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.647 seconds
-Flags: -DEXTRALARGE_DATASET -DHPC_INCLUDE_INIT -DHPC_USE_CUDA
-CB*** Average of 3 runs: 0.665 seconds
\ No newline at end of file