init
parent cb240a6d75
commit baa9bc407b
1 changed file with 9 additions and 4 deletions
13
README.md
@@ -18,7 +18,7 @@ Prereqs
 - GCC C++17 compiler (g++)
 - OpenMP (optional for cpp_omp)
 - NVIDIA CUDA toolkit for building cuda/main.cu
-- Python 3.9+ and PyTorch (with CUDA for GPU runs)
+- uv (Astral's Python package manager) and PyTorch (with CUDA for GPU runs)
 
 Build
 - Single-threaded C++:
@@ -27,6 +27,11 @@ Build
 - OpenMP C++:
 g++ -O3 -march=native -std=c++17 -fopenmp -DNDEBUG cpp_omp/main.cpp -o bin_cpp_omp
 
+Note: If using clang++ instead of g++, OpenMP support may require additional setup:
+- On macOS: brew install libomp, then use: clang++ -Xpreprocessor -fopenmp -lomp ...
+- On Linux: install libomp-dev, then use: clang++ -fopenmp ...
+- Or stick with g++ which has built-in OpenMP support
+
 - CUDA:
 nvcc -O3 -arch=native cuda/main.cu -o bin_cuda
 If -arch=native not supported, use e.g.:
@@ -44,11 +49,11 @@ Run
 ./bin_cuda 100000000 10
 
 - PyTorch baseline (CPU or GPU auto-detect):
-python pytorch/baseline.py --N 100000000 --iters 10 --device cuda
-python pytorch/baseline.py --N 100000000 --iters 10 --device cpu
+uv run pytorch/baseline.py --N 100000000 --iters 10 --device cuda
+uv run pytorch/baseline.py --N 100000000 --iters 10 --device cpu
 
 - PyTorch optimized:
-python pytorch/optimized.py --N 100000000 --iters 10
+uv run pytorch/optimized.py --N 100000000 --iters 10
 
 Notes
 - Memory: N=100M uses ~400 MB for A,B,C and ~400 MB for D,E. Ensure enough RAM/GPU memory.
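The Memory note in the README gives sizes without showing the arithmetic. The sketch below reproduces the ~400 MB figure under one assumption the README does not state: that each array holds 4-byte float32 elements. The function names are hypothetical and are not part of the repository. If that assumption holds, ~400 MB would be the size of a single array, so the five arrays A to E together would need roughly 2 GB.

```python
# Rough memory estimate for the arrays named in the README's Notes section.
# Assumption (not stated in the README): elements are 4-byte float32 values.

BYTES_PER_ELEMENT = 4  # assumed float32


def array_megabytes(n: int, bytes_per_element: int = BYTES_PER_ELEMENT) -> float:
    """Size of one length-n array in decimal megabytes (1 MB = 10**6 bytes)."""
    return n * bytes_per_element / 1e6


def total_megabytes(n: int, num_arrays: int) -> float:
    """Combined size of num_arrays arrays of length n, in decimal megabytes."""
    return num_arrays * array_megabytes(n)


if __name__ == "__main__":
    n = 100_000_000  # the N used in the README's run commands
    print(f"one array:   {array_megabytes(n):.0f} MB")
    print(f"five arrays: {total_megabytes(n, 5):.0f} MB")
```

Checking the total against free RAM or GPU memory before a run is cheaper than discovering an out-of-memory error partway through the benchmark.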