init
parent cb240a6d75
commit baa9bc407b
1 changed file with 9 additions and 4 deletions
13
README.md
@@ -18,7 +18,7 @@ Prereqs
 - GCC C++17 compiler (g++)
 - OpenMP (optional for cpp_omp)
 - NVIDIA CUDA toolkit for building cuda/main.cu
-- Python 3.9+ and PyTorch (with CUDA for GPU runs)
+- uv (Astral's Python package manager) and PyTorch (with CUDA for GPU runs)
 
 Build
 - Single-threaded C++:
@@ -27,6 +27,11 @@ Build
 - OpenMP C++:
 g++ -O3 -march=native -std=c++17 -fopenmp -DNDEBUG cpp_omp/main.cpp -o bin_cpp_omp
 
+Note: If using clang++ instead of g++, OpenMP support may require additional setup:
+- On macOS: brew install libomp, then use: clang++ -Xpreprocessor -fopenmp -lomp ...
+- On Linux: install libomp-dev, then use: clang++ -fopenmp ...
+- Or stick with g++ which has built-in OpenMP support
+
 - CUDA:
 nvcc -O3 -arch=native cuda/main.cu -o bin_cuda
 If -arch=native not supported, use e.g.:
@@ -44,11 +49,11 @@ Run
 ./bin_cuda 100000000 10
 
 - PyTorch baseline (CPU or GPU auto-detect):
-python pytorch/baseline.py --N 100000000 --iters 10 --device cuda
-python pytorch/baseline.py --N 100000000 --iters 10 --device cpu
+uv run pytorch/baseline.py --N 100000000 --iters 10 --device cuda
+uv run pytorch/baseline.py --N 100000000 --iters 10 --device cpu
 
 - PyTorch optimized:
-python pytorch/optimized.py --N 100000000 --iters 10
+uv run pytorch/optimized.py --N 100000000 --iters 10
 
 Notes
 - Memory: N=100M uses ~400 MB for A,B,C and ~400 MB for D,E. Ensure enough RAM/GPU memory.
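The memory note in the last hunk can be checked with quick arithmetic. The sketch below assumes 4-byte float32 elements, which the README does not state. Under that assumption a single array of N = 100,000,000 elements already takes 400 MB, so the "~400 MB" figure may be meant per array rather than as a total for A, B, C. The helper names `array_bytes` and `total_mb` are hypothetical and exist only for this illustration.

```python
# Assumption: each array holds N float32 values (4 bytes each).
# The README does not name the element type.
BYTES_PER_FLOAT32 = 4


def array_bytes(n, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes needed for one array of n elements."""
    return n * bytes_per_element


def total_mb(n, num_arrays, bytes_per_element=BYTES_PER_FLOAT32):
    """Total size of num_arrays equal-length arrays, in MB (10**6 bytes)."""
    return num_arrays * array_bytes(n, bytes_per_element) / 10**6


if __name__ == "__main__":
    n = 100_000_000
    print(total_mb(n, 1))  # one array: 400.0
    print(total_mb(n, 5))  # A, B, C, D, E together: 2000.0
```

If all five arrays are resident at once under this assumption, the working set is about 2 GB, which is the number to compare against available RAM or GPU memory.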
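The run commands in the diff pass `--N`, `--iters`, and `--device` to the PyTorch scripts. The following is a minimal sketch of how such a command line could be parsed, with a fallback to CPU when CUDA is unavailable. It is not the contents of pytorch/baseline.py, which this commit does not show. The `auto` choice, the default values, and the function names are assumptions made for illustration.

```python
import argparse


def cuda_available():
    """Return True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on the GPU.
        return False
    return torch.cuda.is_available()


def resolve_device(requested):
    """Map 'auto' to 'cuda' or 'cpu'; pass explicit choices through unchanged."""
    if requested == "auto":
        return "cuda" if cuda_available() else "cpu"
    return requested


def parse_args(argv=None):
    # Hypothetical CLI mirroring the flags used in the README commands.
    parser = argparse.ArgumentParser(description="Hypothetical benchmark CLI")
    parser.add_argument("--N", type=int, default=100_000_000)
    parser.add_argument("--iters", type=int, default=10)
    parser.add_argument("--device", choices=["auto", "cuda", "cpu"], default="auto")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(resolve_device(args.device), args.N, args.iters)
```

Saved under a hypothetical name such as `cli_sketch.py`, it could be invoked the same way as the README commands, for example `uv run cli_sketch.py --N 1000 --iters 3 --device cpu`.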