diff --git a/README.md b/README.md
index 7316770..8c14093 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Directory layout
 - pytorch/optimized.py
 
 Prereqs
-- C++17 compiler (g++/clang++)
+- GCC with C++17 support (g++)
 - OpenMP (optional for cpp_omp)
 - NVIDIA CUDA toolkit for building cuda/main.cu
 - Python 3.9+ and PyTorch (with CUDA for GPU runs)
@@ -25,7 +25,6 @@ Build
   g++ -O3 -march=native -std=c++17 -DNDEBUG cpp_single/main.cpp -o bin_cpp_single
 
 - OpenMP C++:
-  Linux/macOS (clang may need -Xpreprocessor -fopenmp and libomp):
   g++ -O3 -march=native -std=c++17 -fopenmp -DNDEBUG cpp_omp/main.cpp -o bin_cpp_omp
 
 - CUDA:
@@ -67,9 +66,3 @@ Notes
 Validation
 - All variants print "result" which should be numerically close across methods
   (tiny differences expected due to different reduction orders and precision).
-
-Extensions (optional for class)
-- Fuse add+fma into one CUDA kernel to show fewer memory passes.
-- Use thrust or cub for reductions.
-- Try half-precision (float16/bfloat16) on GPU for bandwidth gains.
-- Add vectorized loads (float4) on CPU and CUDA to show further speedups.