A minimal, production-ready CUDA C++ project that introduces NVIDIA Thrust, a high-level parallel algorithms library for CUDA. It is designed for reproducible builds, clean debugging, and educational clarity, and it builds and runs under Windows with WSL2.
- CUDA C++ example: `src/thrust_intro.cu`
- Demonstrates `thrust::transform` with `zip_iterator` and a functor (SAXPY example)
- Cross-platform build via CMake
- GPU architecture targeting (Compute Capability 8.9, RTX 4070 SUPER)
- Optional `.vscode/launch.json` for GDB debugging
- Optional `CTest` integration for regression testing
- Status check script for verifying CUDA environment

    #include <cstdio>
    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>

    // Functor computing a * x + y for one (x, y) pair packed in a tuple.
    struct saxpy_functor {
        float a;  // scalar multiplier

        __host__ __device__
        float operator()(const thrust::tuple<float, float>& t) const {
            return a * thrust::get<0>(t) + thrust::get<1>(t);
        }
    };

    int main() {
        const int N = 1 << 20;  // 1,048,576 elements

        // Device vectors: x filled with 1, y filled with 2, z holds the result.
        thrust::device_vector<float> x(N, 1.f), y(N, 2.f), z(N);
        saxpy_functor f{3.f};

        // Zip x and y so each dereference yields a tuple (x[i], y[i]).
        auto first = thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin()));
        auto last  = thrust::make_zip_iterator(thrust::make_tuple(x.end(), y.end()));

        // z[i] = 3 * x[i] + y[i], computed on the GPU.
        thrust::transform(first, last, z.begin(), f);

        // Each element access copies a single value from device to host.
        float z0 = z[0], zN = z[N - 1];
        printf("z[0]=%.1f z[N-1]=%.1f (expect 5.0)\n", z0, zN);
        printf("Success!\n");
        return 0;
    }

Output:

    z[0]=5.0 z[N-1]=5.0 (expect 5.0)
    Success!

cpp-cuda-thust-intro/
├── src/
│ └── thrust_intro.cu
├── build/ # Auto-generated by CMake
├── CMakeLists.txt
├── README.md
├── check_thrust_intro_status.sh
└── (optional) .vscode/ # Launch/debug configuration
From project root:

    # Configure
    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    # Build
    cmake --build build -j
    # Run
    ./build/thrust_intro

Expected output:

    z[0]=5.0 z[N-1]=5.0 (expect 5.0)
    Success!

✅ Requires: CUDA 12.0+, a GPU with Compute Capability ≥ 8.9 (e.g. RTX 4070 SUPER), CMake 3.24+, and a C++17 compiler.
If you enable CTest in your CMake configuration, you can run tests with:

    cd build
    ctest

Expected result:

    100% tests passed, 0 tests failed out of 1

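This README does not reproduce `CMakeLists.txt`, so the following is only a sketch of how the Compute Capability 8.9 target and the optional CTest hook might be configured. The project name, the test name `thrust_intro_runs`, and the pass pattern are assumptions chosen for illustration; the actual file in the repository may differ.

```cmake
cmake_minimum_required(VERSION 3.24)
project(thrust_intro LANGUAGES CXX CUDA)  # project name is an assumption

# C++17 for both host and device code
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

add_executable(thrust_intro src/thrust_intro.cu)

# Generate code for Compute Capability 8.9 (e.g. RTX 4070 SUPER)
set_target_properties(thrust_intro PROPERTIES CUDA_ARCHITECTURES 89)

# Optional regression test: include(CTest) defines BUILD_TESTING (ON by default)
# and enables testing when it is ON.
include(CTest)
if(BUILD_TESTING)
  # Hypothetical test name; passes when the program prints "Success!"
  add_test(NAME thrust_intro_runs COMMAND thrust_intro)
  set_tests_properties(thrust_intro_runs PROPERTIES
                       PASS_REGULAR_EXPRESSION "Success!")
endif()
```

With a single test registered this way, `ctest` would report one test, which is consistent with the "out of 1" summary shown above.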
A helper script validates your CUDA environment and runs a quick benchmark.

    ./check_thrust_intro_status.sh

Typical output:

    GPU: NVIDIA GeForce RTX 4070 SUPER (Compute Capability 8.9)
    Driver Version: 560.xx, CUDA 12.8
    Vector Add completed in 0.5 ms

- Understand how Thrust abstracts GPU kernels into STL-like functions.
- Learn `transform` with zip iterators to apply operations on multiple device vectors.
- Practice building CUDA C++ projects with CMake on Linux/WSL2.
- Compare explicit CUDA kernel programming vs. Thrust abstractions.
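To make the "STL-like" point concrete, here is a host-only sketch of the same SAXPY written with the standard library's `std::transform`. It is not part of the project; `saxpy_host` is a hypothetical helper name, and the code runs on the CPU purely to show the interface that Thrust mirrors.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-only counterpart of the README's SAXPY: z[i] = a * x[i] + y[i].
// saxpy_host is a hypothetical name, not part of Thrust or this project.
std::vector<float> saxpy_host(float a,
                              const std::vector<float>& x,
                              const std::vector<float>& y) {
    assert(x.size() == y.size());  // both inputs must have the same length
    std::vector<float> z(x.size());
    // The binary form of std::transform walks x and y in lockstep,
    // the role that zip_iterator plays in the Thrust version.
    std::transform(x.begin(), x.end(), y.begin(), z.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
    return z;
}
```

Calling `saxpy_host(3.f, x, y)` with `x` filled with 1 and `y` filled with 2 gives 5 in every element, matching the GPU result. `thrust::transform` also has a two-input overload of the same shape, so the zip iterator in the example is one way to combine inputs rather than the only one; it becomes more useful when an operation needs three or more input ranges.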
- CUDA
- Thrust (NVIDIA)
- GPU computing
- Parallel programming
- C++ / CMake / WSL2
- Examples: transform · zip iterator · device vector
| Component | Recommended Version |
|---|---|
| CUDA Toolkit | 12.8+ |
| CMake | ≥ 3.24 |
| Compiler | nvcc with g++ as the host compiler (C++17) |
| GPU | RTX 4070 SUPER (SM 8.9) |
| OS | Windows 11 + WSL2 (Ubuntu 22.04) |
Launch configuration is in .vscode/launch.json:

    {
      "name": "Run thrust_intro",
      "type": "cppdbg",
      "request": "launch",
      "program": "${workspaceFolder}/build/thrust_intro",
      "cwd": "${workspaceFolder}",
      "MIMode": "gdb"
    }

Run or debug directly with F5 inside VS Code.
- CUDA by Example — Sanders & Kandrot
- NVIDIA CUDA Toolkit Docs
- Thrust Quick Start Guide
- CMake + CUDA Language Guide
MIT License © 2025 Samuel Huang (FlosMume)