This page tracks the Cargo/Rust and CUDA features that are currently supported or planned
for future support, along with some information about how they could be supported.

Note that "Not supported" does not mean a feature will never be supported; it just means we
haven't gotten around to adding it yet.

Optimization levels behave mostly the same, because LLVM is still used for optimizations. The exception is that libnvvm optimizations are run at every level except no-opts, because NVVM only has -O0 and -O3.

| Feature Name | Support Level | Notes |
| --- | --- | --- |
| codegen-units | ✔️ | |
| LTO | ➖ | We load bitcode modules lazily using dependency graphs, which then form a single module optimized by libnvvm, so all the benefits of LTO are available without pre-libnvvm LTO being needed. |
| Closures | ✔️ | |
| Enums | ✔️ | |
| Loops | ✔️ | |
| If | ✔️ | |
| Match | ✔️ | |
| Proc Macros | ✔️ | |
| Try (`?`) | ✔️ | |
| 128 bit integers | 🟨 | Basic ops should work (and are emulated); advanced intrinsics like ctpop, rotate, etc. are unsupported. |
| Unions | ✔️ | |
| Iterators | ✔️ | |
| Dynamic Dispatch | ✔️ | |
| Pointer Casts | ✔️ | |
| Unsized Slices | ✔️ | |
| Alloc | ✔️ | |
| Printing | ✔️ | |
| Panicking | ✔️ | Currently just traps (aborts) because of weird printing failures in the panic handler. |
| Float Ops | ✔️ | Maps to libdevice intrinsics. Calls to libm are not intercepted, though, which we may want to do in the future. |
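
As a rough illustration of the Rust features listed above, the sketch below combines enums, `match`, closures, iterators, dynamic dispatch, `?`, and basic 128-bit arithmetic. It is plain Rust that runs on the host and uses no Rust-CUDA API. All names (`Shape`, `Area`, `total_area`, `checked_scale`, `mul_wide`) are hypothetical, and whether this exact code compiles unchanged for the GPU target is an assumption, not something this page states.

```rust
// Plain host-side Rust; nothing here is part of any Rust-CUDA crate.

/// Enum carrying data, consumed with `match`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shape {
    Circle { radius: f32 },
    Rect { w: f32, h: f32 },
}

/// Trait used through `&dyn Area` to exercise dynamic dispatch.
pub trait Area {
    fn area(&self) -> f32;
}

impl Area for Shape {
    fn area(&self) -> f32 {
        match *self {
            Shape::Circle { radius } => std::f32::consts::PI * radius * radius,
            Shape::Rect { w, h } => w * h,
        }
    }
}

/// Sums scaled areas using trait objects, an iterator chain, and a closure.
pub fn total_area(shapes: &[&dyn Area], scale: f32) -> f32 {
    let scaled = |a: f32| a * scale; // closure capturing `scale`
    shapes.iter().map(|s| scaled(s.area())).sum()
}

/// Uses `?` on `Option`: yields `None` for an out-of-bounds index,
/// and `None` if the multiplication overflows.
pub fn checked_scale(values: &[u32], index: usize, factor: u32) -> Option<u32> {
    let v = values.get(index)?;
    v.checked_mul(factor)
}

/// Restricts itself to basic 128-bit ops (widening multiply, shift, cast),
/// avoiding intrinsics such as ctpop or rotate that the table marks unsupported.
pub fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128);
    ((wide >> 64) as u64, wide as u64) // (high half, low half)
}
```

For example, `mul_wide(u64::MAX, 2)` returns the high and low halves of the 128-bit product, and `checked_scale(&[1, 2, 3], 5, 10)` returns `None` because the index is out of bounds.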

Note: Most of these categories are used very rarely in CUDA code, so do not be alarmed
if it seems like many things are not supported. We focus on the things used by the
vast majority of users.

| Feature Name | Support Level | Notes |
| --- | --- | --- |
| Function Execution Space Specifiers | ➖ | |
| Variable Memory Space Specifiers | ✔️ | Handled implicitly, but can be stated explicitly for statics with `#[address_space(...)]`. |
| Built-in Vector Types | ➖ | Use linear algebra libraries like vek or glam. |
| Built-in Variables | ✔️ | |
| Memory Fence Instructions | ✔️ | |
| Synchronization Functions | ✔️ | |
| Mathematical Functions | 🟨 | Less common functions like native f16 math are not supported. |
| Texture Functions | ❌ | |
| Surface Functions | ❌ | |
| Read-Only Data Cache Load Function | ❌ | No real need; immutable references hint this automatically. |
| Load Functions Using Cache Hints | ❌ | |
| Store Functions Using Cache Hints | ❌ | |
| Time Function | ✔️ | |
| Atomic Functions | ❌ | |
| Address Space Predicate Functions | ✔️ | Address spaces are handled implicitly, but they may be added for exotic interop with CUDA C/C++. |
| Address Space Conversion Functions | ✔️ | |
| Alloca Function | ➖ | |
| Compiler Optimization Hint Functions | ➖ | Existing core hints work. |
| Warp Vote Functions | ❌ | |
| Warp Match Functions | ❌ | |
| Warp Reduce Functions | ❌ | |
| Warp Shuffle Functions | ❌ | |
| Nanosleep | ✔️ | |
| Warp Matrix Functions (Tensor Cores) | ❌ | |
| Asynchronous Barrier | ❌ | |
| Asynchronous Data Copies | ❌ | |
| Profiler Counter Function | ✔️ | |
| Assertion | ✔️ | |
| Trap Function | ✔️ | |
| Breakpoint | ✔️ | |
| Formatted Output | ✔️ | |
| Dynamic Global Memory Allocation | ✔️ | |
| Execution Configuration | ✔️ | |
| Launch Bounds | ❌ | |
| Pragma Unroll | ❌ | |
| SIMD Video Instructions | ❌ | |
| Cooperative Groups | ❌ | |
| Dynamic Parallelism | ❌ | |
| Stream Ordered Memory | ✔️ | |
| Graph Memory Nodes | ❌ | |
| Unified Memory | ✔️ | |
| `__restrict__` | ➖ | Not needed; you get that performance boost automatically through Rust's noalias :) |
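
To make the notes on `__restrict__` and read-only loads more concrete, here is a minimal sketch of how Rust's reference rules express the same non-aliasing guarantee. It is plain host-side Rust with a hypothetical function name (`saxpy`), not a Rust-CUDA kernel. The exact attributes and instructions the compiler emits for it are not shown here and should be treated as an assumption.

```rust
/// Computes y[i] = a * x[i] + y[i] for each pair of elements.
///
/// `x` is a shared (immutable) reference and `y` is a unique (mutable) one,
/// so the borrow checker guarantees the two slices do not overlap. In CUDA C++
/// the same promise would be written by hand as `const float* __restrict__ x`
/// and `float* __restrict__ y`.
pub fn saxpy(a: f32, x: &[f32], y: &mut [f32]) {
    // `zip` stops at the shorter slice, so mismatched lengths cannot go out of bounds.
    for (yi, xi) in y.iter_mut().zip(x.iter()) {
        *yi = a * *xi + *yi;
    }
}
```

Calling `saxpy(2.0, &x, &mut y)` with the same buffer passed as both `x` and `y` is rejected at compile time, which is what lets the non-aliasing information be passed on to the optimizer without any annotation.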