Execution Providers

Execution Providers (EPs) enable hardware-accelerated inference. All EPs follow the same configuration pattern: with_<module>_<ep>_<option>() for a single module, or with_<ep>_<option>_all() for every module (see Configuration Patterns at the end of this page).

Quick Reference

| Provider | Feature Flag | Device | Best For |
|----------|--------------|--------|----------|
| TensorRT | tensorrt | Device::TensorRT(id) | NVIDIA GPUs (fastest) |
| TensorRT-RTX | nvrtx | Device::NvTensorRT(id) | RTX GPUs |
| CUDA | cuda | Device::Cuda(id) | NVIDIA GPUs |
| CoreML | coreml | Device::CoreML | Apple Silicon |
| OpenVINO | openvino | Device::OpenVINO(target) | Intel CPUs/GPUs |
| DirectML | directml | Device::DirectML(id) | Windows |
| MIGraphX | migraphx | Device::MIGraphX | AMD GPUs |
| CANN | cann | Device::CANN(id) | Huawei Ascend |
| oneDNN | onednn | Device::OneDNN | Intel CPUs |
| NNAPI | nnapi | Device::NNAPI | Android |
| ARM NN | armnn | Device::ArmNN | ARM devices |
| WebGPU | webgpu | Device::WebGPU | Browsers |

TensorRT

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| fp16 | bool | true | Enable FP16 precision |
| engine_cache | bool | true | Cache compiled engines |
| timing_cache | bool | false | Cache timing profiles |
| builder_optimization_level | u8 | 3 | Builder optimization level (0-5) |
| max_workspace_size | usize | 1073741824 | Max workspace in bytes (1 GB) |
| min_subgraph_size | usize | 1 | Minimum subgraph node count |
| dump_ep_context_model | bool | false | Dump EP context model |
| dump_subgraphs | bool | false | Dump subgraphs |

Example

Config::default()
    .with_model_device(Device::TensorRT(0))
    .with_model_tensorrt_fp16(true)
    .with_model_tensorrt_engine_cache(true)
    .with_model_tensorrt_builder_optimization_level(3)
    .commit()?;

First Run Slow

TensorRT compiles an optimized engine for your model on the first run, which can be slow. Enable engine_cache so subsequent loads reuse the compiled engine and start almost instantly.
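
Following the naming pattern above, the timing_cache option from the table can be enabled alongside engine_cache; a minimal sketch:

Config::default()
    .with_model_device(Device::TensorRT(0))
    // Reuse compiled engines across process restarts
    .with_model_tensorrt_engine_cache(true)
    // Cache layer timing profiles so future engine builds finish faster
    .with_model_tensorrt_timing_cache(true)
    .commit()?;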

Dynamic Shapes

usls configures dynamic shapes with (min, opt, max) triples, closely mirroring trtexec.

trtexec example:

Example

trtexec --fp16 --onnx=your_model.onnx \
    --minShapes=images:1x3x416x416 \
    --optShapes=images:1x3x640x640 \
    --maxShapes=images:8x3x800x800 \
    --saveEngine=your_model.engine

Equivalent usls configuration, where with_model_ixx(input_index, dim_index, constraint) sets a dimension to either a fixed size or a (min, opt, max) range:

Example

Config::default()
    .with_model_ixx(0, 0, (1, 1, 8))        // batch: min=1, opt=1, max=8
    .with_model_ixx(0, 1, 3)                // channels: fixed at 3
    .with_model_ixx(0, 2, (416, 640, 800))  // height: min/opt/max
    .with_model_ixx(0, 3, (416, 640, 800))  // width: min/opt/max
    .commit()?;

TensorRT-RTX

Supports the same options as TensorRT, but preserves the model's input precision (no automatic FP32→FP16 conversion).

Example

Config::default()
    .with_model_device(Device::NvTensorRT(0))
    .commit()?;

TensorRT vs TensorRT-RTX

  • TensorRT EP: automatically converts FP32→FP16 when fp16 is enabled; feed FP32 inputs (--dtype fp32) for optimal performance.
  • TensorRT-RTX EP: preserves input precision; no automatic conversion is performed.

CUDA

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| cuda_graph | bool | false | Enable CUDA graph capture |
| fuse_conv_bias | bool | false | Fuse conv + bias for performance |
| conv_max_workspace | bool | true | Max workspace for conv algorithm search |
| tf32 | bool | true | Enable TF32 on Ampere+ |
| prefer_nhwc | bool | true | Prefer NHWC layout |

Example

Config::default()
    .with_model_device(Device::Cuda(0))
    .with_model_cuda_cuda_graph(true)
    .with_model_cuda_tf32(true)
    .commit()?;
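
CUDA graph capture replays a recorded kernel sequence, so it generally requires input shapes that stay fixed across runs; a sketch that pins dimensions with with_model_ixx (see Dynamic Shapes above) while enabling capture:

Config::default()
    .with_model_device(Device::Cuda(0))
    // Graph capture replays a fixed kernel sequence; keep shapes static
    .with_model_cuda_cuda_graph(true)
    .with_model_ixx(0, 0, 1)    // batch fixed at 1
    .with_model_ixx(0, 2, 640)  // height fixed at 640
    .with_model_ixx(0, 3, 640)  // width fixed at 640
    .commit()?;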

CoreML (Apple)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| static_input_shapes | bool | true | Static shapes for optimization |
| subgraph_running | bool | true | Enable subgraph mode |
| model_format | u8 | 0 | 0=MLProgram, 1=NeuralNetwork |
| compute_units | u8 | 0 | 0=All, 1=CPUAndGPU, 2=CPUAndNeuralEngine, 3=CPUOnly |
| specialization_strategy | u8 | 1 | 0=Default, 1=FastPrediction, 2=FastCompilation |

Example

Config::default()
    .with_model_device(Device::CoreML)
    .with_model_coreml_static_input_shapes(true)
    .with_model_coreml_compute_units(0)
    .commit()?;
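
To bias execution toward the Apple Neural Engine, the compute_units codes from the table can be combined with the model format; a sketch (method names follow the naming pattern and option table above):

Config::default()
    .with_model_device(Device::CoreML)
    // 2 = CPUAndNeuralEngine: skip the GPU and prefer the ANE
    .with_model_coreml_compute_units(2)
    // 0 = MLProgram, the newer CoreML model format
    .with_model_coreml_model_format(0)
    .commit()?;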

OpenVINO (Intel)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| dynamic_shapes | bool | true | Enable dynamic shapes |
| opencl_throttling | bool | true | Enable OpenCL throttling |
| qdq_optimizer | bool | true | Enable QDQ optimizer |
| num_threads | usize | 8 | Number of threads |

Example

// CPU target
Config::default()
    .with_model_device(Device::OpenVINO("CPU".to_string()))
    .with_model_openvino_num_threads(8)
    .commit()?;

// GPU target
Config::default()
    .with_model_device(Device::OpenVINO("GPU".to_string()))
    .commit()?;

Dynamic Loading

Some platforms require loading ONNX Runtime dynamically: cargo run -F openvino -F ort-load-dynamic
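
With ort-load-dynamic, the underlying ort crate resolves the ONNX Runtime shared library at startup; if it is not on the default search path, the ORT_DYLIB_PATH environment variable can point to a specific build (the library path below is illustrative):

ORT_DYLIB_PATH=/path/to/libonnxruntime.so cargo run -F openvino -F ort-load-dynamic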


oneDNN (Intel)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| arena_allocator | bool | true | Enable arena allocator |

Example

Config::default()
    .with_model_device(Device::OneDNN)
    .with_model_onednn_arena_allocator(true)
    .commit()?;

CANN (Huawei)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| graph_inference | bool | true | Enable graph inference |
| dump_graphs | bool | false | Dump graphs for debugging |
| dump_om_model | bool | true | Dump OM model |

Example

Config::default()
    .with_model_device(Device::CANN(0))
    .with_model_cann_graph_inference(true)
    .commit()?;

MIGraphX (AMD)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| fp16 | bool | true | Enable FP16 precision |
| exhaustive_tune | bool | false | Exhaustive tuning |

Example

Config::default()
    .with_model_device(Device::MIGraphX)
    .with_model_migraphx_fp16(true)
    .commit()?;

NNAPI (Android)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| cpu_only | bool | false | Force CPU-only execution |
| disable_cpu | bool | false | Disable NNAPI's CPU fallback |
| fp16 | bool | true | Enable FP16 precision |
| nchw | bool | false | Use NCHW layout |

Example

Config::default()
    .with_model_device(Device::NNAPI)
    .with_model_nnapi_fp16(true)
    .commit()?;
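
If NNAPI's CPU reference implementation is undesirable (for example, to ensure nodes either run on an accelerator or fall back to ONNX Runtime's own CPU execution provider), disable_cpu can be set; a minimal sketch:

Config::default()
    .with_model_device(Device::NNAPI)
    // Keep NNAPI off its slow nnapi-reference CPU path;
    // unsupported nodes fall back to ORT's own CPU EP instead
    .with_model_nnapi_disable_cpu(true)
    .commit()?;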

ARM NN

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| arena_allocator | bool | true | Enable arena allocator |

Example

Config::default()
    .with_model_device(Device::ArmNN)
    .with_model_armnn_arena_allocator(true)
    .commit()?;

WebGPU

No configurable parameters currently.

Example

Config::default()
    .with_model_device(Device::WebGPU)
    .commit()?;

CPU

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| arena_allocator | bool | true | Enable arena allocator |

Example

Config::default()
    .with_model_device(Device::Cpu)
    .with_model_cpu_arena_allocator(true)
    .commit()?;

Configuration Patterns

| Pattern | Method | Scope |
|---------|--------|-------|
| Per-module | with_model_<ep>_<option>() | Single module |
| Global | with_<ep>_<option>_all() | All modules |
| Explicit | with_<ep>_<option>_module(Module, value) | Specific module |

Example

Config::default()
    // TensorRT FP16 for model module only
    .with_model_tensorrt_fp16(true)

    // CoreML static shapes for all modules
    .with_coreml_static_input_shapes_all(true)

    // Explicit module specification
    .with_tensorrt_fp16_module(Module::Visual, true)
    .commit()?;