Skip to content

Data Types (DType)

usls supports multiple precision levels to balance accuracy, speed, and memory usage.

Quick Reference

DType Precision Speed Memory Best For
Fp32 32-bit float Baseline 100% Maximum accuracy
Fp16 16-bit float 2-3x faster 50% Modern GPUs
Q8 8-bit quantized Fast 25% Edge deployment
Q4F16 4-bit + FP16 Fast ~15% VLMs, limited memory
Bnb4 BitsAndBytes 4-bit Fast ~12% Ultra-low memory

Configuration

Global (All Modules)

Example

let config = Config::sam3()
    .with_dtype_all(DType::Fp16)
    .commit()?;

Per-Module

Example

let config = Config::clip()
    // Fast visual encoding
    .with_visual_dtype(DType::Fp16)
    // Accurate text encoding
    .with_textual_dtype(DType::Fp32)
    .commit()?;

TensorRT Note

TensorRT FP32 Behavior

TensorRT automatically converts FP32 to FP16 for performance. Use --dtype fp32 with TensorRT for optimal speed.

TensorRT-RTX preserves the input precision exactly.