🧩 Vision-Language Models (VLM)
Status: ✅ Supported | ❓ Unknown | ❌ Not Supported For Now
| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| BLIP | Image Captioning | demo | ✅ | ❓ | ✅ | ❓ | ❌ | ❌ | ❌ |
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Moondream2 | Open-Set Object Detection, Open-Set Keypoints Detection, Image Captioning, Visual Question Answering | demo | ✅ | ❓ | ❌ | ❌ | ✅ | ✅ | ❌ |
| SmolVLM | Visual Question Answering | demo | ✅ | ❓ | ✅ | ❓ | ❓ | ❓ | ❓ |
| SmolVLM2 | Visual Question Answering | demo | ✅ | ❓ | ✅ | ❓ | ❓ | ❓ | ❓ |
| FastVLM | Vision Language Models | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |