*Completely* depends on your laptop hardware, but generally:
- TabbyAPI (exllamav2/exllamav3)
- ik_llama.cpp, and its openai server
- kobold.cpp (or kobold.cpp-rocm, or croco.cpp, depending on your hardware)
- An MLX host with one of the new distillation quantizations
- Text-gen-web-ui (slow, but supports a lot of samplers and some exotic quantizations)
- SGLang (extremely fast for parallel calls, if that's what you want).
- Aphrodite Engine (lots of samplers, and fast at the expense of some VRAM usage).
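One nice thing about most of the backends above (TabbyAPI, ik_llama.cpp's server, SGLang, Aphrodite, and text-gen-web-ui with its API enabled) is that they expose an OpenAI-compatible `/v1/chat/completions` endpoint, so a single client works against all of them. A minimal stdlib-only sketch, assuming a server on `localhost:5000` and a placeholder model name (both are assumptions, adjust to your setup):

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt, temperature=0.7):
    """Return (url, payload) for an OpenAI-style chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return f"{base_url}/v1/chat/completions", payload

def chat(base_url, model, prompt):
    # POST the JSON payload and pull the first choice's message text out
    # of the standard OpenAI-shaped response.
    url, payload = build_chat_request(base_url, model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (needs a running server; URL and model name are placeholders):
# print(chat("http://localhost:5000", "my-model", "Hello!"))
```

Because the request shape is the same everywhere, you can swap backends just by changing the base URL, which makes it painless to benchmark a couple of them against each other.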
I use text-gen-web-ui at the moment only because TabbyAPI is a little broken with exllamav3 (which is utterly awesome for Qwen3); otherwise I'd almost always stick with TabbyAPI.
Tell me (vaguely) what your system has, and I can be more specific.