#quantization

2 articles tagged with “quantization”

debugging · May 27, 2026

Why your quantized LLM loses its MTP heads and how to keep them

Quantizing a model with multi-token prediction heads? Here's why standard conversion pipelines drop them silently, and how to preserve and calibrate them.

machinelearning llm python

comparison · April 18, 2026

Traditional Quantization vs 1.58-Bit Ternary Models: A Practical Comparison

Comparing traditional 4-bit/8-bit quantization (GPTQ, GGUF, AWQ) with 1.58-bit ternary models. Practical code examples and honest tradeoffs.

machinelearning llm quantization