
Why your quantized LLM loses its MTP heads and how to keep them
Quantizing a model with multi-token prediction heads? Here's why standard conversion pipelines drop them silently, and how to preserve and calibrate them.
Thoughts on authentication, developer tools, and building secure applications.

Geo-restrictions look simple until you ship them. Here's how to build jurisdiction-based access controls that survive VPNs, mobile carriers, and CDN caching.

An honest comparison of Plausible, Fathom, and self-hosted Umami after migrating four production projects off Google Analytics 4.

How to detect when your servers have been compromised into attack infrastructure, with a step-by-step debugging walkthrough using ss, auditd, and nftables.

Autonomous coding agents love getting stuck in tool-use loops. Here's why it happens and four concrete fixes that stop the bleeding.

MySQL's 20-year-old view subquery restriction (Bug #11472) finally has a reported fix. Here's how to refactor views with CTEs and nested views today.

Multitrack audio playback in the browser drifts because <audio> elements don't share a clock. Here's how to use the Web Audio API to fix it.

LLM coding agents quietly drop constraints as tasks get longer. Here's why it happens and a concrete pattern for keeping back-end code generation honest.

Why AI agents lose context across multi-step tool calls, and a concrete scratchpad pattern to fix it, with code examples.

Native partial DOM updates are surprisingly hard. Here's why libraries like HTMX exist, what Chrome is reportedly exploring, and how to handle it cleanly today.

A practical comparison of Umami, Plausible, and Fathom for teams migrating off Google Analytics, with code examples and self-hosting notes.

Self-hosted ebook servers often break past 50k books. Here's why the database is usually the bottleneck and how to fix indexing, search, and metadata at scale.

Your ML inference isn't slow because of compute — it's memory-bound. Here's how to diagnose it with profilers and fix it with kernel fusion and quantization.