Topic

AI & LLMs

Models you can run on your own hardware, prompt patterns that ship, agent frameworks that don't catch fire, and the awkward questions nobody answers in the breathless launch posts. Ollama, vLLM, llama.cpp, LocalAI, plus the quieter stuff: embeddings, RAG, evals, and figuring out when the cloud API is actually the right answer. If you'd rather understand the trade-offs than chase benchmarks, you'll feel at home here.

78 articles in this topic.

Featured posts

RAGAS: Evaluating RAG Without Vibes

Stop guessing if your RAG pipeline works. RAGAS gives you reproducible metrics: faithfulness, answer relevance, context precision and recall.

28 Jul, 2026
9 min read

KV Cache Quantization: Free LLM Context, Almost

At long context, it's the KV cache, not the weights, that eats your VRAM. Quantizing the KV cache to 8 or 4 bits (q8_0/q4_0 in llama.cpp, FP8 in vLLM) cuts it 2-4x with almost no quality hit.

25 Jul, 2026
9 min read

Aider & Cline: Terminal AI Coding That Actually Ships

Aider vs Cline: two agentic coding tools that go beyond autocomplete. Which one ships cleaner work?

22 Jul, 2026
9 min read

Mixture of Experts (MoE) for Self-Hosters, Demystified

MoE LLMs like Mixtral and DeepSeek-V3 deliver the quality of much larger dense models while activating only a fraction of their parameters per token. Here's how sparse activation works and how to run it at home.

19 Jul, 2026
10 min read

Python Libraries Worth Your Time in 2026

The Python libraries actually worth your time in 2026: data tools, ML frameworks, and CLI tooling, plus which ones earn a permanent spot in your home lab.

19 Jul, 2026
13 min read

Speculative Decoding: Faster LLMs With a Tiny Sidekick

Speculative decoding, Gemma 4 MTP, and DeepSeek DSpark all make LLMs 2-6x faster without changing the output. How each works, and which to use for local inference vs. production serving.

15 Jul, 2026
13 min read
