Tag: llm

All the articles with the tag "llm".

Give Your AI Agent a Cheap Intern

16 Jun, 2026

Stop burning expensive AI tokens on boring grunt work. The overseer/workhorse pattern routes mechanical tasks to a cheap model and saves more than you'd think.

Claude Code + SearXNG: Private Web Search

15 Jun, 2026

Wire a self-hosted SearXNG instance into Claude Code via a Bash wrapper for private, scriptable web search, and learn when to use it vs the built-in tool.

Dify: Visual Agent Workflows

9 Jun, 2026

Dify is an open-source LLM-app builder you can self-host. It gives you a visual workflow editor, RAG, agents, and tool use, without writing 500 lines of LangChain glue.

CUDA vs ROCm vs CPU: Running AI on Whatever GPU You've Got

26 Aug, 2025 · Updated: 9 Jun, 2026

CUDA vs ROCm for AI on Linux: NVIDIA's easy path, AMD's emotional journey, and why CPU inference isn't dead yet. Real Docker setups included.

Exploring the Diverse World of LLM Models

24 Apr, 2024 · Updated: 9 Jun, 2026

LLaMA, Mistral, Falcon, GPT: the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.

Key Parameters of Large Language Models

15 Jul, 2024 · Updated: 9 Jun, 2026

Temperature, top-p, top-k, context length: LLM inference parameters explained so you stop guessing why the model gives weird output.

LangGraph vs CrewAI vs AutoGen: AI Agent Frameworks for Mere Mortals

22 Nov, 2025 · Updated: 9 Jun, 2026

Confused by AI agent frameworks? Compare LangGraph, CrewAI, and AutoGen with real Python examples, a no-nonsense breakdown, and zero hype. Pick the right one.

Large Language Model Formats and Quantization

29 Apr, 2024 · Updated: 9 Jun, 2026

GGUF, GGML, AWQ, GPTQ: LLM file formats and quantization levels explained, with the trade-offs between model quality, size, and inference speed.

LiteLLM & vLLM: One API to Rule All Your Models

25 Feb, 2026 · Updated: 9 Jun, 2026

LiteLLM proxies every LLM, local or cloud, behind one OpenAI-compatible endpoint. Pair it with vLLM for GPU-backed serving and ditch the SDK sprawl.

Local Vision LLMs Worth Running in 2026

5 Jun, 2026 · Updated: 9 Jun, 2026

Pixtral, Qwen3-VL, and Gemma 4 compared for local multimodal use in 2026. LLaVA is dead; here's what to run in Ollama for OCR, screenshots, and vision tasks.

Ollama Beyond the Basics: Model Management, Custom Models, and Optimization

26 Sep, 2025 · Updated: 9 Jun, 2026

Master Ollama with Modelfiles, GPU tuning, API usage, and performance tricks. Stop running 70B models on 8GB VRAM and wondering why everything is slow.

Ollama Memory Management: Why Models Keep Loading

22 Jan, 2026 · Updated: 9 Jun, 2026

Ollama keeps models in VRAM after every request. Control GPU usage with keep_alive, force-unload via the API, and check memory to stop the reload cycle.