LLM Temperature and top_p Explained Without the Math

By SumGuy 5 min read

The Intuition (Skip the Formulas)

When an LLM generates text, it doesn’t just pick the most likely next word. At each step, it has a probability distribution over possible next words.

Most of the time, the top word is way more likely than the others. But sometimes, you want variety. That’s where temperature and top_p come in.

Temperature = “How boring or creative is the model?”

top_p = “How much of the probability distribution do I care about?”

Both control the randomness of that pick, just from different angles.

Temperature: The Knob Everyone Knows

Temperature ranges from 0 to ~2 (though values above 1 are rare).

Temperature = 0
├─ Always picks the most likely word (deterministic)
└─ Output: Repetitive, predictable, sometimes dull
Temperature = 0.7 (a common conservative setting; many APIs default to 1.0)
├─ Picks likely words, but allows some variation
└─ Output: Natural, conversational, slight randomness
Temperature = 1.2
├─ Flattens the distribution, so unlikely words get a real chance
└─ Output: Creative but sometimes nonsensical
Temperature = 2.0
├─ Chaos — essentially random
└─ Output: Gibberish
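The ladder above can be made concrete with a toy sketch. Real models divide the raw logits by the temperature before softmax, but raising probabilities to the power 1/T and renormalizing is mathematically the same thing. The word probabilities here are made up for illustration:

```python
def apply_temperature(probs, temperature):
    """Reshape a probability distribution by temperature.
    Values below 1 sharpen it; values above 1 flatten it."""
    if temperature == 0:
        # Degenerate case: all mass on the single most likely word
        best = max(probs, key=probs.get)
        return {w: (1.0 if w == best else 0.0) for w in probs}
    scaled = {w: p ** (1 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

probs = {"the": 0.5, "a": 0.3, "some": 0.2}
print(apply_temperature(probs, 0.5))  # sharper: "the" pulls further ahead
print(apply_temperature(probs, 1.5))  # flatter: the gaps between words shrink
```

Run it and you can watch the "boring vs. creative" dial in action: at 0.5 the top word dominates even more, at 1.5 the alternatives close the gap.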

Practical Examples

Task: Code generation

# Temperature = 0
# Output: Always the same code pattern, might miss better solutions
# Temperature = 0.5
# Output: Same structure, slightly varied variable names

Task: Creative writing

# Temperature = 0
# Output: "The sun rose over the hills. It was a sunny day."
# (Boring, but coherent)
# Temperature = 0.9
# Output: "The sun erupted like liquid gold, shattering shadows."
# (More interesting, still coherent)
# Temperature = 1.5
# Output: "Sun purple-blazed quantum elephant mountains."
# (Creative but nonsensical)

Rule of thumb: low temperature (0 to 0.3) when correctness matters (code, facts, extraction); mid-range (0.5 to 0.8) for natural conversation; high (0.9 and up) only when you want surprises and can tolerate the occasional quantum elephant.

top_p: The Advanced Knob

top_p (nucleus sampling) filters the probability distribution differently.

Instead of tweaking how sharp/flat the distribution is, top_p says: “Include the most likely words until you’ve covered P% of the probability mass.”

Imagine the model predicts:
"the" — 30% likely
"a" — 25% likely
"some" — 20% likely
"an" — 15% likely
"one" — 5% likely
"that" — 3% likely
"something" — 2% likely
...
top_p = 0.9 (cover 90% of probability)
├─ Include: "the", "a", "some", "an" (90% total)
└─ Exclude: "one", "that", "something"...
top_p = 0.5 (cover 50% of probability)
├─ Include: "the", "a" (55% total)
└─ Exclude: "some", "an", etc.
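The cutoff above is simple enough to sketch in a few lines of Python, using the same made-up word probabilities as the example:

```python
def top_p_filter(probs, p):
    """Keep the most likely words until their cumulative
    probability reaches p, then renormalize what's left."""
    kept, cumulative = {}, 0.0
    for word, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[word] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {w: pr / total for w, pr in kept.items()}

probs = {"the": 0.30, "a": 0.25, "some": 0.20, "an": 0.15,
         "one": 0.05, "that": 0.03, "something": 0.02}
print(top_p_filter(probs, 0.9))  # keeps "the", "a", "some", "an"
print(top_p_filter(probs, 0.5))  # keeps "the", "a"
```

Note the renormalization step: after the tail is cut off, the surviving words split the full 100% between them, so the model still samples from a proper distribution.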

Temperature vs. top_p: When to Use Each

Temperature: the first dial to reach for. It reshapes the whole distribution, so it controls overall randomness in one number.

top_p: a guardrail. It trims the unlikely tail without reshaping the likely words, which makes it good at preventing rare-word nonsense.

Common patterns:

Analytical (code, facts):
temperature = 0.2
top_p = 0.9
Conversational (chatbots):
temperature = 0.7
top_p = 0.95
Creative (brainstorming):
temperature = 0.95
top_p = 0.9

Interestingly, using both together often works better than either alone: top_p trims the garbage tail, while temperature adds variation among the words that survive the cut.

How to Set Them in Practice

Ollama (note that sampling parameters go inside an "options" object; top-level values are ignored):

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain quantum computing",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9
  },
  "stream": false
}'

Python (with Claude API):

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    temperature=0.8,
    top_p=0.95,
    messages=[
        {"role": "user", "content": "Write a creative story about robots"}
    ],
)

The Testing Workflow

Don’t guess. Test with your actual use case.

#!/bin/bash
temps=(0.3 0.7 1.0)
ps=(0.8 0.9 0.95)
prompt="Write a short poem about code"

for temp in "${temps[@]}"; do
  for p in "${ps[@]}"; do
    echo "=== Temperature: $temp, top_p: $p ==="
    curl -s http://localhost:11434/api/generate -d "{
      \"model\": \"mistral\",
      \"prompt\": \"$prompt\",
      \"options\": {\"temperature\": $temp, \"top_p\": $p},
      \"stream\": false
    }" | python3 -c "import sys, json; print(json.load(sys.stdin)['response'])"
    echo ""
  done
done

Run this, see which combo produces output you like, then hardcode those values.

The Gotcha: Interaction Effects

Temperature and top_p interact in non-obvious ways.

temperature = 0 (deterministic)
+ top_p = 0.5 (filter tail words)
= Same result as temperature = 0 alone (top_p doesn't matter)
temperature = 1.0 (distribution unchanged)
+ top_p = 0.5 (cover 50% of probability)
= Only the words making up the top 50% of probability stay in play

High temperature + low top_p pull in opposite directions: temperature opens up the tail, then top_p clips most of it back off. Test before shipping if you use both.
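A toy sketch makes the interaction visible. This assumes temperature is applied before the top_p cutoff (a common ordering, though implementations vary), and asks one question: which words remain eligible for sampling?

```python
def sample_pool(probs, temperature, top_p):
    """Apply temperature, then the top_p cutoff, and return
    the set of words still eligible for sampling."""
    if temperature == 0:
        return {max(probs, key=probs.get)}  # greedy: top_p is irrelevant
    scaled = {w: p ** (1 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    scaled = {w: s / total for w, s in scaled.items()}
    pool, cumulative = set(), 0.0
    for word, prob in sorted(scaled.items(), key=lambda kv: -kv[1]):
        pool.add(word)
        cumulative += prob
        if cumulative >= top_p:
            break
    return pool

probs = {"the": 0.30, "a": 0.25, "some": 0.20, "an": 0.15, "one": 0.10}
print(sample_pool(probs, 0.0, 0.5))  # {'the'}: top_p never kicks in
print(sample_pool(probs, 0.3, 0.5))  # {'the'}: sharpened until one word covers 50%
print(sample_pool(probs, 1.0, 0.5))  # {'the', 'a'}
```

The first two calls show why temperature = 0 makes top_p moot, and why a very low temperature can shrink the pool to one word even with a generous top_p.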

Real-World Advice

Don’t overthink it. Most people optimize for the wrong thing. Spend time on your prompt first, then tweak these dials if you need to.

