Definition
Temperature is a parameter that controls the randomness of a language model’s output during text generation. Lower temperatures (e.g., 0.1) make outputs more deterministic and focused, while higher temperatures (e.g., 1.0+) increase diversity and creativity but may reduce coherence. It works by dividing the model’s logits by the temperature before the softmax converts them into the probability distribution from which the next token is sampled.
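As a minimal sketch of that scaling step (plain Python, no inference library; the logits are made-up values for four candidate tokens):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.2]                 # toy logits for four candidate tokens
print(softmax_with_temperature(logits, 0.5))  # sharper: probability piles onto the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: probability spreads toward rarer tokens
```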
Why it matters
Temperature is a critical control for AI application behavior:
- Precision vs. creativity — low for factual tasks, high for brainstorming
- Consistency — low temperatures produce reproducible outputs
- User experience — tuning temperature affects perceived intelligence
- Task optimization — different tasks need different temperature settings
- Safety — lower temperatures reduce unexpected or inappropriate outputs
The right temperature can transform model performance for specific use cases.
How it works
┌────────────────────────────────────────────────────────────┐
│ TEMPERATURE EFFECT │
├────────────────────────────────────────────────────────────┤
│ │
│ Original logits: [2.0, 1.0, 0.5, 0.2] │
│ (Before softmax) │
│ │
│ TEMPERATURE = 0.5 (More focused) │
│ ┌────────────────────────────────────────────────┐ │
│ │ Scaled: [4.0, 2.0, 1.0, 0.4] │ │
│ │ Probs: [82%, 11%, 4%, 2%] │ │
│ │ ████████████████████░░░░ │ │
│ │ Token A dominates │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ TEMPERATURE = 1.0 (Balanced) │
│ ┌────────────────────────────────────────────────┐ │
│ │ Scaled: [2.0, 1.0, 0.5, 0.2] │ │
│ │ Probs: [57%, 21%, 13%, 9%] │ │
│ │ ████████████░░░░░░░░ │ │
│ │ More diversity │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ TEMPERATURE = 2.0 (Creative/random) │
│ ┌────────────────────────────────────────────────┐ │
│ │ Scaled: [1.0, 0.5, 0.25, 0.1] │ │
│ │ Probs: [40%, 24%, 19%, 16%] │ │
│ │ ████████░░░░░░░░░░░░ │ │
│ │ Much flatter distribution │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ Formula: P(token) = softmax(logits / temperature) │
│ │
└────────────────────────────────────────────────────────────┘
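The percentages in the diagram follow directly from that formula; here is a short sketch that recomputes them (plain Python, same toy logits):

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.2]            # toy logits from the diagram

for temperature in (0.5, 1.0, 2.0):
    probs = softmax([x / temperature for x in logits])   # P(token) = softmax(logits / T)
    print(temperature, [f"{p:.0%}" for p in probs])

# Expected output (rounded):
# 0.5 ['82%', '11%', '4%', '2%']
# 1.0 ['57%', '21%', '13%', '9%']
# 2.0 ['40%', '24%', '19%', '16%']
```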
Recommended settings by task:
| Task | Temperature | Rationale |
|---|---|---|
| Code generation | 0.0-0.3 | Precision required |
| Factual Q&A | 0.1-0.5 | Accuracy over creativity |
| Conversation | 0.7-0.9 | Natural variation |
| Creative writing | 0.9-1.2 | Encourage novelty |
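As one way to apply these settings in practice, here is a sketch assuming the OpenAI Python SDK (v1-style client); the model name, task labels, and `generate` helper are illustrative placeholders, and most other provider APIs expose an equivalent `temperature` parameter.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK; other providers are similar

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Rough per-task defaults taken from the table above; tune against real examples.
TASK_TEMPERATURE = {
    "code": 0.2,
    "factual_qa": 0.3,
    "conversation": 0.8,
    "creative": 1.0,
}

def generate(prompt: str, task: str) -> str:
    """Send a single-turn request with the temperature suggested for the task."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=TASK_TEMPERATURE[task],
    )
    return response.choices[0].message.content

print(generate("Write a haiku about autumn.", task="creative"))
```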
Common questions
Q: What does temperature 0 mean?
A: Temperature 0 (or near-zero) makes the model always pick the highest-probability token—deterministic, greedy decoding. Useful when you need consistent, reproducible outputs.
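A small illustration of why the two coincide (plain Python, toy logits): greedy decoding takes the argmax directly, and as the temperature approaches zero the softmax puts essentially all probability on that same token.

```python
import math

logits = [2.0, 1.0, 0.5, 0.2]  # toy logits for four candidate tokens

# Greedy decoding: skip sampling and take the highest-scoring token.
greedy_choice = max(range(len(logits)), key=lambda i: logits[i])

# Very low temperature: the distribution collapses onto that same token.
t = 0.01
scaled = [x / t for x in logits]
m = max(scaled)
exps = [math.exp(x - m) for x in scaled]
total = sum(exps)
probs = [e / total for e in exps]

print(greedy_choice)                  # 0
print([round(p, 6) for p in probs])   # ~[1.0, 0.0, 0.0, 0.0]
```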
Q: Can temperature be greater than 1?
A: Yes. Values above 1 flatten the probability distribution, making rare tokens more likely. This increases creativity but risks incoherence. Generally stay below 1.5 for usable output.
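One way to see the flattening is to track the entropy of the resulting distribution, which rises toward the uniform maximum as temperature grows (plain Python, toy logits):

```python
import math

def temperature_probs(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits; higher means a flatter, more random distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.5, 0.2]
for t in (0.5, 1.0, 1.5, 2.0):
    probs = temperature_probs(logits, t)
    print(f"T={t}: rarest token {probs[-1]:.0%}, entropy {entropy(probs):.2f} bits")
# Entropy climbs toward 2 bits (uniform over four tokens) as T increases.
```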
Q: How does temperature interact with top-p?
A: They’re complementary. Temperature reshapes the probability distribution; top-p/top-k then restrict which tokens remain candidates before sampling. Using both gives fine-grained control. Many APIs apply temperature first, then top-p filtering.
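A sketch of that ordering (an assumed reference implementation, not any specific provider’s): scale by temperature, take the softmax, keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample.

```python
import math
import random

def sample_with_temperature_and_top_p(logits, temperature=0.8, top_p=0.9):
    # 1. Temperature: reshape the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Top-p (nucleus) filtering: keep the smallest set of top tokens
    #    whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 3. Renormalize over the kept tokens and sample one index.
    weights = [probs[i] / cumulative for i in kept]
    return random.choices(kept, weights=weights, k=1)[0]

print(sample_with_temperature_and_top_p([2.0, 1.0, 0.5, 0.2]))
```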
Q: What’s a good default temperature?
A: 0.7 is a common default—balanced between consistency and variation. Adjust based on your specific task requirements and test with real examples.
Related terms
- Top-p Sampling — nucleus sampling parameter
- Top-k Sampling — limits token candidates
- Inference — generation process where temperature applies
- LLM — models controlled by temperature