AI & Machine Learning

Temperature

A parameter controlling the randomness of language model outputs, affecting creativity versus consistency.

Also known as: Sampling temperature, Model temperature, Generation temperature

Definition

Temperature is a parameter that controls the randomness of a language model’s output during text generation. Lower temperatures (e.g., 0.1) make outputs more deterministic and focused, while higher temperatures (e.g., 1.0+) increase diversity and creativity but may reduce coherence. It works by dividing the logits by the temperature before the softmax, which reshapes the probability distribution from which the next token is sampled.
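
In practice, temperature is set per request rather than baked into the model. A minimal sketch, assuming the official OpenAI Python SDK (most other providers expose a similarly named parameter); the model name and prompts are placeholders:

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  # Low temperature: focused, near-deterministic output for a factual task
  factual = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model name
      messages=[{"role": "user", "content": "List the steps to rotate an API key."}],
      temperature=0.2,
  )

  # High temperature: more varied output for a brainstorming task
  creative = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Suggest ten names for a sleep-tracking app."}],
      temperature=1.1,
  )
  print(factual.choices[0].message.content)
  print(creative.choices[0].message.content)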

Why it matters

Temperature is a critical control for AI application behavior:

  • Precision vs. creativity — low for factual tasks, high for brainstorming
  • Consistency — low temperatures produce reproducible outputs
  • User experience — tuning temperature affects perceived intelligence
  • Task optimization — different tasks need different temperature settings
  • Safety — lower temperatures reduce unexpected or inappropriate outputs

The right temperature can transform model performance for specific use cases.

How it works

┌────────────────────────────────────────────────────────────┐
│                   TEMPERATURE EFFECT                       │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Original logits: [2.0, 1.0, 0.5, 0.2]                    │
│  (Before softmax)                                          │
│                                                            │
│  TEMPERATURE = 0.5 (More focused)                          │
│  ┌────────────────────────────────────────────────┐        │
│  │  Scaled: [4.0, 2.0, 1.0, 0.4]                  │        │
│  │  Probs:  [82%, 11%, 4%, 2%]                    │        │
│  │  ████████████████████░░░░                      │        │
│  │  Token A dominates                             │        │
│  └────────────────────────────────────────────────┘        │
│                                                            │
│  TEMPERATURE = 1.0 (Balanced)                              │
│  ┌────────────────────────────────────────────────┐        │
│  │  Scaled: [2.0, 1.0, 0.5, 0.2]                  │        │
│  │  Probs:  [57%, 21%, 13%, 9%]                   │        │
│  │  ████████████░░░░░░░░                          │        │
│  │  More diversity                                │        │
│  └────────────────────────────────────────────────┘        │
│                                                            │
│  TEMPERATURE = 2.0 (Creative/random)                       │
│  ┌────────────────────────────────────────────────┐        │
│  │  Scaled: [1.0, 0.5, 0.25, 0.1]                 │        │
│  │  Probs:  [40%, 24%, 19%, 16%]                  │        │
│  │  ████████░░░░░░░░░░░░                          │        │
│  │  Near-uniform distribution                     │        │
│  └────────────────────────────────────────────────┘        │
│                                                            │
│  Formula: P(token) = softmax(logits / temperature)         │
│                                                            │
└────────────────────────────────────────────────────────────┘
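
The numbers in the diagram can be reproduced by applying the formula directly to the example logits. This sketch uses only the Python standard library:

  import math

  def apply_temperature(logits, temperature):
      """Divide logits by the temperature, then softmax into probabilities."""
      scaled = [l / temperature for l in logits]
      exps = [math.exp(s) for s in scaled]
      total = sum(exps)
      return [e / total for e in exps]

  logits = [2.0, 1.0, 0.5, 0.2]
  for t in (0.5, 1.0, 2.0):
      print(t, [round(p, 2) for p in apply_temperature(logits, t)])
  # T=0.5 -> [0.82, 0.11, 0.04, 0.02]  (sharper: token A dominates)
  # T=1.0 -> [0.57, 0.21, 0.13, 0.09]  (the unscaled distribution)
  # T=2.0 -> [0.4, 0.24, 0.19, 0.16]   (flatter: rare tokens gain probability)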

Recommended settings by task:

Task                Temperature    Rationale
Code generation     0.0-0.3        Precision required
Factual Q&A         0.1-0.5        Accuracy over creativity
Conversation        0.7-0.9        Natural variation
Creative writing    0.9-1.2        Encourage novelty
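
These ranges are starting points rather than hard rules. One common pattern is to keep per-task defaults in a small lookup table and tune them against real traffic; the helper below is hypothetical:

  # Hypothetical per-task defaults following the table above; tune per model.
  TASK_TEMPERATURE = {
      "code_generation": 0.2,
      "factual_qa": 0.3,
      "conversation": 0.8,
      "creative_writing": 1.0,
  }

  def temperature_for(task: str, default: float = 0.7) -> float:
      """Return a task-specific temperature, falling back to a balanced default."""
      return TASK_TEMPERATURE.get(task, default)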

Common questions

Q: What does temperature 0 mean?

A: Temperature 0 (or near-zero) makes the model always pick the highest-probability token—deterministic, greedy decoding. Useful when you need consistent, reproducible outputs.
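
In sampling terms, temperature 0 degenerates into taking the argmax of the logits instead of drawing from a distribution. A sketch of that special case, assuming the raw logits are available as a list:

  import math
  import random

  def sample_token(logits, temperature):
      """Greedy argmax at temperature 0; otherwise temperature-scaled sampling."""
      if temperature == 0:
          return max(range(len(logits)), key=lambda i: logits[i])  # deterministic
      exps = [math.exp(l / temperature) for l in logits]
      total = sum(exps)
      weights = [e / total for e in exps]
      return random.choices(range(len(logits)), weights=weights, k=1)[0]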

Q: Can temperature be greater than 1?

A: Yes. Values above 1 flatten the probability distribution, making rare tokens more likely. This increases creativity but risks incoherence. Generally stay below 1.5 for usable output.
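
The flattening can be quantified with entropy, which climbs toward the uniform limit (2 bits for a four-token vocabulary) as temperature rises. A small sketch using the example logits from above:

  import math

  def softmax_t(logits, temperature):
      exps = [math.exp(l / temperature) for l in logits]
      total = sum(exps)
      return [e / total for e in exps]

  def entropy_bits(probs):
      """Shannon entropy in bits; higher means a flatter distribution."""
      return -sum(p * math.log2(p) for p in probs if p > 0)

  for t in (0.5, 1.0, 2.0):
      print(t, round(entropy_bits(softmax_t([2.0, 1.0, 0.5, 0.2], t)), 2))
  # Roughly 0.89 bits at T=0.5, 1.63 bits at T=1.0, 1.91 bits at T=2.0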

Q: How does temperature interact with top-p?

A: They’re complementary. Temperature adjusts the distribution shape; top-p/top-k then sample from it. Using both gives fine-grained control. Many APIs apply temperature first, then top-p filtering.
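
Below is a sketch of that ordering (temperature first, then nucleus/top-p filtering). The exact pipeline varies between inference stacks, so treat this as one plausible implementation rather than a specification:

  import math
  import random

  def sample_temperature_top_p(logits, temperature=0.8, top_p=0.9):
      """Temperature-scale the logits, then sample from the smallest set of
      tokens whose cumulative probability reaches top_p (the nucleus)."""
      exps = [math.exp(l / temperature) for l in logits]
      total = sum(exps)
      probs = [e / total for e in exps]

      # Keep the most likely tokens until their cumulative mass reaches top_p.
      order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
      nucleus, cumulative = [], 0.0
      for i in order:
          nucleus.append(i)
          cumulative += probs[i]
          if cumulative >= top_p:
              break

      weights = [probs[i] for i in nucleus]  # random.choices renormalizes the weights
      return random.choices(nucleus, weights=weights, k=1)[0]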

Q: What’s a good default temperature?

A: 0.7 is a common default—balanced between consistency and variation. Adjust based on your specific task requirements and test with real examples.

