Softmax
A function that converts a vector of real numbers into a probability distribution summing to 1.
Full Definition
Softmax is the mathematical function applied to logits to produce token probabilities: it exponentiates each value, then divides by the sum of all exponentiated values, ensuring the outputs are positive and sum to 1. It is applied at two key points in transformers: (1) in the attention mechanism to normalise attention scores into weights, and (2) at the output layer to produce the next-token probability distribution over the vocabulary. The exponential operation amplifies differences, making the highest-scoring token even more dominant — a property that temperature modulates. Numerically stable implementations subtract the maximum logit before exponentiating to prevent overflow.
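The steps above can be sketched in a few lines of plain Python (a minimal illustration, not a production implementation; the `temperature` parameter is included to show the modulation mentioned above):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax: divide by temperature, subtract the
    max before exponentiating to prevent overflow, then normalise."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # stability shift
    exps = [math.exp(x - m) for x in scaled] # all values now <= e^0 = 1
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# Outputs are positive and sum to 1; the top logit dominates.
# A temperature above 1 flattens the distribution; below 1 sharpens it.
```

Subtracting the maximum does not change the result, because multiplying numerator and denominator by the same factor e^(-max) cancels out; it only keeps the exponentials in a safe numeric range.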
Examples
Logits [2.0, 1.0, 0.1] → after softmax: [0.659, 0.242, 0.099], assigning 66% probability to the top token.
Attention scores [8.3, 2.1, -1.5] normalised via softmax to [0.998, 0.002, 0.000], focusing 99.8% of attention on the first token.
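Both figures can be reproduced with a quick sketch (plain Python, no libraries assumed):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Next-token logits from the first example
print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])   # [0.659, 0.242, 0.099]
# Attention scores from the second example
print([round(p, 3) for p in softmax([8.3, 2.1, -1.5])])  # [0.998, 0.002, 0.0]
```

Note how the wider gap between attention scores in the second example produces a far more peaked distribution: the exponential amplifies differences.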
Apply this in your prompts
PromptITIN automatically uses techniques like Softmax to build better prompts for you.