LLM Inference

Definition

LLM inference is the process by which a trained Large Language Model generates text or predictions from new input. Unlike training, inference involves only a forward pass: the model's learned weights are applied to a prompt without being updated. It is the operational phase that follows training, in which the model performs its intended function. Efficient inference is crucial for real-time applications and interactive use.
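
To make this concrete, here is a minimal sketch of inference using the Hugging Face Transformers library. The model name ("gpt2") and the prompt are illustrative assumptions, not part of the original text; any causal language model would serve the same purpose.

```python
# Minimal LLM inference sketch with Hugging Face Transformers.
# Assumes: pip install transformers torch; "gpt2" is an example model choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical example; swap in any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")  # tokenize the input text

# Generate new tokens autoregressively. This is a forward pass only:
# no gradients are computed and no weights are updated.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Each call to generate repeatedly feeds the model's own output back in as input, which is why inference latency grows with the number of tokens produced.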