R

Reinforcement Learning from Human Feedback (RLHF)

Definition

A training methodology that uses human preferences to guide the fine-tuning of AI models. RLHF first trains a reward model on human comparisons of model outputs, then uses reinforcement learning to optimize the model against this learned reward signal.
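A minimal sketch of the two stages described above, written in PyTorch. The tiny models, placeholder data, and the plain policy-gradient update are illustrative assumptions; real RLHF systems typically use large transformers and an algorithm such as PPO.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 100, 32

class TinyLM(nn.Module):
    """Toy autoregressive model standing in for the policy being fine-tuned."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):                # tokens: (batch, seq)
        return self.head(self.embed(tokens))  # logits: (batch, seq, vocab)

class RewardModel(nn.Module):
    """Maps a token sequence to a scalar preference score."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.score = nn.Linear(HIDDEN, 1)

    def forward(self, tokens):
        return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

# Stage 1: fit the reward model on human pairwise comparisons.
# Each example is (preferred output, rejected output); the Bradley-Terry
# loss pushes the preferred sequence's score above the rejected one's.
reward_model = RewardModel()
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

chosen = torch.randint(0, VOCAB, (8, 16))    # placeholder "preferred" outputs
rejected = torch.randint(0, VOCAB, (8, 16))  # placeholder "rejected" outputs

rm_loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
rm_opt.zero_grad()
rm_loss.backward()
rm_opt.step()

# Stage 2: optimize the policy against the learned reward.
# A single REINFORCE-style step with a KL penalty toward a frozen
# reference model, the role PPO usually fills in practice.
policy, reference = TinyLM(), TinyLM()
reference.load_state_dict(policy.state_dict())   # reference = frozen starting model
pg_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

prompts = torch.randint(0, VOCAB, (8, 16))
logits = policy(prompts)
dist = torch.distributions.Categorical(logits=logits)
samples = dist.sample()                           # sampled "responses"
log_probs = dist.log_prob(samples).sum(dim=1)

with torch.no_grad():
    rewards = reward_model(samples)
    ref_log_probs = torch.distributions.Categorical(
        logits=reference(prompts)).log_prob(samples).sum(dim=1)

kl = log_probs - ref_log_probs                    # per-sequence KL estimate
advantage = rewards - 0.1 * kl.detach()           # reward shaped by the KL penalty
pg_loss = -(advantage * log_probs).mean()
pg_opt.zero_grad()
pg_loss.backward()
pg_opt.step()
```

The KL term keeps the fine-tuned policy close to the reference model so it does not drift into outputs that exploit the reward model; the coefficient (0.1 here) is an assumed hyperparameter.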
