R
Reinforcement Learning from Human Feedback
Definition
A training methodology that uses human preferences to guide the fine-tuning of AI models. RLHF first trains a reward model from pairwise human comparisons of model outputs, then uses reinforcement learning to optimize the model against this learned reward signal.
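A minimal sketch of the two stages in Python, using stand-in tensors in place of a real language model; the linear reward model, feature sizes, and training loop here are illustrative assumptions, not a production recipe:

import torch
import torch.nn as nn

# Stage 1: train a reward model on human preference pairs.
# The pairwise (Bradley-Terry) objective pushes the reward of the
# human-preferred output above the reward of the rejected one.
reward_model = nn.Linear(16, 1)          # stand-in for a learned scorer
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

chosen = torch.randn(8, 16)              # features of preferred outputs
rejected = torch.randn(8, 16)            # features of rejected outputs

for _ in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # -log sigmoid(r_chosen - r_rejected): pairwise preference loss
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2 (sketch): the frozen reward model supplies the RL reward.
# A full implementation would update the policy with an RL algorithm;
# here we only score a sampled output to show where the reward enters.
with torch.no_grad():
    sampled_output = torch.randn(1, 16)  # stand-in for a policy sample
    reward = reward_model(sampled_output)

In practice the second stage is commonly run with PPO, together with a KL penalty that keeps the fine-tuned model from drifting too far from its pretrained starting point.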