Benchmark
Definition
A standardized test or dataset used to evaluate and compare the performance of AI models on specific tasks. Common AI benchmarks include MMLU for multi-subject knowledge, HumanEval for code generation, and ImageNet for image classification.
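Example
To illustrate what "evaluate and compare" can look like in practice, here is a minimal sketch that scores two models on the same fixed set of examples. Everything in it is an assumption made for illustration: the four-item `BENCHMARK`, the lookup-table stand-ins `model_a` and `model_b`, and the exact-match `accuracy` metric are hypothetical and do not correspond to MMLU, HumanEval, ImageNet, or any real model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy benchmark: a fixed list of (input, expected answer) pairs.
# Real benchmarks are far larger and ship with their own scoring rules.
BENCHMARK: Sequence[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
    ("3 * 3", "9"),
]


def accuracy(model: Callable[[str], str],
             benchmark: Sequence[Tuple[str, str]]) -> float:
    """Fraction of benchmark items the model answers with an exact match."""
    correct = sum(1 for prompt, expected in benchmark
                  if model(prompt) == expected)
    return correct / len(benchmark)


# Stand-in "models": simple lookup tables, used only to show the comparison.
def model_a(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(prompt, "")


def model_b(prompt: str) -> str:
    answers = {"2 + 2": "4", "opposite of hot": "cold"}
    return answers.get(prompt, "")


# Because both models see identical inputs and the same metric,
# their scores are directly comparable.
scores = {
    "model_a": accuracy(model_a, BENCHMARK),
    "model_b": accuracy(model_b, BENCHMARK),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

The point of the sketch is the shared test set and the shared metric: holding both fixed is what makes the resulting numbers comparable across models.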