Pinterest | AI Trace
Data Analysis | Internal Only | Verified
Pinterest uses a fine-tuned AI language model to automatically judge whether search results are relevant to what a user was looking for, a task previously done by human raters. The model can evaluate 150,000 search result pairs in 30 minutes, work that would take a team of human reviewers far longer and cost significantly more. It is an internal tool that improves the quality of the search results users see.
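To put the throughput figure in concrete terms, the short calculation below converts 150,000 pairs in 30 minutes into a per-second rate. The function name `pairs_per_second` is a hypothetical helper introduced only for this illustration.

```python
def pairs_per_second(num_pairs: int, minutes: float) -> float:
    """Convert a batch size and wall-clock duration into pairs scored per second."""
    return num_pairs / (minutes * 60)


# Figures quoted in the summary: 150,000 pairs in 30 minutes.
rate = pairs_per_second(150_000, 30)
print(f"{rate:.1f} pairs per second")
```

That works out to roughly 83 pairs per second.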
Details
The system fine-tunes a multilingual open-source language model (XLM-RoBERTa-large) on a dataset of human-annotated relevance judgments, then uses it to score new search results at scale without additional human labeling. In testing, the model matched the human raters' score exactly 73.7% of the time and fell within one point of it on the scoring scale 91.7% of the time. Running on a single GPU, it processes large batches in minutes rather than days. The primary use case is accelerating A/B experiment evaluation: Pinterest can quickly determine whether a search algorithm change improved relevance without waiting for slow manual labeling cycles.
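The two agreement figures above are different metrics: exact-match agreement and agreement within one point. A minimal sketch of how both could be computed is shown below. The function name `agreement_rates`, the 1-to-5 score range, and the sample scores are all assumptions made for illustration; they are not Pinterest's data or evaluation code.

```python
from typing import Sequence, Tuple


def agreement_rates(
    human: Sequence[int], model: Sequence[int], tolerance: int = 1
) -> Tuple[float, float]:
    """Return (exact-match rate, rate of scores within `tolerance` points)."""
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("score lists must be non-empty and the same length")
    n = len(human)
    exact = sum(h == m for h, m in zip(human, model)) / n
    within = sum(abs(h - m) <= tolerance for h, m in zip(human, model)) / n
    return exact, within


# Made-up scores on an assumed 1-5 relevance scale (illustrative only).
human_scores = [5, 3, 1, 4, 2, 5, 3, 2]
model_scores = [5, 2, 1, 4, 4, 5, 3, 3]

exact_rate, within_one_rate = agreement_rates(human_scores, model_scores)
print(f"exact: {exact_rate:.1%}, within one point: {within_one_rate:.1%}")
```

The within-one rate is always at least as high as the exact rate, because every exact match also counts as being within one point.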