A NER model plus classifiers trained on synthetic data from Claude 3 Sonnet — so long-tail searches finally hit the right product page.
Client profile: An online platform connecting homeowners with home-improvement professionals, with a product marketplace, 3D visualization tools, and project-management software
Industry: Other, Home Improvement
Region: EMEA, North America
Improvement in search accuracy
From idea to working prototype
Houzz connects homeowners with home-improvement professionals — architects, designers, contractors — and runs a product marketplace alongside the directory. Millions of product pages. Millions of long-tail queries every day. The search engine is the platform.
01 The Challenge
Houzz’s existing search ran on an NLP model that handled direct-match queries well and long-tail queries badly. Fewer than 40% of long-tail queries were processed accurately. When users spelled out a specific need in full, the engine frequently routed them to the wrong page, or to a page not optimized for conversion.
In a marketplace, missed queries don’t just lose a sale. They train users that search doesn’t work.
02 The Approach
Two constraints shaped the approach:
Houzz wanted to avoid the cost and time of collecting and labeling a ground-truth dataset. Provectus proposed synthetic query generation instead — train on generated data that emulates real user queries across categories and attributes.
Inference had to be sub-second. Out-of-the-box LLMs couldn’t hit that bar. The team routed around the constraint: convert queries into embeddings first, then run classification on top.
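The embed-then-classify pattern can be sketched in a few lines. This is a minimal illustration, not the Houzz code: the hashed bag-of-words `embed` function is a hypothetical stand-in for Amazon Titan Text Embeddings, and a nearest-centroid classifier stands in for whatever simple classifiers the team actually trained. The point is the cost profile: one embedding per query, then a handful of dot products, which is why this shape fits a sub-second budget where an out-of-the-box LLM call did not.

```python
import math
import zlib

DIM = 256  # toy size chosen for illustration only


def embed(text: str) -> list:
    """Hypothetical stand-in for an embedding model: hashed bag of words, unit length."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class NearestCentroidClassifier:
    """One centroid per label in embedding space; predict the label with the highest dot product."""

    def fit(self, texts, labels):
        grouped = {}
        for text, label in zip(texts, labels):
            grouped.setdefault(label, []).append(embed(text))
        # Average the embeddings of each label's training queries.
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, text):
        query = embed(text)  # one embedding per query; the rest is cheap arithmetic
        return max(self.centroids, key=lambda label: dot(query, self.centroids[label]))


# Usage with a tiny, invented training set.
train = [
    ("modern pendant light", "lighting"),
    ("brass pendant light fixture", "lighting"),
    ("ceiling light for kitchen", "lighting"),
    ("oak dining table", "tables"),
    ("round dining table", "tables"),
    ("walnut coffee table", "tables"),
]
clf = NearestCentroidClassifier().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("pendant light for hallway"))  # lighting
```

In production the embedding would come from the hosted model and the classifier could be anything cheap (logistic regression, a small MLP); the split between one expensive encoding step and a fast decision step is what the design relies on.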
03 The Build
Amazon Titan Text Embeddings (on Amazon Bedrock) generate query embeddings. Simple classifiers trained on those embeddings identify product categories and attributes. A NER model built on Flair classifies residual parts of the query, the pieces that aren’t categories or attributes, which is where most long-tail signal lives.
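How these pieces fit together can be sketched as follows, under stated assumptions. The category and attribute classifiers are passed in as stub callables, and the residual handed to the NER model is assumed to be whatever tokens the predicted labels do not already explain, using hypothetical lexicons (`CATEGORY_TERMS`, `ATTRIBUTE_TERMS`). The case study does not say how residual spans are actually isolated, and the Flair model call itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical lexicons mapping each label to surface words that express it.
CATEGORY_TERMS = {
    "vanity": {"vanity", "vanities"},
    "faucet": {"faucet", "faucets", "tap"},
}
ATTRIBUTE_TERMS = {
    "color:black": {"black"},
    "width:36in": {"36in", "36-inch"},
}


@dataclass
class ParsedQuery:
    category: str
    attributes: List[str]
    residual: List[str]  # tokens that would be handed to the NER model


def parse_query(
    query: str,
    predict_category: Callable[[str], str],
    predict_attributes: Callable[[str], List[str]],
) -> ParsedQuery:
    """Combine classifier outputs and isolate the leftover tokens for NER (assumed strategy)."""
    category = predict_category(query)
    attributes = predict_attributes(query)
    # Words already explained by the predicted category and attributes.
    explained = set(CATEGORY_TERMS.get(category, set()))
    for attribute in attributes:
        explained |= ATTRIBUTE_TERMS.get(attribute, set())
    residual = [tok for tok in query.lower().split() if tok not in explained]
    return ParsedQuery(category, attributes, residual)


# Usage with stub classifiers standing in for the trained embedding classifiers.
parsed = parse_query(
    "Black 36-inch vanity with vessel sink for small bathroom",
    predict_category=lambda q: "vanity",
    predict_attributes=lambda q: ["color:black", "width:36in"],
)
print(parsed.residual)  # ['with', 'vessel', 'sink', 'for', 'small', 'bathroom']
```

The residual ("vessel sink", "small bathroom") is exactly the long-tail detail a category-and-attribute parse would otherwise drop.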
Training data came from Anthropic’s Claude 3 Sonnet. Synthetic queries span the full combinatorial space of categories and attributes across Houzz’s product taxonomy. The NER model, the category classifiers, and the attribute classifiers all train on this synthetic corpus. Model selection and tuning ran through Weights & Biases.
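Generating a synthetic corpus that spans the combinatorial space of a taxonomy can be sketched as below. Everything here is hypothetical: `TAXONOMY` is an invented slice of a product taxonomy, the prompt wording is illustrative, and the actual call to Claude 3 Sonnet on Amazon Bedrock is left out. What the sketch shows is why no manual labeling is needed: each generated query inherits the category and attribute labels of the spec that produced it.

```python
import itertools
import json

# Hypothetical slice of a product taxonomy: category -> attribute -> allowed values.
TAXONOMY = {
    "vanity": {"color": ["black", "white"], "width": ["24in", "36in"]},
    "pendant light": {"finish": ["brass", "matte black"]},
}


def attribute_combinations(attrs):
    """Yield every value assignment for every non-empty subset of attribute names."""
    names = sorted(attrs)
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            for values in itertools.product(*(attrs[name] for name in subset)):
                yield dict(zip(subset, values))


def build_generation_specs(taxonomy, queries_per_spec=5):
    """One labeled prompt per (category, attribute combination)."""
    specs = []
    for category, attrs in taxonomy.items():
        for combo in attribute_combinations(attrs):
            prompt = (
                f"Write {queries_per_spec} realistic search queries a homeowner might type "
                f"when looking for a {category} with these attributes: {json.dumps(combo)}. "
                "Vary the phrasing, add extra context words, and return one query per line."
            )
            specs.append({"category": category, "attributes": combo, "prompt": prompt})
    return specs


def to_training_rows(spec, generated_text):
    """Attach the spec's labels to each non-empty line the model returned."""
    return [
        (line.strip(), spec["category"], spec["attributes"])
        for line in generated_text.splitlines()
        if line.strip()
    ]


specs = build_generation_specs(TAXONOMY)
print(len(specs))  # 10: 8 vanity combinations + 2 pendant-light combinations
```

Adding a category or attribute only changes the taxonomy input; the same loop produces new labeled data for retraining.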
Houzz received two things: a bundle of trained ML models for semantic search understanding, and an infrastructure for generating synthetic data for new categories and attributes. Adding a new product line is a retraining, not a rebuild.
04 The Results
+50% improvement in the engine’s ability to correctly understand customer queries
Category and attribute identification accuracy rose from 52.94% to 78%, a relative lift of about 47% (the ~50% headline figure). Recall rose from 66.98% to 85%, so users are less likely to miss a relevant product. Precision held at 79%: the recall lift did not come at the cost of result relevance. Latency stayed inside the sub-second budget.
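The headline figures can be checked with simple arithmetic, taking relative lift as (after - before) / before:

```python
def relative_lift(before: float, after: float) -> float:
    """Relative change from a baseline, as a fraction of that baseline."""
    return (after - before) / before


accuracy_lift = relative_lift(52.94, 78.0)  # accuracy in percent, before and after
recall_lift = relative_lift(66.98, 85.0)    # recall in percent, before and after
print(f"accuracy lift: {accuracy_lift:.1%}")  # accuracy lift: 47.3%
print(f"recall lift: {recall_lift:.1%}")      # recall lift: 26.9%
```

Accuracy gains 25.06 percentage points, roughly a 47% relative lift; recall gains 18.02 points, roughly 27%.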
For the marketplace this means customers find what they searched for and stay on-platform longer. For merchants it means more qualified traffic. For Houzz it means search works for the queries users actually type, not just the clean ones.
05 What’s Next
The synthetic-data infrastructure is the extension lever: new categories, new attributes, and new product types get generated data and a retrained model. Houzz is continuing the engagement along that path.