The user typed "running shoes." Your catalog said "jogging sneakers."
Zero results. The user bounces. Your conversion rate quietly takes another paper cut. Multiply that across thousands of queries a day and you have one of the most expensive — and least visible — failures in modern digital experience: a search engine that takes your visitors too literally.
For years, this was the cost of doing business with traditional search. We patched it with synonym lists, manual boost rules, and exhausted content teams typing in every plausible variation a customer might ever use. It worked, sort of, until customers started talking to search bars like they talk to humans.
In 2025, Sitecore Search shipped its answer to that problem: Semantic Search, an AI-powered ranking layer that reads intent, not just tokens. This article walks through what changed, how it works in production, and the exact API and SDK pieces you'll touch when you implement it.
How Sitecore Search worked before semantic.
To appreciate what semantic search adds, it helps to be honest about what came before. Sitecore Search has always been a fast, capable retrieval engine — but its default ranking model, like most enterprise search products, is built on classical information retrieval principles: tokenize the content, build an inverted index, score candidates with a BM25-family algorithm, return them in order of textual relevance.
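To make that pipeline concrete, here is a minimal TypeScript sketch of the classical approach: a lowercase tokenizer, an inverted index mapping each term to per-document frequencies, and Okapi BM25 scoring with the common defaults k1 = 1.2 and b = 0.75. It is a generic textbook illustration, not Sitecore Search's implementation, and the class and function names are hypothetical.

```typescript
// Generic BM25 over an inverted index (illustration only, not Sitecore's engine).
type Doc = { id: string; text: string };

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

class Bm25Index {
  private readonly postings = new Map<string, Map<string, number>>(); // term -> (docId -> term frequency)
  private readonly docLengths = new Map<string, number>();
  private readonly avgDocLength: number;
  private readonly k1: number;
  private readonly b: number;

  constructor(docs: Doc[], k1 = 1.2, b = 0.75) {
    this.k1 = k1;
    this.b = b;
    let totalLength = 0;
    for (const doc of docs) {
      const tokens = tokenize(doc.text);
      this.docLengths.set(doc.id, tokens.length);
      totalLength += tokens.length;
      for (const term of tokens) {
        const perDoc = this.postings.get(term) ?? new Map<string, number>();
        perDoc.set(doc.id, (perDoc.get(doc.id) ?? 0) + 1);
        this.postings.set(term, perDoc);
      }
    }
    this.avgDocLength = docs.length > 0 ? totalLength / docs.length : 0;
  }

  // Returns [docId, score] pairs, best first. Only documents that share
  // at least one token with the query can receive a score at all.
  search(query: string): Array<[string, number]> {
    const n = this.docLengths.size;
    const scores = new Map<string, number>();
    for (const term of new Set(tokenize(query))) {
      const perDoc = this.postings.get(term);
      if (!perDoc) continue; // no shared vocabulary, no contribution
      const df = perDoc.size;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      for (const [docId, tf] of perDoc) {
        const len = this.docLengths.get(docId)!;
        const norm = tf + this.k1 * (1 - this.b + this.b * (len / this.avgDocLength));
        scores.set(docId, (scores.get(docId) ?? 0) + idf * ((tf * (this.k1 + 1)) / norm));
      }
    }
    return Array.from(scores.entries()).sort((x, y) => y[1] - x[1]);
  }
}
```

Index one document about "running shoes" and one about "jogging sneakers", and search("running shoes") scores only the first: the second shares no token with the query, so the scorer never sees it.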
The relevance of any given result therefore depends almost entirely on three things: how the content was tokenized at index time, how the query is tokenized at search time, and whether the two share enough surface-level vocabulary to match. This is where analyzers earn their keep. Sitecore Search ships with a respectable lineup — rfk_standard_multi_locale for general multilingual content, rfk_keyword for exact phrase matches, rfk_ngram_analyzer for compound words and autocomplete forgiveness, and several others — each making different tradeoffs between recall and precision.
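The recall/precision tradeoff is easier to see side by side. The sketch below approximates three analyzer styles (word splitting, whole-value keyword, and character trigrams) in plain TypeScript. These functions are hypothetical stand-ins for illustration; they are not the rfk_* analyzers, whose exact tokenization rules may differ.

```typescript
// Rough approximations of three analyzer styles. NOT Sitecore's rfk_* analyzers.

// Standard-style: lowercase, then split on anything that isn't a letter or digit.
function standardAnalyze(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Keyword-style: the whole (trimmed) value is a single token, so only an exact value matches.
function keywordAnalyze(text: string): string[] {
  const trimmed = text.trim();
  return trimmed ? [trimmed] : [];
}

// N-gram-style: overlapping character n-grams within each word,
// which tolerates partial input and compound words.
function ngramAnalyze(text: string, n = 3): string[] {
  const grams: string[] = [];
  for (const word of standardAnalyze(text)) {
    if (word.length <= n) {
      grams.push(word);
      continue;
    }
    for (let i = 0; i + n <= word.length; i++) grams.push(word.slice(i, i + n));
  }
  return grams;
}

// A field matches when the analyzed query and the analyzed field share any token.
function matches(analyze: (s: string) => string[], field: string, query: string): boolean {
  const fieldTokens = new Set(analyze(field));
  return analyze(query).some((t) => fieldTokens.has(t));
}
```

The n-gram style lets "pack" match "Backpack", which helps autocomplete and compound words but widens the net; the keyword style matches only the whole value, which is what you want for an ID such as a hypothetical SKU-10442.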
Where this model strains is at the level of meaning. If a user types "affordable laptop for college" and your content uses the phrase "budget notebook for students," classical scoring has no way to know those phrases are conceptually identical. The historical workaround was the synonym dictionary: a manually curated map of hyponyms and taxonyms that taught the engine "feline → lion, tiger" and "running shoes → jogging sneakers." Useful, but it doesn't scale. Every new product line, every shift in customer vocabulary, every regional turn of phrase is a new ticket for someone to maintain.
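Mechanically, a synonym dictionary is query expansion: the engine rewrites the query into every phrase someone has mapped it to and matches any of them. A minimal sketch, assuming a phrase-level map and substring matching (both simplifications; real engines usually expand at the token level inside the analyzer). The map contents and helper names are illustrative.

```typescript
// A hand-maintained synonym map: one row per mapping someone had to think of.
const synonyms = new Map<string, string[]>([
  ["running shoes", ["jogging sneakers"]],
  ["feline", ["lion", "tiger"]],
]);

// Expand a query into itself plus every phrase the dictionary maps it to.
function expandQuery(query: string, dictionary: Map<string, string[]>): string[] {
  const normalized = query.trim().toLowerCase();
  return [normalized, ...(dictionary.get(normalized) ?? [])];
}

// A document matches if it contains any of the expanded phrases (case-insensitive).
function phraseMatch(docText: string, phrases: string[]): boolean {
  const haystack = docText.toLowerCase();
  return phrases.some((p) => haystack.includes(p));
}
```

Every row is a maintenance obligation: "trail runners" expands only to itself until someone adds an entry for it, which is the scaling problem in a nutshell.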
What changed in 2025 — the semantic leap.
In August 2025, Sitecore moved Semantic Search into Early Access for customers with a Sitecore Stream license. The pitch is straightforward: an AI-powered ranking layer that interprets the intent and context behind a query rather than relying on token overlap, so users get more natural, more forgiving search experiences without you maintaining mountains of synonym data.
Here is the part that matters most for architects, and the one that's easy to get wrong if you assume "semantic" automatically means "vector database":
As of late 2025, Sitecore Search applies semantic ranking on top of results retrieved via keyword matching. It does not perform a traditional full-vector search at the indexing layer. The keyword retriever produces a candidate set; the semantic re-ranker reorders that set based on conceptual similarity to the query.
This hybrid design is a deliberate choice, and a smart one. Pure vector search at index time is computationally heavy, requires careful embedding model management, and can struggle with exact-match needs (think SKUs, part numbers, model codes — the things keyword search has always been brilliant at). By keeping keyword retrieval in front and applying AI re-ranking behind it, Sitecore preserves performance, preserves precision on exact terms, and adds intent-awareness exactly where it helps most: at the relevance ordering stage.
One implication worth internalizing: because semantic search runs after keyword retrieval, you still need solid analyzer configuration and reasonable content quality. Semantic ranking can re-order what the keyword stage finds — it cannot conjure a result that was never retrieved in the first place.
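The retrieve-then-rerank shape, and its recall limit, fit in a few lines. In this sketch the embeddings are hand-written toy vectors standing in for whatever language model Sitecore actually uses, and cosine similarity is an assumed scoring function; none of the names come from Sitecore's API.

```typescript
// Two-stage "retrieve, then re-rank" sketch with toy embeddings (not Sitecore's model).
type Candidate = { id: string; text: string; embedding: number[] };

const words = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

// Stage 1: keyword retrieval. Only documents sharing a token with the query survive.
function keywordRetrieve(query: string, docs: Candidate[]): Candidate[] {
  const queryTokens = new Set(words(query));
  return docs.filter((d) => words(d.text).some((t) => queryTokens.has(t)));
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 2: reorder the retrieved candidates by similarity to the query embedding.
// Nothing outside the candidate set can be promoted, however similar it is.
function semanticRerank(queryEmbedding: number[], candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding),
  );
}

function search(query: string, queryEmbedding: number[], docs: Candidate[]): string[] {
  return semanticRerank(queryEmbedding, keywordRetrieve(query, docs)).map((d) => d.id);
}
```

With a query embedding that favors customer-discovery content, a guide that mentions "customer" once can outrank an FAQ that repeats "customer needs", while a document about "buyer requirements" that shares no token with the query is never retrieved, no matter how close its embedding is.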
Keyword vs. Semantic — when each one wins.
Neither approach is universally better; they're optimized for different jobs. The table below is the mental model worth keeping at hand when you're scoping a new search experience.
| Dimension | Keyword Search (default) | Semantic Search (+semsearch) |
|---|---|---|
| Matching basis | Token overlap, BM25 scoring | Conceptual / intent similarity |
| Synonym handling | Manual dictionaries only | Learned from language models |
| "running shoes" → "jogging sneakers" | No (without manual synonyms) | Ranks it well, but only if keyword retrieval returns it |
| Exact-match queries (SKU, part #) | Excellent | Still works (retrieval is keyword-based) |
| Performance | Very fast | Fast (re-rank, not full vector) |
| Best for | Codes, IDs, exact phrases | Natural language, descriptive intent |
| Setup overhead | Default behaviour | Domain enablement + flag per widget |
The most important takeaway: this isn't a replace-everything decision. Most production sites end up with a mix — semantic on free-text search, keyword on SKU lookups, exact-answer on FAQ-style widgets. Choose per widget, not per site.
Enabling semantic search — the +semsearch flag in practice.
Activating semantic search on a given widget is, in API terms, a one-line change. Activating it for your tenant is a one-email change.
Prerequisite
Before you can use +semsearch, your domain has to be enabled for semantic search by Sitecore. Open a ticket with your implementation specialist or account team and request enablement. This is a deliberate gate — it means the semantic models are warmed up against your domain before you start sending traffic.
The request shape
You enable semantic ranking by adding +semsearch to the rfk_flags array on a widget request. Everything else about the request stays the same; the rest of your filters, sorting, content fields, and context payload remain valid.
```http
POST https://discover.sitecorecloud.io/discover/v2/<DOMAIN_ID>
authorization: <API_KEY>
content-type: application/json

{
  "widget": {
    "items": [{
      "rfk_id": "rfkid_7",
      "entity": "content",
      "rfk_flags": ["+semsearch"],
      "search": {
        "content": { "fields": ["description", "tags"] },
        "query": { "keyphrase": "identify customer needs" }
      }
    }]
  },
  "context": {
    "locale": { "country": "us", "language": "en" }
  }
}
```
That's the entire opt-in. Send the same query without the flag and you'll get classical BM25 results. Send it with the flag and the candidate set is reordered based on conceptual similarity to "identify customer needs" — which means content that talks about customer discovery, understanding what users want, or uncovering buyer requirements can now rank above content that just happens to repeat the literal phrase.
Treat +semsearch as a widget-level decision. A free-text search bar benefits enormously from it. A part-number lookup or an autocomplete suggesting SKUs almost certainly does not. You can run semantic and keyword widgets side-by-side on the same page without conflict.
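In code, the per-widget decision can live in one small function. This sketch builds request bodies in the same shape as the request above and attaches +semsearch only to widgets classified as descriptive. The WidgetKind labels, the helper names, and any rfk_id other than rfkid_7 are hypothetical; the POST helper assumes Node 18+ (global fetch) and a domain that Sitecore has already enabled.

```typescript
// Widget classification used by this sketch; the labels are not an API enum.
type WidgetKind = "exact-match" | "descriptive" | "qa";

type WidgetItem = {
  rfk_id: string;
  entity: string;
  rfk_flags?: string[];
  search: { content: { fields: string[] }; query: { keyphrase: string } };
};

type WidgetRequest = {
  widget: { items: WidgetItem[] };
  context: { locale: { country: string; language: string } };
};

// Only descriptive, free-text widgets opt in to semantic ranking.
function buildWidgetRequest(rfkId: string, kind: WidgetKind, keyphrase: string, fields: string[]): WidgetRequest {
  const item: WidgetItem = {
    rfk_id: rfkId,
    entity: "content",
    search: { content: { fields }, query: { keyphrase } },
  };
  if (kind === "descriptive") item.rfk_flags = ["+semsearch"];
  return { widget: { items: [item] }, context: { locale: { country: "us", language: "en" } } };
}

// POSTs the body to the endpoint used in the request above.
async function runWidget(domainId: string, apiKey: string, body: WidgetRequest): Promise<unknown> {
  const res = await fetch(`https://discover.sitecorecloud.io/discover/v2/${domainId}`, {
    method: "POST",
    headers: { authorization: apiKey, "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  return res.json();
}
```

Keeping the classification in one place means a widget audit translates directly into request payloads, and an SKU lookup can never pick up the flag by accident.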
The developer's toolkit beyond +semsearch.
Semantic ranking is the headline feature, but the surrounding API surface is what makes a real production search experience feel polished. Here are the pieces of the Content SDK and Search API that pair most naturally with semantic search.
6.1 The SearchService class with TypeScript generics
The Content SDK's SearchService is the modern way to query Sitecore Search from a Node or Next.js backend. Its quietly excellent feature is generics support — you give it the shape of your document and your IDE gives you autocomplete on every result property, including nested ones. This eliminates a whole category of "what was that field called again?" bugs.
```typescript
import { SearchService } from '@sitecore-content-sdk/search';

type Article = {
  id: string;
  title: string;
  description: string;
  publishDate: string;
  author: { name: string; email: string };
  tags: string[];
};

const searchService = new SearchService({
  // Read the Edge context ID from the environment instead of hard-coding it.
  contextId: process.env.SITECORE_EDGE_CONTEXT_ID!,
});

// TypeScript now knows the shape of every result
const response = await searchService.search<Article>({
  searchIndexId: '1234567890',
  keyphrase: 'identify customer needs',
});

response.results.forEach((article) => {
  console.log(article.title); // ✓ typed
  console.log(article.author.name); // ✓ typed (nested)
  console.log(article.tags.join(', ')); // ✓ typed (array)
});
```
6.2 exact_answer for direct, question-style results
When users phrase queries as questions — "what is your return policy?", "how do I reset my password?" — they don't really want a ranked list, they want the answer. The exact_answer query type is built for exactly this case, returning a direct answer block alongside (or instead of) traditional results. It pairs nicely with semantic search because the system can match the question's intent to a content snippet that doesn't repeat the question verbatim.
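How you decide that a query is "question-style" is up to you. One option is a client-side heuristic that routes such queries to a widget configured for exact_answer and everything else to the main search widget. The word list, the three-word threshold, and the widget ids below are assumptions for illustration, and the sketch does not show the exact_answer request itself.

```typescript
// Hypothetical routing heuristic for question-style queries.
const QUESTION_WORDS = new Set([
  "what", "how", "why", "when", "where", "who", "which",
  "can", "do", "does", "is", "are", "should",
]);

function isQuestionStyle(query: string): boolean {
  const trimmed = query.trim().toLowerCase();
  if (trimmed.endsWith("?")) return true; // explicit question mark wins
  const tokens = trimmed.split(/\s+/);
  // Leading question word plus enough words to be a sentence, not a product name.
  return QUESTION_WORDS.has(tokens[0] ?? "") && tokens.length >= 3;
}

// Widget ids are placeholders for whatever your domain defines.
function routeQuery(query: string): { rfkId: string; kind: "qa" | "descriptive" } {
  return isQuestionStyle(query)
    ? { rfkId: "rfkid_qa", kind: "qa" }
    : { rfkId: "rfkid_7", kind: "descriptive" };
}
```

A heuristic like this will misfire occasionally, so it is worth logging which route each query took and reviewing the misroutes during the A/B phase.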
6.3 Personalization with the mlt algorithm
Sitecore Search's mlt ("more like this") algorithm tailors results based on behavioral signals: a visitor's UUID, their click history, brand and color affinities, and whatever else you've configured to track. Layered on top of semantic ranking, this becomes powerful: semantic ranking finds the conceptually relevant set, and personalization promotes the items most likely to convert this specific user. Keep the order straight: relevance first, personalization second.
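The "relevance first, personalization second" ordering can be sketched as a bounded boost applied to an already-ranked list. This is not Sitecore's mlt algorithm; the brand-affinity map and the 0.2 weight are hypothetical, chosen only to show that personalization reorders the relevant set without adding to it.

```typescript
// Two-stage ordering sketch. Affinity model and weight are hypothetical.
type RankedItem = { id: string; brand: string; relevance: number }; // relevance in [0, 1]

function personalize(
  items: RankedItem[],
  brandAffinity: Map<string, number>, // e.g. derived from this visitor's clicks, in [0, 1]
  weight = 0.2,
): RankedItem[] {
  // Only items already in the relevant set are reordered; none are added.
  return items
    .map((item, position) => ({
      item,
      position,
      score: item.relevance + weight * (brandAffinity.get(item.brand) ?? 0),
    }))
    .sort((a, b) => b.score - a.score || a.position - b.position) // ties keep relevance order
    .map((entry) => entry.item);
}
```

A small weight lets a strong affinity lift a nearly-as-relevant item past a slightly more relevant one, without letting a weakly relevant item jump to the top.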
6.4 swatch grouping for product variants
For e-commerce teams, the swatch attribute field grouping is the difference between showing the same shoe seven times in seven colors and showing it once with a colour-picker. It aggregates variants under a single representative product card, which is what users actually expect from a modern product grid.
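Conceptually, swatch grouping is a group-by over variants that preserves ranked order. Sitecore Search does this for you through the swatch attribute; the sketch below only illustrates the resulting shape, and its types and field names (groupId, color) are hypothetical.

```typescript
// Hypothetical variant shape; real field names depend on your catalog schema.
type Variant = { sku: string; groupId: string; color: string; title: string };
type ProductCard = { groupId: string; representative: Variant; colors: string[] };

// Collapses variants that share a groupId into one card, keeping ranked order.
function groupBySwatch(variants: Variant[]): ProductCard[] {
  const cards = new Map<string, ProductCard>(); // Map preserves first-seen order
  for (const v of variants) {
    const card = cards.get(v.groupId);
    if (!card) {
      // The first (highest-ranked) variant represents the product.
      cards.set(v.groupId, { groupId: v.groupId, representative: v, colors: [v.color] });
    } else if (!card.colors.includes(v.color)) {
      card.colors.push(v.color);
    }
  }
  return Array.from(cards.values());
}
```

Seven colourways of one shoe become a single card carrying a seven-entry colour list, which is what the colour picker renders.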
6.5 Dot notation for nested sorting
Real content models are rarely flat. The Search API supports dot notation in sort expressions, so you can sort on author.name, customer.email, or any other nested property without flattening your document shape. Type-safe when you're using generics with SearchService.
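```typescript
// Reuses searchService from the previous example.
type Order = {
  id: string;
  customer: { name: string; email: string };
};

const response = await searchService.search<Order>({
  searchIndexId: '1234567890',
  sort: {
    name: 'customer.name', // dot notation, type-checked
    order: 'asc',
  },
});
```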
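The "type-checked" part can be modeled in TypeScript with template literal types. The sketch below is a hypothetical illustration, not SearchService's internals: DotPath<T> enumerates every dot path to a primitive field, so a typo such as "customer.nmae" fails to compile, and sortByPath applies the same path client-side. Array-valued fields are not handled.

```typescript
// Hypothetical helpers, not Content SDK exports.
type Primitive = string | number | boolean;

// Every dot path from T down to a primitive field, e.g. "id" | "customer.name".
type DotPath<T> = {
  [K in keyof T & string]: T[K] extends Primitive
    ? K
    : T[K] extends object
      ? `${K}.${DotPath<T[K]>}`
      : never;
}[keyof T & string];

// Walks obj along a dot path, returning undefined if any segment is missing.
function getByPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (value, key) =>
      value !== null && typeof value === "object"
        ? (value as Record<string, unknown>)[key]
        : undefined,
    obj,
  );
}

function compareValues(x: unknown, y: unknown): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  return String(x).localeCompare(String(y));
}

// Returns a sorted copy; the path argument is checked against T at compile time.
function sortByPath<T>(rows: T[], path: DotPath<T>, order: "asc" | "desc" = "asc"): T[] {
  const direction = order === "asc" ? 1 : -1;
  const key = String(path);
  return [...rows].sort((a, b) => direction * compareValues(getByPath(a, key), getByPath(b, key)));
}

type OrderRow = { id: string; customer: { name: string; email: string } };
```

With OrderRow, the accepted paths are "id", "customer.name", and "customer.email"; anything else is a compile-time error rather than a silently unsorted list.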
React 19, Content SDK, and what XM Cloud teams should know.
The most consequential frontend update of late 2025 is quiet but important: in September 2025, the Sitecore Search JS SDK for React shipped support for React 19. That single release aligns the Search SDK with the rest of the modern Sitecore stack — Content SDK 1.0, JSS 22.7.0+, Next.js 15, and Node 22 — meaning XM Cloud projects no longer have to choose between modern framework versions and modern search components.
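If you're standing up a new Next.js head on XM Cloud today, the integration looks roughly like this. Treat the provider and hook names and their props as illustrative, and confirm the exact exports against the Search JS SDK for React reference before copying them:

```jsx
'use client'; // providers and hooks must run in a client component under the App Router

import { SearchProvider, useSearch } from '@sitecore-search/react';

// Next.js only exposes env vars prefixed with NEXT_PUBLIC_ to browser code.
const SEARCH_API_KEY = process.env.NEXT_PUBLIC_SEARCH_API_KEY;
const SEARCH_TENANT_ID = process.env.NEXT_PUBLIC_SEARCH_TENANT_ID;

function App({ children }) {
  return (
    <SearchProvider
      endpoint="https://<your-tenant>.sitecorecloud.net"
      apiKey={SEARCH_API_KEY}
      tenantId={SEARCH_TENANT_ID}
    >
      {children}
    </SearchProvider>
  );
}

function SearchResults({ keyphrase }) {
  const { results, loading } = useSearch({
    query: keyphrase,
    attributes: ['title', 'description', 'tags'],
    flags: ['+semsearch'], // semantic ranking, per widget
  });
  if (loading) return <p>Searching…</p>;
  return results.map((r) => <article key={r.id}>{r.title}</article>);
}
```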
The compatibility matrix worth pinning to your project's README:
| SDK / Framework | Minimum Version | Notes |
|---|---|---|
| Sitecore Search JS SDK for React | v3.0.0 | Required for React 19 support |
| React | 19 | Earlier versions need pre-3.0 SDK |
| Next.js | 15 | Aligns with Content SDK 1.0 |
| Node.js | 22 | Compatible with Content SDK 1.0 |
| Sitecore JSS | 22.7.0 | Earlier JSS will not pair cleanly |
| Sitecore Content SDK | v1.0.0 | Brings SearchService, generics |
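If you want the matrix enforced rather than only documented, a small preflight check can compare installed versions with the minimums. The npm package names below are my assumption of the usual names for each row, so confirm them against your package.json; the comparison handles plain x.y.z strings with a leading ^ or ~ and ignores pre-release tags.

```typescript
// Hypothetical preflight check against the compatibility matrix.
const MINIMUMS: Record<string, string> = {
  "@sitecore-search/react": "3.0.0",
  react: "19.0.0",
  next: "15.0.0",
  "@sitecore-jss/sitecore-jss-nextjs": "22.7.0",
  "@sitecore-content-sdk/nextjs": "1.0.0",
};

// True if installed >= minimum, comparing major.minor.patch numerically.
function atLeast(installed: string, minimum: string): boolean {
  const parse = (v: string): number[] =>
    v.replace(/^[^\d]*/, "").split(".").map((p) => parseInt(p, 10) || 0);
  const a = parse(installed);
  const b = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Lists every dependency (and Node itself) that falls below its minimum.
// Packages that aren't installed are skipped.
function checkMatrix(deps: Record<string, string>, nodeVersion: string): string[] {
  const problems: string[] = [];
  for (const [name, min] of Object.entries(MINIMUMS)) {
    const installed = deps[name];
    if (installed && !atLeast(installed, min)) problems.push(`${name} ${installed} < ${min}`);
  }
  if (!atLeast(nodeVersion, "22.0.0")) problems.push(`node ${nodeVersion} < 22.0.0`);
  return problems;
}
```

In a build script you would pass the merged dependencies and devDependencies from package.json along with process.versions.node, and fail the build if the returned list is non-empty.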
A realistic implementation strategy.
Knowing the API is half the job. Knowing when not to flip the flag is the other half. The guidance below is what I'd put in front of any team about to roll out semantic search.
Anti-patterns to avoid
- Don't enable +semsearch globally without testing. Some widgets, such as autocomplete and exact-product lookup, get worse, not better.
- Don't expect semantic ranking to fix bad content. Semantic re-ranks the candidate set; if the keyword retrieval misses, the re-ranker has nothing to reorder. Garbage in, garbage out still applies.
- Don't skip analyzer tuning. Because retrieval is keyword-based, your analyzer choices still drive what makes it into the candidate set. Pick rfk_standard_multi_locale for most fields and rfk_keyword for IDs, and don't sleep on this layer.
- Don't measure success with vibes. A/B test specific widgets with click-through and conversion metrics before declaring victory.
A four-step rollout that won't blow up
- Audit. Inventory every search widget on the site. For each, classify it: exact-match, descriptive, or Q&A.
- Pilot. Pick one descriptive widget, typically the main site search, and request domain enablement. Add +semsearch only there.
- A/B test. Run it against the keyword-only baseline. Measure CTR on result positions 1–3 and downstream conversion.
- Expand. Roll out widget by widget based on evidence, not assumption. Keep your exact-match widgets on classical retrieval.
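For the A/B step, the core arithmetic is small. This sketch tallies top-3 click-through and conversion per variant from a hypothetical event log and computes a two-proportion z-statistic to compare the variants. The event shape is assumed, not a Sitecore analytics schema, and a real test also needs a sample size decided before the experiment starts.

```typescript
// Hypothetical event log shape; not a Sitecore analytics schema.
type Arm = "keyword" | "semantic";
type SearchEvent = {
  variant: Arm;
  clickedPosition: number | null; // 1-based position of the clicked result, null if no click
  converted: boolean;
};
type Tally = { searches: number; top3Clicks: number; conversions: number };

function tallyByVariant(events: SearchEvent[]): Record<Arm, Tally> {
  const tally: Record<Arm, Tally> = {
    keyword: { searches: 0, top3Clicks: 0, conversions: 0 },
    semantic: { searches: 0, top3Clicks: 0, conversions: 0 },
  };
  for (const e of events) {
    const t = tally[e.variant];
    t.searches += 1;
    if (e.clickedPosition !== null && e.clickedPosition <= 3) t.top3Clicks += 1;
    if (e.converted) t.conversions += 1;
  }
  return tally;
}

const rate = (hits: number, total: number): number => (total === 0 ? 0 : hits / total);

// Two-proportion z-statistic: positive when arm B's rate exceeds arm A's.
function twoProportionZ(hitsA: number, totalA: number, hitsB: number, totalB: number): number {
  if (totalA === 0 || totalB === 0) return 0;
  const pooled = (hitsA + hitsB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return se === 0 ? 0 : (rate(hitsB, totalB) - rate(hitsA, totalA)) / se;
}
```

As a rough rule, |z| above about 1.96 corresponds to a 5% two-sided significance level; anything smaller is a reason to keep collecting data, not to roll out.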
Where this is heading next.
Semantic ranking is one piece of a larger arc. Sitecore's broader direction — visible in the Stream license, the SitecoreAI product surface, and the Content SDK consolidation — points toward search becoming less of a retrieval engine and more of a conversational layer over your content. Generative answers, multimodal queries (image and voice), and predictive search are all on the same trajectory.
Honest assessment of where things still fall short: full vector search at index time isn't here yet, and for content sets where conceptual recall (not just ranking) matters most, that's a real limitation. Multimodal capabilities are early. And semantic search's effectiveness still depends heavily on decent content with reasonably descriptive fields; it doesn't rescue thin product copy.
The right way to think about today's semantic search is as the first step of a multi-year shift, not the destination. Implement it cleanly, measure it honestly, and you'll be well-positioned for whatever ships next.
The summary worth bookmarking.
Seven things to remember
- Sitecore Semantic Search is a re-ranking layer, not a vector index. Keyword retrieval still happens first.
- Enable it per widget by adding +semsearch to the rfk_flags array, and ask Sitecore to enable your domain first.
- Keyword search remains the right tool for SKUs, IDs, and exact phrases. Don't replace it; complement it.
- The Content SDK's SearchService with TypeScript generics is the modern way to query: type-safe, with dot notation for nested sorts.
- Pair semantic ranking with exact_answer for question-style queries and mlt for personalized ordering.
- Search React SDK v3.0.0 (Sept 2025) brings React 19 support, aligning with Content SDK 1.0, JSS 22.7.0+, and Next.js 15.
- Roll out: audit widgets, pilot one, A/B test, expand. Don't flip the flag globally on day one.