Mastering Semantic Search, beyond the keyword.

A practical guide for developers and search architects on Sitecore Search's intent-based ranking — what it is, how it actually works under the hood, and how to ship it on XM Cloud without surprises.

12 min read · Sitecore, Sitecore AI, Sitecore Search

Field Notes / NLP & Relevance

01 / The Hook

The user typed "running shoes." Your catalog said "jogging sneakers."

Zero results. The user bounces. Your conversion rate quietly takes another paper cut. Multiply that across thousands of queries a day and you have one of the most expensive — and least visible — failures in modern digital experience: a search engine that takes your visitors too literally.

For years, this was the cost of doing business with traditional search. We patched it with synonym lists, manual boost rules, and exhausted content teams typing in every plausible variation a customer might ever use. It worked, sort of, until customers started talking to search bars like they talk to humans.

In 2025, Sitecore Search shipped its answer to that problem: Semantic Search, an AI-powered ranking layer that reads intent, not just tokens. This article walks through what changed, how it works in production, and the exact API and SDK pieces you'll touch when you implement it.

02 / The Before Times

How Sitecore Search worked before semantic.

To appreciate what semantic search adds, it helps to be honest about what came before. Sitecore Search has always been a fast, capable retrieval engine — but its default ranking model, like most enterprise search products, is built on classical information retrieval principles: tokenize the content, build an inverted index, score candidates with a BM25-family algorithm, return them in order of textual relevance.

The relevance of any given result therefore depends almost entirely on three things: how the content was tokenized at index time, how the query is tokenized at search time, and whether the two share enough surface-level vocabulary to match. This is where analyzers earn their keep. Sitecore Search ships with a respectable lineup — rfk_standard_multi_locale for general multilingual content, rfk_keyword for exact phrase matches, rfk_ngram_analyzer for compound words and autocomplete forgiveness, and several others — each making different tradeoffs between recall and precision.
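To make the classical pipeline concrete, here is a minimal sketch of tokenizing, building an inverted index, and scoring with BM25. It is a toy, not Sitecore's implementation: the crude stemmer, the `k1` and `b` constants, and every function name are assumptions chosen for illustration.

```typescript
// Toy BM25 over an in-memory inverted index (illustrative only).
type Posting = Map<number, number>; // docId -> term frequency

function tokenize(text: string): string[] {
  // Lowercase, split on non-alphanumerics, strip a trailing "ing" or "s" as a crude stemmer.
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .map((t) => t.replace(/(ing|s)$/, ""))
    .filter((t) => t.length > 0);
}

function buildIndex(docs: string[]) {
  const index = new Map<string, Posting>();
  const lengths = docs.map((d) => tokenize(d).length);
  docs.forEach((doc, id) => {
    for (const term of tokenize(doc)) {
      const posting = index.get(term) ?? new Map<number, number>();
      posting.set(id, (posting.get(id) ?? 0) + 1);
      index.set(term, posting);
    }
  });
  const avgLength = lengths.reduce((a, b) => a + b, 0) / Math.max(docs.length, 1);
  return { index, lengths, avgLength, size: docs.length };
}

function bm25(query: string, idx: ReturnType<typeof buildIndex>, k1 = 1.2, b = 0.75): Map<number, number> {
  const scores = new Map<number, number>();
  for (const term of tokenize(query)) {
    const posting = idx.index.get(term);
    if (!posting) continue; // term appears in no document, so it contributes nothing
    const idf = Math.log(1 + (idx.size - posting.size + 0.5) / (posting.size + 0.5));
    posting.forEach((tf, docId) => {
      const norm = tf + k1 * (1 - b + b * (idx.lengths[docId] / idx.avgLength));
      scores.set(docId, (scores.get(docId) ?? 0) + idf * ((tf * (k1 + 1)) / norm));
    });
  }
  return scores;
}

const catalog = ["Lightweight jogging sneakers", "Trail running shoes for men"];
const hits = bm25("running shoes", buildIndex(catalog));
// Only document 1 receives a score; document 0 shares no tokens with the query.
```

The point of the sketch is the `continue` line: a query term with no posting list contributes nothing, so a document with different vocabulary never enters the result set at all.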

Where this model strains is at the level of meaning. If a user types "affordable laptop for college" and your content uses the phrase "budget notebook for students," classical scoring has no way to know those phrases are conceptually identical. The historical workaround was the synonym dictionary: a manually curated map of hyponyms and taxonyms that taught the engine "feline → lion, tiger" and "running shoes → jogging sneakers." Useful, but it doesn't scale. Every new product line, every shift in customer vocabulary, every regional turn of phrase is a new ticket for someone to maintain.
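The synonym workaround can be sketched as a query-expansion step. This is a simplified illustration under assumed names (`synonymMap`, `expandQuery`); it is not how Sitecore Search stores or applies its dictionaries.

```typescript
// Hypothetical one-way synonym map; entries mirror the examples in the text.
const synonymMap: Record<string, string[]> = {
  "running shoes": ["jogging sneakers"],
  feline: ["lion", "tiger"],
};

// Returns the normalized query plus every variant produced by substituting a mapped phrase.
function expandQuery(query: string, synonyms: Record<string, string[]>): string[] {
  const normalized = query.trim().toLowerCase();
  const variants = new Set<string>([normalized]);
  for (const phrase of Object.keys(synonyms)) {
    if (!normalized.includes(phrase)) continue;
    for (const alt of synonyms[phrase]) {
      variants.add(normalized.split(phrase).join(alt));
    }
  }
  return Array.from(variants);
}

const expanded = expandQuery("Running shoes", synonymMap);
// expanded: ["running shoes", "jogging sneakers"]
const missed = expandQuery("affordable laptop for college", synonymMap);
// missed: ["affordable laptop for college"], because nobody has written that entry yet
```

The second call shows the maintenance problem: any phrase without a hand-written entry passes through unchanged.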

[Figure 01: The Classical Pipeline (keyword-only retrieval). Query "running shoes" → tokenizer [run, shoe] → analyzer (stem, lowercase, synonyms) → inverted index (term → doc IDs) → BM25 ranking (term frequency). Everything depends on surface-level token overlap: "jogging sneakers" never matches "running shoes" without manual synonyms.]
Fig 01. Pre-2025 default Sitecore Search pipeline. BM25 ranks candidates strictly on token statistics from the inverted index.

03 / The Shift

What changed in 2025 — the semantic leap.

In August 2025, Sitecore moved Semantic Search into Early Access for customers with a Sitecore Stream license. The pitch is straightforward: an AI-powered ranking layer that interprets the intent and context behind a query rather than relying on token overlap, so users get more natural, more forgiving search experiences without you maintaining mountains of synonym data.

Here is the part that matters most for architects, and the one that's easy to get wrong if you assume "semantic" automatically means "vector database":

Important — How Semantic Search Actually Works

As of late 2025, Sitecore Search applies semantic ranking on top of results retrieved via keyword matching. It does not perform a traditional full-vector search at the indexing layer. The keyword retriever produces a candidate set; the semantic re-ranker reorders that set based on conceptual similarity to the query.

This hybrid design is a deliberate choice, and a smart one. Pure vector search at index time is computationally heavy, requires careful embedding model management, and can struggle with exact-match needs (think SKUs, part numbers, model codes — the things keyword search has always been brilliant at). By keeping keyword retrieval in front and applying AI re-ranking behind it, Sitecore preserves performance, preserves precision on exact terms, and adds intent-awareness exactly where it helps most: at the relevance ordering stage.

[Figure 02: The Hybrid Pipeline (keyword retrieval + semantic re-ranking). Stage 1, retrieval (unchanged): query with +semsearch → keyword retrieval (BM25) → candidate set of N matching docs. Stage 2, semantic re-ranking (new): AI semantic re-ranker (NLP, intent, context) → final results reordered by intent. No re-indexing required.]
Fig 02. Sitecore's hybrid model. Retrieval stays keyword-based for speed and exact-match fidelity; the AI re-ranker takes over only at the ordering stage.

One implication worth internalizing: because semantic search runs after keyword retrieval, you still need solid analyzer configuration and reasonable content quality. Semantic ranking can re-order what the keyword stage finds — it cannot conjure a result that was never retrieved in the first place.
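A small sketch of the retrieve-then-re-rank shape makes that limitation visible. The similarity scorer here is a hand-written stand-in for whatever model Sitecore runs, and all names (`keywordRetrieve`, `rerank`, `pretendSimilarity`) are assumptions for illustration.

```typescript
type Doc = { id: string; text: string };
type Similarity = (query: string, doc: Doc) => number;

function words(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Stage 1: keep only documents that share at least one token with the query.
function keywordRetrieve(query: string, docs: Doc[]): Doc[] {
  const queryTerms = new Set(words(query));
  return docs.filter((d) => words(d.text).some((w) => queryTerms.has(w)));
}

// Stage 2: reorder the candidates by similarity. The scorer is injected because
// the real model is a black box from the caller's point of view.
function rerank(query: string, candidates: Doc[], similarity: Similarity): Doc[] {
  return [...candidates].sort((a, b) => similarity(query, b) - similarity(query, a));
}

const corpus: Doc[] = [
  { id: "a", text: "Customer needs checklist template" },
  { id: "b", text: "How to uncover what your customer really wants" },
  { id: "c", text: "Buyer discovery interviews: a field guide" },
];

// Hand-written scores standing in for a language model's judgement.
const pretendScores: Record<string, number> = { a: 0.4, b: 0.9, c: 0.95 };
const pretendSimilarity: Similarity = (_query, doc) => pretendScores[doc.id] ?? 0;

const candidates = keywordRetrieve("identify customer needs", corpus);
const ranked = rerank("identify customer needs", candidates, pretendSimilarity);
// ranked order is b, a. Document c has the highest pretend score but shares no
// token with the query, so it never reaches the re-ranker.
```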

04 / Side by Side

Keyword vs. Semantic — when each one wins.

Neither approach is universally better; they're optimized for different jobs. The table below is the mental model worth keeping at hand when you're scoping a new search experience.

Dimension | Keyword Search (default) | Semantic Search (+semsearch)
Matching basis | Token overlap, BM25 scoring | Conceptual / intent similarity
Synonym handling | Manual dictionaries only | Learned from language models
"running shoes" → "jogging sneakers" | No (without manual synonyms) | Yes
Exact-match queries (SKU, part #) | Excellent | Still works (retrieval is keyword-based)
Performance | Very fast | Fast (re-rank, not full vector)
Best for | Codes, IDs, exact phrases | Natural language, descriptive intent
Setup overhead | Default behaviour | Domain enablement + flag per widget

The most important takeaway: this isn't a replace-everything decision. Most production sites end up with a mix — semantic on free-text search, keyword on SKU lookups, exact-answer on FAQ-style widgets. Choose per widget, not per site.

05 / Implementation

Enabling semantic search — the +semsearch flag in practice.

Activating semantic search on a given widget is, in API terms, a one-line change. Activating it for your tenant is a one-email change.

Prerequisite

Before you can use +semsearch, your domain has to be enabled for semantic search by Sitecore. Open a ticket with your implementation specialist or account team and request enablement. This is a deliberate gate — it means the semantic models are warmed up against your domain before you start sending traffic.

The request shape

You enable semantic ranking by adding +semsearch to the rfk_flags array on a widget request. Everything else about the request stays the same; the rest of your filters, sorting, content fields, and context payload remain valid.

POST https://discover.sitecorecloud.io/discover/v2/<DOMAIN_ID>
authorization: <API_KEY>
content-type: application/json

{
  "widget": {
    "items": [{
      "rfk_id": "rfkid_7",
      "entity": "content",
      "rfk_flags": ["+semsearch"],
      "search": {
        "content": { "fields": ["description", "tags"] },
        "query":   { "keyphrase": "identify customer needs" }
      }
    }]
  },
  "context": {
    "locale": { "country": "us", "language": "en" }
  }
}

That's the entire opt-in. Send the same query without the flag and you'll get classical BM25 results. Send it with the flag and the candidate set is reordered based on conceptual similarity to "identify customer needs" — which means content that talks about customer discovery, understanding what users want, or uncovering buyer requirements can now rank above content that just happens to repeat the literal phrase.
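If you build requests in code, the opt-in can be a single boolean. The sketch below mirrors the request shape above; the type names and the helpers `buildSearchBody` and `buildSearchCall` are hypothetical, and the widget ID and field list are just the values from the example.

```typescript
type SearchWidgetItem = {
  rfk_id: string;
  entity: string;
  rfk_flags?: string[];
  search: {
    content: { fields: string[] };
    query: { keyphrase: string };
  };
};

type SearchRequestBody = {
  widget: { items: SearchWidgetItem[] };
  context: { locale: { country: string; language: string } };
};

// Builds the request body; the flag is added only when `semantic` is true.
function buildSearchBody(keyphrase: string, semantic: boolean): SearchRequestBody {
  const item: SearchWidgetItem = {
    rfk_id: "rfkid_7",
    entity: "content",
    search: {
      content: { fields: ["description", "tags"] },
      query: { keyphrase },
    },
  };
  if (semantic) item.rfk_flags = ["+semsearch"];
  return {
    widget: { items: [item] },
    context: { locale: { country: "us", language: "en" } },
  };
}

// Packages the URL and options for fetch(url, init); domainId and apiKey are placeholders you supply.
function buildSearchCall(domainId: string, apiKey: string, body: SearchRequestBody) {
  return {
    url: `https://discover.sitecorecloud.io/discover/v2/${domainId}`,
    init: {
      method: "POST",
      headers: { authorization: apiKey, "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const baselineBody = buildSearchBody("identify customer needs", false);
const semanticBody = buildSearchBody("identify customer needs", true);
```

Keeping the two bodies identical apart from `rfk_flags` makes side-by-side comparison of keyword and semantic ordering straightforward.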

Pro Tip — Per-widget, not per-site

Treat +semsearch as a widget-level decision. A free-text search bar benefits enormously from it. A part-number lookup or an autocomplete suggesting SKUs almost certainly does not. You can run semantic and keyword widgets side-by-side on the same page without conflict.

06 / The Toolkit

The developer's toolkit beyond +semsearch.

Semantic ranking is the headline feature, but the surrounding API surface is what makes a real production search experience feel polished. Here are the pieces of the Content SDK and Search API that pair most naturally with semantic search.

6.1   The SearchService class with TypeScript generics

The Content SDK's SearchService is the modern way to query Sitecore Search from a Node or Next.js backend. Its quietly excellent feature is generics support — you give it the shape of your document and your IDE gives you autocomplete on every result property, including nested ones. This eliminates a whole category of "what was that field called again?" bugs.

import { SearchService } from '@sitecore-content-sdk/search';

type Article = {
  id: string;
  title: string;
  description: string;
  publishDate: string;
  author: { name: string; email: string; };
  tags: string[];
};

const searchService = new SearchService({
  contextId: 'SITECORE_EDGE_CONTEXT_ID',
});

// TypeScript now knows the shape of every result
const response = await searchService.search<Article>({
  searchIndexId: '1234567890',
  keyphrase: 'identify customer needs',
});

response.results.forEach((article) => {
  console.log(article.title);          // ✓ typed
  console.log(article.author.name);    // ✓ typed (nested)
  console.log(article.tags.join(', ')); // ✓ typed (array)
});

6.2   exact_answer for direct, question-style results

When users phrase queries as questions — "what is your return policy?", "how do I reset my password?" — they don't really want a ranked list, they want the answer. The exact_answer query type is built for exactly this case, returning a direct answer block alongside (or instead of) traditional results. It pairs nicely with semantic search because the system can match the question's intent to a content snippet that doesn't repeat the question verbatim.

6.3   Personalization with the mlt algorithm

Sitecore Search's mlt ("more like this") algorithm tailors results based on behavioral signals: a visitor's UUID, their click history, brand and color affinities, whatever you've configured to track. Layered on top of semantic ranking, the two divide the work cleanly: semantic ranking surfaces the conceptually relevant set, and personalization promotes the items most likely to convert this specific user. Keep that order in mind: relevance first, personalization second.

6.4   swatch grouping for product variants

For e-commerce teams, the swatch attribute field grouping is the difference between showing the same shoe seven times in seven colors and showing it once with a color picker. It aggregates variants under a single representative product card, which is what users actually expect from a modern product grid.
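Conceptually, the grouping collapses variants that share a product identity into one card. The sketch below shows that idea client-side purely as an illustration; in Sitecore Search the grouping is done by the service, and the `productGroup` field and type names here are assumptions.

```typescript
type Variant = { sku: string; productGroup: string; name: string; color: string };
type ProductCard = { productGroup: string; name: string; colors: string[]; skus: string[] };

// Collapses variants that share a productGroup into one card, keeping first-seen order.
function groupVariants(variants: Variant[]): ProductCard[] {
  const cards = new Map<string, ProductCard>();
  for (const v of variants) {
    const card = cards.get(v.productGroup);
    if (card) {
      card.colors.push(v.color);
      card.skus.push(v.sku);
    } else {
      cards.set(v.productGroup, {
        productGroup: v.productGroup,
        name: v.name,
        colors: [v.color],
        skus: [v.sku],
      });
    }
  }
  return Array.from(cards.values());
}

const productCards = groupVariants([
  { sku: "RS-1-RED", productGroup: "RS-1", name: "Trail Runner", color: "red" },
  { sku: "RS-1-BLU", productGroup: "RS-1", name: "Trail Runner", color: "blue" },
  { sku: "JS-9-BLK", productGroup: "JS-9", name: "City Jogger", color: "black" },
]);
// Two cards: "Trail Runner" with colors red and blue, "City Jogger" with black.
```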

6.5   Dot notation for nested sorting

Real content models are rarely flat. The Search API supports dot notation in sort expressions, so you can sort on author.name, customer.email, or any other nested property without flattening your document shape. The sort path is type-checked when you use generics with SearchService.

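// Illustrative document shape; replace with your own index schema
type Order = {
  id: string;
  customer: { name: string; email: string };
};

const response = await searchService.search<Order>({
  searchIndexId: '1234567890',
  sort: {
    name: 'customer.name',    // dot notation, type-checked
    order: 'asc',
  },
});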
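To see what a dot-notation path resolves to, here is a local sketch that walks a nested object and sorts by the result. The real sorting happens server-side; `getByPath` and `sortByPath` are hypothetical helpers that only illustrate the semantics.

```typescript
// Resolves "customer.name" against a nested object; returns undefined if any segment is missing.
function getByPath(source: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((current, key) => {
    if (current !== null && typeof current === "object") {
      return (current as Record<string, unknown>)[key];
    }
    return undefined;
  }, source);
}

type SortSpec = { name: string; order: "asc" | "desc" };

// Returns a sorted copy; missing values are treated as empty strings.
function sortByPath<T>(items: T[], spec: SortSpec): T[] {
  const direction = spec.order === "asc" ? 1 : -1;
  return [...items].sort((a, b) => {
    const left = String(getByPath(a, spec.name) ?? "");
    const right = String(getByPath(b, spec.name) ?? "");
    return direction * left.localeCompare(right);
  });
}

const sampleOrders = [
  { id: "o1", customer: { name: "Mira", email: "mira@example.com" } },
  { id: "o2", customer: { name: "Anton", email: "anton@example.com" } },
];
const sortedOrders = sortByPath(sampleOrders, { name: "customer.name", order: "asc" });
// sortedOrders[0].id is "o2" because "Anton" sorts before "Mira"
```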

[Figure 03: The Layered Mental Model, from top to bottom. 04 Presentation (client): swatch grouping, facets, UI hooks (useSearch, useInfiniteSearch). 03 Personalization (per user): mlt algorithm, behavioral UUIDs, brand and affinity signals. 02 Ranking (AI / NLP): +semsearch flag, semantic re-rank. 01 Retrieval (foundation): analyzers, inverted index, BM25, keyphrase matching, sort (dot notation).]
Fig 03. Each Sitecore Search feature lives at a specific layer. Get the foundation wrong and the layers above can't compensate.

07 / The Frontend

React 19, Content SDK, and what XM Cloud teams should know.

The most consequential frontend update of late 2025 is quiet but important: in September 2025, the Sitecore Search JS SDK for React shipped support for React 19. That single release aligns the Search SDK with the rest of the modern Sitecore stack — Content SDK 1.0, JSS 22.7.0+, Next.js 15, and Node 22 — meaning XM Cloud projects no longer have to choose between modern framework versions and modern search components.

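If you're standing up a new Next.js head on XM Cloud today, the integration looks roughly like this. Treat the component, hook, and prop names as illustrative and confirm them against the SDK version you install: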

import { SearchProvider, useSearch } from '@sitecore-search/react';

function App({ children }) {
  return (
    <SearchProvider
      endpoint="https://<your-tenant>.sitecorecloud.net"
      apiKey={process.env.SEARCH_API_KEY}
      tenantId={process.env.SEARCH_TENANT_ID}
    >
      {children}
    </SearchProvider>
  );
}

function SearchResults({ keyphrase }) {
  const { results, loading } = useSearch({
    query: keyphrase,
    attributes: ['title', 'description', 'tags'],
    flags: ['+semsearch'],   // semantic ranking, per-widget
  });

  if (loading) return <p>Searching…</p>;
  return results.map((r) => <article key={r.id}>{r.title}</article>);
}

The compatibility matrix worth pinning to your project's README:

SDK / Framework | Minimum Version | Notes
Sitecore Search JS SDK for React | v3.0.0 | Required for React 19 support
React | 19 | Earlier versions need pre-3.0 SDK
Next.js | 15 | Aligns with Content SDK 1.0
Node.js | 22 | Compatible with Content SDK 1.0
Sitecore JSS | 22.7.0 | Earlier JSS will not pair cleanly
Sitecore Content SDK | v1.0.0 | Brings SearchService, generics

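One way to keep the matrix honest is a small check against the dependency ranges in package.json. This is a rough sketch, not a full semver implementation: it compares only the lower bound of each declared range, and the `minimums` map covers just three rows of the matrix as an example.

```typescript
// Minimums copied from the matrix; extend with the other rows as needed.
const minimums: Record<string, string> = {
  "@sitecore-search/react": "3.0.0",
  react: "19.0.0",
  next: "15.0.0",
};

function parseVersion(v: string): number[] {
  // Drops range prefixes such as ^ or ~ and any pre-release suffix.
  const core = v.replace(/^[^0-9]*/, "").split("-")[0];
  return core.split(".").map((part) => parseInt(part, 10) || 0);
}

function meetsMinimum(declared: string, minimum: string): boolean {
  const a = parseVersion(declared);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // versions are equal
}

// Returns the tracked packages whose declared version is below the minimum.
function findOutdated(deps: Record<string, string>): string[] {
  return Object.keys(minimums).filter(
    (pkg) => pkg in deps && !meetsMinimum(deps[pkg], minimums[pkg]),
  );
}

const outdated = findOutdated({
  react: "^18.3.1",
  next: "15.1.0",
  "@sitecore-search/react": "~3.0.2",
});
// outdated: ["react"]
```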
08 / Strategy

A realistic implementation strategy.

Knowing the API is half the job. Knowing when not to flip the flag is the other half. The decision flow below is the one I'd put in front of any team about to roll out semantic search.

[Figure 04: Decision flow, ranking mode selector. New search widget → Is it exact-match (SKU, code, ID)? Yes: keyword only, no +semsearch. No → Is the user asking a direct question? Yes: exact_answer plus optional +semsearch. No: +semsearch on the widget (free-text, descriptive intent).]
Fig 04. A widget-by-widget decision flow. Most production sites end up with all three modes coexisting.
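The decision flow is simple enough to encode next to your widget inventory. The sketch below assumes you have already classified each widget by hand; `WidgetKind`, `planRanking`, and the widget IDs are hypothetical names, and the plan is only a record of intent, not an API payload.

```typescript
type WidgetKind = "exact-match" | "question" | "descriptive";
type RankingPlan = { flags: string[]; useExactAnswer: boolean };

// Mirrors the flow: exact-match stays keyword-only, questions use exact_answer
// (semantic optional), everything else gets +semsearch.
function planRanking(kind: WidgetKind, semanticOnQuestions = false): RankingPlan {
  switch (kind) {
    case "exact-match":
      return { flags: [], useExactAnswer: false };
    case "question":
      return { flags: semanticOnQuestions ? ["+semsearch"] : [], useExactAnswer: true };
    case "descriptive":
      return { flags: ["+semsearch"], useExactAnswer: false };
  }
}

// Hypothetical inventory produced by a widget audit.
const inventory: Array<{ rfkId: string; kind: WidgetKind }> = [
  { rfkId: "site_search", kind: "descriptive" },
  { rfkId: "sku_lookup", kind: "exact-match" },
  { rfkId: "faq_box", kind: "question" },
];

const plans = inventory.map((w) => ({ rfkId: w.rfkId, ...planRanking(w.kind) }));
// site_search gets ["+semsearch"]; sku_lookup gets no flags; faq_box uses exact_answer.
```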

Anti-patterns to avoid

  • Don't enable +semsearch globally without testing. Some widgets — autocomplete, exact-product lookup — get worse, not better.
  • Don't expect semantic ranking to fix bad content. Semantic re-ranks the candidate set; if the keyword retrieval misses, the re-ranker has nothing to reorder. Garbage in, garbage out still applies.
  • Don't skip analyzer tuning. Because retrieval is keyword-based, your analyzer choices still drive what makes it into the candidate set. Pick rfk_standard_multi_locale for most fields, rfk_keyword for IDs, and don't sleep on this layer.
  • Don't measure success with vibes. A/B test specific widgets with click-through and conversion metrics before declaring victory.

A four-step rollout that won't blow up

  1. Audit. Inventory every search widget on the site. For each, classify it: exact-match, descriptive, or Q&A.
  2. Pilot. Pick one descriptive widget — typically the main site search — and request domain enablement. Add +semsearch only there.
  3. A/B test. Run it against the keyword-only baseline. Measure CTR on result position 1–3 and downstream conversion.
  4. Expand. Roll out widget by widget based on evidence, not assumption. Keep your exact-match widgets on classical retrieval.
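For step 3, the arithmetic is worth writing down before the test starts. The sketch below computes click-through rate on positions 1 to 3 per variant and a pooled two-proportion z statistic. The `Impression` shape is an assumption about how you log searches, and the numbers in the usage line are made up.

```typescript
type Impression = { variant: "keyword" | "semantic"; clickedPosition: number | null };

// Share of searches where the user clicked a result in positions 1 through maxPosition.
function ctrAt(impressions: Impression[], variant: Impression["variant"], maxPosition = 3) {
  const shown = impressions.filter((i) => i.variant === variant);
  const clicks = shown.filter(
    (i) => i.clickedPosition !== null && i.clickedPosition <= maxPosition,
  ).length;
  return { searches: shown.length, clicks, rate: shown.length ? clicks / shown.length : 0 };
}

// Two-proportion z statistic with a pooled standard error. An absolute value
// above roughly 1.96 is the conventional 5% two-sided threshold.
function twoProportionZ(clicksA: number, nA: number, clicksB: number, nB: number): number {
  const pooled = (clicksA + clicksB) / (nA + nB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return standardError === 0 ? 0 : (clicksB / nB - clicksA / nA) / standardError;
}

const sampleLog: Impression[] = [
  { variant: "keyword", clickedPosition: 5 },
  { variant: "keyword", clickedPosition: 2 },
  { variant: "semantic", clickedPosition: 1 },
  { variant: "semantic", clickedPosition: null },
];
const keywordCtr = ctrAt(sampleLog, "keyword");   // 1 qualifying click out of 2 searches
const semanticCtr = ctrAt(sampleLog, "semantic"); // 1 qualifying click out of 2 searches

// Made-up totals: 100/1000 top-3 clicks on keyword vs 130/1000 on semantic.
const z = twoProportionZ(100, 1000, 130, 1000); // about 2.1
```

Four log rows are far too few to conclude anything; the sample only shows the counting rule. Decide the sample size and the threshold before the test runs, not after you have seen the numbers.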
09 / Looking Forward

Where this is heading next.

Semantic ranking is one piece of a larger arc. Sitecore's broader direction — visible in the Stream license, the SitecoreAI product surface, and the Content SDK consolidation — points toward search becoming less of a retrieval engine and more of a conversational layer over your content. Generative answers, multimodal queries (image and voice), and predictive search are all on the same trajectory.

Honest assessment of where things still aren't: full vector search at index time isn't here yet, and for content sets where conceptual recall — not just ranking — matters most, that's a real limitation. Multimodal capabilities are early. And semantic search's effectiveness still depends heavily on having decent content with reasonable descriptive fields; it doesn't rescue thin product copy.

The right way to think about today's semantic search is as the first step of a multi-year shift, not the destination. Implement it cleanly, measure it honestly, and you'll be well-positioned for whatever ships next.

10 / Key Takeaways

The summary worth bookmarking.

Seven things to remember

  1. Sitecore Semantic Search is a re-ranking layer, not a vector index. Keyword retrieval still happens first.
  2. Enable it per widget by adding +semsearch to the rfk_flags array — and ask Sitecore to enable your domain first.
  3. Keyword search remains the right tool for SKUs, IDs, and exact phrases. Don't replace it; complement it.
  4. The Content SDK's SearchService with TypeScript generics is the modern way to query — type-safe, with dot notation for nested sorts.
  5. Pair semantic ranking with exact_answer for question-style queries and mlt for personalized ordering.
  6. Search React SDK v3.0.0 (Sept 2025) brings React 19 support, aligning with Content SDK 1.0, JSS 22.7.0+, and Next.js 15.
  7. Roll out: audit widgets, pilot one, A/B test, expand. Don't flip the flag globally on day one.

About the author

Ashish Kapoor

Global Director of Marketing Technology | Chief Technology Advisor | Architecting the Future with SaaS MACH & Agentic AI | 2x Sitecore Ambassador MVP

  • 21+ years in enterprise product architecture
  • Sitecore MVP Ambassador (2023, 2024)
  • Global digital delivery across 40+ countries
  • 100+ AI agents shipped in production
  • $2M+ MarTech rationalisation savings