Schema Markup That AI Search Engines Actually Cite (2026)

AI crawlers parse structured data before they parse prose. When Perplexity generates an answer about a local business, when ChatGPT decides which page to link to, when Google’s AI Overviews assembles a summary — they’re pulling from the same source of truth: the JSON-LD blocks embedded in your HTML.

The question is which schema types get cited, and which are decorative. Here are the seven that matter in 2026, with working examples you can copy.

The seven schema types AI search prefers

1. Organization + LocalBusiness / ProfessionalService

The foundation. Every page of your site should share a single Organization node, referenced by @id everywhere else. For service businesses, pair it with LocalBusiness (or a subtype like ProfessionalService or Dentist).

{
  "@context": "https://schema.org",
  "@type": "ProfessionalService",
  "@id": "https://your-domain.com/#localbusiness",
  "name": "Your Business Name",
  "url": "https://your-domain.com",
  "telephone": "+1-XXX-XXX-XXXX",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Mississauga",
    "addressRegion": "ON",
    "addressCountry": "CA"
  },
  "geo": { "@type": "GeoCoordinates", "latitude": 43.589, "longitude": -79.6441 },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "09:00",
      "closes": "17:00"
    }
  ],
  "areaServed": [{ "@type": "AdministrativeArea", "name": "Ontario" }]
}

AI search engines use ProfessionalService, LocalBusiness, and their subtypes to decide whether your business matches a geographic query.
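
The Organization node that later examples reference by @id is not shown above. As a minimal sketch, assuming the `#organization` and `#localbusiness` identifiers used elsewhere in this article, the two shared nodes could be generated together. The `shared_nodes` and `to_script_tag` helpers are hypothetical names, and linking the nodes with `parentOrganization` is one possible choice rather than a requirement.

```python
import json

BASE_URL = "https://your-domain.com"  # placeholder domain, as in the examples above


def shared_nodes(base_url: str, name: str) -> list:
    """Return the site-wide Organization and ProfessionalService blocks."""
    org_id = f"{base_url}/#organization"
    organization = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": org_id,
        "name": name,
        "url": base_url,
    }
    business = {
        "@context": "https://schema.org",
        "@type": "ProfessionalService",
        "@id": f"{base_url}/#localbusiness",
        "name": name,
        "url": base_url,
        # Assumption: parentOrganization is one way to tie the two nodes together.
        "parentOrganization": {"@id": org_id},
    }
    return [organization, business]


def to_script_tag(block: dict) -> str:
    """Serialize one JSON-LD block into the <script> tag that goes in the HTML."""
    # Escape "</" so a string value can never close the script tag early.
    body = json.dumps(block, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


if __name__ == "__main__":
    for node in shared_nodes(BASE_URL, "Your Business Name"):
        print(to_script_tag(node))
```

Address, geo, and opening hours would be added to the ProfessionalService dictionary in the same way as in the JSON example above.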

2. FAQPage

The highest-value schema for AI citations in 2026. Every FAQ you’ve already written on a service page or blog post should be wrapped in FAQPage markup. AI engines lift individual Q&A pairs as direct answers.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is AEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AEO (Answer Engine Optimization) is the practice of optimizing content to be cited by AI search engines..."
      }
    }
  ]
}

Rule: only mark up FAQs that are actually answered on the page. AI engines validate the markup against the visible content and demote pages with schema that doesn’t match.
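
The rule above can be checked mechanically before publishing. The following is a rough sketch using only the Python standard library: it compares each FAQPage question against the page's visible text. The function name `unmatched_faq_questions` is hypothetical, and the substring match is a deliberately simple assumption; it does not reproduce how any AI engine actually validates markup.

```python
import json
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collects visible text and raw JSON-LD blocks from an HTML page."""

    def __init__(self):
        super().__init__()
        self.visible = []   # text outside <script>/<style>
        self.jsonld = []    # raw contents of each JSON-LD script
        self._in_jsonld = False
        self._skip = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
            self._in_jsonld = tag == "script" and dict(attrs).get("type") == "application/ld+json"
            if self._in_jsonld:
                self.jsonld.append("")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld[-1] += data
        elif self._skip == 0:
            self.visible.append(data)


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences do not matter."""
    return " ".join(text.split()).lower()


def unmatched_faq_questions(html: str) -> list:
    """Return FAQPage question names that do not appear in the visible text."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    page_text = _normalize(" ".join(parser.visible))
    missing = []
    for raw in parser.jsonld:
        block = json.loads(raw)
        if not isinstance(block, dict) or block.get("@type") != "FAQPage":
            continue
        for question in block.get("mainEntity", []):
            if _normalize(question["name"]) not in page_text:
                missing.append(question["name"])
    return missing
```

An empty list means every marked-up question also appears on the page; anything returned is a candidate for removal from the markup or addition to the page.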

3. HowTo

Used for step-by-step content. AI Overviews specifically look for HowTo schema when the query includes “how to,” “tutorial,” or “step by step.”

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Publish llms.txt",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Create the file",
      "text": "Create a file named llms.txt in your public/ directory..."
    },
    { "@type": "HowToStep", "name": "Add the header", "text": "First line is '# Business Name'..." }
  ]
}

4. Article (with author, datePublished, dateModified, citation)

The critical combination. Every blog post should emit Article schema with all four properties. The citation property — which most sites skip — is how you tell AI engines which sources you’re referencing.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Schema Markup That AI Search Engines Actually Cite",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "url": "https://your-domain.com/about"
  },
  "datePublished": "2026-04-22",
  "dateModified": "2026-04-22",
  "publisher": { "@id": "https://your-domain.com/#organization" },
  "citation": ["https://llmstxt.org", "https://schema.org/Article"]
}
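
One way to make sure no post ships without all four properties is to generate the block from a single function. This is a sketch under stated assumptions: `article_schema` is a hypothetical helper, and the date check is a basic sanity rule added for illustration.

```python
from datetime import date


def article_schema(headline: str, author_name: str, author_url: str,
                   published: date, modified: date,
                   publisher_id: str, citations: list) -> dict:
    """Build an Article block that always carries author, both dates, and citation."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        # Reference the shared Organization node instead of repeating it.
        "publisher": {"@id": publisher_id},
        "citation": list(citations),
    }
```

Because every argument is required, a post with a missing author or citation list fails at build time rather than shipping incomplete markup.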

5. BreadcrumbList

Understood by every major AI engine. Use it on every page except the homepage.

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://your-domain.com/" },
    { "@type": "ListItem", "position": 2, "name": "Services", "item": "https://your-domain.com/services" },
    { "@type": "ListItem", "position": 3, "name": "AI SEO", "item": "https://your-domain.com/services/ai-seo" }
  ]
}
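
Since the breadcrumb trail usually mirrors the URL path, the block can be derived from the page URL. A minimal sketch, assuming one display name per path segment; `breadcrumb_schema` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def breadcrumb_schema(page_url: str, names: list) -> dict:
    """Build a BreadcrumbList from a URL and one display name per path segment."""
    parts = urlsplit(page_url)
    root = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != len(names):
        raise ValueError("need exactly one display name per path segment")

    # Position 1 is always the homepage.
    items = [{"@type": "ListItem", "position": 1, "name": "Home", "item": root + "/"}]
    path = ""
    for position, (segment, name) in enumerate(zip(segments, names), start=2):
        path += "/" + segment
        items.append({"@type": "ListItem", "position": position, "name": name, "item": root + path})

    return {"@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": items}
```

Calling `breadcrumb_schema("https://your-domain.com/services/ai-seo", ["Services", "AI SEO"])` yields the same three items as the JSON example above.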

6. DefinedTerm / DefinedTermSet (underused)

This one is quietly powerful. If you publish a glossary (and you should), wrap it in DefinedTermSet. AI engines — especially Perplexity and ChatGPT — treat DefinedTerm schema as authoritative dictionary-style citations.

{
  "@context": "https://schema.org",
  "@type": "DefinedTermSet",
  "name": "Marketing & SEO Glossary",
  "hasDefinedTerm": [
    {
      "@type": "DefinedTerm",
      "name": "CLS (Cumulative Layout Shift)",
      "description": "A Core Web Vital measuring unexpected visual shift during page load...",
      "inDefinedTermSet": "https://your-domain.com/resources/glossary"
    }
  ]
}

Few sites publish DefinedTerm schema, so when you do, you end up in a small citation pool with low competition. We use it on the DEM (digitalestatemedia.com) glossary for exactly this reason.
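
If the glossary already lives in a data file or CMS, the DefinedTermSet can be generated from it rather than written by hand. The sketch below assumes the glossary is available as a plain mapping of term to definition; `glossary_schema` is a hypothetical helper.

```python
def glossary_schema(glossary_url: str, set_name: str, terms: dict) -> dict:
    """Build a DefinedTermSet from a {term: definition} mapping."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "name": set_name,
        "hasDefinedTerm": [
            {
                "@type": "DefinedTerm",
                "name": name,
                "description": description,
                # Each term points back at the glossary page it belongs to.
                "inDefinedTermSet": glossary_url,
            }
            for name, description in terms.items()
        ],
    }
```

Generating the markup from the same source as the visible glossary keeps the definitions in the schema identical to the ones on the page.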

7. Product / Service

If you sell products, use Product. If you sell services, use Service with a provider pointing back to your Organization @id.

{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "Local SEO Services",
  "provider": { "@id": "https://your-domain.com/#organization" },
  "areaServed": [{ "@type": "AdministrativeArea", "name": "Ontario" }],
  "description": "Google Business Profile optimization, citations, location landing pages, and review systems for Ontario businesses."
}

Common schema mistakes

Stale data. Prices, hours, and aggregate ratings that no longer match the page. AI engines detect the mismatch and demote the page.
Orphan @id values. Each @id should be referenced from at least one other node. An Organization @id that nothing links to is wasted.
Mismatched canonical URLs. If your canonical says https://example.com/page but your schema’s url says https://www.example.com/page, you’ve split your authority.
Invalid JSON. One trailing comma breaks the whole block. Always validate before shipping.
Using aggregateRating without real reviews. Inventing numbers violates Google’s structured-data policy and can earn a manual action. Only ship aggregateRating if the data is real and visible on the page.
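
Three of these mistakes (invalid JSON, orphan @id values, and www/non-www mismatches against the canonical URL) can be caught by a small script. This is an illustrative sketch, not a complete validator: `lint_jsonld` is a hypothetical helper, it treats a node containing only @id as a reference, and it cannot detect stale data or invented ratings, which still need a human check against the page.

```python
import json
from urllib.parse import urlsplit


def _walk(node):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def _bare_host(url: str) -> str:
    """Host name of a URL with any leading 'www.' removed."""
    host = urlsplit(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def lint_jsonld(raw_blocks: list, canonical_url: str) -> list:
    """Return a list of human-readable problems found in the raw JSON-LD strings."""
    problems, defined, referenced = [], set(), set()
    canonical_host = urlsplit(canonical_url).netloc.lower()
    for index, raw in enumerate(raw_blocks):
        try:
            block = json.loads(raw)
        except json.JSONDecodeError as exc:
            # A trailing comma lands here: the whole block is unusable.
            problems.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        for node in _walk(block):
            for key in ("url", "@id"):
                value = node.get(key)
                if not isinstance(value, str):
                    continue
                host = urlsplit(value).netloc.lower()
                # Same site but a different www prefix than the canonical URL.
                if host and host != canonical_host and _bare_host(value) == _bare_host(canonical_url):
                    problems.append(
                        f"block {index}: host {host} differs from canonical host {canonical_host}"
                    )
            node_id = node.get("@id")
            if isinstance(node_id, str):
                # A node holding only @id is a reference; anything richer defines it.
                (referenced if set(node) == {"@id"} else defined).add(node_id)
    problems += [f"orphan @id: {i}" for i in sorted(defined - referenced)]
    problems += [f"reference to undefined @id: {i}" for i in sorted(referenced - defined)]
    return problems
```

Pass it the raw contents of every JSON-LD script on one page together with that page's canonical URL; an empty list means none of the three checks fired.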

Testing and validation workflow

Before any structured data ships to production:

Schema.org validator — validator.schema.org. Catches syntax errors.
Google Rich Results Test — search.google.com/test/rich-results. Shows which rich results Google will render.
View source — actually open the page, view source, and eyeball the JSON-LD blocks. It’s the only way to catch the “the schema is rendered but the values are wrong” failure mode.
Production spot check — after deploy, run curl -s https://your-domain.com/page | grep -A 50 'ld+json' and confirm the live output matches.
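
The spot check can also be scripted so that it parses the live JSON-LD instead of only displaying it. A minimal sketch using the standard library: `jsonld_types` is a hypothetical helper, and the regular expression assumes the script tags are written in the usual `type="application/ld+json"` form.

```python
import json
import re
import sys
from urllib.request import urlopen

# Matches the contents of every JSON-LD script tag in a page.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def jsonld_types(html: str) -> list:
    """Return the @type of every top-level JSON-LD node found in the HTML."""
    types = []
    for raw in JSONLD_RE.findall(html):
        data = json.loads(raw)  # raises ValueError if the live block is broken
        nodes = data if isinstance(data, list) else data.get("@graph", [data])
        types.extend(str(node.get("@type")) for node in nodes)
    return types


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_schema.py https://your-domain.com/page
    with urlopen(sys.argv[1]) as response:
        page = response.read().decode("utf-8", errors="replace")
    print(jsonld_types(page))
```

Comparing the printed list against the types you expect on that page gives a quick pass or fail after each deploy.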

Case study: how DEM structures its schema

Every page on digitalestatemedia.com ships at least three JSON-LD blocks: the shared Organization, the shared ProfessionalService, and a page-specific block (BreadcrumbList on most pages, FAQPage on service and blog pages, DefinedTermSet on the glossary, Article on blog posts).

The trick that ties it together: every page references the same @id for the Organization — https://www.digitalestatemedia.com/#organization. This lets AI engines build a connected graph of our pages rather than treating each as isolated. When Perplexity cites a single DEM page, it implicitly pulls the Organization context along with it.

This is the approach we recommend for every client engagement: centralize the Organization and LocalBusiness schemas in one shared module, reference them by @id from every page, then add page-specific schema on top.
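
In code, the shared-module approach might look like the sketch below. It is an illustration under assumptions, not DEM's actual implementation: the placeholder domain, `SHARED_BLOCKS`, `org_ref`, and `head_jsonld` are all hypothetical.

```python
import json

ORG_ID = "https://your-domain.com/#organization"  # placeholder domain

# Defined once, emitted on every page.
SHARED_BLOCKS = [
    {"@context": "https://schema.org", "@type": "Organization",
     "@id": ORG_ID, "name": "Your Business Name"},
    {"@context": "https://schema.org", "@type": "ProfessionalService",
     "@id": "https://your-domain.com/#localbusiness", "name": "Your Business Name"},
]


def org_ref() -> dict:
    """Reference to the shared Organization, for provider/publisher fields."""
    return {"@id": ORG_ID}


def head_jsonld(*page_blocks: dict) -> str:
    """Render the shared blocks plus any page-specific blocks as script tags."""
    blocks = SHARED_BLOCKS + list(page_blocks)
    return "\n".join(
        '<script type="application/ld+json">'
        + json.dumps(block).replace("</", "<\\/")  # keep values from closing the tag
        + "</script>"
        for block in blocks
    )


if __name__ == "__main__":
    service = {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": "Local SEO Services",
        "provider": org_ref(),
    }
    print(head_jsonld(service))
```

Every page template calls `head_jsonld` with its own blocks, so the Organization @id is typed in exactly one place and cannot drift between pages.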

Checklist before publishing any page

Organization + LocalBusiness/ProfessionalService present (usually shared across the site)
Page-specific schema matches the page’s dominant content type (Article for blog, Service for service page, FAQPage if there’s an FAQ, BreadcrumbList if not the homepage)
All @id values reference valid nodes
All schema values match the visible page content
Validated against Schema.org validator and Google Rich Results Test
citation field populated if the page references external sources

Where this fits in the bigger picture

Schema is one of three levers for AI search visibility. The other two are llms.txt as a curated entry point and clean, citable prose in the page body. Do all three and you join the roughly 5% of sites that AI search engines can actually cite cleanly.

If you want the conceptual background, “SEO vs AEO vs GEO: What Your Business Actually Needs in 2026” is the pillar for this cluster. For implementation help, this is exactly what our AI SEO service covers: we build the schema, the llms.txt, and the citable content together as a single system.