Preparing for AI Shopping: A Data Strategy Playbook

AI shopping agents query structured data, not marketing copy. This playbook covers the product data, inventory, and infrastructure investments merchants need to capture agent commerce value.

Every vendor in your stack has an AI readiness story now: your CDP has an agentic commerce module, your analytics vendor is pitching AI-native attribution, and your personalization platform has a slide about agent-mediated discovery. If you are a VP of MarTech or data engineering at a large retailer, you have probably seen more "AI-ready" roadmaps in the past quarter than you saw CDP pitches in 2019.

Here is what makes the noise worth sorting through: the underlying trend is real. Adobe found AI traffic to retail sites grew 805% year-over-year during Black Friday 2025, Salesforce research shows 39% of consumers already use AI for product discovery, and Morgan Stanley reports 23% of Americans purchased something via AI in the past month. These are observed behaviors, not projections.

The performance gap is where it gets interesting. McKinsey's research shows AI-generated product recommendations achieve 4.4x higher conversion rates than traditional search. Yet ChatGPT shopping referrals currently convert at a rate 86% lower than affiliate traffic. Consumer demand is ahead of merchant infrastructure, and the gap is almost entirely a data problem. The playbook that follows focuses on the infrastructure investments that close it.

Why AI shopping readiness is really data infrastructure readiness

The temptation is to treat agent commerce as a new channel requiring new architecture, but the actual problem is more mundane. ChatGPT referrals convert poorly because merchant data stacks were built around assumptions that agents violate: customers load web pages, execute JavaScript, accept cookies, and produce observable behavioral journeys.

Agents do none of this. They make API calls, parse structured data, and complete transactions through protocols like the Agentic Commerce Protocol without ever triggering the client-side tracking that your analytics, attribution, and personalization systems depend on.

The good news is that everything agents need from your infrastructure is something your data team has already been requesting. Product data hygiene improves current SEO and merchandising while making products discoverable by agents. Server-side data collection fixes the ad blocker and browser privacy gaps your team raised three years ago while capturing agent traffic that client-side tools miss entirely. Identity resolution unifies cross-channel customer views for today's marketing programs while connecting tomorrow's agent transactions to existing customer profiles.

The practical result is that preparing for agent commerce means accelerating work you already need to do, with the added benefit of positioning for a channel that is growing faster than most teams expected.

Product data structure for agent discoverability

Agents read structured data, not marketing copy. When an agent is asked to find wireless headphones with 30-plus hour battery life under $200, it queries product catalogs and filters on explicit attributes. Products whose battery life is not stated in machine-readable format get skipped, regardless of whether they meet the criteria. Research from Mirakl found that 42% of customers abandon purchases due to insufficient product information, and that number likely understates the problem for agent surfaces, where data quality determines whether products appear at all.
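
To make the skipping behavior concrete, here is a minimal sketch of how an agent-style filter might treat a catalog. The Product fields, the agent_filter function, and the sample SKUs are all hypothetical; real agents and catalog APIs will differ, but the core point holds: an attribute that is not stated cannot be verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    sku: str
    category: str
    price_usd: Optional[float] = None
    battery_life_hours: Optional[float] = None  # None = not stated in structured data


def agent_filter(catalog, category, min_battery_hours, max_price_usd):
    """Return only products that provably satisfy every constraint."""
    matches = []
    for p in catalog:
        if p.category != category:
            continue
        # A missing attribute cannot be verified, so the product is skipped
        # even if the physical item would qualify.
        if p.battery_life_hours is None or p.price_usd is None:
            continue
        if p.battery_life_hours >= min_battery_hours and p.price_usd < max_price_usd:
            matches.append(p)
    return matches


catalog = [
    Product("HP-100", "wireless_headphones", 149.0, 40.0),
    Product("HP-200", "wireless_headphones", 129.0, None),  # battery life only in marketing copy
    Product("HP-300", "wireless_headphones", 249.0, 35.0),  # over budget
]

result = agent_filter(catalog, "wireless_headphones", 30, 200)
print([p.sku for p in result])
```

In this sketch HP-200 is cheaper than HP-100 and may well have the battery life the shopper wants, but it never reaches the result list because the attribute exists only in prose.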

The key difference between traditional SEO and agent commerce is that agents enforce what search engines merely rewarded. Missing Schema.org markup used to mean lower rankings; in agent commerce it means the product does not appear. Complete JSON-LD schema, consistent product identifiers (GTIN, MPN) across all surfaces, and attribute-dense descriptions that answer specific queries are table stakes. We covered the full technical requirements for ChatGPT shopping readiness in detail previously. The rest of this playbook focuses on the data infrastructure decisions that sit underneath product data and determine whether your systems can actually capture, resolve, and act on agent commerce signals.
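
As an illustration of what attribute-dense markup can look like, the sketch below builds a Schema.org Product document as a Python dict and serializes it to JSON-LD. The product_jsonld helper and every value passed to it (name, brand, identifiers, price) are placeholders invented for this example; the property names (gtin13, mpn, additionalProperty, offers) come from the Schema.org vocabulary.

```python
import json


def product_jsonld(name, gtin13, mpn, brand, price, currency, attributes):
    """Build a Schema.org Product document as a plain dict.

    `attributes` maps a human-readable name to a (value, unit) pair so that
    each spec is exposed as an explicit, machine-readable PropertyValue.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "gtin13": gtin13,
        "mpn": mpn,
        "brand": {"@type": "Brand", "name": brand},
        "additionalProperty": [
            {"@type": "PropertyValue", "name": attr, "value": value, "unitText": unit}
            for attr, (value, unit) in attributes.items()
        ],
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }


# All values below are placeholders, not real product identifiers.
doc = product_jsonld(
    name="Example Wireless Headphones",
    gtin13="0000000000000",
    mpn="EX-HP-100",
    brand="ExampleBrand",
    price=149.0,
    currency="USD",
    attributes={"Battery life": (40, "hours")},
)
print(json.dumps(doc, indent=2))
```

The same identifiers should appear unchanged in feeds, marketplace listings, and on-page markup, which is easier to guarantee when every surface is generated from one function like this rather than hand-edited.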

Server-side collection as the foundation layer

This is where the infrastructure conversation shifts from product data (which your merchandising team owns) to data collection architecture (which sits squarely in your domain).

AI agents make API calls directly to merchant systems. They do not load web pages, render JavaScript, or accept cookies. If your data collection depends on client-side execution, you have a growing blind spot that today includes ad-blocked sessions, Safari ITP restrictions, and consent management gaps, and tomorrow includes every agent-mediated transaction.

Server-side data collection captures events at the infrastructure layer where requests arrive, independent of what happens in the browser. When an agent initiates an add-to-cart or checkout, server-side collection captures the event directly from the API call, with no JavaScript dependency or cookie requirement. The same architectural choice that fixes your current data quality problems becomes mandatory for agent commerce, where client-side approaches do not work at all.
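
A rough sketch of the idea, using only the Python standard library: the event is built from the request itself, so nothing depends on a browser running a tag. The capture_server_event function, the endpoint paths, and the payload fields are assumptions made for illustration and do not describe any particular protocol or vendor API.

```python
import json
import time
import uuid

# Hypothetical mapping from API path to analytics event name.
EVENT_TYPES = {
    "/cart/items": "add_to_cart",
    "/checkout": "checkout_started",
}


def capture_server_event(path, headers, body):
    """Build an analytics event from a raw request, with no browser involvement."""
    payload = json.loads(body) if body else {}
    return {
        "event_id": str(uuid.uuid4()),
        "received_at": time.time(),
        "event_type": EVENT_TYPES.get(path, "unknown"),
        "user_agent": headers.get("User-Agent", ""),
        "sku": payload.get("sku"),
        "quantity": payload.get("quantity"),
    }


event = capture_server_event(
    "/cart/items",
    {"User-Agent": "example-agent/1.0"},
    '{"sku": "HP-100", "quantity": 1}',
)
print(event["event_type"], event["sku"], event["quantity"])
```

Because the function only needs the path, headers, and body, the same code path serves a browser session, a mobile app, and an agent call alike.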

The practical implication for infrastructure planning: any data collection investment that depends on client-side execution is building on an assumption with a known expiration date. Browser restrictions are tightening and agent traffic is growing, so the share of your total commerce that generates client-side signals is being squeezed from both directions at once.

Identity resolution for agent commerce customers

Agent-mediated purchases arrive without journey context. Your first signal is the checkout payload: product, quantity, price, shipping. Everything that happened before (discovery, comparison, consideration, the conversation between customer and agent that led to your product being selected) is invisible to your systems.

Without identity resolution, every agent transaction is a contextless event. You know what was bought, but beyond the shipping address you do not know who bought it in any meaningful sense. You cannot connect that purchase to the same customer's website visits, email engagement, or in-store activity, which means you cannot build a profile, personalize subsequent interactions, or measure lifetime value across channels.

Cross-surface identity resolution changes the equation. A customer who browses your website, receives your emails, and eventually purchases through an agent appears as one customer across all surfaces rather than as disconnected records that fragment your customer intelligence. The agent transaction, instead of being a context-free event, becomes a data point that enriches an existing profile.
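
One simple way to picture this is deterministic matching on a normalized, hashed email. The sketch below assumes the agent checkout payload includes an email address, which may not hold for every protocol, and the function names and profile shape are hypothetical. Production identity graphs typically combine more signals than this.

```python
import hashlib


def email_key(email):
    """Normalize then hash an email so records can be joined without storing it raw."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def resolve_checkout(checkout, profiles_by_key):
    """Attach an agent checkout to an existing profile, or start a new one."""
    key = email_key(checkout["email"])
    profile = profiles_by_key.get(key)
    if profile is None:
        # No prior history: the agent purchase becomes the first known event.
        profile = {"key": key, "events": []}
        profiles_by_key[key] = profile
    profile["events"].append({"source": "agent", "sku": checkout["sku"]})
    return profile


# Existing profile built from web and email activity.
known_key = email_key("pat@example.com")
profiles = {
    known_key: {
        "key": known_key,
        "events": [
            {"source": "web", "sku": "HP-100"},
            {"source": "email", "sku": "HP-100"},
        ],
    }
}

# The agent sends the same address with different casing and whitespace.
profile = resolve_checkout({"email": " Pat@Example.com ", "sku": "HP-100"}, profiles)
print([e["source"] for e in profile["events"]])
```

The agent purchase lands on the profile that already holds the web and email history instead of creating a second, disconnected record.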

This is the same identity resolution work that improves match rates for your current advertising, extends tracking beyond browser restrictions, and unifies cross-channel customer views for existing marketing programs. Agent commerce does not require a different kind of identity resolution, just the identity resolution your team has been building the business case for, with a new and increasingly urgent reason to fund it.

What to build now versus what to wait on

Not everything in this playbook demands immediate action. The timeline depends on how much agent traffic your category is already seeing and how much of the foundational work is already underway.

Build now (immediate returns, agent-ready as a byproduct):

Product data hygiene is the lowest-risk, highest-certainty investment. Complete Schema.org markup, consistent identifiers, and attribute-dense product data improve current SEO performance, marketplace visibility, and merchandising quality while making products discoverable by agents. If your product data team has a backlog of schema improvements, move them up.

Server-side data collection addresses a current, measurable problem (data loss from ad blockers, browser restrictions, consent gaps) while building the foundation for agent traffic measurement. If you are evaluating data collection infrastructure in the next 18 months, weight your decision toward server-side architecture.

Build when agent traffic hits 1-3% of sessions:

Identity resolution that specifically accounts for agent transaction patterns belongs here. Most organizations have some form of identity resolution already, so the agent-specific work involves ensuring that agent-originated checkout events can be matched to existing customer profiles and that your progressive profiling logic handles the sparse initial context that agent purchases provide.

Agent-specific attribution modeling is also in this category, since until agent traffic reaches a measurable threshold there is not enough data to build meaningful models. Prepare the data infrastructure (server-side collection, identity graphs) so that when volume arrives, your attribution systems can process it.
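
To know when that threshold has been reached, you need a measurement of agent share taken from server-side logs. The sketch below is one possible approach, matching on User-Agent substrings; the marker string, the session shape, and the choice of 1% as the trigger are illustrative assumptions, and User-Agent matching alone will miss agents that do not identify themselves.

```python
def agent_share(sessions, agent_markers):
    """Fraction of sessions whose User-Agent contains any marker (case-insensitive)."""
    if not sessions:
        return 0.0
    markers = [m.lower() for m in agent_markers]
    hits = sum(
        1 for s in sessions
        if any(m in s["user_agent"].lower() for m in markers)
    )
    return hits / len(sessions)


# Hypothetical sample: 98 browser sessions and 2 agent sessions.
sessions = (
    [{"user_agent": "Mozilla/5.0"}] * 98
    + [{"user_agent": "Example-Shopping-Agent/1.0"}] * 2
)

LOWER_THRESHOLD = 0.01  # lower bound of the 1-3% range discussed above

share = agent_share(sessions, ["example-shopping-agent"])
threshold_reached = share >= LOWER_THRESHOLD
print(f"{share:.1%}", threshold_reached)
```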

Wait and monitor:

Agent-specific personalization logic can wait, since how agents interact with merchant personalization APIs is still evolving and the protocols are not settled enough to build against. Focus on having clean, unified customer data (which is the output of the investments above) so that when agent personalization patterns solidify, you have the foundation to implement them.

The infrastructure decisions that compound

The most important thing about this playbook is what it does not require: a bet on which agent platform wins, which commerce protocol becomes standard, or how fast the transition happens. Every investment outlined here improves current performance while positioning for agent commerce, because the underlying architecture is the same.

Product data structure improves SEO and agent discoverability simultaneously. Server-side collection fixes current data quality gaps while capturing agent signals that client-side tools miss entirely. Identity resolution increases match rates across existing channels while providing the connective tissue that makes agent transactions meaningful rather than contextless. Each of these compounds across current and emerging channels, which is the only kind of infrastructure investment worth making when the specific shape of an emerging channel is still forming.

First-mile data infrastructure that captures signals at the point of collection, normalizes them, and routes them to downstream systems provides the architectural foundation for all three, regardless of whether the signal originated from a web browser, a mobile app, or an AI agent.
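
The capture, normalize, route pattern can be sketched in a few lines. The source names, raw field names, and destinations below are invented for illustration; the point is only that downstream systems receive one event shape regardless of origin.

```python
# Hypothetical per-source field names mapped onto one canonical schema.
FIELD_MAPS = {
    "web": {"evt": "event_type", "pid": "sku"},
    "mobile": {"eventName": "event_type", "productId": "sku"},
    "agent": {"action": "event_type", "sku": "sku"},
}


def normalize(raw, source):
    """Map source-specific field names onto one shared event shape."""
    field_map = FIELD_MAPS[source]
    event = {canonical: raw[original] for original, canonical in field_map.items()}
    event["source"] = source
    return event


def route(event, destinations):
    """Deliver the same normalized event to every downstream consumer."""
    for deliver in destinations:
        deliver(event)


# Stand-ins for downstream systems such as a warehouse and an attribution tool.
warehouse, attribution = [], []

incoming = [
    ({"evt": "add_to_cart", "pid": "HP-100"}, "web"),
    ({"eventName": "add_to_cart", "productId": "HP-100"}, "mobile"),
    ({"action": "add_to_cart", "sku": "HP-100"}, "agent"),
]
for raw, source in incoming:
    route(normalize(raw, source), [warehouse.append, attribution.append])

print([e["source"] for e in warehouse])
```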

The retailers who will be best positioned when agent commerce scales are the ones who treated it as a reason to finally close the data quality, identity, and collection gaps that already cost them accuracy across traditional channels.