How AI Agents Buy Things

Seven phases of API communication complete a ChatGPT purchase in seconds. A technical breakdown of the flow, its security layers, and the data blind spots it leaves for merchants.

When a customer uses ChatGPT to purchase running shoes from a Shopify merchant, the transaction involves seven distinct phases of API communication between the AI agent, merchant systems, payment infrastructure, and payment networks. The sequence completes in 2-4 seconds without the customer leaving the conversational interface.

Understanding this technical flow matters for organizations evaluating agentic commerce implementations. The protocol solves transaction security and merchant control through multi-layer authentication, scoped payment tokens, and real-time state synchronization. But it intentionally leaves measurement, attribution, and pre-transaction data collection as separate infrastructure concerns.

This article explains the technical mechanics phase by phase, the security mechanisms at each layer, and which data infrastructure requirements fall outside the protocol's scope.

The architecture overview

Four participants communicate during an agentic purchase: the AI agent (ChatGPT or equivalent), the merchant's backend systems, the payment service provider (Stripe in the reference implementation), and payment networks (Visa, Mastercard, issuing banks).

Communication happens through RESTful APIs over HTTPS with Bearer token authentication. The merchant maintains a stateful checkout session. After completion, the merchant sends lifecycle updates through webhooks. Cryptographic signatures verify request authenticity at each layer.

The protocol begins when a customer expresses purchase intent. Everything before that moment (product discovery, comparison shopping, consideration) happens outside the protocol's scope. Agent and merchant negotiate checkout terms through iterative API calls. Once the customer confirms, payment processing occurs through scoped, single-use tokens. The merchant maintains control as merchant of record, running fraud detection and applying business rules before accepting orders.

Phase 1: Intent and session creation

The technical flow begins when a customer tells ChatGPT "buy these running shoes." The agent sends a CreateCheckoutRequest to the merchant's API endpoint. This request contains product identifiers, quantities, and buyer context necessary for the merchant to calculate pricing and availability.

The merchant's backend receives this request, validates the products exist and are in stock, calculates applicable sales tax based on shipping location, and returns available shipping methods with costs. The merchant's response becomes the authoritative checkout state. It includes complete cart contents with current pricing, tax calculations, shipping options, and real-time inventory confirmation.

This initial exchange establishes several important characteristics. The merchant maintains pricing authority. The agent can't override prices or modify what the merchant returns. Inventory gets confirmed before the customer sees checkout options, preventing scenarios where customers commit to purchases the merchant can't fulfill. The checkout session receives a unique identifier the merchant uses to track state through subsequent updates.
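
As a rough illustration of the merchant side of this exchange, the sketch below validates stock and computes authoritative totals for a new session. The catalog, tax table, shipping table, and every name here (CheckoutSession, price_session, create_checkout) are hypothetical stand-ins for a merchant's own systems, not part of the protocol.

```python
import uuid
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical in-memory data; a real merchant would query live systems.
CATALOG = {"shoe-123": {"price": Decimal("120.00"), "stock": 5}}
TAX_RATES = {"CA": Decimal("0.0725"), "OR": Decimal("0.00")}
SHIPPING = {"standard": Decimal("5.00"), "express": Decimal("15.00")}


@dataclass
class CheckoutSession:
    session_id: str
    items: dict  # sku -> quantity
    region: str
    shipping_method: str
    subtotal: Decimal
    tax: Decimal
    shipping: Decimal

    @property
    def total(self) -> Decimal:
        return self.subtotal + self.tax + self.shipping


def price_session(session_id, items, region, shipping_method="standard"):
    """Validate products and stock, then compute merchant-side totals."""
    subtotal = Decimal("0")
    for sku, qty in items.items():
        product = CATALOG.get(sku)
        if product is None:
            raise ValueError(f"unknown product: {sku}")
        if qty < 1 or qty > product["stock"]:
            raise ValueError(f"insufficient stock for {sku}")
        subtotal += product["price"] * qty
    tax = (subtotal * TAX_RATES[region]).quantize(Decimal("0.01"))
    return CheckoutSession(session_id, dict(items), region, shipping_method,
                           subtotal, tax, SHIPPING[shipping_method])


def create_checkout(items, region):
    """Assign a unique session identifier and return the priced state."""
    return price_session(str(uuid.uuid4()), items, region)
```

The agent only supplies product identifiers, quantities, and a region. Every monetary value in the returned state comes from the merchant's own data, which is what gives the merchant pricing authority.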

Data infrastructure implication: The first measurable event in the transaction flow is checkout initiation. There's no equivalent to "product page view" or "add to cart" events that traditional e-commerce analytics capture. Retailers operating media networks lose the browsing behavior data that informs product recommendation algorithms and advertising attribution models. The customer's discovery and consideration process happened in the conversational interface, invisible to the merchant's analytics systems.

Phase 2: Customer confirmation and state synchronization

ChatGPT renders a checkout interface within the chat, displaying the merchant's returned information: product details, pricing, shipping options, and total cost. The customer confirms their shipping address and selects a payment method from stored options. ChatGPT collects payment information directly from the customer. The merchant never sees raw card data at this stage.

If the customer modifies selections (changing to faster shipping or adjusting quantities), ChatGPT sends an UpdateCheckoutRequest to the merchant's API. The merchant recalculates totals, taxes, and availability, returning the updated checkout state. This synchronization happens in real-time as the customer makes decisions.

The stateful negotiation continues until the customer commits to purchase. Traditional checkout involves a single form submission. Agentic checkout involves continuous state synchronization between agent and merchant. Each modification triggers validation on the merchant's side. If inventory depletes during negotiation or pricing changes due to promotions ending, the merchant's next response reflects current reality.
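
The re-validation on each update could look roughly like the following sketch. It assumes a session is a plain dictionary and that the status values "ready" and "not_ready" are illustrative labels. Tax is left out for brevity, and the inventory and price tables are hypothetical.

```python
from decimal import Decimal

# Hypothetical live data that can change while the customer is deciding.
INVENTORY = {"shoe-123": 3}
PRICES = {"shoe-123": Decimal("120.00")}
SHIPPING = {"standard": Decimal("5.00"), "express": Decimal("15.00")}


def apply_update(session: dict, changes: dict) -> dict:
    """Merge the agent's changes, then recompute everything from live data."""
    updated = {**session, **changes}
    messages = []
    subtotal = Decimal("0")
    for sku, qty in updated["items"].items():
        available = INVENTORY.get(sku, 0)
        if qty > available:
            messages.append(f"{sku}: only {available} available")
        subtotal += PRICES[sku] * qty
    updated["subtotal"] = subtotal
    updated["shipping"] = SHIPPING[updated["shipping_method"]]
    updated["total"] = subtotal + updated["shipping"]
    updated["messages"] = messages
    # The session is only completable when no validation problem remains.
    updated["status"] = "not_ready" if messages else "ready"
    return updated
```

Because totals are recomputed from current prices and inventory on every call, a session that was valid a minute ago can come back as not ready without the agent having changed anything.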

Data infrastructure implication: Each UpdateCheckoutRequest represents a potential analytics event, but standard implementations don't capture this data. Retailers lose visibility into customer hesitation patterns, which shipping options customers compare, and where friction occurs in the checkout flow. Traditional conversion optimization relies on analyzing where customers abandon during checkout. In agentic flows, that granular interaction data doesn't flow to analytics platforms unless merchants build custom instrumentation.

Phase 3: Shared payment token creation

When the customer confirms purchase, ChatGPT creates a Shared Payment Token by calling Stripe's API. This token is the security innovation enabling the architecture.

The token is not actual card data. It's a cryptographically scoped, single-use token bound to specific parameters: the payment method identifier, the seller's identity, a maximum transaction amount, and an expiration window (typically minutes).

Stripe stores this token and returns a reference token that ChatGPT passes to the merchant. The merchant uses this reference with Stripe to process payment. Only Stripe knows the mapping between the reference token and the actual payment method. Even if intercepted, the token is useless without the merchant's Stripe credentials, and it remains scoped to a specific merchant and amount.
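
To make the scoping rules concrete, here is a toy model of the provider side. It is not Stripe's API: TokenVault, ScopedGrant, the "spt_" prefix, and the result strings are all invented for illustration. It only shows how a reference token can be bound to a seller, a maximum amount, an expiry, and a single use.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class ScopedGrant:
    payment_method_id: str
    seller_id: str
    max_amount_cents: int
    expires_at: float
    used: bool = False


class TokenVault:
    """Toy stand-in for the payment provider; only it holds the mapping."""

    def __init__(self):
        self._grants = {}

    def issue(self, payment_method_id, seller_id, max_amount_cents,
              ttl_seconds=300, now=None):
        now = time.time() if now is None else now
        reference = "spt_" + secrets.token_urlsafe(16)  # hypothetical format
        self._grants[reference] = ScopedGrant(
            payment_method_id, seller_id, max_amount_cents, now + ttl_seconds)
        return reference

    def redeem(self, reference, seller_id, amount_cents, now=None):
        now = time.time() if now is None else now
        grant = self._grants.get(reference)
        if grant is None:
            return "unknown_token"
        if grant.used:
            return "already_used"
        if grant.seller_id != seller_id:
            return "wrong_seller"
        if now > grant.expires_at:
            return "expired"
        if amount_cents > grant.max_amount_cents:
            return "amount_exceeds_limit"
        grant.used = True  # single use: a second redemption is rejected
        return "ok"
```

The reference string itself carries no card data. Anyone holding it can only ask the vault to redeem it, and the vault refuses unless the seller, amount, and time all match the original grant.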

This creates complete payment data isolation. ChatGPT collects card information. Stripe tokenizes it. Merchant receives only the scoped reference. Stripe processes against the actual payment method internally. Merchants never handle primary account numbers.

This dramatically reduces PCI compliance scope. Even if merchant systems are compromised, attackers can't extract payment credentials. Tokens can't be reused elsewhere or for different amounts. Time limitations prevent delayed attacks.

Data infrastructure implication: Payment tokenization solves payment security elegantly. User identity tokenization across sessions isn't addressed. Customers are pseudonymous at the transaction layer. Merchants can't natively recognize that today's ChatGPT purchaser is the same customer who browsed their website last week. Identity resolution across channels requires separate infrastructure.

Phase 4: Authentication through cryptographic signatures

Before sending the payment token, ChatGPT signs the request using HTTP Message Signatures, an IETF standard (RFC 9421). The request carries authentication headers that specify the cryptographic signature, where to retrieve the public key for verification, and which request components were signed, including the signature's validity time window.

The merchant retrieves ChatGPT's public key from a well-known endpoint, verifies the signature cryptographically, and validates the time window to prevent replay attacks. This emerging standard prevents malicious bots from masquerading as ChatGPT through unforgeable proof of request origin.
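
The verification steps described here can be sketched as follows. Two simplifications are assumed: the canonical string is a simplified form rather than the exact signature base defined by the standard, and the demo verifier uses an HMAC shared secret as a stand-in, because Python's standard library has no public-key signature support. In the flow described above, verify_fn would instead check the signature against the agent's published public key. The function names are hypothetical.

```python
import hashlib
import hmac
import time


def signature_base(components: dict, created: int, expires: int) -> bytes:
    """Simplified canonical string over the signed request components."""
    lines = [f'"{name}": {value}' for name, value in sorted(components.items())]
    lines.append(f'"@signature-params": created={created};expires={expires}')
    return "\n".join(lines).encode()


def verify_request(components, created, expires, signature, verify_fn,
                   now=None, max_skew=30):
    """Check the validity window first, then the signature itself."""
    now = time.time() if now is None else now
    if created > now + max_skew:
        return "created_in_future"
    if now > expires:
        return "expired"  # a replayed old request fails here
    base = signature_base(components, created, expires)
    return "ok" if verify_fn(base, signature) else "bad_signature"


def make_hmac_verifier(secret: bytes):
    """Demo-only verifier; a real one would use the signer's public key."""
    def verify(base: bytes, signature: bytes) -> bool:
        expected = hmac.new(secret, base, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)
    return verify
```

Because the creation and expiry times are part of the signed string, an attacker cannot extend the window on a captured request without invalidating the signature.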

Authentication happens at multiple layers. Bearer tokens authenticate API access. HTTP Message Signatures verify request integrity. HMAC signatures validate webhook deliveries. Each layer prevents different attack vectors.

Data infrastructure implication: Agent identity gets verified cryptographically, but customer identity across sessions doesn't. Merchants receive proof ChatGPT sent the request but no cross-session identifier for stitching this customer's agentic purchases to other channel behaviors. Retail media networks lose the ability to build unified customer profiles without additional identity infrastructure.

Phase 5: Merchant-side payment processing

ChatGPT sends a CompleteCheckoutRequest to the merchant containing the Shared Payment Token reference, the verified checkout session identifier, buyer and fulfillment information, and idempotency keys ensuring retry safety.

The merchant's backend performs critical validations before accepting the order. Risk and fraud assessment runs first. The merchant analyzes payment and fraud signals using their own systems or third-party fraud services, checking velocity patterns, device information where available, IP addresses, and behavioral signals. Importantly, the merchant maintains complete control over fraud rules and risk thresholds.

The merchant validates they're still the appropriate merchant of record. They verify inventory remains available (race conditions are possible between session creation and completion). They confirm pricing hasn't changed due to promotions or pricing rule updates. They recalculate taxes and shipping to ensure accuracy. They apply business rules around order minimums, restricted items, or customer-specific policies.

If validation passes, the merchant creates a PaymentIntent in their Stripe account using the token received from ChatGPT. Stripe maps the reference token back to the underlying payment method, communicates with the payment network (Visa, Mastercard, issuing bank), and receives either authorization or decline. The payment network applies its own fraud detection and account validation.

The merchant can accept or decline the entire order at this decision point. High fraud signals, depleted inventory, business rule violations, or payment network declines all result in order rejection. The merchant responds to ChatGPT with either order confirmation or decline reasoning. If confirmed, the merchant creates an order in their e-commerce system (Shopify, custom backend, or other platform) exactly as they would for web or app orders.
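
A minimal sketch of this decision point, under stated assumptions: the risk score, the threshold, the reason strings, and the charge callable are hypothetical placeholders for the merchant's fraud system and payment integration. The point is the ordering, with payment attempted only after every merchant-side check passes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    accepted: bool
    reason: Optional[str] = None


def complete_checkout(session: dict, current_total_cents: int, in_stock: bool,
                      risk_score: float, charge: Callable[[str, int], bool],
                      risk_threshold: float = 0.8) -> Decision:
    """Run merchant-side checks in order; charge only if all of them pass."""
    if risk_score >= risk_threshold:
        return Decision(False, "fraud_risk")
    if not in_stock:
        return Decision(False, "out_of_stock")
    if current_total_cents != session["total_cents"]:
        return Decision(False, "price_changed")
    # Payment is the last step, so a rejected order never touches the card.
    if not charge(session["payment_token"], current_total_cents):
        return Decision(False, "payment_declined")
    return Decision(True)
```

Returning a reason alongside the decline gives the agent something specific to relay to the customer, which matches the decline reasoning described above.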

Data infrastructure implication: This reveals the most significant blind spot. Merchants have fraud signals from the transaction: the payment method, shipping address, order characteristics, and transaction metadata. But merchants lack behavioral signals from before the transaction. Traditional fraud models rely heavily on pre-purchase behavior: time spent on site, pages viewed, navigation patterns, comparison shopping behavior, and session characteristics.

In agentic commerce, none of this behavioral data exists in the merchant's systems. The customer's discovery and consideration happened in ChatGPT's environment. For fraud detection, this means merchants must assess risk without signals that indicate "normal" customer behavior. For retail media, this means lost intent data that would inform product recommendations and advertising targeting. For publishers monetizing through commerce, this means no visibility into which content drove purchase consideration.

This is the first-mile data problem. Transaction data exists and flows cleanly through the protocol. Intent data and consideration behavior don't.

Phase 6: Order confirmation and customer notification

If validation and payment processing succeed, the merchant responds to the CompleteCheckoutRequest with order confirmation. The response includes the order identifier, order reference number, tracking information if available immediately, and expected delivery dates.

ChatGPT displays this confirmation within the chat interface. The customer sees their order number, total amount charged, and delivery estimate. The conversational context means customers can ask follow-up questions about their order immediately. The entire sequence from purchase intent expression to confirmation typically completes in 2-4 seconds.

The customer never left the chat interface. No redirect to a merchant-hosted payment page occurred. No separate confirmation email needs to be checked (though merchants typically send one anyway). The experience prioritizes speed and reduced friction.

Phase 7: Webhook events and lifecycle synchronization

The merchant's communication with ChatGPT doesn't end at order confirmation. The merchant publishes webhook events to an OpenAI-provided endpoint for order lifecycle updates: order creation confirmation, status changes during processing, shipment notifications with tracking numbers, delivery confirmations, and cancellation events.

These webhooks use HMAC signatures for verification. ChatGPT validates signatures before processing events, preventing spoofed notifications. The webhook stream keeps ChatGPT's view of order state synchronized with the merchant's fulfillment system. This matters especially for retry scenarios where network failures or timeouts occur. Customers can query order status in the chat anytime, and ChatGPT reflects current state based on merchant webhooks.
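
HMAC verification of a webhook body can be sketched in a few lines. The choice of SHA-256, the hex encoding, and the function names are assumptions for illustration. The article only establishes that HMAC signatures are used.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, body: bytes) -> str:
    """Sender side: compute a hex HMAC over the exact raw body bytes."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign_webhook(secret, body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The signature must be computed over the raw bytes as received. Parsing the JSON and re-serializing it before verifying can change whitespace or key order and cause valid events to fail.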

Idempotency handling throughout the flow prevents duplicate charges or orders. Both agent and merchant track idempotency keys. Network retries don't create multiple orders or multiple charges. If a CompleteCheckoutRequest times out and ChatGPT retries, the merchant recognizes the duplicate request via idempotency key and returns the same result without reprocessing.
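
The retry behavior can be illustrated with a small in-memory sketch. IdempotentProcessor is a hypothetical name, and a production system would need a durable store and protection against concurrent requests with the same key, neither of which is shown here.

```python
import hashlib
import json


class IdempotentProcessor:
    """Caches the first result per idempotency key and replays it on retries."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = {}  # key -> (request fingerprint, stored result)

    @staticmethod
    def _fingerprint(request: dict) -> str:
        canonical = json.dumps(request, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def process(self, idempotency_key: str, request: dict) -> dict:
        fingerprint = self._fingerprint(request)
        if idempotency_key in self._seen:
            stored_fingerprint, stored_result = self._seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                raise ValueError("idempotency key reused with a different request")
            return stored_result  # retry: same answer, no reprocessing
        result = self._handler(request)
        self._seen[idempotency_key] = (fingerprint, result)
        return result
```

Storing a fingerprint of the request alongside the result is a design choice rather than something the article specifies. It catches the case where a key is accidentally reused for a different order.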

Data infrastructure implication: Webhooks handle post-purchase lifecycle effectively for keeping the agent informed. But there's no standardized event stream for merchant analytics platforms. If a merchant wants order events flowing to Google Analytics, their data warehouse, or their customer data platform, they must build custom pipelines. The protocol doesn't specify how agentic purchase data integrates with existing analytics infrastructure.
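
One shape such a custom pipeline might take is a small normalization step that maps order lifecycle events onto the event vocabulary an analytics platform already uses. Everything in this sketch is assumed: the input field names, the event type strings, and the output schema are illustrative, not defined by the protocol or by any analytics vendor.

```python
from datetime import datetime, timezone

# Hypothetical mapping from lifecycle event types to analytics event names.
EVENT_NAMES = {
    "order_created": "purchase",
    "order_shipped": "shipment",
    "order_canceled": "cancellation",
}


def to_analytics_event(order_event: dict) -> dict:
    """Flatten a lifecycle event into a channel-tagged analytics record."""
    occurred = datetime.fromtimestamp(order_event["timestamp"], tz=timezone.utc)
    return {
        "event_name": EVENT_NAMES.get(order_event["type"], "order_other"),
        "order_id": order_event["order_id"],
        "channel": "agentic",  # lets reports separate agent-driven orders
        "value": order_event.get("total_cents", 0) / 100,
        "occurred_at": occurred.isoformat(),
    }
```

Tagging each record with a channel value is the part that matters most for downstream reporting, since nothing else in the order data distinguishes an agentic purchase from a web or app order.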

Key security mechanisms

Complete payment data isolation prevents common attack vectors. ChatGPT collects card information. Stripe tokenizes it. Merchants receive only the scoped token. Stripe processes against the actual payment method. Only Stripe knows the connection between token and credentials.

Scope binding ensures tokens can't be misused. Tokens work only for specific merchants, specific maximum amounts, and within specific time windows. Different amounts, different merchants, or expired tokens all fail. Even intercepted tokens are worthless without correct merchant credentials.

Multi-layer authentication prevents different attack types. Bearer tokens prevent unauthorized API access. HTTP Message Signatures prevent tampering and impersonation. HMAC webhook signatures prevent spoofed updates.

Fraud surface reduction occurs because merchants never handle primary account numbers. A breach of merchant systems cannot expose PANs the merchant never stored. Time-limited authorizations prevent attackers from using stolen tokens later.

Data infrastructure implication: Security architecture focuses on payment integrity and transaction authentication. Customer privacy and consent management across surfaces aren't addressed. Retailers need separate policy infrastructure for consent verification, privacy preference tracking, and compliance with GDPR and CCPA when transactions happen through agent interfaces.

What this flow doesn't capture

The protocol intentionally begins at purchase intent expression. By design, several categories of data that traditional e-commerce systems capture don't exist in this flow.

Pre-transaction behavioral data has no equivalent. There's no "product view" event, no browsing history, no session duration metrics, no comparison shopping patterns, no time spent evaluating options, and no clear attribution path showing how the customer arrived at this purchase decision.

Cross-session identity resolution isn't native. Each transaction is pseudonymous. Merchants can't stitch a customer's agentic purchases to their web purchases, mobile app behavior, in-store transactions, or anonymous browsing. Building unified customer profiles across channels requires additional identity infrastructure.

Attribution and marketing data don't flow through the protocol. No UTM parameters, no referral sources, no campaign tracking codes, and no connection from ad impression to agentic purchase. Marketing attribution models that trace spending to specific channels break fundamentally.

Consent and privacy context aren't surfaced. The protocol doesn't include explicit consent collection. Customer privacy preferences (opt-in status, authorized data uses, applicable regulations) don't transfer from agent to merchant. Compliance verification for GDPR and CCPA happens outside the protocol. Jurisdiction determination relies on shipping address only.

Analytics event streams lack standardization. The protocol doesn't specify how agentic purchase data flows to Google Analytics, data warehouses, customer data platforms, or business intelligence tools. Merchants must build custom pipelines for integration with existing analytics infrastructure.

This is intentional design. The Agentic Commerce Protocol solves transaction security, merchant control, and payment processing. It doesn't solve measurement, attribution, identity resolution, or compliance verification. These require complementary systems.

The infrastructure gap and opportunity

The protocol does what it set out to do: secure transactions, protect payment data, maintain merchant control. Multi-layer authentication works. Scoped tokens prevent credential exposure. Webhooks keep systems synchronized. The architecture respects both customer security and merchant autonomy.

What it doesn't do is capture the data that traditional e-commerce relies on. No browsing behavior, no attribution paths, no consent context, no cross-session identity. Organizations building capabilities to capture intent signals, embed compliance metadata, and normalize data across channels are establishing measurement frameworks while standards are still forming. Companies like MetaRouter are building first-mile infrastructure specifically for these gaps. Those waiting will adapt to decisions others made.