Back to Writing Beyond the Prompt: Context Engineering Patterns for Complex Enterprise APIs

Beyond the Prompt: Context Engineering Patterns for Complex Enterprise APIs

Contents

The Enterprise Context Crisis

When building AI agents to interact with enterprise software—whether a Dynamics 365 ERP, an SAP instance, or a large Salesforce deployment—you immediately collide with a physical constraint: scale.

The standard developer playbook for tool calling is straightforward: compile the API specifications (typically OpenAPI/Swagger documents or OData metadata schemas), stuff them into the system prompt, and let the Large Language Model (LLM) decide which endpoint to call and what arguments to pass.

In a sandbox environment with three endpoints, this works flawlessly.

In the enterprise, it collapses.

A single module in Dynamics 365 Finance & Operations can contain hundreds of OData entities, each with dozens or hundreds of fields. The raw metadata schema for these entities regularly stretches into tens of thousands of lines of XML or JSON. Stuffing this raw representation into the prompt causes three critical failures:

  1. Token Bloat: You spend thousands of dollars in token fees simply passing static API definitions that the model doesn't need for the current query.
  2. Attention Dilution (Lost in the Middle): LLMs struggling under massive contexts frequently hallucinate arguments, miss subtle schema constraints, or fail to follow system instructions.
  3. Latency: Loading 50k tokens of schema metadata adds seconds of processing time before the model even begins generating its output.

To build robust, production-ready enterprise AI agents, we must move beyond the naive prompt and implement context engineering—the practice of dynamically selecting, pruning, and structuring system metadata at runtime.


The Dynamic Schema Pruning Pipeline

Instead of expecting the LLM to filter out the noise, your application should act as an active compiler, filtering the schema metadata before it reaches the model.

Here is what that pipeline looks like in practice:

— Interactive Flowchart

Dynamic Schema Pruning Pipeline

Payload Representation 42000 Tokens
<!-- Raw XML Metadata (Truncated) --> <EntityType Name="Vendor"> <Key><PropertyRef Name="VendorAccountId" /></Key> <Property Name="VendorAccountId" Type="Edm.String" Nullable="false" /> <Property Name="Name" Type="Edm.String" /> <Property Name="VendorGroup" Type="Edm.String" /> <Property Name="CurrencyCode" Type="Edm.String" /> <Property Name="CreditLimit" Type="Edm.Decimal" /> <Property Name="PaymentTermId" Type="Edm.String" /> <Property Name="VendorBlocked" Type="Edm.Enum" /> <Property Name="TaxRegistrationNum" Type="Edm.String" /> <!-- ... 250 more properties ... --> </EntityType> <EntityType Name="Customer">...</EntityType> <EntityType Name="PurchaseOrder">...</EntityType> <EntityType Name="SalesOrder">...</EntityType> <!-- ... 45 other relational entities ... -->
Compression: None

Let's break down the three primary patterns that make this pipeline possible.


Pattern 1: Just-In-Time (JIT) Schema Pruning

When a user asks, "Which vendors are currently on credit hold?", they only care about two entities: Vendors and CreditStatus. They do not need to know about PurchaseOrders, LedgerPeriods, or TaxRegistrations.

JIT Schema Pruning is the practice of dynamically constructing a type interface containing only the specific fields requested by the user query, discarding everything else.

How it works:

  1. Metadata Registry: Store your full API schema in a structural format (like a nested dictionary or graph) in your application memory.
  2. Intent Extraction: Run a lightweight LLM call (or a fast regex-based router) to extract the target entities and the fields the query refers to.
  3. Dynamic Compilation: Query the registry to extract only those properties, and programmatically generate a minimal JSON schema or TypeScript definition.

This pattern reduces schema payloads from 40,000+ tokens to under 200 tokens, resulting in a 99.5% reduction in API cost and near-instant processing.


Pattern 2: Semantic Schema Routing

When your enterprise system has thousands of endpoints, intent extraction using regular expressions or static rules becomes unmanageable. This is where Semantic Schema Routing comes in.

Instead of matching keywords, we vectorize the metadata descriptions of our tables and columns and query them using semantic similarity.

# Conceptual Python code for Semantic Schema Routing
from sentence_transformers import SentenceTransformer
import faiss

model = SentenceTransformer('all-MiniLM-L6-v2')
# We index entity descriptions like "Vendors - accounts, credit holds, and blocked statuses"
# and "Invoices - accounts payable, invoice lines, and tax records"
index = faiss.IndexFlatL2(384)

def find_relevant_entities(user_query: str, top_k: int = 3):
    query_vector = model.encode([user_query])
    distances, indices = index.search(query_vector, top_k)
    return [entity_catalog[idx] for idx in indices[0]]

By pre-filtering the schema catalog through a vector database, the LLM only ever sees the API definitions for the 2 or 3 entities that are semantically relevant to the user's intent.


Pattern 3: State-Preserving Session Compression

In multi-turn conversations where an agent is performing complex tasks—such as reconciling an invoice discrepancy across multiple lookups—the context window accumulates previous tool outputs, SQL queries, and error messages.

If left unchecked, the context will eventually exceed the model's capabilities or blow past your token budget.

To prevent this, you must implement Session Compression:

  1. Sliding Window Summarization: Compress conversation history older than turns into a concise state summary, preserving only the facts discovered (e.g., "Vendor account found: VND-00123. Invoice matches PO-9988").
  2. Entity-Relationship Graph extraction: Extract a temporary graph of key facts as the conversation progresses, and inject this graph into the system prompt rather than raw chat history.
  3. Tool-Output Truncation: When a tool returns a massive payload (like a list of 1,000 invoices), truncate the payload to the first 5 records and provide the LLM with a pagination tool rather than dumping the raw array into the prompt.

Case Study: Dynamics 365 OData Context Engineering

Let's look at a concrete, production-grade implementation of these concepts. In this example, we take a natural language query, run a semantic router to identify the target OData entity, dynamically prune its schema metadata, and invoke the model to compile the OData query.

import { OpenAI } from 'openai';

interface EntitySchema {
	name: string;
	description: string;
	properties: Record<string, string>;
}

// Full schema catalog stored in application memory
const entityCatalog: EntitySchema[] = [
	{
		name: 'Vendors',
		description:
			'Accounts, credit limits, blocks, group codes, and financial statuses of suppliers.',
		properties: {
			VendorAccountId: 'string',
			Name: 'string',
			CreditLimit: 'number',
			VendorBlocked: 'YesNoEnum',
			CurrencyCode: 'string',
			PaymentTermId: 'string'
		}
	},
	{
		name: 'VendorInvoices',
		description:
			'Accounts payable invoices, status codes, amounts, and tax registration identifiers.',
		properties: {
			InvoiceId: 'string',
			VendorAccountId: 'string',
			Amount: 'number',
			InvoiceStatus: 'PendingApprovedEnum',
			DueDate: 'Date'
		}
	}
];

const openai = new OpenAI();

async function getEngineeredContext(userQuery: string): Promise<string> {
	// Step 1: Semantic Routing (Mocked here for simplicity)
	// In production, use vector search against the entity descriptions
	const matchedEntities = entityCatalog.filter(
		(entity) =>
			userQuery.toLowerCase().includes('vendor') || userQuery.toLowerCase().includes('invoice')
	);

	// Step 2: JIT Schema Pruning
	// Instruct the LLM to select ONLY the properties needed for the query
	const pruningResponse = await openai.chat.completions.create({
		model: 'gpt-4o-mini',
		messages: [
			{
				role: 'system',
				content: `Analyze the user query and select only the required properties from these entities.
Entities available: ${JSON.stringify(matchedEntities, null, 2)}
Return valid JSON: { "entity": "Name", "properties": ["Prop1", "Prop2"] }`
			},
			{ role: 'user', content: userQuery }
		],
		response_format: { type: 'json_object' }
	});

	const plan = JSON.parse(pruningResponse.choices[0].message.content || '{}');

	// Step 3: Dynamically generate the minimal interface representation
	const targetEntity = entityCatalog.find((e) => e.name === plan.entity);
	if (!targetEntity) return '';

	const prunedProperties = plan.properties.reduce((acc: any, propName: string) => {
		if (targetEntity.properties[propName]) {
			acc[propName] = targetEntity.properties[propName];
		}
		return acc;
	}, {});

	const minimalSchema = {
		entityName: targetEntity.name,
		properties: prunedProperties
	};

	return JSON.stringify(minimalSchema, null, 2);
}

// Usage Example
(async () => {
	const query = 'Find vendors with credit limits over 100k who are blocked';
	const engineeredContext = await getEngineeredContext(query);

	console.log('--- Engineered Context Passed to LLM ---');
	console.log(engineeredContext);
})();

Output of the script:

--- Engineered Context Passed to LLM ---
{
  "entityName": "Vendors",
  "properties": {
    "VendorAccountId": "string",
    "Name": "string",
    "CreditLimit": "number",
    "VendorBlocked": "YesNoEnum"
  }
}

By engineering the context in this way, the main LLM agent receives a tiny, highly-focused schema. The likelihood of generating a syntactically correct and security-compliant OData query increases exponentially.


Summary: Designing for the Volatility of Context

Building enterprise-grade AI assistants is not about waiting for context windows to expand to infinity. Infinite context does not solve attention dilution, latency, or API cost.

The developers who successfully deploy agents to production are those who view context as a highly dynamic, volatile resource. By wrapping your agents in pipelines that prune schemas, route semantically, and compress state, you ensure your systems remain fast, cost-effective, and accurate.

Do not dump your APIs into the prompt. Compile them JIT.


Sources and Further Reading

Share this article