
Agent Recipes (code & visuals)


Sequential Processing (Prompt Chaining)

Description

Sequential processing executes LLM operations in a predefined order. Each step's output becomes input for the next step, creating a clear flow from start to finish. You can add quality checks between steps to ensure the process stays on track.

Diagram

When It's Useful

  • When tasks have clear, sequential steps (draft → refine → finalize)

  • When intermediate outputs need validation before proceeding

  • When breaking complex tasks into simpler subtasks improves quality

  • When you need predictable execution flow

Code Example

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function generateMarketingCopy(input: string) {
  const model = openai('gpt-4o');

  // STEP 1: Generate initial marketing copy
  // This first LLM call focuses solely on creative generation
  const { text: copy } = await generateText({
    model,
    prompt: `Write persuasive marketing copy for: ${input}. Focus on benefits and emotional appeal.`,
  });

  // STEP 2: Evaluate the quality with structured output
  // Using generateObject ensures we get specific metrics in a consistent format
  const { object: qualityMetrics } = await generateObject({
    model,
    schema: z.object({
      hasCallToAction: z.boolean(),         // Binary check for CTA presence
      emotionalAppeal: z.number().min(1).max(10),  // Rating scale
      clarity: z.number().min(1).max(10),          // Rating scale
    }),
    prompt: `Evaluate this marketing copy for:
    1. Presence of call to action (true/false)
    2. Emotional appeal (1-10)
    3. Clarity (1-10)

    Copy to evaluate: ${copy}`,
  });

  // STEP 3: Quality gate -> only proceed to refinement if needed
  // This conditional branch demonstrates the "gate" pattern
  if (
    !qualityMetrics.hasCallToAction ||
    qualityMetrics.emotionalAppeal < 7 ||
    qualityMetrics.clarity < 7
  ) {
    // STEP 3A: Targeted improvement based on specific deficiencies
    // Note how we dynamically build instructions based on previous step results
    const { text: improvedCopy } = await generateText({
      model,
      prompt: `Rewrite this marketing copy with:
      ${!qualityMetrics.hasCallToAction ? '- A clear call to action' : ''}
      ${qualityMetrics.emotionalAppeal < 7 ? '- Stronger emotional appeal' : ''}
      ${qualityMetrics.clarity < 7 ? '- Improved clarity and directness' : ''}

      Original copy: ${copy}`,
    });
    return { copy: improvedCopy, qualityMetrics };
  }

  // Return original if it passed quality checks
  return { copy, qualityMetrics };
}

What sets this pattern apart is its straightforward, linear structure with quality gates. Unlike more dynamic patterns, it follows a fixed path, branching only on explicit quality criteria. The key idea: each step's output is structured as input for the next, with clear rules governing the flow.

Routing

Description

Routing lets an LLM classify an input and direct it to the best-suited pathway. Instead of handling everything one way, routing allows different treatment based on the type, complexity, or specific features of the input.

Diagram

When It's Useful

  • When handling diverse inputs that require different specialized handling

  • When optimizing for cost by using smaller models for simpler queries

  • When different types of queries benefit from different system prompts

  • When you need to maintain specialized expertise for different scenarios

Code Example

import { openai } from '@ai-sdk/openai';
import { generateObject, generateText } from 'ai';
import { z } from 'zod';

async function handleCustomerQuery(query: string) {
  const model = openai('gpt-4o');

  // STEP 1: Classification -> The routing decision maker
  // This step determines which path the workflow will take
  const { object: classification } = await generateObject({
    model,
    schema: z.object({
      reasoning: z.string(),                                // Explanation for the classification
      type: z.enum(['general', 'refund', 'technical']),    // Content category
      complexity: z.enum(['simple', 'complex']),           // Difficulty assessment
    }),
    prompt: `Classify this customer query:
    ${query}

    Determine:
    1. Query type (general, refund, or technical)
    2. Complexity (simple or complex)
    3. Brief reasoning for classification`,
  });

  // STEP 2: Dynamic routing based on classification results
  // Notice how we dynamically select both the model and system prompt
  const { text: response } = await generateText({
    // Model selection -> cost optimization by using smaller models for simpler tasks
    model:
      classification.complexity === 'simple'
        ? openai('gpt-4o-mini')    // Smaller, faster model for simple queries
        : openai('o3-mini'),       // More capable model for complex queries

    // System prompt selection -> specialized expertise for each query type
    system: {
      general:
        'You are an expert customer service agent handling general inquiries.',
      refund:
        'You are a customer service agent specializing in refund requests. Follow company policy and collect necessary information.',
      technical:
        'You are a technical support specialist with deep product knowledge. Focus on clear step-by-step troubleshooting.',
    }[classification.type],        // Index into the object to select the right prompt

    prompt: query,                 // Original query passed to the specialized model
  });

  return { response, classification };
}

What sets this pattern apart is its dynamic paths. Unlike the fixed path of sequential processing, routing adapts the handling to the input's features. The heart of it is the initial classification step, which selects the best specialized handler (model size, system prompt) for each specific input.

Parallelization

Description

Parallelization runs multiple LLM operations simultaneously and then combines their results. This pattern breaks a task into independent components that can be processed in parallel, saving time and allowing for specialized focus on different aspects of the same input.

Diagram

When It's Useful

  • When a task can be broken into independent components

  • When you need specialized expertise for different aspects of the same input

  • When you want to decrease latency by performing operations simultaneously

  • When multiple perspectives or opinions will improve output quality

  • When implementing "voting" systems where multiple models evaluate the same input

Code Example

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function parallelCodeReview(code: string) {
  const model = openai('gpt-4o');

  // STEP 1: Run multiple specialized reviews in parallel
  // Promise.all allows all three reviews to happen simultaneously
  const [securityReview, performanceReview, maintainabilityReview] =
    await Promise.all([
      // Specialized reviewer #1: Security focus
      generateObject({
        model,
        system:
          'You are an expert in code security. Focus on identifying security vulnerabilities, injection risks, and authentication issues.',
        schema: z.object({
          vulnerabilities: z.array(z.string()),  // List of security issues
          riskLevel: z.enum(['low', 'medium', 'high']),  // Overall risk assessment
          suggestions: z.array(z.string()),      // Security improvement suggestions
        }),
        prompt: `Review this code:
      ${code}`,
      }),

      // Specialized reviewer #2: Performance focus
      generateObject({
        model,
        system:
          'You are an expert in code performance. Focus on identifying performance bottlenecks, memory leaks, and optimization opportunities.',
        schema: z.object({
          issues: z.array(z.string()),           // List of performance issues
          impact: z.enum(['low', 'medium', 'high']),  // Performance impact severity
          optimizations: z.array(z.string()),    // Optimization recommendations
        }),
        prompt: `Review this code:
      ${code}`,
      }),

      // Specialized reviewer #3: Code quality focus
      generateObject({
        model,
        system:
          'You are an expert in code quality. Focus on code structure, readability, and adherence to best practices.',
        schema: z.object({
          concerns: z.array(z.string()),         // Code quality concerns
          qualityScore: z.number().min(1).max(10),  // Numerical quality rating
          recommendations: z.array(z.string()),   // Quality improvement suggestions
        }),
        prompt: `Review this code:
      ${code}`,
      }),
    ]);

  // STEP 2: Combine results with type labels for the aggregator
  const reviews = [
    { ...securityReview.object, type: 'security' },
    { ...performanceReview.object, type: 'performance' },
    { ...maintainabilityReview.object, type: 'maintainability' },
  ];

  // STEP 3: Aggregate results using another model instance
  // This is the "aggregator" that synthesizes all parallel outputs
  const { text: summary } = await generateText({
    model,
    system: 'You are a technical lead summarizing multiple code reviews.',
    prompt: `Synthesize these code review results into a concise summary with key actions:
    ${JSON.stringify(reviews, null, 2)}`,
  });

  return { reviews, summary };
}

What sets this pattern apart is concurrency: instead of waiting for each step to finish, it uses Promise.all() to run independent subtasks simultaneously, then combines all the results into one complete outcome. This approach is especially useful when a task needs focused attention on several distinct aspects of the same input.

Orchestrator-Worker

Description

The orchestrator-worker pattern uses a main LLM (the orchestrator) to plan and coordinate work, delegating specific jobs to specialized worker LLMs. Unlike parallelization, where the subtasks are fixed in advance, the orchestrator decides at runtime which tasks are needed and how they should be carried out.

Diagram

When It's Useful

  • When tasks require planning before execution

  • When subtasks aren't known in advance and need to be determined dynamically

  • When different specialized expertise is needed for different components

  • When a high-level view needs to coordinate low-level implementations

  • For complex software development tasks spanning multiple files or systems

Code Example

import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { z } from 'zod';

async function implementFeature(featureRequest: string) {
  // STEP 1: Orchestrator -> Plan the implementation
  // The orchestrator determines what work needs to be done
  const { object: implementationPlan } = await generateObject({
    model: openai('o3-mini'),
    schema: z.object({
      // Define which files need changes and what kind of changes
      files: z.array(
        z.object({
          purpose: z.string(),                              // Why this file needs changes
          filePath: z.string(),                             // Which file to modify
          changeType: z.enum(['create', 'modify', 'delete']), // How to change it
        }),
      ),
      estimatedComplexity: z.enum(['low', 'medium', 'high']),
    }),
    system:
      'You are a senior software architect planning feature implementations.',
    prompt: `Analyze this feature request and create an implementation plan:
    ${featureRequest}`,
  });

  // STEP 2: Workers -> Execute the planned changes
  // Each worker gets a specific task from the plan
  const fileChanges = await Promise.all(
    implementationPlan.files.map(async file => {
      // Select specialized worker prompt based on the task type
      const workerSystemPrompt = {
        create:
          'You are an expert at implementing new files following best practices and project patterns.',
        modify:
          'You are an expert at modifying existing code while maintaining consistency and avoiding regressions.',
        delete:
          'You are an expert at safely removing code while ensuring no breaking changes.',
      }[file.changeType];

      // Worker LLM performs the specialized task
      const { object: change } = await generateObject({
        model: openai('gpt-4o'),
        schema: z.object({
          explanation: z.string(),  // Why these changes were made
          code: z.string(),         // The actual implementation
        }),
        system: workerSystemPrompt,
        prompt: `Implement the changes for ${file.filePath} to support:
        ${file.purpose}

        Consider the overall feature context:
        ${featureRequest}`,
      });

      return {
        file,
        implementation: change,
      };
    }),
  );

  // The results are naturally synthesized into a structured response
  return {
    plan: implementationPlan,    // The orchestrator's plan
    changes: fileChanges,        // The workers' implementations
  };
}

What sets this pattern apart is its plan-then-execute structure. Unlike parallelization, where the subtasks are fixed up front, or routing, which picks a single path, the orchestrator first determines what work needs to be done and then delegates each piece to a worker with the right specialization. The plan drives the work: workers only exist to carry out tasks the orchestrator has identified.

Evaluator-Optimizer

Description

The evaluator-optimizer pattern sets up a feedback loop where one LLM produces content and another checks it against certain standards. Depending on the evaluation, the content is either approved or returned with suggestions for improvement. This system keeps improving itself, refining the outputs until they reach the desired quality.

Diagram

When It's Useful

  • When quality standards are clearly definable

  • When initial outputs can benefit from targeted refinement

  • When multiple iterations can significantly improve results

  • When specialized evaluation expertise is valuable

  • For creative or technical content that benefits from critique and revision

Code Example

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function translateWithFeedback(text: string, targetLanguage: string) {
  let currentTranslation = '';
  let iterations = 0;
  const MAX_ITERATIONS = 3;  // Safety limit to prevent infinite loops

  // STEP 1: Initial generation -> Start with a smaller, faster model
  // This creates the first version that will enter the feedback loop
  const { text: translation } = await generateText({
    model: openai('gpt-4o-mini'),  // Smaller model for initial attempt
    system: 'You are an expert literary translator.',
    prompt: `Translate this text to ${targetLanguage}, preserving tone and cultural nuances:
    ${text}`,
  });

  currentTranslation = translation;

  // STEP 2: Evaluation-optimization loop
  // This is the core of the pattern -> a cycle of evaluation and improvement
  while (iterations < MAX_ITERATIONS) {
    // STEP 2A: Evaluate the current solution
    // Note the use of a larger model for critical evaluation
    const { object: evaluation } = await generateObject({
      model: openai('gpt-4o'),  // Larger model for more discerning evaluation
      schema: z.object({
        qualityScore: z.number().min(1).max(10),     // Numerical quality rating
        preservesTone: z.boolean(),                   // Binary quality checks
        preservesNuance: z.boolean(),
        culturallyAccurate: z.boolean(),
        specificIssues: z.array(z.string()),          // Detailed feedback points
        improvementSuggestions: z.array(z.string()),  // Constructive suggestions
      }),
      system: 'You are an expert in evaluating literary translations.',
      prompt: `Evaluate this translation:

      Original: ${text}
      Translation: ${currentTranslation}

      Consider:
      1. Overall quality
      2. Preservation of tone
      3. Preservation of nuance
      4. Cultural accuracy`,
    });

    // STEP 2B: Check if quality meets threshold -> exit condition
    if (
      evaluation.qualityScore >= 8 &&
      evaluation.preservesTone &&
      evaluation.preservesNuance &&
      evaluation.culturallyAccurate
    ) {
      break;  // Exit the loop if quality standards are met
    }

    // STEP 2C: Generate improved version based on specific feedback
    // This uses the evaluation to target exactly what needs improvement
    const { text: improvedTranslation } = await generateText({
      model: openai('gpt-4o'),  // Use larger model for refinement
      system: 'You are an expert literary translator.',
      prompt: `Improve this translation based on the following feedback:
      ${evaluation.specificIssues.join('\n')}
      ${evaluation.improvementSuggestions.join('\n')}

      Original: ${text}
      Current Translation: ${currentTranslation}`,
    });

    // Update for next iteration
    currentTranslation = improvedTranslation;
    iterations++;
  }

  return {
    finalTranslation: currentTranslation,
    iterationsRequired: iterations,  // Tracking improvement efficiency
  };
}

What sets this pattern apart is its iterative refinement cycle. Unlike sequential processing, which follows a straight path, or parallelization, which handles independent parts simultaneously, the evaluator-optimizer forms a feedback loop. The essentials are a structured evaluation that yields specific, actionable feedback, and an improvement step that applies that feedback until quality meets the bar or the iteration limit is reached.

Multi-Step Tool Usage

Core Pattern

  • Give the LLM access to functional tools (e.g., a calculator)

  • Set the maxSteps parameter (e.g., 10) to allow multiple tool calls

  • The LLM decides when to use tools, and which ones, as it works

How It Works

  • The LLM solves problems by repeatedly using tools as needed

  • Each tool call + result = one "step"

  • The SDK automatically handles the loop of:

    • LLM calls tool → Tool executes → Result returns to LLM → LLM continues

Getting Structured Output (Previously Confusing Parts)

  • Key Insight #1: LLMs naturally want to return text, not structured data

  • Key Insight #2: Tools without execute functions terminate the agent

  • Solution:

    1. Create an "answer tool" with NO execute function

    2. Give this tool a schema for your desired output structure

    3. Set toolChoice: 'required' to force the LLM to use a tool for its final response

Final Process Flow

  1. LLM uses regular tools as needed to solve the problem

  2. When ready to conclude, it calls the answer tool (due to its description)

  3. Since the answer tool has no execute function, the process terminates

  4. The parameters from this final tool call become your structured output

Why This Matters

  • You get consistent, structured data instead of variable text

  • The LLM still has freedom to solve problems however it wants

  • You maintain control over the output format

  • The process has a clear, predictable endpoint