Building Multi-Agent Systems with AutoGen and CrewAI

The Rise of Multi-Agent AI Systems

Multi-agent systems represent a paradigm shift in AI development, moving from single monolithic models to orchestrated teams of specialized agents. This approach enables complex problem-solving through collaboration, much like how human teams leverage diverse expertise.
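
The Mermaid diagram below sketches a typical topology: an orchestrator receives the user request, routes work among the specialized agents, and assembles the final output.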

graph TB
    subgraph "Multi-Agent System"
        A[Orchestrator] --> B[Research Agent]
        A --> C[Analysis Agent]
        A --> D[Writing Agent]
        A --> E[Review Agent]
        
        B -.-> C
        C -.-> D
        D -.-> E
        E -.-> A
    end
    
    F[User Request] --> A
    A --> G[Final Output]
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style F fill:#bbf,stroke:#333,stroke-width:2px
    style G fill:#9f9,stroke:#333,stroke-width:2px

Understanding Multi-Agent Architecture

Core Components

  1. Agents: Autonomous entities with specific roles and capabilities
  2. Communication Protocol: How agents exchange information
  3. Orchestrator: Coordinates agent interactions and task flow
  4. Memory System: Shared or individual agent memory
  5. Tool Integration: External tools and APIs agents can use
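
To make these components concrete, here is a minimal Python sketch of how they fit together; every name in it is illustrative rather than taken from AutoGen or CrewAI.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SketchTask:
    description: str

@dataclass
class SketchAgent:
    id: str
    role: str
    capabilities: List[str]
    tools: Dict[str, Callable] = field(default_factory=dict)  # tool integration

    def execute(self, task: SketchTask, memory: List[str]) -> str:
        # A real agent would call an LLM here; we simply echo for illustration
        return f"[{self.role}] handled: {task.description}"

class Orchestrator:
    """Coordinates agent interactions and owns the shared memory."""

    def __init__(self, agents: List[SketchAgent]):
        self.agents = {a.id: a for a in agents}
        self.memory: List[str] = []  # shared memory system

    def run(self, task: SketchTask) -> List[str]:
        # A naive round-robin stands in for a real communication protocol
        outputs = []
        for agent in self.agents.values():
            result = agent.execute(task, self.memory)
            self.memory.append(result)  # later agents see earlier results
            outputs.append(result)
        return outputs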

Agent Design Patterns

// Base Agent Interface (Task, Context, Message, Tool, AgentResponse,
// and LLMClient type definitions are omitted from this sketch)
interface Agent {
  id: string;
  role: string;
  capabilities: string[];
  model: string;
  temperature: number;
  
  execute(task: Task, context: Context): Promise<AgentResponse>;
  communicate(message: Message, recipient: Agent): Promise<void>;
}

// Specialized Agent Implementation
class ResearchAgent implements Agent {
  id = "research-001";
  role = "Research Specialist";
  capabilities = ["web_search", "paper_analysis", "fact_checking"];
  model = "gpt-4";
  temperature = 0.3;
  
  private tools: Map<string, Tool>;
  private llm: LLMClient; // assumed LLM client type, used for planning
  
  constructor(tools: Tool[], llm: LLMClient) {
    this.tools = new Map(tools.map(t => [t.name, t]));
    this.llm = llm;
  }
  
  async execute(task: Task, context: Context): Promise<AgentResponse> {
    // Agent-specific logic
    const plan = await this.planResearch(task);
    // executeResearchPlan (omitted) runs each plan step with its matching tool
    const results = await this.executeResearchPlan(plan);
    
    return {
      agentId: this.id,
      output: results,
      metadata: {
        toolsUsed: plan.tools,
        sources: results.sources,
      },
    };
  }
  
  async communicate(message: Message, recipient: Agent): Promise<void> {
    // In this sketch, delivery is handled by the orchestrator's message bus
  }
  
  private async planResearch(task: Task): Promise<ResearchPlan> {
    const prompt = `
      As a research specialist, create a plan to: ${task.description}
      Available tools: ${Array.from(this.tools.keys()).join(", ")}
      
      Return a structured plan with steps and required tools.
    `;
    
    // Call LLM to create plan
    return await this.llm.generateStructured(prompt, ResearchPlanSchema);
  }
}

Building with AutoGen

AutoGen by Microsoft provides a framework for building conversational multi-agent systems.

Basic AutoGen Implementation

import os

import autogen
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

# Configure LLM
config_list = [{
    "model": "gpt-4",
    "api_key": os.environ["OPENAI_API_KEY"],
}]

llm_config = {
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}

# Create specialized agents
researcher = AssistantAgent(
    name="Researcher",
    system_message="""You are a research specialist. 
    Your role is to find accurate, up-to-date information from reliable sources.
    Always cite your sources and verify facts.""",
    llm_config=llm_config,
)

analyst = AssistantAgent(
    name="Analyst",
    system_message="""You are a data analyst.
    Your role is to analyze information, identify patterns, and provide insights.
    Use statistical reasoning and create visualizations when helpful.""",
    llm_config=llm_config,
)

writer = AssistantAgent(
    name="Writer",
    system_message="""You are a technical writer.
    Your role is to create clear, well-structured content based on research and analysis.
    Focus on clarity, accuracy, and engaging presentation.""",
    llm_config=llm_config,
)

critic = AssistantAgent(
    name="Critic",
    system_message="""You are a quality assurance specialist.
    Review outputs for accuracy, completeness, and quality.
    Provide constructive feedback and suggest improvements.""",
    llm_config=llm_config,
)

# Create user proxy for human interaction
user_proxy = UserProxyAgent(
    name="User",
    system_message="A human user.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",
)

# Set up group chat
groupchat = GroupChat(
    agents=[user_proxy, researcher, analyst, writer, critic],
    messages=[],
    max_round=20,
)

manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

# Start the conversation
user_proxy.initiate_chat(
    manager,
    message="Create a comprehensive report on the impact of AI on software development in 2024.",
)

Advanced AutoGen Patterns

1. Tool-Enabled Agents

from autogen import register_function

# Define tools for agents (bodies elided; wire these to real services)
def search_web(query: str) -> str:
    """Search the web for information."""
    ...  # implement with your search API and return the results as text

def analyze_code(code: str, language: str) -> dict:
    """Analyze code for quality and security issues."""
    ...  # implement with your code-analysis tooling and return a findings dict

# Register tools with agents
register_function(
    search_web,
    caller=researcher,
    executor=user_proxy,
    description="Search the web for information",
)

register_function(
    analyze_code,
    caller=analyst,
    executor=user_proxy,
    description="Analyze code quality and security",
)

2. Persistent Memory System

import chromadb
from datetime import datetime

from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent

# Set up vector database for memory
client = chromadb.Client()
collection = client.create_collection("agent_memory")

# Create agent with retrieval capabilities
memory_agent = RetrieveAssistantAgent(
    name="MemoryAgent",
    system_message="You have access to conversation history and can retrieve relevant past information.",
    llm_config=llm_config,
    retrieve_config={
        "collection": collection,
        "get_or_create": True,
    },
)

# Store conversation turns
def store_memory(agent_name: str, message: str, metadata: dict):
    collection.add(
        documents=[message],
        metadatas=[{
            "agent": agent_name,
            "timestamp": datetime.now().isoformat(),
            **metadata
        }],
        ids=[f"{agent_name}_{datetime.now().timestamp()}"]
    )
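
For example, each completed turn can be persisted right after an agent responds (the metadata keys here are illustrative):

store_memory(
    agent_name="Researcher",
    message="Found three 2024 surveys on LLM-based agents.",
    metadata={"task": "ai-trends-report", "round": 3},
)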

Building with CrewAI

CrewAI offers a more structured approach to building agent crews with defined roles and processes.

Basic CrewAI Implementation

from crewai import Agent, Task, Crew, Process

# search_tool, arxiv_tool, writing_tool, and seo_tool are assumed to be
# defined elsewhere (e.g. via crewai_tools or custom tool classes)

# Define agents with specific roles
researcher = Agent(
    role='Senior Research Analyst',
    goal='Uncover cutting-edge developments in AI and data science',
    backstory="""You work at a leading tech think tank.
    Your expertise lies in identifying emerging trends and technologies.""",
    verbose=True,
    allow_delegation=False,
    tools=[search_tool, arxiv_tool]
)

writer = Agent(
    role='Tech Content Strategist',
    goal='Craft compelling content on tech advancements',
    backstory="""You are a renowned Content Strategist, 
    known for your insightful and engaging articles.""",
    verbose=True,
    allow_delegation=True,
    tools=[writing_tool, seo_tool]
)

# Define tasks
research_task = Task(
    description="""Conduct a comprehensive analysis of the latest AI trends in 2024.
    Identify key breakthroughs, applications, and future implications.""",
    expected_output="""A detailed report with:
    - Key AI breakthroughs in 2024
    - Real-world applications
    - Future predictions
    - Supporting data and citations""",
    agent=researcher
)

writing_task = Task(
    description="""Using the research findings, create an engaging blog post
    about AI trends that appeals to both technical and non-technical audiences.""",
    expected_output="A well-structured blog post of 1500-2000 words with SEO optimization",
    agent=writer,
    context=[research_task]  # This task depends on research_task
)

# Create and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # Tasks executed sequentially
    verbose=True
)

result = crew.kickoff()
print(result)

Advanced CrewAI Patterns

1. Hierarchical Process Flow

# Manager agent that delegates tasks
manager = Agent(
    role='Project Manager',
    goal='Ensure project success through effective delegation and coordination',
    backstory="""You're an experienced project manager skilled at breaking down
    complex projects and delegating to the right team members.""",
    verbose=True,
    allow_delegation=True
)

# Specialist agents (code_tool, design_tool, and database_tool are assumed
# to be defined elsewhere)
frontend_dev = Agent(
    role='Frontend Developer',
    goal='Create beautiful, responsive user interfaces',
    backstory="""You're a frontend specialist with expertise in React and TypeScript.""",
    verbose=True,
    tools=[code_tool, design_tool]
)

backend_dev = Agent(
    role='Backend Developer',
    goal='Build scalable, secure backend systems',
    backstory="""You're a backend expert specializing in Node.js and PostgreSQL.""",
    verbose=True,
    tools=[code_tool, database_tool]
)

# Hierarchical task structure
project_task = Task(
    description="Build a full-stack AI chatbot application",
    agent=manager,
    expected_output="Complete project plan with task assignments"
)

frontend_task = Task(
    description="Implement the chat UI with real-time updates",
    agent=frontend_dev,
    expected_output="React components with TypeScript"
)

backend_task = Task(
    description="Create REST API and WebSocket server",
    agent=backend_dev,
    expected_output="Node.js API with database integration"
)

# Create crew with hierarchical process; ChatOpenAI here comes from
# langchain_openai (pip install langchain-openai)
from langchain_openai import ChatOpenAI

crew = Crew(
    agents=[manager, frontend_dev, backend_dev],
    tasks=[project_task, frontend_task, backend_task],
    process=Process.hierarchical,
    manager_llm=ChatOpenAI(temperature=0, model="gpt-4"),
    verbose=True
)

2. Custom Tools Integration

import json

from crewai_tools import BaseTool
from typing import Type
from pydantic import BaseModel, Field

class CodeAnalysisInput(BaseModel):
    """Input schema for code analysis tool."""
    code: str = Field(description="The code to analyze")
    language: str = Field(description="Programming language")

class CodeAnalysisTool(BaseTool):
    name: str = "Code Analyzer"
    description: str = "Analyzes code for quality, security, and best practices"
    args_schema: Type[BaseModel] = CodeAnalysisInput
    
    def _run(self, code: str, language: str) -> str:
        # calculate_complexity, scan_security, check_style, and estimate_coverage
        # stand in for calls to your actual analysis service
        analysis = {
            "complexity": calculate_complexity(code),
            "security_issues": scan_security(code),
            "style_violations": check_style(code, language),
            "test_coverage": estimate_coverage(code)
        }
        
        return json.dumps(analysis, indent=2)

# Attach tool to agent
code_reviewer = Agent(
    role='Senior Code Reviewer',
    goal='Ensure code quality and security',
    backstory="""You're a seasoned developer with a keen eye for code quality.""",
    tools=[CodeAnalysisTool()],
    verbose=True
)

Production Deployment Strategies

1. Containerized Deployment

# Dockerfile for multi-agent system
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy agent configurations and code
COPY agents/ ./agents/
COPY config/ ./config/
COPY main.py .

# API keys are injected at runtime (e.g. docker run -e OPENAI_API_KEY=...)
ENV OPENAI_API_KEY=""
ENV ANTHROPIC_API_KEY=""

# Run the multi-agent system
CMD ["python", "main.py"]

2. Scalable Architecture with Message Queues

import { Queue, Worker } from 'bullmq';
import Redis from 'ioredis';

// Redis connection for message queue
const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: parseInt(process.env.REDIS_PORT || '6379'),
});

// Create queues for different agent types
const researchQueue = new Queue('research-tasks', { connection: redis });
const analysisQueue = new Queue('analysis-tasks', { connection: redis });
const writingQueue = new Queue('writing-tasks', { connection: redis });

// Agent workers
const researchWorker = new Worker(
  'research-tasks',
  async (job) => {
    const { task, context } = job.data;
    const agent = new ResearchAgent();
    const result = await agent.execute(task, context);
    
    // Queue next task for analysis
    await analysisQueue.add('analyze', {
      researchResults: result,
      originalTask: task,
    });
    
    return result;
  },
  { connection: redis }
);

// Orchestration service
class OrchestrationService {
  async processRequest(request: UserRequest): Promise<FinalResult> {
    // Break down request into tasks
    const tasks = this.decomposeTasks(request);
    
    // Queue initial research tasks
    const jobs = await Promise.all(
      tasks.map(task => researchQueue.add('research', { task }))
    );
    
    // Monitor job completion; waitForCompletion and aggregateResults are
    // omitted here (poll job state or listen to QueueEvents)
    const results = await this.waitForCompletion(jobs);
    
    return this.aggregateResults(results);
  }
  
  private decomposeTasks(request: UserRequest): Task[] {
    // Use LLM to break down complex request into subtasks
    const decomposer = new TaskDecomposer();
    return decomposer.decompose(request);
  }
}

3. Monitoring and Observability

import { MeterProvider } from '@opentelemetry/sdk-metrics';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import * as Sentry from '@sentry/node';

// Initialize monitoring
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
});

// Prometheus metrics
const exporter = new PrometheusExporter({
  port: 9464,
}, () => {
  console.log('Prometheus metrics server started on port 9464');
});

const meterProvider = new MeterProvider();
meterProvider.addMetricReader(exporter);
const meter = meterProvider.getMeter('multi-agent-system');

// Custom metrics
const agentExecutionTime = meter.createHistogram('agent_execution_time', {
  description: 'Time taken for agent execution',
  unit: 'ms',
});

const taskQueueDepth = meter.createObservableGauge('task_queue_depth', {
  description: 'Number of tasks in queue',
});

// Agent execution wrapper with monitoring
async function monitoredAgentExecution(
  agent: Agent,
  task: Task,
  context: Context
): Promise<AgentResponse> {
  const startTime = Date.now();
  const transaction = Sentry.startTransaction({
    op: 'agent.execution',
    name: `${agent.role} - ${task.type}`,
  });
  
  try {
    const result = await agent.execute(task, context);
    
    agentExecutionTime.record(Date.now() - startTime, {
      agent_role: agent.role,
      task_type: task.type,
      success: 'true',
    });
    
    transaction.setStatus('ok');
    return result;
  } catch (error) {
    agentExecutionTime.record(Date.now() - startTime, {
      agent_role: agent.role,
      task_type: task.type,
      success: 'false',
    });
    
    Sentry.captureException(error);
    transaction.setStatus('internal_error');
    throw error;
  } finally {
    transaction.finish();
  }
}

Best Practices for Multi-Agent Systems

1. Agent Communication Patterns

from enum import Enum
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

class MessageType(Enum):
    REQUEST = "request"
    RESPONSE = "response"
    BROADCAST = "broadcast"
    ERROR = "error"

@dataclass
class Message:
    sender_id: str
    recipient_id: Optional[str]  # None for broadcasts
    message_type: MessageType
    content: Any
    timestamp: float
    correlation_id: str

class CommunicationProtocol:
    def __init__(self):
        self.message_queue: List[Message] = []
        self.subscribers: Dict[str, List[Callable]] = {}
    
    def send_message(self, message: Message):
        # Direct message
        if message.recipient_id:
            self.deliver_to_agent(message.recipient_id, message)
        # Broadcast
        else:
            self.broadcast_message(message)
    
    def subscribe(self, agent_id: str, handler: Callable):
        if agent_id not in self.subscribers:
            self.subscribers[agent_id] = []
        self.subscribers[agent_id].append(handler)
    
    def deliver_to_agent(self, agent_id: str, message: Message):
        handlers = self.subscribers.get(agent_id, [])
        for handler in handlers:
            handler(message)
    
    def broadcast_message(self, message: Message):
        # Deliver to every subscribed agent except the sender
        for agent_id in self.subscribers:
            if agent_id != message.sender_id:
                self.deliver_to_agent(agent_id, message)
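
A minimal usage sketch (the agent IDs and handler below are illustrative):

import time
import uuid

protocol = CommunicationProtocol()

# A handler that logs every message delivered to the analyst agent
protocol.subscribe("analyst-001", lambda m: print(f"[analyst] got {m.content!r}"))

protocol.send_message(Message(
    sender_id="research-001",
    recipient_id="analyst-001",  # a direct message; None would broadcast
    message_type=MessageType.REQUEST,
    content="Summarize the latest findings",
    timestamp=time.time(),
    correlation_id=str(uuid.uuid4()),
))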

2. Error Handling and Recovery

import asyncio
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

class AgentExecutionError(Exception):
    """Raised when the primary agent and all fallbacks have failed."""
    def __init__(self, message: str, cause: Optional[Exception] = None):
        super().__init__(message)
        self.cause = cause

class AgentErrorHandler:
    def __init__(self, max_retries: int = 3):
        self.max_retries = max_retries
        self.fallback_agents: Dict[str, List[Agent]] = {}
    
    async def execute_with_fallback(
        self,
        primary_agent: Agent,
        task: Task,
        context: Context
    ) -> AgentResponse:
        last_error = None
        
        # Try primary agent
        for attempt in range(self.max_retries):
            try:
                return await primary_agent.execute(task, context)
            except Exception as e:
                last_error = e
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
        
        # Try fallback agents
        fallbacks = self.fallback_agents.get(primary_agent.role, [])
        for fallback_agent in fallbacks:
            try:
                logger.warning(f"Using fallback agent: {fallback_agent.id}")
                return await fallback_agent.execute(task, context)
            except Exception as e:
                continue
        
        # All attempts failed
        raise AgentExecutionError(
            f"All agents failed for role {primary_agent.role}",
            last_error
        )
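
Wiring it up might look like this; backup_research_agent, primary_research_agent, and the task/context objects are assumed to exist, and the call runs inside an async context:

handler = AgentErrorHandler(max_retries=3)
handler.fallback_agents["Research Specialist"] = [backup_research_agent]

# Falls back to backup_research_agent if the primary keeps failing
response = await handler.execute_with_fallback(primary_research_agent, task, context)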

3. Performance Optimization

// Agent pooling for better resource utilization
class AgentPool {
  private pools: Map<string, Agent[]> = new Map();
  private inUse: Set<string> = new Set();
  
  constructor(private config: PoolConfig) {
    this.initializePools(); // builds config.size agents per role (omitted)
  }
  
  async getAgent(role: string): Promise<Agent> {
    const pool = this.pools.get(role);
    if (!pool) {
      throw new Error(`No pool for role: ${role}`);
    }
    
    // Find available agent
    for (const agent of pool) {
      if (!this.inUse.has(agent.id)) {
        this.inUse.add(agent.id);
        return agent;
      }
    }
    
    // Wait for available agent
    return await this.waitForAvailableAgent(role);
  }
  
  releaseAgent(agent: Agent): void {
    this.inUse.delete(agent.id);
  }
  
  private tryGetAgent(role: string): Agent | null {
    // Non-throwing variant of getAgent used by the polling loop below
    const pool = this.pools.get(role) ?? [];
    for (const agent of pool) {
      if (!this.inUse.has(agent.id)) {
        this.inUse.add(agent.id);
        return agent;
      }
    }
    return null;
  }
  
  private async waitForAvailableAgent(role: string): Promise<Agent> {
    return new Promise((resolve) => {
      const checkInterval = setInterval(() => {
        const agent = this.tryGetAgent(role);
        if (agent) {
          clearInterval(checkInterval);
          resolve(agent);
        }
      }, 100);
    });
  }
}

Integration with External Services

1. Supabase for Agent State Management

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

class AgentStateManager {
  async saveAgentState(agentId: string, state: AgentState) {
    const { data, error } = await supabase
      .from('agent_states')
      .upsert({
        agent_id: agentId,
        state: state,
        updated_at: new Date().toISOString(),
      });
    
    if (error) throw error;
    return data;
  }
  
  async getAgentState(agentId: string): Promise<AgentState | null> {
    const { data, error } = await supabase
      .from('agent_states')
      .select('state')
      .eq('agent_id', agentId)
      .single();
    
    if (error) return null;
    return data.state;
  }
  
  // Real-time state synchronization
  subscribeToStateChanges(agentId: string, callback: (state: AgentState) => void) {
    return supabase
      .channel(`agent-state-${agentId}`)
      .on('postgres_changes', {
        event: 'UPDATE',
        schema: 'public',
        table: 'agent_states',
        filter: `agent_id=eq.${agentId}`,
      }, (payload) => {
        callback(payload.new.state);
      })
      .subscribe();
  }
}

2. Modal for Scalable Agent Execution

import time

import modal

stub = modal.Stub("multi-agent-system")

# Define container image with dependencies
image = modal.Image.debian_slim().pip_install([
    "autogen",
    "crewai",
    "openai",
    "anthropic",
    "langchain",
])

@stub.function(
    image=image,
    gpu="T4",  # Use GPU for faster inference
    timeout=300,
    retries=3,
)
def run_agent_task(agent_config: dict, task: dict, context: dict) -> dict:
    """Execute an agent task in Modal's serverless environment."""
    start_time = time.time()
    
    # Initialize agent based on config (factory helpers defined elsewhere)
    if agent_config["framework"] == "autogen":
        agent = create_autogen_agent(agent_config)
    elif agent_config["framework"] == "crewai":
        agent = create_crewai_agent(agent_config)
    
    # Execute task
    result = agent.execute(task, context)
    
    return {
        "agent_id": agent_config["id"],
        "task_id": task["id"],
        "result": result,
        "execution_time": time.time() - start_time,
    }

# Orchestrator that distributes tasks to Modal
class ModalOrchestrator:
    def __init__(self):
        self.modal_function = modal.Function.lookup("multi-agent-system", "run_agent_task")
    
    async def execute_parallel_tasks(self, tasks: List[Task]) -> List[Result]:
        # Launch all tasks in parallel on Modal
        futures = []
        for task in tasks:
            agent_config = self.select_agent_for_task(task)
            future = self.modal_function.spawn(
                agent_config=agent_config,
                task=task.to_dict(),
                context=self.get_context(),
            )
            futures.append(future)
        
        # Collect results
        results = []
        for future in futures:
            result = future.get()
            results.append(result)
        
        return results

3. Trigger.dev for Orchestration

import { TriggerClient, eventTrigger } from "@trigger.dev/sdk";
import { z } from "zod";

const client = new TriggerClient({
  id: "multi-agent-orchestrator",
  apiKey: process.env.TRIGGER_API_KEY!,
});

// Define multi-agent workflow
client.defineJob({
  id: "complex-research-workflow",
  name: "Complex Research Workflow",
  version: "1.0.0",
  trigger: eventTrigger({
    name: "research.requested",
    schema: z.object({
      topic: z.string(),
      depth: z.enum(["basic", "intermediate", "comprehensive"]),
      deadline: z.string().datetime(),
    }),
  }),
  run: async (payload, io, ctx) => {
    // Phase 1: Initial Research
    const researchResults = await io.runTask("initial-research", async () => {
      const agent = new ResearchAgent();
      return await agent.research(payload.topic);
    });
    
    // Phase 2: Parallel Analysis
    const analyses = await io.runTask("parallel-analysis", async () => {
      return await Promise.all([
        new TechnicalAnalyst().analyze(researchResults),
        new MarketAnalyst().analyze(researchResults),
        new CompetitorAnalyst().analyze(researchResults),
      ]);
    });
    
    // Phase 3: Synthesis
    const synthesis = await io.runTask("synthesis", async () => {
      const synthesizer = new SynthesisAgent();
      return await synthesizer.synthesize(analyses);
    });
    
    // Phase 4: Report Generation (writer is hoisted so Phase 5 can reuse it)
    const writer = new ReportWriter();
    const report = await io.runTask("report-generation", async () => {
      return await writer.generateReport(synthesis, payload.depth);
    });
    
    // Phase 5: Quality Review
    const finalReport = await io.runTask("quality-review", async () => {
      const reviewer = new QualityReviewer();
      const feedback = await reviewer.review(report);
      
      if (feedback.approved) {
        return report;
      } else {
        // Iterate with improvements
        return await writer.revise(report, feedback);
      }
    });
    
    // Send notification (uploadReport is an assumed helper returning a URL)
    await io.sendEvent("research.completed", {
      topic: payload.topic,
      reportUrl: await uploadReport(finalReport),
      completedAt: new Date().toISOString(),
    });
    
    return finalReport;
  },
});

Testing Multi-Agent Systems

Unit Testing Individual Agents

import json

import pytest
from unittest.mock import Mock, patch

class TestResearchAgent:
    @pytest.fixture
    def agent(self):
        return ResearchAgent(
            tools=[Mock(name="search_tool")],
            llm_config={"model": "gpt-4", "temperature": 0.3}
        )
    
    @patch('openai.ChatCompletion.create')
    def test_research_planning(self, mock_openai, agent):
        # Mock LLM response
        mock_openai.return_value = {
            "choices": [{
                "message": {
                    "content": json.dumps({
                        "steps": ["search recent papers", "analyze findings"],
                        "tools": ["search_tool"]
                    })
                }
            }]
        }
        
        task = Task(description="Research AI trends")
        plan = agent.plan_research(task)
        
        assert len(plan.steps) == 2
        assert "search_tool" in plan.tools

Integration Testing Agent Interactions

describe('Multi-Agent Workflow', () => {
  let orchestrator: Orchestrator;
  let researchAgent: ResearchAgent;
  let analysisAgent: AnalysisAgent;
  
  beforeEach(() => {
    researchAgent = new ResearchAgent();
    analysisAgent = new AnalysisAgent();
    orchestrator = new Orchestrator([researchAgent, analysisAgent]);
  });
  
  it('should complete research and analysis workflow', async () => {
    const request = {
      type: 'research_and_analyze',
      topic: 'Quantum Computing Applications',
    };
    
    const result = await orchestrator.processRequest(request);
    
    expect(result).toHaveProperty('research');
    expect(result).toHaveProperty('analysis');
    expect(result.research.sources.length).toBeGreaterThan(0);
    expect(result.analysis.insights).toBeDefined();
  });
  
  it('should handle agent failures gracefully', async () => {
    // Simulate agent failure
    jest.spyOn(researchAgent, 'execute').mockRejectedValue(
      new Error('API rate limit exceeded')
    );
    
    const request = {
      type: 'research_and_analyze',
      topic: 'Test Topic',
    };
    
    // Should use fallback or retry mechanism
    const result = await orchestrator.processRequest(request);
    expect(result.metadata.usedFallback).toBe(true);
  });
});

Performance Optimization

1. Parallel Execution Strategies

import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import List

class ParallelExecutor:
    def __init__(self, max_workers: int = 10):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
    
    async def execute_parallel_tasks(
        self,
        agents: List[Agent],
        tasks: List[Task],
        context: Context
    ) -> List[AgentResponse]:
        # Pair agents with tasks (assignment strategy omitted here)
        assignments = self.assign_tasks_to_agents(agents, tasks)
        
        # Execute in parallel
        loop = asyncio.get_event_loop()
        futures = []
        
        for agent, task in assignments:
            future = loop.run_in_executor(
                self.executor,
                agent.execute,
                task,
                context
            )
            futures.append(future)
        
        # Wait for all to complete
        results = await asyncio.gather(*futures, return_exceptions=True)
        
        # Handle results and exceptions
        return self.process_results(results, assignments)

2. Resource Management

class ResourceManager {
  private resources = new Map<string, Resource>();
  private usage = new Map<string, number>();
  
  async allocateResources(agent: Agent): Promise<ResourceAllocation> {
    const required = this.calculateRequiredResources(agent);
    
    // Block until resources are free (canAllocate and waitForResources
    // are omitted bookkeeping helpers)
    while (!this.canAllocate(required)) {
      await this.waitForResources();
    }
    
    // Allocate resources
    const allocation = this.allocate(required);
    
    // Track usage
    this.trackUsage(agent.id, allocation);
    
    return allocation;
  }
  
  releaseResources(agentId: string): void {
    const allocation = this.usage.get(agentId);
    if (allocation) {
      this.release(allocation);
      this.usage.delete(agentId);
    }
  }
  
  private calculateRequiredResources(agent: Agent): ResourceRequirements {
    // Based on agent type and model
    const baseRequirements = {
      memory: agent.model.includes('gpt-4') ? 8192 : 4096,
      cpu: agent.role === 'Analyzer' ? 4 : 2,
      gpu: agent.requiresGPU ? 1 : 0,
    };
    
    return baseRequirements;
  }
}

Conclusion

Multi-agent systems represent the future of AI applications, enabling complex problem-solving through specialized, collaborative agents. Key takeaways:

  1. Choose the Right Framework: AutoGen for conversational flows, CrewAI for structured workflows
  2. Design Clear Agent Roles: Each agent should have a specific purpose and expertise
  3. Implement Robust Communication: Use message queues and protocols for agent coordination
  4. Monitor and Debug: Comprehensive logging and monitoring are essential
  5. Scale Intelligently: Use services like Modal and Trigger.dev for production deployments
  6. Test Thoroughly: Both individual agents and their interactions need testing

By following these patterns and best practices, you can build powerful multi-agent systems that solve complex problems more effectively than any single AI model could alone.
