
October 14, 2025

Agentic AI for API Compatibility: Building Intelligent Guardians with LangChain and LangGraph


Introduction

I’ve been in software development for decades, and if there’s one lesson that’s been burned into my memory through countless production incidents, it’s this: innocuous-looking API changes have an uncanny ability to break everything. You’re getting alerts—an API change that sailed through testing is breaking production. Customer support is calling. You’re coordinating an emergency rollback, wondering how your tests missed this entirely.

The Problem We Keep Facing

Throughout my career, I’ve watched teams struggle with the same challenge: API evolution shouldn’t be a game of Russian roulette. Yet “safe” changes repeatedly pass tests only to break production, and unit testing doesn’t catch the subtle semantic changes that break client integrations. For years, I’ve been building tools to solve this. I created PlexMockServices for API mocking, then evolved it into api-mock-service with full mock and contract testing support. These tools have saved us from many production incidents, and I’ve also written previously about various testing methodologies for validating APIs.

When gRPC and Protocol Buffers arrived, I thought we’d finally solved it. Tools like Buf excel at catching wire-level protocol changes—remove a field, Buf catches it. But here’s what I discovered: Buf and similar tools only see part of the picture.

The Blind Spots

Traditional static analysis tools understand syntax but not semantics. They catch structural changes but miss:

  • Fields made required through validation rules—wire-compatible, but every client fails
  • Fields that were “always” populated until you made them conditional
  • Error messages that clients parse with regex
  • Sort orders that changed, breaking customer dashboards
  • Default values that shifted behavior

With enough users, all observable behaviors will be depended upon—that’s Hyrum’s Law. The challenge isn’t just detecting changes; it’s understanding their impact from every consumer’s perspective.

Enter Agentic AI

Over the past year, I’ve been experimenting with combining static analysis tools like Buf with the contextual understanding of Large Language Models. Not to replace traditional tools, but to augment them—to catch what they structurally cannot see. In this blog, I’ll show you how to build an intelligent API guardian using LangChain and LangGraph—an agentic AI system that:

  • Orchestrates multiple tools (Git, Buf, LLMs) in coordinated workflows
  • Understands not just what changed, but what it means
  • Catches both wire-level and semantic breaking changes
  • Explains why something breaks and how to fix it
  • Makes autonomous deployment decisions based on comprehensive analysis

Let me show you how we built this system and how you can implement it for your APIs. Those emergency customer calls about broken integrations might just become a thing of the past.

Architecture Overview: The Intelligent Pipeline

The key insight behind this approach is that no single tool can catch all breaking changes. Static analyzers like Buf excel at structural validation but can’t reason about semantics. LLMs understand context and business logic but lack the deterministic guarantees of rule-based systems. The solution? Combine them in an orchestrated pipeline where each component contributes its strengths.

What I’ve built is an intelligent pipeline that layers multiple detection strategies:

  • Buf provides fast, deterministic detection of wire-level protocol violations
  • LangGraph orchestrates a stateful workflow that coordinates all the analysis steps
  • LangChain manages the LLM interactions, handling prompts, retries, and structured output parsing
  • Vertex AI/Gemini brings semantic understanding to analyze what changes actually mean for API consumers

Here’s how these components work together in practice: the Git diff for the proto changes is collected, Buf runs its deterministic wire-level checks, the LLM then analyzes the same diff together with Buf’s findings for semantic and behavioral impact, and everything is merged into a single report that drives the deploy-or-block decision.

Setting Up the Environment

Let’s walk through setting up this system step by step. We’ll use a sample Todo API project as our example.

Prerequisites

# Clone the sample repository
git clone https://github.com/bhatti/todo-api-errors.git
cd todo-api-errors/check-api-break-automation

# Create Python virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Installing Buf

Buf is essential for proto file analysis:

# macOS
brew install bufbuild/buf/buf

# Linux
curl -sSL "https://github.com/bufbuild/buf/releases/latest/download/buf-Linux-x86_64" -o /usr/local/bin/buf
chmod +x /usr/local/bin/buf

# Verify installation
buf --version

Configuring Google Cloud and Vertex AI

  1. Set up GCP Project:
# Install gcloud CLI if not already installed
# Follow: https://cloud.google.com/sdk/docs/install

# Authenticate
gcloud auth application-default login

# Set your project
gcloud config set project YOUR_PROJECT_ID
  2. Enable Vertex AI API:
gcloud services enable aiplatform.googleapis.com
  3. Create Configuration File:
# Create .env file
cat > .env << EOF
GCP_PROJECT=your-project-id
GCP_REGION=us-central1
VERTEX_AI_MODEL=gemini-2.0-flash-exp
EOF
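
The checker reads these settings when it initializes the model. As a minimal sketch of what that looks like with LangChain (assuming python-dotenv and the langchain-google-vertexai package, which may differ from the exact dependencies pinned in requirements.txt):

# Sketch: load the .env settings and construct the Gemini chat model
import os
from dotenv import load_dotenv
from langchain_google_vertexai import ChatVertexAI

load_dotenv()  # reads GCP_PROJECT, GCP_REGION, VERTEX_AI_MODEL from .env

llm = ChatVertexAI(
    model_name=os.environ["VERTEX_AI_MODEL"],
    project=os.environ["GCP_PROJECT"],
    location=os.environ.get("GCP_REGION", "us-central1"),
    temperature=0,  # keep compatibility analysis as deterministic as possible
)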

Implementation Deep Dive

The LangGraph State Machine

Our implementation uses LangGraph to create a deterministic workflow for analyzing API changes. Here’s the core LangGraph implementation:

from langgraph.graph import StateGraph
from typing import TypedDict, List, Dict, Any
import logging

class CompatibilityState(TypedDict):
    """State for the compatibility checking workflow"""
    workspace_path: str
    proto_files: List[str]
    git_diff: str
    buf_results: Dict[str, Any]
    ai_analysis: Dict[str, Any]
    final_report: Dict[str, Any]
    can_deploy: bool

class CompatibilityChecker:
    def __init__(self, project_id: str, model_name: str = "gemini-2.0-flash-exp"):
        self.logger = logging.getLogger(__name__)
        self.project_id = project_id
        self.model = self._initialize_llm(model_name)
        self.workflow = self._build_workflow()

    def _build_workflow(self) -> StateGraph:
        """Build the LangGraph workflow"""
        workflow = StateGraph(CompatibilityState)

        # Add nodes for each step
        workflow.add_node("load_protos", self.load_proto_files)
        workflow.add_node("get_diff", self.get_git_diff)
        workflow.add_node("buf_check", self.run_buf_analysis)
        workflow.add_node("ai_analysis", self.run_ai_analysis)
        workflow.add_node("generate_report", self.generate_report)

        # Define the flow
        workflow.add_edge("load_protos", "get_diff")
        workflow.add_edge("get_diff", "buf_check")
        workflow.add_edge("buf_check", "ai_analysis")
        workflow.add_edge("ai_analysis", "generate_report")

        # Set entry point
        workflow.set_entry_point("load_protos")
        workflow.set_finish_point("generate_report")

        return workflow.compile()
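
Once compiled, the graph is invoked with an initial state, and each node fills in its slice of CompatibilityState as execution follows the edges defined above. An illustrative call (the placeholder values are mine, not the repo’s defaults):

checker = CompatibilityChecker(project_id="your-project-id")
initial_state: CompatibilityState = {
    "workspace_path": "..",
    "proto_files": [],
    "git_diff": "",
    "buf_results": {},
    "ai_analysis": {},
    "final_report": {},
    "can_deploy": False,
}
final_state = checker.workflow.invoke(initial_state)
print(final_state["can_deploy"], final_state["final_report"])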

Intelligent Prompt Engineering

The key to accurate breaking change detection lies in the prompt design. Here’s our approach:

import json  # needed for embedding the buf results below

def create_analysis_prompt(self, diff: str, buf_results: dict) -> str:
    """Create a comprehensive prompt for the LLM"""
    return f"""
    You are an API compatibility expert analyzing protobuf changes.

    CONTEXT:
    - This is a production API with existing consumers
    - Breaking changes can cause service outages
    - We follow semantic versioning principles

    STATIC ANALYSIS RESULTS:
    {json.dumps(buf_results, indent=2)}

    GIT DIFF:
    ```
    {diff}
    ```

    ANALYZE THE FOLLOWING:
    1. Wire-level breaking changes (trust buf results completely)
    2. Semantic breaking changes:
       - Required fields added without defaults
       - Field removals (always breaking)
       - Type changes that lose precision
       - Enum value removals or reordering

    3. Behavioral concerns:
       - Fields that might be parsed by consumers
       - Error message format changes
       - Ordering or filtering logic changes

    CRITICAL RULES:
    - If buf reports breaking changes, mark them as is_breaking=true
    - Field removal is ALWAYS breaking (severity: HIGH)
    - Adding REQUIRED fields is breaking (severity: MEDIUM-HIGH)
    - Be conservative - when in doubt, flag as potentially breaking

    OUTPUT FORMAT:
    Return a JSON object with this structure:
    {{
        "changes": [...],
        "overall_severity": "NONE|LOW|MEDIUM|HIGH|CRITICAL",
        "can_deploy": true|false,
        "recommendations": [...]
    }}
    """

Real-World Example: When Buf Missed Half the Problem

Let me show you exactly why we need AI augmentation with a concrete example. I’m going to intentionally break a Todo API in two different ways to demonstrate the difference between what traditional tools catch versus what our AI-enhanced system detects.

The Original Proto File

message Task {
  string id = 1;
  string title = 2;
  string description = 3;  // This field will be removed
  bool completed = 4;
  google.protobuf.Timestamp created_at = 5;
  google.protobuf.Timestamp updated_at = 6;
  repeated string tags = 7;
  TaskPriority priority = 8;
  string assignee_id = 9;
  google.protobuf.Timestamp due_date = 10;
  repeated Comment comments = 11;
}

The Modified Proto File

message Task {
  string id = 1;
  string title = 2;
  // REMOVED: string description = 3;
  bool completed = 4;
  google.protobuf.Timestamp created_at = 5;
  google.protobuf.Timestamp updated_at = 6;
  repeated string tags = 7;
  TaskPriority priority = 8;
  string assignee_id = 9;
  google.protobuf.Timestamp due_date = 10;
  repeated Comment comments = 11;

  // NEW REQUIRED FIELD ADDED:
  TaskMetadata metadata = 12 [(validate.rules).message.required = true];
}

message TaskMetadata {
  string created_by = 1;
  int64 version = 2;
  map<string, string> labels = 3;
}

What Buf Detected

When we ran buf breaking --against '.git#branch=main', Buf only detected one breaking change:

api/proto/todo/v1/todo.proto:83:3:Field "3" with name "description" on message "Task" was deleted.

Why did Buf miss the second breaking change? Because [(validate.rules).message.required = true] is an application-level annotation, not part of the wire protocol: on the wire, the new metadata field is just another optional message field. Buf focuses on wire compatibility – it doesn’t evaluate application-level validation rules.
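
For reference, buf breaking is driven by the buf configuration in the proto module; a typical setup that enables these checks looks like the following (an illustrative default, not necessarily the repo’s exact buf.yaml):

version: v1
breaking:
  use:
    - FILE      # strictest built-in category: flags field deletions, renames, type changes
lint:
  use:
    - DEFAULT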

What Our AI-Enhanced System Detected

Here’s the actual output from our tool:

2025-10-14 18:29:11,388 - __main__ - INFO - Collecting git diffs...
2025-10-14 18:29:11,392 - __main__ - INFO - Analyzing with LLM...
2025-10-14 18:29:14,471 - __main__ - INFO - Generating final report...
================================================================================
API BACKWARD COMPATIBILITY REPORT
================================================================================
Timestamp: 2025-10-14T18:29:14.471705
Files Analyzed: api/proto/todo/v1/todo.proto
Total Changes: 2
Breaking Changes: 2
Overall Severity: HIGH
Can Deploy: NO

DETECTED CHANGES:
----------------------------------------
1. Removed field 'description'
   Location: api/proto/todo/v1/todo.proto:83
   Category: field_removal
   Breaking: YES
   Severity: HIGH
   Recommendation: Consider providing a migration path for clients relying on this field.

2. Added required field 'metadata'
   Location: api/proto/todo/v1/todo.proto:136
   Category: field_addition
   Breaking: YES
   Severity: HIGH
   Recommendation: Ensure all clients are updated to include this field before deployment.

LLM ANALYSIS:
----------------------------------------
The changes include the removal of the 'description' field and the addition of a required
'metadata' field, both of which are breaking changes.

================================================================================
2025-10-14 18:29:14,472 - __main__ - INFO - JSON report saved to results/non_breaking.json

The “Aha!” Moment

This is exactly the scenario I warned about in my presentation. Here’s what happened:

  1. Buf did its job – It caught the field removal. That’s wire-level breaking change detection working as designed.
  2. But Buf has blind spots – It completely missed the required field addition because [(validate.rules).message.required = true] is an application-level annotation. To Buf, it’s just another optional field on the wire.
  3. The AI understood context – Our LLM looked at that validation rule and immediately recognized: “Hey, this server is going to reject any request without this field. That’s going to break every existing client!”

Think about it – if we had only relied on Buf, we would have deployed thinking we fixed the one breaking change. Then boom – production down because no existing client sends the new metadata field. This is precisely why we need AI augmentation. It’s not about replacing Buf – it’s about catching what Buf structurally cannot see.

Beyond This Example

This pattern repeats across many scenarios that static analysis misses:

  • Validation rules that make previously optional behavior mandatory
  • Fields that were always populated but are now conditional (illustrated in the sketch after this list)
  • Changes to default values that alter behavior
  • Error message format changes (clients parse these!)
  • Response ordering changes (someone always depends on order)
  • Rate limiting or throttling policy changes
  • Authentication requirements that changed
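
As a small illustration of how invisible these can be, take the “always populated” case with the Task message from earlier. The change below never touches the wire format, so Buf has nothing to flag, yet any client that assumes assignee_id is always non-empty will break. (The comments are hypothetical; this exact change is not part of the sample repo.)

message Task {
  // ...other fields unchanged...

  // BEFORE: assignee_id was always populated by the server (defaulted to the creator)
  // AFTER:  assignee_id is only populated once a task has actually been assigned
  string assignee_id = 9;
}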

Integrating with CI/CD

The tool can be integrated into your CI/CD pipeline:

# .github/workflows/api-compatibility.yml
name: API Compatibility Check

on:
  pull_request:
    paths:
      - '**/*.proto'

jobs:
  check-breaking-changes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Need full history for comparison

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install Buf
        run: |
          curl -sSL "https://github.com/bufbuild/buf/releases/latest/download/buf-Linux-x86_64" -o /usr/local/bin/buf
          chmod +x /usr/local/bin/buf

      - name: Install dependencies
        run: |
          pip install -r check-api-break-automation/requirements.txt

      - name: Run compatibility check
        env:
          GCP_PROJECT: ${{ secrets.GCP_PROJECT }}
        run: |
          cd check-api-break-automation
          python api_compatibility_checker.py \
            --workspace .. \
            --against origin/main \
            --output results/pr-check.json

      - name: Comment PR with results
        if: always()
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('check-api-break-automation/results/pr-check.json'));

            const comment = `## 🔍 API Compatibility Check Results

            **Can Deploy**: ${results.can_deploy ? '✅ Yes' : '❌ No'}
            **Severity**: ${results.overall_severity}
            **Breaking Changes**: ${results.summary.total_breaking_changes}

            ${results.can_deploy ? '' : '### ⚠️ Breaking Changes Detected\n' + results.recommendations.join('\n')}
            `;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });

Advanced Features: RAG and MCP in Action

1. RAG (Retrieval-Augmented Generation): Learning from Past Mistakes

One of the most powerful aspects of our system is how it learns from history. Here’s how RAG actually works in our implementation:

from langchain.vectorstores import Chroma
from langchain.embeddings import VertexAIEmbeddings
from langchain.schema import Document

class BreakingChangeKnowledgeBase:
    """RAG system that learns from past breaking changes"""

    def __init__(self, project_id: str):
        self.embeddings = VertexAIEmbeddings(
            model_name="textembedding-gecko@003",
            project=project_id
        )
        # Store historical breaking changes in vector database
        self.vector_store = Chroma(
            collection_name="api_breaking_changes",
            embedding_function=self.embeddings,
            persist_directory="./knowledge_base"
        )

    def index_breaking_change(self, change_data: dict):
        """Store a breaking change incident for future reference"""
        doc = Document(
            page_content=f"""
            Proto Change: {change_data['diff']}
            Breaking Type: {change_data['type']}
            Customer Impact: {change_data['impact']}
            Resolution: {change_data['resolution']}
            """,
            metadata={
                "severity": change_data['severity'],
                "date": change_data['date'],
                "service": change_data['service'],
                "prevented": change_data.get('caught_before_prod', False)
            }
        )
        self.vector_store.add_documents([doc])

    def find_similar_changes(self, current_diff: str, k: int = 5):
        """Find similar past breaking changes"""
        results = self.vector_store.similarity_search_with_score(
            current_diff,
            k=k,
            filter={"severity": {"$in": ["HIGH", "CRITICAL"]}}
        )
        return results
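
# Illustrative usage (not from the repo): seeding the knowledge base from a past
# incident, e.g. during a postmortem. The field values are hypothetical; the date is
# stored as a string because most vector stores only accept primitive metadata types.
kb = BreakingChangeKnowledgeBase(project_id="your-project-id")
kb.index_breaking_change({
    "diff": "- string description = 3;",
    "type": "field_removal",
    "impact": "Web dashboard rendered blank task details",
    "resolution": "Re-added the field as deprecated, migrated clients over two releases",
    "severity": "HIGH",
    "date": "2025-06-02",
    "service": "todo-api",
    "caught_before_prod": False,
})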

# How it's used in the main checker:
class CompatibilityChecker:
    def __init__(self, project_id: str):
        self.knowledge_base = BreakingChangeKnowledgeBase(project_id)

    def run_ai_analysis(self, state: dict):
        """Enhanced AI analysis using RAG"""
        # Find similar past incidents
        similar_incidents = self.knowledge_base.find_similar_changes(
            state['git_diff']
        )

        # Build context from past incidents
        historical_context = ""
        if similar_incidents:
            historical_context = "\n\nSIMILAR PAST INCIDENTS:\n"
            for doc, score in similar_incidents:
                if score > 0.8:  # high similarity (note: some stores return a distance, where lower = more similar)
                    historical_context += f"""
                    - Previous incident: {doc.metadata['date']}
                      Impact: {doc.page_content}
                      This suggests high risk of similar issues.
                    """

        # Include historical context in prompt
        enhanced_prompt = f"""
        {self.base_prompt}

        {historical_context}

        Based on historical patterns, pay special attention to similar past issues.
        """

        return self.llm.invoke(enhanced_prompt)

2. Model Context Protocol (MCP) Integration

MCP allows our AI to interact with external tools seamlessly. Here’s the actual implementation:

# mcp_server.py - MCP server for API compatibility tools
from mcp.server import MCPServer
from mcp.tools import Tool, ToolResult
import subprocess
import json

class APICompatibilityMCPServer(MCPServer):
    """MCP server exposing API compatibility tools to AI agents"""

    def __init__(self):
        super().__init__("api-compatibility-checker")
        self.register_tools()

    def register_tools(self):
        """Register all available tools"""

        @self.tool("buf_lint")
        async def buf_lint(proto_path: str) -> ToolResult:
            """Run buf lint on proto files"""
            result = subprocess.run(
                ["buf", "lint", proto_path],
                capture_output=True,
                text=True
            )
            return ToolResult(
                success=result.returncode == 0,
                output=result.stdout,
                error=result.stderr
            )

        @self.tool("buf_breaking")
        async def buf_breaking(proto_path: str, against: str = "main") -> ToolResult:
            """Check for breaking changes using buf"""
            cmd = [
                "buf", "breaking",
                "--against", f".git#branch={against}",
                "--path", proto_path
            ]
            result = subprocess.run(cmd, capture_output=True, text=True)

            # Parse breaking changes
            breaking_changes = []
            for line in result.stdout.splitlines():
                if line.strip():
                    breaking_changes.append(self.parse_buf_output(line))

            return ToolResult(
                success=True,
                data={
                    "has_breaking": len(breaking_changes) > 0,
                    "changes": breaking_changes,
                    "raw_output": result.stdout
                }
            )

        @self.tool("check_consumer_contracts")
        async def check_contracts(service: str, version: str) -> ToolResult:
            """Check if change breaks consumer contracts"""
            # This connects to our contract testing system
            contracts = self.load_consumer_contracts(service)
            violations = []

            for contract in contracts:
                if not self.validate_contract(contract, version):
                    violations.append({
                        "consumer": contract["consumer"],
                        "expectation": contract["expectation"],
                        "impact": "Contract violation detected"
                    })

            return ToolResult(
                success=True,
                data={
                    "total_consumers": len(contracts),
                    "violations": violations,
                    "safe_to_deploy": len(violations) == 0
                }
            )

        @self.tool("generate_migration_guide")
        async def generate_migration(breaking_changes: list) -> ToolResult:
            """Generate migration guide for breaking changes"""
            guide = self.create_migration_steps(breaking_changes)
            return ToolResult(
                success=True,
                data={"migration_guide": guide}
            )

# How LangChain uses MCP tools:
from langchain.agents import create_mcp_agent
from langchain_mcp import MCPToolkit

# Initialize MCP toolkit
mcp_toolkit = MCPToolkit(
    server_url="http://localhost:8080",  # MCP server endpoint
    available_tools=["buf_lint", "buf_breaking", "check_consumer_contracts"]
)

# Create agent with MCP tools
agent = create_mcp_agent(
    llm=llm,
    tools=mcp_toolkit.get_tools(),
    system_prompt="""
    You are an API compatibility expert. Use the available MCP tools to:
    1. Run buf lint and breaking checks
    2. Verify consumer contracts
    3. Generate migration guides when needed

    Always check consumer contracts after detecting breaking changes.
    """
)

# Usage in the main workflow
class CompatibilityChecker:
    def __init__(self):
        self.mcp_agent = agent

    def comprehensive_check(self, proto_path: str):
        """Run comprehensive compatibility check using MCP tools"""

        # Let the agent orchestrate the tools
        result = self.mcp_agent.invoke({
            "input": f"""
            Analyze {proto_path} for breaking changes:
            1. Run buf lint first
            2. Check breaking changes against main branch
            3. If breaking changes found, check consumer contracts
            4. Generate migration guide if needed
            """
        })

        return result

3. How RAG + MCP Work Together

Here’s the magic – combining RAG’s historical knowledge with MCP’s tool access:

from datetime import datetime

class IntelligentAPIGuardian:
    """Combines RAG and MCP for comprehensive analysis"""

    def analyze_change(self, proto_diff: str):
        # Step 1: Use MCP to run all tools
        mcp_results = self.mcp_agent.invoke({
            "input": f"Analyze this diff: {proto_diff}"
        })

        # Step 2: Use RAG to find similar past incidents
        historical_data = self.knowledge_base.find_similar_changes(proto_diff)

        # Step 3: Combine insights (assumes the LLM is configured for structured/JSON
        # output, so the result below can be indexed like a dict)
        combined_analysis = self.llm.invoke(f"""
        Current change analysis from tools:
        {mcp_results}

        Historical patterns from similar changes:
        {historical_data}

        Synthesize a comprehensive risk assessment considering both
        current tool results and historical precedents.

        If historical data shows issues that tools didn't catch,
        flag them as "HISTORICAL_RISK" items.
        """)

        # Step 4: Store this analysis for future RAG queries
        if combined_analysis['has_breaking_changes']:
            self.knowledge_base.index_breaking_change({
                'diff': proto_diff,
                'type': combined_analysis['breaking_type'],
                'impact': combined_analysis['impact'],
                'resolution': combined_analysis['recommendations'],
                'severity': combined_analysis['severity'],
                'date': datetime.now(),
                'caught_before_prod': True
            })

        return combined_analysis

The Power of This Combination:

  • MCP gives us real-time tool access – running buf, checking contracts, generating migrations
  • RAG gives us institutional memory – learning from every incident, getting smarter over time
  • Together they catch issues that neither could find alone

For example, RAG might recall “last time we added a required field to Task, the mobile team’s app crashed because they cache responses for 24 hours” – something no static tool would know, but crucial for preventing an outage.

Testing the System

Here’s a complete walkthrough of testing the system:

# 1. First, verify your setup
python test_simple.py

# Output should show:
# ✓ All core modules imported successfully
# ✓ Proto file found
# ✓ Proto modifier works - 12 test scenarios available
# ✓ Buf integration initialized successfully
# ✓ GCP_PROJECT configured: your-project-id
# ✓ Vertex AI connection verified

# 2. Make breaking changes to the proto file
python proto_modifier.py ../api/proto/todo/v1/todo.proto \
  --scenario remove_field

python proto_modifier.py ../api/proto/todo/v1/todo.proto \
  --scenario add_required_field

# 3. Run the compatibility checker
python api_compatibility_checker.py \
  --workspace .. \
  --against '.git#branch=main' \
  --output results/breaking_changes.json

# 4. Review the detailed report
cat results/breaking_changes.json | jq '.'
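
Because the report is plain JSON, it’s easy to gate other scripts on it. A small illustrative snippet (the can_deploy field follows the report schema shown earlier; jq is assumed to be installed):

# Block a local deploy script if the checker flags the change as unsafe
CAN_DEPLOY=$(jq -r '.can_deploy' results/breaking_changes.json)
if [ "$CAN_DEPLOY" != "true" ]; then
  echo "Breaking changes detected - deployment blocked"
  exit 1
fi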

Lessons Learned and Best Practices

  1. Combine Multiple Analysis Methods: Static analysis catches structure, AI catches semantics
  2. Use Conservative Defaults: When uncertain, flag as potentially breaking
  3. Provide Clear Explanations: Developers need to understand why something is breaking
  4. Version Your Prompts: Treat prompts as code – version and test them
  5. Monitor LLM Costs: Use caching and optimize prompt sizes (see the caching sketch after this list)
  6. Implement Gradual Rollout: Start with warnings before blocking deployments
  7. Build Team Trust Gradually: Don’t start by blocking deployments. Run in shadow mode first, report findings alongside Buf results, and let teams see the value before enforcement. Track false positives and tune your prompts based on real feedback.
  8. Document Your Prompts: Your prompt engineering is as critical as your code. Version control your prompts, document why certain instructions exist, and treat them as first-class artifacts that need testing and review.
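
On the cost point: LangChain ships an LLM cache that can short-circuit repeated analyses of an unchanged diff. A minimal sketch, assuming the langchain and langchain-community packages already used in this post:

from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache

# Identical prompts (e.g. re-running the check on an unchanged diff) are served
# from the local cache instead of calling Vertex AI again.
set_llm_cache(SQLiteCache(database_path=".api_checker_cache.db"))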

The Power of Agentic AI

What makes this approach “agentic” rather than just AI-assisted?

  1. Autonomous Decision Making: The system doesn’t just flag issues – it decides whether API changes can be deployed
  2. Multi-Step Reasoning: It performs complex analysis chains without human intervention
  3. Tool Integration: It orchestrates multiple tools (Git, Buf, LLMs) to achieve its goal
  4. Contextual Understanding: It considers historical patterns and project-specific rules
  5. Actionable Output: It provides specific remediation steps, not just warnings

Future Enhancements

The roadmap for this tool includes:

  1. Multi-Protocol Support: Extend beyond protobuf/gRPC to OpenAPI and GraphQL
  2. Behavioral Testing: Integration with contract testing frameworks
  3. Auto-Migration Generation: Create migration scripts for breaking changes
  4. Client SDK Updates: Automatically update client libraries
  5. Performance Impact Analysis: Predict performance implications of changes

Known Limitations: This system excels at catching semantic and behavioral changes, but it’s not perfect. It can’t predict how undocumented client implementations behave, can’t catch changes in external dependencies your API relies on, and can’t guarantee zero false positives. Human judgment remains essential—especially for nuanced cases where breaking changes might be intentional and necessary.

Conclusion

Throughout my decades in software development, I’ve learned that API compatibility isn’t just about wire protocols and field numbers. It’s about understanding how our users actually depend on our APIs—all the documented behaviors, the undocumented quirks, and yes, even the bugs they’ve built workarounds for. Traditional static analysis tools like Buf are essential—they catch structural breaking changes with perfect precision. But as we’ve seen with the required field example, they can’t reason about semantic changes, business context, or application-level validation rules. That’s where AI augmentation transforms the game. By combining Buf’s deterministic analysis with an LLM’s contextual understanding through LangChain and LangGraph, we’re not just catching more bugs—we’re fundamentally changing how we think about API evolution.

The complete implementation, including all the code and configurations demonstrated in this article, is available at: https://github.com/bhatti/todo-api-errors. Fork it, experiment with it, break it, improve it.

Resources and References


Postel’s Law: “Be conservative in what you send, liberal in what you accept” – but with Agentic AI, we can be intelligent about both.

Hyrum’s Law: “With a sufficient number of users, all observable behaviors will be depended upon” – which is why we need AI to catch the subtle breaking changes that static analysis misses.
