Pragmatic Agentic AI: How I Rebuilt Years of FinTech Infrastructure with ReAct, RAG, and Free Local Models

October 21, 2025

Pragmatic Agentic AI: How I Rebuilt Years of FinTech Infrastructure with ReAct, RAG, and Free Local Models

Filed under: Agentic AI,Computing — Tags: Agentic AI, Langchain, Langgraph, RAG, ReAct — admin @ 1:00 pm

I spent over a decade in FinTech building the systems traders rely on every day like high-performance APIs streaming real-time charts, technical indicator calculators processing millions of data points per second, and comprehensive analytical platforms ingesting SEC 10-Ks and 10-Qs into distributed databases. We used to parse XBRL filings, ran news/sentiment analysis on earnings calls using early NLP models to detect market anomalies.

Over the past couple of years, I’ve been building AI agents and creating automated workflows that tackle complex problems using agentic AI. I’m also revisiting challenges I hit while building trading tools for fintech companies. For example, the AI I’m working with now reasons about which analysis to run. It grasps context, retrieves information on demand, and orchestrates complex workflows autonomously. It applies Black-Scholes when needed, switches to technical analysis when appropriate, and synthesizes insights from multiple sources—no explicit rules required.

The best part is that I’m running this entire system on my laptop using Ollama and open-source models. Zero API costs during development. When I need production scale, I can switch to cloud APIs with a few lines of code. I will walk you through this journey of rebuilding financial analysis with agentic AI – from traditional algorithms to thinking machines and from rigid pipelines to adaptive workflows.

Why This Approach Changes Everything

Traditional financial systems process data. Agentic AI systems understand objectives and figure out how to achieve them. That’s the fundamental difference that took me a while to fully grasp. And unlike my old systems that required separate codebases for each type of analysis, this one uses the same underlying patterns for everything.

The Money-Saving Secret: Local Development with Ollama

Here’s something that would have saved my startup thousands: you can build and test sophisticated AI systems entirely locally using Ollama. No API keys, no usage limits, no surprise bills.

# This runs entirely on your machine - zero external API calls
from langchain_ollama import OllamaLLM as Ollama

# Local LLM for development and testing
dev_llm = Ollama(
    model="llama3.2:latest",      # 3.2GB model that runs on most laptops
    temperature=0.7,
    base_url="http://localhost:11434"  # Your local Ollama instance
)

# When ready for production, switch to cloud providers
from langchain_openai import ChatOpenAI

prod_llm = ChatOpenAI(
    model="gpt-4",
    temperature=0.7
)

# The beautiful part? Same interface, same code
def analyze_stock(llm, ticker):
    # This function works with both local and cloud LLMs
    prompt = f"Analyze {ticker} stock fundamentals"
    return llm.invoke(prompt)

During development, I run hundreds of experiments daily without spending a cent. Once the prompts and workflows are refined, switching to cloud APIs is literally changing one line of code.

Understanding ReAct: How AI Learns to Think Step-by-Step

ReAct (Reasoning and Acting) was the first pattern that made me realize we weren’t just building chatbots anymore. Let me show you exactly how it works with real code from my system.

The Human Thought Process We’re Mimicking

When I manually analyzed stocks, my mental process looked something like this:

“I need to check if Apple is overvalued”
“Let me get the current P/E ratio”
“Hmm, 28.5 seems high, but what’s the industry average?”
“Tech sector average is 25, so Apple is slightly premium”
“But wait, what’s their growth rate?”
“15% annual growth… that PEG ratio of 1.9 suggests fair value”
“Let me check recent news for any red flags…”

ReAct agents follow this exact pattern. Here’s the actual implementation:

class ReActAgent:
    """ReAct Agent that demonstrates reasoning traces"""
    
    # This is the actual prompt from the project
    REACT_PROMPT = """You are a financial analysis agent that uses the ReAct framework to solve problems.

You have access to the following tools:

{tools_description}

Use the following format EXACTLY:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, must be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to ALWAYS follow the format exactly.

Question: {question}
Thought: {scratchpad}"""

    def _parse_response(self, response: str) -> Tuple[str, str, str, bool]:
        """Parse LLM response to extract thought, action, and input"""
        response = response.strip()
        
        # Check for final answer
        if "Final Answer:" in response:
            parts = response.split("Final Answer:")
            thought = parts[0].strip()
            final_answer = parts[1].strip()
            return thought, "final_answer", final_answer, True
        
        # Parse using regex from actual implementation
        thought_match = re.search(r"Thought:\s*(.+?)(?=Action:|$)", response, re.DOTALL)
        action_match = re.search(r"Action:\s*(.+?)(?=Action Input:|$)", response, re.DOTALL)
        input_match = re.search(r"Action Input:\s*(.+?)(?=Observation:|$)", response, re.DOTALL)
        
        thought = thought_match.group(1).strip() if thought_match else "Thinking..."
        action = action_match.group(1).strip() if action_match else "unknown"
        action_input = input_match.group(1).strip() if input_match else ""
        
        return thought, action, action_input, False

I can easily trace through reasoning to debug how AI reached its conclusion.

RAG: Solving the Hallucination Problem Once and For All

Early in my experiments, I had to deal with a bit of hallucinations when querying financial data with AI so I applied RAG (Retrieval-Augmented Generation) to give AI access to a searchable library of documents.

How RAG Actually Works

You can think of RAG like having a research assistant who, instead of relying on memory, always checks the source documents before answering:

class RAGEngine:
    """
    This engine solved my hallucination problems by grounding 
    all responses in actual documents. It's like giving the AI
    access to your company's document database.
    """
    
    def __init__(self):
        # Initialize embeddings - this converts text to searchable vectors
        # Using Ollama's local embedding model (free!)
        self.embeddings = OllamaEmbeddings(
            model="nomic-embed-text:latest"  # 274MB model, runs fast
        )
        
        # Text splitter - crucial for handling large documents
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=512,      # Small enough for context window
            chunk_overlap=50,    # Overlap prevents losing context at boundaries
            separators=["\n\n", "\n", ". ", " "]  # Smart splitting
        )
        
        # Vector store - where we keep our searchable documents
        self.vector_store = FAISS.from_texts(["init"], self.embeddings)
    
    def load_financial_documents(self, ticker: str):
        """
        In production, this would load real 10-Ks, 10-Qs, earnings calls.
        For now, I'm using sample documents to demonstrate the concept.
        """
        
        # Imagine these are real SEC filings
        documents = [
            {
                "content": f"""
                {ticker} Q3 2024 Earnings Report
                
                Revenue: $94.9 billion, up 6% year over year
                iPhone revenue: $46.2 billion
                Services revenue: $23.3 billion (all-time record)
                
                Gross margin: 45.2%
                Operating cash flow: $28.7 billion
                
                CEO Tim Cook: "We're incredibly pleased with our record 
                September quarter results and strong momentum heading into
                the holiday season."
                """,
                "metadata": {
                    "source": "10-Q Filing",
                    "date": "2024-10-31",
                    "document_type": "earnings_report",
                    "ticker": ticker
                }
            },
            # ... more documents
        ]
        
        # Process each document
        for doc in documents:
            # Split into chunks
            chunks = self.text_splitter.split_text(doc["content"])
            
            # Create document objects with metadata
            for i, chunk in enumerate(chunks):
                metadata = doc["metadata"].copy()
                metadata["chunk_id"] = i
                metadata["total_chunks"] = len(chunks)
                
                # Add to vector store
                self.vector_store.add_texts(
                    texts=[chunk],
                    metadatas=[metadata]
                )
        
        print(f"? Loaded {len(documents)} documents for {ticker}")
    
    def answer_with_sources(self, question: str) -> Dict[str, Any]:
        """
        This is where RAG shines - every answer comes with sources
        """
        # Find relevant document chunks
        relevant_docs = self.vector_store.similarity_search_with_score(
            question, 
            k=5  # Top 5 most relevant chunks
        )
        
        # Build context from retrieved documents
        context_parts = []
        sources = []
        
        for doc, score in relevant_docs:
            # Only use highly relevant documents (score < 0.5)
            if score < 0.5:
                context_parts.append(doc.page_content)
                sources.append({
                    "content": doc.page_content[:100] + "...",
                    "source": doc.metadata.get("source"),
                    "date": doc.metadata.get("date"),
                    "relevance_score": float(score)
                })
        
        context = "\n\n---\n\n".join(context_parts)
        
        # Generate answer grounded in retrieved context
        prompt = f"""Based on the following verified documents, answer the question.
        If the answer is not in the documents, say "I don't have that information."
        
        Documents:
        {context}
        
        Question: {question}
        
        Answer (cite sources):"""
        
        response = self.llm.invoke(prompt)
        
        return {
            "answer": response,
            "sources": sources,
            "confidence": len(sources) / 5  # Simple confidence metric
        }

MCP-Style Tools: Extending AI Capabilities Beyond Text

Model Context Protocol (MCP) helped me to build a flexible tool system. Instead of hardcoding every capability, we give the AI tools it can discover and use:

class BaseTool(ABC):
    """
    Every tool self-describes its capabilities.
    This is like giving the AI an instruction manual for each tool.
    """
    
    @abstractmethod
    def get_schema(self) -> ToolSchema:
        """Define what this tool does and how to use it"""
        pass
    
    @abstractmethod
    def execute(self, **kwargs) -> Any:
        """Actually run the tool"""
        pass

class StockDataTool(BaseTool):
    """
    Real example: This tool replaced my entire market data microservice
    """
    
    def get_schema(self) -> ToolSchema:
        return ToolSchema(
            name="stock_data",
            description="Fetch real-time stock market data including price, volume, and fundamentals",
            category=ToolCategory.DATA_RETRIEVAL,
            parameters=[
                ToolParameter(
                    name="ticker",
                    type="string", 
                    description="Stock symbol like AAPL or GOOGL",
                    required=True
                ),
                ToolParameter(
                    name="metrics",
                    type="array",
                    description="Specific metrics to retrieve",
                    required=False,
                    default=["price", "volume", "pe_ratio"],
                    enum=["price", "volume", "pe_ratio", "market_cap", 
                          "dividend_yield", "beta", "rsi", "moving_avg_50"]
                )
            ],
            returns="Dictionary containing requested stock metrics",
            examples=[
                {"ticker": "AAPL", "metrics": ["price", "pe_ratio"]},
                {"ticker": "TSLA", "metrics": ["price", "volume", "rsi"]}
            ]
        )
    
    def execute(self, **kwargs) -> Dict[str, Any]:
        """
        This connects to real market data APIs.
        In my old system, this was a 500-line service.
        """
        ticker = kwargs["ticker"].upper()
        metrics = kwargs.get("metrics", ["price", "volume"])
        
        # Using yfinance for real market data
        import yfinance as yf
        stock = yf.Ticker(ticker)
        info = stock.info
        
        result = {"ticker": ticker, "timestamp": datetime.now().isoformat()}
        
        # Fetch requested metrics
        metric_mapping = {
            "price": lambda: info.get("currentPrice", stock.history(period="1d")['Close'].iloc[-1]),
            "volume": lambda: info.get("volume", 0),
            "pe_ratio": lambda: info.get("trailingPE", 0),
            "market_cap": lambda: info.get("marketCap", 0),
            "dividend_yield": lambda: info.get("dividendYield", 0) * 100,
            "beta": lambda: info.get("beta", 1.0),
            "rsi": lambda: self._calculate_rsi(stock),
            "moving_avg_50": lambda: stock.history(period="50d")['Close'].mean()
        }
        
        for metric in metrics:
            if metric in metric_mapping:
                try:
                    result[metric] = metric_mapping[metric]()
                except Exception as e:
                    result[metric] = f"Error: {str(e)}"
        
        return result

class ToolParameter(BaseModel):
    """Actual parameter definition from project"""
    name: str
    type: str  # "string", "number", "boolean", "object", "array"
    description: str
    required: bool = True
    default: Any = None
    enum: Optional[List[Any]] = None

class CalculatorTool(BaseTool):
    """Actual calculator implementation from project"""
    
    def execute(self, **kwargs) -> float:
        """Safely evaluate mathematical expression"""
        self.validate_input(**kwargs)
        
        expression = kwargs["expression"]
        precision = kwargs.get("precision", 2)
        
        try:
            # Security: Remove dangerous operations
            safe_expr = expression.replace("__", "").replace("import", "")
            
            # Define allowed functions (from actual code)
            safe_dict = {
                "abs": abs, "round": round, "min": min, "max": max,
                "sum": sum, "pow": pow, "len": len
            }
            
            # Add math functions
            import math
            for name in ["sqrt", "log", "log10", "sin", "cos", "tan", "pi", "e"]:
                if hasattr(math, name):
                    safe_dict[name] = getattr(math, name)
            
            result = eval(safe_expr, {"__builtins__": {}}, safe_dict)
            
            return round(result, precision)
            
        except Exception as e:
            raise ValueError(f"Calculation error: {e}")

Orchestrating Everything with LangGraph

This is where all the pieces come together. LangGraph allows coordinating multiple agents and tools in sophisticated workflows:

class FinancialAnalysisWorkflow:
    """
    This workflow replaces what used to be multiple microservices,
    message queues, and orchestration layers. It's beautiful.
    """
    
    def _build_graph(self) -> StateGraph:
        """
        Define how different analysis components work together
        """
        workflow = StateGraph(AgentState)
        
        # Add all our analysis nodes
        workflow.add_node("collect_data", self.collect_market_data)
        workflow.add_node("technical_analysis", self.run_technical_analysis)
        workflow.add_node("fundamental_analysis", self.run_fundamental_analysis)
        workflow.add_node("sentiment_analysis", self.analyze_sentiment)
        workflow.add_node("options_analysis", self.analyze_options)
        workflow.add_node("portfolio_optimization", self.optimize_portfolio)
        workflow.add_node("rag_research", self.search_documents)
        workflow.add_node("react_reasoning", self.reason_about_data)
        workflow.add_node("generate_report", self.create_final_report)
        
        # Entry point
        workflow.set_entry_point("collect_data")
        
        # Define the flow - some parallel, some sequential
        workflow.add_edge("collect_data", "technical_analysis")
        workflow.add_edge("collect_data", "fundamental_analysis")
        workflow.add_edge("collect_data", "sentiment_analysis")
        
        # These can run in parallel
        workflow.add_conditional_edges(
            "collect_data",
            self.should_run_options,  # Only if options are relevant
            {
                "yes": "options_analysis",
                "no": "rag_research"
            }
        )
        
        # Everything feeds into reasoning
        workflow.add_edge(["technical_analysis", "fundamental_analysis", 
                          "sentiment_analysis", "options_analysis"], 
                          "react_reasoning")
        
        # Reasoning leads to report
        workflow.add_edge("react_reasoning", "generate_report")
        
        # End
        workflow.add_edge("generate_report", END)
        
        return workflow
    
    def analyze_stock_comprehensive(self, ticker: str, investment_amount: float = 10000):
        """
        This single function replaces what used to be an entire team's
        worth of manual analysis.
        """
        initial_state = {
            "ticker": ticker,
            "investment_amount": investment_amount,
            "timestamp": datetime.now(),
            "messages": [],
            "market_data": {},
            "technical_indicators": {},
            "fundamental_metrics": {},
            "sentiment_scores": {},
            "options_data": {},
            "portfolio_recommendation": {},
            "documents_retrieved": [],
            "reasoning_trace": [],
            "final_report": "",
            "errors": []
        }
        
        # Run the workflow
        try:
            result = self.app.invoke(initial_state)
            return self._format_comprehensive_report(result)
        except Exception as e:
            # Graceful degradation
            return self._run_basic_analysis(ticker, investment_amount)

class WorkflowNodes:
    """Collection of workflow nodes from actual project"""
    
    def collect_market_data(self, state: AgentState) -> AgentState:
        """Node: Collect market data using tools"""
        print("? Collecting market data...")
        
        ticker = state["ticker"]
        
        try:
            # Use actual stock data tool from project
            tool = self.tool_registry.get_tool("stock_data")
            market_data = tool.execute(
                ticker=ticker,
                metrics=["price", "volume", "market_cap", "pe_ratio", "52_week_high", "52_week_low"]
            )
            
            state["market_data"] = market_data
            
            # Add message to history
            state["messages"].append(
                AIMessage(content=f"Collected market data for {ticker}")
            )
            
        except Exception as e:
            state["error"] = f"Failed to collect market data: {str(e)}"
            state["market_data"] = {}
        
        return state

Here is a screenshot from the example showing workflow analysis:

Production Considerations: From Tutorial to Trading Floor

This tutorial demonstrates core concepts, but let me be clear – production deployment in financial services requires significantly more rigor. Having deployed similar systems in regulated environments, here’s what you’ll need to consider:

The Reality of Production Deployment

Production financial systems require months of parallel running and validation. In my experience, you’ll need:

class ProductionValidation:
    """
    Always run new systems parallel to existing ones
    """
    def validate_against_legacy(self, ticker: str):
        # Run both systems
        legacy_result = self.legacy_system.analyze(ticker)
        agent_result = self.agent_system.analyze(ticker)
        
        # Compare results
        discrepancies = self.compare_results(legacy_result, agent_result)
        
        # Log everything for audit
        self.audit_log.record({
            "ticker": ticker,
            "timestamp": datetime.now(),
            "legacy": legacy_result,
            "agent": agent_result,
            "discrepancies": discrepancies,
            "approved": len(discrepancies) == 0
        })
        
        # Require human review for discrepancies
        if discrepancies:
            return self.escalate_to_human(discrepancies)
        
        return agent_result

Integrating Traditional Financial Algorithms

While this tutorial uses general-purpose LLMs, production systems should combine AI with proven financial algorithms:

class HybridAnalyzer:
    """
    Combine traditional algorithms with AI reasoning
    """
    def analyze_options(self, ticker: str, strike: float, expiry: str):
        # Use traditional Black-Scholes for pricing
        traditional_price = self.black_scholes_pricer.calculate(
            ticker, strike, expiry
        )
        
        # Use AI for market context
        ai_context = self.agent.analyze_market_conditions(ticker)
        
        # Combine both
        if ai_context["volatility_regime"] == "high":
            # AI detected unusual conditions, adjust model
            adjusted_price = traditional_price * (1 + ai_context["vol_adjustment"])
            confidence = "low - unusual market conditions"
        else:
            adjusted_price = traditional_price
            confidence = "high - normal market conditions"
        
        return {
            "model_price": traditional_price,
            "adjusted_price": adjusted_price,
            "confidence": confidence,
            "reasoning": ai_context["reasoning"]
        }

Fitness Functions for Financial Accuracy

Financial data cannot tolerate hallucinations. Implement strict validation:

class FinancialFitnessValidator:
    """
    Reject hallucinated or impossible financial data
    """
    def validate_metrics(self, ticker: str, metrics: Dict):
        validations = {
            "pe_ratio": lambda x: -100 < x < 1000,
            "price": lambda x: x > 0,
            "market_cap": lambda x: x > 0,
            "dividend_yield": lambda x: 0 <= x <= 20,
            "revenue_growth": lambda x: -100 < x < 200
        }
        
        for metric, validator in validations.items():
            if metric in metrics:
                value = metrics[metric]
                if not validator(value):
                    raise ValueError(f"Invalid {metric}: {value} for {ticker}")
        
        # Cross-validation
        if "pe_ratio" in metrics and "earnings" in metrics:
            calculated_pe = metrics["price"] / metrics["earnings"]
            if abs(calculated_pe - metrics["pe_ratio"]) > 1:
                raise ValueError("P/E ratio doesn't match price/earnings")
        
        return True

Leverage Your Existing Data

If you have years of financial data in databases, you don’t need to start over. Use RAG to make it searchable:

# Convert your SQL database to vector-searchable documents
existing_data = sql_query("SELECT * FROM financial_reports")
rag_engine.add_documents([
    {"content": row.text, "metadata": {"date": row.date, "ticker": row.ticker}}
    for row in existing_data
])

Human-in-the-Loop

No matter how sophisticated your agents become, financial decisions affecting real money require human oversight. Build it in from day one:

Confidence thresholds that trigger human review
Clear audit trails showing agent reasoning
Easy override mechanisms
Gradual automation based on proven accuracy

class HumanInTheLoopWorkflow:
    """
    Ensure human review for critical decisions
    """
    def execute_trade_recommendation(self, recommendation: Dict):
        # Auto-approve only for low-risk, small trades
        if (recommendation["risk_score"] < 0.3 and 
            recommendation["amount"] < 10000):
            return self.execute(recommendation)
        
        # Require human approval for everything else
        approval_request = {
            "recommendation": recommendation,
            "agent_reasoning": recommendation["reasoning_trace"],
            "confidence": recommendation["confidence_score"],
            "risk_assessment": self.assess_risks(recommendation)
        }
        
        # Send to human reviewer
        human_decision = self.request_human_review(approval_request)
        
        if human_decision["approved"]:
            return self.execute(recommendation)
        else:
            self.log_rejection(human_decision["reason"])

Cost Management and Budget Controls

During development, Ollama gives you free local inference. In production, costs add up quickly so you need to build proper controls for calculating cost of analysis:

GPT-4: ~$30 per million tokens
Claude-3: ~$20 per million tokens
Local Llama: Free but needs GPU infrastructure

class CostController:
    """
    Prevent runway costs in production
    """
    def __init__(self, daily_budget: float = 100.0):
        self.daily_budget = daily_budget
        self.costs_today = 0.0
        self.cost_per_token = {
            "gpt-4": 0.00003,  # $0.03 per 1K tokens
            "claude-3": 0.00002,
            "llama-local": 0.0  # Free but has compute cost
        }
    
    def check_budget(self, estimated_tokens: int, model: str):
        estimated_cost = estimated_tokens * self.cost_per_token.get(model, 0)
        
        if self.costs_today + estimated_cost > self.daily_budget:
            # Switch to local model or cache
            return "use_local_model"
        
        return "proceed"
    
    def track_usage(self, tokens_used: int, model: str):
        cost = tokens_used * self.cost_per_token.get(model, 0)
        self.costs_today += cost
        
        # Alert if approaching limit
        if self.costs_today > self.daily_budget * 0.8:
            self.send_alert(f"80% of daily budget used: ${self.costs_today:.2f}")

Caching Is Essential

Caching is crucial for both performance and cost effectiveness when running expensive analysis using LLMs.

class CachedRAGEngine(RAGEngine):
    """
    Caching reduced our costs by 70% and improved response time by 5x
    """
    
    def __init__(self):
        super().__init__()
        self.cache = Redis(host='localhost', port=6379, db=0)
        self.cache_ttl = 3600  # 1 hour for financial data
    
    def retrieve_with_cache(self, query: str, k: int = 5):
        # Create cache key from query
        cache_key = f"rag:{hashlib.md5(query.encode()).hexdigest()}"
        
        # Check cache first
        cached = self.cache.get(cache_key)
        if cached:
            return json.loads(cached)
        
        # If not cached, retrieve and cache
        docs = self.vector_store.similarity_search(query, k=k)
        
        # Cache the results
        self.cache.setex(
            cache_key, 
            self.cache_ttl,
            json.dumps([doc.to_dict() for doc in docs])
        )
        
        return docs

Fallback Strategies

A Cascading Fallback can help execute a task using a sequence of operations, ordered from the most preferred (highest quality/cost) to the least preferred (lowest quality/safest default).

class ResilientAgent:
    """
    Production agents need multiple fallback options
    """
    
    def analyze_with_fallbacks(self, ticker: str):
        strategies = [
            ("primary", self.run_full_analysis),
            ("fallback_1", self.run_simplified_analysis),
            ("fallback_2", self.run_basic_analysis),
            ("emergency", self.return_cached_or_default)
        ]
        
        for strategy_name, strategy_func in strategies:
            try:
                result = strategy_func(ticker)
                result["strategy_used"] = strategy_name
                return result
            except Exception as e:
                logger.warning(f"Strategy {strategy_name} failed: {e}")
                continue
        
        return {"error": "All strategies failed", "ticker": ticker}

Observability and Monitoring

Track token usage, latency, accuracy, and costs immediately. What you don’t measure, you can’t improve.

class ObservableWorkflow:
    """
    You need to know what your AI is doing in production
    """
    
    def __init__(self):
        self.metrics = PrometheusMetrics()
        self.tracer = JaegerTracer()
    
    def execute_with_observability(self, state: AgentState):
        with self.tracer.start_span("workflow_execution") as span:
            span.set_tag("ticker", state["ticker"])
            
            # Track token usage
            tokens_start = self.llm.get_num_tokens(state)
            
            # Execute workflow
            result = self.workflow.invoke(state)
            
            # Record metrics
            tokens_used = self.llm.get_num_tokens(result) - tokens_start
            self.metrics.record_tokens(tokens_used)
            self.metrics.record_latency(span.duration)
            
            # Log for debugging
            logger.info(f"Workflow completed", extra={
                "ticker": state["ticker"],
                "tokens": tokens_used,
                "duration": span.duration,
                "strategy": result.get("strategy_used", "primary")
            })
            
            return result

Closing Thoughts

This tutorial demonstrates how agentic AI transforms financial analysis from rigid pipelines to adaptive, thinking systems. The combination of ReAct reasoning, RAG grounding, tool use, and workflow orchestration creates capabilities that surpass traditional approaches in flexibility and ease of development.

Start Simple, Build Incrementally:

Week 1: Basic ReAct agent to understand reasoning loops
Week 2: Add tools for external capabilities
Week 3: Implement RAG to ground responses in real data
Week 4: Orchestrate with workflows
Develop everything locally with Ollama first – it’s free and private

The point of agentic AI is automation. Here’s the pragmatic approach:

Automate in Tiers:

Tier 1 (Fully Automated): Data collection, technical calculations, report generation
Tier 2 (Auto + Audit): Sentiment analysis, risk scoring, anomaly detection
Tier 3 (Human Required): Large trades, strategy changes, regulatory decisions

Clear Escalation Rules:

ESCALATE_IF = {
    "confidence_below": 0.8,
    "amount_above": 100000,
    "regulatory_flag": True,
    "anomaly_detected": True
}

Reinforcement Learning:

Instead of permanent human-in-the-loop, use RL to train agents that learn from feedback:

class ReinforcementLearningLoop:
    """
    Gradually reduce human involvement through learning
    """

    def ai_based_reinforcement(self, decision, outcome):
        """AI learns from market outcomes directly"""
        # Did the prediction match reality?
        reward = self.calculate_reward(decision, outcome)

        if decision["action"] == "buy" and outcome["price_change"] > 0.02:
            reward = 1.0  # Good decision
        elif decision["action"] == "hold" and abs(outcome["price_change"]) < 0.01:
            reward = 0.5  # Correct to avoid volatility
        else:
            reward = -0.5  # Poor decision

        # Update agent weights/prompts based on reward
        self.agent.update_policy(decision["context"], reward)

    def human_feedback_learning(self, decision, human_override=None):
        """Learn from human corrections when they occur"""
        if human_override:
            # Human disagreed - strong learning signal
            self.agent.record_correction(
                agent_decision=decision,
                human_decision=human_override,
                weight=10.0  # Human feedback weighted heavily
            )
        else:
            # Human agreed (implicitly by not overriding)
            self.agent.reinforce_decision(decision, weight=1.0)

    def adaptive_automation_threshold(self):
        """Dynamically adjust when human review is needed"""
        recent_accuracy = self.get_recent_accuracy(days=30)

        if recent_accuracy > 0.95:
            self.confidence_threshold *= 0.9  # Require less human review
        elif recent_accuracy < 0.85:
            self.confidence_threshold *= 1.1  # Require more human review

        return self.confidence_threshold

This approach reduces human involvement over time: use that feedback to train, gradually automate decisions where the agent consistently agrees with humans, and only escalate novel situations or low-confidence decisions.

Complete code at github.com/bhatti/agentic-ai-tutorial. Start local, validate thoroughly, scale confidently.

Shahzad Bhatti Welcome to my ramblings and rants!

October 21, 2025