Wrapping MCPs for Token Efficiency

Learn how to wrap MCP servers into subagents, drastically cutting token counts for tool descriptions and task output to enable dozens of active MCPs.

PydanticAI Redis Haiku MCP LLM

Overview

A working production system that uses caching to wrap MCP servers into subagents. It cuts tool-description tokens by 95% or more, and cuts the task-output tokens that stay in the context window by a further 95% or more.
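
The wrapping idea can be illustrated with a minimal Python sketch. This is not the project's actual code: the class `WrappedServer`, the stub `fake_subagent`, the tool catalogue, and the plain dict standing in for the cache are all hypothetical, and character counts are used as a rough proxy for tokens. The parent agent sees one short tool entry instead of every tool schema, and the full task output is parked in the cache while only a short summary and a lookup key return to the parent context.

```python
import hashlib
import json
from typing import Callable

# Hypothetical tool catalogue, shaped like the entries an MCP server advertises.
FULL_TOOLS = [
    {
        "name": f"tool_{i}",
        "description": "A long, detailed description of what this tool does. " * 5,
        "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
    }
    for i in range(20)
]


class WrappedServer:
    """Presents a whole MCP server to the parent agent as one compact tool."""

    def __init__(self, name: str, summary: str, tools: list[dict],
                 run_subagent: Callable[[str, list[dict]], str],
                 store: dict[str, str]):
        self.name = name
        self.summary = summary
        self.tools = tools              # full schemas, seen only by the subagent
        self.run_subagent = run_subagent
        self.store = store              # stand-in for an external cache

    def parent_tool(self) -> dict:
        # The parent context carries one short entry instead of every schema.
        return {"name": f"ask_{self.name}", "description": self.summary}

    def call(self, task: str) -> dict:
        full_output = self.run_subagent(task, self.tools)
        key = "result:" + hashlib.sha256(full_output.encode()).hexdigest()[:12]
        self.store[key] = full_output   # full output stays outside the context
        # Truncation is a crude placeholder for a real summarization step.
        return {"summary": full_output[:80], "result_key": key}


def fake_subagent(task: str, tools: list[dict]) -> str:
    # Placeholder for a small model that picks tools and runs them.
    return f"Handled {task!r} using {len(tools)} tools. " + "raw tool output " * 200


store: dict[str, str] = {}
server = WrappedServer("tracker", "Delegate tracker tasks to a subagent.",
                       FULL_TOOLS, fake_subagent, store)

full_size = len(json.dumps(FULL_TOOLS))
wrapped_size = len(json.dumps(server.parent_tool()))
reply = server.call("list open feature requests")
print(full_size, wrapped_size, len(json.dumps(reply)))
```

The parent can later fetch the full output by `result_key` if it turns out to be needed, so the saving does not come at the cost of losing data.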

Links

https://productboard.com
Productboard centralizes feedback and roadmapping to streamline enterprise product management.

Tech stack

PydanticAI

A Python agent framework for building production-grade, type-safe Generative AI applications with validated, structured outputs.

PydanticAI is the Python agent framework from the Pydantic team, designed to bring FastAPI's ergonomic, type-safe development experience to Generative AI. It leverages Pydantic’s core data validation to check Large Language Model (LLM) outputs against defined schemas, so malformed responses are rejected instead of being passed downstream as unpredictable free text. The framework uses 'Agents' as the primary interface, supporting model-agnostic integration (OpenAI, Anthropic, Gemini, etc.) and managing complex components like function tools and dependency injection. This structure supports reliable, maintainable, and scalable AI workflows for production environments.

https://ai.pydantic.dev/

Redis

Redis is the ultra-fast, open-source, in-memory data structure store: a powerful NoSQL key/value database.

This is your go-to for low-latency data operations. Redis operates primarily in memory, delivering sub-millisecond response times for real-time applications (think: session storage, leaderboards, and caching). It functions as more than just a key/value store; it’s a versatile data structure server supporting Strings, Hashes, Lists, Sets, Sorted Sets, and JSON. Leverage its Pub/Sub capabilities for message brokering, or rely on its optional persistence for durability. Deploy it for high-speed caching to offload your primary database, or use it as a primary database for high-throughput microservices.
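
The caching use case mentioned above is often implemented as a cache-aside lookup with an expiry. The sketch below is an assumption-laden illustration: `InMemoryCache` is a hypothetical stand-in that mimics only the `get` and `setex` calls of a redis-py client so the example runs without a server, and `cached_fetch` and `slow_query` are invented names.

```python
import time
from typing import Callable, Optional


class InMemoryCache:
    """Tiny stand-in exposing get/setex in the style of a redis-py client."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is None or entry[0] < time.monotonic():
            return None                 # missing or expired
        return entry[1]

    def setex(self, name: str, ttl_seconds: int, value: str) -> None:
        self._data[name] = (time.monotonic() + ttl_seconds, value)


def cached_fetch(cache, key: str, ttl_seconds: int, load: Callable[[], str]) -> str:
    """Cache-aside: return the cached value, or load it and cache it with a TTL."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = load()
    cache.setex(key, ttl_seconds, value)
    return value


calls = []


def slow_query() -> str:
    # Placeholder for an expensive call to the primary database.
    calls.append(1)
    return "rows from the primary database"


cache = InMemoryCache()
first = cached_fetch(cache, "report:42", 60, slow_query)
second = cached_fetch(cache, "report:42", 60, slow_query)  # served from cache
print(first == second, len(calls))
```

Against a real server, a `redis.Redis(decode_responses=True)` client could presumably be passed in place of `InMemoryCache`, since it offers `get` and `setex` with the same argument order.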

https://redis.io

Haiku

Haiku is a fast, open-source operating system, a community-driven continuation of the BeOS platform, specifically targeting efficient personal computing.

Haiku, originally OpenBeOS, is a free, open-source operating system that directly succeeds the BeOS architecture; development began in 2001. The system is built for responsiveness, featuring a fully threaded design for maximum efficiency on multi-core CPUs and a custom hybrid kernel derived from NewOS. It utilizes the Be File System (BFS), which supports indexed metadata, treating the file system like a database. The entire project (kernel, drivers, toolkit, and desktop applications) is written by a single team, ensuring a unique level of consistency and a cohesive object-oriented API for accelerated C++ development.

https://www.haiku-os.org/

MCP

MCP is the open-source standard for securely connecting AI agents (like LLMs) to external tools, data, and enterprise workflows.

The Model Context Protocol (MCP) functions as a standardized integration layer: think of it as a USB-C port for AI applications. Developed and open-sourced by Anthropic, this protocol allows large language models (LLMs) to access real-time context and execute actions via external tools like GitHub, Jira, or proprietary databases. It uses a simple JSON-RPC interface to define tools, schemas, and endpoints, which enables AI agents to perform complex, state-changing tasks (such as creating a GitHub issue or running a test script) rather than just generating text. MCP is essential for building agentic AI systems that can autonomously pursue goals and operate within defined safety and permission boundaries.
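
To make the JSON-RPC interface concrete, here is a small Python sketch that builds the two messages a client typically sends to discover and invoke tools. The method names `tools/list` and `tools/call` come from the MCP specification; the helper `jsonrpc_request`, the tool name `create_issue`, and its arguments are hypothetical and chosen only for illustration.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry a unique id per request


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request message."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name; the name and arguments here are made up.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "create_issue", "arguments": {"title": "Fix login bug"}},
)

print(list_tools)
print(call_tool)
```

The response to `tools/list` is where the per-tool names, descriptions, and input schemas arrive, which is why a server with many tools can add a large number of tokens to an agent's context.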

https://modelcontextprotocol.io/

LLM

Large Language Models (LLMs) are deep learning models, built on the Transformer architecture, that process and generate human-quality text and code at scale.

LLMs are a class of foundation models: massive, pre-trained neural networks (often with billions to trillions of parameters) that leverage the self-attention mechanism of the Transformer architecture (introduced in 2017) to predict the next token in a sequence. Trained on vast datasets (e.g., Common Crawl's 50 billion+ web pages), these models—like GPT-4, Gemini, and Claude—acquire predictive power over syntax and semantics. They function as general-purpose sequence models, enabling critical applications such as complex content generation, language translation, and automated code completion (e.g., GitHub Copilot). Their core value: generalizing across diverse tasks with minimal task-specific fine-tuning.

https://en.wikipedia.org/wiki/Large_language_model
