This is an example of an AI agent that uses the Membrane MCP server. Rather than exposing all of the MCP server's available tools to the language model, it dynamically selects and provides only a small, relevant subset based on the user's query. See how it works for more details.
Why is this useful?
LLMs struggle when overloaded with too many tools; tool-selection accuracy drops dramatically as the tool count increases
Narrowing down the tools helps you spend less, since the token count of each request to the LLM is reduced.
Most LLMs have a hard limit on the number of tools that can be provided
Prerequisites 🛠️
Membrane – the central platform for building and running your app integrations. getmembrane.com
Pinecone – a managed vector database used to store and query embeddings (e.g., for tool/data lookup). pinecone.io
Anthropic Claude – the default LLM in this project (can be easily swapped for others if needed). Learn about providers
PostgreSQL – stores all chat history and user information. You can use Supabase for a free and easy setup.
To prevent token overflow errors, large tool results are automatically truncated. You can configure the truncation threshold via the MAX_TOOL_RESULT_SIZE_KB environment variable (default: 50KB).
The default setting works with most models. Only increase it if you're using a model with a significantly larger context window and need more data per tool result.
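As a rough illustration, the truncation could work along the following lines. This is a minimal sketch, not the project's actual implementation; the helper name and exact cut-off behavior are assumptions.

```ts
// Minimal sketch of tool-result truncation (hypothetical helper; names are assumptions).
const MAX_TOOL_RESULT_SIZE_KB = Number(process.env.MAX_TOOL_RESULT_SIZE_KB ?? 50);

function truncateToolResult(result: string): string {
  const maxBytes = MAX_TOOL_RESULT_SIZE_KB * 1024;
  if (Buffer.byteLength(result, "utf8") <= maxBytes) return result;

  // Keep the first maxBytes bytes and flag that the output was cut off.
  const truncated = Buffer.from(result, "utf8").subarray(0, maxBytes).toString("utf8");
  return `${truncated}\n...[truncated: tool result exceeded ${MAX_TOOL_RESULT_SIZE_KB}KB]`;
}
```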
How it works
When an MCP server provides a large number of tools to an LLM, the following can happen:
The LLM can freeze up completely or hallucinate which tools to call
The LLM consumes a large number of tokens per message, since the full tool list is sent with every request
To solve this problem, this example exposes only a small number of tools to the LLM, selected based on the user's query.
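For instance, once a relevant subset has been selected, only those tool definitions are sent to the model. Here is a hedged sketch using the Anthropic SDK; the stub helper, tool definition, and model name are illustrative assumptions, not the project's actual code:

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical stub: in this example, the relevant tools actually come from a
// vector search over the indexed tool metadata (see the Summary below).
async function selectRelevantTools(query: string): Promise<Anthropic.Tool[]> {
  return [
    {
      name: "create_issue",
      description: "Create an issue in the connected tracker",
      input_schema: { type: "object", properties: { title: { type: "string" } } },
    },
  ];
}

const query = "Create a new issue in my tracker";
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514", // illustrative model ID
  max_tokens: 1024,
  tools: await selectRelevantTools(query), // only the selected subset, not the full MCP tool list
  messages: [{ role: "user", content: query }],
});
```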
Here's a diagram that shows how it works:
Summary
Pre-index the metadata of all available actions in your workspace.
When a user starts a chat to perform a task, prompt the LLM to search the MCP tools index for the most relevant tools based on the user's query.
If no relevant tool is found in the MCP index, the LLM will fall back to searching the full index of all available workspace actions.
The LLM is then provided with the most relevant tools to call, based on the search results.
When a new app is connected, the index is rebuilt with the MCP server's available actions. A rough sketch of the indexing and search steps follows below.
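The sketch below illustrates steps 1 and 2 with the Pinecone SDK. The index name, metadata shape, and the `embed` helper are assumptions for illustration only; the project's real code may differ.

```ts
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone(); // reads PINECONE_API_KEY from the environment
const index = pc.index("mcp-tools"); // hypothetical index name

// Hypothetical embedding helper; in practice this would call an embedding model
// and return a vector matching the index's dimension.
async function embed(text: string): Promise<number[]> {
  throw new Error("replace with a real embedding call");
}

// Step 1: pre-index the metadata of every available action.
async function indexActions(actions: { name: string; description: string }[]) {
  const vectors = await Promise.all(
    actions.map(async (action) => ({
      id: action.name,
      values: await embed(`${action.name}: ${action.description}`),
      metadata: { name: action.name, description: action.description },
    }))
  );
  await index.upsert(vectors);
}

// Step 2: at chat time, search the index for the tools most relevant to the query.
async function searchTools(query: string, topK = 5) {
  const results = await index.query({
    vector: await embed(query),
    topK,
    includeMetadata: true,
  });
  return results.matches.map((match) => match.metadata);
}
```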