How to Build a RAG Pipeline with LangChain and MCP

Traditional RAG usually starts with a vector database.

That works when your corpus is stable.

But many agent tasks depend on information that changes: docs, changelogs, GitHub issues, package APIs, and release notes.

MCP gives RAG pipelines a cleaner way to call external retrieval tools.

LangChain can load MCP tools through langchain-mcp-adapters, so your chain or agent can retrieve live context without writing a custom integration for every source.

Architecture

The important shift is that retrieval becomes a tool call.

The model does not need a stale embedded copy of the web.

It can call a live retrieval server when the question requires it.

Install Dependencies

pip install langchain-mcp-adapters langchain langgraph

Install your model provider package as needed.

For example, if you are using OpenAI-compatible LangChain models:

pip install "langchain[openai]"

Connect to an MCP Search Server

Use MultiServerMCPClient:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

async def main():
    client = MultiServerMCPClient({
        "ninelayer": {
            "url": "https://mcp.ninelayer.in/mcp",
            "transport": "http",
            "headers": {
                "Authorization": "Bearer YOUR_NINELAYER_API_TOKEN"
            },
        }
    })

    tools = await client.get_tools()
    print([tool.name for tool in tools])

asyncio.run(main())

This should expose the Ninelayer tools to your LangChain workflow.

Retrieve Evidence

In a simple pipeline, call the search tool first, then pass the result into the model.

The exact tool invocation depends on the tool objects returned by your MCP client, but the flow should look like this:

async def retrieve_context(tools, question: str):
    search_tool = next(t for t in tools if "deep_search" in t.name)
    result = await search_tool.ainvoke({
        "query": question,
        "num_results": 5
    })
    return result

Keep the retrieval result compact.

RAG quality depends on giving the model enough evidence, not every page you can fetch.

Build the Grounded Prompt

def build_prompt(question: str, context: str) -> str:
    return f"""
Answer the question using only the provided context.

If the context is insufficient, say what is missing.
Include source URLs when available.

Question:
{question}

Context:
{context}
"""

This is the simplest grounded-generation pattern.

For production, add:

source deduplication
authority ranking
freshness checks
token budgets
refusal behavior for missing evidence
logging of retrieved sources

Add URL Expansion

Search results are often enough.

When they are not, fetch one or two selected URLs:

async def expand_url(tools, url: str):
    get_url_tool = next(t for t in tools if "get_url" in t.name)
    return await get_url_tool.ainvoke({"url": url})

Do not expand every result by default.

Search first. Read selectively.

Why MCP Helps RAG

MCP gives your RAG pipeline a standard way to connect retrieval systems.

Instead of hardcoding every API, you can expose search, docs, databases, and internal tools through a common interface.

That makes your RAG stack easier to evolve:

swap retrieval providers
add private data sources
share tools with Claude Code or Cursor
keep auth and transport separate from prompting

The Practical Takeaway

A good LangChain plus MCP RAG pipeline is not complicated:

Load MCP tools.
Search for current evidence.
Expand only the best URLs.
Pass compact context to the model.
Require source-aware answers.

For fast-moving technical questions, live retrieval through MCP can be more reliable than a stale vector index.

Sources

LangChain MCP adapters: GitHub repository
Model Context Protocol: Introduction to MCP
Ninelayer: Full LLM reference