Metadata-Version: 2.4
Name: langchain-onelake
Version: 0.2.0
Summary: LangChain toolkit for Microsoft Fabric OneLake — workspace, file, table, directory, and item operations powered by the Fabric MCP Server.
Project-URL: Homepage, https://github.com/kavmedcharla_microsoft/langchain-onelake
Project-URL: Repository, https://github.com/kavmedcharla_microsoft/langchain-onelake
Project-URL: Documentation, https://github.com/kavmedcharla_microsoft/langchain-onelake#readme
Project-URL: Issues, https://github.com/kavmedcharla_microsoft/langchain-onelake/issues
License-Expression: MIT
License-File: LICENSE
Keywords: data-lakehouse,langchain,mcp,microsoft-fabric,onelake,toolkit
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langchain-mcp-adapters>=0.2.1
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: examples
Requires-Dist: langchain-anthropic>=0.3.0; extra == 'examples'
Requires-Dist: langchain-openai>=0.3.0; extra == 'examples'
Requires-Dist: langchain>=0.3.0; extra == 'examples'
Requires-Dist: langgraph>=0.2.0; extra == 'examples'
Requires-Dist: python-dotenv>=1.0.0; extra == 'examples'
Description-Content-Type: text/markdown

# langchain-onelake

**LangChain toolkit for Microsoft Fabric OneLake** — powered by the [Fabric MCP Server](https://github.com/microsoft/mcp).

Connect your LangChain / LangGraph agents to OneLake with zero boilerplate. This toolkit dynamically loads all OneLake tools from the Fabric MCP Server (`fabmcp`) and presents them as LangChain-compatible tools.

## Features

- **Context-aware tools** — set the workspace and item once; subsequent calls inject that context automatically
- **Raw MCP tools** — full pass-through for power users who want direct control
- **OneLake tools** across 8 categories — workspace, item, file, directory, upload, download, blob, and table
- **Local & remote** — connect to a local `fabmcp` binary (stdio) or a remote server (HTTP/SSE)
- **Auto-discovery** — tools are loaded from the Fabric MCP Server when the toolkit connects, so they stay in sync with it
- **Category filtering** — load all tools or just the ones you need
- **Async context manager** — clean startup and shutdown of the MCP server
- **Flexible auth** — uses your Azure CLI / managed identity credentials (local), or bearer tokens (remote)
- **Model-agnostic** — works with OpenAI, Azure AI Foundry (OpenAI, Llama, Mistral, Phi, etc.), Anthropic, Google, Ollama, or any LangChain-supported model

## Installation

```bash
pip install langchain-onelake
```

### Prerequisites

1. **Fabric MCP Server** (`fabmcp`) — [build from source](https://github.com/microsoft/mcp) or install the binary
2. **Azure authentication** — run `az login` or configure managed identity
3. **Python 3.10+**

## Quick Start — Context-Aware Tools (Recommended)

Context-aware tools let the agent set a workspace and item once — all subsequent
operations automatically use that context:

```python
import asyncio
from langchain_onelake import OneLakeToolkit

async def main():
    async with OneLakeToolkit.create("path/to/fabmcp") as toolkit:
        # Context-aware tools — agent sets context once, then queries flow naturally
        tools = toolkit.get_context_tools()
        print(f"Loaded {len(tools)} context-aware tools:")
        for t in tools:
            print(f"  - {t.name}: {t.description[:80]}")

asyncio.run(main())
```

### Context-Aware Tool List

| Tool | Description |
|------|-------------|
| `set_onelake_context` | Set the active workspace and item (by name, ID, or URI) |
| `get_onelake_context` | Check the currently set workspace and item context |
| `list_onelake_workspaces` | List all workspaces available to the current user |
| `list_onelake_items` | List items (lakehouses, warehouses, …) in the active workspace |
| `list_onelake_items_data` | List items via DFS data API |
| `create_onelake_item` | Create a new Fabric item (Lakehouse, Notebook, etc.) |
| `list_onelake_tables` | List tables (Delta/Iceberg) in the active item |
| `list_onelake_folders` | List folders/directories at a given path |
| `get_onelake_table_metadata` | Get file-level metadata for a table |
| `read_onelake_file` | Read file content from the active item |
| `write_onelake_file` | Write content to a file in OneLake |
| `delete_onelake_file` | Delete a file from OneLake |
| `upload_onelake_file` | Upload content to OneLake storage |
| `download_onelake_file` | Download a file with metadata |
| `create_onelake_directory` | Create a directory (supports nested structures) |
| `delete_onelake_directory` | Delete a directory |
| `list_onelake_blobs` | List files and directories as blobs |
| `delete_onelake_blob` | Delete a blob |
| `list_onelake_tables_api` | List tables via Iceberg REST catalog |
| `get_onelake_table_api` | Get detailed table metadata (columns, types, stats) |
| `get_onelake_table_config` | Get table endpoint configuration |
| `list_onelake_namespaces` | List table namespaces/schemas |
| `get_onelake_namespace` | Get namespace metadata |

### How Context Works

```
User: "What tables are in the Sales lakehouse in the Marketing workspace?"
  ↓
Agent calls: set_onelake_context(workspace="Marketing", item="Sales")
  ↓  Context stored: workspace=Marketing, item=Sales (Lakehouse)
Agent calls: list_onelake_tables()          ← no IDs needed!
  ↓  Context auto-injected
Agent returns: table list
```
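
The flow above can be pictured as a wrapper that fills in missing workspace and item arguments from shared state. The following is a minimal sketch of that idea, not the toolkit's actual implementation: `set_context`, `with_context`, `list_tables`, and the dict-based `context` are hypothetical names used only for illustration.

```python
import functools

# Hypothetical shared state; the real toolkit exposes a OneLakeContext object instead.
context = {"workspace": None, "item": None}


def set_context(workspace, item):
    """Store the active workspace and item for later calls."""
    context["workspace"] = workspace
    context["item"] = item


def with_context(func):
    """Inject stored workspace/item into calls that do not pass them explicitly."""

    @functools.wraps(func)
    def wrapper(**kwargs):
        for key in ("workspace", "item"):
            if kwargs.get(key) is None:
                if context[key] is None:
                    raise ValueError(f"No {key} in context; call set_context first")
                kwargs[key] = context[key]
        return func(**kwargs)

    return wrapper


@with_context
def list_tables(workspace, item):
    # Stand-in for a raw tool call that requires both identifiers.
    return f"tables in {workspace}/{item}"


set_context("Marketing", "Sales")
print(list_tables())  # workspace and item are filled in from the stored context
```

An explicitly passed argument still wins over the stored value in this sketch, so a single call can target a different item without resetting the context.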

## Quick Start — Raw Tools (Advanced)

For full control over every MCP parameter:

```python
import asyncio
from langchain_onelake import OneLakeToolkit

async def main():
    async with OneLakeToolkit.create("path/to/fabmcp") as toolkit:
        # Raw MCP tools — pass workspace/item on every call
        tools = toolkit.get_tools()
        print(f"Loaded {len(tools)} raw tools:")
        for t in tools:
            print(f"  - {t.name}: {t.description[:80]}")

asyncio.run(main())
```

### Remote Server (HTTP/SSE)

```python
import asyncio
from langchain_onelake import OneLakeToolkit

async def main():
    # Connect to a remotely hosted Fabric MCP Server
    async with OneLakeToolkit.from_url("https://fabmcp.example.com/sse") as toolkit:
        tools = toolkit.get_context_tools()
        print(f"Loaded {len(tools)} tools")

    # With authentication
    async with OneLakeToolkit.from_url(
        "https://fabmcp.example.com/sse",
        headers={"Authorization": "Bearer <your-token>"},
    ) as toolkit:
        tools = toolkit.get_context_tools()

asyncio.run(main())
```
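
To keep tokens out of source code, the `headers` argument can be built at runtime. This is a minimal sketch, assuming the token is stored in an environment variable; the variable name `FABMCP_TOKEN` and the helper `bearer_headers` are hypothetical and not part of the toolkit.

```python
import os


def bearer_headers(env=os.environ, var="FABMCP_TOKEN"):
    """Build an Authorization header from an environment variable (hypothetical name)."""
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; cannot authenticate to the remote server")
    return {"Authorization": f"Bearer {token}"}


# An explicit mapping is used here so the snippet runs without real credentials.
headers = bearer_headers({"FABMCP_TOKEN": "example-token"})
print(headers)
```

The resulting dict has the same shape as the `headers=` value passed to `OneLakeToolkit.from_url(...)` in the example above.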

## Usage with LangGraph Agent

Works with **any model provider** — OpenAI, Azure OpenAI, Anthropic, Google, Ollama, etc.:

```python
import asyncio
from langchain_onelake import OneLakeToolkit
from langchain.chat_models import init_chat_model
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

SYSTEM_PROMPT = """You are a data assistant connected to Microsoft Fabric OneLake.

WORKFLOW:
1. If the user mentions a workspace or item, call set_onelake_context first.
2. Once context is set, list tables, browse files, inspect metadata, etc.
3. If no context is set, list workspaces to help the user pick one.

Present tabular data in clean markdown tables. Be concise."""


async def main():
    async with OneLakeToolkit.create() as toolkit:
        tools = toolkit.get_context_tools()

        # Use any model — just change the model name
        model = init_chat_model("gpt-4o", temperature=0)

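        # Bind the tools once instead of on every turn
        model_with_tools = model.bind_tools(tools)

        async def call_model(state: MessagesState):
            messages = [{"role": "system", "content": SYSTEM_PROMPT}] + state["messages"]
            response = await model_with_tools.ainvoke(messages)
            return {"messages": [response]}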

        builder = StateGraph(MessagesState)
        builder.add_node("agent", call_model)
        builder.add_node("tools", ToolNode(tools))
        builder.add_edge(START, "agent")
        builder.add_conditional_edges("agent", tools_condition)
        builder.add_edge("tools", "agent")
        agent = builder.compile()

        # Run a query
        result = await agent.ainvoke(
            {"messages": [{"role": "user", "content": "List my OneLake workspaces"}]}
        )
        print(result["messages"][-1].content)


asyncio.run(main())
```

## Accessing the Context Object

You can read or modify the context programmatically:

```python
async with OneLakeToolkit.create() as toolkit:
    tools = toolkit.get_context_tools()

    # Pre-set context before the agent runs
    toolkit.context.workspace_id = "12345678-..."
    toolkit.context.workspace_name = "Marketing"
    toolkit.context.item_id = "87654321-..."
    toolkit.context.item_name = "SalesLH"
    toolkit.context.item_type = "Lakehouse"

    # Now the agent can query without calling set_onelake_context
    print(toolkit.context.summary())
    # → "workspace=Marketing, item=SalesLH (Lakehouse)"
```

## Raw Tool Categories

Filter raw tools by category when you only need a subset:

```python
async with OneLakeToolkit.create() as toolkit:
    # Only table tools
    table_tools = toolkit.get_tools(category="table")

    # Only file operations
    file_tools = toolkit.get_tools(category="file")

    # List all categories
    print(toolkit.categories)
```

| Category | Tools | Description |
|----------|-------|-------------|
| `workspace` | `workspace_list` | List OneLake workspaces |
| `item` | `item_list`, `item_list-data`, `item_create` | List and create Fabric items |
| `file` | `file_list`, `file_read`, `file_write`, `file_delete` | Read/write files in OneLake |
| `directory` | `directory_create`, `directory_delete` | Manage directories |
| `upload` | `upload_file` | Upload files via blob API |
| `download` | `download_file` | Download files from OneLake |
| `blob` | `blob_list`, `blob_delete` | Blob storage operations |
| `table` | `table_config_get`, `table_list`, `table_get`, ... | Table introspection (schemas, columns, stats) |
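
In the table above, every raw tool name begins with its category. The sketch below shows how a name-prefix filter of that kind could work. It is an illustration under that assumption, not the toolkit's implementation, and `category_of` and `filter_by_category` are hypothetical helpers.

```python
# Raw tool names copied from the category table (the table elides some table tools).
TOOL_NAMES = [
    "workspace_list", "item_list", "item_list-data", "item_create",
    "file_list", "file_read", "file_write", "file_delete",
    "directory_create", "directory_delete", "upload_file", "download_file",
    "blob_list", "blob_delete", "table_config_get", "table_list", "table_get",
]


def category_of(tool_name):
    """Assume the category is the text before the first underscore."""
    return tool_name.split("_", 1)[0]


def filter_by_category(names, category=None):
    """Return all names, or only those whose prefix matches the category."""
    if category is None:
        return list(names)
    return [n for n in names if category_of(n) == category]


print(filter_by_category(TOOL_NAMES, "table"))
print(sorted({category_of(n) for n in TOOL_NAMES}))
```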

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `FABMCP_PATH` | Path to `fabmcp` executable (local mode) | Searches `$PATH` |
| `FABMCP_URL` | URL of remote Fabric MCP Server (remote mode) | None |
| `ONELAKE_ENVIRONMENT` | OneLake environment | `PROD` |
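
One way to pick a connection mode from these variables is sketched below. The precedence shown (URL first, then an explicit path, then a `PATH` lookup) is an assumption made for illustration; this README does not state how the toolkit resolves the case where both variables are set, and `resolve_connection` is a hypothetical helper.

```python
import os
import shutil


def resolve_connection(env=os.environ, which=shutil.which):
    """Return ("remote", url) or ("local", path) based on environment variables."""
    url = env.get("FABMCP_URL")
    if url:
        return ("remote", url)
    # Assumed fallback order: explicit path first, then a search of PATH.
    path = env.get("FABMCP_PATH") or which("fabmcp")
    if path:
        return ("local", path)
    raise FileNotFoundError("Set FABMCP_URL or FABMCP_PATH, or put fabmcp on PATH")


# Explicit mappings keep the example independent of the local machine.
print(resolve_connection({"FABMCP_URL": "https://fabmcp.example.com/sse"}))
print(resolve_connection({"FABMCP_PATH": "/opt/fabmcp"}))
```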

### Programmatic Configuration

**Local (stdio):**
```python
toolkit = await OneLakeToolkit.create(
    command=r"C:\fabmcp.exe",          # Explicit path
    environment="PROD",                 # Override OneLake environment
    namespaces=["onelake"],             # Restrict to specific namespaces
    env={"CUSTOM_VAR": "value"},        # Extra env vars for the server
)
```

**Remote (HTTP/SSE):**
```python
toolkit = await OneLakeToolkit.from_url(
    "https://fabmcp.example.com/sse",
    headers={"Authorization": "Bearer <token>"},
)
```

## API Reference

### `OneLakeToolkit`

#### Class Methods

| Method | Description |
|--------|-------------|
| `await OneLakeToolkit.create(command, *, environment, namespaces, env)` | Factory — spawn local fabmcp and load tools (stdio) |
| `await OneLakeToolkit.from_url(url, *, headers)` | Factory — connect to remote fabmcp server (HTTP/SSE) |

#### Instance Methods & Properties

| Member | Description |
|--------|-------------|
| `get_context_tools()` | Return context-aware tools (recommended) |
| `get_tools(*, category=None)` | Return raw MCP tools, optionally filtered by category |
| `context` | The shared `OneLakeContext` instance |
| `tool_names` | Sorted list of raw tool names |
| `categories` | List of valid category names |
| `await close()` | Shut down the MCP server |

### `OneLakeContext`

| Member | Description |
|--------|-------------|
| `workspace_id` / `workspace_name` | Current workspace |
| `item_id` / `item_name` / `item_type` | Current item (lakehouse, warehouse, …) |
| `is_workspace_set()` | Returns `True` if a workspace is in context |
| `is_item_set()` | Returns `True` if both workspace and item are in context |
| `clear()` | Reset all context fields |
| `summary()` | Human-readable summary of current context |
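
The members listed above suggest a small state object. The stand-in below mirrors those names so the semantics are easy to test. It is a sketch inferred from this README (including the `summary()` format shown in the context example), not the library's source; the class name `ContextSketch` and the text returned for an empty context are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ContextSketch:
    workspace_id: Optional[str] = None
    workspace_name: Optional[str] = None
    item_id: Optional[str] = None
    item_name: Optional[str] = None
    item_type: Optional[str] = None

    def is_workspace_set(self) -> bool:
        return self.workspace_id is not None

    def is_item_set(self) -> bool:
        # An item is only meaningful inside a workspace, so require both.
        return self.is_workspace_set() and self.item_id is not None

    def clear(self) -> None:
        # Reset every field back to None.
        for f in fields(self):
            setattr(self, f.name, None)

    def summary(self) -> str:
        if not self.is_workspace_set():
            return "no context set"  # assumed wording, not from the library
        text = f"workspace={self.workspace_name}"
        if self.is_item_set():
            text += f", item={self.item_name} ({self.item_type})"
        return text


ctx = ContextSketch("ws-id", "Marketing", "item-id", "SalesLH", "Lakehouse")
print(ctx.summary())
ctx.clear()
print(ctx.is_item_set())
```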

#### Context Manager

`OneLakeToolkit` works as an async context manager, which shuts down the MCP server on exit:

```python
async with OneLakeToolkit.create() as toolkit:
    ...  # server shuts down automatically
```

## Development

```bash
# Clone the repo
git clone https://github.com/microsoft/mcp.git
cd mcp/examples/langchain-onelake

# Install in development mode
pip install -e ".[dev,examples]"

# Run tests
pytest

# Lint
ruff check .
```

## License

MIT — see [LICENSE](../../LICENSE) for details.
