Open WebUI Tools, Functions & Pipelines: Extend Your Local LLM

You Installed Open WebUI. Now What?

So you’ve got Ollama running, you’ve got Open WebUI in front of it, and you’ve been having conversations with Llama 3 like a normal person. That’s great. But you’ve also probably noticed there are menu items labeled “Tools,” “Functions,” and something called “Pipelines”, and if you clicked them, you found a Python editor, a confusing description, and maybe a vague sense that there’s a whole other level to this thing.

There is. And it’s actually useful, not just impressive-looking.

Open WebUI’s extension system is one of the most underrated parts of the project. The problem is that “Tools,” “Functions,” and “Pipelines” sound interchangeable until you need one of them, at which point the distinctions matter a lot. This post is the map you needed before you started clicking around.

Let’s break them all down, build a working example of each, and end with a decision rule you can actually use.

The Mental Model

Before code, here’s the 30-second version:

Tools: Python code the model can choose to call. The LLM decides “I should look up the weather” and your code runs. Function calling, basically.
Functions: Python code that hooks into the chat lifecycle on the server. Filters that transform messages before/after the model sees them, or custom model sources (pipes).
Pipelines: A completely separate Python service (its own Docker container) that Open WebUI talks to like it’s an OpenAI-compatible backend. For heavy lifting: RAG, agent loops, custom auth, anything you don’t want running inside the WebUI process.

They look similar in the UI because they’re all Python. They solve wildly different problems.

Tools: Giving the Model a Screwdriver

Tools are what people mean when they say “function calling” or “tool use.” You write a Python class with methods decorated to describe what they do, the model sees those descriptions in its system context, and when the model decides it needs to use one, Open WebUI runs the code and feeds the result back into the conversation.

The model has to support tool use, Llama 3.1+, Mistral, Qwen 3, most modern models do. Older 7B models often don’t.

A Weather Tool

Here’s a minimal tool that fetches current weather from wttr.in (no API key required, because we’re not masochists):

"""
title: Weather Lookup
author: sumguy
description: Fetches current weather for a given city using wttr.in
version: 0.1.0
required_open_webui_version: 0.3.0
"""

import requests
from pydantic import BaseModel, Field


class Tools:
    class Valves(BaseModel):
        """Optional config exposed in the UI."""
        units: str = Field(
            default="metric",
            description="Temperature units: metric or imperial"
        )

    def __init__(self):
        self.valves = self.Valves()

    def get_weather(self, city: str) -> str:
        """
        Get the current weather for a city.
        :param city: Name of the city to look up weather for
        :return: Weather summary as plain text
        """
        unit_flag = "m" if self.valves.units == "metric" else "u"
        url = f"https://wttr.in/{city}?format=3&{unit_flag}"
        try:
            resp = requests.get(url, timeout=5)
            resp.raise_for_status()
            return resp.text.strip()
        except requests.RequestException as e:
            return f"Weather lookup failed: {e}"

Paste this into Open WebUI → Workspace → Tools → Create Tool. Enable it on a model. Then ask “what’s the weather in Berlin?” and watch the model call get_weather("Berlin") and use the result.

A few things to notice:

The Valves inner class creates a config form in the UI: users can set units without touching code.
The docstring on each method is what the model reads to decide whether to call it. Write them clearly.
Return a string. The model gets that string as a tool result and weaves it into its answer.

Tools are sandboxed-ish, they run in the Open WebUI process with whatever network access the server has. We’ll come back to why that matters.

Functions: Hooking the Chat Lifecycle

Functions are different. The model never “decides” to call a Function, Functions run automatically as messages flow through the system. Think of them as middleware.

There are three types:

Type	When it runs	What it’s for
Filter	Before and after every message	Transform/inspect/block content
Pipe	Instead of a model call	Custom model sources, routing
Action	On user click (button in UI)	Post-processing, export, triggers

Filters are the most common and most useful for self-hosters. Let’s build one.

A PII Redaction Filter

Your prompts go to a local model, but maybe you’re logging them, or you’ve got multiple users on your instance and you don’t want someone accidentally pasting their credentials and having them end up in conversation history. This filter strips email addresses from outgoing prompts:

"""
title: Email Redaction Filter
author: sumguy
description: Strips email addresses from user messages before they reach the model
version: 0.1.0
required_open_webui_version: 0.3.0
"""

import re
from pydantic import BaseModel


class Filter:
    class Valves(BaseModel):
        enabled: bool = True
        replacement: str = "[REDACTED_EMAIL]"

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, user: dict | None = None) -> dict:
        """
        Runs before the message reaches the model.
        Strips email addresses from user message content.
        """
        if not self.valves.enabled:
            return body

        email_pattern = r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}"
        messages = body.get("messages", [])

        for msg in messages:
            if msg.get("role") == "user" and isinstance(msg.get("content"), str):
                msg["content"] = re.sub(
                    email_pattern,
                    self.valves.replacement,
                    msg["content"]
                )

        body["messages"] = messages
        return body

    def outlet(self, body: dict, user: dict | None = None) -> dict:
        """
        Runs after the model responds.
        Pass-through here — we only care about inlet.
        """
        return body

The inlet method sees the request before the model does. The outlet method sees the response before it hits the UI. You can use both, either, or neither, just implement what you need.

Go to Workspace → Functions → Create Function, paste it in, then enable it globally or per-model in the admin settings.

Pipes (the other Function type) let you present arbitrary backends as model options in the dropdown. If you want to route certain conversations to a remote API, a different local model via a custom URL, or a completely different inference backend, a Pipe is how you do it. They’re more complex, but the pattern is the same: a pipe() method that receives the messages and returns a string or a generator for streaming.

Pipelines: The Separate Service

Pipelines is a whole different animal. It’s a standalone Python FastAPI service, you run it separately, point Open WebUI at it as an OpenAI-compatible endpoint, and it shows up as a model in your model dropdown.

Why bother? A few reasons:

Isolation: Heavy dependencies (LangChain, ChromaDB, sentence-transformers) don’t bloat the WebUI container.
Scaling: Run the pipeline service on a different machine with more RAM, a GPU, whatever.
Complex logic: Multi-step agent loops, retrieval-augmented generation with real document stores, custom authentication and rate limiting.

Here’s the Docker Compose setup to get it running alongside your existing stack:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - OPENAI_API_BASE_URLS=http://pipelines:9099
      - OPENAI_API_KEYS=your-pipelines-key
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
      - pipelines

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama

  pipelines:
    image: ghcr.io/open-webui/pipelines:main
    ports:
      - "9099:9099"
    environment:
      - PIPELINES_API_KEY=your-pipelines-key
    volumes:
      - pipelines:/app/pipelines

volumes:
  open-webui:
  ollama:
  pipelines:

A Multi-Doc RAG Pipeline Skeleton

This isn’t a full RAG implementation (that deserves its own post), but here’s the skeleton that shows how a Pipeline is structured:

"""
title: Simple RAG Pipeline
author: sumguy
description: Retrieves relevant context from a document store before answering
version: 0.1.0
"""

from typing import Generator, Iterator, Union
from pydantic import BaseModel


class Pipeline:
    class Valves(BaseModel):
        # Config exposed in the WebUI admin panel
        collection_name: str = "my_docs"
        top_k: int = 3
        ollama_base_url: str = "http://ollama:11434"
        ollama_model: str = "llama3.1:8b"

    def __init__(self):
        self.valves = self.Valves()
        # Initialize your vector store client here
        # self.chroma = chromadb.HttpClient(host="chroma", port=8000)

    async def on_startup(self):
        """Called when the pipeline service starts."""
        print(f"RAG Pipeline started, collection: {self.valves.collection_name}")

    async def on_shutdown(self):
        """Called on shutdown."""
        pass

    def pipe(
        self,
        user_message: str,
        model_id: str,
        messages: list[dict],
        body: dict
    ) -> Union[str, Generator, Iterator]:
        """
        Main handler. Receives the user's message, retrieves context,
        then calls the LLM with augmented prompt.
        """
        # Step 1: retrieve relevant chunks from vector store
        context_chunks = self._retrieve(user_message)

        # Step 2: build augmented prompt
        context_str = "\n\n".join(context_chunks)
        augmented_prompt = (
            f"Answer based on this context:\n\n{context_str}\n\n"
            f"Question: {user_message}"
        )

        # Step 3: call your LLM (Ollama, OpenAI, whatever)
        # Here you'd use requests or the ollama client lib
        # and return a string or yield chunks for streaming
        return f"[RAG would answer here using context from {len(context_chunks)} chunks]"

    def _retrieve(self, query: str) -> list[str]:
        """
        Pull relevant document chunks from the vector store.
        Replace this with your actual retrieval logic.
        """
        # Example: return self.chroma.query(...)
        return [f"Placeholder chunk for query: {query}"]

Drop this in the pipelines/ volume directory, restart the pipeline service, and it shows up as a model option in Open WebUI. Full RAG setup means wiring in ChromaDB or Qdrant, an embedding model, and document ingestion, but the Pipeline wrapper here stays exactly this shape.

The Security Warning You Skipped

Here it is, and I’m going to say it clearly: Tools and Functions run arbitrary Python on your server with the network access and file permissions of the Open WebUI process.

If you install a Tool from the community hub without reading it, you’re running random code from the internet on your home server. That hub is great, there are hundreds of useful Tools for web search, calendar integration, home automation, but treat it like you’d treat a random GitHub repo. Read the code. It’s Python, it’s short, you can do this.

A few things to be especially paranoid about:

Tools that accept user-controlled input and pass it to subprocess, eval, or exec
Filters that exfiltrate message content to external URLs
Pipes that proxy your prompts through a third-party service without being explicit about it
Anything that reads from ~/.ssh/, env variables, or /etc/

Run Open WebUI as a non-root user with minimal filesystem access. Consider network policies if you’re running this on a machine with other sensitive services. The tool call happens server-side, not in a browser sandbox.

This isn’t a reason to avoid the extension system, it’s a reason to not mindlessly paste code from the community hub into production.

The Decision Rule

You’re staring at the UI wondering which extension point to use. Here’s the flowchart in plain English:

Use a Tool when: you want the model to optionally call something, APIs, calculations, lookups, and the model should decide when that’s appropriate. Weather, web search, calendar queries, code execution.

Use a Filter Function when: you need to transform every message automatically, without the model choosing. PII scrubbing, prompt injection, content moderation, logging, response post-processing. The user and model don’t need to know it’s happening.

Use a Pipe Function when: you want to present a custom backend as a model in the dropdown. Routing logic, A/B testing between models, wrapping a custom API as a “model.”

Use a Pipeline (the separate service) when: your use case is heavy, stateful, or has dependencies you don’t want inside the WebUI container. RAG with a real document store, agent loops with tool orchestration, multi-model chaining, anything that needs its own scaling story.

When in doubt: start with a Tool. They’re the simplest, they’re scoped to the model’s decision-making, and they’re easy to test by just asking the model to use them.

Where to Find More

The Open WebUI community hub has hundreds of Tools and Functions. Filter by stars, read the code, and remember the security note above before you click Install.

The official docs at docs.openwebui.com have the full Valves reference, streaming patterns for Pipes, and the Pipeline API spec. They’re actually pretty good once you know which section to look in.

Your local LLM setup is already more capable than most people running cloud-hosted chat. The extension system is what turns “a ChatGPT clone pointed at Ollama” into something genuinely tailored to your workflow, whether that’s a home automation assistant that controls your lights, a research tool that searches your local document library, or just a filter that stops you from accidentally asking your AI about your AWS credentials.

Start with a weather Tool. You’ll be writing RAG pipelines by next weekend. Your 2 AM self will appreciate having read this first.

Open WebUI Tools, Functions & Pipelines: Extend Your Local LLM

You Installed Open WebUI. Now What?

The Mental Model

Tools: Giving the Model a Screwdriver

A Weather Tool

Functions: Hooking the Chat Lifecycle

A PII Redaction Filter

Pipelines: The Separate Service

A Multi-Doc RAG Pipeline Skeleton

The Security Warning You Skipped

The Decision Rule

Where to Find More

Responses from around the web

Discussion

Related Posts

RAGAS: Evaluating RAG Without Vibes

KV Cache Quantization: Free LLM Context, Almost

Aider & Cline: Terminal AI Coding That Actually Ships

Mixture of Experts (MoE) for Self-Hosters, Demystified