
Unleash the Power of LLMs with LocalAI

LocalAI is a remarkable open-source project that unlocks the potential of large language models (LLMs) and brings them directly to your own hardware. Think of it as a locally-hosted, self-contained alternative to cloud-based AI solutions like OpenAI’s GPT-3. With LocalAI, you gain a private, flexible, and cost-effective way to run AI workloads on hardware you control.

What Can You Do with LocalAI?

LocalAI’s capabilities closely mirror those of cloud-based AI APIs, including text generation, image generation with Stable Diffusion, and audio transcription with Whisper.

Why Choose LocalAI?

Running models on your own hardware keeps your data private, avoids per-request API fees, and gives you full control over which models you run.

Streamlined Installation with Docker

While installing LocalAI’s dependencies directly is possible, Docker significantly simplifies the process: everything LocalAI needs ships inside a single container image.

Exploring LocalAI

LocalAI provides both a web interface and a REST API. Start experimenting with different models and discover the incredible capabilities of LLMs on your own machine. The LocalAI documentation offers in-depth guidance.
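Because LocalAI’s REST API is OpenAI-compatible, you can talk to it with any HTTP client. The sketch below (using only the Python standard library) assumes the docker-compose setup later in this post, which publishes the API on localhost:8080 and preloads a model aliased as "gpt-3.5-turbo"; adjust the base URL and model name for your own deployment.

```python
import json
import urllib.request

# Assumed endpoint from the docker-compose setup in this post.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, model="gpt-3.5-turbo"):
    """Build the URL and JSON payload for an OpenAI-style chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return f"{BASE_URL}/v1/chat/completions", payload


def chat(prompt):
    """Send a chat completion request to a running LocalAI instance."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the OpenAI schema: choices[0].message.content
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("What is LocalAI?"))
```

The same endpoint shape means most OpenAI client libraries work too, pointed at your LocalAI base URL instead of api.openai.com.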

Docker compose

This docker compose file runs LocalAI with NVIDIA GPU acceleration and preloads several models from the model gallery:

services:
  localai:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
    container_name: localai
    tty: true # enable colorized logs
    restart: unless-stopped
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models:cached
      - ./images/:/tmp/generated/images/
    environment:
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"},{"url": "github:go-skynet/model-gallery/mixtral-Q3.yaml", "name": "mixtral-Q3"},{"url": "github:go-skynet/model-gallery/stablediffusion.yaml", "name": "stablediffusion"},{"url": "github:go-skynet/model-gallery/whisper-base.yaml", "name": "whisper"}]'
      - MODELS_PATH=/models
    command: ["/usr/bin/local-ai" ]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
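With the compose file saved as docker-compose.yml alongside a .env file and a models directory, bringing the stack up looks roughly like this (a sketch; the first start can take a while because the preloaded models are downloaded on boot):

```shell
# Start LocalAI in the background.
docker compose up -d

# Follow the logs while the gallery models download.
docker logs -f localai

# Once the healthcheck passes, the API answers on port 8080.
curl http://localhost:8080/readyz
curl http://localhost:8080/v1/models
```

The /readyz endpoint is the same one the compose healthcheck polls, so these two curl calls are a quick way to confirm the container is healthy.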

Docker compose explanation

services: declares the containers to run; here, a single localai service.

deploy: reserves one NVIDIA GPU for the container through the resources section.

image: quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg: the CUDA 12 build from the development (master) branch, with ffmpeg included for audio workloads.

container_name: localai: gives the container a fixed name instead of a generated one.

tty: true: allocates a pseudo-terminal so the logs keep their colors.

restart: unless-stopped: restarts the container automatically unless you explicitly stop it.

ports: publishes the API and web interface on port 8080 of the host.

env_file: loads extra environment variables from a local .env file.

volumes: mounts ./models for model storage and ./images for generated images.

environment: preloads the listed gallery models (a GPT4All-J model aliased as gpt-3.5-turbo, Mixtral, Stable Diffusion, and Whisper) and points MODELS_PATH at /models.

command: ["/usr/bin/local-ai"]: the binary to start inside the container.

healthcheck: polls http://localhost:8080/readyz until LocalAI reports it is ready to serve requests.

LocalAI provides a variety of Docker images to accommodate different hardware setups and model preferences. You can select the most suitable image by adjusting the image tag in your docker-compose.yml file. Here’s a breakdown of some common options:

Latest Development Versions:

quay.io/go-skynet/local-ai:master-cublas-cuda12 or quay.io/go-skynet/local-ai:master-cublas-cuda11: These images offer the latest features and models from the development branch. Choose the CUDA version (11 or 12) that matches your NVIDIA GPU.

quay.io/go-skynet/local-ai:master: This image is for CPU-only systems without a compatible NVIDIA GPU.

Stable Releases:

quay.io/go-skynet/local-ai:latest-cublas-cuda12 or quay.io/go-skynet/local-ai:latest-cublas-cuda11: These images provide tested and stable functionality, along with GPU support (choose the matching CUDA version).

quay.io/go-skynet/local-ai:latest: Use this image if you don’t have a GPU.

Important Notes:

GPU Acceleration: CUDA-enabled images are specifically designed for systems with a compatible NVIDIA GPU.

Development vs. Stable: Choose development images if you want the absolute latest features but are willing to accept a potentially less stable experience. For the most reliable setup, use the stable releases.
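If you are unsure which CUDA tag matches your system, a quick sketch for checking before pulling an image (nvidia-smi reports the highest CUDA version your driver supports in the top line of its output):

```shell
# Check the driver's supported CUDA version (shown as "CUDA Version: ..."),
# then pick the cuda11 or cuda12 image tag accordingly.
nvidia-smi

# Example: pull the stable CUDA 12 image, assuming a CUDA 12-capable driver.
docker pull quay.io/go-skynet/local-ai:latest-cublas-cuda12
```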

Conclusion

LocalAI empowers developers, enthusiasts, and businesses to harness the power of AI in a private, flexible, and cost-effective manner. Its easy integration with Docker further enhances its usability. If you’re seeking to leverage large language models in your projects or research, LocalAI is an invaluable tool to explore.

