Every major AI assistant — ChatGPT, Claude, Gemini — sends your prompts to a company’s servers, where they may be logged, reviewed, and used to train future models. For many tasks that’s acceptable. For anything involving sensitive business information, personal health questions, client data, or simply private thoughts, it’s not.

Local AI solves this problem entirely. When you run an AI model on your own computer, your prompts never leave your machine. No account required. No subscription. No data policy to worry about. And with a tool called Ollama, the setup takes about five minutes.

What Is Ollama?

Ollama is a free, open-source application that makes it simple to download and run large language models (LLMs) on your Mac, Windows, or Linux computer. It handles all the technical complexity — model formats, memory management, CPU/GPU routing — and gives you a clean command-line interface and a local API you can connect other tools to.

Think of it as an app store and runtime for AI models. You pick a model, run one command to download it, and start using it immediately.

What Can These Models Actually Do?

Open-source models have improved dramatically. In 2025, the best local models are competitive with GPT-3.5 for most practical tasks and, for some tasks, approach GPT-4 level performance. What they can do well:

  • Writing and editing — drafting emails, blog posts, reports, and summaries
  • Code assistance — writing, explaining, and debugging code in most languages
  • Research and summarization — condensing long documents, extracting key points
  • Brainstorming — generating ideas, outlines, and variations
  • Q&A and explanation — answering questions and explaining complex topics
  • Document analysis — analyzing text you paste in, with no upload required

What they’re less suited for today: tasks requiring real-time web search, highly complex multi-step reasoning, or state-of-the-art image generation (though image models exist locally too).

Hardware Requirements

You don’t need a powerful machine to run useful local AI. A rough guide:

Your ComputerRecommended Models
8GB RAM (any modern laptop)Llama 3.2 (3B), Gemma 3 (4B)
16GB RAMLlama 3.1 (8B), Mistral (7B), Gemma 3 (12B)
32GB+ RAMLlama 3.3 (70B), DeepSeek-R1 (32B)
Mac with Apple Silicon (M1/M2/M3/M4)Any model up to your RAM limit — Apple Silicon runs these models exceptionally well

Apple Silicon Macs are particularly well-suited for local AI: the unified memory architecture means the GPU and CPU share the same pool of RAM, allowing larger models to run much faster than on equivalent Intel/AMD hardware.

Getting Started: Step by Step

Step 1: Install Ollama

Visit ollama.com and download the installer for your platform. On Mac, it installs like any other app — drag to Applications, launch it, and it runs quietly in your menu bar.

Step 2: Download Your First Model

Open Terminal (Mac) or Command Prompt (Windows) and run:

ollama run llama3.2

Ollama will download the model (about 2GB) and immediately start a conversation. Type your message and press Enter.

That’s it. You’re running AI locally.

Step 3: Try Different Models

The best starting models for most users:

Llama 3.2 (3B) — Meta’s compact model. Fast, capable, good for general use on any modern computer.

Llama 3.1 (8B) — Noticeably more capable. Good for writing, analysis, and coding. Runs well on 16GB machines.

Mistral (7B) — Excellent for European languages and technical writing. Strong instruction-following.

Gemma 3 (12B) — Google’s open model. Excellent reasoning and document analysis. Requires 16GB RAM.

DeepSeek-R1 — Remarkable reasoning model from DeepSeek. The 7B and 14B variants run well on 16-32GB machines and handle complex logic and math impressively.

Qwen2.5-Coder — Purpose-built for code generation and explanation. Excellent if coding is your primary use case.

Run any of them with ollama run [model-name].

The command line works, but a chat interface is more comfortable for most people. Two excellent options:

Open WebUI — The most fully-featured local AI interface. Looks and works like ChatGPT, supports conversations, document uploads, image generation, and multiple models. Runs in your browser, connects to your local Ollama instance.

LM Studio — A polished desktop app for downloading and running models. Has its own model library and a clean chat interface. Good choice if you prefer an all-in-one tool without running a local web server.

Both are free and keep everything on your machine.

Practical Example: Analyzing a Confidential Document

One of the most useful local AI applications: asking questions about a document you can’t upload to a cloud service — a contract, a financial report, an internal memo.

With Open WebUI running locally, you can upload the document and ask:

  • “Summarize the key obligations in this contract.”
  • “What are the payment terms and termination clauses?”
  • “Are there any unusual provisions I should flag for my attorney?”

The document never touches the internet. The analysis happens in RAM on your own hardware.

Connecting Local AI to Other Tools

Ollama exposes a local API (at http://localhost:11434) that other tools can connect to. This means you can use local AI models as the backend for:

  • Cursor and VS Code — code editors with AI assistance using your local model
  • Obsidian — note-taking app with local AI plugins for summarization and linking
  • n8n — automation workflows that use local AI for text processing
  • Custom scripts and applications via a simple REST API

The Trade-Off Is Real, But Shrinking

Cloud AI models — especially GPT-4, Claude Sonnet, and Gemini 1.5 Pro — are still ahead of the best local models for highly complex reasoning, nuanced writing, and cutting-edge research tasks. If you need the absolute best output, the frontier cloud models deliver it.

But the gap is closing. For the majority of everyday tasks — writing, summarization, coding assistance, document analysis, and Q&A — local models running on a modern laptop deliver excellent results at zero cost and with complete privacy.

Getting Help

Setting up Ollama is straightforward, but connecting it to Open WebUI, configuring it to run automatically, or integrating it with other tools can get technical. If you’d like help setting up a local AI environment — including model recommendations based on your hardware — schedule a free consultation. We set these systems up regularly for individuals and businesses.

Frequently Asked Questions

What’s the most capable model I can run on a 16GB MacBook?

On a 16GB MacBook, Llama 3.1 8B and Gemma 3 12B are both strong choices that run comfortably within available memory. Apple Silicon Macs handle these models particularly well because of the unified memory architecture — the GPU and CPU share the same RAM pool, allowing faster inference than equivalent amounts of RAM on an Intel or AMD system. For most writing, summarization, and Q&A tasks, Llama 3.1 8B delivers results comparable to early GPT-3.5. Gemma 3 12B is noticeably better for reasoning and document analysis if you have the headroom.

Does running local AI drain battery significantly?

Yes, comparable to running a demanding game or video editing software. Inference (generating a response) is computationally intensive and runs your CPU and/or GPU at high utilization, which draws significant power. For occasional use this isn’t a concern — a typical conversation response takes seconds and then the hardware goes idle. For sustained, heavy use on battery, expect noticeable drain. Plugging in is advisable for long document analysis sessions.

Do I need to use the command line to use Ollama?

Not for day-to-day use. The initial installation and downloading your first model require one or two terminal commands, but after that, a graphical interface handles everything. Open WebUI looks and works like ChatGPT — it runs in your browser, supports document uploads, conversation history, and multiple models, with no command line required after setup. LM Studio is a standalone desktop app that’s even simpler. Most users do the one-time setup and never return to the terminal.

Can I use locally run AI models for commercial purposes?

Most open-source models permit commercial use, but the licenses vary. Llama models from Meta allow commercial use under their community license (with some restrictions at scale). Gemma and Mistral models also generally allow commercial use. Before deploying a model in a commercial application or using it to process client work, verify the specific model’s license on Hugging Face or the model card. For individuals and small businesses, commercial use of major open-source models is usually permitted.