
Overview

Adist integrates with Ollama to provide AI-driven code analysis using locally-run language models. This option is completely free, private, and doesn’t require an internet connection for inference.

Benefits

  • Free: No API costs - run unlimited queries
  • Private: Your code never leaves your machine
  • Offline: Works without internet (after initial setup)

Setup

1. Install Ollama

Download and install Ollama from ollama.com/download. On macOS, download the installer from the website and run it.
2. Pull a Model

Download a language model. Popular options include llama3 (balanced general-purpose), codellama (tuned for code), and phi3 (small and fast).
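For example, to pull the models mentioned in this guide (each downloads several gigabytes on first run):

```shell
# Pull one or more models; sizes and names may vary by Ollama version.
ollama pull llama3      # balanced general-purpose model
ollama pull codellama   # tuned for code
ollama pull phi3        # small and fast
```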
3. Start Ollama Service

Ensure Ollama is running:
ollama serve
On most systems, Ollama runs as a background service automatically after installation.
4. Configure Adist

Run the LLM configuration command:
adist llm-config
Select:
  1. Ollama as your provider
  2. Your preferred model from the list of installed models
  3. Optionally customize the API URL (default: http://localhost:11434)
5. Verify Setup

Test the integration:
adist query "What does this project do?"

Features

Local Model Support

The Ollama service can use any locally installed model:
# List available models
ollama list

# Pull additional models
ollama pull llama3:70b  # Larger, more capable version
ollama pull phi3         # Smaller, faster model

Context Caching

The Ollama service includes intelligent context caching:
  • Topic Identification: Automatically identifies query topics
  • Cache Duration: Contexts are cached for 30 minutes
  • Cache Cleanup: Old entries are automatically removed
Because local models have smaller context windows, context merging is simpler for Ollama than for cloud providers.
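A minimal sketch of how such a time-based cache could work, keyed by topic with a 30-minute expiry as described above (the class and method names here are illustrative, not Adist's actual code):

```typescript
// Illustrative TTL cache for query contexts.
// Entries expire after ttlMs and are removed lazily on access.
type CachedContext = { value: string; storedAt: number };

class ContextCache {
  private entries = new Map<string, CachedContext>();
  constructor(private ttlMs: number = 30 * 60 * 1000) {} // default: 30 minutes

  set(topic: string, value: string): void {
    this.entries.set(topic, { value, storedAt: Date.now() });
  }

  get(topic: string): string | undefined {
    const entry = this.entries.get(topic);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(topic); // expired: clean up on access
      return undefined;
    }
    return entry.value;
  }
}
```

A real implementation would also sweep expired entries periodically rather than only on lookup.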

Query Complexity Estimation

Queries are analyzed and categorized as:
  • Low Complexity: Simple questions (< 8 words, no technical terms)
  • Medium Complexity: Standard questions (8-15 words or basic technical terms)
  • High Complexity: Complex questions (> 15 words, code snippets, comparisons)
Context allocation is optimized based on complexity.
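The tiers above can be sketched as a small classifier. The word thresholds come from the list; the specific technical-term list and code-detection pattern are assumptions for illustration:

```typescript
// Illustrative complexity classifier following the thresholds above.
// TECH_TERMS and the code-detection regex are examples, not Adist's actual heuristics.
type Complexity = 'low' | 'medium' | 'high';

const TECH_TERMS = /\b(api|class|function|async|database|regex|compile)\b/i;

function estimateComplexity(query: string): Complexity {
  const words = query.trim().split(/\s+/).length;
  const hasCode = /[{}();=]/.test(query); // crude check for code snippets
  const hasTech = TECH_TERMS.test(query);
  if (words > 15 || hasCode) return 'high';
  if (words >= 8 || hasTech) return 'medium';
  return 'low';
}
```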

Streaming Support

Ollama supports real-time streaming responses:
adist query "Explain the authentication system" --stream
adist chat --stream
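Internally, streaming is delivered through the streamCallback parameter shown in the Code Reference below: the service invokes the callback once per chunk as tokens arrive. A minimal sketch of a consumer that both displays chunks and accumulates the full response (hypothetical helper, not part of Adist):

```typescript
// Illustrative stream consumer: forwards each chunk to a display handler
// and accumulates the complete response text.
function makeStreamCollector(onChunk: (chunk: string) => void) {
  let full = '';
  const callback = (chunk: string) => {
    full += chunk;
    onChunk(chunk); // e.g. process.stdout.write(chunk)
  };
  return { callback, getText: () => full };
}
```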

Code Reference

The Ollama service is implemented in src/utils/ollama.ts.

Key Methods

isAvailable

Checks if Ollama is running:
async isAvailable(): Promise<boolean>

listModels

Returns all locally installed models:
async listModels(): Promise<string[]>

summarizeFile

Generates summaries of individual files:
async summarizeFile(content: string, filePath: string): Promise<SummaryResult>

generateOverallSummary

Creates a project overview from file summaries:
async generateOverallSummary(fileSummaries: { path: string; summary: string }[]): Promise<SummaryResult>

queryProject

Answers questions about your project:
async queryProject(
  query: string,
  context: { content: string; path: string }[],
  projectId: string,
  streamCallback?: (chunk: string) => void
): Promise<SummaryResult>

chatWithProject

Enables conversational interactions:
async chatWithProject(
  messages: { role: 'user' | 'assistant'; content: string }[],
  context: { content: string; path: string }[],
  projectId: string,
  streamCallback?: (chunk: string) => void
): Promise<SummaryResult>

Configuration Options

Context Limits

  • Maximum Context Length: 30,000 characters (lower than cloud providers)
  • Cache Timeout: 30 minutes
  • Dynamic Adjustment: Context size varies based on query complexity
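One way the dynamic adjustment could work is to scale the character budget by the complexity tiers described earlier, never exceeding the 30,000-character ceiling. The per-tier fractions below are assumptions, not Adist's actual values:

```typescript
// Illustrative context budgeting: scale the character budget by query
// complexity, capped at the 30,000-character maximum.
const MAX_CONTEXT_CHARS = 30_000;

function contextBudget(complexity: 'low' | 'medium' | 'high'): number {
  const fraction = { low: 0.33, medium: 0.66, high: 1.0 }[complexity];
  return Math.floor(MAX_CONTEXT_CHARS * fraction);
}
```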

Custom API URL

If you’re running Ollama on a different host or port:
# During llm-config, specify custom URL
API URL: http://your-server:11434

Model Selection

Different models have different characteristics:
  • phi3: Fast, good for simple queries; best for quick answers on limited hardware
  • llama3:8b: Balanced performance
  • mistral: General purpose

Performance Optimization

Hardware Requirements

  • RAM: 8GB minimum
  • GPU: Optional (CPU-only inference works)
  • Storage: ~5GB for small models
This baseline is suitable for basic queries with small models; larger models need more RAM and run far faster with a GPU.

GPU Acceleration

Ollama automatically uses GPU acceleration when available:
  • NVIDIA GPUs: CUDA support (recommended)
  • Apple Silicon: Metal support
  • AMD GPUs: ROCm support (Linux)
GPU acceleration can be 10-100x faster than CPU-only inference.

Cost Comparison

Ollama is completely free:
  • API Costs: $0 (no API calls)
  • Inference: Free unlimited usage
  • Storage: Only disk space for models
Example: 1000 queries
  • Ollama: $0
  • Anthropic (Claude Sonnet): ~$3-10
Since there are no per-query charges, the savings begin with the very first query.
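The estimate above can be reproduced with rough arithmetic. Assuming Claude Sonnet-class pricing of $3 per million input tokens and $15 per million output tokens, and roughly 1,000 input and 300 output tokens per query (all of these figures are assumptions for illustration):

```typescript
// Back-of-the-envelope API cost for 1,000 queries.
// Pricing and per-query token counts are illustrative assumptions.
const queries = 1000;
const inputTokensPerQuery = 1000;
const outputTokensPerQuery = 300;
const inputPricePerMTok = 3;   // USD per million input tokens
const outputPricePerMTok = 15; // USD per million output tokens

const cost =
  (queries * inputTokensPerQuery / 1e6) * inputPricePerMTok +
  (queries * outputTokensPerQuery / 1e6) * outputPricePerMTok;

console.log(cost.toFixed(2)); // 7.50 — within the ~$3-10 range; Ollama stays at $0
```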

Best Practices

  • Start with llama3 for balanced performance
  • Use codellama for code-heavy projects
  • Try smaller models first if hardware is limited
  • Experiment with different models for your use case

Troubleshooting

Ollama Not Running

If you see connection errors:
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama service
ollama serve

No Models Available

If no models appear during configuration:
# List installed models
ollama list

# Pull a model if none are installed
ollama pull llama3

Slow Responses

  • Use a smaller model (e.g., llama3:8b instead of llama3:70b)
  • Enable GPU acceleration
  • Reduce context complexity
  • Close other applications

Out of Memory

  • Switch to a smaller model
  • Reduce the number of concurrent queries
  • Increase system swap space
  • Use CPU instead of GPU if VRAM is limited

Poor Response Quality

  • Try a larger or specialized model
  • Ensure project is properly indexed
  • Use more specific queries
  • Generate file summaries for better context

Privacy and Security

While Ollama runs locally, ensure you:
  • Keep Ollama updated for security patches
  • Don’t expose the Ollama API to untrusted networks
  • Use firewall rules if running on a server
Privacy Benefits:
  • Code never sent to external APIs
  • No data collection or telemetry
  • Complete control over your data
  • Suitable for sensitive or proprietary code

Advanced Configuration

Custom Model Parameters

You can customize model behavior by creating a Modelfile:
# Create a custom model with specific parameters
cat > Modelfile <<EOF
FROM llama3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful coding assistant specialized in code review.
EOF

# Create the custom model
ollama create my-code-assistant -f Modelfile
Then select my-code-assistant in adist llm-config.

Running on Remote Server

To use Ollama running on another machine:
  1. Configure Ollama to accept remote connections
  2. Update the API URL in adist llm-config
  3. Ensure proper network security (VPN, firewall, etc.)
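For step 1, Ollama reads the OLLAMA_HOST environment variable to choose its bind address; a minimal sketch (verify the exact behavior against the Ollama documentation for your version):

```shell
# Bind Ollama to all interfaces so other machines can reach it.
# Only do this behind a firewall or VPN (see step 3).
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```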

Next Steps

  • Start Querying: Ask questions about your codebase with adist query
  • Start Chatting: Have conversations about your project with adist chat