# llm_client

A Python package for interacting with LLMs through Ollama, supporting both a remote Ollama API and local Ollama instances.

## Requirements
- Python 3.8+
- Ollama 0.9.0+ for native thinking feature support
## Installation

Install directly from Git:

```bash
pip install git+https://git.edfast.se/lasse/_llm.git
```
Or clone and install for development:

```bash
git clone https://git.edfast.se/lasse/_llm.git
cd _llm
pip install -e .
```
Alternatively, after cloning, you can install all dependencies (including those from git.edfast.se) using the provided script:

```bash
bash install_deps.sh
```
## Dependencies

This package requires:

- env_manager: `pip install git+https://git.edfast.se/lasse/env_manager.git`
- colorprinter: `pip install git+https://git.edfast.se/lasse/colorprinter.git`
- ollama: For local model inference
- tiktoken: For token counting
- requests: For API communication
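
As background for the token-counting dependency, here is a minimal sketch of how tiktoken is commonly used; the choice of the `cl100k_base` encoding is an illustrative assumption, not necessarily what llm_client uses internally:

```python
import tiktoken

# Illustrative only: count tokens with a generic tiktoken encoding.
# llm_client may pick a different encoding or wrap this differently.
encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("I want to add 2 and 2")
print(len(tokens))  # number of tokens in the prompt
```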
## Version Compatibility

### Ollama v0.9.0 Native Thinking Support
This package leverages Ollama v0.9.0's native thinking feature, which allows models such as qwen3, deepseek, and others to expose their reasoning process separately from their final answer.
- Remote API: If using a remote API, ensure it runs on Ollama v0.9.0+
- Local Ollama: Update to v0.9.0+ for native thinking support
- Backward Compatibility: The library will attempt to handle both native thinking and older tag-based thinking (`<think>` tags)
For the best experience with the thinking feature, ensure all Ollama instances (both local and remote) are updated to v0.9.0 or later.
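
As a rough illustration of what the tag-based fallback involves, the sketch below splits a reply containing `<think>` tags into reasoning and answer; the helper name and behaviour are hypothetical and not part of llm_client's public API:

```python
import re

# Hypothetical sketch: separate tag-based reasoning from the final answer
# when a model emits <think>...</think> instead of native thinking output.
def split_think_tags(text):
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

thinking, answer = split_think_tags("<think>2 + 2 equals 4.</think>The answer is 4.")
print(thinking)  # 2 + 2 equals 4.
print(answer)    # The answer is 4.
```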
### Native Thinking vs. Tag-Based Thinking
| Feature | Native Thinking (v0.9.0+) | Tag-Based Thinking (older) |
|---|---|---|
| API Support | Native parameter and response field | Manual parsing of text tags |
| Content Separation | Clean separation of thinking and answer | Tags embedded in content |
| Access Method | `response.thinking` attribute | Text parsing of `<think>` tags |
| Streaming | Clean separation of thinking/content chunks | Manual detection of end tags |
| Reliability | More reliable, officially supported | Relies on model output format |
| Models | Works with all thinking-capable models | Works with models that follow tag conventions |
## Environment Variables
The package requires several environment variables to be set:
- `LLM_API_URL`: URL of the Ollama API
- `LLM_API_USER`: Username for API authentication
- `LLM_API_PWD_LASSE`: Password for API authentication
- `LLM_MODEL`: Standard model name
- `LLM_MODEL_SMALL`: Small model name
- `LLM_MODEL_VISION`: Vision model name
- `LLM_MODEL_LARGE`: Large context model name
- `LLM_MODEL_REASONING`: Reasoning model name
- `LLM_MODEL_TOOLS`: Tools model name
These can be set in a .env file in your project directory or in the ArangoDB environment document in the div database.
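
For example, a `.env` file might look like the sketch below; every value is a placeholder and the model names are examples only, not the models this package requires:

```
LLM_API_URL=https://ollama.example.com
LLM_API_USER=your-username
LLM_API_PWD_LASSE=your-password
LLM_MODEL=qwen3:8b
LLM_MODEL_SMALL=qwen3:4b
LLM_MODEL_VISION=llava:7b
LLM_MODEL_LARGE=qwen3:32b
LLM_MODEL_REASONING=qwen3:8b
LLM_MODEL_TOOLS=qwen3:8b
```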
## Basic Usage

```python
from llm_client import LLM

# Initialize the LLM
llm = LLM()

# Generate a response
result = llm.generate(
    query="I want to add 2 and 2",
)

print(result.content)
```
## Advanced Usage

### Working with Images

```python
from llm_client import LLM

llm = LLM()
response = llm.generate(
    query="What's in this image?",
    images=["path/to/image.jpg"],
    model="vision"
)
```
### Streaming Responses

```python
from llm_client import LLM

llm = LLM()
for chunk_type, chunk in llm.generate(
    query="Write a paragraph about AI",
    stream=True
):
    print(f"{chunk_type}: {chunk}")
```
### Using the Async API

```python
import asyncio
from llm_client import LLM

async def main():
    llm = LLM()
    response = await llm.async_generate(
        query="What is machine learning?",
        model="standard"
    )
    print(response)

asyncio.run(main())
```
### Using Thinking Mode

The library supports Ollama's native thinking feature (v0.9.0+), which allows you to see the model's reasoning process before it provides its final answer.

```python
from llm_client import LLM

# Use with models that support thinking (qwen3, deepseek, etc.)
llm = LLM(model="reasoning")

# Enable thinking mode with the native Ollama v0.9.0+ support
response = llm.generate(
    query="What would be the impact of increasing carbon taxes by 10%?",
    think=True
)

# Access thinking content (the model's reasoning process)
if hasattr(response, 'thinking') and response.thinking:
    print("Model's reasoning process:")
    print(response.thinking)

# Access the final answer
print("Final answer:")
print(response.content)
```
When streaming with thinking enabled, you'll receive chunks of both types:

```python
from llm_client import LLM

llm = LLM(model="reasoning")
for chunk_type, chunk in llm.generate(
    query="Solve this step by step: If x² + 3x - 10 = 0, what are the values of x?",
    stream=True,
    think=True
):
    if chunk_type == "thinking":
        print(f"Reasoning: {chunk}")
    else:
        print(f"Answer: {chunk}")
```