llm_client

A Python package for interacting with large language models through Ollama, supporting both a remote API and local Ollama instances.

Requirements

  • Python 3.8+
  • Ollama 0.9.0+ for native thinking feature support
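
You can check which Ollama version is installed from the command line:

ollama --version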

Installation

Install directly from Git:

pip install git+https://git.edfast.se/lasse/_llm.git

Or clone and install for development:

git clone https://git.edfast.se/lasse/_llm.git
cd _llm
pip install -e .

Alternatively, after cloning, you can install all dependencies (including those from git.edfast.se) using the provided script:

bash install_deps.sh

Dependencies

This package requires:

  • env_manager: pip install git+https://git.edfast.se/lasse/env_manager.git
  • colorprinter: pip install git+https://git.edfast.se/lasse/colorprinter.git
  • ollama: For local model inference
  • tiktoken: For token counting
  • requests: For API communication

Version Compatibility

Ollama v0.9.0 Native Thinking Support

This package leverages Ollama v0.9.0's native thinking feature, which allows models such as qwen3 and deepseek to expose their reasoning process separately from their final answer.

  • Remote API: If using a remote API, ensure it runs on Ollama v0.9.0+
  • Local Ollama: Update to v0.9.0+ for native thinking support
  • Backward Compatibility: The library will attempt to handle both native thinking and older tag-based thinking (<think> tags)

For the best experience with the thinking feature, ensure all Ollama instances (both local and remote) are updated to v0.9.0 or later.

Native Thinking vs. Tag-Based Thinking

Feature               Native Thinking (v0.9.0+)                      Tag-Based Thinking (older)
API Support           Native parameter and response field            Manual parsing of text tags
Content Separation    Clean separation of thinking and answer        Tags embedded in content
Access Method         response.thinking attribute                    Text parsing of <think> tags
Streaming             Clean separation of thinking/content chunks    Manual detection of end tags
Reliability           More reliable, officially supported            Relies on model output format
Models                Works with all thinking-capable models         Works with models that follow tag conventions
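
As a rough illustration of the difference, the sketch below prefers the native thinking attribute and falls back to stripping <think> tags when it is missing. The exact fallback logic inside the library may differ; this is only meant to show the two access methods side by side.

from llm_client import LLM
import re

llm = LLM(model="reasoning")
response = llm.generate(query="Why is the sky blue?", think=True)

# Native thinking (Ollama v0.9.0+): reasoning is exposed as its own attribute
thinking = getattr(response, "thinking", None)
content = response.content

# Tag-based fallback (older Ollama): reasoning may be embedded as <think>...</think>
if not thinking and "<think>" in content:
    match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    if match:
        thinking = match.group(1).strip()
        content = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()

print("Reasoning:", thinking)
print("Answer:", content)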

Environment Variables

The package requires several environment variables to be set:

  • LLM_API_URL: URL of the Ollama API
  • LLM_API_USER: Username for API authentication
  • LLM_API_PWD_LASSE: Password for API authentication
  • LLM_MODEL: Standard model name
  • LLM_MODEL_SMALL: Small model name
  • LLM_MODEL_VISION: Vision model name
  • LLM_MODEL_LARGE: Large context model name
  • LLM_MODEL_REASONING: Reasoning model name
  • LLM_MODEL_TOOLS: Tools model name

These can be set in a .env file in your project directory or in the ArangoDB environment document in the div database.
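
For example, a .env file might look like the following. The values below are placeholders only; substitute your own API URL, credentials, and model names:

LLM_API_URL=http://localhost:11434
LLM_API_USER=your_username
LLM_API_PWD_LASSE=your_password
LLM_MODEL=qwen3:8b
LLM_MODEL_SMALL=qwen3:1.7b
LLM_MODEL_VISION=llava:7b
LLM_MODEL_LARGE=qwen3:14b
LLM_MODEL_REASONING=deepseek-r1:8b
LLM_MODEL_TOOLS=qwen3:8b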

Basic Usage

from llm_client import LLM

# Initialize the LLM
llm = LLM()

# Generate a response
result = llm.generate(
    query="I want to add 2 and 2",
)
print(result.content)

Advanced Usage

Working with Images

from llm_client import LLM

llm = LLM()
response = llm.generate(
    query="What's in this image?",
    images=["path/to/image.jpg"],
    model="vision"
)

Streaming Responses

from llm_client import LLM

llm = LLM()
for chunk_type, chunk in llm.generate(
    query="Write a paragraph about AI",
    stream=True
):
    print(f"{chunk_type}: {chunk}")

Using Async API

import asyncio
from llm_client import LLM

async def main():
    llm = LLM()
    response = await llm.async_generate(
        query="What is machine learning?",
        model="standard"
    )
    print(response)

asyncio.run(main())
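
Because async_generate is a coroutine, several requests can be awaited concurrently with asyncio.gather. Whether the backend actually processes them in parallel depends on your Ollama configuration, so treat this as a sketch:

import asyncio
from llm_client import LLM

async def main():
    llm = LLM()
    # Start both requests and wait for all results together
    answers = await asyncio.gather(
        llm.async_generate(query="What is machine learning?"),
        llm.async_generate(query="What is deep learning?"),
    )
    for answer in answers:
        print(answer)

asyncio.run(main())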

Using Thinking Mode

The library supports Ollama's native thinking feature (v0.9.0+), which allows you to see the model's reasoning process before it provides its final answer.

from llm_client import LLM

# Use with models that support thinking (qwen3, deepseek, etc.)
llm = LLM(model="reasoning")

# Enable thinking mode with the new native Ollama v0.9.0+ support
response = llm.generate(
    query="What would be the impact of increasing carbon taxes by 10%?",
    think=True
)

# Access thinking content (model's reasoning process)
if hasattr(response, 'thinking') and response.thinking:
    print("Model's reasoning process:")
    print(response.thinking)

# Access final answer
print("Final answer:")
print(response.content)

When streaming with thinking enabled, you'll receive chunks of both types:

from llm_client import LLM

llm = LLM(model="reasoning")

for chunk_type, chunk in llm.generate(
    query="Solve this step by step: If x² + 3x - 10 = 0, what are the values of x?",
    stream=True,
    think=True
):
    if chunk_type == "thinking":
        print(f"Reasoning: {chunk}")
    else:
        print(f"Answer: {chunk}")