# llm_client

A Python package for interacting with LLMs through Ollama, supporting both a remote Ollama API and local Ollama instances.

## Requirements
- Python 3.8+
- Ollama 0.9.0+ for native thinking feature support
## Installation

Install directly from Git:

```bash
pip install git+https://git.edfast.se/lasse/_llm.git
```
Or clone and install for development:

```bash
git clone https://git.edfast.se/lasse/_llm.git
cd _llm
pip install -e .
```
Alternatively, after cloning, you can install all dependencies (including those from git.edfast.se) using the provided script:

```bash
bash install_deps.sh
```
## Dependencies

This package requires:

- env_manager: `pip install git+https://git.edfast.se/lasse/env_manager.git`
- colorprinter: `pip install git+https://git.edfast.se/lasse/colorprinter.git`
- ollama: For local model inference
- tiktoken: For token counting
- requests: For API communication
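
As background for the token-counting dependency, here is a minimal sketch of how tiktoken is commonly used; the choice of the `cl100k_base` encoding is an illustrative assumption, not necessarily what llm_client uses internally:

```python
import tiktoken

# Illustrative only: count tokens with a generic tiktoken encoding.
# llm_client may pick a different encoding or wrap this differently.
encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("I want to add 2 and 2")
print(len(tokens))  # number of tokens in the prompt
```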
## Version Compatibility

### Ollama v0.9.0 Native Thinking Support
This package leverages Ollama v0.9.0's native thinking feature, which allows models such as qwen3, deepseek, and others to expose their reasoning process separately from their final answer.
- Remote API: If using a remote API, ensure it runs on Ollama v0.9.0+
- Local Ollama: Update to v0.9.0+ for native thinking support
- Backward Compatibility: The library will attempt to handle both native thinking and older tag-based thinking (`<think>` tags)
For the best experience with the thinking feature, ensure all Ollama instances (both local and remote) are updated to v0.9.0 or later.
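
As a rough illustration of what the tag-based fallback involves, the sketch below splits a reply containing `<think>` tags into reasoning and answer; the helper name and behaviour are hypothetical and not part of llm_client's public API:

```python
import re

# Hypothetical sketch: separate tag-based reasoning from the final answer
# when a model emits <think>...</think> instead of native thinking output.
def split_think_tags(text):
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

thinking, answer = split_think_tags("<think>2 + 2 equals 4.</think>The answer is 4.")
print(thinking)  # 2 + 2 equals 4.
print(answer)    # The answer is 4.
```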
### Native Thinking vs. Tag-Based Thinking
| Feature | Native Thinking (v0.9.0+) | Tag-Based Thinking (older) |
|---|---|---|
| API Support | Native parameter and response field | Manual parsing of text tags |
| Content Separation | Clean separation of thinking and answer | Tags embedded in content |
| Access Method | `response.thinking` attribute | Text parsing of `<think>` tags |
| Streaming | Clean separation of thinking/content chunks | Manual detection of end tags |
| Reliability | More reliable, officially supported | Relies on model output format |
| Models | Works with all thinking-capable models | Works with models that follow tag conventions |
## Environment Variables
The package requires several environment variables to be set:
- `LLM_API_URL`: URL of the Ollama API
- `LLM_API_USER`: Username for API authentication
- `LLM_API_PWD_LASSE`: Password for API authentication
- `LLM_MODEL`: Standard model name
- `LLM_MODEL_SMALL`: Small model name
- `LLM_MODEL_VISION`: Vision model name
- `LLM_MODEL_LARGE`: Large context model name
- `LLM_MODEL_REASONING`: Reasoning model name
- `LLM_MODEL_TOOLS`: Tools model name
These can be set in a .env file in your project directory or in the ArangoDB environment document in the div database.
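
For example, a `.env` file might look like the sketch below; every value is a placeholder and the model names are examples only, not the models this package requires:

```
LLM_API_URL=https://ollama.example.com
LLM_API_USER=your-username
LLM_API_PWD_LASSE=your-password
LLM_MODEL=qwen3:8b
LLM_MODEL_SMALL=qwen3:4b
LLM_MODEL_VISION=llava:7b
LLM_MODEL_LARGE=qwen3:32b
LLM_MODEL_REASONING=qwen3:8b
LLM_MODEL_TOOLS=qwen3:8b
```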
## Basic Usage

```python
from llm_client import LLM

# Initialize the LLM
llm = LLM()

# Generate a response
result = llm.generate(
    query="I want to add 2 and 2",
)

print(result.content)
```
## Advanced Usage

### Working with Images

```python
from llm_client import LLM

llm = LLM()
response = llm.generate(
    query="What's in this image?",
    images=["path/to/image.jpg"],
    model="vision"
)
```
### Streaming Responses

```python
from llm_client import LLM

llm = LLM()
for chunk_type, chunk in llm.generate(
    query="Write a paragraph about AI",
    stream=True
):
    print(f"{chunk_type}: {chunk}")
```
### Using the Async API

```python
import asyncio
from llm_client import LLM

async def main():
    llm = LLM()
    response = await llm.async_generate(
        query="What is machine learning?",
        model="standard"
    )
    print(response)

asyncio.run(main())
```
### Using Thinking Mode

The library supports Ollama's native thinking feature (v0.9.0+), which allows you to see the model's reasoning process before it provides its final answer.

```python
from llm_client import LLM

# Use with models that support thinking (qwen3, deepseek, etc.)
llm = LLM(model="reasoning")

# Enable thinking mode with the native Ollama v0.9.0+ support
response = llm.generate(
    query="What would be the impact of increasing carbon taxes by 10%?",
    think=True
)

# Access thinking content (the model's reasoning process)
if hasattr(response, 'thinking') and response.thinking:
    print("Model's reasoning process:")
    print(response.thinking)

# Access the final answer
print("Final answer:")
print(response.content)
```
When streaming with thinking enabled, you'll receive chunks of both types:

```python
from llm_client import LLM

llm = LLM(model="reasoning")
for chunk_type, chunk in llm.generate(
    query="Solve this step by step: If x² + 3x - 10 = 0, what are the values of x?",
    stream=True,
    think=True
):
    if chunk_type == "thinking":
        print(f"Reasoning: {chunk}")
    else:
        print(f"Answer: {chunk}")
```