# llm_client

A Python package for interacting with LLMs through Ollama, supporting both a remote Ollama API and local Ollama instances.

## Requirements

- Python 3.8+
- Ollama 0.9.0+ for native thinking feature support

## Installation

Install directly from GitHub:

```bash
pip install git+https://github.com/lasseedfast/_llm.git
```

Or clone and install for development:

```bash
git clone https://github.com/lasseedfast/_llm.git
cd _llm
pip install -e .
```

Alternatively, after cloning, you can install all dependencies (including those from GitHub) using the provided script:

```bash
bash install_deps.sh
```

## Dependencies

This package requires:

- env_manager: `pip install git+https://github.com/lasseedfast/env_manager.git`
- colorprinter: `pip install git+https://github.com/lasseedfast/colorprinter.git`
- ollama: For local model inference
- tiktoken: For token counting
- requests: For API communication

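If you prefer to install the dependencies manually instead of running `install_deps.sh`, the commands above can be combined into a single step (a sketch; pin versions as needed):

```bash
pip install \
    git+https://github.com/lasseedfast/env_manager.git \
    git+https://github.com/lasseedfast/colorprinter.git \
    ollama tiktoken requests
```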
## Version Compatibility

### Ollama v0.9.0 Native Thinking Support

This package leverages Ollama v0.9.0's native thinking feature, which allows models such as qwen3 and deepseek to expose their reasoning process separately from their final answer.

- **Remote API:** If using a remote API, ensure it runs on Ollama v0.9.0+.
- **Local Ollama:** Update to v0.9.0+ for native thinking support.
- **Backward compatibility:** The library will attempt to handle both native thinking and older tag-based thinking (`<think>` tags).

For the best experience with the thinking feature, ensure all Ollama instances (both local and remote) are updated to v0.9.0 or later.

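To confirm which version a local Ollama installation is running, you can check it from the command line:

```bash
ollama --version
```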
### Native Thinking vs. Tag-Based Thinking

| Feature | Native Thinking (v0.9.0+) | Tag-Based Thinking (older) |
|---------|---------------------------|----------------------------|
| API Support | Native parameter and response field | Manual parsing of text tags |
| Content Separation | Clean separation of thinking and answer | Tags embedded in content |
| Access Method | `response.thinking` attribute | Text parsing of `<think>` tags |
| Streaming | Clean separation of thinking/content chunks | Manual detection of end tags |
| Reliability | More reliable, officially supported | Relies on model output format |
| Models | Works with all thinking-capable models | Works with models that follow tag conventions |

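On older setups the reasoning arrives embedded in the answer as `<think>` tags and has to be separated manually. A minimal sketch of what that tag-based fallback parsing can look like (independent of this library's internal implementation):

```python
import re
from typing import Tuple


def split_think_tags(text: str) -> Tuple[str, str]:
    """Split a raw model response into (thinking, answer) parts.

    Assumes the model wraps its reasoning in <think>...</think> tags,
    as older tag-based models do; everything outside the tags is
    treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer


thinking, answer = split_think_tags("<think>2 + 2 = 4</think>The answer is 4.")
print(thinking)  # 2 + 2 = 4
print(answer)    # The answer is 4.
```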
## Environment Variables

The package requires several environment variables to be set:

- `LLM_API_URL`: URL of the Ollama API
- `LLM_API_USER`: Username for API authentication
- `LLM_API_PWD_LASSE`: Password for API authentication
- `LLM_MODEL`: Standard model name
- `LLM_MODEL_SMALL`: Small model name
- `LLM_MODEL_VISION`: Vision model name
- `LLM_MODEL_LARGE`: Large context model name
- `LLM_MODEL_REASONING`: Reasoning model name
- `LLM_MODEL_TOOLS`: Tools model name

These can be set in a `.env` file in your project directory or in the ArangoDB environment document in the div database.

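As an example, a `.env` file might look like the following; all values are placeholders, and the model names should match models you have actually pulled into Ollama:

```bash
# .env (example values only)
LLM_API_URL=https://your-ollama-host:11434
LLM_API_USER=your_username
LLM_API_PWD_LASSE=your_password
LLM_MODEL=qwen3:8b
LLM_MODEL_SMALL=qwen3:1.7b
LLM_MODEL_VISION=llava:7b
LLM_MODEL_LARGE=qwen3:32b
LLM_MODEL_REASONING=deepseek-r1:8b
LLM_MODEL_TOOLS=qwen3:8b
```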
## Basic Usage

```python
from llm_client import LLM

# Initialize the LLM
llm = LLM()

# Generate a response
result = llm.generate(
    query="I want to add 2 and 2",
)
print(result.content)
```

## Advanced Usage

### Working with Images

```python
from llm_client import LLM

llm = LLM()
response = llm.generate(
    query="What's in this image?",
    images=["path/to/image.jpg"],
    model="vision"
)
print(response.content)
```

### Streaming Responses

```python
from llm_client import LLM

llm = LLM()
for chunk_type, chunk in llm.generate(
    query="Write a paragraph about AI",
    stream=True
):
    print(f"{chunk_type}: {chunk}")
```

### Using Async API

```python
import asyncio
from llm_client import LLM

async def main():
    llm = LLM()
    response = await llm.async_generate(
        query="What is machine learning?",
        model="standard"
    )
    print(response)

asyncio.run(main())
```

### Using Thinking Mode

The library supports Ollama's native thinking feature (v0.9.0+), which lets you see the model's reasoning process before it gives its final answer.

```python
from llm_client import LLM

# Use with models that support thinking (qwen3, deepseek, etc.)
llm = LLM(model="reasoning")

# Enable thinking mode with native Ollama v0.9.0+ support
response = llm.generate(
    query="What would be the impact of increasing carbon taxes by 10%?",
    think=True
)

# Access thinking content (the model's reasoning process)
if hasattr(response, 'thinking') and response.thinking:
    print("Model's reasoning process:")
    print(response.thinking)

# Access the final answer
print("Final answer:")
print(response.content)
```

When streaming with thinking enabled, you'll receive both types of chunks:

```python
from llm_client import LLM

llm = LLM(model="reasoning")

for chunk_type, chunk in llm.generate(
    query="Solve this step by step: If x² + 3x - 10 = 0, what are the values of x?",
    stream=True,
    think=True
):
    if chunk_type == "thinking":
        print(f"Reasoning: {chunk}")
    else:
        print(f"Answer: {chunk}")
```