A Python package for interacting with LLM models through Ollama, supporting both remote API and local Ollama instances.
## Requirements
- Python 3.8+
- Ollama 0.9.0+ for native thinking feature support
## Installation
Install directly from GitHub:
This package requires:
- tiktoken: For token counting
- requests: For API communication
## Version Compatibility
### Ollama v0.9.0 Native Thinking Support
This package leverages Ollama v0.9.0's native thinking feature, which allows models such as qwen3 and deepseek to expose their reasoning process separately from their final answer.
- **Remote API:** If using a remote API, ensure it runs on Ollama v0.9.0+
- **Local Ollama:** Update to v0.9.0+ for native thinking support
- **Backward Compatibility:** The library will attempt to handle both native thinking and older tag-based thinking (`<think>` tags)
For the best experience with the thinking feature, ensure all Ollama instances (both local and remote) are updated to v0.9.0 or later.
The table below compares the two approaches:

| Feature | Native Thinking (Ollama v0.9.0+) | Tag-Based Thinking (`<think>` tags) |
|---|---|---|
| API Support | Native parameter and response field | Manual parsing of text tags |
| Content Separation | Clean separation of thinking and answer | Tags embedded in content |
| Access Method | `response.thinking` attribute | Text parsing of `<think>` tags |
| Streaming | Clean separation of thinking/content chunks | Manual detection of end tags |
| Reliability | More reliable, officially supported | Relies on model output format |
| Models | Works with all thinking-capable models | Works with models that follow tag conventions |
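As a rough sketch of the backward-compatibility behaviour described above, calling code could prefer the native `thinking` field and fall back to parsing `<think>` tags when it is absent. The `.thinking` and `.content` attribute names follow the table above; treat this as an illustration rather than the library's internal implementation:

```python
import re


def split_thinking(response):
    """Illustrative helper: prefer the native thinking field, else parse <think> tags."""
    # Ollama v0.9.0+ exposes reasoning in a dedicated field (response.thinking here).
    thinking = getattr(response, "thinking", None)
    if thinking:
        return thinking, response.content

    # Older servers/models may embed reasoning in <think>...</think> tags in the content.
    match = re.search(r"<think>(.*?)</think>", response.content, re.DOTALL)
    if match:
        answer = re.sub(r"<think>.*?</think>", "", response.content, flags=re.DOTALL).strip()
        return match.group(1).strip(), answer

    # No reasoning available: return the content unchanged.
    return None, response.content
```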
## Environment Variables
The package requires several environment variables to be set:
asyncio.run(main())
```
### Using Thinking Mode
The library supports Ollama's native thinking feature (v0.9.0+), which allows you to see the reasoning process of the model before it provides its final answer.
```python
from llm_client import LLM
# Use with models that support thinking (qwen3, deepseek, etc.)
llm = LLM(model="reasoning")
# Enable thinking mode with the new native Ollama v0.9.0+ support
response = llm.generate(
    query="What would be the impact of increasing carbon taxes by 10%?",