# llm_client

A Python package for interacting with LLM models through Ollama, supporting both a remote API and local Ollama instances.

## Requirements

- Python 3.8+
- Ollama 0.9.0+ for native thinking feature support

## Installation

Install directly from GitHub:

```bash
pip install git+https://github.com/lasseedfast/_llm.git
```

Or clone and install for development:

```bash
git clone https://github.com/lasseedfast/_llm.git
cd _llm
pip install -e .
```

Alternatively, after cloning, you can install all dependencies (including those from GitHub) using the provided script:

```bash
bash install_deps.sh
```

## Dependencies

This package requires:

- env_manager: `pip install git+https://github.com/lasseedfast/env_manager.git`
- colorprinter: `pip install git+https://github.com/lasseedfast/colorprinter.git`
- ollama: for local model inference
- tiktoken: for token counting
- requests: for API communication

## Version Compatibility

### Ollama v0.9.0 Native Thinking Support

This package leverages Ollama v0.9.0's native thinking feature, which allows models like qwen3, deepseek, and others to expose their reasoning process separately from their final answer.

- **Remote API:** If using a remote API, ensure it runs on Ollama v0.9.0+
- **Local Ollama:** Update to v0.9.0+ for native thinking support
- **Backward compatibility:** The library will attempt to handle both native thinking and older tag-based thinking (`<think>` tags)

For the best experience with the thinking feature, ensure all Ollama instances (both local and remote) are updated to v0.9.0 or later.

### Native Thinking vs. Tag-Based Thinking

| Feature | Native Thinking (v0.9.0+) | Tag-Based Thinking (older) |
|---------|---------------------------|----------------------------|
| API support | Native parameter and response field | Manual parsing of text tags |
| Content separation | Clean separation of thinking and answer | Tags embedded in content |
| Access method | `response.thinking` attribute | Text parsing of `<think>` tags |
| Streaming | Clean separation of thinking/content chunks | Manual detection of end tags |
| Reliability | More reliable, officially supported | Relies on model output format |
| Models | Works with all thinking-capable models | Works with models that follow tag conventions |

## Environment Variables

The package requires several environment variables to be set:

- `LLM_API_URL`: URL of the Ollama API
- `LLM_API_USER`: Username for API authentication
- `LLM_API_PWD_LASSE`: Password for API authentication
- `LLM_MODEL`: Standard model name
- `LLM_MODEL_SMALL`: Small model name
- `LLM_MODEL_VISION`: Vision model name
- `LLM_MODEL_LARGE`: Large context model name
- `LLM_MODEL_REASONING`: Reasoning model name
- `LLM_MODEL_TOOLS`: Tools model name

These can be set in a `.env` file in your project directory or in the ArangoDB environment document in the div database.
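For local development, the variables are typically placed in a `.env` file. The sketch below only illustrates the expected keys; the URL, credentials, and model names are placeholders, not values shipped with the package. Use whatever models your Ollama instances actually serve.

```bash
# Example .env (placeholder values, replace with your own)
LLM_API_URL=https://ollama.example.com
LLM_API_USER=your_username
LLM_API_PWD_LASSE=your_password
LLM_MODEL=llama3.1
LLM_MODEL_SMALL=llama3.2:3b
LLM_MODEL_VISION=llama3.2-vision
LLM_MODEL_LARGE=llama3.1:70b
LLM_MODEL_REASONING=qwen3
LLM_MODEL_TOOLS=llama3.1
```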
## Basic Usage

```python
from llm_client import LLM

# Initialize the LLM
llm = LLM()

# Generate a response
result = llm.generate(
    query="I want to add 2 and 2",
)

print(result.content)
```

## Advanced Usage

### Working with Images

```python
from llm_client import LLM

llm = LLM()
response = llm.generate(
    query="What's in this image?",
    images=["path/to/image.jpg"],
    model="vision"
)
```

### Streaming Responses

```python
from llm_client import LLM

llm = LLM()
for chunk_type, chunk in llm.generate(
    query="Write a paragraph about AI",
    stream=True
):
    print(f"{chunk_type}: {chunk}")
```

### Using Async API

```python
import asyncio
from llm_client import LLM

async def main():
    llm = LLM()
    response = await llm.async_generate(
        query="What is machine learning?",
        model="standard"
    )
    print(response)

asyncio.run(main())
```

### Using Thinking Mode

The library supports Ollama's native thinking feature (v0.9.0+), which allows you to see the reasoning process of the model before it provides its final answer.

```python
from llm_client import LLM

# Use with models that support thinking (qwen3, deepseek, etc.)
llm = LLM(model="reasoning")

# Enable thinking mode with the native Ollama v0.9.0+ support
response = llm.generate(
    query="What would be the impact of increasing carbon taxes by 10%?",
    think=True
)

# Access thinking content (model's reasoning process)
if hasattr(response, 'thinking') and response.thinking:
    print("Model's reasoning process:")
    print(response.thinking)

# Access final answer
print("Final answer:")
print(response.content)
```

When streaming with thinking enabled, you'll receive chunks of both types:

```python
from llm_client import LLM

llm = LLM(model="reasoning")
for chunk_type, chunk in llm.generate(
    query="Solve this step by step: If x² + 3x - 10 = 0, what are the values of x?",
    stream=True,
    think=True
):
    if chunk_type == "thinking":
        print(f"Reasoning: {chunk}")
    else:
        print(f"Answer: {chunk}")
```
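If your Ollama instance predates v0.9.0, thinking-capable models embed their reasoning in the text as `<think>` tags instead of returning it in `response.thinking`. The library attempts to handle this for you, but if you ever need to separate the two yourself (for example when post-processing raw output), here is a minimal sketch; the `split_think_tags` helper is hypothetical and not part of the package:

```python
import re

def split_think_tags(text: str):
    """Split tag-based output into (thinking, answer).

    Returns (None, text) when no <think>...</think> block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return None, text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

thinking, answer = split_think_tags("<think>2 + 2 equals 4.</think>The answer is 4.")
print(thinking)  # 2 + 2 equals 4.
print(answer)    # The answer is 4.
```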