### Use an LLM to extract relations from a text

For this session you either need Ollama installed (and running) together with a local model, or an account set up on OpenAI together with an API key.  
  
**Ollama**  
You can find instructions on how to install Ollama [here](https://ollama.com).  
I will use a model named `qwen3:14b` in this notebook. If your computer is less powerful, you might want to use a smaller model, like `phi4-mini`.
Install a model by running `ollama run <model>` in your terminal. Beware that the models are big files so it might take a while to download them, especially if you have a slow internet connection.  
  
**OpenAI**  
You can sign up for an account on OpenAI [here](https://platform.openai.com/signup).  
After you have an account, you can find your API key [here](https://platform.openai.com/account/api-keys).  
  
The `LLM` class defined below sets up an LLM connection for you, with either Ollama or OpenAI. That way, the rest of the notebook can use the same code no matter which service you are using.  
  
Install the *ollama* and *openai* packages in a code block like:  
```
%pip install ollama
%pip install openai
```

#### Make a model for structured LLM output
Read more about how Ollama handles structured output [here](https://ollama.com/blog/structured-outputs).

In [None]:
from pydantic import BaseModel, Field

class Relation(BaseModel):
  person1: str = Field(description="The first person in the conversation")
  person2: str = Field(description="The second person in the conversation")
  relation: str = Field(description="The relationship between the two people")

class ResponseFormat(BaseModel):
  relations: list[Relation] = Field(
    description="A list of relationships between people in the episode"
  )
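
To illustrate how these two models are used, here is a minimal sketch that repeats the classes so it runs on its own, builds the JSON schema that is handed to the LLM, and validates a reply against it. The JSON strings are made-up examples, not real model output.

```python
from pydantic import BaseModel, Field, ValidationError


class Relation(BaseModel):
    person1: str = Field(description="The first person in the conversation")
    person2: str = Field(description="The second person in the conversation")
    relation: str = Field(description="The relationship between the two people")


class ResponseFormat(BaseModel):
    relations: list[Relation] = Field(description="A list of relationships")


# The JSON schema is what describes the expected output to the model
schema = ResponseFormat.model_json_schema()
print(list(schema["properties"]))

# Hypothetical reply from the model (invented for this example)
raw_reply = '{"relations": [{"person1": "Ned Stark", "person2": "Jon Snow", "relation": "father"}]}'
parsed = ResponseFormat.model_validate_json(raw_reply)
first = parsed.relations[0]
print(first.person1, first.person2, first.relation)

# A reply with missing fields is rejected instead of silently accepted
try:
    ResponseFormat.model_validate_json('{"relations": [{"person1": "Ned Stark"}]}')
    incomplete_rejected = False
except ValidationError:
    incomplete_rejected = True
```

Because the reply is validated, the rest of the notebook can rely on every relation having `person1`, `person2` and `relation`.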

#### Initialize the LLM

In [None]:
class LLM:
    def __init__(self, OpenAI_key=False, model=None, temperature=0):
        """
        Args:
            OpenAI_key (str, optional): If you provide a key, OpenAI will be used. Defaults to False.
            model (str, optional): The model to use. Defaults to None.
            temperature (float, optional): The temperature for generating text. Defaults to 0.
        """
        self.model = model
        self.temperature = temperature

        # For use with OpenAI
        if OpenAI_key:
            from openai import OpenAI

            self.client = OpenAI(api_key=OpenAI_key)
            self.openai = True
            self.ollama = False
            if not model:
                # Structured output needs a model that supports it (gpt-3.5-turbo does not)
                self.model = "gpt-4o-mini"

        # For use with Ollama
        else:
            import ollama

            self.llm = ollama
            self.ollama = True
            self.openai = False

    def generate(self, prompt, response_model: type[BaseModel] = None):
        messages = [{"role": "user", "content": prompt}]

        # For use with OpenAI
        if self.openai:
            if response_model:
                # parse() accepts a Pydantic class and fills in message.parsed.
                # Older SDK versions only expose it under client.beta.
                completions = self.client.chat.completions
                if not hasattr(completions, "parse"):
                    completions = self.client.beta.chat.completions
                chat_completion = completions.parse(
                    messages=messages,
                    model=self.model,
                    temperature=self.temperature,
                    response_format=response_model,
                )
                answer = chat_completion.choices[0].message.parsed
            else:
                chat_completion = self.client.chat.completions.create(
                    messages=messages, model=self.model, temperature=self.temperature
                )
                answer = chat_completion.choices[0].message.content

        # For use with Ollama
        else:
            # Ollama takes the JSON schema of the Pydantic model as `format`
            response_format = response_model.model_json_schema() if response_model else None
            answer = self.llm.chat(
                messages=messages,
                model=self.model,
                format=response_format,
                options={"temperature": self.temperature},
            ).message.content
            if response_model:
                # Validate against the model that was passed in
                answer = response_model.model_validate_json(answer)

        return answer

In [None]:
# Initialize the LLM class
llm = LLM(model='qwen3:14b')

#### Prepare the text
1. Import the text.
2. Split it into episodes.
3. Make a dictionary of the episodes like {episode_name: episode_text}.

In [None]:
# Read the whole transcript; the `with` block closes the file afterwards
with open('got.txt') as f:
    text = f.read()

chunks = {}
for chunk in text.split('Game of Thrones:')[:15]: # Limit to 15 chunks
    episode = chunk.split('\n')[0]
    if len(chunk) > 100: # Filter out short chunks
        chunks[episode] = chunk
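
To see what the splitting produces without the real file, here is a small sketch of the same logic wrapped in a function and run on a made-up sample. The function name `split_episodes`, its parameters and the sample text are illustrative assumptions, and the `.strip()` on the episode name is an addition that the cell above does not have.

```python
def split_episodes(text, marker="Game of Thrones:", limit=15, min_length=100):
    """Split a transcript on `marker` and return {episode_name: episode_text}."""
    chunks = {}
    for chunk in text.split(marker)[:limit]:
        # The first line after the marker is taken as the episode name
        episode = chunk.split("\n")[0].strip()
        if len(chunk) > min_length:  # Filter out short chunks, such as a file header
            chunks[episode] = chunk
    return chunks


# Made-up sample: a short header followed by two "episodes" of filler text
sample = (
    "Intro line\n"
    "Game of Thrones: Winter Is Coming\n" + "Some dialogue. " * 20 + "\n"
    "Game of Thrones: The Kingsroad\n" + "More dialogue. " * 20
)

sample_chunks = split_episodes(sample)
print(list(sample_chunks))
```

The text before the first marker becomes a chunk of its own, which is why the length filter matters: it drops that short header.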

#### Extract all relations from the chunks
1. Define a function to extract relations.
2. Try out a working prompt.
3. Loop through the chunks to create a list of relations.

In [None]:
from typing import List, Tuple

def extract_relations(chunk):
    prompt = f'''/no_think
    The text below is an episode of Game of Thrones. I want to extract all relations from it.\n
    """{chunk}"""\n
    Answer with all relations between characters. I ONLY want the relations between characters. Nothing else like greetings or explanations.
    '''
    answer: ResponseFormat = llm.generate(prompt, response_model=ResponseFormat)
    return answer.relations

all_relations: List[Tuple[str, Relation]] = []
for episode, chunk in chunks.items():
    relations = extract_relations(chunk)
    for relation in relations:
        all_relations.append((episode, relation))      


#### Get more information on every relation
1. Define a function to extract information about a relation.
2. Try out a working prompt.
3. Loop through the relations to add information to each.

In [None]:
import re
relations_to_graph = []
for episode, relation in all_relations:
    prompt = f'''/no_think
    In the text below {relation.person1} has a relation to {relation.person2} described as "{relation.relation}". I want to know more about this relation.\n
    """{chunks[episode]}"""\n
    Describe the relation between {relation.person1} and {relation.person2} in more detail. 
    Answer ONLY with the description, nothing else like a greeting or explanation. 
    Use ONLY the information given, not your own knowledge.
    '''
    info = llm.generate(prompt)
    # Remove the <think> tags and everything in between
    info = re.sub(r'<think>.*?</think>', '', info, flags=re.DOTALL).strip()
    print(info)
    relations_to_graph.append({'from': relation.person1, 'to': relation.person2, 'label': relation.relation, 'info': info})
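
The cleanup step in the loop above can be tried on its own. This is a minimal sketch with made-up replies; the helper name `strip_think` is hypothetical. It assumes the model wraps its reasoning in `<think>...</think>` tags, and that a model such as qwen3 may still emit an empty block even when `/no_think` is in the prompt.

```python
import re

# DOTALL lets "." match newlines, so multi-line reasoning is removed too
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text):
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


# Made-up replies for illustration
with_reasoning = "<think>\nLet me read the text.\n</think>\n\nNed Stark raised Jon Snow."
empty_block = "<think>\n\n</think>\n\nNed Stark raised Jon Snow."
no_block = "Ned Stark raised Jon Snow."

cleaned = [strip_think(reply) for reply in (with_reasoning, empty_block, no_block)]
print(cleaned)
```

Replies without any think block pass through unchanged, so the same cleanup is safe to use with OpenAI models as well.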

#### Prepare networkx
1. Install networkx with ```%pip install networkx```

#### Export a network file for use with Gephi
1. Import the networkx module.
2. Create the graph.
3. Export a .gexf file.

In [None]:
import networkx as nx

# Peek at the first relation to check its structure
print(relations_to_graph[0])

# Make a directed graph with one edge per (from, to) pair
G = nx.DiGraph()

for relation in relations_to_graph:
    G.add_edge(relation['from'], relation['to'], label=relation['label'], info=relation['info'])

nx.write_gexf(G, 'got.gexf')
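
One thing to be aware of: a `DiGraph` holds at most one edge per ordered pair of nodes, so if the same two characters show up in several episodes, only the last label and info are kept. The sketch below shows the difference from a `MultiDiGraph`, which keeps parallel edges. The sample relations are made up for illustration.

```python
import networkx as nx

# Made-up relations: the same pair appears twice with different labels
sample = [
    {"from": "Ned Stark", "to": "Jon Snow", "label": "father", "info": "From episode 1"},
    {"from": "Ned Stark", "to": "Jon Snow", "label": "protector", "info": "From episode 2"},
]

simple = nx.DiGraph()
multi = nx.MultiDiGraph()
for r in sample:
    simple.add_edge(r["from"], r["to"], label=r["label"], info=r["info"])
    multi.add_edge(r["from"], r["to"], label=r["label"], info=r["info"])

simple_edge_count = simple.number_of_edges()  # the second add_edge overwrote the first
multi_edge_count = multi.number_of_edges()    # both edges are kept
last_label = simple["Ned Stark"]["Jon Snow"]["label"]
print(simple_edge_count, multi_edge_count, last_label)
```

Which one fits better depends on whether you want one summarised edge per pair or one edge per episode.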




#### Inspect the network in Gephi Lite
Go to [Gephi Lite](https://gephi.org/gephi-lite/) and upload the .gexf file.

#### The relations can be used to create a chatbot about Game of Thrones
View my version on [lasseedfast.se/got](https://lasseedfast.se/got)  
*This is how it works:*
<br>  
![GoT workflow](/Users/Lasse/dataharvest2025/ArbetsflödeGoT.png "GoT workflow")