Knowledge Graph Generator – Full Process Documentation

1. Prepare a Text

Begin with a source text that will be used to generate a Knowledge Graph (KG).

2. Initialize a KnowledgeGraphGenerator Object

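import os

# `kg` is the package module, imported however your installation exposes it.
kggen = kg.KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),  # read the key from the environment
)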

Steps 3–5 generate a KG tailored to the specific text. These graphs remain faithful to the source but are difficult to compare across texts, because their entities and relations are context-specific. Steps 6 and 7 normalize them.

3. Initialize a New Graph

- If a text is provided, init_graph resets the generator's internal variables so that a new graph can be generated.
- If a JSON path is provided, it loads an existing graph:

kggen.init_graph(path=f"../../Ramban/json/{file_name}_c{chunk_size}.json")

4. Extract Entities

Uses an LLM to extract entities from the text. This function automatically calls init_graph, so Step 3 is not required if you call this function directly.

Load the text file into a variable and pass it to the function as the text argument.
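A minimal sketch of loading the text, assuming the source is a single UTF-8 file. The helper name load_text is hypothetical and not part of the package:

```python
from pathlib import Path


def load_text(path: str, encoding: str = "utf-8") -> str:
    """Read a whole text file into one string (hypothetical helper)."""
    return Path(path).read_text(encoding=encoding)
```

The returned string is what you would pass on as text=text in the call shown in this step.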

The chunk_size parameter sets the size of the segments into which the text is split before analysis:

kggen.extract_entities(text=text, chunk_size=chunk_size)

See Appendix A: Context Dilution for details.

5. Extract Relations

Uses an LLM to extract relations between the discovered entities.

kggen.extract_relations()

6. Converting Entities to Concepts

Step 4 may produce duplicate entities with the same meaning (due to orthographic, morphological, or synonymic variation). This step uses an LLM to merge them into concepts, where each concept has:

7. Relation Normalization

Step 5 may produce multiple relations with the same meaning. The package supports several approaches:

7.1 Using LLM to Merge Relations into Predicates

The LLM groups similar relations, assigns each cluster a canonical predicate, and lists alternative labels.

kggen.relations2ontology(["LLM"])
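To make the output shape concrete, here is a rough sketch of how such clusters could be applied to triples. The cluster contents, the dict layout, and the helper names are illustrative assumptions; the package's internal representation may differ:

```python
# Hypothetical grouping result: canonical predicate -> alternative labels.
clusters = {
    "is part of": ["is a part of", "belongs to"],
    "is located in": ["is in", "is situated in"],
}


def build_lookup(clusters):
    """Invert the clusters into a label -> canonical predicate table."""
    lookup = {}
    for canonical, alternatives in clusters.items():
        lookup[canonical] = canonical
        for label in alternatives:
            lookup[label] = canonical
    return lookup


def normalize_triples(triples, lookup):
    """Rewrite each predicate to its canonical form; unknown ones pass through."""
    return [(s, lookup.get(p, p), o) for s, p, o in triples]


lookup = build_lookup(clusters)
normalized = normalize_triples(
    [("altar", "belongs to", "temple"), ("altar", "faces", "gate")],
    lookup,
)
```

Predicates that the clusters do not cover are left unchanged, which keeps the rewrite lossless.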

7.2 Align Relations to an Ontology

Provide a known ontology (full or partial). The system compares the embedding of each detected relation with the embeddings of the ontology's relations and selects the closest match.

kggen.relations2ontology(["SKOS"])
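The closest-match idea can be sketched with cosine similarity over toy vectors. This assumes cosine similarity as the distance measure, which the documentation does not specify; the three-dimensional vectors are made up for illustration and the function names are hypothetical:

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_term(relation_vec, ontology_vecs):
    """Return (term, similarity) for the ontology term nearest to the relation."""
    scored = ((term, cosine(relation_vec, vec)) for term, vec in ontology_vecs.items())
    return max(scored, key=lambda pair: pair[1])


# Toy embeddings; real ones would come from an embedding model.
skos_vecs = {
    "skos:broader": [1.0, 0.0, 0.0],
    "skos:narrower": [0.0, 1.0, 0.0],
    "skos:related": [0.0, 0.0, 1.0],
}
term, score = closest_term([0.9, 0.1, 0.2], skos_vecs)
```

With these toy vectors the detected relation lands nearest to skos:broader.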

7.3 LLM Validation of Step 7.2

Planned: use an LLM to confirm or correct the embedding-based ontology mappings produced in Step 7.2.

7.4 Multiple Ontologies

Provide multiple ontologies to overcome the limitations of a single one. Each relation is mapped to the best-matching relation across all of the supplied ontologies.

kggen.relations2ontology(["CIDOC_CRM", "SKOS", "DUBLIN_CORE"])
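As a sketch of the cross-ontology selection, suppose each ontology has already proposed its own best term with a similarity score. The scores below are invented for illustration and the function name is hypothetical:

```python
def best_across_ontologies(candidates):
    """Pick the highest-scoring (ontology, term, score) among per-ontology bests.

    candidates maps an ontology name to a (term, similarity) pair.
    """
    name, (term, score) = max(candidates.items(), key=lambda item: item[1][1])
    return name, term, score


# Illustrative per-ontology best matches for one detected relation.
candidates = {
    "SKOS": ("skos:broader", 0.62),
    "DUBLIN_CORE": ("dcterms:isPartOf", 0.88),
    "CIDOC_CRM": ("P5_consists_of", 0.41),
}
winner = best_across_ontologies(candidates)
```

Here the Dublin Core term wins because it has the highest score of the three candidates.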

8. Visualization

Convert the normalized KG into a specific ontology and export to HTML for visualization:

ontology = "SKOS"  # Use "MIX" to visualize multiple ontologies
kggen.graph2Ontology(ontology)
viz = kggen.visualize(f"../../Ramban/vis/{file_name}_c{chunk_size}_ontology_{ontology}.html")

Appendix A: Context Dilution in Long Texts

When an LLM processes a long passage, attention is divided across many topics, leading to:

Shorter segments yield finer-grained concepts, but less global coherence.
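A minimal sketch of segmenting a text, assuming chunk_size counts whitespace-separated words. The package may measure chunks in characters or tokens instead, and chunk_words is a hypothetical helper:

```python
def chunk_words(text, chunk_size):
    """Split text into consecutive segments of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


chunks = chunk_words("one two three four five", 2)
```

Each segment would then be analyzed on its own, which is where the trade-off between granularity and global coherence comes from.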

Appendix B: Normalizing

Entities → Concepts

Entities are grouped into higher-level concepts. See entities2concepts for more details.
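The package performs this merge with an LLM. As a rough illustration of the orthographic case only, the sketch below groups entities whose spellings differ in case, accents, or spacing. It does not handle morphological or synonymic variation, and the helper names are hypothetical:

```python
import unicodedata
from collections import defaultdict


def normalize_key(entity):
    """Casefold, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", entity)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def group_entities(entities):
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(list)
    for entity in entities:
        groups[normalize_key(entity)].append(entity)
    return dict(groups)


groups = group_entities(["Café", "cafe", "CAFE ", "tea"])
```

Each resulting group is a candidate concept, with the normalized key standing in for a canonical label.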

Relations

Extracted predicates may vary in form:

"is placed on"
"is part of"
"contains"
"dwells on"
"is above"
"is in"

Common Issues

Normalization Strategy

  1. Define a canonical predicate vocabulary, aligned to SKOS or custom properties:
  2. Group variants into canonical predicates:
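The two steps above can be sketched as a small lookup table. The mapping below is an illustrative assumption, not the package's vocabulary. It also shows one design choice: an inverse variant such as "contains" can be folded into the same canonical predicate by swapping subject and object:

```python
# Illustrative vocabulary: surface predicate -> (canonical predicate, swap?).
# swap=True means the variant states the inverse direction.
CANONICAL = {
    "is part of": ("dcterms:isPartOf", False),
    "contains": ("dcterms:isPartOf", True),  # "A contains B" reads as "B is part of A"
}


def canonicalize(triple):
    """Map a (subject, predicate, object) triple onto the canonical vocabulary."""
    subject, predicate, obj = triple
    if predicate not in CANONICAL:
        return triple  # leave unknown predicates untouched
    canonical, swap = CANONICAL[predicate]
    return (obj, canonical, subject) if swap else (subject, canonical, obj)


result = canonicalize(("temple", "contains", "altar"))
```

Folding inverses into one predicate keeps the graph smaller. Keeping both directions (for example dcterms:hasPart alongside dcterms:isPartOf) is an equally valid choice.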

Methods

Reference Ontologies

Dublin Core

dcterms:isPartOf
dcterms:hasPart
dcterms:isVersionOf
dcterms:hasVersion
dcterms:isFormatOf
dcterms:hasFormat
dcterms:references
dcterms:isReferencedBy
dcterms:relation

SKOS

skos:broader
skos:narrower
skos:related
skos:exactMatch
skos:closeMatch
skos:broadMatch
skos:narrowMatch
skos:relatedMatch

CIDOC CRM (selected)

P1_is_identified_by
P2_has_type
P3_has_note
P4_has_time_span
P5_consists_of
P7_took_place_at
P8_took_place_on_or_within
P10_falls_within
P11_had_participant
P12_occurred_in_the_presence_of
P13_destroyed
P14_carried_out_by
P15_was_influenced_by
P16_used_specific_object
P17_was_motivated_by
P19_was_intended_use_of
P20_had_specific_purpose
P21_had_general_purpose
P25_moved
P26_moved_to
P27_moved_from
P31_has_modified
P35_has_identified
P37_assigned
P38_deassigned
P39_measured
P40_observed_dimension
P43_has_dimension
P90_has_value
P94_has_created
P96_by_mother
P97_from_father
P98_brought_into_life
P100_was_death_of
P102_has_title
P104_is_subject_to
P105_right_held_by
P127_has_broader_term

Ontology Mapping Process

The system maps relations into existing ontologies: