Begin with a source text that will be used to generate a Knowledge Graph (KG).
import os

kggen = kg.KnowledgeGraphGenerator(
    model="openai/gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),  # read the API key from the environment
)
Steps 3–5 generate a KG tailored to the specific text. While these graphs remain faithful to the source, they are difficult to compare across texts since entities and relations are context-specific.
- If a text is provided, the function resets internal variables for new graph generation.
- If a JSON path is provided, it loads an existing graph:
kggen.init_graph(path=f"../../Ramban/json/{file_name}_c{chunk_size}.json")
Uses an LLM to extract entities from the text. This function automatically calls init_graph, so Step 3 is not required if you call this function directly.
Load the text file into a variable and pass it to the function.
The chunk_size parameter determines how the text is analyzed:
- chunk_size=0: process the text as a single chunk (may dilute context).
- chunk_size>0: split into multiple chunks, generate subgraphs, and merge them.
kggen.extract_entities(text=text, chunk_size=chunk_size)
See Appendix A: Context Dilution for details.
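As a rough illustration of what chunk_size controls, the sketch below splits a text into fixed-size pieces. It is not the package's implementation: the helper name split_into_chunks is hypothetical, and measuring chunk_size in characters is an assumption (the package may count words, sentences, or tokens instead).

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Hypothetical helper: split text into pieces of at most chunk_size characters.

    chunk_size == 0 follows the convention described above and keeps the
    whole text as a single chunk.
    """
    if chunk_size < 0:
        raise ValueError("chunk_size must be non-negative")
    if chunk_size == 0:
        return [text]
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_into_chunks("abcdefghij", 4)
```

In a real pipeline each chunk would be sent to the LLM separately and the resulting subgraphs merged; splitting on sentence or paragraph boundaries would likely preserve more context than a fixed character stride.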
Uses an LLM to extract relations between the discovered entities.
kggen.extract_relations()
Step 4 may produce duplicate entities with the same meaning (due to orthographic, morphological, or synonymic variation). This step uses an LLM to merge them into concepts, where each concept has:
Step 5 may produce multiple relations with the same meaning. The package supports several approaches:
The LLM groups similar relations, assigns each cluster a canonical predicate, and lists alternative labels.
kggen.relations2ontology(["LLM"])
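One way to picture the output of this approach is a list of clusters, each holding a canonical predicate and its alternative labels. The sketch below is only an assumption about a convenient shape for that data: RelationCluster and build_lookup are hypothetical names and do not describe the package's internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class RelationCluster:
    """Hypothetical container for one group of equivalent relations."""
    canonical: str
    variants: list[str] = field(default_factory=list)


def build_lookup(clusters: list[RelationCluster]) -> dict[str, str]:
    """Map every surface form (canonical or variant) to its canonical predicate."""
    lookup: dict[str, str] = {}
    for cluster in clusters:
        lookup[cluster.canonical] = cluster.canonical
        for variant in cluster.variants:
            lookup[variant] = cluster.canonical
    return lookup


# Illustrative clusters; in practice the grouping comes from the LLM.
clusters = [
    RelationCluster("contains", ["is in", "is placed in"]),
    RelationCluster("isPartOf", ["is part of", "belongs to"]),
]
lookup = build_lookup(clusters)
```

With such a lookup, any extracted predicate can be rewritten to its canonical form before the graph is compared with others.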
Provide a known ontology (full or partial). The system compares embeddings of detected relations to ontology relations and selects the closest match.
kggen.relations2ontology(["SKOS"])
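The embedding comparison can be sketched as a nearest-neighbor search. The example below assumes cosine similarity as the distance measure, which the text does not specify, and uses invented three-dimensional vectors in place of real embeddings. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_ontology_relation(
    relation_vec: list[float], ontology_vecs: dict[str, list[float]]
) -> str:
    """Return the ontology relation whose vector is most similar to relation_vec."""
    return max(
        ontology_vecs,
        key=lambda name: cosine_similarity(relation_vec, ontology_vecs[name]),
    )


# Toy vectors, invented for illustration only.
skos_vectors = {
    "skos:broader": [1.0, 0.0, 0.0],
    "skos:narrower": [0.0, 1.0, 0.0],
    "skos:related": [0.0, 0.0, 1.0],
}
detected = [0.9, 0.1, 0.2]  # stands in for the embedding of a detected relation
best = closest_ontology_relation(detected, skos_vectors)
```

Real embeddings have hundreds of dimensions, and a minimum-similarity threshold may be worth adding so that poor matches are left unmapped rather than forced onto the nearest relation.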
(Planned): Use an LLM to confirm or correct embedding-based ontology mappings.
Provide multiple ontologies to overcome limitations of a single one. Each relation is mapped to the best matching relation across ontologies.
kggen.relations2ontology(["CIDOC_CRM", "SKOS", "DUBLIN_CORE"])
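Selecting the best match across several ontologies amounts to taking the highest-scoring candidate overall. The sketch below assumes that a similarity score has already been computed for each candidate relation. The function name and the scores are invented for illustration and are not part of the package.

```python
def best_across_ontologies(
    scores_by_ontology: dict[str, dict[str, float]]
) -> tuple[str, str, float]:
    """Return (ontology, relation, score) for the highest score over all ontologies."""
    best = None
    for ontology, scores in scores_by_ontology.items():
        for relation, score in scores.items():
            if best is None or score > best[2]:
                best = (ontology, relation, score)
    if best is None:
        raise ValueError("no candidate relations supplied")
    return best


# Hypothetical similarity scores for one detected relation.
candidates = {
    "SKOS": {"skos:related": 0.41, "skos:broader": 0.55},
    "DUBLIN_CORE": {"dcterms:isPartOf": 0.83, "dcterms:relation": 0.47},
    "CIDOC_CRM": {"P5_consists_of": 0.62},
}
winner = best_across_ontologies(candidates)
```

Keeping the ontology name alongside the relation makes it possible to tell afterwards which vocabulary each mapped predicate came from.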
Convert the normalized KG into a specific ontology and export to HTML for visualization:
ontology = "SKOS" # Use "MIX" to visualize multiple ontologies
kggen.graph2Ontology(ontology)
viz = kggen.visualize(f"../../Ramban/vis/{file_name}_c{chunk_size}_ontology_{ontology}.html")
When an LLM processes a long passage, attention is divided across many topics, leading to:
Shorter segments yield finer-grained concepts, but less global coherence.
Entities are grouped into higher-level concepts. See entities2concepts for more details.
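To make the idea of grouping concrete, the sketch below merges entities that differ only in case or whitespace. This is a deliberately simple stand-in, not the package's method: the package uses an LLM, which can also merge morphological and synonymic variants that string normalization misses. The function name and the sample entities are invented.

```python
from collections import defaultdict


def group_entities(entities: list[str]) -> dict[str, list[str]]:
    """Group entity strings under a key that ignores case and extra whitespace."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        key = " ".join(entity.casefold().split())
        groups[key].append(entity)
    return dict(groups)


# Sample entities invented for illustration.
groups = group_entities(["Ark", "ark", "The  Ark", "Tabernacle"])
```

Here "Ark" and "ark" fall into one group, while "The  Ark" stays separate, which shows why purely orthographic matching is not enough on its own.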
Extracted predicates may vary in form:
"is placed on"
"is part of"
"contains"
"dwells on"
"is above"
"is in"
- skos:broader / skos:narrower → hierarchical
- skos:related → associative
- (ex:contains, ex:isIn, ex:dwellsOn)
- contains → Variants: is in, is placed in
- isPartOf → Variants: is part of, belongs to
- isAbove → Variants: is over, stands upon
dcterms:isPartOf
dcterms:hasPart
dcterms:isVersionOf
dcterms:hasVersion
dcterms:isFormatOf
dcterms:hasFormat
dcterms:references
dcterms:isReferencedBy
dcterms:relation
skos:broader
skos:narrower
skos:related
skos:exactMatch
skos:closeMatch
skos:broadMatch
skos:narrowMatch
skos:relatedMatch
P1_is_identified_by
P2_has_type
P3_has_note
P4_has_time_span
P5_consists_of
P7_took_place_at
P8_took_place_on_or_within
P10_falls_within
P11_had_participant
P12_occurred_in_the_presence_of
P13_destroyed
P14_carried_out_by
P15_was_influenced_by
P16_used_specific_object
P17_was_motivated_by
P19_was_intended_use_of
P20_had_specific_purpose
P21_had_general_purpose
P25_moved
P26_moved_to
P27_moved_from
P31_has_modified
P35_has_identified
P37_assigned
P38_deassigned
P39_measured
P40_observed_dimension
P43_has_dimension
P90_has_value
P94_has_created
P96_by_mother
P97_from_father
P98_brought_into_life
P100_was_death_of
P102_has_title
P104_is_subject_to
P105_right_held_by
P127_has_broader_term
The system maps relations into existing ontologies:
Predicate objects with:
skos:broader, dcterms:isPartOf)