Object: Document Objects

Purpose:

This module provides the implementation for the project’s Document object(s).

These objects contain a document’s page contents as well as any metadata associated with the document.

Important

The Document class is a docp-based implementation of LangChain’s Document object, written to reduce library dependencies and to give the project the flexibility to configure the object as needed.

However, this object must be (and remain) compatible with LangChain’s text splitters and Chroma objects, because Document instances are passed directly into them.
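In practice, the compatibility requirement comes down to exposing the same two attributes, page_content and metadata, that a consumer reads from each document. The sketch below illustrates that attribute contract only. StandInDocument and split_on_blank_lines are hypothetical names invented for this example; they are not part of docp or LangChain, and the stand-in class is not the project's implementation.

```python
class StandInDocument:
    """Hypothetical stand-in exposing the two documented attributes."""

    def __init__(self, page_content: str, *, metadata: dict = None):
        self.page_content = page_content
        # Assumption for this sketch: missing metadata becomes an empty dict.
        self.metadata = metadata if metadata is not None else {}


def split_on_blank_lines(docs):
    """Hypothetical consumer that reads only .page_content and .metadata."""
    chunks = []
    for doc in docs:
        for part in doc.page_content.split("\n\n"):
            # Each chunk carries a copy of its parent document's metadata.
            chunks.append((part, dict(doc.metadata)))
    return chunks


doc = StandInDocument(
    "First paragraph.\n\nSecond paragraph.",
    metadata={"source": "example.pdf"},
)
chunks = split_on_blank_lines([doc])
```

Any object that provides these two attributes works with such a consumer, which is why the attribute names and types need to stay stable.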

Platform:

Linux/Windows | Python 3.11+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

class Document(page_content: str, *, metadata: dict = None)

Bases: object

Object used to store a document’s content and metadata.

Parameters:
  • page_content (str) – A single string containing a page’s text content.

  • metadata (dict, optional) – Any metadata to be associated with the document. Defaults to None.

property metadata: dict

Accessor to a document’s metadata.

property page_content: str

Accessor to a document’s page contents as a single string.
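The signature and the two accessors above can be sketched as follows. This is an illustration under stated assumptions, not the project's source: the class name DocumentSketch is hypothetical, the accessors are assumed to be read-only, and a metadata value of None is assumed to be stored as an empty dict.

```python
class DocumentSketch:
    """Illustrative sketch of the documented interface (hypothetical class)."""

    def __init__(self, page_content: str, *, metadata: dict = None):
        self._page_content = page_content
        # Assumption: None is replaced by an empty dict so that the
        # metadata accessor always returns a dict.
        self._metadata = metadata if metadata is not None else {}

    @property
    def metadata(self) -> dict:
        """Accessor to the document's metadata."""
        return self._metadata

    @property
    def page_content(self) -> str:
        """Accessor to the document's page contents as a single string."""
        return self._page_content


doc = DocumentSketch("Text of page one.", metadata={"source": "report.pdf", "page": 1})
text = doc.page_content
meta = doc.metadata

# The bare * in the signature makes metadata keyword-only, so passing it
# positionally raises a TypeError.
try:
    DocumentSketch("Text of page one.", {"source": "report.pdf"})
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

The keyword-only behaviour follows from the documented signature; the remaining details are design choices made for this sketch.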