Utilities

Purpose:

This module provides utility-based functionality for the project.

Platform:

Linux/Windows | Python 3.10+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

class Utilities

Bases: object

General (cross-project) utility functions.

static build_project_outpath(subpath: str) → str

Build (and create) the path for project output files.

Parameters:
  • subpath (str) – The sub-path to be appended to the default path (~/Desktop/docp).

If the path does not exist, it will be created automatically.

Returns:

The full path as a string.

Return type:

str
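The behaviour described above can be sketched as follows. This is a minimal illustration, not the project's implementation: build_outpath_sketch and its base parameter are hypothetical names, and the use of os.makedirs is an assumption about how the directory is created.

```python
import os


def build_outpath_sketch(subpath: str, base: str = '~/Desktop/docp') -> str:
    """Append subpath to base, create the directory if missing, return it."""
    # Expand '~' to the current user's home directory.
    root = os.path.expanduser(base)
    path = os.path.join(root, subpath)
    # exist_ok=True makes this a no-op when the path already exists.
    os.makedirs(path, exist_ok=True)
    return path
```

Under these assumptions, calling the sketch twice with the same sub-path returns the same string both times, and the directory is only created on the first call.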

static collect_files(path: str, ext: str = '**', recursive: bool = False) → list

Collect all files for a given extension from a path.

Parameters:
  • path (str) – Full path serving as the root for the search.

  • ext (str, optional) –

    If the path argument refers to a directory, a specific file extension can be specified here. For example: ext = 'pdf'.

    If anything other than '**' is provided, all alphabetic characters are parsed from the string and prefixed with '*.'. For example, if '.pdf' is passed, the characters 'pdf' are parsed and prefixed with '*.' to create '*.pdf'. However, if 'things.foo' is passed, the derived extension will be '*.thingsfoo'. Defaults to '**', for an 'everything' search, or a recursive search if the recursive argument is passed as True.

  • recursive (bool, optional) – Instruct the search to recurse into sub-directories. Defaults to False.

Returns:

The list of full file paths returned by the glob call. Any directory-only paths are removed.

Return type:

list
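The extension handling and directory filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: derive_pattern and collect_files_sketch are hypothetical names, and the way the pattern is joined onto the path for a recursive search is a guess.

```python
import os
from glob import glob


def derive_pattern(ext: str = '**') -> str:
    """Turn the ext argument into a glob pattern, as described above."""
    if ext == '**':
        return ext
    # Keep alphabetic characters only, then prefix with '*.'.
    return '*.' + ''.join(c for c in ext if c.isalpha())


def collect_files_sketch(path: str, ext: str = '**', recursive: bool = False) -> list:
    """Glob for files under path and drop directory-only results."""
    pattern = derive_pattern(ext)
    if recursive and pattern != '**':
        # Assumed layout: '<path>/**/*.ext' descends into sub-directories.
        search = os.path.join(path, '**', pattern)
    else:
        search = os.path.join(path, pattern)
    # Directories can match the pattern too, so keep regular files only.
    return [f for f in glob(search, recursive=recursive) if os.path.isfile(f)]
```

Note that str.isalpha also accepts non-ASCII letters, and that digits are dropped, so an extension such as 'mp4' would become '*.mp' in this sketch. Whether the library behaves the same way for digits is not stated in the description.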

static ispdf(path: str) → bool

Test the file signature. Verify this is a valid PDF file.

Parameters:

path (str) – Path to the file being tested.

Returns:

True if this is a valid PDF file, otherwise False.

Return type:

bool
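A file-signature test of this kind can be sketched as follows. The sketch assumes the check compares the first five bytes of the file with the PDF header marker '%PDF-'; ispdf_sketch is a hypothetical name, and the library may inspect more of the file than this.

```python
def ispdf_sketch(path: str) -> bool:
    """Return True if the file starts with the PDF header marker."""
    # Read in binary mode so no text decoding is applied to the header.
    with open(path, 'rb') as f:
        return f.read(5) == b'%PDF-'
```

A signature match only shows that the file claims to be a PDF. It does not guarantee that the rest of the document is well-formed.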

static iszip(path: str) → bool

Test the file signature. Verify this is a valid ZIP archive.

Parameters:

path (str) – Path to the file being tested.

Returns:

True if this is a valid ZIP archive, otherwise False.

Return type:

bool
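For ZIP archives, a signature test might look like the sketch below. It assumes the check reads the first four bytes and accepts the three 'PK' signatures defined by the ZIP format; iszip_sketch and ZIP_SIGNATURES are hypothetical names, not part of the library.

```python
# Leading signatures used by the ZIP format:
# local file header, empty archive, and spanned archive.
ZIP_SIGNATURES = (b'PK\x03\x04', b'PK\x05\x06', b'PK\x07\x08')


def iszip_sketch(path: str) -> bool:
    """Return True if the file starts with a known ZIP signature."""
    with open(path, 'rb') as f:
        return f.read(4) in ZIP_SIGNATURES
```

The standard library's zipfile.is_zipfile offers a different check: it looks for the end-of-central-directory record rather than the leading bytes, so it also accepts archives with data prepended to them.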

static parse_to_keywords(resp: str) → str

Parse the bot’s response into a comma-separated string of keywords.

Parameters:

resp (str) – Text response directly from the bot.

The bullet points extracted must be in any of the following forms.

Asterisk as bullet points:

    * Spam

    * Eggs

Hyphen as bullet points:

    - Spam

    - Eggs

Numbered (1):

    1. Spam

    2. Eggs

Numbered (2):

    1) Spam

    2) Eggs

Returns:

A comma-separated string of keywords extracted from the response, converted to lower case.

Return type:

str
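One way to extract the four bullet forms (asterisk, hyphen, '1.' and '1)') is a single multi-line regular expression, sketched below. This is an assumption about the approach, not the library's code: parse_to_keywords_sketch and BULLET are hypothetical names, and the ', ' separator is a guess, since the description only says "comma-separated".

```python
import re

# A bullet is '*', '-', or digits followed by '.' or ')', then whitespace.
BULLET = re.compile(r'^[ \t]*(?:[*-]|\d+[.)])[ \t]+(.+?)[ \t]*$', re.MULTILINE)


def parse_to_keywords_sketch(resp: str) -> str:
    """Collect bulleted items from resp as a lower-case, comma-separated string."""
    # Lines without a recognised bullet prefix are ignored.
    items = BULLET.findall(resp)
    return ', '.join(item.lower() for item in items)
```

Under these assumptions, any introductory sentence in the response is skipped because it carries no bullet prefix, and only the bulleted items reach the output.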

static remove_duplicate_lines(text: str) → str

Remove any duplicated lines from the document.

Generally, this function will be used to remove repeated headers and footers from a document.

Parameters:

text (str) – A string containing text from which duplicated lines are to be removed.

Returns:

A string containing only the unique lines (or empty lines) from the provided text.

Return type:

str
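The de-duplication described above can be sketched as follows. The sketch assumes that the first occurrence of each line is kept, that later repeats are dropped, and that empty lines are always preserved; remove_duplicate_lines_sketch is a hypothetical name, and the library may treat repeats differently.

```python
def remove_duplicate_lines_sketch(text: str) -> str:
    """Drop repeated lines, keeping first occurrences and all empty lines."""
    seen = set()
    kept = []
    for line in text.split('\n'):
        # Empty (or whitespace-only) lines are kept to preserve paragraph breaks.
        if not line.strip() or line not in seen:
            kept.append(line)
            seen.add(line)
    return '\n'.join(kept)
```

With a page header repeated across a document, this keeps the header once, at its first position, and leaves the body text and blank lines in their original order.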