Utilities
- Purpose:
This module provides utility-based functionality for the project.
- Platform:
Linux/Windows | Python 3.10+
- Developer:
J Berendt
- Email:
- Comments:
n/a
- class Utilities
Bases:
object
General (cross-project) utility functions.
- static build_project_outpath(subpath: str) → str
Build (and create) the path for project output files.
- Parameters:
subpath (str) – The sub-path to be appended to the default path (~/Desktop/docp). If the resulting path does not exist, it is created automatically.
- Returns:
The full path as a string.
- Return type:
str
- static collect_files(path: str, ext: str = '**', recursive: bool = False) → list
Collect all files for a given extension from a path.
- Parameters:
path (str) – Full path serving as the root for the search.
ext (str, optional) – If the path argument refers to a directory, a specific file extension can be specified here. For example: ext = 'pdf'.
If anything other than '**' is provided, all alpha-characters are parsed from the string and prefixed with '*.'. That is, if '.pdf' is passed, the characters 'pdf' are parsed and prefixed with '*.' to create '*.pdf'. However, if 'things.foo' is passed, the derived extension will be '*.thingsfoo'. Defaults to '**', for an 'everything' or recursive search (if the recursive argument is passed as True).
recursive (bool, optional) – Instruct the search to recurse into sub-directories. Defaults to False.
- Returns:
The list of full file paths returned by the glob call. Any directory-only paths are removed.
- Return type:
list
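The extension handling described for ext can be illustrated with a short sketch. The names derive_pattern and collect_files_sketch are hypothetical, and the way the pattern is joined to the path for a recursive search is an assumption; only the alpha-character parsing and the removal of directory-only paths come from the description above.

```python
import os
import re
from glob import glob


def derive_pattern(ext: str = '**') -> str:
    """Turn an extension argument into a glob pattern."""
    if ext == '**':
        return '**'
    # Keep only the alphabetic characters, then prefix with '*.'.
    letters = ''.join(re.findall(r'[A-Za-z]', ext))
    return f'*.{letters}'


def collect_files_sketch(path: str, ext: str = '**', recursive: bool = False) -> list:
    """Collect files under ``path``, dropping directory-only results."""
    pattern = derive_pattern(ext)
    if recursive and pattern != '**':
        # Assumption: '**' is inserted so sub-directories are searched.
        search = os.path.join(path, '**', pattern)
    else:
        search = os.path.join(path, pattern)
    found = glob(search, recursive=recursive)
    return [f for f in found if not os.path.isdir(f)]


print(derive_pattern('.pdf'))        # *.pdf
print(derive_pattern('things.foo'))  # *.thingsfoo
```

The second call shows why a bare extension such as 'pdf' is the safer argument: any letters before the dot become part of the derived extension.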
- static ispdf(path: str) → bool
Test the file signature. Verify this is a valid PDF file.
- Parameters:
path (str) – Path to the file being tested.
- Returns:
True if this is a valid PDF file, otherwise False.
- Return type:
bool
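A file-signature test of this kind can be sketched as below. It assumes the check compares the leading bytes of the file against the PDF header marker %PDF-; the project's actual comparison may differ, and the name ispdf_sketch is hypothetical.

```python
def ispdf_sketch(path: str) -> bool:
    """Return True if the file begins with the PDF header marker."""
    with open(path, 'rb') as f:
        # A PDF file starts with '%PDF-' followed by the version number.
        return f.read(5) == b'%PDF-'
```

Because only the first few bytes are read, the check is cheap even for large files, but it says nothing about whether the rest of the document is well formed.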
- static iszip(path: str) → bool
Test the file signature. Verify this is a valid ZIP archive.
- Parameters:
path (str) – Path to the file being tested.
- Returns:
True if this is a valid ZIP archive, otherwise False.
- Return type:
bool
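The same idea applies to ZIP archives. The sketch below assumes the check compares the first four bytes against the common ZIP signatures (local file header, empty archive, spanned archive); which of these the project accepts is not stated, and the name iszip_sketch is hypothetical.

```python
# Common ZIP signatures: regular, empty, and spanned archives.
ZIP_SIGNATURES = (b'PK\x03\x04', b'PK\x05\x06', b'PK\x07\x08')


def iszip_sketch(path: str) -> bool:
    """Return True if the file begins with a known ZIP signature."""
    with open(path, 'rb') as f:
        return f.read(4) in ZIP_SIGNATURES
```

The standard library's zipfile.is_zipfile(path) is an alternative that performs its own validity check rather than looking only at the leading bytes.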
- static parse_to_keywords(resp: str) → str
Parse the bot’s response into a list of keywords.
- Parameters:
resp (str) – Text response directly from the bot.
The bullet points to be extracted must be in one of the following forms.
Asterisk as bullet points:
* Spam
* Eggs
Hyphen as bullet points:
- Spam
- Eggs
Numbered (1):
1. Spam
2. Eggs
Numbered (2):
1) Spam
2) Eggs
- Returns:
A comma-separated string of keywords extracted from the response, converted to lower case.
- Return type:
str
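One way to extract keywords from asterisk, hyphen, and numbered bullets is a single multi-line regular expression, sketched below. The function name, the pattern, and the ', ' separator are assumptions made for illustration; the source only states that the result is comma separated and lower case.

```python
import re

# A bullet line: optional indent, then '*', '-', 'N.' or 'N)', then the text.
_BULLET = re.compile(r'^[ \t]*(?:[*-]|\d+[.)])[ \t]+(.+?)[ \t]*$', re.MULTILINE)


def parse_to_keywords_sketch(resp: str) -> str:
    """Return the bullet-point text as a lower-case, comma-separated string."""
    items = _BULLET.findall(resp)
    # Assumption: keywords are joined with a comma followed by a space.
    return ', '.join(item.lower() for item in items)


print(parse_to_keywords_sketch('Keywords:\n* Spam\n* Eggs'))  # spam, eggs
```

Lines that do not start with a bullet marker, such as an introductory sentence, are ignored.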
- static remove_duplicate_lines(text: str) → str
Remove any duplicated lines from the document.
Generally, this function will be used to remove repeated headers and footers from a document.
- Parameters:
text (str) – A string containing text from which duplicated lines are to be removed.
- Returns:
A string containing only the unique lines (or empty lines) from the provided text.
- Return type:
str
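A minimal sketch of this kind of de-duplication follows. It assumes the first occurrence of each line is kept, later repeats are dropped, and empty lines always pass through; whether the project keeps the first occurrence or handles repeats differently is not stated. The name remove_duplicate_lines_sketch is hypothetical.

```python
def remove_duplicate_lines_sketch(text: str) -> str:
    """Drop repeated non-empty lines, keeping the first occurrence."""
    seen = set()
    kept = []
    for line in text.split('\n'):
        # Empty (or whitespace-only) lines are always kept so that
        # paragraph spacing survives.
        if not line.strip() or line not in seen:
            kept.append(line)
            seen.add(line)
    return '\n'.join(kept)
```

For a document with a header repeated on every page, only the first header line survives, while the blank lines separating the pages remain.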