fm_index
An FM-index for efficient full-text search on a single string.
The FM-index is a compressed text index based on the Burrows–Wheeler Transform (BWT).
It supports fast substring queries whose runtime depends only on the pattern length,
not on the size of the indexed text.
Construction
Time / Space Complexity
- Time:
O(N log σ) - Space:
O(N log σ)
where:
N= length of the indexed stringσ= size of the alphabet (2⁸ for UCS-1, 2¹⁶ for UCS-2, etc.)
from fm_index import FMIndex
fm = FMIndex("mississippi")
Convert the FMIndex back into the original string.
Complexity
- Time:
O(N log σ) - Space:
O(N)
Examples
fm.item()
# 'mississippi'
Check whether the indexed string contains the given pattern.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
fm.contains("issi")
# True
Count how many times a pattern appears in the indexed string.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
fm.count("issi")
# 2
Locate all starting positions of the pattern in the indexed string.
⚠️ Order of returned positions is not guaranteed.
Complexity
- Time:
O((|pattern| + |count|) log σ) - Space:
O(|pattern| + |count|)
Examples
fm.locate("issi")
# [4, 1]
A multi-document FM-index for fast substring search across multiple strings.
Internally, all strings are concatenated with separators and indexed as a single FM-index,
while preserving the ability to map matches back to their original documents.
Construction
Time / Space Complexity
- Time:
O(S log σ) - Space:
O(S log σ)
where:
S= total length of all indexed stringsσ= size of the alphabet (2⁸ for UCS-1, 2¹⁶ for UCS-2, etc.)
from fm_index import MultiFMIndex
mfm = MultiFMIndex(["abcabcabcabc", "xxabcabcxxabc", "abcababcabc"])
Convert the index back into the original list of strings.
Complexity
- Time:
O(S log σ) - Space:
O(S)
Examples
mfm.item()
# ['abcabcabcabc', 'xxabcabcxxabc', 'abcababcabc']
Check if the pattern exists as a full document in the index.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
mfm.contains("abcabcabcabc")
# True
Count total occurrences of a pattern across all documents.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
mfm.count_all("abc")
# 10
Count occurrences per document.
Returns {doc_index: count}.
Complexity
- Time:
O((|pattern| + |total_count|) log σ) - Space:
O(|pattern| + |output|)
Examples
mfm.count("abc")
# {0: 4, 1: 3, 2: 3}
Locate occurrences per document.
⚠️ Order is not guaranteed.
Complexity
- Time:
O((|pattern| + |total_count|) log σ) - Space:
O(|pattern| + |total_count|)
Examples
mfm.locate("abc")
# {0: [9, 6, 3, 0], 1: [10, 2, 5], 2: [8, 0, 5]}