Metadata-Version: 2.1
Name: sequentia
Version: 2.0.0
Summary: Scikit-Learn compatible HMM and DTW based sequence machine learning algorithms in Python.
Home-page: https://github.com/eonu/sequentia
License: MIT
Keywords: python,machine-learning,time-series,hmm,hidden-markov-models,dtw,dynamic-time-warping,knn,k-nearest-neighbors,sequence-classification,time-series-classification,multivariate-time-series,variable-length,classification-algorithms
Author: Edwin Onuonga
Author-email: ed@eonu.net
Maintainer: Edwin Onuonga
Maintainer-email: ed@eonu.net
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Framework :: Pydantic :: 2
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: dtaidistance (>=2.3.10,<3.0.0)
Requires-Dist: hmmlearn (>=0.2.8,<1)
Requires-Dist: joblib (>=1.2,<2.0)
Requires-Dist: numba (>=0.56,<1)
Requires-Dist: numpy (>=1.19.5,<2.0.0)
Requires-Dist: pydantic (>=2,<3)
Requires-Dist: scikit-learn (>=1.4,<2.0)
Requires-Dist: scipy (>=1.6,<2.0)
Project-URL: Documentation, https://sequentia.readthedocs.io/en/latest
Project-URL: Repository, https://github.com/eonu/sequentia
Description-Content-Type: text/markdown

<p align="center">
  <h1 align="center">
    <img src="https://raw.githubusercontent.com/eonu/sequentia/master/docs/source/_static/images/logo.png" width="75px"><br/>
    Sequentia
  </h1>
</p>

<p align="center">
  <em>Scikit-Learn compatible HMM and DTW based sequence machine learning algorithms in Python.</em>
</p>

<p align="center">
  <div align="center">
    <a href="https://pypi.org/project/sequentia">
      <img src="https://img.shields.io/pypi/v/sequentia?logo=pypi&style=flat-square" alt="PyPI"/>
    </a>
    <a href="https://pypi.org/project/sequentia">
      <img src="https://img.shields.io/pypi/pyversions/sequentia?logo=python&style=flat-square" alt="PyPI - Python Version"/>
    </a>
    <a href="https://sequentia.readthedocs.io/en/latest">
      <img src="https://img.shields.io/readthedocs/sequentia.svg?logo=read-the-docs&style=flat-square" alt="Read The Docs - Documentation">
    </a>
    <a href="https://coveralls.io/github/eonu/sequentia">
      <img src="https://img.shields.io/coverallsCoverage/github/eonu/sequentia?logo=coveralls&style=flat-square" alt="Coveralls - Coverage"/>
    </a>
    <a href="https://raw.githubusercontent.com/eonu/sequentia/master/LICENSE">
      <img src="https://img.shields.io/pypi/l/sequentia?style=flat-square" alt="PyPI - License"/>
    </a>
  </div>
</p>

<p align="center">
  <sup>
    <a href="#about">About</a> ·
    <a href="#build-status">Build Status</a> ·
    <a href="#features">Features</a> ·
    <a href="#installation">Installation</a> ·
    <a href="#documentation">Documentation</a> ·
    <a href="#examples">Examples</a> ·
    <a href="#acknowledgments">Acknowledgments</a> ·
    <a href="#references">References</a> ·
    <a href="#contributors">Contributors</a> ·
    <a href="#licensing">Licensing</a>
  </sup>
</p>

## About

Sequentia is a Python package that provides various classification and regression algorithms for sequential data, including methods based on hidden Markov models and dynamic time warping.

Some examples of how Sequentia can be used on sequence data include:

- determining a spoken word based on its audio signal or alternative representations such as MFCCs,
- predicting motion intent for gesture control from sEMG signals,
- classifying hand-written characters according to their pen-tip trajectories.

### Why Sequentia?

- **Simplicity and interpretability**: Sequentia offers a limited set of machine learning algorithms, chosen specifically to be more interpretable and easier to configure than more complex alternatives such as recurrent neural networks and transformers, while maintaining a high level of effectiveness.
- **Familiar and user-friendly**: To fit more seamlessly into the workflow of data science practitioners, Sequentia follows the ubiquitous Scikit-Learn API, providing a familiar model development process for many, as well as enabling wider access to the rapidly growing Scikit-Learn ecosystem.

## Build Status

| `master`                                                                                                                                                                                                 | `dev`                                                                                                                                                                                                      |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [![CircleCI Build (Master)](https://img.shields.io/circleci/build/github/eonu/sequentia/master?logo=circleci&style=flat-square)](https://app.circleci.com/pipelines/github/eonu/sequentia?branch=master) | [![CircleCI Build (Development)](https://img.shields.io/circleci/build/github/eonu/sequentia/dev?logo=circleci&style=flat-square)](https://app.circleci.com/pipelines/github/eonu/sequentia?branch=dev)    |

## Features

### Models

The following models provided by Sequentia all support variable-length sequences.

#### [Dynamic Time Warping + k-Nearest Neighbors](https://sequentia.readthedocs.io/en/latest/sections/models/knn/index.html) (via [`dtaidistance`](https://github.com/wannesm/dtaidistance))

- [x] Classification
- [x] Regression
- [x] Multivariate real-valued observations
- [x] Sakoe–Chiba band global warping constraint
- [x] Dependent and independent feature warping (DTWD/DTWI)
- [x] Custom distance-weighted predictions
- [x] Multi-processed predictions
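
To make the idea concrete, here is a minimal NumPy sketch of exact DTW with a Sakoe–Chiba band, followed by a 1-nearest-neighbor prediction. It is an illustration only and does not reflect how Sequentia or `dtaidistance` implement it; the names `dtw_distance` and `knn_predict` and the `window` parameter are hypothetical and chosen for this example. The local cost is computed over all features at once, which corresponds to dependent warping.

```python
import numpy as np


def dtw_distance(a, b, window=None):
    """Exact DTW distance between two sequences of shape (length, n_features).

    `window` is a hypothetical Sakoe-Chiba band half-width in time steps;
    None means the warping path is unconstrained.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no alignment exists
    w = max(n, m) if window is None else max(window, abs(n - m))

    # cost[i, j] = cheapest alignment of the first i steps of a with the first j of b
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = np.sum((a[i - 1] - b[j - 1]) ** 2)  # squared Euclidean cost
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def knn_predict(query, train_seqs, train_labels, window=None):
    """Return the label of the training sequence closest to `query` (k = 1)."""
    dists = [dtw_distance(query, seq, window) for seq in train_seqs]
    return train_labels[int(np.argmin(dists))]


# Toy univariate data with different sequence lengths
train_seqs = [np.array([[0.0], [0.1], [0.0]]), np.array([[5.0], [5.2]])]
train_labels = [0, 1]
query = np.array([[4.9], [5.0], [5.1], [5.0]])
print(knn_predict(query, train_seqs, train_labels, window=2))
```

The double loop makes the O(N<sup>2</sup>) cost of exact DTW visible; a narrower band reduces the number of cells that are filled in.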

#### [Hidden Markov Models](https://sequentia.readthedocs.io/en/latest/sections/models/hmm/index.html) (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))

Parameter estimation uses the Baum-Welch algorithm, and prediction uses the forward algorithm [[1]](#references).

- [x] Classification
- [x] Multivariate real-valued observations (Gaussian mixture model emissions)
- [x] Univariate categorical observations (discrete emissions)
- [x] Linear, left-right and ergodic topologies
- [x] Multi-processed predictions
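
As a rough illustration of prediction with the forward algorithm, the sketch below scores a categorical observation sequence under two hand-written left-right HMMs and picks the class whose model gives the highest log-likelihood plus log-prior. It assumes one HMM per class and is not the `hmmlearn` or Sequentia implementation; `forward_log_likelihood`, `classify`, and the model parameters are hypothetical and made up for this example.

```python
import numpy as np


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a categorical-emission HMM (scaled forward algorithm).

    obs:   sequence of integer symbols
    start: initial state probabilities, shape (n_states,)
    trans: transition matrix, shape (n_states, n_states)
    emit:  emission matrix, shape (n_states, n_symbols)
    """
    alpha = start * emit[:, obs[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale  # rescale at each step to avoid numerical underflow
    for symbol in obs[1:]:
        alpha = (alpha @ trans) * emit[:, symbol]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return float(log_lik)


def classify(obs, models, priors):
    """Return the index of the model with the highest log-posterior score."""
    scores = [
        forward_log_likelihood(obs, *model) + np.log(prior)
        for model, prior in zip(models, priors)
    ]
    return int(np.argmax(scores))


# Two-state left-right topology: state 0 may move to state 1, never back
start = np.array([1.0, 0.0])
trans = np.array([[0.7, 0.3], [0.0, 1.0]])
model_a = (start, trans, np.array([[0.9, 0.1], [0.1, 0.9]]))  # mostly 0s, then 1s
model_b = (start, trans, np.array([[0.1, 0.9], [0.9, 0.1]]))  # mostly 1s, then 0s

print(classify([0, 0, 1, 1], [model_a, model_b], priors=[0.5, 0.5]))
```

In practice the model parameters would be estimated from training sequences with Baum-Welch rather than written by hand.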

### Scikit-Learn compatibility

**Sequentia (≥2.0) is fully compatible with the Scikit-Learn API (≥1.4), enabling rapid development and prototyping of sequential models.**

In most cases, the only necessary change is to add a `lengths` keyword argument to provide sequence length information, e.g. `fit(X, y, lengths=lengths)` instead of `fit(X, y)`.
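
The relationship between a concatenated array and its `lengths` can be sketched with plain NumPy. This only illustrates the data layout and is not Sequentia's internal code; the variable names `sequences`, `ends`, and `recovered` are chosen for this example.

```python
import numpy as np

# Two sequences with two features each, of lengths 3 and 2
sequences = [
    np.array([[1.2, 7.91], [1.34, 6.6], [0.92, 8.08]]),
    np.array([[2.11, 6.97], [1.83, 7.06]]),
]

# Concatenate along the time axis and record each sequence's length
X = np.vstack(sequences)
lengths = np.array([len(seq) for seq in sequences])

# The cumulative lengths give the row offsets where each sequence ends,
# which is enough to split the flat array back into the original sequences
ends = np.cumsum(lengths)
recovered = np.split(X, ends[:-1])

print(X.shape, lengths, [seq.shape for seq in recovered])
```

Keeping `X` as a single two-dimensional array is what lets it pass through Scikit-Learn tooling that expects a regular array, while `lengths` carries the sequence boundaries.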

## Installation

The latest stable version of Sequentia can be installed with the following command:

```console
pip install sequentia
```

### C library compilation

For optimal performance when using any of the k-NN based models, it is important that `dtaidistance` C libraries are compiled correctly.

Please see the [`dtaidistance` installation guide](https://dtaidistance.readthedocs.io/en/latest/usage/installation.html) for troubleshooting if you run into C compilation issues, or if setting `use_c=True` on k-NN based models results in a warning.

You can use the following to check if the appropriate C libraries have been installed.

```python
from dtaidistance import dtw
dtw.try_import_c()
```

### Development

Please see the [contribution guidelines](/CONTRIBUTING.md) for installation instructions for contributing to Sequentia.

## Documentation

Documentation for the package is available on [Read The Docs](https://sequentia.readthedocs.io/en/latest).

## Examples

The following example classifies multivariate sequences with two features into two classes using the `KNNClassifier`.

This example also shows a typical preprocessing workflow, as well as compatibility with Scikit-Learn.

```python
import numpy as np

from sklearn.preprocessing import scale
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from sequentia.models import KNNClassifier
from sequentia.preprocessing import IndependentFunctionTransformer, median_filter

# Create input data
# - Sequentia expects sequences to be concatenated into a single array
# - Sequence lengths are provided separately and used to decode the sequences when needed
# - This avoids the need for complex structures such as lists of arrays with different lengths

# Sequences
X = np.array([
    # Sequence 1 - Length 3
    [1.2 , 7.91],
    [1.34, 6.6 ],
    [0.92, 8.08],
    # Sequence 2 - Length 5
    [2.11, 6.97],
    [1.83, 7.06],
    [1.54, 5.98],
    [0.86, 6.37],
    [1.21, 5.8 ],
    # Sequence 3 - Length 2
    [1.7 , 6.22],
    [2.01, 5.49],
])

# Sequence lengths
lengths = np.array([3, 5, 2])

# Sequence classes
y = np.array([0, 1, 1])

# Create a transformation pipeline that feeds into a KNNClassifier
# 1. Individually denoise each sequence by applying a median filter for each feature
# 2. Individually standardize each sequence by subtracting the mean and dividing by the s.d. for each feature
# 3. Reduce the dimensionality of the data to a single feature by using PCA
# 4. Pass the resulting transformed data into a KNNClassifier
pipeline = Pipeline([
    ('denoise', IndependentFunctionTransformer(median_filter)),
    ('scale', IndependentFunctionTransformer(scale)),
    ('pca', PCA(n_components=1)),
    ('knn', KNNClassifier(k=1))
])

# Fit the pipeline to the data - lengths must be provided
pipeline.fit(X, y, lengths=lengths)

# Predict classes for the sequences and calculate accuracy - lengths must be provided
y_pred = pipeline.predict(X, lengths=lengths)
acc = pipeline.score(X, y, lengths=lengths)
```

## Acknowledgments

In earlier versions of the package, an approximate DTW implementation [`fastdtw`](https://github.com/slaypni/fastdtw) was used in hopes of speeding up k-NN predictions, as the authors of the original FastDTW paper [[2]](#references) claim that approximated DTW alignments can be computed in linear memory and time, compared to the O(N<sup>2</sup>) runtime complexity of the usual exact DTW implementation.

I was contacted by [Prof. Eamonn Keogh](https://www.cs.ucr.edu/~eamonn/), whose work makes the surprising revelation that FastDTW is generally slower than the exact DTW algorithm that it approximates [[3]](#references). Upon switching from the `fastdtw` package to [`dtaidistance`](https://github.com/wannesm/dtaidistance) (a very solid implementation of exact DTW with fast pure C compiled functions), DTW k-NN prediction times were indeed reduced drastically.

I would like to thank Prof. Eamonn Keogh for directly reaching out to me regarding this finding.

## References

<table>
  <tbody>
    <tr>
      <td>[1]</td>
      <td>
        <a href="https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf">Lawrence R. Rabiner. <b>"A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"</b> <em>Proceedings of the IEEE 77 (1989)</em>, no. 2, 257-86.</a>
      </td>
    </tr>
    <tr>
      <td>[2]</td>
      <td>
        <a href="https://pdfs.semanticscholar.org/05a2/0cde15e172fc82f32774dd0cf4fe5827cad2.pdf">Stan Salvador & Philip Chan. <b>"FastDTW: Toward accurate dynamic time warping in linear time and space."</b> <em>Intelligent Data Analysis 11.5 (2007)</em>, 561-580.</a>
      </td>
    </tr>
    <tr>
      <td>[3]</td>
      <td>
        <a href="https://arxiv.org/ftp/arxiv/papers/2003/2003.11246.pdf">Renjie Wu & Eamonn J. Keogh. <b>"FastDTW is approximate and Generally Slower than the Algorithm it Approximates"</b> <em>IEEE Transactions on Knowledge and Data Engineering (2020)</em>, 1–1.</a>
      </td>
    </tr>
  </tbody>
</table>

## Contributors

All contributions to this repository are greatly appreciated. Contribution guidelines can be found [here](/CONTRIBUTING.md).

<table>
	<thead>
		<tr>
			<th align="center">
        <a href="https://github.com/eonu">
          <img src="https://avatars0.githubusercontent.com/u/24795571?s=460&v=4" alt="eonu" width="60px">
          <br/><sub><b>eonu</b></sub>
        </a>
			</th>
      <th align="center">
        <a href="https://github.com/Prhmma">
          <img src="https://avatars0.githubusercontent.com/u/16954887?s=460&v=4" alt="Prhmma" width="60px">
          <br/><sub><b>Prhmma</b></sub>
        </a>
			</th>
      <th align="center">
        <a href="https://github.com/manisci">
          <img src="https://avatars.githubusercontent.com/u/30268711?v=4" alt="manisci" width="60px">
          <br/><sub><b>manisci</b></sub>
        </a>
      </th>
      <th align="center">
        <a href="https://github.com/jonnor">
          <img src="https://avatars.githubusercontent.com/u/45185?v=4" alt="jonnor" width="60px">
          <br/><sub><b>jonnor</b></sub>
        </a>
      </th>
			<!-- Add more <th></th> blocks for more contributors -->
		</tr>
	</thead>
</table>

## Licensing

Sequentia is released under the [MIT](https://opensource.org/licenses/MIT) license.

Certain parts of the source code are heavily adapted from [Scikit-Learn](https://scikit-learn.org/).
Such files contain a copy of [their license](https://github.com/scikit-learn/scikit-learn/blob/main/COPYING).

---

<p align="center">
  <b>Sequentia</b> &copy; 2019-2025, Edwin Onuonga - Released under the <a href="https://opensource.org/licenses/MIT">MIT</a> license.<br/>
  <em>Authored and maintained by Edwin Onuonga.</em>
</p>

