Metadata-Version: 2.1
Name: imml
Version: 0.1.0
Summary: A python package for multi-modal learning with incomplete data
Author-email: Alberto López <a.l.sanchez@medisin.uio.no>
Maintainer-email: Alberto López <a.l.sanchez@medisin.uio.no>
License: BSD 3-Clause License
        
        Copyright (c) 2025, Open source contributors.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials provided with the distribution.
        
        * Neither the name of the copyright holder nor the names of its
          contributors may be used to endorse or promote products derived from
          this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Project-URL: Documentation, https://imml.readthedocs.io
Project-URL: Source, https://github.com/ocbe-uio/imml
Project-URL: Download, https://github.com/ocbe-uio/imml/archive/refs/heads/main.zip
Project-URL: Tracker, https://github.com/ocbe-uio/imml/issues
Project-URL: Changelog, https://imml.readthedocs.io/stable/development/changelog.html
Keywords: multi-modal learning,machine learning,incomplete data,missing data
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: scikit-learn>=1.4.1
Requires-Dist: pandas>=2.2.2
Requires-Dist: networkx>=2.5
Requires-Dist: h5py>=3.9.0
Requires-Dist: snfpy>=0
Requires-Dist: control>=0.10.1
Requires-Dist: cvxpy>=1.7.1
Requires-Dist: numba>=0.58.0
Provides-Extra: deep
Requires-Dist: lightning>=2.2; extra == "deep"
Requires-Dist: transformers>=4.51.3; extra == "deep"
Requires-Dist: torch-geometric>=2.6.1; extra == "deep"
Requires-Dist: torchvision>=0.19.1; extra == "deep"
Provides-Extra: docs
Requires-Dist: lightning>=2.2; extra == "docs"
Requires-Dist: transformers>=4.51.3; extra == "docs"
Requires-Dist: ipython>=8.20.0; extra == "docs"
Requires-Dist: datasets==3.6.0; extra == "docs"
Requires-Dist: rpy2>=3.5.14; extra == "docs"
Requires-Dist: sphinx>=7.0.0; extra == "docs"
Requires-Dist: sphinx-gallery>=0.17.1; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Provides-Extra: matlab
Requires-Dist: oct2py>=5.8.0; extra == "matlab"
Provides-Extra: r
Requires-Dist: rpy2>=3.5.14; extra == "r"
Provides-Extra: tests
Requires-Dist: pytest>=7.1.2; extra == "tests"
Requires-Dist: pytest-cov>=2.9.0; extra == "tests"

![PyPI - Version](https://img.shields.io/pypi/v/imml)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/imml)
[![Read the Docs](https://img.shields.io/readthedocs/imml)](https://imml.readthedocs.io)
[![CI Tests](https://github.com/ocbe-uio/imml/actions/workflows/ci_test.yml/badge.svg)](https://github.com/ocbe-uio/imml/actions/workflows/ci_test.yml)
![Codecov](https://codecov.io/github/ocbe-uio/imml/graph/bundle/badge.svg)
[![CodeQL](https://github.com/ocbe-uio/imml/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/ocbe-uio/imml/actions/workflows/github-code-scanning/codeql)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](https://github.com/ocbe-uio/imml/pulls)
![GitHub repo size](https://img.shields.io/github/repo-size/ocbe-uio/imml)
[![GitHub License](https://img.shields.io/github/license/ocbe-uio/imml)](https://github.com/ocbe-uio/imml/blob/main/LICENSE)

[//]: # ([![DOI]&#40;&#41;]&#40;&#41;)
[//]: # ([![Paper]&#40;&#41;]&#40;&#41;)

<p align="center">
  <img alt="iMML Logo" src="https://raw.githubusercontent.com/ocbe-uio/imml/refs/heads/main/docs/figures/logo_imml.png">
</p>

Overview
====================

*iMML* is a Python package that provides a **robust tool-set for integrating, processing, and analyzing incomplete
multi-modal datasets** to support a wide range of machine learning tasks. Starting with a dataset containing N samples
with K modalities, *iMML* effectively handles missing data for **classification, clustering, data retrieval,
imputation and amputation, feature selection, feature extraction and data exploration**, hence enabling efficient
analysis of partially observed samples.

![Overview of iMML for multi-modal learning with incomplete data](https://raw.githubusercontent.com/ocbe-uio/imml/refs/heads/main/docs/figures/graph.png)
<p align="center"><strong>Overview of iMML for multi-modal learning with incomplete data.</strong></p>


Background
----------

Multi-modal learning, where diverse data types are integrated and analyzed together, has emerged as a critical field
in artificial intelligence. Multi-modal machine learning models that effectively integrate multiple data modalities
generally outperform their uni-modal counterparts by leveraging more comprehensive and complementary information.
However, **most algorithms in this field assume fully observed data**, an assumption that is often
unrealistic in real-world scenarios.

Motivation
----------

Learning from incomplete multi-modal data has seen an important growth last years.
Despite this progress, several limitations still persist.
The landscape of available methods is fragmented, largely due to the diversity of use cases and data modalities,
which complicates both their application and benchmarking.
Systematic use and comparison of the current methods are often hindered by practical challenges, such as
incompatible input data formats and conflicting software dependencies.
As a result, researchers and practitioners frequently face challenges in choosing a practical method and invest
considerable efforts into reconciling codebases, rather than addressing the core scientific questions.
This suggests that **the community currently lacks robust and standardized tools to effectively handle
incomplete multi-modal data**.

Key features
------------

To address this gap, we have developed *iMML*, a Python package designed for multi-modal learning with incomplete data.
The key features of this package are:

-   **Comprehensive toolkit**: *iMML* offers a broad set of tools for integrating, processing, and analyzing
    incomplete multi-modal datasets implemented as a single, user-friendly interface to facilitate adoption by
    a wide community of users.
    The package includes extensive technical testing to ensure robustness, and thorough documentation enables
    end-users to apply its functionality effectively.
-   **Accessible**: *iMML* makes the tools readily available to the Python community, simplifying their usage,
    comparison, and benchmarking, and thereby addresses the current lack of resources and standardized methods 
    for handling incomplete multi-modal data.
-   **Extensible**: *iMML* provides a common framework where researchers can contribute and
    integrate new approaches, serving as a community platform for hosting new algorithms and methods.


Installation
--------

Run the following command to install the most recent release of *iMML* using *pip*:

```bash
pip install imml
```

Or if you prefer *uv*, use:

```bash
uv pip install imml
```

Some features of *iMML* rely on optional dependencies. To enable these additional features, ensure you install 
the required packages as described in our documentation: https://imml.readthedocs.io/stable/main/installation.html.


Usage
--------

This package provides a user-friendly interface to apply these algorithms to user-provided data.
*iMML* was designed to be compatible with widely-used machine learning and data analysis tools, such as Pandas,
NumPy, Scikit-learn, and Lightning AI, hence allowing researchers to **apply machine learning models with
minimal programming effort**.
Moreover, it can be easily integrated into Scikit-learn pipelines for data preprocessing and modeling.

For this demonstration, we will generate a random dataset, that we have called ``Xs``, as a multi-modal dataset
to simulate a multi-modal scenario:

```python
import numpy as np
Xs = [np.random.random((10,5)) for i in range(3)] # or your multi-modal dataset
```

You can use any other complete or incomplete multi-modal dataset. Once you have your dataset ready, you can
leverage the *iMML* library for a wide range of machine learning tasks, such as:

- Decompose a multi-modal dataset using ``MOFA`` to capture joint information.

```python
from imml.decomposition import MOFA
transformed_Xs = MOFA().fit_transform(Xs)
```

- Cluster samples from a multi-modal dataset using ``NEMO`` to find hidden groups.

```python
from imml.cluster import NEMO
labels = NEMO().fit_predict(Xs)
```

- Simulate incomplete multi-modal datasets for evaluation and testing purposes using ``Amputer``.

```python
from imml.ampute import Amputer
transformed_Xs = Amputer(p=0.8).fit_transform(Xs)
```

Free software
-------------

*iMML* is free software; you can redistribute it and/or modify it under the terms of the `BSD 3-Clause License`.

Contribute
------------

**We welcome practitioners, researchers, and the open-source community** to contribute to the *iMML* project,
and in doing so, helping us extend and refine the library for the community. Such a community-wide effort will
make *iMML* more versatile, sustainable, powerful, and accessible to the machine learning community across
many domains.

Project roadmap
------------

Our vision is to establish *iMML* as a leading and reliable library for multi-modal learning across research and 
applied settings. Therefore, our priorities include to broaden algorithmic coverage, improve performance and 
scalability, strengthen interoperability, and grow a healthy contributor community.
