Metadata-Version: 2.4
Name: foundry_ml
Version: 1.2.0
Summary: Package to support simplified application of machine learning models to datasets in materials science
Home-page: https://github.com/MLMI2-CSSI/foundry
Author: Aristana Scourtas, KJ Schmidt, Isaac Darling, Aadit Ambadkar, Braeden Cullen,
            Imogen Foster, Ribhav Bose, Zoa Katok, Ethan Truelove, Ian Foster, Ben Blaiszik
Author-email: blaiszik@uchicago.edu
License: MIT License
Keywords: materials science,machine learning,datasets,MCP,AI agents
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mdf_toolbox>=0.6.0
Requires-Dist: globus-sdk<4,>=3
Requires-Dist: dlhub_sdk>=1.0.0
Requires-Dist: numpy>=1.15.4
Requires-Dist: pandas>=0.23.4
Requires-Dist: pydantic>=2.7.2
Requires-Dist: mdf_connect_client>=0.5.0
Requires-Dist: h5py>=2.10.0
Requires-Dist: json2table
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: typer[all]>=0.9.0
Requires-Dist: rich>=13.0.0
Provides-Extra: huggingface
Requires-Dist: datasets>=2.14.0; extra == "huggingface"
Requires-Dist: huggingface_hub>=0.17.0; extra == "huggingface"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary


<picture>
  <source srcset="https://raw.githubusercontent.com/MLMI2-CSSI/foundry/main/assets/foundry-white.png" height="175" media="(prefers-color-scheme: dark)">
  <img src="https://raw.githubusercontent.com/MLMI2-CSSI/foundry/main/assets/foundry-black.png" height="175">
</picture>

[![PyPI](https://img.shields.io/pypi/v/foundry_ml.svg)](https://pypi.python.org/pypi/foundry_ml)
[![Tests](https://github.com/MLMI2-CSSI/foundry/actions/workflows/tests.yml/badge.svg)](https://github.com/MLMI2-CSSI/foundry/actions/workflows/tests.yml)
[![NSF-1931306](https://img.shields.io/badge/NSF-1931306-blue)](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1931306&HistoricalAwards=false)
[<img src="https://img.shields.io/badge/view-documentation-blue">](https://ai-materials-and-chemistry.gitbook.io/foundry/)

**Foundry-ML** simplifies access to machine learning-ready datasets in materials science and chemistry.

- **Search & Load** - Find and use curated datasets with a few lines of code
- **Understand** - Rich schemas describe what each field means
- **Cite** - Automatic citation generation for publications
- **Publish** - Share your datasets with the community
- **AI-Ready** - MCP server for Claude and other AI assistants

## Quick Start

```bash
pip install foundry-ml
```

```python
from foundry import Foundry

# Connect
f = Foundry()

# Search
results = f.search("band gap", limit=5)

# Load
dataset = results.iloc[0].FoundryDataset
X, y = dataset.get_as_dict()['train']

# Understand
schema = dataset.get_schema()
print(schema['fields'])

# Cite
print(dataset.get_citation())
```

## Cloud Environments

In Google Colab or a remote Jupyter session, where a local browser is typically unavailable, pass these flags when connecting:

```python
f = Foundry(no_browser=True, no_local_server=True)
```
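
If the same notebook runs both locally and in Colab, the flags can be chosen at runtime. The sketch below assumes that the presence of `google.colab` in `sys.modules` is an adequate heuristic for detecting Colab; it does not detect other remote Jupyter setups. `foundry_login_kwargs` is a hypothetical helper, not part of the Foundry API.

```python
import sys


def foundry_login_kwargs(loaded_modules=None):
    """Return keyword arguments for Foundry() suited to the current environment.

    Heuristic (an assumption, not a Foundry feature): treat the session as
    Google Colab when the `google.colab` module has been loaded.
    """
    modules = sys.modules if loaded_modules is None else loaded_modules
    if "google.colab" in modules:
        # Same flags as shown above for cloud environments.
        return {"no_browser": True, "no_local_server": True}
    return {}


# Usage with the real client (requires foundry_ml to be installed):
#   from foundry import Foundry
#   f = Foundry(**foundry_login_kwargs())
print(foundry_login_kwargs({"google.colab": object()}))
```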

## CLI

```bash
foundry search "band gap"
foundry schema 10.18126/abc123
foundry --help
```

## AI Agent Integration

```bash
foundry mcp install  # Add to Claude Code
```

## Documentation

- [Getting Started](https://ai-materials-and-chemistry.gitbook.io/foundry/quickstart)
- [User Guide](https://ai-materials-and-chemistry.gitbook.io/foundry/)
- [API Reference](https://ai-materials-and-chemistry.gitbook.io/foundry/api/foundry)
- [Examples](./examples)

## Features

| Feature | Description |
|---------|-------------|
| Search | Find datasets by keyword, DOI, or browse catalog |
| Load | Automatic download, caching, and format conversion |
| PyTorch/TensorFlow | `dataset.get_as_torch()`, `dataset.get_as_tensorflow()` |
| CLI | Terminal-based workflows |
| MCP Server | AI assistant integration |
| HuggingFace Export | Publish to HuggingFace Hub |

## Available Datasets

Browse datasets at [Foundry-ML.org](https://foundry-ml.org/), or list them from Python:

```python
f = Foundry()
f.list(limit=20)  # See available datasets
```

## How to Cite

If you use Foundry-ML, please cite:

```bibtex
@article{Schmidt2024,
  doi = {10.21105/joss.05467},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {93},
  pages = {5467},
  author = {KJ Schmidt and Aristana Scourtas and Logan Ward and others},
  title = {Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science},
  journal = {Journal of Open Source Software}
}
```

## Contributing

Foundry is open source. To contribute:

1. Fork from `main`
2. Make your changes
3. Open a Pull Request

See [CONTRIBUTING.md](docs/how-to-contribute/contributing.md) for details.

## Support

This work was supported by the National Science Foundation under NSF Award Number 1931306, "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".

Foundry integrates with [Materials Data Facility](https://materialsdatafacility.org), [DLHub](https://www.dlhub.org), and [MAST-ML](https://mastmldocs.readthedocs.io/).
