Metadata-Version: 2.2
Name: atomgpt
Version: 2024.11.30
Summary: atomgpt
Home-page: https://github.com/usnistgov/atomgpt
Author: Kamal Choudhary
Author-email: kamal.choudhary@nist.gov
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.rst
Requires-Dist: accelerate==0.31.0
Requires-Dist: aiohttp==3.9.5
Requires-Dist: aiosignal==1.3.1
Requires-Dist: alignn==2024.4.20
Requires-Dist: annotated-types==0.7.0
Requires-Dist: ase==3.23.0
Requires-Dist: async-timeout==4.0.3
Requires-Dist: attrs==23.2.0
Requires-Dist: autopep8==2.3.1
Requires-Dist: bitsandbytes==0.43.1
Requires-Dist: black==24.4.2
Requires-Dist: certifi==2024.6.2
Requires-Dist: cffi
Requires-Dist: chardet==3.0.4
Requires-Dist: charset-normalizer==3.3.2
Requires-Dist: click==8.1.7
Requires-Dist: contourpy==1.2.1
Requires-Dist: cycler==0.12.1
Requires-Dist: datasets==2.20.0
Requires-Dist: dgl==1.1.1
Requires-Dist: dill==0.3.8
Requires-Dist: docstring_parser==0.16
Requires-Dist: eval_type_backport==0.2.0
Requires-Dist: filelock
Requires-Dist: flake8==7.1.0
Requires-Dist: fonttools==4.53.0
Requires-Dist: frozenlist==1.4.1
Requires-Dist: fsspec==2024.5.0
Requires-Dist: gmpy2
Requires-Dist: huggingface-hub==0.23.4
Requires-Dist: idna==3.7
Requires-Dist: importlib_resources==6.4.0
Requires-Dist: jarvis-tools>=2024.4.30
Requires-Dist: Jinja2
Requires-Dist: joblib==1.4.2
Requires-Dist: kiwisolver==1.4.5
Requires-Dist: lmdb==1.4.1
Requires-Dist: markdown-it-py==3.0.0
Requires-Dist: MarkupSafe
Requires-Dist: matplotlib==3.9.0
Requires-Dist: mccabe==0.7.0
Requires-Dist: mdurl==0.1.2
Requires-Dist: mpmath
Requires-Dist: multidict==4.7.6
Requires-Dist: multiprocess==0.70.16
Requires-Dist: mypy-extensions==1.0.0
Requires-Dist: networkx
Requires-Dist: numpy==1.26.4
Requires-Dist: packaging==24.1
Requires-Dist: pandas==2.2.2
Requires-Dist: pathspec==0.12.1
Requires-Dist: peft==0.11.1
Requires-Dist: pillow==10.3.0
Requires-Dist: platformdirs==4.2.2
Requires-Dist: protobuf==3.20.3
Requires-Dist: psutil==6.0.0
Requires-Dist: pyarrow==16.1.0
Requires-Dist: pyarrow-hotfix==0.6
Requires-Dist: pycodestyle==2.12.0
Requires-Dist: pycparser
Requires-Dist: pydantic==2.7.4
Requires-Dist: pydantic-settings==2.3.3
Requires-Dist: pydantic_core==2.18.4
Requires-Dist: pydocstyle==6.3.0
Requires-Dist: pyflakes==3.2.0
Requires-Dist: Pygments==2.18.0
Requires-Dist: pyparsing==2.4.7
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: pytz==2024.1
Requires-Dist: PyYAML
Requires-Dist: regex==2024.5.15
Requires-Dist: requests==2.32.3
Requires-Dist: rich==13.7.1
Requires-Dist: safetensors==0.4.3
Requires-Dist: scikit-learn==1.5.0
Requires-Dist: scipy==1.13.1
Requires-Dist: sentencepiece==0.2.0
Requires-Dist: shtab==1.7.1
Requires-Dist: six==1.16.0
Requires-Dist: snowballstemmer==2.2.0
Requires-Dist: spglib==2.4.0
Requires-Dist: sympy
Requires-Dist: threadpoolctl==3.5.0
Requires-Dist: tokenizers==0.19.1
Requires-Dist: tomli==2.0.1
Requires-Dist: toolz==0.12.1
Requires-Dist: torch==2.2.2
Requires-Dist: torchdata==0.7.1
Requires-Dist: tqdm==4.66.4
Requires-Dist: transformers==4.41.2
Requires-Dist: triton==2.2.0
Requires-Dist: trl==0.8.6
Requires-Dist: typing_extensions
Requires-Dist: tyro==0.8.4
Requires-Dist: tzdata==2024.1
Requires-Dist: urllib3==2.2.2
Requires-Dist: xformers==0.0.25.post1
Requires-Dist: xmltodict==0.13.0
Requires-Dist: xxhash==3.4.1
Requires-Dist: yarl==1.9.4
Requires-Dist: zipp==3.19.2
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# AtomGPT & DiffractGPT: atomistic generative pre-trained transformer for forward and inverse materials design

Large language models (LLMs) such as [ChatGPT](https://openai.com/chatgpt/) have shown immense potential for various commercial applications, but their applicability for materials design remains underexplored. In this work, AtomGPT is introduced as a model specifically developed for materials design based on transformer architectures, demonstrating capabilities for both atomistic property prediction and structure generation tasks. This study shows that a combination of chemical and structural text descriptions can efficiently predict material properties with accuracy comparable to graph neural network models, including formation energies, electronic bandgaps from two different methods, and superconducting transition temperatures. Furthermore, AtomGPT can generate atomic structures for tasks such as designing new superconductors, with the predictions validated through density functional theory calculations. This work paves the way for leveraging LLMs in forward and inverse materials design, offering an efficient approach to the discovery and optimization of materials.

![AtomGPT layer schematic](https://github.com/usnistgov/atomgpt/blob/main/atomgpt/data/schematic.jpeg)

Both forward and inverse models take a `config.json` file as input. This config file provides the basic training parameters and the path to an `id_prop.csv` file, similar to the [ALIGNN](https://github.com/usnistgov/alignn) model. See an example here: [id_prop.csv](https://github.com/usnistgov/atomgpt/blob/develop/atomgpt/examples/forward_model/id_prop.csv).
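
As a rough illustration of this input layout, the sketch below writes a two-column `id_prop.csv` in the ALIGNN-style convention (structure file name, then target value, with no header row). The helper `write_id_prop`, the file names, and the values are made up for illustration; check the exact layout against the linked example file.

```python
import csv
import tempfile
from pathlib import Path


def write_id_prop(rows, folder):
    """Write (structure_file, target_value) pairs to folder/id_prop.csv."""
    path = Path(folder) / "id_prop.csv"
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for filename, value in rows:
            writer.writerow([filename, value])
    return path


# Hypothetical structure files and target values, for illustration only.
example_rows = [("POSCAR-1.vasp", 0.25), ("POSCAR-2.vasp", 1.10)]
out_path = write_id_prop(example_rows, tempfile.mkdtemp())
print(out_path.read_text())
```

The structure files named in the first column would typically sit in the same folder as `id_prop.csv`, as is the case for ALIGNN.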

## Installation

First, install [miniforge](https://github.com/conda-forge/miniforge) so that you can create a conda environment.

For example: 

```
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
```

Depending on your operating system and architecture, this downloads an installer named something like `Miniforge3-XYZ.sh`. Run it with:

```
bash Miniforge3-$(uname)-$(uname -m).sh
```

Now, make a conda environment:

```
conda create --name my_atomgpt python=3.10 -y
conda activate my_atomgpt
```

Then clone the repository and install AtomGPT in editable mode:

```
git clone https://github.com/usnistgov/atomgpt.git
cd atomgpt
pip install -e .
```
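
To confirm that the installation worked, a small check like the following can help. It is only a sketch using the Python standard library: it looks for the `atomgpt` package and for the four global executables named in this README, and the helper name `check_atomgpt_install` is hypothetical.

```python
import importlib.util
import shutil


def check_atomgpt_install():
    """Report whether the package and its command-line entry points are visible."""
    status = {"atomgpt": importlib.util.find_spec("atomgpt") is not None}
    # Executable names as listed in this README.
    for exe in (
        "atomgpt_forward_train",
        "atomgpt_forward_predict",
        "atomgpt_inverse_train",
        "atomgpt_inverse_predict",
    ):
        status[exe] = shutil.which(exe) is not None
    return status


for name, found in check_atomgpt_install().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If an entry is reported as missing, make sure the `my_atomgpt` environment is activated before running the check.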

## Forward model example (structure to property)

Forward models are used to develop surrogate models for atomic structure to property predictions. They require text input, which can be either raw POSCAR-type files or a text description of the material. We can then use models such as Google T5 or OpenAI GPT-2 with a customized language head to accomplish this task. The description of a material is generated with the [ChemNLP/describer](https://github.com/usnistgov/jarvis/blob/master/jarvis/core/atoms.py#L1567) function. If you set [`convert`](https://github.com/usnistgov/atomgpt/blob/develop/atomgpt/forward_models/forward_models.py#L277) to `False`, you can also train on bare POSCAR files.

For training:

```
python atomgpt/forward_models/forward_models.py --config_name atomgpt/examples/forward_model/config.json
```

or use the `atomgpt_forward_train` global executable.

For inference:


```
python atomgpt/forward_models/forward_predict.py --output_dir out --pred_csv atomgpt/examples/forward_model/pred_list_forward.csv
```

or use the `atomgpt_forward_predict` global executable.


## Inverse model example (property to structure)

Inverse models are used to generate materials given a property and a description such as the chemical formula. Currently, we use the Mistral model, but other models such as Gemma, Llama, etc. can also be easily used. After structure generation, we can optimize the structure with the ALIGNN-FF model (example [here](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/ALIGNN_Structure_Relaxation_Phonons_Interface.ipynb)) and then subject a few selected candidates to density functional theory calculations using JARVIS-DFT or a similar workflow (see, for example, the tutorials [here](https://pages.nist.gov/jarvis/tutorials/)). Note that, currently, both training and inference of the inverse model require GPUs.

For training:

```
python atomgpt/inverse_models/inverse_models.py --config_name atomgpt/examples/inverse_model/config.json
```

or use the `atomgpt_inverse_train` global executable.

For inference:

```
python atomgpt/inverse_models/inverse_predict.py --output_dir outputs/ --pred_csv "atomgpt/examples/inverse_model/pred_list_inverse.csv"
```

or use the `atomgpt_inverse_predict` global executable.


## DiffractGPT model example (spectral property to structure)

Inverse models are also used to generate materials given a spectral or multi-valued property, such as an X-ray diffraction pattern, together with a description such as the chemical formula.

For training:

```
python atomgpt/inverse_models/inverse_models.py --config_name atomgpt/examples/inverse_model_multi/config.json
```

For inference:

```
python atomgpt/inverse_models/inverse_predict.py --output_dir outputs_xrd --pred_csv atomgpt/examples/inverse_model_multi/pred_list_inverse.csv
```

or if you want to use the original model:

```
python atomgpt/inverse_models/inverse_predict.py --output_dir atomgpt/examples/inverse_model_multi --pred_csv atomgpt/examples/inverse_model_multi/pred_list_inverse.csv
```


Example of an inference-only case:

Make a `tmp/pred_list.csv` file:

```
LaB6.dat
```
You can list multiple `.dat` files, each containing 2theta and intensity values, in this csv file.
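
Before running inference, it can be useful to sanity-check a pattern file. The sketch below is not AtomGPT's own loader: it assumes each data line holds a 2theta value followed by an intensity, separated by whitespace or a comma, and it skips any line that does not parse as two numbers (for example a label line). The function name `parse_pattern` and the numbers are made up for illustration; the exact `.dat` layout expected by AtomGPT should be checked against the files under `atomgpt/examples/inverse_model_multi`.

```python
def parse_pattern(text):
    """Return (two_theta, intensity) lists from whitespace- or comma-separated text."""
    two_theta, intensity = [], []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue  # skip blank or single-token lines (for example a label)
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip lines that are not numeric pairs
        two_theta.append(x)
        intensity.append(y)
    return two_theta, intensity


# Made-up numbers for illustration, not a measured LaB6 pattern.
sample = """LaB6
21.3 100.0
30.4 45.5
37.4 20.0
"""
x, y = parse_pattern(sample)
peak = max(y)
normalized = [round(v / peak, 3) for v in y]  # scale so the strongest peak is 1.0
print(x)
print(normalized)
```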

Then add a `tmp/config.json` file:

```
{
    "id_prop_path": "atomgpt/examples/inverse_model_multi/id_prop.csv",
    "prefix": "atomgpt_run",
    "model_name": "knc6/diffractgpt_mistral_chemical_formula",
    "batch_size": 2,
    "num_epochs": 2,
    "logging_steps": 1,
    "dataset_num_proc": 2,
    "seed_val": 3407,
    "learning_rate": 0.0002,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "num_train": 2,
    "num_val": 0,
    "num_test": 2,
    "model_save_path": "",
    "loss_type": "default",
    "optim": "adamw_8bit",
    "lr_scheduler_type": "linear",
    "output_dir": "outputs_xrd",
    "csv_out": "AI-AtomGen-prop-dft_3d-test-rmse.csv",
    "chem_info": "formula",
    "max_seq_length": 2048,
    "prop": "XRD",
    "dtype": null,
    "load_in_4bit": true,
    "instruction": "Below is a description of a material.",
    "alpaca_prompt": "### Instruction:\n{}\n### Input:\n{}\n### Output:\n{}",
    "output_prompt": " Generate atomic structure description with lattice lengths, angles, coordinates and atom types."
}

```
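
To make two of these settings more concrete, the sketch below loads a subset of the configuration shown above and works out (a) the number of samples contributing to each optimizer step on a single device, under the usual gradient-accumulation convention, and (b) what the `alpaca_prompt` template looks like once its three `{}` slots are filled. The slot order (instruction, input, output) and the input sentence about the chemical formula are assumptions for illustration and may differ from what AtomGPT does internally.

```python
import json

# Subset of the configuration shown above.
config = json.loads(r"""
{
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "instruction": "Below is a description of a material.",
    "alpaca_prompt": "### Instruction:\n{}\n### Input:\n{}\n### Output:\n{}",
    "output_prompt": " Generate atomic structure description with lattice lengths, angles, coordinates and atom types."
}
""")

# Samples contributing to each optimizer step on a single device.
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)

# Assumed slot usage: instruction, then input (formula plus request), then an
# empty output slot for the model to complete at inference time.
model_input = "The chemical formula is LaB6." + config["output_prompt"]  # hypothetical wording
prompt = config["alpaca_prompt"].format(config["instruction"], model_input, "")

print(effective_batch)
print(prompt)
```

During training, the third slot would hold the target structure description instead of an empty string.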

This data was generated with the example script `atomgpt/scripts/gen_data.py`. Then run inference:

```
python atomgpt/inverse_models/inverse_predict.py --output_dir atomgpt/examples/inverse_model_multi/tmp  --pred_csv atomgpt/examples/inverse_model_multi/tmp/pred_list.csv
```


More detailed examples and case studies will be added here soon.

# Google Colab/Jupyter notebooks


| Notebooks                                                                                                                                      | Google&nbsp;Colab                                                                                                                                        | Descriptions                                                                                                                                                                                                                                                                                                                                                                                              |
| ---------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Forward Model training](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/atomgpt_forward_example.ipynb)                                                       | [![Open in Google Colab]](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/atomgpt_forward_example.ipynb)                                 | Example of forward model training for exfoliation energy.                                                                                                                                                                                                                                                                       |
| [Inverse Model training](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/atomgpt_example.ipynb)                                                       | [![Open in Google Colab]](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/atomgpt_example.ipynb)                                 | Example of installing AtomGPT, inverse model training for 5 sample materials, using the trained model for inference, relaxing structures with ALIGNN-FF, generating a database of atomic structures.                                                                                                                                                                                                                                                                       |
| [HuggingFace AtomGPT model inference](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/atomgpt_example_huggingface.ipynb) | [![Open in Google Colab]](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/atomgpt_example_huggingface.ipynb) | AtomGPT structure generation/inference example with a model hosted on Hugging Face. |
| [Inverse Model DiffractGPT inference](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/DiffractGPT_example.ipynb) | [![Open in Google Colab]](https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/DiffractGPT_example.ipynb) | Example of predicting crystal structure from X-ray diffraction data. |


[Open in Google Colab]: https://colab.research.google.com/assets/colab-badge.svg




For other similar notebook examples, see the [JARVIS-Tools-Notebook Collection](https://github.com/JARVIS-Materials-Design/jarvis-tools-notebooks).

# HuggingFace link :hugs:

https://huggingface.co/knc6


# References:

1. [AtomGPT: Atomistic Generative Pretrained Transformer for Forward and Inverse Materials Design](https://pubs.acs.org/doi/full/10.1021/acs.jpclett.4c01126)
2. [DiffractGPT: Atomic Structure Determination from X-ray Diffraction Patterns using Generative Pre-trained Transformer](https://pubs.acs.org/doi/10.1021/acs.jpclett.4c03137)
3. [ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data](https://github.com/usnistgov/chemnlp)
4. [JARVIS-Leaderboard](https://pages.nist.gov/jarvis_leaderboard)
5. [NIST-JARVIS Infrastructure](https://jarvis.nist.gov/)
6. [Unsloth AI](https://github.com/unslothai/unsloth)
   


<a name="contrib"></a>
How to contribute
-----------------

For detailed instructions, please see [Contribution instructions](https://github.com/usnistgov/jarvis/blob/master/Contribution.rst)

<a name="corres"></a>
Correspondence
--------------------

Please report bugs as GitHub issues (https://github.com/usnistgov/atomgpt/issues) or by email to kamal.choudhary@nist.gov.

<a name="fund"></a>
Funding support
--------------------

NIST-MGI (https://www.nist.gov/mgi) and CHIPS (https://www.nist.gov/chips)

Code of conduct
--------------------

Please see [Code of conduct](https://github.com/usnistgov/jarvis/blob/master/CODE_OF_CONDUCT.md)
