Metadata-Version: 2.1
Name: pdbx2df
Version: 0.2.2
Summary: A python package to parse PDBx file into Pandas DataFrames.
Home-page: https://github.com/Ruibin-Liu/pdbx2df
Author: Ruibin Liu
Author-email: ruibinliuphd@gmail.com
Project-URL: Bug Tracker, https://github.com/Ruibin-Liu/pdbx2df/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas (>=1.0)
Provides-Extra: testing
Requires-Dist: tox (>=3.24) ; extra == 'testing'
Requires-Dist: pytest (>=7.0) ; extra == 'testing'
Requires-Dist: pytest-cov (>=3.0) ; extra == 'testing'

# pdbx2df

Parse a PDBx file (mmCIF format, e.g. `pdb_id.cif`) into a Python dict whose keys are PDBx category names and whose values are the contents of each category. Each category is parsed into a Pandas DataFrame whose columns are the category's attribute names.

## Requirements

- Pandas (>=1.0)

## Install

```
pip install pdbx2df
```

## Usage examples

1. To read the 3D coordinates of PDB entry `1vii` into a Pandas DataFrame, with the `1vii.cif` file already downloaded to your current working directory `./`:

```python
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_atom_site'])
atoms_df = pdbx['_atom_site']
# 'atoms_df' is a Pandas DataFrame containing the '_atom_site' category which has the detailed 3D coordinates for each atom.
```
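Once the `_atom_site` table is loaded, ordinary Pandas operations apply. The following is a minimal sketch that selects alpha-carbon coordinates. It assumes the columns carry the standard mmCIF attribute names (`label_atom_id`, `Cartn_x`, `Cartn_y`, `Cartn_z`) and that values may arrive as strings; neither is confirmed here as `pdbx2df` behavior. A small hand-built DataFrame stands in for `atoms_df` so the snippet runs without a downloaded file, and `ca_coordinates` is a hypothetical helper name.

```python
import pandas as pd

# Hand-built stand-in for pdbx['_atom_site']; values are illustrative, not taken from 1vii.
atoms_df = pd.DataFrame(
    {
        "label_atom_id": ["N", "CA", "C", "N", "CA"],
        "label_comp_id": ["MET", "MET", "MET", "LEU", "LEU"],
        "Cartn_x": ["1.0", "2.0", "3.0", "4.0", "5.0"],
        "Cartn_y": ["0.0", "0.5", "1.0", "1.5", "2.0"],
        "Cartn_z": ["-1.0", "-2.0", "-3.0", "-4.0", "-5.0"],
    }
)


def ca_coordinates(atoms: pd.DataFrame) -> pd.DataFrame:
    """Return numeric x, y, z coordinates of the alpha-carbon atoms."""
    coord_cols = ["Cartn_x", "Cartn_y", "Cartn_z"]
    ca = atoms.loc[atoms["label_atom_id"] == "CA", coord_cols]
    # Convert in case the parser kept the coordinates as strings.
    return ca.apply(pd.to_numeric).reset_index(drop=True)


ca_xyz = ca_coordinates(atoms_df)
print(ca_xyz)
```

With a real file, the same helper could be called on the `atoms_df` from the example above, provided the column names match.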

2. If you want to read the FASTA sequence of `1vii`, you can:

```python
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_entity_poly'])
fasta_df = pdbx['_entity_poly']
fasta = fasta_df['pdbx_seq_one_letter_code_can'].to_list()[0]  # 1vii only has one sequence
# fasta == 'MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF'
```
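A sequence string like this can be written out as a FASTA record. The sketch below uses only the standard library; `to_fasta_record` is a hypothetical helper, not part of `pdbx2df`. It strips embedded whitespace first, on the assumption that a multi-line mmCIF text value might keep its line breaks after parsing.

```python
import textwrap


def to_fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with lines of at most `width` characters."""
    # Drop any whitespace or line breaks embedded in the raw value.
    cleaned = "".join(sequence.split())
    body = "\n".join(textwrap.wrap(cleaned, width))
    return f">{header}\n{body}"


# Sequence copied from the example above; the embedded newline mimics a multi-line value.
raw = "MLSDEDFKAVFGMTRSAFAN\nLPLWKQQNLKKEKGLF"
record = to_fasta_record("1vii", raw, width=20)
print(record)
```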

3. You can read both categories simultaneously:

```python
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['_entity_poly', '_atom_site'])
atoms_df = pdbx['_atom_site']
fasta_df = pdbx['_entity_poly']
```

Pass a list of category names to `category_names`, and you will get every listed category that is present in the PDBx file.
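Because only categories found in the file come back, it can be useful to check which requested names are missing before indexing the result. The sketch below assumes an absent category is simply omitted from the returned dict; that is an assumption, since the exact behavior (omitted key versus empty value) is not stated here. `missing_categories` is a hypothetical helper, and a plain dict stands in for the `read_pdbx` result.

```python
from typing import Iterable, List, Mapping


def missing_categories(parsed: Mapping[str, object], requested: Iterable[str]) -> List[str]:
    """Return the requested category names that are absent from the parsed dict."""
    return [name for name in requested if name not in parsed]


# Stand-in for the dict returned by read_pdbx; real values would be DataFrames.
pdbx = {"_atom_site": object(), "_entity_poly": object()}
requested = ["_entity_poly", "_atom_site", "_struct_conf"]
absent = missing_categories(pdbx, requested)
print(absent)
```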

4. You can parse the whole file by passing `['all']` as `category_names`:

```python
from pdbx2df import read_pdbx
pdbx_file = './1vii.cif'
pdbx = read_pdbx(pdbx_file, category_names=['all'])
atoms_df = pdbx['_atom_site']
fasta_df = pdbx['_entity_poly']
# and more
```
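After parsing a whole file, a quick overview of what was found can help decide which categories to look at. The following is a rough sketch, assuming the result is a plain dict mapping category names to DataFrames as described above. `summarize_categories` is a hypothetical helper, and the small hand-built dict stands in for a real `read_pdbx` result.

```python
from typing import Mapping

import pandas as pd


def summarize_categories(parsed: Mapping[str, pd.DataFrame]) -> pd.DataFrame:
    """Build a table with one row per category: its name, row count, and column count."""
    rows = [
        {"category": name, "n_rows": df.shape[0], "n_columns": df.shape[1]}
        for name, df in parsed.items()
    ]
    return pd.DataFrame(rows, columns=["category", "n_rows", "n_columns"])


# Hand-built stand-in for the parsed dict; contents are illustrative only.
pdbx = {
    "_entity_poly": pd.DataFrame({"entity_id": ["1"], "type": ["polypeptide(L)"]}),
    "_atom_site": pd.DataFrame({"id": ["1", "2", "3"], "label_atom_id": ["N", "CA", "C"]}),
}
summary = summarize_categories(pdbx)
print(summary)
```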
