Metadata-Version: 2.1
Name: rxiv-types
Version: 0.1.0
Summary: Rxiv XML/JSON parsing and typehints.
Home-page: https://github.com/nicholas-schaub/rxiv-types
License: MIT
Author: Nick Schaub
Author-email: nick.schaub@nih.gov
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Text Processing :: Markup :: XML
Requires-Dist: xsdata-pydantic[cli,lxml,soap] (>=22.10,<23.0)
Project-URL: Issues, https://github.com/nicholas-schaub/rxiv-types/issues
Project-URL: Repository, https://github.com/nicholas-schaub/rxiv-types
Description-Content-Type: text/markdown

# rxiv-types (v0.1.0)

<p align="center">
    <img src="https://img.shields.io/pypi/dm/rxiv-types?style=flat-square" />
    <img src="https://img.shields.io/pypi/l/rxiv-types?style=flat-square"/>
    <img src="https://img.shields.io/pypi/v/rxiv-types?style=flat-square"/>
    <a href="https://github.com/tefra/xsdata-pydantic">
        <img alt="Built with: xsdata-pydantic" src="https://img.shields.io/badge/Built%20with-xsdata--pydantic-blue">
    </a>
    <a href="https://github.com/dbrgn/coverage-badge">
        <img src="./images/coverage.svg">
    </a>
</p>

## Introduction

A complete implementation of the XML/JSON schemas used by the \*Rxiv preprint servers. This
covers arXiv, medRxiv, bioRxiv, and ChemRxiv, as well as the Directory of Open Access
Journals (DOAJ).

This package parses XML/JSON data into Pydantic models, which validate the input data
and provide type hints for working with the complex XML structures found in preprint
server data.

## Why do I need this?

Parsing XML on its own is challenging. Add the feature-rich data inside each record, and
you can easily spend hours or days navigating the XML structure.
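For comparison, here is roughly what navigating a single record by hand looks like with the standard library's `xml.etree.ElementTree`. This is a sketch under stated assumptions: the sample is a hand-written, heavily trimmed OAI-PMH `ListRecords` response with Dublin Core metadata (not real server output), and `first_record` is a hypothetical helper. The namespace URIs are the standard OAI-PMH 2.0, `oai_dc`, and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed OAI-PMH response (illustrative only, not real ChemRxiv output).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Every element lives in a namespace, so each path step needs the right prefix.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def first_record(xml_text: str) -> dict:
    """Hypothetical helper: pull title and creators from the first record."""
    root = ET.fromstring(xml_text)
    dc = root.find("oai:ListRecords/oai:record/oai:metadata/oai_dc:dc", NS)
    if dc is None:
        raise ValueError("no Dublin Core metadata found")
    return {
        "title": "".join(t.text or "" for t in dc.findall("dc:title", NS)),
        "creators": [c.text or "" for c in dc.findall("dc:creator", NS)],
    }


print(first_record(SAMPLE))
```

Note that `ET.fromstring(SAMPLE).find("ListRecords")` returns `None`: leaving out a namespace prefix anywhere in the path fails silently rather than raising an error, and every field has to be pulled out and typed by hand.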

The approach here was to autogenerate Pydantic classes from the schemas using the
`xsdata-pydantic` tool. This ensures that every piece of data is parsed properly and that
an error is raised if something is missing or incorrect. Unlike dictionaries, Pydantic
classes provide type hints and tab completion in IDEs, which makes the complex structure
of the record data much easier to navigate.
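As a small illustration of that behavior, here is a hand-written toy model showing typed attribute access and a `ValidationError` when a required field is missing. The class and its field names are assumptions chosen to mirror the Dublin Core fields in the example below; it is not one of the package's generated classes. It should run under either Pydantic v1 or v2.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class DublinCore(BaseModel):
    """Toy stand-in for a generated record class (NOT produced by xsdata-pydantic)."""

    title: List[str]
    creator: List[str]
    description: List[str] = []  # optional; defaults to an empty list


def is_valid(data: dict) -> bool:
    """Return True if `data` passes validation, False if Pydantic rejects it."""
    try:
        DublinCore(**data)
    except ValidationError as err:
        print(f"Rejected: {err}")
        return False
    return True


# Typed attribute access: an IDE can autocomplete `record.creator`.
record = DublinCore(title=["An example title"], creator=["Doe, Jane"])
print(record.creator[0])

# A required field is missing, so validation fails instead of passing silently.
print(is_valid({"title": ["Missing creator"]}))
```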

## How do I use it?

You can use `xsdata-pydantic` and the autogenerated classes directly to parse an XML
file, but we also provide convenience functions, such as `chemrxiv_records`, to easily
open preprint records.

### Example 1: Parse ChemRxiv Data

```python
from pathlib import Path

import requests

from rxiv_types import chemrxiv_records

chemrxiv_url = "https://chemrxiv.org/engage/chemrxiv/public-api/v1/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2000-01-01"

# 1. Download some ChemRxiv records from the API and save them to disk
response = requests.get(chemrxiv_url, timeout=60)
response.raise_for_status()  # fail early on HTTP errors
destination = Path("downloads/data/chemrxiv.xml")
destination.parent.mkdir(parents=True, exist_ok=True)
destination.write_bytes(response.content)

# 2. Parse the XML into Pydantic models
result = chemrxiv_records(destination)

# 3. Print some information about the first record
dc = result.list_records.record[0].metadata.dc
print("Paper 1:")
print(f"Title: {''.join(dc.title)}")
print(f"Authors: {'; '.join(dc.creator)}")
print(f"Abstract: {''.join(dc.description)}")
```

Output:

```text
Paper 1:
Title: Excitonics: A universal set of binary gates for molecular exciton processing and signaling
Authors: Nicolas, Sawaya; Dmitrij, Rappoport; Daniel, Tabor; Alan, Aspuru-Guzik
Abstract: The ability to regulate energy transfer pathways through materials is an
 important goal of nanotechnology, as a greater degree of control is 
crucial for developing sensing, solar energy, and bioimaging 
applications. Such control necessitates a toolbox of actuation methods 
that can direct energy transfer based on user input. Here we propose a 
novel molecular exciton gate, analogous to a traditional transistor, for
 controlling exciton migration in chromophoric systems. The gate may be 
activated with an input of light or an input flow of excitons. Unlike 
previous gates and switches that control exciton transfer, our proposal 
does not require isomerization or molecular rearrangement, instead 
relying on excitation migration via the second singlet (S2) state of the
 gate molecule--hence the system is named an "S2 exciton gate." After 
presenting a set of system properties required for proper function of 
the S2 exciton gate, we show how one would overcome the two possible 
challenges: short-lived excited states and suppression of false 
positives. Precision and error rates are studied computationally in a 
model system with respect to excited-state decay rates and variations in
 molecular orientation. Finally, we demonstrate that the S2 exciton gate
 gate can be used to produce binary logical AND, OR, and NOT operations,
 providing a universal excitonic computation platform with a range of 
potential applications, including e.g. in signal processing for 
microscopy.
```
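If the server follows the OAI-PMH protocol, a single `ListRecords` request returns only one page of results, and the response ends with a `resumptionToken` element pointing to the next page (an empty or missing token marks the last page). The sketch below follows those tokens using only the standard library. `next_page_url` and `iter_pages` are hypothetical helpers, not part of `rxiv-types`, and it is an assumption that ChemRxiv paginates this way.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 protocol elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def next_page_url(xml_bytes: bytes, base_url: str) -> Optional[str]:
    """Return the URL of the next ListRecords page, or None on the last page."""
    root = ET.fromstring(xml_bytes)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    text = (token.text or "").strip() if token is not None else ""
    # A missing or empty resumptionToken marks the final page of the list.
    if not text:
        return None
    # Per OAI-PMH, a resumed request carries only the verb and the token.
    query = urlencode({"verb": "ListRecords", "resumptionToken": text})
    return f"{base_url}?{query}"


def iter_pages(first_url: str, fetch: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield raw XML pages, following resumption tokens until the list ends."""
    base_url = first_url.split("?", 1)[0]  # endpoint without query parameters
    url: Optional[str] = first_url
    while url is not None:
        content = fetch(url)
        yield content
        url = next_page_url(content, base_url)
```

With `requests`, the fetch function could be `lambda u: requests.get(u, timeout=60).content`; each page could then be saved to disk and parsed with `chemrxiv_records` as in Example 1. Passing `fetch` in as a parameter keeps the pagination logic independent of the HTTP client.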

## FAQ

### Why are the return structures so complicated?

The return structures directly reflect the XML format defined by the Open Archives
Initiative (OAI), plus any customizations made by the hosting preprint servers. Utility
classes for common components (title, authors, etc.) may be added in the future, but for
now the package is intended to parse the XML faithfully, without reshaping it.
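Until then, a small wrapper can flatten the parts you need. The sketch below is a hypothetical helper (`RecordSummary` and `summarize_record` are not part of `rxiv-types`). It assumes the attribute path used in Example 1 (`record.metadata.dc.title`, `.creator`, `.description`, each a list of strings), and it assumes `metadata` may be `None`, for example for deleted OAI-PMH records. `SimpleNamespace` objects stand in for parsed records so the snippet runs on its own.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List, Optional


@dataclass
class RecordSummary:
    """Hypothetical flat view of one record; not part of rxiv-types."""

    title: str
    authors: List[str]
    abstract: str


def summarize_record(record: Any) -> Optional[RecordSummary]:
    """Flatten a parsed record, assuming the attribute path used in Example 1."""
    metadata = getattr(record, "metadata", None)
    # Assumption: records without metadata (e.g. deleted ones) are skipped.
    if metadata is None:
        return None
    dc = metadata.dc
    return RecordSummary(
        title="".join(dc.title),
        authors=list(dc.creator),
        abstract="".join(dc.description),
    )


# Stand-in object that mimics the shape of a parsed record, for demonstration only.
fake = SimpleNamespace(
    metadata=SimpleNamespace(
        dc=SimpleNamespace(title=["A title"], creator=["Doe, Jane"], description=["Text."])
    )
)
print(summarize_record(fake))
```

Applied to real data, this could look like `[s for s in map(summarize_record, result.list_records.record) if s is not None]`.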

