Metadata-Version: 2.1
Name: pubmed-mapper
Version: 0.1.1
Summary: PubMed Mapper: A Python library that map PubMed XML to Python object
Home-page: https://github.com/soultoolman/pubmed-mapper
Author: soultoolman
Author-email: soultooman@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
Requires-Dist: lxml
Requires-Dist: click
Requires-Dist: rich


# pubmed-mapper: A Python library that maps PubMed XML to Python objects

[Chinese documentation](https://zhuanlan.zhihu.com/p/357273904)

## 1. Philosophy

[view UML](pubmed-mapper.png)

Programmatically accessing PubMed articles is a common task for me.
Luckily, with the help of [E-utilities](https://www.ncbi.nlm.nih.gov/books/NBK25500/),
we can retrieve full article data in XML format.
What I need is Python objects, not just XML strings, so pubmed-mapper was born.

## 2. Installation

```shell
pip install pubmed-mapper
```

## 3. Usage

### 3.1 use as a library

#### 3.1.1 parse a PubMed ID

```python
from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [Shadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Lotfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01
```
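
Because a parsed article exposes plain attributes, it can be flattened into a dictionary for export or logging. The sketch below is only an illustration: `summarize` is a hypothetical helper, not part of pubmed-mapper, and it touches only attributes shown in the example above. It also assumes that `pubdate` may be `None` when a date cannot be parsed, which the library may or may not do. A `SimpleNamespace` stand-in replaces a real `Article` so the snippet runs without network access.

```python
from datetime import date
from types import SimpleNamespace


def summarize(article):
    """Flatten a few attributes of an Article-like object into a dict.

    Hypothetical helper: it only reads attributes shown in the usage
    example (pmid, title, journal.title, authors, pubdate).
    """
    first = article.authors[0] if article.authors else None
    pubdate = article.pubdate
    return {
        'pmid': article.pmid,
        'title': article.title,
        'journal': article.journal.title,
        'first_author': first.last_name if first else None,
        # Assumption: pubdate may be None for unparseable dates.
        'pubdate': pubdate.isoformat() if pubdate else None,
    }


# Stand-in object; with the library installed, pass the result of
# Article.parse_pmid('32329900') instead.
demo = SimpleNamespace(
    pmid='32329900',
    title='Associations of Coffee...',
    journal=SimpleNamespace(title='Journal of the American Geriatrics Society'),
    authors=[SimpleNamespace(last_name='Shadyab')],
    pubdate=date(2020, 9, 1),
)
print(summarize(demo)['pubdate'])  # 2020-09-01
```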

#### 3.1.2 parse a downloaded XML file

```python
from lxml import etree
from pubmed_mapper import Article


infile = 'xxx.xml'
with open(infile, 'rb') as fp:  # binary mode lets lxml honor the file's declared encoding
    root = etree.parse(fp)


articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article = Article.parse_element(pubmed_article_element)
    articles.append(article)
```

### 3.2 use as a command-line tool

#### 3.2.1 parse a PubMed ID

```shell
pubmed-mapper pmid -p 32329900
```

#### 3.2.2 parse a single PubMed XML file

```shell
pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
```

#### 3.2.3 parse a directory that contains multiple PubMed XML files

```shell
pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl
```
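
The `.jl` extension suggests that the output is written as JSON Lines, with one JSON object per line. That is an assumption, and the field names of each record are not documented here. Under that assumption, a minimal sketch for reading the output back into Python could look like this (`read_jsonl` is a hypothetical helper, not part of pubmed-mapper):

```python
import json


def read_jsonl(path):
    """Yield one decoded record per non-empty line of a JSON Lines file.

    Assumes the file holds one JSON object per line, which is what the
    .jl extension conventionally means.
    """
    with open(path, encoding='utf-8') as fp:
        for line in fp:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `records = list(read_jsonl('output/pubmed21n0001.jl'))` would load every record into memory, while iterating over the generator directly keeps memory use flat for large files.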

## 4. FAQs

### 4.1 There are many types of PubMed publication dates. How do you convert them to datetime.date objects?

Parsing publication dates is hard work, and pubmed-mapper cannot parse every type yet.
The types that pubmed-mapper can parse, and the values they are parsed to, are:

| PubMed date (example) | parsed as  |
|:----------------------|:-----------|
| 2021-03-13            | 2021-03-13 |
| 2021-03               | 2021-03-01 |
| 2021 Spring           | 2021-04-01 |
| 2021                  | 2021-01-01 |
| 2021 Jan-Feb          | 2021-01-01 |
| 2021 Mar 13-15        | 2021-03-13 |
| 2021 Mar-2022 Jan     | 2021-03-01 |
| 2021-2022             | 2021-01-01 |
| 2021 Mar 13-Dec 15    | 2021-03-13 |
| 1976-1977 Winter      | 1976-01-01 |
| 1977-1978 Fall-Winter | 1977-10-01 |
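
The table follows a simple pattern: take the first year, the first month or season, and the first day, and default any missing part to 1. The sketch below reproduces the table under that reading. It is not pubmed-mapper's actual implementation; `parse_pubdate`, `MONTHS`, and `SEASONS` are hypothetical names, and the value for `Summer` is an assumption because the table does not list it.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1)}
# Winter, Spring and Fall follow the table; Summer is an assumption.
SEASONS = {'Winter': 1, 'Spring': 4, 'Summer': 7, 'Fall': 10}

ISO_DATE = re.compile(r'^(\d{4})-(\d{2})(?:-(\d{2}))?$')
# Start year, optional "-YYYY" end year, then an optional month or
# season word and an optional day. Anything after that is ignored.
LOOSE_DATE = re.compile(
    r'^(\d{4})(?:-\d{4})?(?:\s+([A-Za-z]+)(?:\s+(\d{1,2})\b)?)?')


def parse_pubdate(text):
    """Map a raw date string to the earliest date it could refer to."""
    text = text.strip()
    match = ISO_DATE.match(text)
    if match:
        year, month, day = match.groups()
        return date(int(year), int(month), int(day or 1))
    match = LOOSE_DATE.match(text)
    if not match:
        raise ValueError(f'unsupported publication date: {text!r}')
    year, word, day = match.groups()
    month = 1
    if word:
        word = word.title()
        # Try a month abbreviation first, then a season name.
        month = MONTHS.get(word[:3]) or SEASONS.get(word)
        if month is None:
            raise ValueError(f'unsupported publication date: {text!r}')
    return date(int(year), month, int(day or 1))


print(parse_pubdate('1977-1978 Fall-Winter'))  # 1977-10-01
```

Ranges are resolved to their start, so the end of a range such as `2021 Mar 13-Dec 15` is deliberately discarded.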

### 4.2 What is the pubmed-mapper.log file generated by pubmed-mapper?

pubmed-mapper.log is the default log file generated by pubmed-mapper.
You can change the file with the *--log-file* option:

```shell
pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
```

Check this log file for more details about parsing.

### 4.3 How do I log more detailed messages?

Use the *--log-level* option to log more detailed messages:

```shell
pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
```


