Metadata-Version: 2.1
Name: pysradb
Version: 2.2.0
Summary: A Python package for interacting with SRAdb and downloading datasets from SRA/ENA/GEO
Home-page: https://github.com/saketkc/pysradb
Author: Saket Choudhary
Author-email: saketkc@gmail.com
License: BSD license
Keywords: pysradb
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.7
License-File: LICENSE
License-File: AUTHORS.md
Requires-Dist: lxml (>=4.6.3)
Requires-Dist: pandas (>=1.3.2)
Requires-Dist: requests (>=2.26.0)
Requires-Dist: requests-ftp (>=0.3.1)
Requires-Dist: tqdm (>=4.62.1)
Requires-Dist: xmltodict (>=0.12.0)

# A Python package for retrieving metadata from SRA/ENA/GEO

[![image](https://img.shields.io/pypi/v/pysradb.svg?style=flat-square)](https://pypi.python.org/pypi/pysradb)
[![image](https://anaconda.org/bioconda/pysradb/badges/version.svg)](https://anaconda.org/bioconda/pysradb/badges/version.svg)
[![image](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/pysradb/README.html)
[![image](https://static.pepy.tech/personalized-badge/pysradb?period=month&units=international_system&left_color=black&right_color=brightgreen&left_text=Downloads/month)](https://pepy.tech/project/pysradb)
[![image](https://anaconda.org/bioconda/pysradb/badges/downloads.svg)](https://anaconda.org/bioconda/pysradb)
[![image](https://zenodo.org/badge/159590788.svg)](https://zenodo.org/badge/latestdoi/159590788)
[![image](https://github.com/saketkc/pysradb/workflows/push/badge.svg)](https://github.com/saketkc/pysradb/actions)

## Documentation

<https://saketkc.github.io/pysradb>

## CLI Usage

`pysradb` supports command line usage. See
[CLI](https://saket-choudhary.me/pysradb/cmdline.html) instructions or
[quickstart
guide](https://www.saket-choudhary.me/pysradb/quickstart.html).

    $ pysradb
     usage: pysradb [-h] [--version] [--citation]
                    {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs}
                    ...

     pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
     version: 2.0.1
     Citation: 10.12688/f1000research.18676.1

     optional arguments:
       -h, --help            show this help message and exit
       --version             show program's version number and exit
       --citation            how to cite

     subcommands:
       {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs}
         metadata            Fetch metadata for SRA project (SRPnnnn)
         download            Download SRA project (SRPnnnn)
         search              Search SRA for matching text
         gse-to-gsm          Get GSM for a GSE
         gse-to-srp          Get SRP for a GSE
         gsm-to-gse          Get GSE for a GSM
         gsm-to-srp          Get SRP for a GSM
         gsm-to-srr          Get SRR for a GSM
         gsm-to-srs          Get SRS for a GSM
         gsm-to-srx          Get SRX for a GSM
         srp-to-gse          Get GSE for a SRP
         srp-to-srr          Get SRR for a SRP
         srp-to-srs          Get SRS for a SRP
         srp-to-srx          Get SRX for a SRP
         srr-to-gsm          Get GSM for a SRR
         srr-to-srp          Get SRP for a SRR
         srr-to-srs          Get SRS for a SRR
         srr-to-srx          Get SRX for a SRR
         srs-to-gsm          Get GSM for a SRS
         srs-to-srx          Get SRX for a SRS
         srx-to-srp          Get SRP for a SRX
         srx-to-srr          Get SRR for a SRX
         srx-to-srs          Get SRS for a SRX
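
The conversion subcommands listed above follow a `<source>-to-<target>` pattern over accession types. As a rough illustration, the sketch below builds a subcommand name and checks it against the list printed in the help text. The helper name `conversion_subcommand` is hypothetical and not part of `pysradb`.

```python
# Conversion subcommands copied from the help output above.
CONVERSIONS = frozenset({
    "gse-to-gsm", "gse-to-srp",
    "gsm-to-gse", "gsm-to-srp", "gsm-to-srr", "gsm-to-srs", "gsm-to-srx",
    "srp-to-gse", "srp-to-srr", "srp-to-srs", "srp-to-srx",
    "srr-to-gsm", "srr-to-srp", "srr-to-srs", "srr-to-srx",
    "srs-to-gsm", "srs-to-srx",
    "srx-to-srp", "srx-to-srr", "srx-to-srs",
})


def conversion_subcommand(source: str, target: str) -> str:
    """Return the `<source>-to-<target>` subcommand, or raise if it is not listed."""
    name = f"{source.lower()}-to-{target.lower()}"
    if name not in CONVERSIONS:
        raise ValueError(f"no such pysradb subcommand: {name}")
    return name
```

For example, `conversion_subcommand("GSM", "SRP")` gives `gsm-to-srp`, while a pair that is not in the help text (such as SRS to SRP) raises `ValueError`.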

## Quickstart

A Google Colaboratory version of the most used commands is available in
this [Colab
Notebook](https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR).
Note that this requires only an active internet connection (no
additional downloads are made).

The following notebooks document all the possible features of
`pysradb`:

1.  [Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/01.Python-API_demo.ipynb)
2.  [Downloading datasets from SRA - command
    line](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb)
3.  [Download multiple datasets in parallel - Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/03.ParallelDownload.ipynb)
4.  [Converting SRA-to-fastq - command line (requires
    conda)](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/04.SRA_to_fastq_conda.ipynb)
5.  [Downloading subsets of a project - Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/05.Downloading_subsets_of_a_project.ipynb)
6.  [Download
    BAMs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/06.Download_BAMs.ipynb)
7.  [Metadata for multiple
    SRPs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/07.Multiple_SRPs.ipynb)
8.  [Multithreaded fastq downloads using Aspera
    Client](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.pysradb_ascp_multithreaded.ipynb)
9.  [Searching
    SRA/GEO/ENA](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/09.Query_Search.ipynb)

## Installation

To install the stable version using `pip`:

``` bash
pip install pysradb
```

Alternatively, if you use conda:

``` bash
conda install -c bioconda pysradb
```

This step will install all the dependencies. If you have an existing
environment with a lot of pre-installed packages, conda might be
[slow](https://github.com/bioconda/bioconda-recipes/issues/13774).
Please consider creating a new environment for `pysradb`:

``` bash
conda create -c bioconda -n pysradb python=3.10 pysradb
```

### Dependencies

    lxml
    pandas
    requests
    requests-ftp
    tqdm
    xmltodict

### Installing pysradb in development mode

    git clone https://github.com/saketkc/pysradb.git
    cd pysradb && pip install -r requirements.txt
    pip install -e .

## Using pysradb

### Obtaining SRA metadata

    $ pysradb metadata SRP000941 | head

    study_accession experiment_accession experiment_title                                                                                                                 experiment_desc                                                                                                                  organism_taxid  organism_name library_strategy library_source  library_selection sample_accession sample_title instrument                    total_spots total_size    run_accession run_total_spots run_total_bases
    SRP000941       SRX056722                                                                         Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells                                                               Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC    ChIP            SRS184466                              Illumina HiSeq 2000    26900401     531654480   SRR179707     26900401         807012030
    SRP000941       SRX027889                                                                            Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells                                                                  Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC    ChIP            SRS116481                      Illumina Genome Analyzer II    37528590     779578968   SRR067978     37528590        1351029240
    SRP000941       SRX027888                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116483                      Illumina Genome Analyzer II    13603127    3232309537   SRR067977     13603127         489712572
    SRP000941       SRX027887                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116562                      Illumina Genome Analyzer II    22430523     506327844   SRR067976     22430523         807498828
    SRP000941       SRX027886                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116560                      Illumina Genome Analyzer II    15342951     301720436   SRR067975     15342951         552346236
    SRP000941       SRX027885                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116482                      Illumina Genome Analyzer II    39725232     851429082   SRR067974     39725232        1430108352
    SRP000941       SRX027884                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116481                      Illumina Genome Analyzer II    32633277     544478483   SRR067973     32633277        1174797972
    SRP000941       SRX027883                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS004118                      Illumina Genome Analyzer II    22150965    3262293717   SRR067972      9357767         336879612
    SRP000941       SRX027883                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS004118                      Illumina Genome Analyzer II    22150965    3262293717   SRR067971     12793198         460555128
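
In the sample output above, `SRX027883` appears on two rows because the experiment has two runs (`SRR067972` and `SRR067971`). The numbers shown are consistent with `total_spots` being an experiment-level total and `run_total_spots` a per-run count. That reading is inferred from this sample and is not a documented guarantee. A small check using values copied from those two rows (the helper `spots_per_experiment` is illustrative only):

```python
# Two rows copied from the sample output above (experiment SRX027883).
rows = [
    {"experiment_accession": "SRX027883", "total_spots": 22150965,
     "run_accession": "SRR067972", "run_total_spots": 9357767},
    {"experiment_accession": "SRX027883", "total_spots": 22150965,
     "run_accession": "SRR067971", "run_total_spots": 12793198},
]


def spots_per_experiment(records):
    """Sum run-level spot counts for each experiment accession."""
    totals = {}
    for rec in records:
        key = rec["experiment_accession"]
        totals[key] = totals.get(key, 0) + rec["run_total_spots"]
    return totals


summed = spots_per_experiment(rows)
# For these rows, the per-run counts add up to the experiment-level total.
print(summed["SRX027883"] == rows[0]["total_spots"])
```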

### Obtaining detailed SRA metadata

    $ pysradb metadata SRP075720 --detailed | head

    study_accession experiment_accession experiment_title                                  experiment_desc                                   organism_taxid  organism_name library_strategy library_source  library_selection sample_accession sample_title instrument           total_spots total_size run_accession run_total_spots run_total_bases
    SRP075720       SRX1800476            GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq   GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467643                    Illumina HiSeq 2500  2547148      97658407  SRR3587912    2547148         127357400
    SRP075720       SRX1800475            GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq   GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467642                    Illumina HiSeq 2500  2676053     101904264  SRR3587911    2676053         133802650
    SRP075720       SRX1800474            GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq   GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467641                    Illumina HiSeq 2500  1603567      61729014  SRR3587910    1603567          80178350
    SRP075720       SRX1800473            GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq   GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467640                    Illumina HiSeq 2500  2498920      94977329  SRR3587909    2498920         124946000
    SRP075720       SRX1800472            GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq   GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467639                    Illumina HiSeq 2500  2226670      83473957  SRR3587908    2226670         111333500
    SRP075720       SRX1800471            GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq   GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467638                    Illumina HiSeq 2500  2269546      87486278  SRR3587907    2269546         113477300
    SRP075720       SRX1800470            GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq   GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467636                    Illumina HiSeq 2500  2333284      88669838  SRR3587906    2333284         116664200
    SRP075720       SRX1800469            GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq   GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467637                    Illumina HiSeq 2500  2071159      79689296  SRR3587905    2071159         103557950
    SRP075720       SRX1800468            GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq   GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467635                    Illumina HiSeq 2500  2321657      89307894  SRR3587904    2321657         116082850

### Converting SRP to GSE

    $ pysradb srp-to-gse SRP075720

    study_accession study_alias
    SRP075720       GSE81903

### Converting GSM to SRP

    $ pysradb gsm-to-srp GSM2177186

    experiment_alias study_accession
    GSM2177186       SRP075720

### Converting GSM to GSE

    $ pysradb gsm-to-gse GSM2177186

    experiment_alias study_alias
    GSM2177186       GSE81903

### Converting GSM to SRX

    $ pysradb gsm-to-srx GSM2177186

    experiment_alias experiment_accession
    GSM2177186       SRX1800089

### Converting GSM to SRR

    $ pysradb gsm-to-srr GSM2177186

    experiment_alias run_accession
    GSM2177186       SRR3587529
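
The conversion commands above print a header row followed by one whitespace-separated row per match. If you capture that text in a script, it can be turned into records with the standard library alone. This is only a sketch: `parse_conversion_output` is a hypothetical helper, and it assumes no field contains spaces, which holds for the accession columns shown here but not for the wider `metadata` tables.

```python
def parse_conversion_output(text: str) -> list:
    """Parse a header line plus whitespace-separated rows into a list of dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, *rows = lines
    return [dict(zip(header, row)) for row in rows]


# Output of `pysradb gsm-to-srr GSM2177186`, copied from above.
sample = """\
experiment_alias run_accession
GSM2177186       SRR3587529
"""
records = parse_conversion_output(sample)
print(records)
```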

### Downloading supplementary files from GEO

    $ pysradb download -g GSE161707

### Downloading an entire SRA/ENA project (multithreaded)

`pysradb` makes it easy to download datasets from SRA in parallel. For
example, to download using 8 threads:

    $ pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852

Downloads are organized by `SRP/SRX/SRR` mimicking the hierarchy of SRA
projects.
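
When scripting downloads, the command above can be assembled programmatically. The sketch below only builds the argument list from the flags shown in this section (`-y`, `-t`, `--out-dir`, `-p`). The function name `build_download_command` is made up for illustration, and actually running the result requires `pysradb` to be installed and on your `PATH`.

```python
import shlex


def build_download_command(project: str, threads: int = 8,
                           out_dir: str = "./pysradb_downloads") -> list:
    """Assemble the `pysradb download` invocation shown above as an argv list."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["pysradb", "download", "-y", "-t", str(threads),
            "--out-dir", out_dir, "-p", project]


cmd = build_download_command("SRP063852")
# Show the command as a single shell-quoted string.
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues in project names or paths.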

### Downloading only certain samples of interest

    $ pysradb metadata SRP000941 --detailed | grep 'study\|RNA-Seq' | pysradb download

This will download all `RNA-seq` samples coming from this project.
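
The `grep` pattern keeps the header line (it contains `study`) along with every row mentioning `RNA-Seq`, so `pysradb download` still receives the column names on its standard input. A rough Python equivalent of that filter is sketched below. The helper `filter_by_strategy` is hypothetical, and unlike the `grep` pattern it keeps only the first line as the header rather than every line containing `study`. The rows are abbreviated to three columns and taken from the sample outputs earlier in this document.

```python
def filter_by_strategy(lines, strategy="RNA-Seq"):
    """Keep the header (first line) plus rows that mention `strategy`."""
    lines = list(lines)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [header] + [row for row in rows if strategy in row]


# Abbreviated rows (three columns only) taken from the sample outputs above.
table = [
    "study_accession\texperiment_accession\tlibrary_strategy",
    "SRP000941\tSRX056722\tChIP-Seq",
    "SRP075720\tSRX1800476\tRNA-Seq",
    "SRP075720\tSRX1800475\tRNA-Seq",
]
for line in filter_by_strategy(table):
    print(line)
```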

### Ultrafast fastq downloads

With
[aspera-client](https://downloads.asperasoft.com/en/downloads/8?list)
installed, `pysradb` can perform ultrafast downloads.

To download all original fastqs with `aspera-client` installed,
utilizing 8 threads:

    $ pysradb download -t 8 --use_ascp -p SRP002605

Refer to the notebook for [(shallow) time
benchmarks](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.pysradb_ascp_multithreaded.ipynb).

## Publication

> [pysradb: A Python package to query next-generation sequencing
> metadata and data from NCBI Sequence Read
> Archive](https://f1000research.com/articles/8-532/v1)
>
> Presentation slides from BOSC (ISMB-ECCB) 2019:
> <https://f1000research.com/slides/8-1183>

## Citation

Choudhary, Saket. "pysradb: A Python Package to Query next-Generation
Sequencing Metadata and Data from NCBI Sequence Read Archive."
F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532
(<https://f1000research.com/articles/8-532/v1>)

    @article{Choudhary2019,
    doi = {10.12688/f1000research.18676.1},
    url = {https://doi.org/10.12688/f1000research.18676.1},
    year = {2019},
    month = apr,
    publisher = {F1000 (Faculty of 1000 Ltd)},
    volume = {8},
    pages = {532},
    author = {Saket Choudhary},
    title = {pysradb: A {P}ython package to query next-generation sequencing metadata and data from {NCBI} {S}equence {R}ead {A}rchive},
    journal = {F1000Research}
    }

Zenodo archive: <https://zenodo.org/badge/latestdoi/159590788>

Zenodo DOI: 10.5281/zenodo.2306881

## Questions?

Open an [issue](https://github.com/saketkc/pysradb/issues) or join our
[Slack
Channel](https://join.slack.com/t/pysradb/shared_invite/zt-f01jndpy-KflPu3Be5Aq3FzRh5wj1Ug).


# History

## 2.2.0 (2023-09-17)

- Add support for Biosamples and bioproject [#199](https://github.com/saketkc/pysradb/pull/198)
- Use retmode xml for Geo search [#200](https://github.com/saketkc/pysradb/pull/200)
- Documentation fixes

## 2.1.0 (2023-05-16)

-   Fix for `gse-to-srp` returning unrequested GSEs
    ([#186](https://github.com/saketkc/pysradb/issues/190))
-   Fix for `download` using `public_urls`
-   Fix for `gsm-to-srx` returning false positives
    ([#165](https://github.com/saketkc/pysradb/issues/165))
-   Fix for delimiter not being consistent when metadata is printed on
    terminal ([#147](https://github.com/saketkc/pysradb/issues/147))
-   ENA search is currently broken because of an API change

## 2.0.2 (2023-04-09)

-   Fix for `gse-to-srp` to handle cases where a project is
    missing but SRXs are returned
    ([#186](https://github.com/saketkc/pysradb/issues/186))
-   Fix `gse-to-gsm`
    ([#187](https://github.com/saketkc/pysradb/issues/187))

## 2.0.1 (2023-03-18)

-   Fix for `pysradb download` - using `public_url`
-   Fix for SRX -> SRR and related conversions
    ([#183](https://github.com/saketkc/pysradb/pull/183))

## 2.0.0 (2023-02-23)

-   BREAKING change: Overhaul of how urls and associated metadata are
    returned (not backward compatible); all column names are lower cased
    by default
-   Fix extra space in "organism_taxid" column
-   Added support for Experiment attributes
    ([#89](https://github.com/saketkc/pysradb/issues/89#issuecomment-1439319532))

## 1.4.2 (06-17-2022)

-   Fix ENA fastq fetching
    ([#163](https://github.com/saketkc/pysradb/issues/163))

## 1.4.1 (06-04-2022)

-   Fix for fetching alternative URLs

## 1.4.0 (06-04-2022)

-   Added ability to fetch alternative URLs (GCP/AWS) for metadata
    ([#161](https://github.com/saketkc/pysradb/issues/161))
-   Fix for xmltodict 0.13.0 no longer defaulting to OrderedDict
    ([#159](https://github.com/saketkc/pysradb/pull/159))
-   Fix for missing experiment model and description in metadata
    ([#160](https://github.com/saketkc/pysradb/issues/160))

## 1.3.0 (02-18-2022)

-   Add `study_title` to `--detailed` flag
    ([#152](https://github.com/saketkc/pysradb/issues/152))
-   Fix `KeyError` in `metadata` where some new
    IDs do not have any metadata
    ([#151](https://github.com/saketkc/pysradb/issues/151))

## 1.2.0 (01-10-2022)

-   Do not exit if a query returns no hits
    ([#149](https://github.com/saketkc/pysradb/pull/149))

## 1.1.0 (12-12-2021)

-   Fixed `gsm-to-gse` failure
    ([#128](https://github.com/saketkc/pysradb/pull/128))
-   Fixed case sensitivity bug for ENA search
    ([#144](https://github.com/saketkc/pysradb/pull/144))
-   Fixed publication date bug for search
    ([#146](https://github.com/saketkc/pysradb/pull/146))
-   Added support for downloading data from GEO using
    `pysradb download -g <GSE>`
    ([#129](https://github.com/saketkc/pysradb/pull/129))

## 1.0.1 (01-10-2021)

-   Dropped Python 3.6 since pandas 1.2 is not supported

## 1.0.0 (01-09-2021)

-   Retired `metadb` and `SRAdb` based search through CLI - everything
    defaults to `SRAweb`
-   `SRAweb` now supports
    [search](https://saket-choudhary.me/pysradb/quickstart.html#search)
-   `N/A` is now replaced with `pd.NA`
-   Two new fields in `--detailed`: `instrument_model`
    and `instrument_model_desc`
    [#75](https://github.com/saketkc/pysradb/issues/75)
-   Updated documentation

## 0.11.1 (09-18-2020)

-   `library_layout` is now output in metadata #56
-   `--detailed` unifies columns for ENA fastq links instead
    of appending `_x`/`_y` #59
-   bugfix for parsing namespace in xml outputs #65
-   XML errors from NCBI are now handled more gracefully #69
-   Documentation and dependency updates

## 0.11.0 (09-04-2020)

-   `pysradb download` now supports multiple threads for
    parallel downloads
-   `pysradb download` also supports ultrafast downloads of
    FASTQs from ENA using aspera-client

## 0.10.3 (03-26-2020)

-   Added test cases for SRAweb
-   API limit exceeding errors are automagically handled
-   Bug fixes for GSE \<=\> SRR
-   Bug fix for metadata - supports multiple SRPs

Contributors

-   Dibya Gautam
-   Marius van den Beek

## 0.10.2 (02-05-2020)

-   Bug fix: Handle API-rate limit exceeding =\> Retries
-   Enhancement: 'Alternatives' URLs are now part of
    `--detailed`

## 0.10.1 (02-04-2020)

-   Bug fix: Handle Python3.6 for capture_output in subprocess.run

## 0.10.0 (01-31-2020)

-   All the subcommands (srx-to-srr, srx-to-srs) will now print
    additional columns where the first two columns represent the
    relevant conversion
-   Fixed a bug when fetching entries with a single efetch record

## 0.9.9 (01-15-2020)

-   Major fix: some SRRs would go missing as the experiment dict was
    being created only once per SRR (See #15)
-   Features: More detailed metadata by default in the SRAweb mode
-   See notebook: <https://colab.research.google.com/drive/1C60V->

## 0.9.7 (01-20-2020)

-   Feature: instrument, run size and total spots are now printed in the
    metadata by default (SRAweb mode only)
-   Issue: Fixed an issue with srapath failing on SRP. srapath is now
    run on individual SRRs.

## 0.9.6 (07-20-2019)

-   Introduced `SRAweb` to perform queries over the web if
    the SQLite is missing or does not contain the relevant record.

## 0.9.0 (02-27-2019)

### Others

-   This release completely changes the command line interface replacing
    click with argparse (<https://github.com/saketkc/pysradb/pull/3>)
-   Removed stale Python 2 compatible code

## 0.8.0 (02-26-2019)

### New methods/functionality

-   `srr-to-gsm`: convert SRR to GSM
-   SRAmetadb.sqlite.gz file is deleted by default after extraction
-   When SRAmetadb is not found, confirmation is sought before
    downloading
-   Confirmation option before SRA downloads

### Bugfix

-   download() works with wget

### Others

-   `--out_dir` is now `--out-dir`

## 0.7.1 (02-18-2019)

Important: Python2 is no longer supported. Please consider moving to
Python3.

### Bugfix

-   Included docs in the index which were missed out in the previous
    release

## 0.7.0 (02-08-2019)

### New methods/functionality

-   `gsm-to-srr`: convert GSM to SRR
-   `gsm-to-srx`: convert GSM to SRX
-   `gsm-to-gse`: convert GSM to GSE

### Renamed methods

The following command line options have been renamed and the changes are
not compatible with the 0.6.0 release:

-   `sra-metadata` -> `metadata`.
-   `sra-search` -> `search`.
-   `srametadb` -> `metadb`.

## 0.6.0 (12-25-2018)

### Bugfix

-   Fixed bugs introduced in 0.5.0 with API changes where multiple
    redundant columns were output in `sra-metadata`

### New methods/functionality

-   `download` now allows piped inputs

## 0.5.0 (12-24-2018)

### New methods/functionality

-   Support for filtering by SRX Id for SRA downloads.
-   `srr_to_srx`: Convert SRR to SRX/SRP
-   `srp_to_srx`: Convert SRP to SRX
-   Stripped down `sra-metadata` to give minimal information
-   Added `--assay`, `--desc`,
    `--detailed` flags for `sra-metadata`
-   Improved table printing on terminal

## 0.4.2 (12-16-2018)

### Bugfix

-   Fixed unicode error in tests for Python2

## 0.4.0 (12-12-2018)

### New methods/functionality

-   Added a new `BASEdb` class to handle common database
    connections
-   Initial support for GEOmetadb through GEOdb class
-   Initial support for a command line interface:
    -   download Download SRA project (SRPnnnn)
    -   gse-metadata Fetch metadata for GEO ID (GSEnnnn)
    -   gse-to-gsm Get GSM(s) for GSE
    -   gsm-metadata Fetch metadata for GSM ID (GSMnnnn)
    -   sra-metadata Fetch metadata for SRA project (SRPnnnn)
-   Added three separate notebooks for SRAdb, GEOdb, CLI usage

## 0.3.0 (12-05-2018)

### New methods/functionality

-   `sample_attribute` and
    `experiment_attribute` are now included by default in
    the df returned by `sra_metadata()`
-   `expand_sample_attribute_columns`: expand metadata dataframe based on
    attributes in `sample_attribute` column
-   New methods to guess cell/tissue/strain:
    `guess_cell_type()`/`guess_tissue_type()`/`guess_strain_type()`
-   Improved README and usage instructions

## 0.2.2 (12-03-2018)

### New methods/functionality

-   `search_sra()` allows full text search on SRA metadata.

## 0.2.0 (12-03-2018)

### Renamed methods

The following methods have been renamed and the changes are not
compatible with 0.1.0 release:

-   `get_query()` -> `query()`.
-   `sra_convert()` -> `sra_metadata()`.
-   `get_table_counts()` -> `all_row_counts()`.

### New methods/functionality

-   `download_sradb_file()` makes fetching
    `SRAmetadb.sqlite` file easy; wget is no longer
    required.
-   `ftp` protocol is now supported besides
    `fasp` and hence `aspera-client` is now
    optional. We, however, strongly recommend `aspera-client`
    for faster downloads.

### Bug fixes

-   Silenced `SettingWithCopyWarning` by explicitly doing
    operations on a copy of the dataframe instead of the original.

Besides these, all methods now follow `numpydoc`-compatible
documentation.

## 0.1.0 (12-01-2018)

-   First release on PyPI.
