Metadata-Version: 2.4
Name: gnomonicus
Version: 3.1.1
Summary: Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants
Author-email: Jeremy Westhead <jeremy.westhead@ndm.ox.ac.uk>, Philip W Fowler <philip.fowler@ndm.ox.ac.uk>
License: LICENCE TERMS FOR ACCESS TO SOFTWARE AND DATABASE FOR ACADEMIC USE
        
        These licence terms apply to all licences granted by THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UNIVERSITY
        OF OXFORD whose administrative offices are at University Offices, Wellington Square, Oxford OX1 2JD,
        United Kingdom (the "University") for use of or access to gnomonicus software ("the Software")
        through this website (the "Website").
        
        PLEASE READ THESE LICENCE TERMS CAREFULLY BEFORE USING THE DATABASE OR THE SOFTWARE THROUGH THIS WEBSITE.
        IF YOU DO NOT AGREE TO THESE LICENCE TERMS YOU SHOULD NOT DOWNLOAD OR USE THE SOFTWARE.
        
        THE SOFTWARE ARE INTENDED FOR ACADEMICS CARRYING OUT RESEARCH AND NOT FOR USE BY CONSUMERS OR COMMERCIAL BUSINESSES.
        
        1.	Academic Use Licence
        
        1.1	The User is granted a limited non-exclusive and non-transferable royalty free licence to access and
        use the Software provided that the User will:
        
        (a)	limit their use of the Software to their own internal academic non-commercial research which is
        undertaken for the purposes of education or other scholarly use;
        
        (b)	not use the Software for or on behalf of any third party or to provide a service or integrate all
        or part of the Software into a product for sale or license to third parties;
        
        (c)	use the Software in accordance with the prevailing instructions and guidance for use given on the
        Website and comply with procedures on the Website for user identification, authentication and access;
        
        (d)	comply with all applicable laws and regulations with respect to their use of the Software;
        
        (e)	except to the extent expressly permitted under these terms, not attempt to: reverse compile, disassemble
        or copy, modify, duplicate, create derivative works from, frame, mirror, republish, download, display, transmit,
        or distribute all or any portion of the Software or Website in any form or media or by any means; and
        
        (f)	ensure that the Copyright Notice “Copyright © 2022, University of Oxford” appears prominently wherever
        results from the Software are used, and is referenced or cited with the Copyright Notice when the Software
        is described in any research publication or on any documents or other material created using the Software.
        
        1.2	the University reserves the right at any time and without liability or prior notice to the User to
        revise, modify and replace the functionality and performance of the access to and operation of the Software.
        
        1.3	The User acknowledges and agrees that the University owns all intellectual property rights in the Software.
        The User shall not have any right, title or interest in or to any results or other output from the Software.
        
        1.4	This Licence will terminate immediately and the User will no longer have any right exercise any of the
        rights granted to the User upon any breach of the conditions in Section 1.1 of this Licence.
        
        2.	Indemnity and Liability
        
        2.1	The User shall defend, indemnify and hold harmless the University against any claims, actions, proceedings,
        losses, damages, expenses and costs (including without limitation court costs and reasonable legal fees) arising
        out of or in connection with the User's possession or use of the Software, or any breach of these terms by the User.
        
        2.2	The Software are provided on an ‘as is’ basis and the User uses the Software at their own risk. No representations,
        conditions, warranties or other terms of any kind are given in respect of the Software and all statutory warranties
        and conditions are excluded to the fullest extent permitted by law. Without affecting the generality of the previous
        sentences, the University gives no implied or express warranty and makes no representation that the Software or
        any part of them: (a) will enable specific results to be obtained; or (b) meets a particular specification or is
        comprehensive within its field or that it is error free or will operate without interruption; or (c) is suitable
        for any particular, or the User's specific purposes.
        
        2.3	Except in relation to fraud, death or personal injury, the University's liability to the User for any use of
        the Software, in negligence or arising in any other way out of the subject matter of these licence terms, will not
        extend to any incidental or consequential damages or losses, or any loss of profits, loss of revenue, loss of data,
        loss of contracts or opportunity, whether direct or indirect.
        
        2.4	The User hereby irrevocably undertakes to the University not to make any claim against any employee, student,
        researcher or other individual engaged by the University, being a claim which seeks to enforce against any of them
        any liability whatsoever in connection with this agreement or its subject-matter.
        
        3.	General
        
        3.1	Severability - If any provision (or part of a provision) of these licence terms is found by any court or
        administrative body of competent jurisdiction to be invalid, unenforceable or illegal, the other provisions shall
        remain in force.
        
        3.2	Entire Agreement - These licence terms and any documents referred to in them, constitute the whole agreement
        between the parties and supersede any previous arrangement, understanding or agreement between them relating to
        the Software.
        
        3.3	Law and Jurisdiction - These licence terms and any disputes or claims arising out of or in connection with them
        shall be governed by, and construed in accordance with, the law of England. The User irrevocably submits to the
        exclusive jurisdiction of the English courts for any dispute or claim that arises out of or in connection with
        these licence terms.
        
        If you are interested in using the Software commercially, please contact Oxford University Innovation Limited to
        negotiate a licence. Contact details are enquiries@innovation.ox.ac.uk quoting reference (TBD).
License-File: LICENSE
Keywords: TB,antimicrobial resistance,bioinformatics,clockwork,gnomonicus,lodestone,piezo
Requires-Python: >=3.10
Requires-Dist: bio-grumpy>=1.1.4
Requires-Dist: numpy==1.26.1
Requires-Dist: pandas==2.1.1
Requires-Dist: piezo>=0.9.1
Requires-Dist: tqdm==4.66.1
Requires-Dist: vcf-subset>=2.0.0
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pandas-stubs; extra == 'dev'
Requires-Dist: recursive-diff==1.1.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: types-pytz; extra == 'dev'
Requires-Dist: types-tqdm; extra == 'dev'
Requires-Dist: typing-extensions; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs; extra == 'docs'
Requires-Dist: mkdocs-gen-files>=0.5.0; extra == 'docs'
Requires-Dist: mkdocs-include-markdown-plugin>=4.0.4; extra == 'docs'
Requires-Dist: mkdocs-literate-nav>=0.6.0; extra == 'docs'
Requires-Dist: mkdocs-material>=9.1.8; extra == 'docs'
Requires-Dist: mkdocs-section-index>=0.3.5; extra == 'docs'
Requires-Dist: mkdocstrings-python>=1.0.0; extra == 'docs'
Requires-Dist: mkdocstrings>=0.21.2; extra == 'docs'
Requires-Dist: requests==2.29.0; extra == 'docs'
Description-Content-Type: text/markdown

[![Tests](https://github.com/oxfordmmm/gnomonicus/actions/workflows/tests.yaml/badge.svg)](https://github.com/oxfordmmm/gnomonicus/actions/workflows/tests.yaml) 
[![Build and release Docker](https://github.com/oxfordmmm/gnomonicus/actions/workflows/build.yaml/badge.svg)](https://github.com/oxfordmmm/gnomonicus/actions/workflows/build.yaml) 
[![PyPI version](https://badge.fury.io/py/gnomonicus.svg)](https://badge.fury.io/py/gnomonicus)
[![Docs](https://github.com/oxfordmmm/gnomonicus/actions/workflows/docs.yaml/badge.svg)](https://oxfordmmm.github.io/gnomonicus/)

# gnomonicus
Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants.

Provides a library of functions for use within scripts, as well as a CLI tool that links the functions together to produce output.

## Documentation
The API reference for developers and the CLI instructions can be found here: https://oxfordmmm.github.io/gnomonicus/

## Usage
```
usage: gnomonicus [-h] [-v] --vcf_file VCF_FILE --genome_object GENOME_OBJECT [--catalogue_file CATALOGUE_FILE] [--ignore_vcf_filter] [--output_dir OUTPUT_DIR] [--json] [--csvs CSVS [CSVS ...]] [--debug]
                  [--resistance_genes] --min_dp MIN_DP

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --vcf_file VCF_FILE   the path to a single VCF file
  --genome_object GENOME_OBJECT
                        the path to a genbank file
  --catalogue_file CATALOGUE_FILE
                        the path to the resistance catalogue
  --ignore_vcf_filter   whether to ignore the FILTER field in the vcf (e.g. necessary for some versions of Clockwork VCFs)
  --output_dir OUTPUT_DIR
                        Directory to save output files to. Defaults to wherever the script is run from.
  --json                Flag to create a single JSON output as well as the CSVs
  --csvs CSVS [CSVS ...]
                        Types of CSV to produce. Accepted values are [variants, mutations, effects, predictions, all]. `all` produces all of the CSVs
  --debug               Whether to log debugging messages to the log. Defaults to False
  --resistance_genes    Flag to filter mutations and variants to only include genes present in the resistance catalogue
  --min_dp MIN_DP       Minimum depth for a variant to be considered in the VCF. Below this value, rows are interpreted as null calls.
```
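
When calling the CLI from a script, it can help to assemble the arguments programmatically. The following is a minimal sketch that uses only flags listed in the usage text above; the helper name `build_gnomonicus_command`, the file names and the `min_dp` value of 3 are illustrative assumptions, not part of gnomonicus.

```python
import shlex
from typing import List, Optional, Sequence


def build_gnomonicus_command(
    vcf_file: str,
    genome_object: str,
    min_dp: int,
    catalogue_file: Optional[str] = None,
    output_dir: Optional[str] = None,
    json_output: bool = False,
    csvs: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble an argument list from flags shown in the usage text.

    Hypothetical helper for illustration; it is not provided by gnomonicus.
    """
    # --vcf_file, --genome_object and --min_dp are not bracketed in the
    # usage text, so they are always passed.
    cmd = [
        "gnomonicus",
        "--vcf_file", vcf_file,
        "--genome_object", genome_object,
        "--min_dp", str(min_dp),
    ]
    if catalogue_file is not None:
        cmd += ["--catalogue_file", catalogue_file]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    if json_output:
        cmd.append("--json")
    if csvs:
        # --csvs accepts one or more values, so it goes last.
        cmd += ["--csvs", *csvs]
    return cmd


# Placeholder file names; substitute real paths.
command = build_gnomonicus_command(
    "sample.vcf",
    "NC_000962.3.gbk",
    min_dp=3,
    catalogue_file="catalogue.csv",
    json_output=True,
    csvs=["all"],
)
print(shlex.join(command))
```

With gnomonicus installed, the resulting list can be passed to `subprocess.run(command, check=True)`.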

## Install
Install the latest release with pip:
```
pip install gnomonicus
```

Install from source:
```
git clone https://github.com/oxfordmmm/gnomonicus.git
cd gnomonicus
pip install -e .
```
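
To confirm that either install route worked, the installed version can be read from the package metadata. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "gnomonicus") -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found else "gnomonicus is not installed in this environment")
```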

## Docker
A Docker image should be built on releases. To open a shell with gnomonicus installed:
```
docker run -it oxfordmmm/gnomonicus:latest
```

## Notes
When generating mutations, a synonymous amino acid mutation is reported together with the nucleotides that changed. This can lead to a mix of nucleotide and amino acid mutations for coding genes, but the nucleotide mutations are excluded from generating effects unless they are specified in the catalogue. This means that the default rule of `gene@*= --> S` is still in place regardless of the introduced `gene@*?`, which would otherwise take precedence. For example:
```
  'MUTATIONS': [
      {
          'MUTATION': 'F2F',
          'GENE': 'S',
          'GENE_POSITION': 2
      },
      {
          'MUTATION': 't6c',
          'GENE': 'S',
          'GENE_POSITION': 6
      },
  ],
  'EFFECTS': {
      'AAA': [
          {
              'GENE': 'S',
              'MUTATION': 'F2F',
              'PREDICTION': 'S'
          },
          {
              'PHENOTYPE': 'S'
          }
      ],
  }
```
The nucleotide variation is included in the `MUTATIONS`, but explicitly removed from the `EFFECTS` unless it is specified within the catalogue.
For this variation to be included, the catalogue would need to contain a line for `S@F2F&S@t6c`.
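
The behaviour described above can be sketched in plain Python. This is an illustration only, not the gnomonicus or piezo implementation: the function names are hypothetical, the catalogue is reduced to a list of rule strings, and nucleotide changes are assumed to be recognisable by their lower-case notation (as in `t6c` versus `F2F` in the example above).

```python
def is_nucleotide_mutation(mutation: str) -> bool:
    """Assumption: nucleotide changes are lower case ('t6c'),
    amino acid changes are upper case ('F2F')."""
    return mutation[0].islower()


def mutations_for_effects(mutations, catalogue_rules):
    """Keep mutations eligible for effects.

    Nucleotide-level mutations are dropped unless a catalogue rule names
    them explicitly, either alone or as part of an '&'-joined rule.
    """
    kept = []
    for m in mutations:
        label = f"{m['GENE']}@{m['MUTATION']}"
        named = any(label in rule.split("&") for rule in catalogue_rules)
        if is_nucleotide_mutation(m["MUTATION"]) and not named:
            continue
        kept.append(m)
    return kept


mutations = [
    {"MUTATION": "F2F", "GENE": "S", "GENE_POSITION": 2},
    {"MUTATION": "t6c", "GENE": "S", "GENE_POSITION": 6},
]

# Only the default synonymous rule: the nucleotide change is filtered out.
default_only = mutations_for_effects(mutations, ["S@*="])

# With an explicit multi-mutation rule, the nucleotide change is retained.
with_rule = mutations_for_effects(mutations, ["S@*=", "S@F2F&S@t6c"])

print([m["MUTATION"] for m in default_only])
print([m["MUTATION"] for m in with_rule])
```

The sketch only covers the synonymous-mutation case discussed here; it does not attempt to handle promoter mutations, indels or other catalogue rule types.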

## User stories

1. As a bioinformatician, I want to be able to run `gnomonicus` on the command line, passing it (i) a GenBank file ~~(or pickled `gumpy.Genome` object)~~, (ii) a resistance catalogue and (iii) a VCF file, and get back `pandas.DataFrames` of the genetic variants, mutations, effects and predictions/antibiogram. The latter is for all the drugs described in the passed resistance catalogue.

2. As a GPAS developer, I want to be able to embed `gnomonicus` in a Docker image/NextFlow pipeline that consumes the outputs of [tb-pipeline](https://github.com/Pathogen-Genomics-Cymru/tb-pipeline) and emits a structured, well-designed `JSON` object describing the genetic variants, mutations, effects and predictions/antibiogram.

3. In general, I would also like the option to output fixed- and variable-length FASTA files (the latter takes into account insertions and deletions described in any input VCF file).

## Unit testing

For speed, rather than use NC_000962.3 (i.e. H37Rv *M. tuberculosis*), we use SARS-CoV-2 and have created a fictitious drug resistance catalogue, along with some `vcf` files and the expected outputs in `tests/`.

The tests can be run with `pytest -vv`.