Metadata-Version: 2.1
Name: extremevariantfilter
Version: 0.0a2
Summary: A set of tools to aid in the identification of false positive variants in Variant Call Files.
Home-page: https://github.com/Ellis-Anderson/extremevariantfilter
Author: Complete Genomics
Author-email: eanderson@genomics.cn
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 2.7
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Description-Content-Type: text/markdown
Requires-Dist: scikit-learn
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: xgboost
Requires-Dist: docopt

# ExtremeVariantFilter

Extreme Variant Filter (EVF) is a set of XGBoost-based tools for identifying false-positive variants in genomic Variant Call Format (VCF) files.

### Functions

__apply_filter__

    Usage:
        apply_filter (--vcf STR) (--snp-model STR) (--indel-model STR) [--verbose]

    Description:
        Apply models generated by train_model to a VCF.

    Arguments:
        --vcf STR                     VCF to be filtered
        --snp-model STR               Model for applying to SNPs
        --indel-model STR             Model for applying to InDels

    Options:
        -h, --help                      Show this help message and exit.
        -v, --version                   Show version and exit.
        --verbose                       Log output

    Examples:
        apply_filter --vcf <path/to/vcf> --snp-model <snp.pickle.dat> --indel-model <indel.pickle.dat>
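
When `apply_filter` is driven from a script rather than typed by hand, the command can be assembled as an argument list. The following is a minimal sketch, not part of EVF: the helper name `build_apply_filter_command` is hypothetical, and it only uses the flags listed in the usage text above. It is written to run on Python 2.7 as well as Python 3.

```python
import subprocess


def build_apply_filter_command(vcf, snp_model, indel_model, verbose=False):
    """Return the argument list for one apply_filter invocation.

    Hypothetical helper: it only arranges the flags documented in the
    usage text (--vcf, --snp-model, --indel-model, --verbose).
    """
    command = [
        "apply_filter",
        "--vcf", vcf,
        "--snp-model", snp_model,
        "--indel-model", indel_model,
    ]
    if verbose:
        command.append("--verbose")
    return command


def run_command(command):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.check_call(command)
```

With EVF installed, `run_command(build_apply_filter_command("sample.vcf", "snp.pickle.dat", "indel.pickle.dat"))` would launch the tool; the file names here are placeholders.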

__train_model__

    Usage:
        train_model (--true-pos STR) (--false-pos STR) (--type STR) [--out STR] [--njobs INT] [--verbose]

    Description:
        Train a model to be saved and used to filter VCFs.

    Arguments:
        --true-pos STR          Path to true-positive VCF from VCFeval or comma-separated list of paths
        --false-pos STR         Path to false-positive VCF from VCFeval or comma-separated list of paths
        --type STR              SNP or INDEL

    Options:
        -o, --out <STR>                 Outfile name for writing model [default: (type).filter.pickle.dat]
        -n, --njobs <INT>               Number of threads to run in parallel [default: 2]
        -h, --help                      Show this help message and exit.
        -v, --version                   Show version and exit.
        --verbose                       Log output

    Examples:
        train_model --true-pos <path/to/tp/vcf(s)> --false-pos <path/to/fp/vcf(s)> --type [SNP, INDEL] --njobs 20
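
Because `--true-pos` and `--false-pos` each accept a comma-separated list of paths, training on several VCFs means joining the paths into one string per flag. The sketch below shows one way to do that. The helper name `build_train_model_command` is hypothetical, and it assumes the list is joined with bare commas (no spaces) and that no path itself contains a comma; neither detail is stated in the usage text.

```python
def build_train_model_command(true_pos, false_pos, variant_type,
                              out=None, njobs=None):
    """Return the argument list for one train_model invocation.

    Hypothetical helper. true_pos and false_pos are lists of VCF paths;
    they are joined with bare commas, which is an assumption about how
    the tool splits the list.
    """
    if variant_type not in ("SNP", "INDEL"):
        raise ValueError("variant_type must be 'SNP' or 'INDEL'")
    for path in list(true_pos) + list(false_pos):
        if "," in path:
            raise ValueError("path contains a comma: %r" % path)
    command = [
        "train_model",
        "--true-pos", ",".join(true_pos),
        "--false-pos", ",".join(false_pos),
        "--type", variant_type,
    ]
    # Optional flags are only added when given, so the tool's own
    # defaults apply otherwise.
    if out is not None:
        command += ["--out", out]
    if njobs is not None:
        command += ["--njobs", str(njobs)]
    return command
```

The resulting list can be passed to `subprocess.check_call` once EVF is installed.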

### Install

To install EVF, run:

    pip install extremevariantfilter

### stLFR Paper Results

To reproduce the results of the
[stLFR paper on bioRxiv](https://www.biorxiv.org/content/early/2018/05/17/324392.1),
use the variant-filtering models available in the `models/` directory.
To get identical results, run `pip install -r requirements.txt` from within this directory
after installation so that your environment matches the one we used.
Different versions of certain packages will produce different results.
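
Since results depend on package versions, it can help to confirm that the active environment matches the pinned one before comparing numbers. The sketch below is one way to check this. It assumes `requirements.txt` uses exact `name==version` pins, which is not shown here, and the function names are hypothetical. The same parser also reads the `name==version` lines printed by `pip freeze`.

```python
def parse_pins(lines):
    """Map lower-cased package names to versions from 'name==version' lines.

    Comments, blank lines, and lines without '==' are skipped.
    """
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for pins the environment does not meet.

    found is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

For example, pass the lines of `requirements.txt` to `parse_pins` to get `pins`, pass the lines of `pip freeze` output to `parse_pins` to get `installed`, and then call `find_mismatches(pins, installed)`. An empty result means every pinned package is present at the pinned version. Names are compared after lower-casing only, so a package listed with a hyphen in one file and an underscore in the other would be reported as missing.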


