
Java >= 5 (aka v1.5) is required.

NINJA is software for large-scale neighbor-joining phylogeny inference. It
expects inputs to be either alignments (in fasta format) or pairwise distance
matrices (in phylip format), and can produce both a distance matrix (phylip)
and a tree file (newick format).

Details of the algorithms used in NINJA are available in the original paper.  See
website (http://nimbletwist.com/software/ninja/) for links.


Quick-start
-----------

(1) After downloading Ninja.tgz, unpack with the command
  tar -xzf Ninja.tgz
This will create a folder ninja_x.y.z, containing three files:
  - 00README (this file)
  - Ninja.jar (the heart of NINJA)
  - ninja (a shell script that runs Ninja.jar with appropriate Java flags)

(2) To build a tree, use one of the following commands:

  ./ninja alignment.fasta > tree.newick
  (or)
  ./ninja --in alignment.fasta --out tree.newick

NINJA can also output a phylip formatted distance matrix, or accept such a
matrix as input, with the following commands, respectively:
  ./ninja --out_type d alignment.fasta > distance_file
  (and)
  ./ninja --in_type d distance_file > tree.newick


For a list of ninja arguments, use the --help flag:
  ./ninja --help

If you wish to bypass the shell script, you can run Ninja.jar directly with
a command like:
  java -server -Xmx2G -jar Ninja.jar --in alignment.fasta --out tree.newick
(If you do this because of some failing in the ninja shell script, please 
e-mail me so I can improve the shell script)


(3) The external-memory variant of NINJA, used for inputs of many thousands
of sequence, makes moderate use of the disk. That proves to be a negligible 
problem for other applications running concurrently, since most apps don't 
use the disk all that much, but can become a problem if several instances of
NINJA are hitting the same disk. This is particularly a concern for clusters 
where the compute nodes all share a common disk, e.g. though NFS.  NINJA 
allows you to manage this issue by specifying where it should place the 
temporary folder in which it holds all the temporary files used to store 
data structures on disk.  If you have a cluster, each of your compute nodes 
likely has a local disk drive - you'd just tell NINJA to do all its 
temporary work in a directory that maps to that local disk. The flag "-t" 
gets you there :
  ./ninja -t /local/disk alignment.fasta > tree.newick


Source code is included in the jar. Extract the jar ('jar xf Ninja.jar'), then 
enter the src directory.

NINJA has been tested on unix and linux operating systems, and should work on 
cygwin. It should also work in Windows, though I've had a report of a problem 
that appears to be an error in the Windows Java BufferedReader (yuck).


Common arguments (optional)
---------------------------

--in (or -i) filename 
    Specify file (phylip format) containing precomputed pairwise distances.
    (default is to read the filename from the first argument not associated
    with a -- flag, as shown above) 

--out (or -o) filename 
    Specify file to which the tree will be written (default = stdout)

--method (or -m) meth  [default | inmem | extmem]
    'inmem' method uses in-memory algorithm, which is roughly 3x faster, but 
         will fail for large inputs. For example, with -Xmx2G (the default
         in the ninja shell script), inmem will work for up to around 7,000 
         sequences.
    'extmem' method forces use of the external-memory algorithm, which 
         is required for large inputs.        
    'default' will first try the inmem method, then failover to extmem if
         inmem runs out of memory. 

--in_type type [a | d]
    Type of input, alignment or distance matrix
    'a' type causes NINJA to accept an alignment in fasta format
    'd' type causes NINJA to accept a (phylip format) file containing 
    precomputed pairwise distances.
    Default = a
    
--out_type type [t | d]   
    Type of output, tree or distance matrix
    't' type computes the tree from the input file (whether an alignment
         or a distance matrix), and writes a tree to the file specified with
         the --out flag
    'd' type computes the distance matrix from the input alignment, and
         writes the matrix to the file specified with the --out flag
    Default = t           
         
--alph_type type [a | d]
    'a' if input consists of amino acid sequence
    'd' if input consists of dna sequence
    Default = checks sequence, choses 'a' if any characters other than [acgt]
    are seen.  
    
--corr_type type [n | j | k | s]
    Correction for multiple same-site substitutions.
    'n' no correction
    'j' jukes-cantor correction  { dist = -3/4 * ln (1 - 4/3 * dist ) }
    'k' kimura 2-parameter correction { dist = -1/2 * ln ( (1-2p-q)*sqrt(1-2q) ) }    
    's' FastTree's scoredist-like correction { dist = -1.3 * ln (1.0 - dist) }
    Default: 'k' for DNA, 's' for amino acid

--verbose integer
	Controls verbosity. Range from 0 to 3, with default=1.
	
--quiet
	Set verbosity to 0.
	
--tmp_dir (or -t) /path/to/temp/directory
    Specifies which directory NINJA should use to store all the temporary files
    used to store data structures on disk. If NINJA is run on a cluster, it is 
    advised to store temporary files on the local compute node, to avoid the
    bottleneck imposed by all instances of NINJA using a shared disk.

--clust_size (or -s)  
    See paper for details.  Default = 30.

--rebuild_step_ratio (or -r)  
    See paper for details.  Default = 0.5.

--help
    Show this output.
    
--version
    Provide version information.
