Metadata-Version: 1.0
Name: psicons.core
Version: 0.1
Summary: A tool for documentable and reproducible analysis and research
Home-page: http://svn.plone.org/svn/collective/
Author: Paul Agapow
Author-email: pma@agapow.net
License: GPL
Description: ==================
        About psicons.core
        ==================
        
        .. |psicons| replace:: *psicons*
        
        
        Background
        ----------
        
        Scientific analysis can be problematic:
        
        * It may involve multiple steps, each using the results of the previous stage.
        Making a mistake often means repeating the whole series for safety.
        
        * Sometimes analysis chains have to be repeated on different datasets.
        Sometimes, even within a single analysis, the same manipulation or test has
        to be repeated with slightly different parameters.
        
        * Even immediately after the fact, it's easy to forget what was done. 9 months
        later when responding to a referee's report, it may be impossible.
        
        * Collaborators, clients or bosses may demand accountability.
        
        * With long but routine tasks, it's easy to get bored and make mistakes.
        
        It this light, |psicons| is a quick and dirty hack to subvert the scons build
        system for scientific analysis. Every stage of analysis is a command-line call
        to a script or executable that takes *inputs* and produces *outputs*. When
        scons is called, dependencies between outputs and inputs are tested and only
        those stages are run that are necessary to update outputs. In addition, the
        exact sequence of analyses is recorded by the build file.
        
        In summary, |psicons| provides:
        
        * Repeatability: running a build file, reruns the same analysis
        
        * Reproducibility: the build file (and custom scripts) document the steps of
        the analysis
        * Minimization of effort: if inputs or analysis steps are changed, only the
        necessary (dependent) steps of the analysis are rerun
        
        * Mistake-resistant: errors don't derail analysis due to reproducibility ("what
        did I do") and minimization of effort (only dependent steps are repeated)
        
        * Programmability: analysis tasks may be constructed with programatically
        ("repeat the analysis across this parameter range")
        
        
        Status
        ------
        
        *psicons* is very much a hack-and-see project, having been produced in the
        aftermath of the 2009 H1N1 pandemic from the need for complex processing of
        large amounts of sequence data using ad-hoc scripts and formats. It worked
        well in that limited role, but is still an early release exploring the
        approach. Functionality is limited and the API may change. There are other
        more developed (and more specialised) alternatives. Comment is invited.
        
        
        Installation
        ------------
        
        The simplest way to install *psicons* is via ``easy_install`` [setuptools]_ or
        an equivalent program::
        
        % easy_install psicons.core
        
        Alternatively the tarball can be downloaded, unpacked and ``setup.py`` run::
        
        % tar zxvf psicons-core.tgz
        % cd psicons-core
        % python set.py install
        
        It should work with just about any  version of Python.
        
        *psicons* requires that scons is installed, which is where things get tricky.
        Scons by default installs itself in a sandboxed way with multiple versions
        living side-by-side and thus being non-importable. Of course, |psicons| needs
        to use the scons library, so a conventional installation must be forced.
        Download the scons tarball, unpack it and install it like thus::
        
        % python setup.py install --standard-lib
        
        
        Using psicons
        -------------
        
        A full API is included in the source distribution.
        
        
        Examples
        ~~~~~~~~
        
        Psicons works just like scons. In fact, it is scons. More details are available
        elsewhere but briefly, you run scons like this::
        
        # look for a build file called "Sconstruct" by default
        % scons
        # looks for a named build file
        % scons -f mybuildfile
        
        This causes scons to execute the build file, which is just a Python script,
        defining a series of tasks or *commands*::
        
        # an scons build file
        # some necessary administration - set up the build environment
        env = Environment()
        
        # compile two libraries and then combine into one program
        first_libs = env.Object ('hello.c', CCFLAGS='-DHELLO')
        second_libs = env.Object ('goodbye.c', CCFLAGS='-DGOODBYE')
        env.Program (first_libs + second_libs)
        
        The first time this file is executed, the first two commands build libraries,
        while the third combines the libraries into a single executable. Dependencies
        between the steps are automatically tracked: should one of the original source
        files be changed (e.g. hello.c), when the file is rerun only the steps
        "downstream" of it (e.g. recompilation of the first library, and the final
        linking) are rerun.
        
        Scons has a large number of commands for all sorts of software builds. Psicons
        adds two new commands, so that local scripts or external programs can be used
        to in a build. In this way, complex multi-step analyses can be constructed from
        a series of interdependent commands, that "build" intermediate data and final
        results::
        
        from psicons.core import *
        env = Environment()
        
        # call a local script
        IN_DATA = 'jg_08-10_2010.csv'
        CLEAN_DATA = 'jg_08-10_2010-cleaned.csv'
        make_clean_data = Script (env, 'clean_seqs.py',
        args = ['--save-as', CLEAN_DATA],
        infiles = [IN_DATA],
        output = CLEAN_DATA,
        )
        
        # call an external command
        EPI_DATA = 'jg-types.txt'
        RESULT_DATA = 'results.tab'
        type_data = External (env, 'treemaker',
        args = ['--save-as', RESULT_DATA],
        infiles = [CLEAN_DATA, EPI_DATA],
        output = [RESULT_DATA],
        )
        
        The interfaces of these two commands are similar:
        
        * what is being called?
        * what inputs does it use (depend on)?
        * what outputs does it produce?
        
        When scons is run on this build file, it calls the script 'clean_seqs.py' on
        `IN_DATA` to produce `CLEAN_DATA`. Then the external program `treemaker` is
        called on `CLEAN_DATA` and `EPI_DATA` to produce `RESULT_DATA`. Should
        `EPI_DATA` be edited, when scons is called again, only the second external
        step will be run again as the first step and it's results is still up to date.
        Thus:
        
        * Analyses may be run (and rerun) easily
        
        * If data changes (or scripts change - bug fixes), only the necessary steps are
        rerun
        
        * The actions taken are recorded in the build file
        
        To ease renaming intermediate or output files in a rational way, *psicons*
        offers a few utility functions for interpolating file names from parameters. To
        illustrate::
        
        # generate a new string from a template
        >>> d = {'foo': '123', 'bar': '456'}
        >>> interpolate ('ab{foo}cd{bar}ef', d)
        'ab123cd456ef'
        # name new file name from old by adding suffix to name
        >>> interpolate_from_path ('mydata.csv', '{stem}-cleaned{ext}')
        'mydata-cleaned.csv'
        
        
        Limitations
        -----------
        
        Certainly, far, far more complicated reproducibility tools are out there (see `
        here <http://csdl2.computer.org/comp/mags/cs/2009/01/mcs2009010005.pdf>`__) but
        many are based around certain disciplines (e.g. geophysics, computational math),
        require working through web interfaces or using very standard sets of analysis
        tools. *psicons* is written from the point of view of a bioinformatician doing
        sequence and phylogenetic analysis, working on the commandline using a lot of
        custom scripts and an endlessly changing lineup of supplied tools. As sometimes
        happens, other tools didn't fit, so I wrote one that did.
        
        As with many quick hack tools, documentation is currently a bit thin.
        
        The need for a modified scons installation is a blemish. Future versions of
        |psicons| may need to directly incorporate scons for ease of installation.
        
        Clearly, a set of standard tools for extracting, transforming and plotting
        data would be a powerful addition to *psicons*. This doesn't exist as yet.
        
        The process of installing a new tool or module for use by SCons is fiddly
        [scons-tools]_, involves copying libraries in a tool directory, registering
        their use in build files, voids the ease of using ``easy_install`` and makes
        development a pain. Thus, the additional "commands" are real scons commands, so
        much as functions that generate commands. But they are easy to use.
        
        
        Credit
        ------
        
        Thanks to the architects of Scons, of course.
        
        While this project was started before encountering Madagascar [madagascar]_, it
        has inevitably shaped development. It's a remarkably powerful system, although
        ill-suited to my current purposes. You should check it out.
        
        Only when writing this document did I become aware of sconstools [sconstools]_,
        which seems to be following exactly the same direction as |psicons|.
        
        
        References
        ----------
        
        .. [psicons-home] `psicons home page <http://www.agapow.net/software/psicons-core>`__
        
        .. [psicons-pypi] `psicons on PyPi <http://pypi.python.org/pypi/psicons-core>`__
        
        .. [setuptools] `Setuptools & easy_install <http://packages.python.org/distribute/easy_install.html>`__
        
        .. [psicons-github] `psicons on github <https://github.com/agapow/psicons.core>`__
        
        .. [scons] `Scons <http://www.scons.org>`__
        
        .. [madagascar] `Madagscar and Scons for reproducibility <http://reproducibility.org/wiki/Reproducible_computational_experiments_using_SCons>`__
        
        .. [scons-custom] `Where To Put Your Custom Builders and Tools <http://scons.org/doc/production/HTML/scons-user/x3697.html>`__
        
        .. [sconstools] `Sconstools <http://code.google.com/p/sconstools/>`__
        
        
        =========
        Changelog
        =========
        
        0.2dev (29072011)
        -----------------
        
        - First widely released version
        - Added documentation
        
        
        0.1dev (unreleased)
        -------------------
        
        - Initial release
        
Keywords: reproducibility documentable
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Build Tools
