Metadata-Version: 2.1
Name: produce
Version: 0.4.3
Summary: Replacement for Make geared towards processing data rather than compiling code
Home-page: https://github.com/texttheater/produce
Author: Kilian Evang
Author-email: kilian.evang@gmail.com
License: MIT
Description: ![Produce logo](https://raw.githubusercontent.com/texttheater/produce/master/img/logo/Produce_Logo_300.png)
        ==============================================
        
        Produce is an incremental build system for the command line, like Make or redo,
        but different: it is scriptable in Python and it supports multiple variable
        parts in file names. This makes it ideal for doing things beyond compiling
        code, like setting up replicable scientific experiments.
        
        <!-- START doctoc generated TOC please keep comment here to allow auto update -->
        <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
        **Table of Contents** 
        
        - [Requirements](#requirements)
        - [Installing Produce](#installing-produce)
        - [Usage](#usage)
        - [Motivation](#motivation)
        - [Build automation: basic requirements](#build-automation-basic-requirements)
        - [Make syntax vs. Produce syntax and a tour of the basic features](#make-syntax-vs-produce-syntax-and-a-tour-of-the-basic-features)
          - [Rules, expansions, escaping and comments](#rules-expansions-escaping-and-comments)
          - [Named and unnamed dependencies](#named-and-unnamed-dependencies)
          - [Multiple wildcards, regular expressions and matching conditions](#multiple-wildcards-regular-expressions-and-matching-conditions)
          - [Special targets vs. special attributes](#special-targets-vs-special-attributes)
          - [Python expressions and global variables](#python-expressions-and-global-variables)
        - [Running Produce](#running-produce)
          - [Status and debugging messages](#status-and-debugging-messages)
          - [Error handling and aborting](#error-handling-and-aborting)
          - [How targets are matched against rules](#how-targets-are-matched-against-rules)
        - [Advanced usage](#advanced-usage)
          - [Whitespace and indentation in values](#whitespace-and-indentation-in-values)
          - [The prelude](#the-prelude)
          - [`shell`: choosing the recipe interpreter](#shell-choosing-the-recipe-interpreter)
          - [Running jobs in parallel](#running-jobs-in-parallel)
          - [Dependency files](#dependency-files)
          - [Rules with multiple outputs](#rules-with-multiple-outputs)
            - [“Sideways” dependencies](#sideways-dependencies)
          - [Producing the outputs for all inputs](#producing-the-outputs-for-all-inputs)
        - [All special attributes at a glance](#all-special-attributes-at-a-glance)
          - [In rules](#in-rules)
          - [In the global section](#in-the-global-section)
        - [Getting in touch](#getting-in-touch)
        - [Acknowledgments](#acknowledgments)
        
        <!-- END doctoc generated TOC please keep comment here to allow auto update -->
        
        Requirements
        ------------
        
        * A Unix-like operating system such as Linux or Mac OS X. Windows Subsystem for
          Linux may also work.
        * Python 3.4 or higher
        * Git (for downloading Produce)
        
        Installing Produce
        ------------------
        
        Install the latest release using pip:
        
            pip3 install produce
        
        Or get the development version by running the following command in a convenient
        location:
        
            git clone https://github.com/texttheater/produce
        
        This will create a directory called `produce`. To update to the latest version
        of Produce later, you can just go into that directory and run:
        
            git pull
        
        The `produce` directory contains an executable Python script also called
        `produce`. This is all you need to run Produce. Just make sure it is in your
        `PATH`, e.g. by copying it to `/usr/local/bin` or by linking to it from your
        `$HOME/bin` directory.
        
        Usage
        -----
        
        When invoked, Produce will first look for a file called `produce.ini` in the
        current working directory. Its format is documented in this document. If you
        want a quick start, have a look at
        [an example project](https://github.com/texttheater/produce/tree/master/doc/samples/tokenization).
        
        You may also have a look at the
        [PyGrunn 2014 slides](https://texttheater.github.io/produce-pygrunn2014)
        for a quick introduction.
        
        Motivation
        ----------
        
        Produce is a build automation tool. Build automation is useful whenever you
        have one or several input files from which one or several output files are
        generated automatically – possibly in multiple steps, so that you have
        intermediate files.
        
        The classic case for this is compiling C programs, where a simple project might
        look like this:
        
        ![example dependency chart for compiling a C program](img/compiling.png)
        
        But build automation is also useful in other areas, such as science. For
        example, in the [Groningen Meaning Bank](http://gmb.let.rug.nl/) project, a
        Natural Language Processing pipeline is combined with corrections from human
        experts to build a collection of texts with linguistic annotations in a
        bootstraping fashion.
        
        In the following simplified setup, processing starts with a text file
        (`en.txt`) which is first part-of-speech-tagged (`en.pos`), then analyzed
        syntactically (`en.syn`) by a parser and finally analyzed semantically
        (`en.sem`). Each step is first carried out automatically by an NLP tool
        (`*.auto`) but then corrections by human annotators (`*.corr`) are applied
        to build the main version of the file which then serves as input to further
        processing. Every time a new human correction is added, parts of the
        pipeline must be re-run:
        
        ![example dependency chart for running an NLP pipeline](img/pipeline.png)
        
        Or take running machine learning experiments: we have a collection of labeled
        data, split into a training portion and testing portions. We have various
        feature sets and want to know which one produces the best model. So we train a
        separate model based on each feature set and on the training data, and generate
        corresponding labeled outputs and evaluation reports based on the development
        test data:
        
        ![example dependency chart for running machine learning experiments](img/ml.png)
        
        A [number](http://kbroman.github.io/minimal_make/)
        [of](http://bost.ocks.org/mike/make/) [articles](http://zmjones.com/make/)
        point out that build automation is an invaluable help in setting up experiments
        in a self-documenting manner, so that they can still be understood, replicated
        and modified months or years later, by you, your colleagues or other
        researchers. Many people use Make for this purpose, and so did I, for a while.
        I specifically liked:
        
        * *The declarative notation.* Every step of the workflow is expressed as a
          _rule_, listing the _target_, its direct dependencies and the command to run
          (the _recipe_). Together with a good file naming scheme, this almost
          eliminates the need for documentation.
        * *The Unix philosophy.* Make is, at its core, a thin wrapper around shell
          scripts. For orchestrating the steps, you use Make, and for executing them,
          you use the full power of shell scripts. Each tool does one thing, and does
          it well. This reliance on shell scripts is something that sets Make apart
          from specialized build tools such as Ant or A-A-P.
        * *The wide availability.* Make is installed by default on almost every Unix
          system, making it ideal for disseminating and exchanging code because the
          Makefile format is widely known and can be run everywhere.
        
        So, if Make has so many advantages, why yet another build automation tool?
        There are two reasons:
        
        * *Make’s syntax.* Although the basic syntax is extremely simple, as soon as
          you want to go a _little bit_ beyond what it offers and use more advanced
          features, things get quite arcane very quickly.
        * *Wildcards are quite limited.* If you want to match on the name of a specific
          target to generate its dependencies dynamically, you can only use one
          wildcard. If your names are a bit more complex than that, you have to resort
          to black magic like Make’s built-in string manipulation functions that don’t
          compare favorably to languages like Python or even Perl, or rely on external
          tools. In either case, your Makefiles become extremely hard to read, bugs
          slip in easily and the simplicity afforded by the declarative paradigm is
          largely lost.
        
        Produce is thus designed as a tool that copies Make’s virtues and improves a
        great deal on its deficiencies by using a still simple, but much more powerful
        syntax for mapping targets to dependencies. Only the core functionality of Make
        is mimicked – advanced functions of Make such as built-in rules specific to
        compiling C programs are not covered. Produce is general-purpose.
        
        Produce is written in Python 3 and scriptable in Python 3. Whenever I write
        Python below, I mean Python 3.
        
        Build automation: basic requirements
        ------------------------------------
        
        Let’s review the basic functionality we expect of a build automation tool:
        
        * Allows you to run multiple steps of a workflow with a single command, in the
          right order.
        * Notices when inputs have changed and runs exactly those steps again that are
          needed to bring the outputs up to speed, no more or less.
        
        In addition, some build automation tools satisfy the following requirement
        (Produce currently doesn’t):
        
        * Intermediate files can be deleted without affecting up-to-dateness – if the
          outputs are newer than the inputs, the workflow will not be re-run.
        
        Make syntax vs. Produce syntax and a tour of the basic features
        ---------------------------------------------------------------
        
        When you run the `produce` command (usually followed by the targets you want
        built), Produce will look for a file in the current directory, called
        `produce.ini` by default. This is the “Producefile”. Let’s introduce
        Producefile syntax by comparing it to Makefile syntax.
        
        ### Rules, expansions, escaping and comments
        
        Here is a Makefile for a tiny C project:
        
            # Compile
            %.o : %.c
            	cc -c $<
            
            # Link
            % : %.o
            	cc -o $@ $<
        
        And here is the corresponding `produce.ini`:
        
            # Compile
            [%{name}.o]
            dep.c = %{name}.c
            recipe = cc -c %{c}
            
            # Link
            [%{name}]
            dep.o = %{name}.o
            recipe = cc -o %{target} %{o}
        
        Easy enough, right? Produce syntax is a dialect of the widely known INI syntax,
        consisting of sections with headings in square brackets, followed by
        attribute-value pairs separated by `=`. In Produce’s case, sections represent
        _rules_, the section headings are _target patterns_ matching _targets_ to
        build, and the attribute-value pairs specify the target’s direct dependencies
        and the recipe to run it.
        
        Dependencies are typically listed each as one attribute of the form `dep.name`
        where `name` stands for a name you give to the dependency – e.g., its file
        type. This way, you can refer to it in the recipe using an _expansion_.
        
        Expansions have the form `%{...}`. In the target pattern, they are used as
        wildcards. When the rule is invoked on a specific target, they match any string
        and assign it to the variable name specified between the curly braces. In
        attribute values, they are used like variables, expanding to the value
        associated with the variable name. Besides target matching, values can also be
        assigned to variable names by attribute-value pairs, as with e.g.
        `dep.c = %{name}.c`. Here, `c` is the variable name; the `dep.` prefix just
        tells Produce that this particular value is also a dependency.
        
        If you need a literal percent sign in some attribute value, you need to escape
        it as `%%`.
        
        The `target` variable is automatically available when the rule is invoked,
        containing the target matched by the target pattern.
        
        Lines starting with `#` are for comments and ignored.
        
        So far, so good – a readable syntax, I hope, but a bit more verbose than that
        of Makefiles. What does this added verbosity buy us? We will see in the next
        subsections.
        
        ### Named and unnamed dependencies
        
        To see why naming dependencies is a good idea, consider the following Makefile
        rule:
        
            out/%.pos : out/%.pos.auto out/%.pos.corr
            	./src/scripts/apply_corrections $< \
                    --corrections out/$*.pos.corr > $@
        
        This could be from the Natural Language Processing project we saw as the second
        example above: the rule is for making the final `pos` file from the
        automatically generated `pos.auto` file and the `pos.corr` file with manual
        corrections, thus it has two direct dependencies, specified on the first line.
        The recipe refers to the first dependency using the shorthand `$<`, but there
        is no such shorthand for other dependencies. So we have to type out the second
        dependency again in the recipe, taking care to replace the wildcard `%` with
        the magic variable `$*`. This is ugly because it violates the golden principle
        “Don’t repeat yourself!” If we write something twice in a Makefile, not only is
        it more work to type, but also if we want to change it later, we have to change
        it in two places, and there’s a good chance we’ll forget that.
        
        Produce’s named dependencies avoid this problem: once specified, you can refer
        to every dependency using its name. Here is the Produce rule corresponding to
        the above Makefile rule:
        
            [out/%{name}.pos]
            dep.auto = %{name}.pos.auto
            dep.corr = %{name}.pos.corr
            recipe = ./src/scripts/apply_corrections %{auto} %{corr} > %{target}
        
        Note that you don’t _have_ to name dependencies. Sometimes you don’t need to
        refer back to them. Here is an example rule that compiles a LaTeX document:
        
            [%{name}.pdf]
            deps = %{name}.tex bibliography.bib
            recipe =
            	pdflatex %{name}
            	bibtex %{name}
            	pdflatex %{name}
            	pdflatex %{name}
        
        The TeX tools are smart enough to fill in the file name extension if we just
        give them the basename that we got by matching the target. In such cases, it
        can be more convenient not to name the dependencies and list them all on one
        line. This is what the `deps` attribute is for. It is parsed using Python’s
        [`shlex.split`](https://docs.python.org/3/library/shlex.html?highlight=shlex#shlex.split)
        function – consult the Python documentation for escaping rules and such. You
        can also mix `dep.*` attributes and `deps` in one rule.
        
        Note that, as in many INI dialects, attribute values (here: the recipe) can
        span multiple lines as long as each line after the first is indented. See
        [Whitespace and indentation in values](#whitespace-and-indentation-in-values)
        below for details.
        
        Note also that dependency lists can also be generated dynamically – see the
        section on [dependency files](#dependency-files) below.
        
        ### Multiple wildcards, regular expressions and matching conditions
        
        The ability to use more than one wildcard in target patterns is Produce’s
        killer feature because not many other build automations tools offer it.
        The only one I know of so far is [plmake](https://github.com/cmungall/plmake).
        Rake and others do offer full regular expressions which are strictly more
        powerful but not as easy to read. Don’t worry, Produce supports them too and
        more, we will come to that. But first consider the following Produce rule,
        which might stem from the third example project we saw in the introduction,
        the machine learning one:
        
            [out/%{corpus}.%{portion}.%{fset}.labeled]
            dep.model = out/%{corpus}.train.%{fset}.model
            dep.input = out/%{corpus}.%{portion}.feat
            recipe = wapiti label -m %{model} %{input} > %{target}
        
        Labeled output files here follow a certain naming convention: four parts,
        separated by periods. The first one specifies the data collection (e.g. a
        linguistic corpus), the second one the portion of the data that is
        automatically labeled in this step (either the development portion or the test
        portion), the third one specifies the feature set used and the fourth one is
        the extension `labeled`. For each of the three first parts, we use a wildcard
        to match it. We can then freely use these three wildcards to specify the
        dependencies: the model we use for labelling depends on the corpus and on the
        feature set but not on the portion to label: the portion used for training the
        model is always the training portion. The input to labelling is a file
        containing the data portion to label, together with the extracted features. We
        assume that this file always contains all features we can extract even if we’re
        not going to use them in a particular model, so this dependency does not depend
        on the feature set.
        
        A Makefile rule to achieve something similar would look something like this:
        
            .SECONDEXPANSION:
            out/%.labeled : out/$$(subst test,train,$$(subst dev,train,$$*)).model \
                            out/$$(basename $$*).feat
                    wapiti label -m $< out/$(basename $*).feat > $@
        
        If you are like me, this is orders of magnitude less readable than the Produce
        version. Getting a Makefile rule like this to function properly will certainly
        make you feel smart, but hopefully also feel miserable about the brain cycles
        wasted getting your head around the bizarre syntax, the double dollars and the
        second expansion.
        
        A wildcard will match _anything_. If you need more control about which targets
        are matched, you can use a
        [Python regular expression](https://docs.python.org/3/library/re.html?highlight=re#module-re)
        between slashes as the target pattern. For example, if we want to make sure
        that our rule only matches targets where the second part of the filename is
        either `dev` or `test`, we could do it like this:
        
            [/out/(?P<corpus>.*)\.(?P<portion>dev|test)\.(?P<fset>.*)\.labeled/]
            dep.model = out/%{corpus}.train.%{fset}.model
            dep.input = out/%{corpus}.%{portion}.feat
            recipe = wapiti label -m %{model} %{input} > %{target}
        
        The regular expression in this rule’s header is almost precisely what the above
        header with three wildcards is translated to by Produce internally, with the
        difference that the subexpression matching the second part is now `dev|test`
        rather than `.*`. We are using a little-known feature of regular expressions
        here, namely the `(?P<...>)` syntax that allows us to assign names to
        subexpressions by which you can refer to the matched part later.
        
        Note the slashes at the beginning and end are just a signal to Produce to
        interpret what is in-between as a regular expressions. You do not have to
        escape slashes within your regular expression.
        
        While regular expressions are powerful, they make your Producefile less
        readable. A better way to write the above rule is by sticking to ordinary
        wildcards and using a separate _matching condition_ to check for `dev|test`:
        
            [out/%{corpus}.%{portion}.%{fset}.labeled]
            cond = %{portion in ('dev', 'test')}
            dep.model = out/%{corpus}.train.%{fset}.model
            dep.input = out/%{corpus}.%{portion}.feat
            recipe = wapiti label -m %{model} %{input} > %{target}
        
        A matching condition is specified as the `cond` attribute. We can use any
        Python expression. It is evaluated only if the target pattern matches the
        requested target. If it evaluates to a “truthy” value, the rule matches and
        the recipe is executed. If it evaluates to a “falsy” value, the rule does
        not match, and Produce moves on, trying to match the next rule in the
        Producefile.
        
        Note that the Python expression is given as an expansion. At this point we
        should explain a few fine points:
        
        1. Whenever we used expansions so far, the variable names inside were actually
           Python expressions, albeit of a simple kind: single variable names. But as
           we see now, we can use arbitrary Python expressions. Expansions used as
           wildcards in the target pattern are an exception, of course: they can only
           consist of a single variable name.
        2. The variables we use in rules are actually Python variables.
        3. Attribute values are always strings, so if a Python expression is used to
           generate (part of) an attribute value, not the value of the expression
           itself is used but whatever its `__str__` method returns. Thus, in the
           above rule, the value of the `cond` variable is not `True` or `False`, but
           `'True'` or `'False'`. In order to interpret the value as a Boolean, Produce
           calls
           [ast.literal\_eval](https://docs.python.org/3/library/ast.html?highlight=literal_eval#ast.literal_eval)
           on the string. So if the string contains anything other than a literal
           Python expression, this is an error.
        
        As an exception to what we said about `__str__`, if an expansion evaluates to
        something that is not a string but has an `__iter__` method, it will be treated
        as a sequence and rendered as a white-space separated list, the elements
        properly shell-quoted and escaped. Note also that parentheses are automatically
        added around an expansion so it is very convenient to use generator expressions
        for expansions. All of this is illustrated in the following rule:
        
            [Whole.txt]
            deps = %{'Part {}.txt'.format(i) for i in range(4)}
            recipe = cat %{deps} > %{target}
        
        ### Special targets vs. special attributes
        
        Besides not naming all dependencies, there is another reason why Make’s syntax
        is too simple for its own good. When some rule needs to have a special
        property, Make usually requires a “special target” that syntactically looks
        like a target but is actually a declaration and has no obvious visual
        connection to the rule(s) it applies to. We have already seen an example of the
        dreaded `.SECONDEXPANSION`. Another common special target is `.PHONY`, marking
        targets that are just jobs to be run, without producing an output file. For
        example:
        
            .PHONY: clean
            clean:
            	rm *.o temp
        
        It would be easier and more logical if the “phoniness” was declared as part of
        the rule rather than some external declaration. This is was Produce does. The
        Produce equivalent of declaring targets phony is to set the `type` attribute of
        their rule to `task` (the default is `file`). With this the rule above is
        written as follows:
        
            [vacuum]
            type = task
            recipe = rm *.o temp
        
        Note that since it is ungrammatical to “produce a clean”, I invented a naming
        convention according to which the task that cleans up your project directory is
        called `vacuum` because it produces a vacuum. It’s silly, I know.
        
        For other special attributes besides `task`, see [All special attributes at a
        glance](#all-special-attributes-at-a-glance) below.
        
        ### Python expressions and global variables
        
        As we have already seen, Produce’s expansions can contain arbitrary Python
        expressions. This is not only useful for specifying Boolean matching
        conditions, but also for string manipulation, in particular for playing with
        dependencies. This is a pain in Make, because Make implements its own string
        manipulation language which from today’s perspective (since we have Python)
        not only reinvents the wheel, but reinvents it poorly, with a rather dangerous
        syntax. Consider the following (contrived) example from the GNU Make manual
        where you have a list of dependencies in a global variable and filter them to
        retain only those ending in `.c` or `.s`:
        
            sources := foo.c bar.c baz.s ugh.h
            foo: $(sources)
            	cc $(filter %.c %.s,$(sources)) -o foo
        
        With Produce, we can just hand the string manipulation to Python, a language
        we already know and (hopefully) like:
        
            []
            sources = foo.c bar.c baz.s ugh.h
        
            [foo]
            deps = %{sources}
            recipe = cc %{f for f in sources.split() \
            		if f.endswith('.c') or f.endswith('.s')}
        
        This example also introduces the _global section_, a section headed by `[]`,
        thus named with the empty string. The attributes here define global variables
        accessible from all rules. The global section may only appear once and only at
        the beginning of a Producefile.
        
        Running Produce
        ---------------
        
        Produce is invoked from the command line by the command `produce`, usually
        followed by the target(s) to produce. These can be omitted if the Producefile
        specifies one or more default targets. By default, Produce will look for
        `produce.ini` in the current working directory and complain if it does not
        exist.
        
        A number of options can be used to control Produce’s behavior, as listed in its
        help message:
        
            usage: produce [-h] [-B | -b] [-d] [-f FILE] [-j JOBS] [-n] [-u FILE]
                           [target [target ...]]
            
            positional arguments:
              target                The target(s) to produce - if omitted, default target
                                    from Producefile is used
            
            optional arguments:
              -h, --help            show this help message and exit
              -B, --always-build    Unconditionally build all specified targets and their
                                    dependencies
              -b, --always-build-specified
                                    Unconditionally build all specified targets, but treat
                                    their dependencies normally (only build if out of
                                    date)
              -d, --debug           Print debugging information. Give this option multiple
                                    times for more information.
              -f FILE, --file FILE  Use FILE as a Producefile
              -j JOBS, --jobs JOBS  Specifies the number of jobs (recipes) to run
                                    simultaneously
              -n, --dry-run         Print status messages, but do not run recipes
              -u FILE, --pretend-up-to-date FILE
                                    Do not rebuild FILE or its dependencies (unless they
                                    are also depended on by other targets) even if out of
                                    date, but make sure that future invocations of Produce
                                    will still treat them as out of date by increasing the
                                    modification times of their changed dependencies as
                                    necessary.
        
        ### Status and debugging messages
        
        When it starts (re)building a target, Produce will tell you so with a status
        message in green where the target is indented according to how deep in the
        dependency graph it is. On successful completion of a target, a similar message
        with `complete` is printed. If an error occurs while a target is being built,
        Produce instead prints an `incomplete` message in red. The latter indicates
        controlled shutdown: the recipe has been killed and incomplete outputs have
        been renamed (see below). If you see a `(re)building` message but no
        `(in)complete` message for some target, something went really wrong – this
        should never happen. In that case, better check for yourself if any incomplete
        outputs are still hanging around.
        
        Giving the `-d`/`--debug` option one, two or three times will cause Produce to
        additionally flood your terminal with a few, some more or lots of messages that
        may be helpful for debugging.
        
        ### Error handling and aborting
        
        When a recipe fails, i.e. its interpreter returns an exit status other than 0,
        the corresponding target file (if any) may already have been created or
        touched, potentially leading the next invocation of Produce to believe that it
        is up to date, even though it probably doesn’t have the correct contents. Such
        inconsistencies can lead to users tearing their hair out. In order to avoid
        this, Produce will, when a recipe fails, make sure that the target file does
        not stay there. It could just delete it, but that might be unwise because the
        user might want to inspect the output file of the erroneous recipe for
        debugging. So, Produce renames the target file by appending a `~` to the
        filename (a common naming convention for short-lived “backups”).
        
        If multiple recipes are running in parallel and one fails, Produce will kill
        all of them, do the renaming and abort immediately.
        
        The same is true if Produce receives an interrupt signal. So you can safely
        abort a production process in your terminal by pressing `Ctrl+C`.
        
        ### How targets are matched against rules
        
        When producing a target, either because asked to by the user or because the
        target is required by another one, Produce will always work through the
        Producefile from top to bottom and use the first rule that matches the target.
        A rule matches a target if both the target pattern matches and the matching
        condition (if any) subsequently evaluates to true.
        
        Note that unlike most INI dialects, Produce allows for multiple sections with
        the same heading. It makes sense to have the same target pattern multiple times
        when there are matching conditions to make subdistinctions.
        
        If no rule matches a target, Produce aborts with an error message.
        
        Advanced usage
        --------------
        
        ### Whitespace and indentation in values
        
        An attribute value can span multiple lines as long as each line after the first
        is indented with some whitespace. The recommended indentation is either one tab
        or four spaces. If you make use of this, it is recommended to leave the first
        line (after the attribute name and the `=`) blank so all lines of the value are
        consistently aligned.
        
        The _second_ line of a value (i.e. the first indented one) determines the kind
        and amount of whitespace expected to start each subsequent line. This
        whitespace will _not_ be part of the attribute value. _Additional_ whitespace
        after the initial amount is, however, preserved. This is important e.g. for
        Python code and the reason why Produce is no longer using Python’s
        `configparser` module.
        
        All whitespace at the very beginning and at the very end of an attribute value
        will be stripped away.
        
        For example, in the following rule, the recipe spans two lines:
        
            [paper.pdf]
            dep.tex = paper.tex
            dep.bib = paper.bib
            recipe =
                pdflatex paper
                pdflatex paper
        
        ### The prelude
        
        If you use Python expressions in your recipes, you will often need to import
        Python modules or define functions to use in these expressions. You can do this
        by putting the imports, function definitions and other Python code into the
        special `prelude` attribute in the [global
        section](#python-expressions-and-global-variables). For example, put this at
        the beginning of your Producefile to import the `errno`, `glob` and `os`
        modules and define a helper function for creating directories.
        
            []
            prelude =
                import errno
                import glob
                import os
        
                def makedirs(path):
                    try:
                        os.makedirs(path)
                    except OSError, error:
                        if error.errno != errno.EEXIST:
                            raise error
        
        ### `shell`: choosing the recipe interpreter
        
        By default, recipes are (after doing expansions) handed to the `bash` command
        for execution. If you would rather write your recipe in `zsh`, `perl`, `python`
        or any other language, that’s no problem. Just specify the interpreter in the
        `shell` attribute of the rule.
        
        ### Running jobs in parallel
        
        Use the `-j JOBS` command line option to specify the number of jobs Produce
        runs in parallel. By default, Produce reserves one job slot for each recipe.
        For recipes that run multiple parallel jobs themselves, it is recommended to
        specify the number of jobs via the `jobs` attribute. Produce will then reserve
        that many job slots for this recipe (but no more than `JOBS`).
        
        Here is an example where the target `b` is created by a recipe that runs in
        parallel:
        
            [a]
            deps = b c d
            recipe = touch %{target}
        
            [b]
            dep.input = input.txt
            dep.my_script = ./my_script.sh
            jobs = 8
            recipe = parallel --gnu -n %{jobs} -k %{my_script} %{input} > %{target}
        
            [c]
            dep.my_script = ./my_script.sh
            recipe = %{my_script} c > %{target}
        
            [d]
            dep.my_script = ./my_script.sh
            recipe = %{my_script} d > %{target}
        
        Running `produce -j 8 a` will run up to 8 jobs in parallel. In this example,
        the recipes for `c` and `d` may run in parallel. The recipe for `b` will not
        run in parallel with any other recipe because it uses all 8 job slots.
        
        ### Dependency files
        
        Sometimes the question which other files a file depends on is more complex and
        may change frequently over the lifetime of a project, e.g. in the cases of
        source files that import other header files, modules etc. In such cases, it
        would be nice to have the dependencies automatically listed by a script.
        Produce supports this via the `depfile` attribute in rules: here, you can
        specify the name of a _dependency file_, a text file that contains
        dependencies, one per line. Produce will read them and add them to the list of
        dependencies for the matched target. Also, Produce will try to produce the
        dependency file (i.e. make it up to date) _prior_ to reading it. So you can
        write another rule that tells Produce how to generate each dependency file, and
        the rest is automatic.
        
        For example, the following rule might be used to generate a dependency file
        listing the source file and header files required for compiling a C object.
        This example uses `.d` as the extension for dependency files. It runs `cc -MM`
        to use the C compiler’s dependency discovery feature and then some shell magic
        to convert the output from a Makefile rule into a simple dependency list:
        
            [%{name}.d]
            dep.c = %{name}.c
            recipe =
                cc -MM -I. %{name} | sed -e 's/.*: //' | sed -e 's/^ *//' | \
                perl -pe 's/ (\\\n)?/\n/g' > %{target}
        
        The following rule could then be used to create the actual object file. The
        `depfile` attribute makes sure that whenever an included header file changes,
        the object file will be rebuilt:
        
            [%{name}.o]
            dep.src = %{name}.c
            depfile = %{name}.d
            recipe =
                cc -c -o %{target} %{src}
        
        Note that the `.c` file will end up in the dependency list twice, once from
        `dep.src` and once from the dependency file. This does not matter, Produce is
        smart enough not to do the same thing twice.
        
        Warning: dependency files are made up to date even in dry-run mode!
        
        ### Rules with multiple outputs
        
        Sometimes you have a command that creates multiple files at once because their
        creation is inherently linked to the same process – it wouldn’t make sense to
        try and create them in neatly separated steps. Splitting a file up into
        multiple chunks is such a case:
        
            split -n 4 data.txt
        
        This command creates four files called `xaa`, `xab`, `xac` and `xad`. It gets
        complicated when these output files individually are dependencies of further
        targets, as in this example:
        
            [split_and_zip]
            type = task
            deps = xaa.zip xab.zip xac.zip xad.zip
        
            [%{name}.zip]
            dep.file = %{name}
            recipe = zip %{target} %{file}
        
            [%{chunk}]
            dep.txt = data.txt
            recipe = split -n 4 %{txt}
        
        If we run the task `split_and_zip`, it will try to create its (indirect)
        dependencies `xaa`, `xab`, `xac` and `xad` independently of each other. Each
        time, the last rule will match, and each time, the exact same recipe will be
        executed. This is unncecessary work, one time would be sufficient because it
        creates all four files in each case. Worse, if we run Produce in parallel,
        multiple instances of the recipe may run in parallel and corrupt the data.
        
        The solution is to explicitly declare which files a rule produces, other than
        the target. The `outputs` attribute serves this purpose. With it, the last rule
        is rewritten as follows:
        
            [%{chunk}]
            outputs = xaa xab xac xad
            dep.txt = data.txt
            recipe = split -n 4 %{txt}
        
        Additionally, it is good style to add a matching condition to prevent that the
        rule accidentally matches something that is not its output:
        
            [%{chunk}]
            outputs = xaa xab xac xad
            cond = %{target in outputs.split()}
            dep.txt = data.txt
            recipe = split -n 4 %{txt}
        
        Instead of a single `outputs` attribute, separate attributes with the `out.`
        prefix can be used, and both styles can also be mixed, similar to
        `dep.`/`deps`. Here is an example of a rule using the `out.` style to declare
        that while producing a `.pdf` file it will also produce an `.aux` file:
        
            [%{name}.pdf]
            dep.tex = %{name}.tex
            out.aux = %{name}.aux
            recipe =
                pdflatex %{tex}
        
        #### “Sideways” dependencies
        
        Suppose there is a target A that has some additional output file B. What if a
        target C wants to declare a dependency on B? For this to work, there must be a
        rule matching B. B, of course, is produced when A is produced. So, effectively,
        in order to produce B, A must be produced. We can express this as a dependency:
        B depends on A. You can write a rule that will tell Produce to produce A when B
        is requested:
        
            [B]
            dep.a = A
        
        (TODO: What if A is up to date but B does not exist?)
        
        Such a rule only serves to “guide” Produce from B to A. It cannot contain its
        own recipe. This would not make the sense as it is the rule for A that creates
        B. If you included a recipe, Produce would complain about a cyclic dependency.
        
        Here is a more concrete example: the rule for `paper.pdf` produces an
        additional output `paper.aux`. Another rule, for `paper.info`, depends on
        `paper.aux`. In order for Produce to be able to satisfy this dependency,
        `paper.aux` is declared as depending on `paper.pdf`.
        
            [paper.info]
            dep.aux = paper.aux
            recipe = cat %{aux} | ./my_tool > %{target}
        
            [paper.aux]
            dep.pdf = paper.pdf
        
            [paper.pdf]
            dep.tex = paper.tex
            outputs = paper.aux
            recipe =
                pdflatex paper
        
        There is one final problem here: after running the recipe for `paper.pdf`, the
        modification time of `paper.pdf` may well be greater than that of `paper.aux`.
        Since we declared `paper.aux` dependent on `paper.pdf`, this means that
        `paper.aux` appears as out of date to Produce even though we just produced it.
        A simple and effective way to prevent this is to include `touch %{outputs}` as 
        the last line of any rule with multiple outputs. The last rule above thus
        becomes:
        
            [paper.pdf]
            dep.tex = paper.tex
            outputs = paper.aux
            recipe =
                pdflatex paper
                touch %{outputs}
        
        ### Producing the outputs for all inputs
        
        Suppose you have a number of input files (say `inputs/input001.txt` to
        `inputs/input100.txt`). Each input can be processed to yield an output file
        (say `models/model001` to `models/model100`) – for example, by the following
        rule:
        
            [models/model%{num}]
            dep.input = inputs/input%{num}.txt
            dep.train = bin/train
            recipe = ./%{train} %{input} %{target}
        
        Now you would like to automatically produce the model for every input that is
        there. You can do this by writing a _task_, i.e., a rule for a target that is
        not a file but is just invoked. The task for the example might look like this:
        
            [all_models]
            type = task
            deps = %{'models/{}'.format(i.replace('input', 'model').replace('.txt, \
                     '') for i in os.listdir('inputs')}
        
        This task does not need a recipe because all it does is pull in all the models
        through its dependencies. The dependencies are specified through an arbitrary
        Python expression, in this case it looks at the inputs directory and returns
        the names of the models corresponding to each input. It uses the `os` module,
        which needs to be imported. So let’s add a global section with a prelude to do
        this. The whole Producefile then looks like this:
        
            []
            prelude =
                import os
        
            [models/model%{num}]
            dep.input = inputs/input%{num}.txt
            dep.train = bin/train
            recipe = ./%{train} %{input} %{target}
        
            [all_models]
            type = task
            deps = %{'models/{}'.format(i.replace('input', 'model').replace('.txt, \
                     '') for i in os.listdir('inputs')}
        
        And to produce all models, all you need to do is tell Produce to produce the
        `all_models` task:
        
            $ produce all_models
        
        ## All special attributes at a glance
        
        For your reference, here are all the rule attributes that currently have a
        special meaning to Produce:
        
        ### In rules
        
        <dl>
            <dt><code>target</code></dt>
            <dd>When a rule matches a target, this variable is always set to that
            target, mainly so you can refer to it in the recipe. It is illegal to set
            the <code>target</code> attribute yourself. Also see
            <a href="#rules-expansions-escaping-and-comments">Rules, expansions, escaping and comments</a>.</dd>
            <dt><code>cond</code></dt>
            <dd>Allows to specify a _matching condition_ in addition to the target
            pattern. Typically it is given as a single expansion with a boolean Python
            expression. It is expanded immediately after a target matches the rule. The
            resulting string must be a Python literal. If “truthy”, the rule matches
            and its expansion/execution continues. If “falsy”, the rule does not match
            the target and Produce proceeds with the next rule, trying to match the
            target. Also see <a href="#multiple-wildcards-regular-expressions-and-matching-conditions">Multiple wildcards, regular expressions and matching conditions</a>.</dd>
            <dt><code>dep.*</code></dt>
            <dd>The asterisk stands for a name chosen by you, which is the actual name
            of the variable the attribute value will be assigned to. The <code>dep.</code> prefix,
            not part of the variable name, tells Produce that this is a dependency,
            i.e. that the target given by the value must be made up to date before the
            recipe of this rule can be run. Also see
            <a href="#named-and-unnamed-dependencies">Named an unnamed depenencies</a>.</dd>
            <dt><code>deps</code></dt>
            <dd>Like <code>dep.*</code>, but allows for specifying multiple unnamed dependencies
            in one attribute value. The format is roughly a space-separated list. For
            details, see
            <a href="https://docs.python.org/3/library/shlex.html?highlight=shlex#shlex.split"><code>shlex.split</code></a>.
            Also see <a href="#named-and-unnamed-dependencies">Named an unnamed depenencies</a>.</dd>
            <dt><code>depfile</code></dt>
            <dd>Another way to specify (additional) dependencies: the name of a file
            from which dependencies are read, one per line. Additionally, Produce will
            try to make that file up to date prior to reading it. Also see
            <a href="#dependency-files">Dependency files</a>.</dd>
            <dt><code>type</code></dt>
            <dd>Is either <code>file</code> (default) or <code>task</code>. If <code>file</code>, the target is supposed
            to be a file that the recipe creates/updates if it runs successfully. If
            <code>task</code>, the target is an arbitrary name given to some task that the recipe
            executes. Crucially, task-type targets are always assumed to be out of
            date, regardless of the possible existence and age of a file with the same
            name. Also see
            <a href="#special-targets-vs-special-attributes">Special targets vs. special attributes</a></dd>
            <dt><code>recipe</code></dt>
            <dd>The command(s) to run to build the target, typically a single shell
            command or a short shell script. Unlike Make, each line is not run in
            isolation, but the whole script is passed to the interpreter as a whole,
            after doing expansions. This way, you can e.g. define a shell variable
            on one line and use it on the next. Also see
            <a href="#rules-expansions-escaping-and-comments">Rules, expansions, escaping and comments</a>.</dd>
            <dt><code>shell</code></dt>
            <dd>See <a href="#shell-choosing-the-recipe-interpreter"><code>shell</code>: choosing the recipe interpreter</a></dd>
            <dt><code>out.*</code></dt>
            <dd>See <a href="#rules-with-multiple-outputs">Rules with multiple outputs</a></dd>
            <dt><code>outputs</code></dt>
            <dd>See <a href="#rules-with-multiple-outputs">Rules with multiple outputs</a></dd>
            <dt><code>jobs</code></dt>
            <dd>See <a href="#running-jobs-in-parallel">Running jobs in parallel</a></dd>
        </dl>
        
        ### In the global section
        
        <dl>
            <dt><code>default</code></dt>
            <dd>A list
            (parsed by <a href="https://docs.python.org/3/library/shlex.html?highlight=shlex#shlex.split"><code>shlex.split</code></a>)
            of default targets that are produced if the user does not specify any
            targets when calling Produce.</dd>
            <dt><code>prelude</code></dt>
            <dd>See <a href="#the-prelude">The prelude</a></dd>
        </dl>
        
        Getting in touch
        ----------------
        
        Produce is being developed by Kilian Evang <%{firstname}@%{lastname}.name>.
        I would love to hear from you if you find it useful, if you have questions, bug
        reports or feature requests.
        
        Acknowledgments
        ---------------
        
        The Produce logo was designed by [Valerio Basile](https://valeriobasile.github.io).
        
Keywords: make,builder,automation
Platform: UNKNOWN
Description-Content-Type: text/markdown
