Metadata-Version: 2.1
Name: powergrasp
Version: 0.8.10
Summary: compress graphs with answer-set-programming
Home-page: https://github.com/aluriak/powergrasp
Author: Lucas Bourneuf
Author-email: lucas.bourneuf@inria.fr
License: GPL
Keywords: graph,Answer Set Programming
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: ASP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: bubbletools (>=0.6.1)
Requires-Dist: clyngor (>=0.3.2)
Requires-Dist: networkx (==2.1)
Requires-Dist: phasme (==0.0.11)
Requires-Dist: pytest (==3.5.0)

# PowerGrASP
Graph compression.

Note that this is a full reimplementation of PowerGrASP,
taking advantage of ASP and Python lifting and simplifications.
For the version published in 2017, see [this repository](https://github.com/aluriak/PowerGrASP-1).


## CLI

    python -m powergrasp mygraph.gml -o compressed.bbl

### help !

    python -m powergrasp --help

## API

```python
import powergrasp
with open('compressed.bbl', 'w') as fd:
    for line in powergrasp.compress_by_cc('mygraph.gml'):
        fd.write(line + '\n')
```

### help !
Sorry, no technical doc for the moment.


## Configuration
PowerGrASP has some [configuration values](powergrasp/constants.py),
that can be overwritten by a `powergrasp.cfg` file placed in the working directory.

The configuration may be printed in stdout using:

    python -m powergrasp --config

The config file may be either in ini or json format, as shown in [`test/powergrasp.oneshot.cfg`](test/powergrasp.oneshot.cfg) and [`test/powergrasp.manyoptions.cfg`](test/powergrasp.manyoptions.cfg).

Configuration allow user to define how much text will be outputed by powergrasp,
and to tune the core compression and related optimizations.

See complete list in the [options section](#Options).


## installation

    pip install powergrasp

On random error, try adding `--no-cache-dir` at the end of the command.


## TODO
- [x] unit tests
- [x] CLI
- [x] run on big graph
- [x] search multiple motif in the same run (speed up observed on bio graph)
- [x] timers for solving, full run, output writing
- [ ] timers for extraction
- [ ] technical documentation
- [ ] search and specific compression of trees
- [ ] search and specific compression of triangle-free graphs
- [x] search multiple motif in the same run (speed up observed on bio graph)


## Changelog

- 8.10
    - option [parallel cc compression](#parallel-cc-compression) to enable parallel compression of connected components
    - option [bubble embeds cc](#bubble-embeds-cc), to put each cc in a dedicated powernode
    - option [bubble with simple edges](#bubble-with-simple-edges), to keep or discard the simple edges from output
    - statistics on time and compression metrics are computed by cc and for all graph. See new options.
    - option [cc statistic file](#cc-statistic-file), to retrieve stats and metrics for connected components.
    - option [global statistics](#global-statistics), to compute statistics/metrics over all connected components.
    - option [bubble with statistics](#bubble-with-statistics), to include stats and metrics in output bubble file
    - [motif type order](#motif-type-order) option, allowing user to tune the order in which motifs are searched.
    - [parallel motif search](#parallel-motif-search) option, making motifs search in different threads, and the overhaul compression much slower.
    - CLI: --config flag to print the full configuration in stdout
    - handling spaces instead of _ in fields names for INI format
    - do not filter out nodes alone in their connected component (cf [keep-single-nodes option](#keep-single-nodes))
    - perf gain: the first motif to reach the best score is compressed, instead of the last
    - improve handling of statistic files, that now contains cc idx and motif bounds
    - expose `__version__` attribute in the package
    - add a list of available options in the README
    - graph filtering: optimization leading to general performance gain when enabled
    - improvements on {low,upp}erbounds computations
- 8.9
    - many bugfixes
    - CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
- 8.8
    - bugfix: lowerbound can't go under 2 (3 for cliques)
    - clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
    - use new versions of clyngor and bubbletools
- 8.7
    - raise error when an invalid key is found in config file
    - StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
    - specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
    - arbitrary options for clingo using CLINGO_OPTIONS config field
    - bugfix on string options given in ini file, when unecessary quotes are kept
- 8.6
    - bugfixes for INI format
- 8.5
    - config may now be given in INI format
    - improve logging
    - config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
    - bugfix about edges yielded by ASP and multithreading parameter
    - no statistics file to write by default
    - timer for output writing
    - config allow user to define the number of CPU for clingo to use
    - simplify ASP code by removing useless arg in block/4 atoms
- 8.4
    - replace ASP code constraint to achieve much more efficient grounding
    - configuration allow for improved lowerbound computation of bicliques
    - generated bubble allow client to give heading comments
    - handle keyboard interruption during search with grace
    - implement timers and statistics recording
    - enable multishot motif search by default
    - various bugfixes

## Options
Description of all powergrasp options.
All values example are given for INI format.

### test integrity
Run some integrity tests during runtime to ensure that compression is working well.
May slow the compression a lot, and is mainly a debugging tool.

Default value:

    test_integrity = true

### show story
Print in stdout the main steps of compression.
Default value:

    show_story = true

### show motif handling
Print in stdout the motif transformation.
Default value:

    show_motif_handling = false

### timers
Maintain and print in stdout some timers.
Default value:

    timers = true

### statistics file
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if *timers* option is enabled).
Default value:

    statistics_file = None

Example value:

    statistics_file = statistics.csv

### cc statistic file
Filename in which the statistics about each connected component will be written.
Data will contain one row for each connected component, with the following data:

- connected component index
- the convertion rate
- the edge reduction
- the compression rate
- time needed to compress the connected component (if TIMERS is enabled)

Default value stands for *no file*:

    cc_statistic_file = None

Example value:

    cc_statistic_file = statistics-cc.csv

### global statistics
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression.
Default value:

    global_statistics = True

### bubble with statistics
When enabled, output bubble will contain statistics about each connected component
and the overall graph (if global_statistics is enabled).
Default value:

    bubble_with_statistics = True

### bubble for each step
Generate and save a bubble representation of the graph at each step.
Mainly used for debugging.
Default value:

    bubble_for_each_step = false

### output node prefix
Prefix to add to all (power)nodes names in output.
Default value:

    output_node_prefix = ''

### show debug
Show full trace of the compression. Useful for debugging.
Default value:

    show_debug = False

### covered edges from ASP
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker.
Default value:

    covered_edges_from_asp = False

### bubble with nodes
Nodes and sets are optional in the output bubble.
Default value:

    bubble_with_nodes = True
    bubble_with_sets = True

### bubble poweredge factor
Edges in bubble are associated to a factor.
Default value:

    bubble_poweredge_factor = '1.0'
    bubble_edge_factor = '1.0'

### bubble embeds cc
If enabled, put each connected component in a dedicated powernode.
Default value:

    bubble_embeds_cc = no


### bubble simplify quotes
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision.
Default value:

    bubble_simplify_quotes = True

### bubble with simple edges
If disabled, will discard simple (i.e. non-power) edges of output.
Default value:
    bubble_with_simple_edges = True

### config file
Load options found in given config file, if it exists.
Default value:

    config_file = 'powergrasp.cfg'

### multishot motif search
Search for multiple motif in a single search.
Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable.
Default value:

    multishot_motif_search = True

### biclique lowerbound maxnei
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered.
Default value:

    biclique_lowerbound_maxnei = 2

### clingo options
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options).
Default value:

    clingo_options = {}

Give a particular configuration for bicliques search:

    clingo_options = {'no-star-biclique': '--configuration=handy'}

### clingo multithreading
Number of CPU available to clingo, or 0 for autodetect number of CPU.
Default value:

    clingo_multithreading = 1

Use as many thread there are available CPU:

    clingo_multithreading = 0

Specify a competing search between 4 threads:

    clingo_multithreading = '4,compete'

### parallel cc compression
To use to compress connected components in different processes.
Default value (optimized in memory):

    parallel_cc_compression = 1

Compress with 4 processes :

    parallel_cc_compression = 4

Compress with one process for each connected component :

    parallel_cc_compression = 0

### use star motif
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs.
Default value:

    use_star_motif = True

### optimize for memory
When possible, prefer memory over CPU.
Default value:

    optimize_for_memory = False

### graph filtering
Ignore edges dynamically determined as impossible to compress.
Default value:

    graph_filtering = True

### keep single nodes
If a node is found to be alone in its connected component, it will be kept nonetheless.
Default value:

    keep_single_nodes = True

### parallel motif search
Use multithreading to search for all motifs in the same time, instead of sequentially.
Interestingly, this yields very poor results, and it's even worse with multiprocessing.

    parallel_motif_search = False

### motif type order
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.

    motif_type_order = star,clique,non-star-biclique,biclique

Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:

    motif_type_order = greatest-upperbound-first

Or any variation with `worst`, `lowerbound` and `last`.


