Metadata-Version: 2.1
Name: hawks
Version: 0.0.2
Summary: A package for generating synthetic clusters, with parameters to customize different aspects of the complexity of the cluster structure
Home-page: https://github.com/sea-shunned/hawks
Author: Cameron Shand
Author-email: cameron.shand@manchester.ac.uk
License: MIT License
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: deap (==1.2.2)
Requires-Dist: matplotlib (>=2.1)
Requires-Dist: numpy (>=1.15)
Requires-Dist: pandas (>=0.23)
Requires-Dist: scikit-learn (>=0.20)
Requires-Dist: scipy (>=1.1)
Requires-Dist: tqdm (>=4.15)

# HAWKS Data Generator

HAWKS is a tool for generating controllably difficult synthetic data, used primarily for clustering. This repo is associated with the following paper:

1. Shand, C, Allmendinger, R, Handl, J, Webb, A & Keane, J 2019, Evolving Controllably Difficult Datasets for Clustering. in Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19). The Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13/07/19. [https://doi.org/10.1145/3321707.3321761](https://doi.org/10.1145/3321707.3321761)

The academic/technical details can be found there. What follows here is a practical guide to using this tool to generate synthetic data.

If you use this tool to generate data that forms part of a paper, please consider either linking to this work or citing the paper above.

## Installation
HAWKS can be installed through pip:
```
pip install hawks
```
or by cloning this repo (and installing locally using `pip install .`). 

## Running HAWKS
To use the package, `import hawks`. Its parameters are set through a config, and the available parameters are described in the [user guide](https://github.com/sea-shunned/hawks/blob/master/user_guide.md). Any parameter that is not specified takes its default value (as defined in `hawks/defaults.json`).

The example below illustrates how to run `hawks`. Either a dictionary or a path to a JSON config can be provided to override any of the default values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import hawks

# Fix the seed number
config = {
    "hawks": {
        "seed_num": 42
    }
}
# Any missing parameters will take the default seen in hawks/defaults.json
generator = hawks.create_generator(config)
# Run the generator
generator.run()
# Get the best dataset found and its labels
data, labels = generator.get_best_dataset()
# Optionally, plot the best dataset to see how it looks
# generator.plot_best_indiv()
# Run KMeans on the data
km = KMeans(
    n_clusters=len(np.unique(labels)), random_state=42
).fit(data)
# Get the Adjusted Rand Index for KMeans on the data
ari = adjusted_rand_score(labels, km.labels_)
print(f"ARI: {ari}")
```
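
The same override can also be supplied as a JSON file rather than a dictionary. The sketch below is a minimal illustration of that route: it writes the config to a JSON file and reads it back to check the contents. The helper names `write_config` and `run_from_json` and the filename `hawks_config.json` are hypothetical, and passing the path as a string is an assumption (the user guide describes the accepted forms).

```python
import json
import tempfile
from pathlib import Path


def write_config(config, directory):
    """Write a config dictionary to a JSON file and return its path."""
    # Hypothetical filename, chosen only for this example
    config_path = Path(directory) / "hawks_config.json"
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path


def run_from_json(config_path):
    """Create and run a generator from a JSON config path."""
    # Imported here so the JSON helper above works without hawks installed
    import hawks

    # Assumption: the path is passed as a string
    generator = hawks.create_generator(str(config_path))
    generator.run()
    return generator.get_best_dataset()


# Same override as the dictionary example: only the seed is changed
config = {"hawks": {"seed_num": 42}}

# Write to a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = write_config(config, tmp_dir)
    # Read the file back to confirm it holds the same overrides
    with open(config_path) as f:
        loaded = json.load(f)
    # To generate data from the file, uncomment the following line:
    # data, labels = run_from_json(config_path)
```

Keeping the overrides in a JSON file makes it easier to store the exact settings alongside the generated data, which helps when a dataset needs to be regenerated later.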

## User Guide
For a more detailed explanation of the parameters and how to use HAWKS, please read the [user guide](https://github.com/sea-shunned/hawks/blob/master/user_guide.md).

## Issues
As this work is still in development, plain sailing is not guaranteed. If you encounter an issue, first check that `hawks` is running as intended by navigating to the tests directory and running `python tests.py`. If any test fails, please include the details alongside your original problem in an issue on the [GitHub repo](https://github.com/sea-shunned/hawks).

## Feature Requests


