Metadata-Version: 2.1
Name: datasets-summarizer
Version: 0.1.2
Summary: Datasets Summary Viewer. Enables the exploration of dataset search results in Jupyter Notebooks
Home-page: https://github.com/soniacq/DatasetsVis
Author: Sonia Castelo
Author-email: s.castelo@nyu.edu
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: python-dateutil
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: networkx
Requires-Dist: notebook
Requires-Dist: datamart-profiler (>=0.8)
Requires-Dist: pandas

# DatasetsSummarizer

Datasets Summarizer is compatible with Jupyter Notebooks. Need the x and y values based on any similarity metric to generated the similarity plot between datasets. Supports the metadata format generated by [datamart-profiler](https://docs.auctus.vida-nyu.org/python/datamart-profiler.html#) library to generate the Detail View to explore each dataset.


![System screen](https://github.com/soniacq/DatasetsVis/blob/main/DatasetsSummarizer/imgs/datasets_summarizer_view.png)

( Click one dataset from the list of results to open the Detail View.)

## Demo

Live demo (Google Colab):
- [Dataset results for Taxi query](https://colab.research.google.com/drive/11NT0qudr2di9NXlR-DxZa0buzGreOqxs?usp=sharing)

In Jupyter Notebook:
```Python
import DatasetsSummarizer
data = DatasetsSummarizer.get_taxi_data()
DatasetsSummarizer.plot_datasets_summary(data)
```

## Install

### Option 1: install via pip:
~~~~
pip install datasets-summarizer
~~~~

## Custom similarity metric

Use a subset or add a new entry (`x` and `y` values ) based on a different similatiry metric. For example, here we added `x` and `y` values based on a similarity metric using a modified version of the titles. Note that `modif_title_x` and `modif_title_y` must be included in the dataframe.

```Python
new_similarity_metrics = [{'name': 'Title', 'x': 'title_x', 'y': 'title_y'},
                          {'name': 'ModifiedTitle', 'x': 'modif_title_x', 'y': 'modif_title_y'}
                         ]
```

Then, we can pass this new similarity metrics as a parameter of our visualization
```Python
DatasetsSummarizer.plot_datasets_summary(dataframe, new_similarity_metrics)
```


