Metadata-Version: 2.1
Name: vizdataquality
Version: 1.0.0
Summary: Visualize data quality
Home-page: https://github.com/royruddle/vizdataquality
License: Apache-2.0
Author: Roy Ruddle
Author-email: R.A.Ruddle@leeds.ac.uk
Requires-Python: >=3.9,<3.12
Classifier: Framework :: Jupyter
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Visualization
Provides-Extra: all
Provides-Extra: doc
Provides-Extra: notebooks
Provides-Extra: test
Requires-Dist: Sphinx (>=7.0,<8.0) ; extra == "doc" or extra == "all"
Requires-Dist: chardet (>=5.2.0,<6.0.0)
Requires-Dist: matplotlib (>=3.7.2,<4.0.0) ; extra == "notebooks" or extra == "doc" or extra == "all"
Requires-Dist: myst-parser ; extra == "doc" or extra == "all"
Requires-Dist: notebook (>=6.4,<7.0) ; extra == "notebooks" or extra == "all"
Requires-Dist: numpy (>=1.24,<2.0)
Requires-Dist: pandas (>=2,<3)
Requires-Dist: pydata-sphinx-theme (>=0.14,<0.15) ; extra == "doc" or extra == "all"
Requires-Dist: pytest (>=6.2,<7.0) ; extra == "test" or extra == "all"
Requires-Dist: tomli (>=2.0,<3.0)
Project-URL: Documentation, https://vizdataquality.readthedocs.io/en/latest/
Project-URL: Repository, https://github.com/royruddle/vizdataquality
Description-Content-Type: text/markdown

[![Python Package](https://github.com/royruddle/vizdataquality/actions/workflows/main.yml/badge.svg)](https://github.com/royruddle/vizdataquality/actions/workflows/main.yml)
# vizdataquality
vizdataquality is a Python package for visualizing data quality. It supports the following six-step workflow:
1. Look at your data (is anything obviously wrong?)
2. Watch out for special values
3. Is any data missing?
4. Check each variable
5. Check combinations of variables
6. Profile the cleaned data
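
As a rough illustration of what the six steps involve, the sketch below walks through them using plain pandas only. It does not use the vizdataquality API (see the documentation and tutorials for that). The toy dataset, its column names, and the placeholder values `-1` and `1900-01-01` are assumptions invented for this example.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
df = pd.DataFrame(
    {
        "fine_id": [1, 2, 3, 4, 4],
        "issued": ["2023-01-05", "2023-01-06", None, "1900-01-01", "1900-01-01"],
        "amount": [35.0, 70.0, -1.0, 35.0, 35.0],
    }
)

# Step 1: look at the data (is anything obviously wrong?).
print(df.head())

# Step 2: watch out for special values
# (assumed placeholders here: -1 for amount, 1900-01-01 for issued).
special_amounts = int((df["amount"] == -1.0).sum())
special_dates = int((df["issued"] == "1900-01-01").sum())

# Step 3: count missing values per variable.
missing = df.isna().sum()

# Step 4: check each variable (type, distinct values, missing values).
summary = pd.DataFrame(
    {"dtype": df.dtypes.astype(str), "unique": df.nunique(), "missing": missing}
)
print(summary)

# Step 5: check combinations of variables (here, fully duplicated rows).
duplicate_rows = int(df.duplicated().sum())

# Step 6: profile the cleaned data.
cleaned = df.drop_duplicates()
cleaned = cleaned[
    (cleaned["amount"] != -1.0) & (cleaned["issued"] != "1900-01-01")
].dropna()
print(cleaned.describe())
```

In practice each step is usually iterative: findings from a later step (for example, an unexpected duplicate) often send you back to an earlier one.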

## Documentation
[The vizdataquality documentation](https://vizdataquality.readthedocs.io/en/latest/index.html) is hosted on Read the Docs.

## Installation
We recommend installing vizdataquality in a Python virtual environment or Conda environment.

To install [vizdataquality](https://pypi.org/project/vizdataquality/), most users should run:

```
pip install 'vizdataquality'
```
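
To confirm that the installation worked, one option is a short check using only the Python standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the supported Python range (3.9 up to, but not including, 3.12) is taken from the package metadata for version 1.0.0.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The package metadata declares Requires-Python >=3.9,<3.12.
python_ok = (3, 9) <= sys.version_info[:2] < (3, 12)

print("Python version supported:", python_ok)
print("vizdataquality version:", installed_version("vizdataquality"))
```

If the second line prints `None`, the package is not installed in the environment that is running the script, which usually means the wrong virtual environment is active.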

## Tutorials
The package includes notebooks that show you how to:
- [Calculate a set of data quality attributes and output them to a file](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Simple%20example.ipynb)
- Use each type of plot, e.g., [datetime value distribution](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Datetime%20value%20distribution.ipynb)
- [Create a report](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Report.ipynb) while you investigate data quality and profile a dataset
- [Apply the six-step workflow to an open parking fines dataset](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Workflow%20(parking%20fines).ipynb)

After installing vizdataquality, you will need to clone or download this repository to follow these tutorials interactively. Then start Jupyter from within it:

```
python -m jupyter notebook notebooks
```

## Development
- The documentation is built on Read the Docs from the main branch.
- The package is published to PyPI when a release is created in the project repository on GitHub.

## Notice
The vizdataquality software is released under the Apache License, Version 2.0. See [LICENCE](./LICENCE) for details.

## Acknowledgements
The development of the vizdataquality software was supported by funding from the Engineering and Physical Sciences Research Council (EP/N013980/1; EP/R511717/1) and the Alan Turing Institute.

