Metadata-Version: 2.4
Name: poppusher
Version: 0.1.0
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Dist: dagster >=1.6.11, <1.7.0
Requires-Dist: dagster-cloud >=1.6.11, <1.7.0
Requires-Dist: dagster-azure >=0.22.11
Requires-Dist: pandas >=2.1, <2.2
Requires-Dist: geopandas >=0.14.1, <1
Requires-Dist: docker
Requires-Dist: lxml >=4.9.3, <5.1.1
Requires-Dist: pyarrow
Requires-Dist: fsspec
Requires-Dist: aiohttp
Requires-Dist: matplotlib >=3.8.2
Requires-Dist: pydantic >=2.7.1
Requires-Dist: rdflib >=7.0.0
Requires-Dist: icecream >=2.1.3
Requires-Dist: python-slugify >=8.0.4
Requires-Dist: zipfile-deflate64 >=0.1.0
Requires-Dist: jcs >=0.2.1
Requires-Dist: beautifulsoup4 >=4.12.3
Requires-Dist: openpyxl >=3.1.3
Requires-Dist: xlrd >=2.0.1
Requires-Dist: iso639-lang >=2.2.3
Requires-Dist: aiohttp >=3.9.5
Requires-Dist: more-itertools >=10.3.0
Requires-Dist: pytest >=6 ; extra == 'test'
Requires-Dist: pytest-cov >=3 ; extra == 'test'
Requires-Dist: dagster-webserver >=1.6.11 ; extra == 'dev'
Requires-Dist: pytest >=6 ; extra == 'dev'
Requires-Dist: pytest-cov >=3 ; extra == 'dev'
Requires-Dist: maturin >=1.3.2 ; extra == 'dev'
Requires-Dist: pandas-stubs ; extra == 'dev'
Requires-Dist: types-requests ; extra == 'dev'
Requires-Dist: urllib3 <2 ; extra == 'dev'
Requires-Dist: pre-commit ; extra == 'dev'
Requires-Dist: pyright >=1.1.339 ; extra == 'dev'
Requires-Dist: mkdocs >=1.6.0 ; extra == 'docs'
Requires-Dist: mkdocs-mermaid2-plugin >=1.2.1 ; extra == 'docs'
Provides-Extra: test
Provides-Extra: dev
Provides-Extra: docs
License-File: LICENSE
Summary: A convenience tool for downloading census data from different countries
Author: dabreegster, stuartlynn, sgreenbury
Author-email: Andy Smith <andrewphilipsmith@users.noreply.github.com>, Penelope Yong <penelopeysm@users.noreply.github.com>
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/Urban-Analytics-Technology-Platform/poppusher
Project-URL: Bug Tracker, https://github.com/Urban-Analytics-Technology-Platform/poppusher/issues
Project-URL: Discussions, https://github.com/Urban-Analytics-Technology-Platform/poppusher/discussions
Project-URL: Changelog, https://github.com/Urban-Analytics-Technology-Platform/poppusher/releases

# poppusher

[![Actions Status][actions-badge]][actions-link]
[![Documentation Status][rtd-badge]][rtd-link]

[![PyPI version][pypi-version]][pypi-link]
[![Conda-Forge][conda-badge]][conda-link]
[![PyPI platforms][pypi-platforms]][pypi-link]

[![GitHub Issues][github-issues-badge]][github-issues-link]

<!-- SPHINX-START -->

<!-- prettier-ignore-start -->
[actions-badge]:            https://github.com/Urban-Analytics-Technology-Platform/poppusher/workflows/CI/badge.svg
[actions-link]:             https://github.com/Urban-Analytics-Technology-Platform/poppusher/actions
[conda-badge]:              https://img.shields.io/conda/vn/conda-forge/poppusher
[conda-link]:               https://github.com/conda-forge/poppusher-feedstock
[github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
[github-discussions-link]:  https://github.com/Urban-Analytics-Technology-Platform/poppusher/discussions
[github-issues-badge]: https://img.shields.io/static/v1?label=GitHub&message=Issues&color=blue&logo=github
[github-issues-link]:  https://github.com/Urban-Analytics-Technology-Platform/poppusher/issues
[pypi-link]:                https://pypi.org/project/poppusher/
[pypi-platforms]:           https://img.shields.io/pypi/pyversions/poppusher
[pypi-version]:             https://img.shields.io/pypi/v/poppusher
[rtd-badge]:                https://readthedocs.org/projects/poppusher/badge/?version=latest
[rtd-link]:                 https://poppusher.readthedocs.io/en/latest/?badge=latest

<!-- prettier-ignore-end -->

# What is popgetter?

Popgetter is a collection of tools, designed to make it convenient to download
census data from a number of different jurisdictions and coercing the data into
common formats. The aim is that city or region scale analysis can be easily
[replicated](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions.html#table-of-definitions-for-reproducibility)
for different geographies, using the most detailed, locally available data.

# What is poppusher?

This repo is "poppusher", which is one component of the popgetter project.
Poppusher is a pipeline which downloads data from a number of different
jurisdictions and then processes it into a common format. The data is then
stored in a cloud-based data store, which can be accessed by other components of
the popgetter project.

See the [flow diagram](flow-diagram.md) for more details.

## What the popgetter system does and doesn't do

**Popgetter DOES:**

For each of the implemented countries:

- Download the most detailed geometries, for which census data is available.
- Download the most detailed census available for most, if not all, variables
  published by the census.
- Ensures that the geometries and census data join correctly.
- Presents some standard metadata to allow the user to explore which variables
  are available.
- publish the data in a set of common file types (eg CloudGeoBuff, Parquet,
  PMtiles).

**Popgetter DOES NOT:**

- It does not attempt to ensure that census variables are comparable between
  different jurisdictions. Nor does it attempt to ensure that the results of any
  analysis can be directly compared across multiple countries.

## Getting started

At present, this is still a development project, so the first step is to clone
the repo and then install using pip, with the `--editable` option:

1. Create a virtual environment and activate it (you should be able to use your
   own choice of environment manager, such as `conda` or `venv`, but so far
   `pyenv` is the most tested with popgetter). eg:

```bash
python  -m venv poppusher_venv    # create a virtual environment called `poppusher_venv`
source poppusher_venv/bin/activate  # activate the virtual environment
```

2. Clone the repo and then do an 'editable' install:

```bash
git clone https://github.com/Urban-Analytics-Technology-Platform/poppusher.git
cd poppusher
pip install -e ".[dev]"
```

3. Then, start the Dagster UI web server:

```bash
dagster dev
```

Open http://localhost:3000 with your browser to see the project.

## Development

You can start writing assets in `poppusher/assets/` directory. New assets and
jobs will need to be added to the `poppusher/__init__.py` file.

### Adding new Python dependencies

You can specify new Python dependencies in `pyproject.toml`.

### Unit testing

Tests are in the `poppusher_tests` directory and you can run tests using
`pytest`:

```bash
pytest
```

### Repo structure

This is a [Dagster](https://dagster.io/) project. The repo layout was initially
created with the
[`dagster project scaffold`](https://docs.dagster.io/getting-started/create-new-project)
command. It has been subsequently updated using the
[copier](https://copier.readthedocs.io/en/stable/) command and the
[Scientific Python template](https://github.com/scientific-python/cookie).

There is code, which predates the migration to Dagster in the `previous_code`
directory. In due course, this will be removed as the remaining countries are
migrated to Dagster. There are usage instructions for this old code in
`previous_code/previous_code_usage.md`.

