Metadata-Version: 2.1
Name: mozilla-bigquery-etl
Version: 0.0.2
Summary: Tooling for building derived datasets in BigQuery
Author-email: Mozilla Corporation <fx-data-dev@mozilla.org>
Project-URL: Homepage, https://github.com/mozilla/bigquery-etl
Project-URL: Issues, https://github.com/mozilla/bigquery-etl/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: attrs ==23.1.0
Requires-Dist: authlib ==1.3.0
Requires-Dist: black ==23.10.1
Requires-Dist: cattrs ==23.2.3
Requires-Dist: click ==8.1.7
Requires-Dist: exceptiongroup ==1.2.0
Requires-Dist: flake8 <5
Requires-Dist: gcsfs ==2023.10.0
Requires-Dist: gcloud ==0.18.3
Requires-Dist: gitpython ==3.1.40
Requires-Dist: google-cloud-bigquery ==3.14.1
Requires-Dist: google-cloud-bigquery-storage[fastavro] ==2.23.0
Requires-Dist: google-cloud-storage ==2.10.0
Requires-Dist: Jinja2 ==3.1.3
Requires-Dist: jsonschema ==4.19.2
Requires-Dist: markdown-include ==0.8.1
Requires-Dist: mdx-truly-sane-lists ==1.3
Requires-Dist: mkdocs ==1.5.3
Requires-Dist: mkdocs-material ==9.5.3
Requires-Dist: mkdocs-awesome-pages-plugin ==2.9.2
Requires-Dist: mozilla-metric-config-parser ==2023.10.2
Requires-Dist: mozilla-schema-generator ==0.5.1
Requires-Dist: pandas ==2.1.0
Requires-Dist: pathos ==0.3.1
Requires-Dist: pip-tools ==7.3.0
Requires-Dist: pre-commit ==3.6.0
Requires-Dist: pyarrow ==14.0.2
Requires-Dist: pytest-black ==0.3.12
Requires-Dist: pytest-flake8 ==1.1.1
Requires-Dist: pytest-isort ==3.1.0
Requires-Dist: pytest-mypy ==0.10.3
Requires-Dist: pytest-pydocstyle ==2.3.2
Requires-Dist: pytest-xdist ==3.3.1
Requires-Dist: pytest ==7.4.3
Requires-Dist: PyYAML ==6.0.1
Requires-Dist: rich-click ==1.7.2
Requires-Dist: smart-open ==6.4.0
Requires-Dist: sqlglot ==20.4.0
Requires-Dist: sqlparse ==0.4.4
Requires-Dist: stripe ==6.4.0
Requires-Dist: symbolic ==12.4.1
Requires-Dist: siggen ==2.0.20231009
Requires-Dist: tomli ==2.0.1
Requires-Dist: types-python-dateutil ==2.8.19.14
Requires-Dist: types-pytz ==2023.3.1.1
Requires-Dist: types-PyYAML ==6.0.12.11
Requires-Dist: types-requests ==2.31.0.10
Requires-Dist: types-ujson ==5.8.0.1
Requires-Dist: typing ==3.7.4.3
Requires-Dist: ujson ==5.9.0
Requires-Dist: yamllint ==1.32.0

[![CircleCI](https://dl.circleci.com/status-badge/img/gh/mozilla/bigquery-etl/tree/main.svg?style=svg&circle-token=1df4cefd991043d7d3f13243ea80f38e7aa18341)](https://dl.circleci.com/status-badge/redirect/gh/mozilla/bigquery-etl/tree/main)
# BigQuery ETL

This repository contains Mozilla Data Team's:

- Derived ETL jobs that do not require a custom container
- User-defined functions (UDFs)
- Airflow DAGs for scheduled bigquery-etl queries
- Tools for query & UDF deployment, management and scheduling

For more information, see [https://mozilla.github.io/bigquery-etl/](https://mozilla.github.io/bigquery-etl/)

## Quick Start

### Pre-requisites
- **Pyenv** (optional) - Recommended if you want to install multiple versions of Python; see the instructions [here](https://github.com/pyenv/pyenv#basic-github-checkout). After installing pyenv, make sure that your terminal app is [configured to run the shell as a login shell](https://github.com/pyenv/pyenv/wiki/MacOS-login-shell).
- **Homebrew** (optional, but useful on a Mac) - Follow the instructions [here](https://brew.sh/) to install Homebrew on your Mac.
- **Python 3.10+** - See [this guide](https://docs.python-guide.org/starting/install3/osx/) for instructions if you're on a Mac and haven't installed anything other than the default system Python.
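
If you are unsure whether the interpreter on your `PATH` meets the Python 3.10+ requirement, a small check like the one below can help. This is only an illustrative sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are hypothetical and not part of this repository.

```python
import sys

# Minimum version listed in the pre-requisites above (Python 3.10+).
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    required = ".".join(str(part) for part in MIN_PYTHON)
    if python_is_supported():
        print(f"Python {found} satisfies the {required}+ requirement")
    else:
        print(f"Python {found} is too old; {required}+ is required")
```

Run it with the same `python3` you intend to use for the steps below; with pyenv, that is whichever version is active in the current shell.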

### GCP CLI tools

- **For Mozilla Employees or Contributors (not in Data Engineering)** - Set up GCP command line tools, [as described on docs.telemetry.mozilla.org](https://docs.telemetry.mozilla.org/cookbooks/bigquery/access.html#using-the-bq-command-line-tool). Note that some functionality (e.g. writing UDFs or backfilling queries) may not be allowed.
- **For Data Engineering** - In addition to setting up the command line tools, you will want to log in to `shared-prod` if making changes to production systems. Run `gcloud auth login --update-adc --project=moz-fx-data-shared-prod` (if you have not run it previously).

### Installing bqetl

1. Clone the repository
```bash
git clone git@github.com:mozilla/bigquery-etl.git
cd bigquery-etl
```

2. Install the `bqetl` command line tool
```bash
./bqetl bootstrap
```

3. Install standard pre-commit hooks
```bash
venv/bin/pre-commit install
```
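
To confirm that the steps above took effect, you could check for the files they are expected to leave behind. The sketch below is a hedged example, not a tool shipped with this repository: `missing_setup_paths` and `EXPECTED_PATHS` are hypothetical names, and the check assumes a plain clone (a `.git` directory in the repository root) and the default hook location used by `pre-commit install`.

```python
from pathlib import Path

# Paths this sketch looks for; treating these as sufficient evidence of a
# working setup is an assumption, not something the repository guarantees.
EXPECTED_PATHS = (
    "venv/bin/pre-commit",    # the executable invoked in step 3
    ".git/hooks/pre-commit",  # default hook file written by `pre-commit install`
)


def missing_setup_paths(repo_root="."):
    """Return the expected paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [path for path in EXPECTED_PATHS if not (root / path).exists()]


if __name__ == "__main__":
    missing = missing_setup_paths()
    if missing:
        print("Setup looks incomplete; missing:", ", ".join(missing))
    else:
        print("Virtual environment and pre-commit hook are in place")
```

Run it from the root of the cloned repository. A missing `venv/bin/pre-commit` suggests repeating step 2, and a missing hook file suggests repeating step 3.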

Finally, if you are using Visual Studio Code, you may also wish to use our recommended defaults:
```bash
cp .vscode/settings.json.default .vscode/settings.json
cp .vscode/launch.json.default .vscode/launch.json
```

You should now be set up to start working in the repo! For many tasks, the easiest way to do this is to use [`bqetl`](https://mozilla.github.io/bigquery-etl/bqetl/). You may also want to read up on [common workflows](https://mozilla.github.io/bigquery-etl/cookbooks/common_workflows/).
