Metadata-Version: 2.1
Name: dora-core
Version: 0.1.0
Summary: Dora Core Library
License: Apache-2.0
Author: Didone
Author-email: tiago.didone@compass.uol
Requires-Python: >=3.10,<3.13
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: dagster-pipes (>=1.9.11,<2.0.0)
Requires-Dist: jinja2 (>=3.1.5,<4.0.0)
Requires-Dist: numpy (>=2.2.1,<3.0.0)
Requires-Dist: pyarrow (>=18.1.0,<19.0.0)
Requires-Dist: pyiceberg (>=0.8.1,<0.9.0)
Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
Requires-Dist: sqlglot[rs] (>=26.1.3,<27.0.0)
Description-Content-Type: text/markdown

# Dora

Dora is a declarative ETL framework for Data Intelligence Platforms. It automates task orchestration, computation management, monitoring, data quality, and error handling, so data teams can work entirely in SQL: Dora converts SQL statements into tables and provisions the required infrastructure as code. It supports efficient ingestion and a streamlined medallion architecture, and it brings software engineering best practices to data workflows, letting engineers focus on delivering high-quality data instead of managing pipelines.

## Features

- **Declarative**: Define data transformations as SQL statements.
- **Automated**: Convert SQL statements into data assets and automatically provision the necessary infrastructure.
- **Orchestration**: Handle task orchestration, computation management, monitoring, data quality, and error handling automatically.
- **Cost-Efficient Ingestion**: Enable fast, scalable, and cost-effective data ingestion.
- **Infrastructure as Code**: Provision and manage infrastructure as a deployable, version-controlled code stack.
- **Simplified Operations**: Automate operational complexities, allowing teams to focus on delivering high-quality data.
- **Data Quality**: Ensure and continuously monitor data quality to maximize business value.
- **Comprehensive Monitoring**: Track all workflows as interconnected data assets, providing a unified, real-time view of the entire data ecosystem.
- **Best Practices**: Empower data analysts and engineers to apply the same robust practices used in software engineering for reliability, scalability, and maintainability.
- **Medallion Architecture**: Streamline data workflows with a medallion architecture.
- **Efficiency**: Improve the reliability and efficiency of data workflows, so engineers spend less time managing pipelines.
- **Global Data Asset Lineage**: Track data assets across the entire data ecosystem, providing a comprehensive view of data lineage.
- **Data Intelligence Platform**: Simplify and optimize ETL workflows for Data Intelligence Platforms.
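
To make the declarative idea concrete, here is a minimal sketch of how a set of SQL statements can imply both lineage and an execution order. It is not Dora's API: the table names, the `build_lineage` and `execution_order` helpers, and the regex-based parsing are all assumptions made for illustration, and a real SQL parser would be far more robust than the regular expressions used here.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical SQL models: each CREATE TABLE ... AS SELECT declares one data asset.
STATEMENTS = [
    "CREATE TABLE gold.daily_revenue AS "
    "SELECT order_date, SUM(amount) AS revenue FROM silver.orders GROUP BY order_date",
    "CREATE TABLE silver.orders AS "
    "SELECT id, order_date, amount FROM bronze.raw_orders WHERE amount IS NOT NULL",
]

# Simplified patterns (an assumption): they only cover plain CREATE TABLE,
# FROM, and JOIN clauses with unquoted, dot-separated identifiers.
TARGET = re.compile(r"CREATE\s+TABLE\s+([\w.]+)", re.IGNORECASE)
SOURCE = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def build_lineage(statements):
    """Map each target table to the set of tables it reads from."""
    lineage = {}
    for sql in statements:
        match = TARGET.search(sql)
        if match is None:
            raise ValueError(f"no CREATE TABLE target in: {sql!r}")
        lineage[match.group(1)] = set(SOURCE.findall(sql))
    return lineage


def execution_order(lineage):
    """Return tables so that every upstream table precedes its consumers."""
    return list(TopologicalSorter(lineage).static_order())


lineage = build_lineage(STATEMENTS)
order = execution_order(lineage)
print(order)
```

The statements are listed out of order on purpose: the dependency graph, not the file order, decides what runs first, which is the property a declarative framework relies on.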
