Metadata-Version: 2.4
Name: sqlflow-core
Version: 0.1.7
Summary: SQLFlow is a SQL-native engine for defining, orchestrating, and managing data workflows—empowering analysts and engineers to build robust, production-grade pipelines using only SQL.
Author-email: Chanh Le <giaosudau@gmail.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
           1. Definitions.
        
              "License" shall mean the terms and conditions for use, reproduction,
              and distribution as defined by Sections 1 through 9 of this document.
        
              "Licensor" shall mean the copyright owner or entity authorized by
              the copyright owner that is granting the License.
        
              "Legal Entity" shall mean the union of the acting entity and all
              other entities that control, are controlled by, or are under common
              control with that entity. For the purposes of this definition,
              "control" means (i) the power, direct or indirect, to cause the
              direction or management of such entity, whether by contract or
              otherwise, or (ii) ownership of fifty percent (50%) or more of the
              outstanding shares, or (iii) beneficial ownership of such entity.
        
              "You" (or "Your") shall mean an individual or Legal Entity
              exercising permissions granted by this License.
        
              "Source" form shall mean the preferred form for making modifications,
              including but not limited to software source code, documentation
              source, and configuration files.
        
              "Object" form shall mean any form resulting from mechanical
              transformation or translation of a Source form, including but
              not limited to compiled object code, generated documentation,
              and conversions to other media types.
        
              "Work" shall mean the work of authorship, whether in Source or
              Object form, made available under the License, as indicated by a
              copyright notice that is included in or attached to the work
              (an example is provided in the Appendix below).
        
              "Derivative Works" shall mean any work, whether in Source or Object
              form, that is based on (or derived from) the Work and for which the
              editorial revisions, annotations, elaborations, or other modifications
              represent, as a whole, an original work of authorship. For the purposes
              of this License, Derivative Works shall not include works that remain
              separable from, or merely link (or bind by name) to the interfaces of,
              the Work and Derivative Works thereof.
        
              "Contribution" shall mean any work of authorship, including
              the original version of the Work and any modifications or additions
              to that Work or Derivative Works thereof, that is intentionally
              submitted to Licensor for inclusion in the Work by the copyright owner
              or by an individual or Legal Entity authorized to submit on behalf of
              the copyright owner. For the purposes of this definition, "submitted"
              means any form of electronic, verbal, or written communication sent
              to the Licensor or its representatives, including but not limited to
              communication on electronic mailing lists, source code control systems,
              and issue tracking systems that are managed by, or on behalf of, the
              Licensor for the purpose of discussing and improving the Work, but
              excluding communication that is conspicuously marked or otherwise
              designated in writing by the copyright owner as "Not a Contribution."
        
              "Contributor" shall mean Licensor and any individual or Legal Entity
              on behalf of whom a Contribution has been received by Licensor and
              subsequently incorporated within the Work.
        
           2. Grant of Copyright License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              copyright license to reproduce, prepare Derivative Works of,
              publicly display, publicly perform, sublicense, and distribute the
              Work and such Derivative Works in Source or Object form.
        
           3. Grant of Patent License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              (except as stated in this section) patent license to make, have made,
              use, offer to sell, sell, import, and otherwise transfer the Work,
              where such license applies only to those patent claims licensable
              by such Contributor that are necessarily infringed by their
              Contribution(s) alone or by combination of their Contribution(s)
              with the Work to which such Contribution(s) was submitted. If You
              institute patent litigation against any entity (including a
              cross-claim or counterclaim in a lawsuit) alleging that the Work
              or a Contribution incorporated within the Work constitutes direct
              or contributory patent infringement, then any patent licenses
              granted to You under this License for that Work shall terminate
              as of the date such litigation is filed.
        
           4. Redistribution. You may reproduce and distribute copies of the
              Work or Derivative Works thereof in any medium, with or without
              modifications, and in Source or Object form, provided that You
              meet the following conditions:
        
              (a) You must give any other recipients of the Work or
                  Derivative Works a copy of this License; and
        
              (b) You must cause any modified files to carry prominent notices
                  stating that You changed the files; and
        
              (c) You must retain, in the Source form of any Derivative Works
                  that You distribute, all copyright, patent, trademark, and
                  attribution notices from the Source form of the Work,
                  excluding those notices that do not pertain to any part of
                  the Derivative Works; and
        
              (d) If the Work includes a "NOTICE" text file as part of its
                  distribution, then any Derivative Works that You distribute must
                  include a readable copy of the attribution notices contained
                  within such NOTICE file, excluding those notices that do not
                  pertain to any part of the Derivative Works, in at least one
                  of the following places: within a NOTICE text file distributed
                  as part of the Derivative Works; within the Source form or
                  documentation, if provided along with the Derivative Works; or,
                  within a display generated by the Derivative Works, if and
                  wherever such third-party notices normally appear. The contents
                  of the NOTICE file are for informational purposes only and
                  do not modify the License. You may add Your own attribution
                  notices within Derivative Works that You distribute, alongside
                  or as an addendum to the NOTICE text from the Work, provided
                  that such additional attribution notices cannot be construed
                  as modifying the License.
        
              You may add Your own copyright statement to Your modifications and
              may provide additional or different license terms and conditions
              for use, reproduction, or distribution of Your modifications, or
              for any such Derivative Works as a whole, provided Your use,
              reproduction, and distribution of the Work otherwise complies with
              the conditions stated in this License.
        
           5. Submission of Contributions. Unless You explicitly state otherwise,
              any Contribution intentionally submitted for inclusion in the Work
              by You to the Licensor shall be under the terms and conditions of
              this License, without any additional terms or conditions.
              Notwithstanding the above, nothing herein shall supersede or modify
              the terms of any separate license agreement you may have executed
              with Licensor regarding such Contributions.
        
           6. Trademarks. This License does not grant permission to use the trade
              names, trademarks, service marks, or product names of the Licensor,
              except as required for reasonable and customary use in describing the
              origin of the Work and reproducing the content of the NOTICE file.
        
           7. Disclaimer of Warranty. Unless required by applicable law or
              agreed to in writing, Licensor provides the Work (and each
              Contributor provides its Contributions) on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
              implied, including, without limitation, any warranties or conditions
              of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
              PARTICULAR PURPOSE. You are solely responsible for determining the
              appropriateness of using or redistributing the Work and assume any
              risks associated with Your exercise of permissions under this License.
        
           8. Limitation of Liability. In no event and under no legal theory,
              whether in tort (including negligence), contract, or otherwise,
              unless required by applicable law (such as deliberate and grossly
              negligent acts) or agreed to in writing, shall any Contributor be
              liable to You for damages, including any direct, indirect, special,
              incidental, or consequential damages of any character arising as a
              result of this License or out of the use or inability to use the
              Work (including but not limited to damages for loss of goodwill,
              work stoppage, computer failure or malfunction, or any and all
              other commercial damages or losses), even if such Contributor
              has been advised of the possibility of such damages.
        
           9. Accepting Warranty or Additional Liability. While redistributing
              the Work or Derivative Works thereof, You may choose to offer,
              and charge a fee for, acceptance of support, warranty, indemnity,
              or other liability obligations and/or rights consistent with this
              License. However, in accepting such obligations, You may act only
              on Your own behalf and on Your sole responsibility, not on behalf
              of any other Contributor, and only if You agree to indemnify,
              defend, and hold each Contributor harmless for any liability
              incurred by, or claims asserted against, such Contributor by reason
              of your accepting any such warranty or additional liability.
        
           END OF TERMS AND CONDITIONS
        
           Copyright 2024 SQLFlow Contributors
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License. 
Project-URL: Homepage, https://github.com/sqlflow/sqlflow
Project-URL: Documentation, https://github.com/sqlflow/sqlflow#readme
Project-URL: Source, https://github.com/sqlflow/sqlflow
Project-URL: Tracker, https://github.com/sqlflow/sqlflow/issues
Keywords: sql,data-pipeline,etl,workflow,duckdb,analytics,data-engineering
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: duckdb>=1.2.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: click>=8.0.0
Requires-Dist: networkx>=3.0
Requires-Dist: typer==0.9.0
Requires-Dist: rich>=13.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: typing-extensions>=4.0.0
Requires-Dist: pyarrow>=10.0.0
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.0; platform_machine != "aarch64" and extra == "postgres"
Requires-Dist: psycopg2>=2.9.0; platform_machine == "aarch64" and extra == "postgres"
Provides-Extra: cloud
Requires-Dist: boto3>=1.26.0; extra == "cloud"
Requires-Dist: botocore>=1.29.0; extra == "cloud"
Requires-Dist: google-api-python-client>=2.0.0; extra == "cloud"
Requires-Dist: google-auth>=2.0.0; extra == "cloud"
Requires-Dist: google-auth-oauthlib>=1.0.0; extra == "cloud"
Requires-Dist: google-cloud-storage>=2.0.0; extra == "cloud"
Provides-Extra: all
Requires-Dist: psycopg2-binary>=2.9.0; platform_machine != "aarch64" and extra == "all"
Requires-Dist: psycopg2>=2.9.0; platform_machine == "aarch64" and extra == "all"
Requires-Dist: boto3>=1.26.0; extra == "all"
Requires-Dist: botocore>=1.29.0; extra == "all"
Requires-Dist: google-api-python-client>=2.0.0; extra == "all"
Requires-Dist: google-auth>=2.0.0; extra == "all"
Requires-Dist: google-auth-oauthlib>=1.0.0; extra == "all"
Requires-Dist: google-cloud-storage>=2.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0.0; extra == "dev"
Requires-Dist: black>=22.1.0; extra == "dev"
Requires-Dist: isort>=5.10.1; extra == "dev"
Requires-Dist: flake8>=4.0.1; extra == "dev"
Requires-Dist: flake8-pyproject>=1.2.3; extra == "dev"
Requires-Dist: autoflake>=2.2.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.1.0; extra == "test"
Requires-Dist: mock>=5.0.0; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx>=4.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=1.0.0; extra == "docs"
Dynamic: license-file

# SQLFlow: The Complete SQL Data Pipeline Platform

<div align="center">

**Get working data analytics in under 2 minutes with pure SQL**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![PyPI version](https://badge.fury.io/py/sqlflow-core.svg)](https://pypi.org/project/sqlflow-core/)
[![DuckDB Powered](https://img.shields.io/badge/powered%20by-DuckDB-DCA344.svg)](https://duckdb.org/)
[![codecov](https://codecov.io/github/giaosudau/sqlflow/graph/badge.svg?token=69WRMEYAAZ)](https://codecov.io/github/giaosudau/sqlflow)

</div>

```mermaid
flowchart LR
    A["🔌 SOURCE"] -->|Raw Data| B["📥 LOAD"]
    B -->|Tables| C["⚙️ SQL Transforms"]
    P["🐍 Python UDFs"] -.->|Enrich| C
    C -->|Results| D["📤 EXPORT"]
    
    style A fill:#3B82F6,color:white,stroke:#2563EB,stroke-width:2px
    style B fill:#10B981,color:white,stroke:#059669,stroke-width:2px
    style C fill:#8B5CF6,color:white,stroke:#7C3AED,stroke-width:2px
    style D fill:#F59E0B,color:white,stroke:#D97706,stroke-width:2px
    style P fill:#EC4899,color:white,stroke:#DB2777,stroke-width:2px
```

## 🚀 Get Started in 90 Seconds

```bash
# Install SQLFlow (includes everything for analytics)
pip install sqlflow-core

# Create project with realistic sample data + working pipelines
sqlflow init my_analytics

# See immediate results with 1,000 customers and 5,000 orders
cd my_analytics
sqlflow pipeline run customer_analytics

# View working customer analytics
cat output/customer_summary.csv
cat output/top_customers.csv
```

**That's it!** You now have working customer analytics with 1,000 customers, 5,000 orders, and 500 products.

## ⚡ Fastest Time to Value in the Industry

**SQLFlow: Under 2 minutes to working analytics**  
Competitors: 15-60 minutes of setup

| Framework | Time to Results | Setup Complexity | Sample Data |
|-----------|-----------------|------------------|-------------|
| **SQLFlow** | **1-2 minutes** | ✅ One command | ✅ Auto-generated |
| dbt | 15-20 minutes | ❌ Manual setup | ❌ Find your own |
| SQLMesh | 20-30 minutes | ❌ New concepts | ❌ Find your own |
| Airflow | 30-60 minutes | ❌ Complex DAGs | ❌ Find your own |

## Why Teams Switch to SQLFlow

**Before SQLFlow (Traditional Approach):**
- 15+ minutes to first results
- Multiple tools with different languages
- Manual data setup and configuration
- Context switching between ingestion, transformation, and export tools

**After SQLFlow:**
- Under 2 minutes to working analytics
- One tool with SQL you already know
- Instant realistic sample data (1,000 customers, 5,000 orders)
- Complete pipeline in a single file

## 📦 Installation Options

<details>
<summary><strong>Need database connectivity or cloud storage?</strong></summary>

```bash
# Basic installation (enough for most users)
pip install sqlflow-core

# Add PostgreSQL support
pip install "sqlflow-core[postgres]"

# Add cloud storage (AWS S3 + Google Cloud)
pip install "sqlflow-core[cloud]"

# Everything included
pip install "sqlflow-core[all]"
```

**Having installation issues?** See our comprehensive [Installation Guide](INSTALLATION.md) for platform-specific instructions and troubleshooting.

</details>

## What Makes SQLFlow Different

Stop stitching together complex tools. SQLFlow unifies your entire data workflow in pure SQL with intelligent extensions.

### 🔄 Complete Data Workflow

* **Auto-Generated Sample Data:** 1,000 customers, 5,000 orders, 500 products ready to analyze
* **Ready-to-Run Pipelines:** Customer analytics, data quality monitoring, and basic examples
* **Source Connectors:** Ingest from CSV, PostgreSQL, and more
* **SQL Transformations:** Standard SQL with automatic dependency tracking
* **Python Integration:** Extend with Python UDFs when SQL isn't enough
* **Export Destinations:** Output to files, S3, and other targets

### 💪 Powerful Yet Simple

* **SQL-First:** Leverage the language data teams already know
* **Zero Configuration:** Profiles pre-configured for immediate use
* **Intuitive DSL:** Extended SQL with clear, purpose-built directives
* **Automatic DAG:** Dependencies automatically tracked and visualized
* **Clean Syntax:** No complex configuration or boilerplate
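
To make the "Automatic DAG" idea concrete, here is a minimal sketch of dependency ordering. It is not SQLFlow's implementation: the dependency map is written by hand rather than derived from SQL, the step names (including `export_customer_summary`) are illustrative, and it uses Python's standard-library `graphlib` purely so the example has no third-party requirements.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it reads from. This map is hand-written
# for illustration; an engine would derive it by analyzing the pipeline.
dependencies = {
    "customers": set(),
    "orders": set(),
    "customer_summary": {"customers", "orders"},
    "export_customer_summary": {"customer_summary"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step runs after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Both load steps come before `customer_summary`, and the export comes last. A cycle in the map would raise `graphlib.CycleError` instead of producing an order.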

### 🛠️ Developer Experience

* **Instant Results:** Working analytics in under 2 minutes
* **Easy Environment Switching:** Dev to production in seconds with profiles
* **Fast Iteration:** Lightning-quick in-memory mode for development
* **Robust Production:** Persistent storage mode for deployment
* **Built-in Visualization:** Auto-generated pipeline diagrams

## Real-World Example

Here's what SQLFlow generates for you automatically:

**Customer Analytics Pipeline** (auto-created in every project):

```sql
-- Customer Analytics Pipeline (runs immediately!)
-- Load auto-generated realistic data
CREATE TABLE customers AS
SELECT * FROM read_csv_auto('data/customers.csv');

CREATE TABLE orders AS
SELECT * FROM read_csv_auto('data/orders.csv');

-- Analyze customer behavior by country and tier
CREATE TABLE customer_summary AS
SELECT 
    c.country,
    c.tier,
    COUNT(DISTINCT c.customer_id) as customer_count,  -- the join yields one row per order
    AVG(c.age) as avg_age,
    COUNT(o.order_id) as total_orders,
    COALESCE(SUM(o.price * o.quantity), 0) as total_revenue
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.country, c.tier
ORDER BY total_revenue DESC;

-- Export results (auto-saved to output/)
EXPORT SELECT * FROM customer_summary 
TO 'output/customer_summary.csv' 
TYPE CSV OPTIONS { "header": true };
```
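
The aggregation above joins customers to orders before grouping, so it helps to see what the join produces. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` instead of DuckDB, two made-up customers and three made-up orders, and the illustrative column names `joined_rows` and `distinct_customers`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, country TEXT, tier TEXT, age INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, price REAL, quantity INTEGER);
    -- Customer 1 has three orders; customer 2 has none.
    INSERT INTO customers VALUES (1, 'US', 'gold', 30), (2, 'US', 'gold', 40);
    INSERT INTO orders VALUES (10, 1, 5.0, 2), (11, 1, 3.0, 1), (12, 1, 2.0, 4);
""")

row = conn.execute("""
    SELECT
        c.country,
        c.tier,
        COUNT(*) AS joined_rows,                        -- one row per order (or per order-less customer)
        COUNT(DISTINCT c.customer_id) AS distinct_customers,
        COUNT(o.order_id) AS total_orders,              -- NULL order_ids are not counted
        COALESCE(SUM(o.price * o.quantity), 0) AS total_revenue
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.country, c.tier
""").fetchone()

print(row)  # ('US', 'gold', 4, 2, 3, 21.0)
```

The join yields four rows for two customers, so a plain row count and a distinct-customer count diverge as soon as any customer has more than one order.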

**Python UDF Integration:**

```python
# python_udfs/metrics.py (when you need more than SQL)
from sqlflow.udfs.decorators import python_scalar_udf, python_table_udf
import pandas as pd

@python_scalar_udf
def calculate_score(value: float, weight: float = 1.0) -> float:
    """Calculate weighted score."""
    return value * weight

@python_table_udf
def add_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Add calculated metrics to the dataframe."""
    result = df.copy()
    result["total"] = result["quantity"] * result["price"]
    result["discount"] = result["total"] * 0.1
    return result
```

Use in your SQL:

```sql
-- Scalar UDF
SELECT
  product_id,
  price,
  PYTHON_FUNC("python_udfs.metrics.calculate_score", price, 1.5) AS weighted_price
FROM products;

-- Table UDF
CREATE TABLE enriched_orders AS
SELECT * FROM PYTHON_FUNC("python_udfs.metrics.add_metrics", orders);
```
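
How SQLFlow wires decorated functions into its engine is not shown in this README. As a loose analogy only, the sketch below registers the same `calculate_score` body as a SQL-callable function using the standard library's `sqlite3.Connection.create_function`. The table contents are made up, and the SQLFlow decorators are deliberately omitted so the example runs without SQLFlow installed.

```python
import sqlite3


def calculate_score(value: float, weight: float = 1.0) -> float:
    """Calculate weighted score (same body as the scalar UDF above)."""
    return value * weight


conn = sqlite3.connect(":memory:")
# narg=-1 lets SQL call the function with either one or two arguments.
conn.create_function("calculate_score", -1, calculate_score)
conn.execute("CREATE TABLE products (product_id INTEGER, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 10.0), (2, 4.0)])

rows = conn.execute(
    "SELECT product_id, calculate_score(price, 1.5) AS weighted_price "
    "FROM products ORDER BY product_id"
).fetchall()
print(rows)  # [(1, 15.0), (2, 6.0)]
```

Because the function body is plain Python, it can also be unit-tested directly, for example `calculate_score(2.0)` returns `2.0` with the default weight.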

## 🔍 Built-in Validation & Error Prevention

SQLFlow includes intelligent validation that catches errors before execution, saving you time and preventing pipeline failures.

### Catch Errors Early

```bash
# Validate any pipeline without running it
sqlflow pipeline validate customer_analytics

# Validate all pipelines in your project  
sqlflow pipeline validate
```

### Helpful Error Messages & Suggestions

When validation finds issues, you get clear, actionable feedback:

```text
❌ Validation failed for my_pipeline.sf

📋 Pipeline: my_pipeline
❌ SOURCE missing_path: Missing required parameter 'path'
💡 Suggestion: Add "path": "your_file.csv" to the PARAMS

❌ SOURCE invalid_type: Unknown connector type 'unknown_connector'  
💡 Suggestion: Use one of: csv, postgresql, s3, bigquery

📊 Summary: 2 errors found
```

### Automatic Safety Checks

Validation runs automatically when you compile or run pipelines:

```bash
# These commands validate first, preventing bad deployments
sqlflow pipeline run my_pipeline    # ✅ Validates, then runs
sqlflow pipeline compile my_pipeline # ✅ Validates, then compiles
```

### What Gets Validated

- ✅ **Connector Types**: Ensures you're using valid connector types
- ✅ **Required Parameters**: Checks all required parameters are provided  
- ✅ **File Extensions**: Validates file extensions match connector types
- ✅ **Reference Integrity**: Ensures every SOURCE referenced in a LOAD statement is defined
- ✅ **Schema Compliance**: Validates against connector schemas
- ✅ **Syntax Checking**: Catches SQL and SQLFlow syntax errors

**Result**: Catch configuration errors in seconds, not after long execution times.
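
The first two checks in the list can be pictured with a small sketch. This is not SQLFlow's validator: the connector names and message wording are copied from the sample output above, while the `Issue` class, the `validate_source` function, and the required-parameter table are hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Connector names come from the sample error output above; the
# required-parameter table is an assumption made for this sketch.
KNOWN_CONNECTORS = {"csv", "postgresql", "s3", "bigquery"}
REQUIRED_PARAMS = {"csv": {"path"}}


@dataclass
class Issue:
    source: str
    message: str


def validate_source(name: str, connector: str, params: dict) -> list[Issue]:
    """Return validation issues for one SOURCE definition (empty list = valid)."""
    if connector not in KNOWN_CONNECTORS:
        return [Issue(name, f"Unknown connector type '{connector}'")]
    missing = sorted(REQUIRED_PARAMS.get(connector, set()) - set(params))
    return [Issue(name, f"Missing required parameter '{p}'") for p in missing]


issues = (
    validate_source("missing_path", "csv", {})
    + validate_source("invalid_type", "unknown_connector", {"path": "a.csv"})
    + validate_source("ok", "csv", {"path": "data/customers.csv"})
)
for issue in issues:
    print(f"SOURCE {issue.source}: {issue.message}")
```

Collecting issues into a list, rather than raising on the first one, is what allows a summary such as "2 errors found" to be reported in a single pass.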

## 📊 Feature Comparison

| Feature | SQLFlow | dbt | SQLMesh | Airflow |
|---------|---------|-----|---------|---------|
| **Time to first results** | **1-2 min** | 15-20 min | 20-30 min | 30-60 min |
| **Sample data included** | ✅ Auto-generated | ❌ Manual | ❌ Manual | ❌ Manual |
| **SQL-based pipelines** | ✅ Complete | ✅ Transform only | ✅ Models | ❌ Python DAGs |
| **Source connectors** | ✅ Built-in | ❌ No | ❌ Limited | ❌ No |
| **Export destinations** | ✅ Built-in | ❌ No | ❌ Limited | ❌ No |
| **Pipeline validation** | ✅ Built-in with suggestions | ❌ Basic syntax | ❌ Limited | ❌ Runtime only |
| **Python integration** | ✅ UDFs | ✅ Limited | ✅ Limited | ✅ Python-first |
| **Environment management** | ✅ Profiles | ✅ Limited | ✅ Environments | ✅ Complex |
| **Learning curve** | ⭐ Low (SQL+) | ⭐⭐ Medium | ⭐⭐ Medium | ⭐⭐⭐ High |
| **Setup complexity** | ⭐ Minimal | ⭐⭐ Medium | ⭐⭐ Medium | ⭐⭐⭐ High |

## 🔍 Why Teams Choose SQLFlow

### For Data Analysts
* **Immediate Results:** Working analytics in 90 seconds
* **No New Tools:** Use SQL you already know for your entire workflow
* **Real Sample Data:** 1,000 customers and 5,000 orders ready to analyze
* **Focus on Insights:** No pipeline plumbing or configuration

### For Data Engineers
* **Faster Prototyping:** From idea to working pipeline in minutes
* **Unified Stack:** Simplify your data architecture
* **SQL Standardization:** One language across your organization
* **Python When Needed:** Extend without leaving your workflow

### For Startups & SMEs
* **Speed to Market:** Get data insights faster than competitors
* **Cost Effective:** Enterprise capabilities without enterprise complexity
* **Team Efficiency:** Leverage existing SQL skills instead of training on new tools

## 🧰 Core Concepts

### 1. Enhanced Project Initialization

SQLFlow creates everything you need to start analyzing data immediately:

```bash
# Default: Full analytics environment
sqlflow init my_project
# Creates: 1,000 customers, 5,000 orders, 500 products + 3 working pipelines

# Minimal: Basic structure only
sqlflow init my_project --minimal

# Demo: Full setup + immediate results
sqlflow init my_project --demo
```

### 2. Profiles for Environment Management

Switch between development and production with a single flag:

```bash
# Development (in-memory, fast)
sqlflow pipeline run customer_analytics

# Production (persistent storage)
sqlflow pipeline run customer_analytics --profile prod
```
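
Conceptually, a profile is a named bundle of settings that the run command looks up. The sketch below illustrates only that idea: the profile name `dev`, the keys `mode` and `path`, the file name `target/prod.db`, and the `resolve_database` function are all hypothetical and do not reflect SQLFlow's actual profile format.

```python
# Hypothetical profile definitions: the keys and values below are
# illustrative and do not reflect SQLFlow's actual profile file format.
PROFILES = {
    "dev": {"mode": "memory", "path": None},
    "prod": {"mode": "persistent", "path": "target/prod.db"},
}


def resolve_database(profile: str = "dev") -> str:
    """Return the database target a run would use for the given profile."""
    try:
        settings = PROFILES[profile]
    except KeyError:
        raise ValueError(
            f"Unknown profile '{profile}'; expected one of {sorted(PROFILES)}"
        ) from None
    # In-memory profiles map to the conventional ':memory:' target.
    return ":memory:" if settings["mode"] == "memory" else settings["path"]


print(resolve_database())        # :memory:
print(resolve_database("prod"))  # target/prod.db
```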

### 3. DuckDB-Powered Execution

SQLFlow uses DuckDB as its core engine, offering:

* **In-memory mode** for lightning-fast development
* **Persistent mode** for production reliability  
* **High performance** SQL execution
* **Larger-than-memory** datasets supported

## 📖 Documentation

**New User?** Start here: [Getting Started Guide](docs/user/getting_started.md) - Get working results in under 2 minutes.

### For Users
* [Getting Started Guide](docs/user/getting_started.md) - 2-minute quickstart
* [Speed Comparison](docs/user/reference/speed_comparison.md) - Why SQLFlow is fastest
* [CLI Reference](docs/user/reference/cli.md) - Complete command reference
* [Python UDFs Guide](docs/user/reference/python_udfs.md) - Extend with Python

### For Developers
* [Contributing Guide](docs/developer/contributing.md)
* [Architecture Overview](docs/developer/architecture.md)

### Examples & Comparisons
* [Example Pipelines](examples/) - Real-world use cases
* [SQLFlow vs dbt](docs/comparison/vs_dbt.md) - Detailed comparison
* [SQLFlow vs Airflow](docs/comparison/vs_airflow.md)

## 🤝 Join the Community

SQLFlow is an open-source project built for data practitioners by data practitioners.

* ⭐ **Star us on GitHub!** Show your support and stay updated
* 🐞 [Report issues](https://github.com/sqlflow/sqlflow/issues) or suggest features
* 🧑‍💻 [Contribute code](docs/developer/contributing.md) - Look for 'good first issue' tags
* 💬 [Join discussions](https://github.com/sqlflow/sqlflow/discussions) - Share your use cases

## 📜 License

SQLFlow is released under the [Apache License 2.0](LICENSE).

## ❓ FAQ

**Q: How is SQLFlow different from dbt?**  
A: dbt focuses on transformation within your warehouse. SQLFlow provides end-to-end pipelines (ingestion → transformation → export) with auto-generated sample data for immediate results. [Full comparison](docs/comparison/vs_dbt.md)

**Q: Do I need a data warehouse to use SQLFlow?**  
A: No. SQLFlow uses DuckDB as its engine and runs entirely on your local machine by default. You can connect to warehouses when needed, but it's not required.

**Q: How does SQLFlow prevent pipeline errors?**  
A: SQLFlow includes built-in validation that checks your pipelines before execution. It validates connector types, required parameters, file extensions, and more - catching errors in seconds instead of after long runs. Use `sqlflow pipeline validate` to check any pipeline.

**Q: Can SQLFlow handle large datasets?**  
A: Yes. DuckDB uses out-of-core algorithms for datasets larger than RAM, spilling to disk as needed. Performance scales well with proper indexing and partitioning.

**Q: How do I switch between development and production?**  
A: Use profiles: `sqlflow pipeline run my_pipeline --profile prod`. Each profile defines different settings, connections, and variables.

**Q: Are intermediate tables saved in persistent mode?**  
A: Yes. All tables are persisted to disk, making debugging and data examination easier.

**Q: Can I use SQLFlow in CI/CD?**  
A: Absolutely. SQLFlow is a CLI tool designed for automation. Use `sqlflow pipeline validate` and `sqlflow pipeline run` in your CI/CD scripts for automated testing and deployment.
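
A minimal sketch of such a CI step, written in Python: it runs the two commands named above in order and stops at the first failure. It assumes the CLI signals failure with a non-zero exit code, which this README does not state; `ci_check` and `run_subprocess` are hypothetical helper names.

```python
import subprocess
from typing import Callable, Sequence


def ci_check(pipeline: str, run: Callable[[Sequence[str]], int]) -> int:
    """Validate the pipeline, then run it; return the first non-zero exit code."""
    steps = [
        ["sqlflow", "pipeline", "validate", pipeline],
        ["sqlflow", "pipeline", "run", pipeline],
    ]
    for cmd in steps:
        code = run(cmd)
        if code != 0:  # assumption: non-zero means the step failed
            return code
    return 0


def run_subprocess(cmd: Sequence[str]) -> int:
    """Execute a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


# In a CI script, one might end with:
#   raise SystemExit(ci_check("customer_analytics", run_subprocess))
```

Passing the runner in as a parameter keeps the ordering logic testable without SQLFlow installed, since a test can supply a fake runner.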

---

<div align="center">
  <strong>
    Built with ❤️ for data teams who value speed and simplicity
  </strong>
</div>
