Metadata-Version: 2.1
Name: libsqlglot
Version: 0.4.0
Summary: Python bindings for libsqlglot, a high-performance SQL parser, transpiler, and optimiser written in C++
Author: libsqlglot contributors
License: Apache-2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Compilers
Project-URL: Homepage, https://github.com/richarah/libsqlglot
Project-URL: Documentation, https://github.com/richarah/libsqlglot#readme
Project-URL: Repository, https://github.com/richarah/libsqlglot
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# libsqlglot

#### What this is

sqlglot, in C++. 45 dialects, 126× faster on benchmark SQL, 252× on the kind your ORM generates when nobody's looking. Performance gap scales with query complexity, see [Benchmarks](#benchmarks).

Supports stored procedures (PL/pgSQL, T-SQL, MySQL, PL/SQL): where sqlglot falls back to passthrough, libsqlglot parses them into the AST.

#### What it's for

Anywhere SQL hits a hot path: proxies, sidecars, migration tools, linters *inter alia*. Also a replacement for every regex that's pretending to parse SQL.

#### Why it exists

Because the hardest part of parsing SQL in Python is explaining to Python why you're doing it so often.

Inspired by the original [sqlglot](https://github.com/tobymao/sqlglot), which did the decade-long work of mapping 31+ SQL dialects into an elegant, universal AST. libsqlglot does the comparatively trivial work of compiling it. The algorithm was already O(n), the runtime was O(python).

## Contents

- [Functionality](#functionality)
- [Quickstart](#quickstart)
  - [C++](#c)
  - [Python](#python)
- [Differences from original sqlglot](#differences-from-original-sqlglot)
- [Building](#building)
- [Architecture](#architecture)
- [Benchmarks](#benchmarks)
- [Examples](#examples)
- [Supported SQL dialects](#supported-sql-dialects)
- [Licence](#licence)

## Functionality

Transpiles SQL dialects via sqlglot AST. Full stored procedure support. Fail-fast errors with exact line and column. Python bindings available at 95-98% of C++ speed (`import libsqlglot as sqlglot` and go).

Handles the full SQL surface: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE, MERGE, plus stored procedures (CALL, RETURN, DECLARE, IF/ELSEIF/ELSE, WHILE, FOR loops). Also handles CTEs, window functions, subqueries, and various JOIN types.

Compatible with 45 dialects (see [Supported SQL dialects](#supported-sql-dialects) for the full list).

## Quickstart

### C++

```cpp
#include <libsqlglot/transpiler.h>

using namespace libsqlglot;

// Transpile between dialects (MySQL → PostgreSQL)
std::string output = Transpiler::transpile(
    "SELECT `id`, `name` FROM `users` LIMIT 10",
    Dialect::MySQL,
    Dialect::PostgreSQL
);
// Returns: SELECT "users"."id", "users"."name" FROM "users" LIMIT 10

// Parse SQL into AST
Arena arena;
auto stmt = Transpiler::parse(arena, "SELECT name FROM users WHERE age > 18");
// Returns: AST with SelectStmt node

// Optimise AST (column qualification, predicate pushdown, constant folding)
Transpiler::optimize(arena, stmt);
// Modifies AST: name → users.name, age → users.age

// Generate SQL from AST
std::string sql = Transpiler::generate(stmt, Dialect::PostgreSQL);
// Returns: SELECT users.name FROM users WHERE users.age > 18
```

See [Supported SQL dialects](#supported-sql-dialects) for all available `Dialect::` values.

### Python

Available on PyPI: `pip install libsqlglot`

**System requirements**: Linux with glibc 2.35+ (Ubuntu 22.04+, Debian 12+, RHEL 9+)

```python
import libsqlglot as sqlglot

# Transpile
sql = "SELECT `id`, `name` FROM `users` LIMIT 10"
result = sqlglot.transpile(sql, read="mysql", write="postgres")
result = sqlglot.transpile(sql, "mysql", "postgres") # Both styles work

# Parse
stmt = sqlglot.parse_one(sql) # single statement
stmts = sqlglot.parse("SELECT 1; SELECT 2") # multiple statements

# Generate
sql = stmt.sql() # default ANSI
sql = stmt.sql(dialect="postgres", pretty=True) # pretty-print
sql = stmt.sql(dialect="sqlserver") # TRUE → 1

# AST traverse
columns = stmt.find_all(sqlglot.ExprType.COLUMN)
tables = stmt.find_all(sqlglot.ExprType.TABLE_REF)
stmt.walk(lambda n: print(n.type))

# Optimise
optimized = sqlglot.optimize(stmt)

# Builder
stmt = (sqlglot.select(["id", "name"])
        .from_("users")
        .where("active = TRUE")
        .order_by("name")
        .limit(10))

# Diff
diff = sqlglot.diff("SELECT id FROM users", "SELECT user_id FROM users")
```

See [Supported SQL dialects](#supported-sql-dialects) for all available dialect names.

**Python API**: `parse()`, `parse_one()`, `generate()`, `transpile()`, `optimize()`, `diff()`, `.sql()`, `.find_all()`, `.walk()`, `select()` builder.

**Performance**: 95-98% of C++ speed. Overhead is ~125ns per call (10ns function call + 100ns string marshal + 15ns misc). On typical 2.5μs parse, this is 5% overhead. On complex 25μs parse, 0.5% overhead.

**Memory**: Thread-local arenas, no runtime dependencies.

## Differences from original sqlglot

| | libsqlglot | Python sqlglot |
|---|---|---|
| **Performance** | 32-242× faster (126× avg) | Baseline |
| **Stored procedures** | Support for PL/pgSQL, T-SQL, MySQL, PL/SQL | Limited (`exp.Command` passthrough) |
| **Error handling** | Fail-fast with precise errors (line, column, context) | Error recovery (IDE-friendly, slower) |
| **Memory** | Arena allocation (O(1) cleanup) | Garbage collection |
| **Optimiser** | Column qualification, predicate pushdown, constant folding, subquery elimination | Same + additional passes + full execution engine |
| **Codebase** | Header-only C++26 library | 50,000+ lines Python |
| **Keywords** | C++26 reflection: auto-generated from enum (300+ keywords, zero maintenance) | Manually maintained dictionaries |
| **Binary** | C++ library 14KB, Python wheel 1.5MB | N/A |
| **Dialects** | 45 SQL dialects (14 unique to libsqlglot, including ANSI) | 32 dialects (including PRQL, which libsqlglot doesn't support) |
| **SQL coverage** | Same as Python sqlglot | Full SQL support |
| **Dependencies** | None (no runtime deps) | None |

## Building

Requires C++26 (GCC 14+ with `-freflection`) and CMake 3.21+.

**C++26 features used:**
- **Keyword reflection** (`std::meta`): Auto-generates 300+ keyword mappings from `TokenType` enum at compile time. Zero maintenance, impossible to desync.
- **Dialect reflection**: Build-time code generation parses the `Dialect` enum and generates compile-time mappings (CMake → Python script → generated header). When GCC fixes the reflection bug, will switch to pure C++26 reflection.
- **ExprType bindings**: Python bindings auto-generated from the C++ `ExprType` enum (124 values). Run `python3 scripts/generate_expr_type_bindings.py > src/python/expr_type_bindings_generated.h` after modifying `expression.h`.
- **Advanced constexpr**: Perfect hash tables, compile-time string processing.

### Docker (Recommended)

The easiest way to build with GCC trunk + reflection support:

```bash
# Build the project (first build takes 30-45 min to compile GCC trunk)
docker compose -f docker/docker-compose.yml run --rm build

# Run tests
docker compose -f docker/docker-compose.yml run --rm test

# Build Python wheel
docker compose -f docker/docker-compose.yml run --rm wheel

# Development shell
docker compose -f docker/docker-compose.yml run --rm dev
```

See `docker/README.md` for full documentation.

### C++ library (native)

Requires GCC trunk built from source with `-freflection` support:

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
sudo cmake --install build
```

### Python package

```bash
pip install -e .
```

### Building with tests and benchmarks

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release \
               -DLIBSQLGLOT_BUILD_TESTS=ON \
               -DLIBSQLGLOT_BUILD_BENCHMARKS=ON
cmake --build build -j$(nproc)
ctest --test-dir build
```

**Compiled sizes** (stripped, `-O3`): C++ library 14KB, Python extension 2.0MB.

**Code quality**: Compiles with `-Wall -Wextra -Wpedantic -Werror`. No runtime dependencies. No RTTI. Passes 27,040 assertions across 361 test cases. Fuzz-tested with `libFuzzer` + `AddressSanitizer`.

### Advanced optimisations

**Profile-Guided Optimisation (PGO)**: For production deployments requiring maximum performance, enable PGO in 3 steps:

```bash
# Step 1: Build with profiling instrumentation
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLIBSQLGLOT_PGO_GENERATE=ON
cmake --build build

# Step 2: Run with representative workload to collect profile data
./build/benchmarks/bench_transpiler  # or your own queries

# Step 3: Rebuild using profile data for optimisation (10-30% faster)
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLIBSQLGLOT_PGO_USE=ON
cmake --build build
```

**Compiler optimisations enabled** (Release builds): Link-Time Optimisation (LTO), constant merging, symbol visibility optimisation.

## Architecture

Header-only design: you only pay for what you use. 19 header files, no `.cpp`. See `include/libsqlglot/` for the full layout. Core files: `parser.h` (2958 lines), `generator.h` (1643), `expression.h` (1105, 105 expression types). Entry point is `transpiler.h` (86 lines).

### Memory management

Arena allocation: all AST nodes allocated in contiguous chunks, freed together in O(1) time. String interning deduplicates identifiers. Tokenisation is zero-copy via `string_view`. Everything uses RAII, no manual `delete` calls.

### Grammar pipeline
Grammar definitions from multiple sources (ANTLR and normalised JSON specs) are unified into a canonical IR, then compiled into a cache-efficient LUT used by the runtime parser.

### SQL support

**Statements**: SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE TABLE, ALTER TABLE, DROP TABLE, DROP INDEX, TRUNCATE, CREATE VIEW, DROP VIEW, CREATE SCHEMA, DROP SCHEMA

**Stored procedures**: CALL, RETURN, DECLARE, IF/ELSEIF/ELSE/END IF, WHILE/DO/END WHILE, FOR/IN/LOOP/END LOOP

**Clauses**: WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET, QUALIFY

**JOINs**: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, ASOF

**Advanced**: CTEs (WITH RECURSIVE), window functions (PARTITION BY, ORDER BY, ROWS/RANGE, named windows), subqueries, PIVOT/UNPIVOT

**Operators**: Arithmetic, comparison, logical, LIKE, ILIKE, IS NULL, BETWEEN, IN, EXISTS

**Expressions**: CASE WHEN, arrays, array indexing, set operations (UNION, INTERSECT, EXCEPT)

**Transactions**: BEGIN, COMMIT, ROLLBACK, SAVEPOINT

**Utilities**: SET, DESCRIBE, SHOW, EXPLAIN

## Testing

361 test cases, 27,040 assertions, all passing.

```bash
cd build
ctest --output-on-failure

# Run specific test suites
./tests/libsqlglot_tests "[parser]"
./tests/libsqlglot_tests "[security]"
./tests/libsqlglot_tests "[stored_procedures]"
./tests/libsqlglot_tests "[performance]"

# Run benchmarks
./benchmarks/bench_tokenizer
./benchmarks/bench_parser
./benchmarks/bench_transpiler
```

## Security

27,040 assertions covering SQL injection, buffer overflow, stack overflow (recursion depth at 256, adjustable via `Parser::kMaxRecursionDepth` in `parser.h`), memory corruption (arena prevents use-after-free and double-free), integer overflow, and encoding attacks (UTF-8 identifiers rejected, UTF-8 string literals accepted). All pass.

## Fuzzing

There is a `libFuzzer` target in `fuzzing/fuzz_parser.cpp`:

```bash
cd fuzzing
clang++ -fsanitize=fuzzer,address -std=c++26 -freflection -I../include fuzz_parser.cpp -o fuzz_parser
./fuzz_parser -max_len=10000 -timeout=10
```

## Compatibility testing

Test compatibility with Python sqlglot's test fixtures:

```bash
cd compat
python sqlglot_compat.py --sqlglot-path /path/to/sqlglot --limit 1000
```

## Errors

libsqlglot provides precise, actionable error messages with exact locations:

```cpp
Arena arena;
Parser parser(arena, "SELECT FROM users");  // Missing column list

// Throws: Line 1, column 8: Expected column list after SELECT (found: 'FROM')
```

All parse errors include line number, column number, the actual token that caused the error, and a human-readable explanation of what was expected. Fail-fast, no cascading secondary errors.

## Examples

### Dialect conversion

```cpp
// MySQL -> PostgreSQL
std::string mysql_sql = "SELECT `user_id`, `name` FROM `users` LIMIT 10";
std::string pg_sql = Transpiler::transpile(
    mysql_sql, Dialect::MySQL, Dialect::PostgreSQL
);
// Output: SELECT "users"."user_id", "users"."name" FROM "users" LIMIT 10
```

### Building AST by hand

```cpp
Arena arena;
auto stmt = arena.create<SelectStmt>();

// SELECT *
stmt->columns.push_back(arena.create<Star>());

// FROM users
stmt->from = arena.create<TableRef>("users");

// WHERE age > 18
auto age_col = arena.create<Column>("age");
auto threshold = arena.create<Literal>("18");
stmt->where = arena.create<BinaryOp>(ExprType::GT, age_col, threshold);

std::string sql = Generator::generate(stmt);
// "SELECT * FROM users WHERE age > 18"
```

### Optimisation

```cpp
Arena arena;
auto stmt = Transpiler::parse(arena,
    "SELECT name FROM users WHERE age > 18");

// Apply optimisation passes
Transpiler::optimize(arena, stmt);

// Optimisations include:
// - Column qualification (name -> users.name)
// - Predicate pushdown
// - Constant folding
// - Subquery elimination

std::string sql = Transpiler::generate(stmt);
// "SELECT users.name FROM users WHERE users.age > 18"
```

### Stored procedures (PL/pgSQL, T-SQL, MySQL, Oracle)

```cpp
Arena arena;
std::string plpgsql = R"(
    FOR i IN 1..100 LOOP
        IF i > 50 THEN
            RETURN i;
        END IF
    END LOOP
)";

auto stmt = Transpiler::parse(arena, plpgsql);
std::string output = Transpiler::generate(stmt, Dialect::PostgreSQL);
```

### Transpiling between dialects

```cpp
// Real-world analytics query: CTEs, window functions, multiple JOINs
std::string mysql_query = R"(
    WITH regional_sales AS (
        SELECT
            region,
            product_id,
            SUM(amount) as total_sales,
            COUNT(order_id) as order_count,
            ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) as sales_rank
        FROM orders
        WHERE order_date >= '2024-01-01'
        GROUP BY region, product_id
        HAVING SUM(amount) > 10000
    ),
    top_regions AS (
        SELECT region
        FROM regional_sales
        WHERE sales_rank <= 5
        GROUP BY region
        HAVING COUNT(*) > 3
    )
    SELECT
        r.region,
        r.product_id,
        p.product_name,
        r.total_sales,
        r.order_count,
        RANK() OVER (ORDER BY r.total_sales DESC) as overall_rank,
        LAG(r.total_sales) OVER (ORDER BY r.total_sales DESC) as prev_sales
    FROM regional_sales r
    INNER JOIN products p ON r.product_id = p.id
    INNER JOIN top_regions t ON r.region = t.region
    WHERE r.sales_rank <= 10
    ORDER BY r.total_sales DESC
    LIMIT 100
)";

// MySQL → BigQuery
std::string bigquery = Transpiler::transpile(
    mysql_query, Dialect::MySQL, Dialect::BigQuery
);
// Full semantic preservation: CTEs, window functions, JOINs all intact

// Round-trip: MySQL → BigQuery → PostgreSQL → Snowflake → MySQL
std::string step1 = Transpiler::transpile(mysql_query, Dialect::MySQL, Dialect::BigQuery);
std::string step2 = Transpiler::transpile(step1, Dialect::BigQuery, Dialect::PostgreSQL);
std::string step3 = Transpiler::transpile(step2, Dialect::PostgreSQL, Dialect::Snowflake);
std::string back_to_mysql = Transpiler::transpile(step3, Dialect::Snowflake, Dialect::MySQL);
// Query semantics preserved across 4 dialect conversions

// Simple transformations: PostgreSQL → SQL Server
std::string pg_query = "SELECT * FROM users WHERE active = TRUE LIMIT 10";
Arena arena;
Parser parser(arena, pg_query);
auto stmt = parser.parse_select();
std::string sql_server = Generator::generate(stmt, Dialect::SQLServer);
// Result: SELECT TOP 10 * FROM users WHERE active = 1
// LIMIT → TOP, TRUE → 1

// Multi-dialect pipeline: Parse once, generate for multiple targets
Arena shared_arena;
auto ast = Transpiler::parse(shared_arena, "SELECT name FROM users WHERE age > 18");

std::string postgres_sql = Generator::generate(ast, Dialect::PostgreSQL);
std::string mysql_sql = Generator::generate(ast, Dialect::MySQL);
std::string duckdb_sql = Generator::generate(ast, Dialect::DuckDB);
std::string snowflake_sql = Generator::generate(ast, Dialect::Snowflake);
// Single parse, multiple outputs - efficient for multi-target scenarios
```

## Benchmarks

Benchmarks run on x86-64 Linux with `-O3` optimisation. libsqlglot compared against pure Python sqlglot 30.0.1. Python bindings add ~125ns overhead (95-98% of C++ performance).

**What we measure:** Full parse + generate round-trip (SQL → AST → SQL). No optimisation applied in either implementation. Both produce identical output, proving identical work done. Apples-to-apples comparison of parser and generator performance.

**Measurement:** `std::chrono::high_resolution_clock` with 1000 iterations per query, averaged.

The 16 standard queries are sqlglot's benchmark. The 8 stress tests are ours, excluded from the average. They're here to show the scaling doesn't stop: 178.6× on benchmarks, 252× on these. What happens past 252× is left as an exercise for the reader.

### Standard benchmarks (16 queries)

| Query              | sqlglot (μs) | libsqlglot (μs) | Speedup  |
|--------------------|--------------|-----------------|----------|
| many_ctes          | 1,097.84     | 4.53            | 242.4×   |
| many_joins         | 1,038.03     | 5.81            | 178.7×   |
| nested_functions   | 892.91       | 6.27            | 142.4×   |
| nested_subqueries  | 495.97       | 2.79            | 177.8×   |
| many_unions        | 2,585.04     | 17.30           | 149.4×   |
| tpch               | 1,110.16     | 7.74            | 143.4×   |
| complex_where      | 575.69       | 4.27            | 134.8×   |
| many_windows       | 933.16       | 13.00           | 71.8×    |
| deep_arithmetic    | 388.10       | 2.86            | 135.7×   |
| many_columns       | 1,643.41     | 14.81           | 110.9×   |
| values             | 15,951.28    | 162.90          | 97.9×    |
| large_case         | 4,480.21     | 35.30           | 126.9×   |
| large_in           | 11,756.63    | 116.38          | 101.0×   |
| many_numbers       | 6,463.57     | 67.71           | 95.5×    |
| short              | 118.61       | 1.54            | 77.0×    |
| large_strings      | 146.92       | 4.54            | 32.4×    |

**Average: 126.1× faster** (range: 32.4× to 242.4×). A million queries: 29 seconds vs 52 minutes.

libsqlglot achieves this through:
- **Memory**: Arena allocation (O(1) cleanup), string interning (pointer equality), zero-copy tokenisation (`string_view`)
- **Algorithms**: Perfect hash keyword lookup (O(1)), branchless uppercase conversion (no branch misprediction)
- **Compiler**: LTO (whole-program optimisation), aggressive inlining, constant folding, C++23 `constexpr`
- **Cache**: Contiguous memory layout, spatial locality, no per-node `malloc`/`new` fragmentation

### Stress tests (8 queries, supplementary)

Designed to break parsers: 15-level nested CTEs, 35-level CASE expressions, 100+ WHERE conditions, 20-way joins. 1,186 lines of SQL, 51KB total.

| Query                    | sqlglot (μs) | libsqlglot (μs) | Speedup  |
|--------------------------|--------------|-----------------|----------|
| Correlated subqueries    | 14,400.86    | 60.60           | 237.6×   |
| Complex subqueries       | 18,414.37    | 73.18           | 251.6×   |
| Complex string functions | 15,609.81    | 71.42           | 218.6×   |
| Deep CTE nesting         | 17,214.85    | 82.36           | 209.0×   |
| Union chains             | 24,075.57    | 115.77          | 208.0×   |
| Complex WHERE clause     | 14,499.12    | 81.99           | 176.8×   |
| Deep CASE nesting        | 14,651.62    | 103.10          | 142.1×   |
| Multi-table joins        | 13,862.49    | 141.16          | 98.2×    |

**Average: 192.7× faster** (range: 98.2× to 251.6×). A million queries: 86 seconds vs 4.6 hours.

### Validation

```python
# Python sqlglot: 3,917 μs
parsed = sqlglot.parse_one("SELECT * FROM users WHERE age > 18")
output = parsed.sql()  # "SELECT * FROM users WHERE age > 18"
```

```cpp
// libsqlglot: 45 μs (87× faster)
Arena arena;
auto ast = Transpiler::parse(arena, "SELECT * FROM users WHERE age > 18");
auto output = Transpiler::generate(ast);  // "SELECT * FROM users WHERE age > 18"
```

See `benchmarks/bench_complete_comparison.py` to reproduce.

## Supported SQL dialects

**45 dialects** with full parse and generation support. Use `Dialect::Name` in C++ or `"name"` strings in Python (e.g., `read="mysql"`, `write="postgres"`).

**Note:** Dialect names are case-sensitive (e.g., `Dialect::PostgreSQL`, not `Dialect::postgresql`).

Each dialect includes proper identifier quoting, keyword handling, function name translation, and syntax transformations (e.g. LIMIT vs TOP vs FETCH FIRST).

If somehow a dialect is missing from the map, get_features() returns ANSI SQL defaults.

### Core dialects (explicit configurations)

| Database | Dialect | Dialect-Specific Features |
|----------|---------|---------------------------|
| ANSI SQL | ANSI | Standard SQL compliance: ANSI joins, standard aggregations, CTEs |
| BigQuery | BigQuery | STRUCT types, ARRAY literals `[1,2,3]`, SAFE_CAST, INT64/STRING types, nested field access |
| Calcite | Calcite | TABLESAMPLE BERNOULLI, Apache Calcite optimiser hints |
| ClickHouse | ClickHouse | Column-oriented syntax, MergeTree engines, SAMPLE BY, ARRAY JOIN |
| DB2 | DB2 | FETCH FIRST n ROWS ONLY, DB2 stored procedures, OLAP functions |
| Drill | Drill | Schema-free JSON queries, nested data access, FLATTEN |
| DuckDB | DuckDB | QUALIFY clause, ASOF joins, PIVOT/UNPIVOT, macro functions, LIST type |
| Hive | Hive | PARTITIONED BY, CLUSTERED BY, SerDe formats, Hive UDFs |
| MySQL | MySQL | Backtick identifiers, MySQL-specific functions, storage engines |
| Oracle | Oracle | CONNECT BY hierarchical queries, PRIOR, START WITH, DUAL table, PL/SQL blocks |
| Phoenix | Phoenix | HBase integration: SALT_BUCKETS, ARRAY_APPEND, UPSERT VALUES |
| Pinot | Pinot | Real-time OLAP: segment pruning, star-tree indexes, broker queries |
| PostgreSQL | PostgreSQL | RETURNING, ON CONFLICT, LATERAL joins, window functions, array types, JSONB |
| Presto | Presto | APPROX_DISTINCT, UNNEST, ROW types, lambda functions |
| Snowflake | Snowflake | FLATTEN for JSON, VARIANT type, TIME_TRAVEL, CLUSTER BY, RESULT_SCAN |
| Solr | Solr | score() relevance function, faceted search, Lucene query syntax |
| Spark | Spark | NULL-SAFE equality `<=>`, Hive metastore, broadcast hints, cache table |
| SQL Server | SQLServer | T-SQL syntax: TOP, IDENTITY, OUTPUT clause, EXEC, GO batches, temp tables `#` |
| SQLite | SQLite | Minimal SQL: no RIGHT JOIN, PRAGMA commands, autoincrement |
| Teradata | Teradata | MULTISET tables, BTEQ syntax, FastLoad/MultiLoad hints |

### Inherited dialect configurations

These dialects inherit features from a compatible base dialect and add specific extensions.

| Database | Dialect | Inherits From | Dialect-Specific Features |
|----------|---------|---------------|---------------------------|
| Athena | Athena | Presto | AWS Athena Presto syntax, S3 partitioning |
| CockroachDB | CockroachDB | PostgreSQL | UPSERT statement, distributed transactions, RETURNING clause |
| Databricks | Databricks | Spark | OPTIMIZE tables, ZORDER BY clustering, Delta Lake operations |
| Doris | Doris | MySQL | DUPLICATE KEY model, BUCKETS distribution, DISTRIBUTED BY HASH |
| Dremio | Dremio | Presto | CREATE REFLECTION for materialisation, data lakehouse queries |
| Druid | Druid | MySQL | TIME_FLOOR for time bucketing, approximate aggregations, roll-up |
| Dune | Dune | PostgreSQL | Blockchain analytics: bytearray_to_uint256, ETH address functions |
| Exasol | Exasol | PostgreSQL | DISTRIBUTE BY for parallel execution, Lua scripting UDFs |
| Fabric | Fabric | SQL Server | Three-part lakehouse.schema.table naming, OneLake integration |
| Greenplum | Greenplum | PostgreSQL | DISTRIBUTED BY/RANDOMLY, column/append-optimised tables, GPORCA optimiser |
| Impala | Impala | Hive | COMPUTE STATS, Kudu integration, CACHED IN pools |
| MariaDB | MariaDB | MySQL | MySQL-compatible with RETURNING, window functions, JSON functions |
| Materialize | Materialize | PostgreSQL | TAIL for streaming results, materialised views, temporal filters |
| Netezza | Netezza | PostgreSQL | DISTRIBUTE ON distribution keys, zone maps, statistics |
| Redshift | Redshift | PostgreSQL | DISTKEY distribution, SORTKEY ordering, SUPER type (JSON) |
| RisingWave | RisingWave | PostgreSQL | EMIT CHANGES for streaming, temporal joins, watermarks |
| SingleStore | SingleStore | MySQL | VECTOR type for embeddings, DOT_PRODUCT, columnstore/rowstore |
| Spark2 | Spark2 | Spark | Legacy Spark 2.x: CACHE TABLE, broadcast joins, RDD compatibility |
| StarRocks | StarRocks | MySQL | Vectorised execution, primary key model, materialised views, bitmap indexes |
| Tableau | Tableau | PostgreSQL | ZN() null-to-zero, Tableau calculation functions, RAWSQL passthrough |
| Teradata | Teradata | Teradata | MULTISET tables, BTEQ syntax, FastLoad/MultiLoad hints |
| TiDB | TiDB | MySQL | AUTO_RANDOM for distributed primary keys, MySQL compatibility, TiKV storage |
| TimescaleDB | TimescaleDB | PostgreSQL | time_bucket() for time-series, hypertables, continuous aggregates |
| Trino | Trino | Presto | Presto-compatible: UNNEST, lambda expressions, ROW types |
| Vertica | Vertica | PostgreSQL | CREATE PROJECTION for physical design, SEGMENTED BY HASH, columnar storage |
| YugabyteDB | YugabyteDB | PostgreSQL | SPLIT INTO n TABLETS, distributed SQL, PostgreSQL compatibility |

## Licence

Apache 2.0
