
## Dialects

While there is a SQL standard, most SQL engines support a variation of that standard. This makes it difficult to write portable SQL code. SQLGlot bridges all the different variations, called "dialects", with an extensible SQL transpilation framework.

The base `sqlglot.dialects.dialect.Dialect` class implements a generic dialect that aims to be as universal as possible.

Each SQL variation has its own `Dialect` subclass, extending the corresponding `Tokenizer`, `Parser` and `Generator` classes as needed.

### Implementing a custom Dialect

Creating a new SQL dialect may seem complicated at first, but it is actually quite simple in SQLGlot:

```python
from sqlglot import exp
from sqlglot.dialects.dialect import Dialect
from sqlglot.generator import Generator
from sqlglot.tokens import Tokenizer, TokenType


class Custom(Dialect):
    class Tokenizer(Tokenizer):
        QUOTES = ["'", '"']  # Strings can be delimited by either single or double quotes
        IDENTIFIERS = ["`"]  # Identifiers can be delimited by backticks

        # Associates certain meaningful words with tokens that capture their intent
        KEYWORDS = {
            **Tokenizer.KEYWORDS,
            "INT64": TokenType.BIGINT,
            "FLOAT64": TokenType.DOUBLE,
        }

    class Generator(Generator):
        # Specifies how AST nodes, i.e. subclasses of exp.Expression, should be converted into SQL
        TRANSFORMS = {
            exp.Array: lambda self, e: f"[{self.expressions(e)}]",
        }

        # Specifies how AST nodes representing data types should be converted into SQL
        TYPE_MAPPING = {
            exp.DataType.Type.TINYINT: "INT64",
            exp.DataType.Type.SMALLINT: "INT64",
            exp.DataType.Type.INT: "INT64",
            exp.DataType.Type.BIGINT: "INT64",
            exp.DataType.Type.DECIMAL: "NUMERIC",
            exp.DataType.Type.FLOAT: "FLOAT64",
            exp.DataType.Type.DOUBLE: "FLOAT64",
            exp.DataType.Type.BOOLEAN: "BOOL",
            exp.DataType.Type.TEXT: "STRING",
        }
```

The above example demonstrates how certain parts of the base `Dialect` class can be overridden to match a different specification. Even though it is a fairly realistic starting point, we strongly encourage the reader to study existing dialect implementations in order to understand how their various components can be modified, depending on the use case.


The module source below shows how the dialects are wired up (the module docstring, which duplicates the text above, is omitted):

```python
# ruff: noqa: F401
import importlib

DIALECTS = [
    "Athena",
    "BigQuery",
    "ClickHouse",
    "Databricks",
    "Doris",
    "Drill",
    "Druid",
    "DuckDB",
    "Dune",
    "Hive",
    "Materialize",
    "MySQL",
    "Oracle",
    "Postgres",
    "Presto",
    "PRQL",
    "Redshift",
    "RisingWave",
    "Snowflake",
    "Spark",
    "Spark2",
    "SQLite",
    "StarRocks",
    "Tableau",
    "Teradata",
    "Trino",
    "TSQL",
]

MODULE_BY_DIALECT = {name: name.lower() for name in DIALECTS}
DIALECT_MODULE_NAMES = MODULE_BY_DIALECT.values()

MODULE_BY_ATTRIBUTE = {
    **MODULE_BY_DIALECT,
    "Dialect": "dialect",
    "Dialects": "dialect",
}

__all__ = list(MODULE_BY_ATTRIBUTE)


def __getattr__(name):
    module_name = MODULE_BY_ATTRIBUTE.get(name)
    if module_name:
        module = importlib.import_module(f"sqlglot.dialects.{module_name}")
        return getattr(module, name)

    raise AttributeError(f"module {__name__} has no attribute {name}")
```
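The module-level `__getattr__` above is the lazy-import pattern from PEP 562: dialect submodules are only imported the first time one of their names is accessed. A self-contained sketch of the same pattern, using stdlib modules instead of sqlglot's dialects (the `MODULE_BY_ATTRIBUTE` mapping here is illustrative):

```python
import importlib
import sys

# Map attribute names to the modules that actually define them.
MODULE_BY_ATTRIBUTE = {"sqrt": "math", "dumps": "json"}


def __getattr__(name):
    # Called only when normal module attribute lookup fails (PEP 562).
    module_name = MODULE_BY_ATTRIBUTE.get(name)
    if module_name:
        module = importlib.import_module(module_name)
        return getattr(module, name)
    raise AttributeError(f"module {__name__} has no attribute {name}")


# Accessing the attribute on the module object triggers the lazy import.
this_module = sys.modules[__name__]
resolved = getattr(this_module, "sqrt")
print(resolved(16.0))
```

The payoff is startup cost: importing the package stays cheap, and each heavy submodule is paid for only when first used.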