Dialects
While there is a SQL standard, most SQL engines support a variation of that standard. This makes it difficult to write portable SQL code. SQLGlot bridges all the different variations, called "dialects", with an extensible SQL transpilation framework.
The base sqlglot.dialects.dialect.Dialect
class implements a generic dialect that aims to be as universal as possible.
Each SQL variation has its own Dialect
subclass, extending the corresponding Tokenizer
, Parser
and Generator
classes as needed.
Implementing a custom Dialect
Creating a new SQL dialect may seem complicated at first, but it is actually quite simple in SQLGlot:
from sqlglot import exp
from sqlglot.dialects.dialect import Dialect
from sqlglot.generator import Generator
from sqlglot.tokens import Tokenizer, TokenType
class Custom(Dialect):
class Tokenizer(Tokenizer):
QUOTES = ["'", '"'] # Strings can be delimited by either single or double quotes
IDENTIFIERS = ["`"] # Identifiers can be delimited by backticks
# Associates certain meaningful words with tokens that capture their intent
KEYWORDS = {
**Tokenizer.KEYWORDS,
"INT64": TokenType.BIGINT,
"FLOAT64": TokenType.DOUBLE,
}
class Generator(Generator):
# Specifies how AST nodes, i.e. subclasses of exp.Expression, should be converted into SQL
TRANSFORMS = {
exp.Array: lambda self, e: f"[{self.expressions(e)}]",
}
# Specifies how AST nodes representing data types should be converted into SQL
TYPE_MAPPING = {
exp.DataType.Type.TINYINT: "INT64",
exp.DataType.Type.SMALLINT: "INT64",
exp.DataType.Type.INT: "INT64",
exp.DataType.Type.BIGINT: "INT64",
exp.DataType.Type.DECIMAL: "NUMERIC",
exp.DataType.Type.FLOAT: "FLOAT64",
exp.DataType.Type.DOUBLE: "FLOAT64",
exp.DataType.Type.BOOLEAN: "BOOL",
exp.DataType.Type.TEXT: "STRING",
}
The above example demonstrates how certain parts of the base Dialect
class can be overridden to match a different
specification. Even though it is a fairly realistic starting point, we strongly encourage the reader to study existing
dialect implementations in order to understand how their various components can be modified, depending on the use-case.
1# ruff: noqa: F401 2""" 3## Dialects 4 5While there is a SQL standard, most SQL engines support a variation of that standard. This makes it difficult 6to write portable SQL code. SQLGlot bridges all the different variations, called "dialects", with an extensible 7SQL transpilation framework. 8 9The base `sqlglot.dialects.dialect.Dialect` class implements a generic dialect that aims to be as universal as possible. 10 11Each SQL variation has its own `Dialect` subclass, extending the corresponding `Tokenizer`, `Parser` and `Generator` 12classes as needed. 13 14### Implementing a custom Dialect 15 16Creating a new SQL dialect may seem complicated at first, but it is actually quite simple in SQLGlot: 17 18```python 19from sqlglot import exp 20from sqlglot.dialects.dialect import Dialect 21from sqlglot.generator import Generator 22from sqlglot.tokens import Tokenizer, TokenType 23 24 25class Custom(Dialect): 26 class Tokenizer(Tokenizer): 27 QUOTES = ["'", '"'] # Strings can be delimited by either single or double quotes 28 IDENTIFIERS = ["`"] # Identifiers can be delimited by backticks 29 30 # Associates certain meaningful words with tokens that capture their intent 31 KEYWORDS = { 32 **Tokenizer.KEYWORDS, 33 "INT64": TokenType.BIGINT, 34 "FLOAT64": TokenType.DOUBLE, 35 } 36 37 class Generator(Generator): 38 # Specifies how AST nodes, i.e. subclasses of exp.Expression, should be converted into SQL 39 TRANSFORMS = { 40 exp.Array: lambda self, e: f"[{self.expressions(e)}]", 41 } 42 43 # Specifies how AST nodes representing data types should be converted into SQL 44 TYPE_MAPPING = { 45 exp.DataType.Type.TINYINT: "INT64", 46 exp.DataType.Type.SMALLINT: "INT64", 47 exp.DataType.Type.INT: "INT64", 48 exp.DataType.Type.BIGINT: "INT64", 49 exp.DataType.Type.DECIMAL: "NUMERIC", 50 exp.DataType.Type.FLOAT: "FLOAT64", 51 exp.DataType.Type.DOUBLE: "FLOAT64", 52 exp.DataType.Type.BOOLEAN: "BOOL", 53 exp.DataType.Type.TEXT: "STRING", 54 } 55``` 56 57The above example demonstrates how certain parts of the base `Dialect` class can be overridden to match a different 58specification. Even though it is a fairly realistic starting point, we strongly encourage the reader to study existing 59dialect implementations in order to understand how their various components can be modified, depending on the use-case. 60 61---- 62""" 63 64import importlib 65import threading 66 67DIALECTS = [ 68 "Athena", 69 "BigQuery", 70 "ClickHouse", 71 "Databricks", 72 "Doris", 73 "Drill", 74 "Druid", 75 "DuckDB", 76 "Dune", 77 "Fabric", 78 "Hive", 79 "Materialize", 80 "MySQL", 81 "Oracle", 82 "Postgres", 83 "Presto", 84 "PRQL", 85 "Redshift", 86 "RisingWave", 87 "Snowflake", 88 "Spark", 89 "Spark2", 90 "SQLite", 91 "StarRocks", 92 "Tableau", 93 "Teradata", 94 "Trino", 95 "TSQL", 96 "Exasol", 97] 98 99MODULE_BY_DIALECT = {name: name.lower() for name in DIALECTS} 100DIALECT_MODULE_NAMES = MODULE_BY_DIALECT.values() 101 102MODULE_BY_ATTRIBUTE = { 103 **MODULE_BY_DIALECT, 104 "Dialect": "dialect", 105 "Dialects": "dialect", 106} 107 108__all__ = list(MODULE_BY_ATTRIBUTE) 109 110_import_lock = threading.Lock() 111 112 113def __getattr__(name): 114 module_name = MODULE_BY_ATTRIBUTE.get(name) 115 if module_name: 116 with _import_lock: 117 module = importlib.import_module(f"sqlglot.dialects.{module_name}") 118 return getattr(module, name) 119 120 raise AttributeError(f"module {__name__} has no attribute {name}")