SQL Dialect

An SQL Dialect defines a simplified syntax for detecting complex SQL statements, typically "code objects" such as stored procedures or functions, where one statement may contain several inner statements. We need to detect where such a statement begins and ends so that we can skip over the embedded delimiters (typically ;). The details inside the block are usually not of interest.
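To make the problem concrete, here is a minimal Java sketch of why block boundaries matter. It is a hypothetical illustration, not the DbVisualizer parser or dialect engine, and the class name BlockAwareSplitter is invented. It splits a script on ; but ignores semicolons nested inside BEGIN ... END. It deliberately ignores string literals, comments, and constructs such as END IF, which a real parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical illustration (not DbVisualizer code): splits a script on ';'
 * but keeps semicolons that appear inside BEGIN ... END blocks.
 * Ignores strings, comments and END IF / END LOOP style constructs.
 */
final class BlockAwareSplitter {

    private BlockAwareSplitter() {
    }

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current BEGIN/END nesting level
        int i = 0;
        while (i < script.length()) {
            char c = script.charAt(i);
            if (Character.isLetter(c) || c == '_') {
                // Read a whole word so that e.g. "BEGINNING" or "my_begin" is not taken as BEGIN.
                int start = i;
                while (i < script.length()
                        && (Character.isLetterOrDigit(script.charAt(i)) || script.charAt(i) == '_')) {
                    i++;
                }
                String word = script.substring(start, i);
                String upper = word.toUpperCase(Locale.ROOT);
                if (upper.equals("BEGIN")) {
                    depth++;
                } else if (upper.equals("END") && depth > 0) {
                    depth--;
                }
                current.append(word);
                continue;
            }
            if (c == ';' && depth == 0) {
                // Top-level delimiter: the current statement is complete.
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                // Embedded delimiters and all other characters are kept verbatim.
                current.append(c);
            }
            i++;
        }
        addIfNotBlank(statements, current); // trailing statement without a delimiter
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

For example, `CREATE PROCEDURE p() BEGIN SELECT 1; SELECT 2; END; SELECT 3;` is split into two statements: the whole procedure and `SELECT 3`. A naive split on every ; would instead cut the procedure into fragments.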

For this purpose, we designed a tiny domain-specific XML language.

The schema is located in this directory, and each dialect implementation is stored together with the database support it implements. The exact location may vary over time, but at the time of writing, each dialect is named after the database type it implements and stored in that database's folder (next to the facade, etc.).

The folder and file name of a dialect follow the pattern databasetype/databasetype-dialect.xml.

Example: the dialect for the generic type is stored in com/onseven/dbvis/db/generic/generic-dialect.xml.
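The naming convention can be expressed as a tiny helper. This is a hypothetical sketch: DialectPaths and dialectPath are invented names, the base folder com/onseven/dbvis/db/ is taken from the example above and may change over time, and the special cases handled by SQLDialect.mapDialectName() are not modeled.

```java
/**
 * Hypothetical helper (not part of DbVisualizer) that builds the resource path
 * of a dialect file from the convention databasetype/databasetype-dialect.xml.
 */
final class DialectPaths {

    // Assumed base folder, taken from the generic example; the real location may change.
    private static final String BASE_FOLDER = "com/onseven/dbvis/db/";

    private DialectPaths() {
    }

    /** Assumes dialectName is already in the form used for folder names, e.g. "generic". */
    static String dialectPath(String dialectName) {
        return BASE_FOLDER + dialectName + "/" + dialectName + "-dialect.xml";
    }
}
```

With these assumptions, `DialectPaths.dialectPath("generic")` returns `com/onseven/dbvis/db/generic/generic-dialect.xml`, matching the example.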

Creating a new Dialect

The dialect name is derived from the database type defined by com.onseven.dbvis.sql.DatabaseType. By default, the name of the dialect file maps directly to the name of the type (DatabaseType.getTypeName()), but some types are mapped to a different dialect name in SQLDialect.mapDialectName().

  1. find the reference documentation for the database to understand which code objects it supports and what their syntax is
  2. make sure the type is listed in com.onseven.dbvis.sql.DatabaseType
  3. create a new file for the database type in the appropriate folder, using the name of the dialect
    (as a starting point, you may want to copy an existing dialect that is similar to the one you are creating)
  4. design the dialect by editing the file to match the syntax of the database (see below)
  5. add comments with links to the syntax you used as a reference to design the dialect
  6. add test files for the dialect (see below) and run the tests

Designing the Dialect

Put simply, the task is to identify all keyword combinations that signal the beginning and end of a complex element. A typical example is CREATE PROCEDURE. Identifying the start of a procedure is normally fairly easy, whereas finding its end is not always as straightforward. In some cases, the dialect has a designated terminator, such as a single slash (/) on a separate line. In other cases, the complex statement is not terminated by any special token but embeds its inner statements in blocks enclosed in open/close tokens such as BEGIN ... END (in this case, you don't have to specify an end terminator).
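For dialects with a designated terminator, end detection can be sketched as follows. This is a hypothetical Java illustration (SlashTerminatedSplitter is an invented name, not DbVisualizer code) that assumes the terminator is a line containing only /, as in Oracle SQL*Plus scripts. Semicolons inside the block are kept, and a statement is cut only at the slash line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not DbVisualizer code): a statement ends at a line
 * that contains only "/", so semicolons inside the block stay part of the statement.
 */
final class SlashTerminatedSplitter {

    private SlashTerminatedSplitter() {
    }

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // "\\R" matches any line break (\n, \r\n, ...).
        for (String line : script.split("\\R", -1)) {
            if (line.trim().equals("/")) {
                // Terminator line: flush the collected statement and drop the slash itself.
                flush(statements, current);
            } else {
                current.append(line).append('\n');
            }
        }
        flush(statements, current); // trailing text without a terminator
        return statements;
    }

    private static void flush(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
        sb.setLength(0);
    }
}
```

Given a script with two PL/SQL-style blocks, each ending in `END;` followed by a `/` line, the sketch returns two statements, each ending with `END;` and with its inner semicolons intact.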

When you open the dialect file in a suitable editor (such as IntelliJ), the schema reference should provide tooltip explanations for all fields. While this helps, the best way to understand the syntax of the dialect specification is probably to open the schema in a visual editor (such as XMLSpy) and to study existing dialects.

Testing

All tests reside in the same package as this directory, but in the regular test source tree.

The testdata folder holds manually crafted test data files, while testdata/examples holds verbatim examples fetched from the vendor sites (with a few minor exceptions). Files are registered in testdata/all.fixtures and run by SQLScriptParserFileTest.

Note: preserve the structure of the folders and files! Place new files in the appropriate section (manual tests, examples, or sakila).

Expected Results

Please refer to the test classes for an explanation of how to define expected results. The existing test cases may seem complex, but creating one is really just a matter of running the test case and manually inspecting the result. If the result looks good, the "actual" value can be entered in the test case as the "expected" value.