Open source code generation using T-SQL and Handlebars
Assuming that you just installed BETL, I’ll give you an example of how to use it. The goal of this getting started guide is to generate the ETL and DDL for building the first two layers of a data warehouse, which I call staging and raw data warehouse.
First we need to extract the meta data that tells us which tables exist and on which server, in which database and under which schema they live. This is called the object tree. (In BETL, ingesting is a synonym for importing.)
Let’s have a look at the current meta data repository.
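The following command shows you its contents. This is a sketch: the procedure name dbo.betl_repo is an assumption of mine, so check the BETL documentation for the exact call in your version.
exec dbo.betl_repo -- hypothetical procedure name: dumps the repository contents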
-- or if you prefer to type less:
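exec dbo.repo -- hypothetical shorthand; the actual alias depends on your BETL version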
It will return the current contents of the repository: the servers, databases, schemas, tables and columns that BETL knows about.
Let’s get back to our first step: Ingesting the object tree. This can be done with the following code:
-- declare a table-valued parameter of BETL's ObjTreeTableParam type
declare @obj_tree_param ObjTreeTableParam
-- ingest the object tree; the parameter is left empty here, which
-- ingests the current database (the insert .. exec captures the result)
insert into @obj_tree_param
exec [dbo].[ingest_obj_tree] @obj_tree_param
-- see what happened
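exec dbo.betl_repo -- hypothetical repository dump (same assumed procedure as above); the ingested objects should now show up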
We only ingested the current database. Next step: ingest the source database object tree.
Because Azure SQL Database does not support cross-database queries, the solution for Azure is a little different from the on-premises one. The main difference is that on Azure we have to use an ETL tool like Azure Data Factory, whereas on premises we can build the solution using only T-SQL and, in some cases, a linked server.
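On premises, for example, the raw meta data for the source object tree can be read with a plain cross-database query against the catalog views. This sketch uses AdventureWorks as the source; the exact column list that ObjTreeTableParam expects is version specific, so it only shows where the meta data comes from:
select
    'AdventureWorks' as [database],
    s.name           as [schema],
    t.name           as [table],
    c.name           as [column],
    ty.name          as [data_type]
from AdventureWorks.sys.tables  t
join AdventureWorks.sys.schemas s  on s.schema_id = t.schema_id
join AdventureWorks.sys.columns c  on c.object_id = t.object_id
join AdventureWorks.sys.types   ty on ty.user_type_id = c.user_type_id
order by s.name, t.name, c.column_id;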
This is the pipeline that imports the tables into our staging layer. Not much code generation is actually done here; instead we combine the meta data of BETL with the generic Copy activity of ADF.
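Concretely, a Lookup activity can query the BETL repository for the tables to copy and feed a ForEach loop around the Copy activity. A minimal sketch of such a lookup query, where dbo.repo_obj and its columns are hypothetical names rather than BETL's actual repository schema:
select [schema_name], [table_name]
from dbo.repo_obj        -- hypothetical repository table
where layer = 'staging'; -- hypothetical column marking the target layer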
This pipeline shows the generic power of ADF combined with good meta data.
The next layer I call the raw data warehouse (RDW). This is modelled exactly like the source system, but we add some things like:
- extra columns for recording history and other meta data
- primary keys and identity columns
- latest views on top of the history tables
Let’s have a look at the overall orchestration pipeline for AdventureWorks, called pl_aw. This is sometimes called the master pipeline.
First we extract the required tables into our staging layer, as we have seen above. The processing of the RDW layer can be divided into two steps:
1. DDL
2. ETL
The DDL step makes sure that the RDW tables are created, including extra columns for recording history and other meta data. Primary keys and identity columns are also created here. This DDL step can be skipped for the test, acceptance and production environments, because DDL changes are normally done via the release process. (It is, however, possible to implement runtime DDL changes on a production environment as well, for example for adding columns in the RDW layer.) In that case you have to be careful not to implement DDL changes that will result in data loss.
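To make the DDL step concrete: for a source table Customer, the generated history table might look roughly like this. This is a sketch only; the table, column and meta data names are my assumptions, and the actual shape is determined by BETL's templates.
create table rdw.customer_his (
    customer_his_id int identity(1,1) not null primary key, -- generated identity / primary key
    customer_id     int               not null,             -- business key from the source
    [name]          nvarchar(100)     null,                 -- source columns copied 1:1
    eff_dt          datetime2         not null              -- effective date time stamp for history (see below)
);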
I like the anchor modelling method of recording history. In short, you only add an effective date time stamp and you only ever insert records, never delete or update anything. Most of the time you need the most recent version of a record, so for your convenience we create a latest view that always returns this current version. History tables get a suffix, for example _his. I’ve created the following naming convention (please feel free to create your own) with these objectives:
- group related tables (coming from the same source system) together by name
- group layers by name
- use latest views by default
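A latest view on top of the history table sketched above could then look like this (again an illustrative sketch under the same assumed names, not BETL's generated code):
create view rdw.customer
as
select customer_id, [name], eff_dt
from (
    select customer_id, [name], eff_dt,
           row_number() over (partition by customer_id
                              order by eff_dt desc) as rn
    from rdw.customer_his
) v
where rn = 1;
Because records are only ever inserted, the view always resolves to the most recent version of each record without any update logic.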