Home > Data Warehousing > Schema Validation Framework (SVF)

Schema Validation Framework (SVF)


SVF is the heart of any data warehousing project. I just coined this acronym, but invented this phase of the data warehousing processing engine many years ago and have been successfully using it. Most of the data warehousing projects don’t even go beyond the start line because they don’t have an SVF infrastructure in place; they begin, they struggle and they eventually die. I will explain in a bit how that happens.

SVF is a way to protect your ETL from any/all schema changes in the source systems.

Most of DBMSs provide in-built detailed metadata about the tables, relationships, constraints, columns, data types, keys, etc. This metadata is provided as a separate database called INFORMATION_SCHEMA in some of the major DBMSs. The selective and transformed, yet detailed information provided in this metadata for a source-system database can be stored in the SVF layer that executes prior to the ETL layer. On sub-sequent processing of the source-system database, SVF will fetch the metadata info, compare it with the previous stored version that the ETL has been made compatible with. If no serious mutations are found, ETL can proceed; else it can gracefully exit after logging and alerting about the mutations.

I have seen companies and consultants spending endless time discussing merits/demerits of 3NF vs. Dimensional-Modeling, real-time vs. delayed data processing, Inmon vs. Kimball, ETL tool vs. SQL-queries, OLTP vs. OLAP, Business Objects vs. Microstrategy, … and the debates go on. I haven’t seen many folks discussing the data procurement in details. Yet, as soon as a data warehousing project starts and ETL passes the first phase, the source-system database changes. The data warehouse team pushes to put a process in place where the application development team informs the data warehouse team about any database schema changes prior to the deployment. The application development team almost always fails to do so for obvious and mostly justifiable reasons; there isn’t much ROI on the efforts involved in following this process (especially when SVF can automate it). But because SVF is not given a thought, the blame game starts, the problem continues to exist, ETL continues to break and the data warehousing SLAs continue to lose their meaning,…

I am tempted to write a little bit about the inappropriateness of slave/subscriber databases as the source data for the serious data warehousing projects, but will hold that temptation until my next blog. Meanwhile, it is a good food for thought for those who are curious enough 🙂

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: