Home > Data Warehousing, Dimensional Modeling > Why Normalization in database design is not so well understood? And are Data-Warehouses really Denormalized?

Why Normalization in database design is not so well understood? And are Data-Warehouses really Denormalized?


Over last three months, four software professionals I met at various places asked me this question: “Why is there so much hype about normalization technique,  when you can get away with a wide table (large number of columns) for almost anything”. Yes, most of the database vendors these days support large number of columns so you can indeed write one wide table instead of paying big bucks to the database designers.

Here is the catch; as soon as any software application becomes worth its salt, one wide table will start falling short. Without getting into the formal definitions of normalization/denormalization, I will provide an example:

Consider designing a database for maintaining transaction on the checkout counters in a retail store chain. Let us assume a simplified case, the details printed on your recipt has “store”, “product” and “salesman” information. Behind the scenes, there are four tables, one for transactions and three others for the “stores”, “products” and “salesmen”. Let us say, we want to just maintain one WIDE table for this work. This wide table has these columns: transaction-date&time, store-name, product-name and salesman-name. Here is what we will experience:
1. We cannot enter the details of a store, until the first transaction happens in that store. Same is the case for product and salesman.   C of CRUD (Create Retrieve, Update Delete) is compromised here.
2. You want to list of products. It is not straight forward as reading from a product table. You will have to scan all transaction rows (slow operation), pick upall  the products and then do a “distinct” on them. R of CRUD is compromised.
3. Let us say a product is sold out, ideally in the product table we would have toggled a flag. One row would have got impacted. Here that column needs to be updated for all the transactions that had that product. It could be updating potentially millions of rows. U of CRUD is compromised.
4. A product needs to be deleted. There is no straightforward way to do it. D of CRUD is compromised.
In summary, normalization is not a choice. It is a necessity for maintaining the basics of CRUD.
Coming to Data-Warehouses; are they really denormalized? Don’t they need to maintain the basics of CRUD? Short answer is data-warehouses have different set of requirements as compared to OLTP applications.  maintaining the basics of CRUD is a necessity for OLTP applications. In data-warehouses, U and D of CRUD usually doesn’t happen. C and R definitely happens, that is why, “dimensions” are NORMALIZED from the “facts”. So, “dimensions” within themselves are denormalized. “Facts” within themselves are also denormalized. But the fact that “dimensons” are taken out of “facts”, it conveys that normaliztion does happen in data-warehouses also.
Advertisements
  1. August 23, 2012 at 12:25 pm

    Amarpal, good analysis. I can understand your arguments about C and R, but I am not sure if U and D are as dire a problem as you indicate. Maybe it is a function of the example you chose, but if a product goes out of stock, why would you need to Update all past transactions? Couldn’t that be a separate column to mark things going out of stock (very bad design i know but just sayin)

    Similarly, per your example why would one want to Delete past transactions of a now deleted product, database can and may want to store past transaction for historical analysis.

  2. August 23, 2012 at 10:10 pm

    Thanks Sarabjeet. Yes, a separate column for U will do the job theoretically. But you will end up updating that column for thousands of rows every time a product is out of stock and then again when it gets replenished. Also if a product name changes and you want to maintain history of changes, there is any straight forward way to do it.

    If you want a list of products that are sold in the store, you don’t want the products that are not sold anymore. Again, you can have an additional column which has the same problem as that of Updates.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: