Home > Data Warehousing > Irony of doing data warehousing on sharded data

Irony of doing data warehousing on sharded data


Data is sharded for cost and performance reasons. If cost is not a problem, one can always get a powerful hardware to get whatever performance is needed. Sharding works because the business-needs align with the sharding key. Sharding key partitions the data in a self sufficient way. If business grows and this self-sufficiency is not maintained, sharding will lose its meaning.

 Data warehousing on sharded data is a tricky thing. Data warehousing unifies data and aligns business definitions. Data warehousing will attempt to find distincts and aggregates across all the shards. Well, it is not possible without a performance hit. So, real time data warehousing is out of question. For a not so excruciating performance, data warehousing will need to bring in all the data:

  • Either on a single physical computer. This will have a cost hit. You avoided the cost hit for your OLTP app, but if you need a good data-warehouse, you have just deferred your expenditure for future. It could be well thought out decision or a shocking discovery; depending on how much planning you have done ahead of time. One saving grace is that the cost hit on this data- warehousing hardware can be relatively less if you don’t need the solutions in (near) real time.
  • Or on a single logical computer (hadoop or in-memory kind of solutions). It is still sharding in some sense, isn’t it 🙂 And you have lost the real-time-ness and the ACID-ness 🙂

Thoughts?

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: