Design Dimensional Model

Design Dimensional Model

The art of designing a dimensional model is exactly that. An art. And, like any reputable artist will know, you can hone your artistic skills with regular exposure to alternative design approaches. Your goal is to hone your skills, and, that takes practice. So, if you know all the techniques in designing a dimensional model, my advice is: Practice this art some more.
 For some this is a topic that is "old news". They say: "We know all about designing a dimensional model." I say, well, this can largely be true, because you've been working with facts and dimensions for 1 2 3 or more years.
 However, when I start to worry about such ones, is when they use terms like: "We should actually use a SCD (slowly changing dimension) on this fact, and not this dimension". Some of you might see this statement as acceptable. However, for people who are purists when it comes to Ralph Kimball's design, that statement is an irritation on the ears. Why? Because all dimensions in a dimensional model are SCD's. The only differentiating factor is what type of SCD are you using.
 Another concern of mine is that the culture of "my data is my data" is still strong, and, this mentality is what cripples many EDW projects. The concept is known as "stovepipe" data marts, and a contributing factor to this is when different departments refuse to share their data and insights into business operations. The reason why this is so troubling, is, because one of the first considerations we ought to give attention to when designing the dimensional model is integration. The culture of an integral unit is still viewed as non essential. People might not say it, but the way they design their models show this to be true. Also the lack of commitment by all business departments also shows this to be true.
 Now, lets recap on some of the "golden rules" of a dimensional model, and lets see if we need to debate one or more of these points. Lets concentrate on the fact table:
  1. All facts and dimensions on the fact table should be numeric data. 
    • There are rare occasions where you might feel the need to breach this rule. In such cases you need to make doubly sure that the field added cannot possibly be added as an dimension attribute.
  2. All non non measure fields should form the PK (primary key) for the fact table.
    • Now in the purest sense Kimball insists on this. However, I would rephrase the rule to say: All non non measure fields should uniquely identify each row. The reason for my slight wording change is that you may find it more beneficial to have a surrogate key on your fact table as the PK - I've followed this pattern in the past on a very successful project where PK space needed to be optimized. With a concatenated PK, the index is much bigger in the RDBMS that for an index on the surrogate key.
  3.  Fact tables "talk" to one another in a cube, or warehouse via its conformed dimensions.
    • This rule should be applied as religiously as possible, with no deviation of it under any circumstances.
  4.  Degenerate dimensions on fact tables like flags or transaction numbers (i.e. for Transactional Fact Tables) are acceptable
    •  ..
  5. Process to follow in building a fact table:
    • Choose the Process
    • Choose the grain
      • The smallest replicable unit - lowest level of detail. The grain determines the dimensionality of the design
    • Choose the dimensions
      • A descriptive context that has a single value in the presence of the measures, i.e. is true to the grain
    •  Choose the facts
      • Numeric measures (i.e. additive, non additive, semi additive) true to the grain

Essential Rules

 These are the 10 rules of dimensional modeling:
  1. Load detailed atomic data into dimensional structures
  2. Structure dimensional models around business processes
  3. Ensure every fact table has an associated date dimension table
  4. Ensure all facts in a single fact table are at the same grain or level of detail
  5. Resolve many-to-many relationships in fact tables
  6. Resolve many-to-one relationships in dimension tables
  7. Store report labels and filter domain values in dimension tables
  8. Ensure that dimension tables use a surrogate key
  9. Create conformed dimensions to integrate data across the enterprise
  10. Continuously balance requirements versus realities to deliver a DW/BI solution that is accepted by business and supports their decision making

 Complementary techniques

  We have various techniques at our disposal in designing the data warehouse. These are some of them that we need to leverage and use, as needed:
  1. Role playing dimensions
    • A date dimension could play various roles in the EDW - i.e. shipped date, requested date, ordered date. Think about this need when designing. Keep in mind that in implementing this, we typically make use of virtualisation, i.e. views or derivations
  2.  Conformed dimensions
    • These enable an incremental development approach. They are easier to maintain. They answer business questions across business processes. They support centralized and distributed architectures.
  3.  Drilling across
    • This is an advanced technique that a modeller will use in design. She will think about how various fact tables in her design will complement one another with detail
  4. Drilling down
    •  The amount of variation between less and more detail you provide business with is directly related to the amount of dimensionality you provide your model with; i.e. how much dimensions exist on your fact table.

Toolset

 The question I've been presented with for years is: What ER modelling tool do we need to design a comprehensive model that is accurate and sound? The short answer is NO modelling tool needed. Designing a dimensional model is about mapping a business process into a fact table. No tool can substitute the process that your "grey matter" needs to process. You need to use your brain and map the business process into a fact table.
 However, I've been asked what about reverse engineering your model (i.e. in RDBMS). For this, I have found that copying the eventual design from your brain to the modelling tool is good for a number of reasons:
  1. Business can see the output of your design process
  2. Developers can see the output of your design process and see the mapping of it for the BI tool
  3. The business dba type role person can then more easily reverse engineer the model you've placed in the tool for deployment purposes.
 Keep in mind that designing a dimensional model is firstly about understanding the business process and representing it on a fact table. The only characteristic that matters, besides adhering to the physical rules of the attributes, are clarity. If the model is clearly understood, and is sound from business's perspective, then you have succeeded in your design.

Conclusion
 A successful design is about business processes that are clear and accurately represented on your fact tables. Buy in across the organization is still a sensitive issue, especially for those who find it difficult to accept change in an industry which has no mercy for those who cannot embrace it. The IT industry has exploded in terms of the need for modellers who can produce art to the likes of Vincent van Gogh, Leonardo da Vinci, Edvard Munch and others. Yes, just like those artists of note have honed their skills over the years, you as the dimensional modeller needs hone your modelling skills, by investing as much time as possible using the skills Ralph Kimball and others have pioneered for all.

Happy modelling people.

Comments

Popular posts from this blog

"Bigness" data - some thoughts about the big data journey

Code Reviews

Moving Data [Apache Spark]