Dimensional modeling

After having seen a few blog posts on security topics, we will turn our attention to ‘Data warehousing concepts’ and more specifically the modeling techniques in this post. This post assumes knowledge of basic databases like tables and fields.

We have already seen the meaning of a Data warehouse, the reasons for creating a data warehouse and the components of a Data warehouse in earlier posts.

Kimball approach to designing Data warehouses:

Ralph Kimball preferred the bottom-up approach to designing data warehouses. Since the data warehouse is considered to be the union of all its data marts in the Kimball approach, it is said to be the bottom- up approach.

The picture below illustrates this concept. The data from different source systems (legacy systems, ERP systems, CRM systems, flat files) is cleaned and loaded onto the different data marts. The data marts are in 3NF (third normal form) and the data warehouse is the union of all its data marts.

kimball-vs-inmon4

A few facts regarding the Kimball approach:

  1. The data marts cater to different business processes(related to purchasing, sales or inventory)
  2. The data marts themselves are in 3NF(third normal form)
  3. The DW is in a ‘star schema’ and it is created using dimensional modeling technique.
  4. ‘Dimensional modeling’ makes use of fact tables and dimensional tables

Primary key, Foreign key and more:

Before we get deeper into the different facts of dimensional modeling, let us first see the definitions of primary key, foreign key and surrogate key. These concepts are the fundamental building blocks in the world of data bases, data warehousing and data analytics.

Primary key:

A ‘primary key’ is a value that uniquely identifies the table. It is important to note that the primary key cannot take null values and there can be only one primary key in a table.

Foreign key:

A ‘foreign key’ is the ‘primary key’ in another table. Foreign keys can take null values.

For example, in the picture shown, the PK in the ‘customer table’ is ‘CustID’ and it is a FK in the ‘Order table’.

CustomerOrderTable

Surrogate keys:

Surrogate keys, on the other hand are an alternate to the primary keys and are naturally generated by system.  They can take null values. Surrogate keys are also considered to be “meaningless  keys” because they are generated by the system. According to Ralph Kimball, the joins in dimensional modeling should take place between surrogate keys.

Composite keys:

Concatenation of two attributes is said to be a ‘composite key’. For example, in the example shown, ‘CustID’ and ‘CustPhone’ can be considered to be a ‘composite key’.

Dimensional modeling concepts:

The ‘Dimensional modeling’ approach is a contrast to the E-R modeling approach.  The E-R modeling approach results in databases that cannot be queried. To overcome this, the dimensional modeling approach is adopted.

The DM modeling approach involves ‘fact tables’ and ‘dimensional tables’. These fact tables and dimensional tables are united through a ‘star join’.

‘ Fact tables’ represent the “facts” related to any event in any organization. They are numerical and additive.  For example, in the picture shown,  the ‘Sales Fact table’ shows the number of products purchased  for a particular day.

The corresponding dimensional tables and their attributes are shown around the ‘fact table’ in a star join.

star

By means of analyzing the data in a data warehouse, smart business decisions can be taken.  We will explore more Data warehousing concepts in subsequent posts.

About Pavan Gumaste

Pavan Rao is a programmer / Developer by Profession and Cloud Computing Professional by choice with in-depth knowledge in AWS, Azure, Google Cloud Platform. He helps the organisation figure out what to build, ensure successful delivery, and incorporate user learning to improve the strategy and product further.

2 thoughts on “Dimensional modeling”

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top