Populating data to the Date Dimension table

Date dimension plays an important role in data warehouse designing. It allows you to examine behavior and trends in the data over a period of time.

The columns for the DimDate table (date dimension table) should be chosen according to the requirement of the data analysis. When entering data to the DimDate table, extracting dates from the source database is not a good method because there can be missing days in between as the transactions may not have occurred on each day of the year. Therefore, the best way is to populate data to the DimDate table separately, as the date is an universal attribute.

There are various methods of populating data to the table;
  •    Using an Excel file
In this method, an excel spreadsheet which contains the date values is used as the data source and it’s extracted to the data warehouse using SSIS.
  •    Using a SQL script
In this method, a SQL script or a stored procedure is used to generate dates between two days and to populate data to the DimDate table.

When comparing above two methods, Excel reading is not a good solution as it has couple of drawbacks. When the need occurs to add more days to the table, the excel sheet need to be updated with the new days removing all the data that was there before. And when a new column needs to be added, that should be added to the excel and the data should be generated.  Also if the data loading is done in the server, it may be difficult to move the file to the server or to a location where the server can read the file.

When it comes to the SQL scripting method, it’s easy to use and update, also has long term stability. When more days are need to be added, it’s just a simple action of changing the endDate parameter and new columns can also be added easily.

Considering the above facts, in my opinion, the best way to populate the DimDate table is to generate data between two days using a SQL script.

When it comes to finding SQL scripts or stored procedures to generate data for a date dimension, there are many solutions out there in the internet. Still, I would like to share my really simple stored procedure which I have developed to create a DimDate table and to populate data.



CREATE PROCEDURE DimDateGenerator
            @startDate date,
            @endDate date
AS
BEGIN

            CREATE TABLE dbo.DimDate(
                        [DateID] int NOT NULL PRIMARY KEY,
                        [Date] date NOT NULL,
                        [DayName] varchar(20) NOT NULL,
                        [DayOfWeek] int NOT NULL,
                        [Week] int NOT NULL,
                        [Month] int NOT NULL,
                        [MonthName] varchar(20) NOT NULL,
                        [DayOfYear] int NOT NULL,
                        [Year] int NOT NULL,
                        [Quarter] int NOT NULL,
                        [FirstOfMonth] date NULL,
                        [FirstOfYear] date NULL
            )

            WHILE(@startDate <= @endDate)
            BEGIN
                        INSERT dbo.DimDate
                        SELECT
CAST(CONVERT(varchar(8),@startDate,112) as int) [DateID], @startDate [Date],
                                    DATENAME(DW,@startDate) [DayName],
                                    DATEPART(WEEKDAY,@startDate) [DayOfWeek],
                                    DATEPART(WW,@startDate) [Week],
                                    DATEPART(MM,@startDate) [Month],
                                    DATENAME(MM,@startDate) [MonthName],
                                    DATEPART(DY,@startDate) [DayOfYear],
                                    DATEPART(YY,@startDate) [Year],
                                    DATEPART(QQ,@startDate) [Quarter],
                                    CONVERT(date, DATEADD(MONTH, DATEDIFF(MONTH, 0, @startDate), 0)) [FirstOfMonth],
                                    CONVERT(date, DATEADD(YEAR,  DATEDIFF(YEAR,  0, @startDate), 0)) [FirstOfYear]

                        SET @startDate = DATEADD(DD, 1, @startDate )
            END
END


The stored procedure can be executed using the EXEC command, passing values for  @startDate and @endDate  parameters as shown below.


EXEC DimDateGenerator @startDate = '20160101', @endDate = '20181231';  

Comments

Popular posts from this blog

Dimensional modeling - Star schema vs Fact constellation

Using Entity Model for connecting the database