Populating data to the Date Dimension table
Date dimension plays an important role in data warehouse designing. It allows you to examine behavior and trends in the data over a period of time.
The
columns for the DimDate table (date dimension table) should be chosen according
to the requirement of the data analysis. When entering data to the DimDate table,
extracting dates from the source database is not a good method because there
can be missing days in between as the transactions may not have occurred on each
day of the year. Therefore, the best way is to populate data to the DimDate table
separately, as the date is an universal attribute.
There
are various methods of populating data to the table;
- Using an Excel file
In this method,
an excel spreadsheet which contains the date values is used as the data source
and it’s extracted to the data warehouse using SSIS.
- Using a SQL script
In this method,
a SQL script or a stored procedure is used to generate dates between two days
and to populate data to the DimDate table.
When
comparing above two methods, Excel reading is not a good solution as it has
couple of drawbacks. When the need occurs to add more days to the table, the excel
sheet need to be updated with the new days removing all the data that was there
before. And when a new column needs to be added, that should be added to the
excel and the data should be generated. Also if the data loading is done in the
server, it may be difficult to move the file to the server or to a location
where the server can read the file.
When it
comes to the SQL scripting method, it’s easy to use and update, also has long
term stability. When more days are need to be added, it’s just a simple action
of changing the endDate parameter and new columns can also be added easily.
When it comes to finding SQL scripts or stored procedures to generate data for a date dimension, there are many solutions out there in the internet. Still, I would like to share my really simple stored procedure which I have developed to create a DimDate table and to populate data.
CREATE PROCEDURE DimDateGenerator
@startDate
date,
@endDate
date
AS
BEGIN
CREATE TABLE dbo.DimDate(
[DateID] int NOT NULL PRIMARY KEY,
[Date]
date NOT NULL,
[DayName]
varchar(20) NOT NULL,
[DayOfWeek]
int NOT NULL,
[Week]
int NOT NULL,
[Month]
int NOT NULL,
[MonthName]
varchar(20) NOT NULL,
[DayOfYear]
int NOT NULL,
[Year]
int NOT NULL,
[Quarter]
int NOT NULL,
[FirstOfMonth]
date NULL,
[FirstOfYear]
date NULL
)
WHILE(@startDate <= @endDate)
BEGIN
INSERT dbo.DimDate
SELECT
CAST(CONVERT(varchar(8),@startDate,112) as int) [DateID], @startDate [Date],
DATENAME(DW,@startDate) [DayName],
DATEPART(WEEKDAY,@startDate)
[DayOfWeek],
DATEPART(WW,@startDate) [Week],
DATEPART(MM,@startDate) [Month],
DATENAME(MM,@startDate) [MonthName],
DATEPART(DY,@startDate) [DayOfYear],
DATEPART(YY,@startDate) [Year],
DATEPART(QQ,@startDate) [Quarter],
CONVERT(date, DATEADD(MONTH,
DATEDIFF(MONTH, 0, @startDate), 0)) [FirstOfMonth],
CONVERT(date, DATEADD(YEAR,
DATEDIFF(YEAR, 0, @startDate), 0)) [FirstOfYear]
SET @startDate = DATEADD(DD, 1, @startDate )
END
END
The stored procedure can be executed using the EXEC command, passing values for @startDate and @endDate parameters as shown below.
EXEC DimDateGenerator @startDate = '20160101', @endDate = '20181231';
Comments
Post a Comment