Friday, June 26, 2009

10 Essential Rules of Dimensional Modeling by Kimball University

Margy Ross from Kimball University writes the 10 Commandments of Dimensional Modeling. (Well she hated to use the word Commandments but I still liked it.)

The article contains pointers, most of which we do follow while designing a Data Warehouse. But, it was really good to have all of them at one place.

I am listing the Commandments here, but please read the full article by clicking on the link here...
  1. Load detailed atomic data into dimensional structures.
  2. Structure dimensional models around business processes.
  3. Ensure that every fact table has an associated date dimension table.
  4. Ensure that all facts in a single fact table are at the same grain or level of detail.
  5. Resolve many-to-many relationships in fact tables.
  6. Resolve many-to-one relationships in dimension tables.
  7. Store report labels and filter domain values in dimension tables.
  8. Make certain that dimension tables use a surrogate key.
  9. Create conformed dimensions to integrate data across the enterprise.
  10. Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making.

Thursday, June 25, 2009

Create Clustered Index as Unique or Not?

Clustered Index is very important and determine the order in which data is stored in the table. A table without a Clustered Index is called a HEAP. (Try and picture a Heap in your mind and then you can visualize how data is stored in table without a Clustered Index.. Huh!!)

Clustered Index also helps the SQL Engine uniquely identify a row. Thus, if possible, always create a Clustered Index as UNIQUE. If you do not specify the UNIQUE clause while creating your Clustered Index, SQL Server automatically adds a 4 byte uniqueidentifier column to the table to make every row unique.

Thus, specifying a UNIQUE clause also helps save valueable database space.

More information can be found at the following MSDN Page.

Wednesday, June 24, 2009

Handy Shortcuts for SSMS

I like to add the following short-cut keys to my SQL Server Management Studio. This makes a lot of regular day-to-day commands much easier to use.

Click on Tools Menu.
Select Options Menu
Select Keyboard on the Environment Tree
add the following shortcuts

Ctrl+3 : sp_Columns
Ctrl+4 : sp_Depends
Ctrl+5 : sp_SpaceUsed
Ctrl+6 : sp_HelpIndex
Ctrl+7 : sp_HelpText


(You can add/change your own Shortcut Commands)

Click OK. You will have to re-start SSMS to use these shortcuts.

Next time if you want to see the size of a table, instead of typing sp_SpaceUsed TableName, simply select the table and press Ctrl+5 and Voila...

A list of Database Engine Stored Procedures can be found here...

Comparison of Kimball and Inmon Methods

I have always been bugged and intrigued by Kimball and Inmon methods of Data Warehouse. Some questions that always used to bother me was:
What is the Kimball Method?
How is Inmon's Datawarehouse different from Kimball's?
Which model am I working on? Strangely I did not knew this, except for the fact that it was a Dimensional Model.

I could find a plethora of articles on web. But this one by Mary Breslin is the best. "Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models". The article explains both the methods, does a fair analysis and comparison and also suggest which approach is best in a given scenario.

I could also find a link to the Presentation for the same article. You can download the PDF here.

Tuesday, June 23, 2009

Clear Log Space from Database

After large operations on the database the Log file may get filled and you may get a warning that there is no space to execute the SQL Command.

Use the following command to clear the Log File.
DBCC SHRINKFILE(LogFileName, 1)

You can get the name of the Log File with the following command
sp_helpdb databasename

DBCC SHRINKFILE is an interesting command and can also be used to shrink the Data files. Read the following MSDN article to more about this command.

View Log Size and Space Used

Use the following command to see the current log size (MB) and space used for all Databases.

DBCC SQLPERF(LOGSPACE)

The output will be as follows

Move a Partitioned Table to PRIMARY Filegroup

The following command will move the Partitioned Table to PRIMARY Filegroup. (The table will be un-partitioned then....)

  1. Drop the existing Non Clustered Indexes on the Table (if any)
  2. Drop the Clustered Index on the table
  3. Re-Create the Clustered Index on the table specify the Filegroup as PRIMARY.
    CREATE CLUSTERED INDEX IndexName ON TableName(ColName) ON [PRIMARY]
  4. Re-Create the Non Clustered Indexes (if any)

Steps for Creating Partition

  • Identify the List of Tables to be Partitioned
  • Determine Partitioning Keys in the Tables and re-align Indexes if required
  • Determine No. of Partition for each table or group of Tables
  • Create Filegroups (both on Database and Filesystem)
  • Create Partition Function
  • Create Partition Scheme
  • If new Tables, Create tables on Partition Scheme
  • If existing Tables, Create Indexes with DROP INDEX...WITH (MOVE TO...) syntax

Identify your SQL Server Version

The following code can help you identify your SQL Server Version.

SELECT
SERVERPROPERTY('ProductVersion'),
SERVERPROPERTY('ProductLevel'),
SERVERPROPERTY('Edition')


Well, more details on this can be found at the following KB321185

Partition Alignment, LEFT or RIGHT?

While defining the Partition Range you need to specify if the Partition will be LEFT aligned or RIGHT aligned. Unless you have a good idea of this, it can be very confusing.
Let us break the code..... :)

A Partition Range defined with:
LEFT means Upper boundary of the 1st Partition Range
RIGHT means Lower boundary of the 2nd Partition Range

Also, the no. of Partition Boundary is always 1 less than total no. of Partition Range. e.g. A Partition with 5 Range will have 4 boundary specified.

Let us try and understand all with an example.
A Partition defined with the LEFT as follows
RANGE LEFT for VALUES ('20010101', '20020101', '20030101', '20040101')
will have the following Partition Range
<= 20010101
20010102 to 20020101
20020102 to 20030101
20030102 to 20040101
>= 20040102

(Notice the first date for every Partition Range is 1 more than the Boundary value specified as the LEFT is for Upper Boundary)

Similarly a Partition defined with the RIGHT
RANGE RIGHT for VALUES ('20010101', '20020101', '20030101', '20040101')
will have the following Partition Range
<20010101
20010101 to 20011231
20020101 to 20021231
20030101 to 20031231
>= 20040101

(Notice the first date for every Partition Range is the same as Boundary value specified as the RIGHT is for Lower Boundary)

Thus, you see it is more simple to specify a Partition Range as RIGHT Aligned for date values.

Specifying a Partition Boundary with a datetime data type has additional complexities. We will discuss this in a later post.

New Date and Time data types in SQL Server 2008

We the developers have always struggled with storing either the Date or Time values in SQL Server, because earlier versions had only "datetime" and "smalldatetime" data type. So, even if the requirement is to store only date you will also have time part stored in the column with may be default values as 00:00:00. Same goes for time.

With SQL Server 2008, Microsoft did a big favor to the developers by introducing 2 new data types "date" and "time" which allows us to do exactly that, "Store Date and Store Time". It looks much clean now.... isn't it.

There are also 2 additional data types for date and time.
datetime2: This is an extension of earlier datetime with added precision and larger date range.
datetimeoffset: This has been included to store datetime along with timezone awareness.

More information about new date and time data types in SQL Server 2008 can be found here..

Monday, June 22, 2009

SQL Server 2008 Overview

A very good presentation by Microsoft on the overview and new features on SQL Server 2008.