Monday, December 21, 2009

ANCESTOR MDX Function

I recently started looking into MDX functions, and one of the first functions I came across was ANCESTOR. Although MSDN provides some very good resources on MDX functions, it was not so simple for me. Either I am totally new to MDX or I am too lame.

MSDN explains the ANCESTOR MDX function using the Product hierarchy of the Adventure Works database, but I think the Date hierarchy makes a cooler example.

So, here we go...

As the name suggests, the ANCESTOR function helps you find the ancestor of a member. You can specify either the number of levels you want to go up or an expression specifying the level.

The syntax for Ancestor is as follows:
Ancestor(Member_Expression, Level)
or
Ancestor(Member_Expression, Distance)


Thus, for example, if the Date hierarchy is as follows:
Year -> Semester -> Quarter -> Month -> Date
and your member expression resolves to October 4, 2003, and you want to know the total Internet Sales Amount for the semester this date belongs to, you will use the following syntax:

SELECT {
ANCESTOR(
[Date].[Calendar].[Date].[October 4, 2003], --Member Expression
[Date].[Calendar].[Calendar Semester]) --Level Expression
} ON COLUMNS,
[Measures].[Internet Sales Amount] ON ROWS
FROM [Adventure Works]


Please note the { } braces: an axis expects a set, so the single member returned by ANCESTOR is wrapped in braces to turn it into a set.

If you want to know the Internet Sales Amount for the month, quarter and year of the given date, you will use the following syntax:

SELECT {
ANCESTOR([Date].[Calendar].[Date].[October 4, 2003], [Date].[Calendar].[Month]),
ANCESTOR([Date].[Calendar].[Date].[October 4, 2003], [Date].[Calendar].[Calendar Quarter]),
ANCESTOR([Date].[Calendar].[Date].[October 4, 2003], [Date].[Calendar].[Calendar Year])
} ON COLUMNS,
[Measures].[Internet Sales Amount] ON ROWS
FROM [Adventure Works]


The query returns the Internet Sales Amount for the month, quarter and year that contain October 4, 2003.

You can also use a distance (a numeric expression) to get the same output:

SELECT {
ANCESTOR([Date].[Calendar].[Date].[October 4, 2003], 1),
ANCESTOR([Date].[Calendar].[Date].[October 4, 2003], 2),
ANCESTOR([Date].[Calendar].[Date].[October 4, 2003], 4)
} ON COLUMNS,
[Measures].[Internet Sales Amount] ON ROWS
FROM [Adventure Works]


Remember: from Date, Month is 1 level up, Quarter is 2 levels up and Year is 4 levels up.

Please note the ANCESTOR function always returns a single member..
Interestingly, there is also an MDX function ANCESTORS, and at first I could not spot the difference between the two. The distinction is that ANCESTOR returns a member, while ANCESTORS returns a set (which, for a given level or distance, contains just that one ancestor).

The MSDN article on ANCESTOR can be accessed here...

Monday, November 23, 2009

Sharepoint in Plain English

This very simple video by Common Craft explains the concept of Microsoft SharePoint Server in "plain English".


SharePoint in Plain English

Thursday, November 5, 2009

Special Characters in a Table Name

The other day we accidentally changed a table name from 'dbo.TableName' to 'db.TableName'.

Afterwards we were not able to access the table at all, although we could see the table on querying sys.objects or INFORMATION_SCHEMA.TABLES.

The first impression was that the table schema/ownership had changed from dbo to db. However, the table still existed in the dbo schema; only its name had changed from 'TableName' to 'db.TableName'.

Thus to rename the table I used the following command.
EXEC sp_Rename 'dbo.[db.TableName]', 'TableName'

Please note the use of the [ ] brackets. They are used to escape special characters (here, the dot) in the object name.
Also, in the second parameter we do not specify the schema/owner name.
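
If you are curious how this happens in the first place, here is a hedged reproduction of the gotcha (the table name is made up): sp_rename treats a dot in the new name as part of the name itself, not as a schema separator.

-- Hypothetical reproduction (table name is illustrative)
CREATE TABLE dbo.TableName (Id INT)

-- The dot in the new name is kept literally;
-- the table ends up named [db.TableName] inside the dbo schema
EXEC sp_rename 'dbo.TableName', 'db.TableName'

-- The fix: escape the dotted name with brackets, no schema on the new name
EXEC sp_rename 'dbo.[db.TableName]', 'TableName'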

Unable to remove Partition Filegroup

Today while removing a Filegroup we came across the famous error message.
The filegroup FileGroupNameXXX cannot be removed because it is not empty.

Normally, on getting this error message, the first thing I do is try to shrink the file in the filegroup:
DBCC SHRINKFILE (FileGroupNameXXX, 0)
This clears all unused space from it. (Note that DBCC SHRINKFILE takes a logical file name; here the file was assumed to share the filegroup's name.)

However, SQL Server informed me that the file does not exist. On further checking I noticed that the filegroup had no file or partition scheme associated with it. Then why is it not empty, and what is associated with it?

At this point SQL Server's catalog views came to my rescue, and the following query showed that a table, along with an index, was still associated with the filegroup.

SELECT ds.name AS FilegroupName, i.name AS IndexName, o.name AS ObjectName
FROM sys.data_spaces ds
INNER JOIN sys.indexes i ON i.data_space_id = ds.data_space_id
INNER JOIN sys.objects o ON i.object_id = o.object_id
WHERE ds.name = 'FileGroupNameXXX'


This gave me the names of the objects involved. So I first dropped the table and then ran the following command to remove the filegroup.

ALTER DATABASE TestDB REMOVE FILEGROUP FileGroupNameXXX

Hope this helps...

Wednesday, November 4, 2009

Rename a Table or a Column

Microsoft SQL Server has a very cool system stored procedure to change the name of a table, column or index.. sp_rename

To rename a table
EXEC sp_rename 'dbo.Cust', 'Customer'
The above command renames the table Cust to Customer in the dbo schema.

To rename a column
EXEC sp_rename 'dbo.Cust.CustName', 'CustomerName', 'COLUMN'
The above command renames the column CustName to CustomerName on the table dbo.Cust.

You can use the same command to rename an Index as well.
EXEC sp_rename 'dbo.Customer.idx_CustomerId', 'idx_Customer_CustomerId', 'INDEX'

More details about sp_rename can be found on the MSDN link.

Clean Partition on SQL Server

During development and testing of partitions on SQL Server, I have come across this question many times: "How do we clear the partitions, files, filegroups, partition scheme and function, and then start all over again?"

This can be a really tedious process if you do not do it right and in the right order. This post documents the steps you need to take to clean everything and start from scratch.

The first thing I always suggest is to move all the tables that are on the said partition scheme back to the PRIMARY filegroup. You can check the steps here.

Once you have moved all the tables to the PRIMARY filegroup, the second step is to remove the files from the partition filegroups. Assuming you have a database named TestDB and the partition filegroups are named something like TEST_PARTITION_2009_01, TEST_PARTITION_2009_02 and so on for 12 months, run the following syntax:

ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_01
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_02
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_03
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_04
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_05
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_06
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_07
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_08
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_09
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_10
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_11
ALTER DATABASE TestDB REMOVE FILE TEST_PARTITION_2009_12


Third, remove the Partition Scheme and Function. Replace the Partition scheme and Function name accordingly.

DROP PARTITION SCHEME TEST_PARTITION_Part_Sch
DROP PARTITION FUNCTION TEST_PARTITION_Part_Func


Finally, you need to remove the Partition FileGroups from the database. I assume the same naming convention..

ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_01
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_02
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_03
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_04
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_05
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_06
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_07
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_08
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_09
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_10
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_11
ALTER DATABASE TestDB REMOVE FILEGROUP TEST_PARTITION_2009_12
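
Optionally, you can check that nothing was left behind. A minimal sanity check, assuming the same TEST_PARTITION naming convention:

-- All four queries should return no rows once the cleanup is complete
SELECT name FROM sys.database_files WHERE name LIKE 'TEST_PARTITION%'
SELECT name FROM sys.filegroups     WHERE name LIKE 'TEST_PARTITION%'
SELECT name FROM sys.partition_schemes
SELECT name FROM sys.partition_functions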


And voila... you have a clean slate now. Enjoy

Thursday, October 22, 2009

Service Oriented Architecture Podcast

A compilation of podcasts related to Service Oriented Architecture (SOA) spread across the internet.

What is SOA? From whatis.com, sponsored by IBM

Another good article explaining what SOA is:
An introduction to SOA on Javaworld.com

Tuesday, October 20, 2009

Microsoft Excel 2010

Get a sneak preview of Microsoft's most famous BI tool, Excel, in its new avatar, Excel 2010, here.

http://www.microsoft.com/office/2010/

Count Distinct Records for Multiple Columns

We all know how to find the distinct count for any particular column in a table:

SELECT COUNT(DISTINCT ColumnName) FROM dbo.TableName

What if you want to count the distinct combinations of multiple columns? The above syntax will not work. You can use a derived table to count the distinct combinations:

SELECT COUNT(*) FROM (
    SELECT DISTINCT Column1, Column2, Column3 FROM dbo.TableName
) AS DistinctTable

Hope this helps!!

Monday, October 19, 2009

Create a Duplicate table

There are often times when I want to create a duplicate structure of an existing table. The simplest way to do that is the following:

SELECT TOP 0 * INTO dbo.NewTableName FROM dbo.ExistingTableName
Remember to put dbo. before the new table name, or else SQL Server creates the table under your default schema. Also note this copies only the column structure; indexes, constraints and triggers are not copied.

Thursday, October 8, 2009

Phases of Development Projects

I was watching the following Microsoft Webcast
IT Manager Webcast: How Microsoft IT Manages Vendor Development Projects (Level 300)

Lots of interesting sheets, graphs and trackers. I also liked the following phases listed for a development project. This can act as a ready reckoner.

  • REQ - Requirements
  • HLD - High Level Design (e.g. FSPEC)
  • HLDINSP - High Level Design Inspection
  • DLD - Detail Design
  • DLDR - Detail Design Review
  • DLDINSP - Detail Design Inspection
  • TCR - Test Case Review
  • CODE - Coding
  • CR - Code Review
  • UT - Unit Test
  • CODEINSP - Code Inspection
  • IT - Integration Testing
  • ST - System Testing
  • PM - Post-mortem
  • STRAT - Strategy (e.g. Vision/Scope)
  • PLAN - Planning, Project Management
  • UAT - User Acceptance Test
  • PL - Product Live

Friday, October 2, 2009

Impact of Multi-Column Clustered Index

Clustered indexes are really important and special. Special because there can be only one clustered index on a table, and it decides the order in which data is stored in the table.

A clustered index helps the database uniquely identify individual rows in a table. As mentioned in my previous blog, if the clustered index does not uniquely identify a row, SQL Server automatically adds a hidden 4-byte uniquifier to make it unique.

However, we should be really careful not to add too many columns to a clustered index in trying to make it unique, because every non-clustered index stores the clustered index key columns as its row locator.

Therefore, the more columns there are in the clustered index, the bigger every non-clustered index becomes.
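
To put a rough (purely illustrative) number on it: suppose the clustered key is int + datetime + int (4 + 8 + 4 = 16 bytes) instead of a single int (4 bytes). Every non-clustered index row then carries 12 extra bytes, so on a 100 million row table with 5 non-clustered indexes that is roughly 12 x 100,000,000 x 5 bytes, close to 6 GB of extra index space.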

Wednesday, September 30, 2009

Disable and Enable Index

Microsoft introduced the concept of disabling an Index in SQL Server 2005.

The syntax to disable an index is as follows:
ALTER INDEX indexname ON tablename DISABLE
e.g.
ALTER INDEX idx_TEST ON dbo.TEST DISABLE

However, contrary to popular belief, the syntax to enable the index is not
ALTER INDEX idx_TEST ON dbo.TEST ENABLE

The correct syntax to enable an Index is
ALTER INDEX idx_TEST ON dbo.TEST REBUILD

Remember the keyword REBUILD for it.
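
If you want to confirm whether an index is currently disabled, a quick query against sys.indexes does the trick (using the dbo.TEST table from the example above):

SELECT name, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.TEST')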

Monday, September 28, 2009

The Evolution and Future of Business Intelligence

A good article explaining the evolution of Business Intelligence taking an example of a hypothetical Telecom Company and how it emerges during the different stages of evolution.

BI Gen I : Reporting using the Enterprise Data Warehouse using slicing and dicing. Helps in Reporting at Speed-of-Thought.
BI Gen II : Giving real-time reporting capability to the Data Warehouse using Dashboards and Scorecards. Ability to analyze events as they happen.
BI Gen III : Reporting should not only be Real-Time but also be Actionable. Automated decision making.
BI Gen IV : Simulating Business Strategies based on past available information. Run hypothetical scenarios on real data and analyze results as if it were real.
BI Gen V : Optimize business using Artificial Intelligence as well as Traditional Statistics. Make it Autopilot.

Read the complete article here...
The Evolution and Future of Business Intelligence by Rob Walker

Sunday, September 27, 2009

OLAP Jokes to tickle your Funny Bone

I came across this collection of really funny OLAP Jokes.

Posting some of them..

Q: Why did the dimension take all day to take off its suit and put on a pair of jeans?
A: It was a slowly-changing dimension

Q: Why couldn't the hierarchy become a tightrope-walker?
A: It was unbalanced

Q: Why did the dimension keep itself to itself?
A: It was a private dimension.

Q: What's the porn director's favourite MDX function?
A: BottomCount

Q: What MDX function makes Previous Year Growth and Market Share blush?
A: StripCalculatedMembers

Q: How did the BI developer send his backed-up database to a colleague on the other side of London?
A: In a .cab file

And my favorite one.

Q: Was the cube straight or gay?
A: Neither, it was BI.

You can read more at Chris Webb's BI Blog....

Friday, September 18, 2009

US Government IT Dashboard

In June, the US Government launched an IT Dashboard to keep track of its IT spending. This is an extremely good example of the use of Business Intelligence by a federal government to bring more transparency and accountability to the way it works.

The Dashboard was developed in a flat six weeks. It is highly interactive and allows you to drill down to individual departments and their investments, and presents a scorecard of their performance.

Check out the site at http://it.usaspending.gov/

Surrogate Key in Data Warehouse

Simply put, surrogate keys are substitutes for the original primary key of a dimension table. They still act as a primary key, but are unique integer values generated by the data warehouse.

One may argue that if these are simply substitutes, why can't we use the original primary keys? Well, surrogate keys have their own advantages:
  • What if the source primary key changes? (We agree this is an unlikely situation, but yes, it can happen.)
  • What if data is integrated from multiple sources? How do we uniquely identify a row in this case? Each system may have its own key, and the same key value can appear in both systems. (This would not happen with a GUID column, because, as the name says, it is a Globally Unique Identifier. But we will come back to this later.)
Thus, as the surrogate key is generated and maintained by the data warehouse system and not the source/transactional system, we can easily overcome these issues. The surrogate key acts as a bridge over, or substitution for, the source system key.

It also helps us deal with dummy values. Normally -1 or 0 is used to denote a dummy value (e.g. if you receive sales without promotions, you can point them to a -1 "No Promotion" member instead of storing NULL values). When handling large queries, joins on integer values are more efficient than dealing with NULLs.
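
As a concrete, purely illustrative sketch, a customer dimension with a surrogate key and a dummy member could look like this (all table and column names below are made up):

CREATE TABLE dbo.DimCustomer (
    CustomerSK   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key, generated by the DW
    CustomerBK   NVARCHAR(50) NOT NULL,                  -- business key from the source (may be alphanumeric or a GUID)
    SourceSystem NVARCHAR(20) NOT NULL,                  -- disambiguates keys coming from multiple sources
    CustomerName NVARCHAR(100) NULL
)

-- Dummy member for "unknown / not applicable" facts, instead of NULL foreign keys
SET IDENTITY_INSERT dbo.DimCustomer ON
INSERT INTO dbo.DimCustomer (CustomerSK, CustomerBK, SourceSystem, CustomerName)
VALUES (-1, 'N/A', 'N/A', 'Unknown')
SET IDENTITY_INSERT dbo.DimCustomer OFF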

Another important characteristic of surrogate keys is that they should be meaningless; no intelligence should be derived from these keys. A lot of source systems have keys with hidden meanings (e.g. in a source key value XXYYYZZZ1234, the first 2 characters may refer to a country code, the next 3 to a department code, and so on). This may be fine for a transactional system but not for a data warehouse environment. If we come across such a scenario, it is always good to split out the meanings and store them in separate columns during ETL, so that later they can be referred to as individual columns in a SQL query, which is far more efficient.

Another great advantage of surrogate keys is that they help optimize space. These days a lot of source systems have alphanumeric or GUID primary keys. A GUID column occupies 16 bytes, and an alphanumeric value can be anything. An integer value generally uses 4 bytes, so compared to a GUID it saves 12 bytes per row. This may look small in isolation, or when you look only at a dimension table. But imagine a saving of 12 bytes over a fairly moderate 100 million row fact table: that is a saving of around 1 GB, and we are talking about only one column. If the table has 1 billion rows, it saves around 11 GB, and what if you have multiple GUID columns.. Huh!!!

Thus, we see the benefits of having a surrogate key over the original primary key. It also goes by the following names in many environments:
  • Alternate Key
  • Non-Natural Key
  • Meaningless Key
  • Integer Key
  • Artificial Key

Tuesday, August 4, 2009

Check if #Temporary table exists on the database

You can check whether a #Temporary table exists in the database with the following command.

SELECT * FROM tempdb.dbo.sysobjects WHERE name LIKE '#%'

e.g.
If you create a temporary table with the name #Customer you can check for it using:
SELECT * FROM tempdb.dbo.sysobjects WHERE name LIKE '#Customer__%'
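
Another common way to do the same check is with OBJECT_ID, which avoids pattern matching on the name:

IF OBJECT_ID('tempdb..#Customer') IS NOT NULL
    PRINT 'The temporary table #Customer exists'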

Friday, June 26, 2009

10 Essential Rules of Dimensional Modeling by Kimball University

Margy Ross from Kimball University has written the 10 Commandments of Dimensional Modeling. (Well, she hated to use the word Commandments, but I still like it.)

The article contains pointers, most of which we do follow while designing a Data Warehouse. But, it was really good to have all of them at one place.

I am listing the Commandments here, but please read the full article by clicking on the link here...
  1. Load detailed atomic data into dimensional structures.
  2. Structure dimensional models around business processes.
  3. Ensure that every fact table has an associated date dimension table.
  4. Ensure that all facts in a single fact table are at the same grain or level of detail.
  5. Resolve many-to-many relationships in fact tables.
  6. Resolve many-to-one relationships in dimension tables.
  7. Store report labels and filter domain values in dimension tables.
  8. Make certain that dimension tables use a surrogate key.
  9. Create conformed dimensions to integrate data across the enterprise.
  10. Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making.

Thursday, June 25, 2009

Create Clustered Index as Unique or Not?

The clustered index is very important and determines the order in which data is stored in the table. A table without a clustered index is called a HEAP. (Try to picture a heap in your mind and you can visualize how data is stored in a table without a clustered index.. Huh!!)

The clustered index also helps the SQL engine uniquely identify a row. Thus, if possible, always create the clustered index as UNIQUE. If you do not specify the UNIQUE clause while creating your clustered index, SQL Server automatically adds a hidden 4-byte uniquifier to make every row unique.

Thus, specifying the UNIQUE clause also helps save valuable database space.
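
For reference, the UNIQUE clause is simply part of the CREATE CLUSTERED INDEX statement (the table and column names below are illustrative):

CREATE UNIQUE CLUSTERED INDEX idx_Customer_CustomerId
    ON dbo.Customer (CustomerId)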

More information can be found at the following MSDN Page.

Wednesday, June 24, 2009

Handy Shortcuts for SSMS

I like to add the following shortcut keys to my SQL Server Management Studio. This makes a lot of regular day-to-day commands much easier to use.

Click on the Tools menu.
Select the Options menu.
Select Keyboard on the Environment tree.
Add the following shortcuts:

Ctrl+3 : sp_Columns
Ctrl+4 : sp_Depends
Ctrl+5 : sp_SpaceUsed
Ctrl+6 : sp_HelpIndex
Ctrl+7 : sp_HelpText


(You can add/change your own Shortcut Commands)

Click OK. You will have to re-start SSMS to use these shortcuts.

Next time you want to see the size of a table, instead of typing sp_SpaceUsed TableName, simply select the table name and press Ctrl+5, and voila...

A list of Database Engine Stored Procedures can be found here...

Comparison of Kimball and Inmon Methods

I have always been bugged and intrigued by the Kimball and Inmon methods of data warehousing. Some questions that always used to bother me were:
What is the Kimball method?
How is Inmon's data warehouse different from Kimball's?
Which model am I working on? Strangely, I did not know this, except for the fact that it was a dimensional model.

I could find a plethora of articles on the web, but this one by Mary Breslin is the best: "Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models". The article explains both methods, does a fair analysis and comparison, and also suggests which approach is best in a given scenario.

I could also find a link to the Presentation for the same article. You can download the PDF here.

Tuesday, June 23, 2009

Clear Log Space from Database

After large operations on the database, the log file may get full and you may get a warning that there is no space to execute the SQL command.

Use the following command to clear the Log File.
DBCC SHRINKFILE(LogFileName, 1)

You can get the name of the Log File with the following command
sp_helpdb databasename
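
Alternatively, sys.database_files gives you the logical file names for the current database:

SELECT name, type_desc FROM sys.database_files WHERE type_desc = 'LOG'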

DBCC SHRINKFILE is an interesting command and can also be used to shrink the data files. Read the following MSDN article to learn more about this command.

View Log Size and Space Used

Use the following command to see the current log size (MB) and space used for all Databases.

DBCC SQLPERF(LOGSPACE)

The output lists each database along with its log size (MB), the percentage of log space used, and a status.

Move a Partitioned Table to PRIMARY Filegroup

The following steps will move a partitioned table to the PRIMARY filegroup. (The table will be un-partitioned then....)

  1. Drop the existing Non Clustered Indexes on the Table (if any)
  2. Drop the Clustered Index on the table
  3. Re-create the Clustered Index on the table, specifying the Filegroup as PRIMARY (an alternative one-step approach is sketched after this list).
    CREATE CLUSTERED INDEX IndexName ON TableName(ColName) ON [PRIMARY]
  4. Re-Create the Non Clustered Indexes (if any)
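
As an alternative to steps 2 and 3, you can rebuild the existing clustered index straight onto PRIMARY with DROP_EXISTING, which avoids dropping and re-creating it in two statements; because the clustering key does not change, the non-clustered indexes generally do not need to be touched. A hedged sketch with illustrative names:

CREATE CLUSTERED INDEX IndexName
    ON TableName (ColName)
    WITH (DROP_EXISTING = ON)
    ON [PRIMARY]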

Steps for Creating Partition

  • Identify the List of Tables to be Partitioned
  • Determine Partitioning Keys in the Tables and re-align Indexes if required
  • Determine the number of partitions for each table or group of tables
  • Create Filegroups (both on Database and Filesystem)
  • Create Partition Function
  • Create Partition Scheme
  • If new Tables, Create tables on Partition Scheme (a minimal end-to-end sketch follows below)
  • If existing Tables, Create Indexes with DROP INDEX...WITH (MOVE TO...) syntax
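
Below is a minimal end-to-end sketch of those steps for a brand-new table. All names (database, filegroup, function, scheme, table) reuse the TEST_PARTITION convention from the earlier post and are purely illustrative, with just two partitions to keep it short:

-- 1. Filegroup and file (adjust the file path to your environment)
ALTER DATABASE TestDB ADD FILEGROUP TEST_PARTITION_2009_01
ALTER DATABASE TestDB ADD FILE
    (NAME = TEST_PARTITION_2009_01, FILENAME = 'C:\Data\TEST_PARTITION_2009_01.ndf')
    TO FILEGROUP TEST_PARTITION_2009_01

-- 2. Partition function: one boundary value gives two partitions
CREATE PARTITION FUNCTION TEST_PARTITION_Part_Func (INT)
    AS RANGE RIGHT FOR VALUES (20090201)

-- 3. Partition scheme: one filegroup per partition
CREATE PARTITION SCHEME TEST_PARTITION_Part_Sch
    AS PARTITION TEST_PARTITION_Part_Func
    TO (TEST_PARTITION_2009_01, [PRIMARY])

-- 4. New table created directly on the partition scheme
CREATE TABLE dbo.SalesFact
(
    DateKey INT   NOT NULL,
    Amount  MONEY NOT NULL
) ON TEST_PARTITION_Part_Sch (DateKey)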

Identify your SQL Server Version

The following code can help you identify your SQL Server Version.

SELECT
    SERVERPROPERTY('ProductVersion') AS ProductVersion,
    SERVERPROPERTY('ProductLevel')   AS ProductLevel,
    SERVERPROPERTY('Edition')        AS Edition


Well, more details on this can be found in KB321185.

Partition Alignment, LEFT or RIGHT?

While defining the partition ranges you need to specify whether the partition function is LEFT aligned or RIGHT aligned. Unless you have a good grasp of this, it can be very confusing.
Let us break the code..... :)

A partition boundary defined with:
LEFT means the boundary value is the upper boundary of the partition to its left (the value falls into the lower partition)
RIGHT means the boundary value is the lower boundary of the partition to its right (the value falls into the upper partition)

Also, the number of boundary values is always 1 less than the total number of partitions; e.g. a table with 5 partitions will have 4 boundary values specified.

Let us try to understand it all with an example.
A partition function defined with LEFT as follows
RANGE LEFT for VALUES ('20010101', '20020101', '20030101', '20040101')
will have the following partition ranges
<= 20010101
20010102 to 20020101
20020102 to 20030101
20030102 to 20040101
>= 20040102

(Notice the first date of every partition range is 1 day more than the boundary value specified, as LEFT makes the boundary an upper boundary.)

Similarly, a partition function defined with RIGHT
RANGE RIGHT for VALUES ('20010101', '20020101', '20030101', '20040101')
will have the following partition ranges
< 20010101
20010101 to 20011231
20020101 to 20021231
20030101 to 20031231
>= 20040101

(Notice the first date of every partition range is the same as the boundary value specified, as RIGHT makes the boundary a lower boundary.)

Thus, you can see it is simpler to specify a partition range as RIGHT aligned for date values.
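
If you want to verify this behaviour yourself, the $PARTITION function tells you which partition a value lands in. A small sketch, assuming SQL Server 2008's date type; the function names pf_YearLeft and pf_YearRight are made up for the example:

CREATE PARTITION FUNCTION pf_YearLeft (DATE)
    AS RANGE LEFT FOR VALUES ('20010101', '20020101', '20030101', '20040101')
CREATE PARTITION FUNCTION pf_YearRight (DATE)
    AS RANGE RIGHT FOR VALUES ('20010101', '20020101', '20030101', '20040101')

-- The boundary value itself lands in different partitions:
SELECT $PARTITION.pf_YearLeft('20010101')  AS LeftAligned,   -- returns 1 (stays in the lower partition)
       $PARTITION.pf_YearRight('20010101') AS RightAligned   -- returns 2 (moves to the upper partition)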

Specifying a Partition Boundary with a datetime data type has additional complexities. We will discuss this in a later post.

New Date and Time data types in SQL Server 2008

We developers have always struggled with storing either just a date or just a time value in SQL Server, because earlier versions had only the "datetime" and "smalldatetime" data types. So even if the requirement is to store only a date, you also get a time part stored in the column, typically with a default value of 00:00:00. The same goes for time.

With SQL Server 2008, Microsoft did the developers a big favor by introducing 2 new data types, "date" and "time", which allow us to do exactly that: store a date, or store a time. It looks much cleaner now.... doesn't it?

There are also 2 additional data types for date and time:
datetime2: an extension of the earlier datetime with added precision and a larger date range.
datetimeoffset: included to store a datetime along with time zone awareness.
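
Here is a quick, illustrative taste of the four new types (this requires SQL Server 2008, which also allows declaring and initializing a variable in one statement):

DECLARE @d  DATE           = '2009-06-22'
DECLARE @t  TIME(3)        = '13:45:30.123'
DECLARE @d2 DATETIME2(7)   = SYSDATETIME()
DECLARE @do DATETIMEOFFSET = SYSDATETIMEOFFSET()

SELECT @d AS JustDate, @t AS JustTime, @d2 AS HighPrecision, @do AS WithTimeZone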

More information about new date and time data types in SQL Server 2008 can be found here..

Monday, June 22, 2009

SQL Server 2008 Overview

A very good presentation by Microsoft on the overview and new features on SQL Server 2008.