Tuesday, March 29, 2011

Delete logical duplicates using Ranking Function

A simple way to delete duplicate (not absolute duplicate records where records are completely identical) records from table using the ranking function and CTE is as follows.

The following SQL will remove logical duplicate records from the table tblTransaction.
If the table tblTransaction has multiple records for a combination of AccountId and RecordId the DELETE syntax will keep only the latest dates and delete all previous records for the combination.

WITH Transaction_CTE (Ranking, AccountId, RecordId, StartDate) AS (
SELECT Ranking = DENSE_RANK()
OVER(PARTITION BY AccountId, RecordId
ORDER BY StartDate ASC),
AccountId, RecordId, StartDate
FROM dbo.tblTransaction )
DELETE FROM Transaction_CTE WHERE Ranking > 1

Thursday, March 24, 2011

Pass NULL String in the Conditional Expression

I have a SSIS package that reads data from a flat file and loads it into database table. The flat file should send valid values for all columns. However, I got the file with "NULL" written for integer values. The requirement was to store NULL values in the database column.

Passing NULL in SSIS is not straight forward. The MSDN article recommends using the expression NULL(DT_STR,10,1252) to return NULL.
http://msdn.microsoft.com/en-us/library/ms141758.aspx

This did not work for me. After some work, I figured that you have to again cast the NULL return value to string to make it work.
(DT_STR,10,1252)NULL(DT_STR,10,1252)

If you want to use this in Conditional expression, you can use the following format.
Pts_Earned == "NULL" ? (DT_STR,10,1252)NULL(DT_STR,10,1252) : Pts_Earned
The above expression means if it finds "NULL" string in the column Pts_Earned it will pass on NULL or else passes on the actual value.

Hope this helps.