YTD Calculations

Whether you’re creating a report, chart, KPI, or calculated measure, at some point you’ll probably need to add year-to-date calculations. It doesn’t matter if you’re using SQL or MDX, the technique to generate the From and To dates for a YTD calculation remains the same.

The general concept goes like this:

Step 1: Get the current year and prefix it with “01/01/”. This is the From date.
Step 2: Get today’s date. This is the To date.

It seems simple enough but if your ETL process runs daily and pulls “yesterday’s” data then you’ll need to account for a 1 day lag on the To date.  For example, if today is June 15th, 2011 then your warehouse will only contain data up to June 14th.  Using the logic described above, the YTD calculation will select data from “01/01/2011″ to “06/15/2011″, even though data for June 15th hasn’t yet been loaded into the warehouse.  Although not entirely accurate, it won’t hurt anything, the YTD calculation just won’t return any data for the 15th.

Now let’s say today is December 31st, 2011.  On this day the warehouse will only contain data up to December 30th and the YTD calculation will select data from “01/01/2011″ to “12/31/2011″.  Are you beginning to see the problem?  Here’s one last example to drive the point home: Lets advance one day to January 1st, 2012.  On this day the warehouse will only contain data up to December 31st, 2011 but the YTD calculation will select data from “01/01/2012″ to “01/01/2012″.  In this case the YTD calculation will return an empty resultset and the business users will never get a full 365-day view of the data.

Ok, so we know we have to update the YTD calculation to handle the one-day lag in the ETL.  The logic now looks like this:

Step 1: Get the current year of yesterday and prefix it with “01/01/”. This is the From date.
Step 2: Get yesterday’s date. This is the To date.

To summarize, here’s what the YTD calculation will select using the original logic:

  • June 15th, 2011: “01/01/2011″ through “06/15/2011″
  • December 31st, 2011: “01/01/2011″ through “12/31/2011″
  • January 1st, 2012: “01/01/2012″ through “01/01/2012″

And here’s what the YTD calculation will select using the new logic:

  • June 15th, 2011: “01/01/2011″ through “06/14/2011″
  • December 31st, 2011: “01/01/2011″ through “12/30/2011″
  • January 1st, 2012: “01/01/2011″ through “12/31/2011″

As you can see, given the 1-day ETL lag, the new YTD calculation will always select the correct range of data, even on the first of the year.

But lets take it a step further.  I propose the idea of working with the “last completed month” or even the “last completed week”.  This way calculations are handled more gracefully.  In other words, the YTD calculation is based on January through the last completed month or week.  Keep in mind that we still have to account for the 1-day ETL lag.

This is what would happen if we decided to operate on the last completed month:

  • June 15th, 2011: “01/01/2011″ through “05/31/2011″
  • December 31st, 2011: “01/01/2011″ through “11/30/2011″
  • January 1st, 2012: “01/01/2011″ through “12/31/2011″

In order to achieve this behavior we must implement the following logic:

Step 1: Get the current year of yesterday and prefix it with “01/01/”. This is the From date.
Step 2: Get the month of yesterday’s date, append “/01/” for the day, append the year of yesterday’s date, and subtract one day. This is the To date.

The benefit of this approach is that business users have up to a month to analyze data produced by year-to-date calculations.  However, it’s a double edged sword because business users have to wait up to a month.  Admittedly this approach may be a better choice for “rolling” calculations but this will depend on what works best for the business.  In some cases using the last completed week may be preferred over using the last completed month since users still have time to analyze data but don’t have to wait long for new data to be included in the calculation.

Posted in MDX, SQL | Leave a comment

When To Use NVARCHAR

The Short Answer

Only when you need to store multi-lingual (unicode) data. 

The Nitty Gritty

The fundamental unit of storage in a SQL Server database is an 8 KB page.  To paraphrase the MSDN, SQL Server reads and writes whole data pages and as a result, the maximum amount of data that can be contained within a single row on a page is 8,060 bytes (8 KB). 

The page size in turn limits the maximum size of VARCHAR, a variable-length non-Unicode character datatype, to 8,000 bytes.  Take note!  The operative word here is non-Unicode.  A non-Unicode character is stored as UTF-8 (8 bits) and requires one byte of storage per character, meaning that the VARCHAR datatype may contain at most 8,000 characters. 

In contrast, NVARCHAR is a variable-length Unicode datatype.  And unlike non-Unicode characters, Unicode characters are stored as UTF-16 (16 bits) and require two bytes of storage per character.   Because of the SQL Server page size, NVARCHAR datatypes have the same length restrictions as their VARCHAR cousins, 8,000 bytes.  This means that an NVARCHAR datatype may contain, at most, 4,000 characters.  The net result is NVARCHAR dataypes take up twice as much space as a VARCHAR datatype.  Said another way, NVARCHAR(4000) is the same size as VARCHAR(8000).

But the NVARCHAR datatype shouldn’t be dismissed entirely.  The English language can be encoded in UTF-8, making VARCHAR the datatype of choice.  However, other languages, such as Japanese, Hebrew, Arabic, etc., have an extended set of character codes that are only found in UTF-16.  In order to store data in these languages you must use the NVARCHAR datatype, but that should be the only time.

Warehouse Beware

The rule for when to use an NVARCHAR datatype applies to both OLTP systems and OLAP systems.  If your OLTP system is riddled with unnecessary NVARCHAR datatypes you may want to consider converting them to VARCHAR in the data warehouse to save space.  It’s tempting just to convert all NVARCHARS to VARCHARS but you always have to ask yourself, was the column defined as an NVARCHAR for a reason?  This is a question you’ll need to have answered by the business users.  Is your company planning to expand their business globally?  If so, it might make sense to keep the NVARCHARs, but only for columns where it really matters.

Posted in SQL Server | Tagged , , , , , , | 4 Comments

How to Generate Insert Scripts for Existing Data

Let’s say you have a bunch of data stored in a table and you need to generate an insert script for each record. Maybe you need to save the scripts in source control or maybe you need the scripts to initialize the same table in another environment. What would you do?

Traditional Methods

Several ideas come to mind, all painfully tedious:

  • You could dynamically create the insert statements with a stored procedure

If you do a quick search you’ll find several stored procedures that will generate insert scripts. This isn’t a bad way to go in terms of effort but I would definitely scrutinize the procedure before using it. And as you’ll see in a bit, there’s an even better approach.

  • You could dynamically create the insert statements with a SQL query

Following this approach your SQL query would look something like this:

SELECT 'INSERT INTO dbo.MyTable (MyCol1, MyCol2) VALUES (' + MyCol1 + ',' + MyCol2 + ')' AS InsertStatement FROM dbo.MyTable

If your table is very wide then the script can be very complex to write and difficult to read and maintain.

  • You could dynamically create the insert statements with Excel

This is a two-step approach where first you select all data from the table and then copy it into Excel.  You then add columns before and after each value so you can create an expression for the insert statement.  For example, cell A1 would look something like this:

=INSERT INTO dbo.MyTable (MyCol1, MyCol2) VALUES (

Using the fill handle you drag the expression down for each row and repeat for each of the other columns (remember, each value you insert needs to be separated by a comma and if you have character data then it must also be enclosed in single quotes). Once you copy your script out of Excel you will notice that the values in each column are separated by a tab. If you are inserting character data you will need to either update your expression and surround it with the RTRIM() and LTRIM() functions, or manually remove the tab yourself. This can get very ugly very quickly.

  • You could manually type the insert statements

When all else fails you can always start to write the script yourself. Depending on how many rows of data you have, and the width of the table, this would require a tremendous amount of effort,

  • You could convince someone else to generate the scripts for you

A-la Tom Sawyer and the fence, your mileage may vary with this approach.

A Better Way

Luckily, there is a far easier approach where SQL Server does all the work.  Unfortunately, if you’re running SQL Server 2005 or earlier you’re still limited to the approaches mentioned above – this well-hidden gem is only available in SQL Server 2008 and later.

Starting in SQL Server Management Studio, navigate to the database where your data lives in the Object Explorer.  Right click on the database name, go to Tasks, and select Generate Scripts.

The SQL Server Scripts Wizard will appear, click Next to begin.

Select the database where your data lives.  In this example I am choosing AdventureWorksDW2008.

In the Choose Script Options there are a number properties that you can tweak.  This is where the magic happens!  In order to generate insert scripts you need to set the “Script Data” property located in the “Table/View Options” group to True (by default it’s set to False). 

Select the type of objects you  want scripted.  Since we want to generate insert scripts the object type table nust be selected.

Select the tables for which you want to generate insert scripts.  Here I’ve chosen DimAccount.

Select your output option. 

Review your settings.  Notice that “Script Data” is set to True.

Let the wizard spin…

And voila!  An insert statement for each record (along with the table definition) has been generated!

So Now What?

I *highly* recommend that you check out and familiarize yourself with all of the options in the Check Options screen as it contains numerous ways to customize your scripts that could end up saving you a lot of time!

Posted in SQL Server, SSMS | Tagged , , | 6 Comments

Virtual PC and Windows 7

This morning while trying to spin up one of my Virutal PCs I ran into some peculiar error messages.  In case the same thing happens to you, the workaround is pretty simple. 

When I opened Virtual PC and selected my VMC file I received the following message:

Could not register the virtual machine.  The virtual machine configuration could not be added. User does not have sufficient access rights.

Even if I ran Virtual PC as administrator I still got that error.  The fix is to create a new virtual machine.  When you do so, make sure to point to your existing VHD file.  At this point you’ll probaly get this message:

Virtual PC . . . was unable to write to one of its virtual hard disks.

The fix here is to reference the VHD file by share name, not drive name.  So instead of simply pointing to C:\MyVirtualPCs\MyFavoriteVirtualPC.vhd, you have to point to \\ComputerName:\MyVirtualPCs\MyFavoriteVirtualPC.vhd.  Now, since I keep my VHD files on an external hard drive, this meant that I had to create a share on its parent directory. 

Thanks to Damir Dobric for posting the solution!

Posted in Windows | Tagged , , | 1 Comment

Prevent SSRS 2008 from Overwriting Datasets

Report development is one of my favorite areas of Business Intelligence and I have to say I’ve been fairly happy with the new interface for SSRS 2008.  Except for when it comes to MDX queries. 

Off the top of my head I can think of three major issues with the way SSRS 2008 handles MDX queries.  To be fair, only one of those issues is related to 2008.  The other two issues, which I’m saving for another blog post (no, I’m not even going to hint at them!), are related to SSRS in general, not 2008 specifically.

So what issue am I talking about?  Well, if you’ve ever developed a report in SSRS 2008 that used a parameterized MDX query then I’m pretty sure you’re familiar with it: after switching to MDX mode, making and saving changes to the report query, the MDX Query Designer overwrites the parameter datasets and wipes out all changes made to these parameter datasets.

If you define a parameter in your MDX query, SSRS 2008 will automagically create the parameter, and its dataset, for you.  I have to admit, that’s pretty slick.  Unless the dataset already exists, in which case it gets overwritten.  In fact, anytime you modify an MDX query that references the parameter, the dataset will get overwritten.  If you’re using the generic dataset then this isn’t a problem.  But if you’ve customized the dataset in any way, then it gets annoying fast.  Really fast.  Can you imagine having to update an MDX query that referenced several parameters?  And can you imagine having to make multiple tweaks to said MDX query, just so it would operate just right?

After suffering a mild aneurism I finally said Enough is Enough.  Something must be done.  And, thanks to Teo, something has been done.  Well, kinda-sorta-not really.  There is a bug listed on the Microsoft Connect site but due to introducing backward-compatibility issues, Microsoft is unable to fix this behavior in the current release.  However, they have promised to fix the behavior in a future release.  (Has it been fixed in R2?)

In the meantime, you can perform the workaround described below.  But be warned, it involves manually editing the XML code, so you should probably make a backup of your RDL in case things go horribly awry. 

How to Prevent SSRS 2008 from Overwriting Datasets:

  1. Open the report in XML mode.  There are two ways to do this: One way is to navigate to the RDL file in Windows Explorer, right click on the RDL, and select Open With…  and then choose Notepad.  The second way is to open the report in BIDS, right click on the report in the Solution Explorer, and select View Code. 
  2. Do a search on “<Dataset>”, and add “<rd:SuppressAutoUpdate>true</rd:SuppressAutoUpdate>” to the “<Query>…</Query>” tags, as shown below.  Just to be on the safe side, I added this line to all of my datasets.

image

Posted in SSRS | Tagged , , , , , , | 2 Comments

SQL Server 2005 Installation Failures

Recently I was trying to install SQL Server 2005 on a workstation running Windows XP Professional SP3 and for some reason the installation kept failing. The first item in the list that failed was MSXML 6.0. After combing through the log file I found the following message: “Product: MSXML 6 Service Pack 2 (KB954459) — Configuration failed.”

According to http://support.microsoft.com/kb/968749, when SQL Server Setup tries to install MSXML 6.0 SP2, it detects that the version of MSXML 6.0 that is described in Knowledge Base article 954459 is already installed on the computer. Therefore, the validation process indicates that the current version of MSXML 6.0 should not be replaced. This stops the installation of both MSXML 6.0 and SQL Server 2005.

To work around this issue, use the Windows Installer CleanUp utility to remove references to MSXML 6.0 Service Pack 2. The Windows Installer CleanUp utility can be downloaded here: http://support.microsoft.com/kb/290301. Once installed, simply run the utility, select MSXML6 Service Pack 2 (KB954459) [6.20.1099.0] in the list and click Remove. After MSXML 6.0 has been removed then you should be able to install SQL Server 2005 without any further problems.

Posted in SQL Server | Tagged , , , | 1 Comment

SharePoint JavaScript Error: Library Not Registered

The other day one of my clients upgraded from Outlook 2003 to Outlook 2007.  Since then, whenever she went to her company’s SharePoint site, she received the following error: 

================================================================== 

Website error details 

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0;
.NET CLR 1.1.4322; MS-RTC LM 8; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Timestamp: Mon, 4 Jan 2010 18:27:45 UTC

Message: Library not registered.

Line: 1935
Char: 4
Code: 0
URL: http://server/_layouts/1033/init.js?rev=ck%2BHdHQ8ABQHif7kr%2Bj7iQ%3D%3D

==================================================================

Apparently the Outlook upgrade did something to a DLL (removed it?) that SharePoint needs.
If you have just upgraded from Outlook 2003 to Outlook 2007 and you receive a similar error, simply run Office Diagnostics and it should fix the problem!

To run Office Diagnostics follow these steps:
1. Open Microsoft Outlook 2007
2. On the Help menu, click Office Diagnostics
3. Click Continue, and then click Start Diagnostics

Posted in SharePoint | Tagged , , , , , , | 12 Comments

Adding Custom Components to SSIS

Below are the basic steps for adding custom components to SSIS; however, for more in-depth information I highly recommend reading Matthew Roche’s post about deploying and testing custom SSIS components

1. Register the custom assembly in the Global Assembly Cache (GAC)

Copy the DLL to the c:\windows\assembly\ directory.  The easiest way is to drag and drop the DLL using Windows Explorer (right clicking on the DLL and doing a copy/paste will not work).  Alternatively you can use the command line utility called GACUTIL.EXE (for more info about GACUTIL see the link above to Matthew’s Roche’s post).  Either way you will need Admin rights. 

2. Copy the custom assembly to Visual Studio’s “special folder” 

Copy the DLL to the appropriate subfolder in the C:\Program Files\Microsoft SQL Server\90\DTS\ directory.  For Control Flow Tasks put the custom assembly in the Tasks folder; for Data Flow Tasks use the PipelineComponents folder.  Remember that Visual Studio is a 32 bit application.  If you’re running on a 64 bit machine, make sure to use the Program Files (x86) path, otherwise you won’t see your component listed in the Choose Toolbox Items window of Step 3.

3. Add the component to the SSIS toolbox 

Open an Integration Services project in Visual Studio, right click anywhere in the Toolbox, and select Choose Items.  Go to the appropriate tab in the Choose Toolbox Items window and select the appropriate components.  

Posted in SSIS | Tagged , , , | 1 Comment

The Path Specified Cannot Be Used At This Time

This morning I needed to check the properties for one of our SharePoint application pools so I logged onto the server and opened Internet Information Services (IIS) Manager.  To my surprize the root was empty; it should have automatically connected to the localhost.  To my even greater surprize, all attempts to manually get IIS to connect to the server yeilded the message “The path specified cannot be used at this time”. 


After a quick search I found a post by Old Nick stating that this problem is easily resolved by restarting the Windows SharePoint Services Timer service.  That’s pretty obvious, right?  Sure enough, restarting that service solved the problem.  The next time I opened IIS it connected to the localhost. 

Posted in SharePoint | Tagged , , | Leave a comment

PPS Filters: Beware of Changing Keys

The other day I thought it would be fun to arbitrarily change the keys for all the members in one of my dimensions.  Actually, there was a legitimate reason but after the fact I realized there was a much better way to solve my problem that didn’t involve changing the keys.

Nevertheless, the point is this: If you have a PerformancePoint (PPS) filter linked to an attribute in a dimension, and the keys change, your filter will break. 

Lets say you have a PPS dashboard containing a filter and a scorecard.  The filter is linked to the Category attribute in the Product dimension.  If the keys in the Product dimension change from Bikes having a key of 3 and Accessories having a key of 4 to Bikes having a key of 4 and Accessories having a key of 5, then whenever you select Bikes in the filter the scorecard will display the results for Accessories.  If these were the only two members in your dimension I’m not sure what would happen if you selected Accessories but I assume the scorecard would just show empty cells.  Or spontaneously combust.

To fix this all you have to do is edit the filter by clicking on Member Selection, going through the wizard, and publish/deploy your dashboard.

Posted in MDX, PPS | Tagged , , | Leave a comment