OLAP Cubes provide a convenient way to crunch data in Excel. Imagine a Pivot Table pulling information from thousands or millions of data points with hundreds of columns and you being able to share this humongous data set with colleagues using an Excel spreadsheet that weights nothing more than a few Kilobytes. Add to this the ability to update the source data while the users get to look at that information real time. That’s what a Pivot Table based on an OLAP cube can do for you. A disclaimer : You will not be able to create an OLAP cube in Excel 2007 – the functionality has been deprecated. To create an OLAP cube you will have to rely on good ol’ Excel 2003 though you can create a pivot table in Excel 2007 with an OLAP cube as the source.
What is an OLAP Cube?
Wiki defines an OLAP cube (Online Analytical Processing) as “a data structure that allows fast analysis of data. It can also be defined as the capability of manipulating and analyzing data from multiple perspectives”. Each dimension of an OLAP cube would define one data category. Obviously, the cube is misnomer – an OLAP cube can have more than three dimensions.
OLAP Cubes in Excel
Excel provides a set of drivers that can be used to connect to a whole host of data sources and create your own olap cubes. (We will learn about creating an OLAP cube a little later.) During the preparation of an OLAP cube, Excel will connect to a data source and then read data from it. Once the cube has fed itself with the data points that it needs, it can act as a standalone data source. The data residing in the OLAP cube can then be presented in an Excel spreadsheet using a pivot table. In that sense, the OLAP cube acts as the data source for the pivot table (normal pivot tables will typically have their source data residing in an excel spreadsheet). The cube processes all the requests for slicing & dicing of the data sent by the pivot table. Whenever a fresh copy of the data is requested by the pivot table, the OLAP cube fetches the latest information from the data source, refreshes itself and then continues to service all the requests.
An important point to note here is that the pivot table’s size is reduced to a fraction of what a normal pivot table would have been. For all its data processing abilities, the pivot table relies upon the OLAP cube. The physical separation of the Pivot Table from the source data, achieved via an OLAP cube, allows the source data to be placed anywhere in the network and be updated real-time independent of the pivot table. If the network connection is fast enough, the user will never know the difference but will surely appreciate the compactness.
How to create an OLAP cube in Excel
Let’s first begin by looking at the source data. One major advantage that an OLAP cube provides the data miner is that it can pack a lot more data than an ordinary excel spreadsheet would. The other advantage is that you can use a variety of data sources including txt, csv , asc and others. So let’s start with a comma separated file with the following data points – product, region, date and sales.
Create a new directory and place the file with the raw data in it. In our case, the file is named source_data.txt in a directory named CUBE.
Now open up a fresh workbook using Excel 2003. Click on “Data” from the menu and then select “Import External Data” followed by “New Database Query”.
Provide a name to the data source and select the type of driver that you want to use for the purpose. In our case, since we are using a .txt file as the raw data, we will use the Microsoft Text Driver. Press Ok.
The left hand box has a list of the data sources in the directory that we selected in the previous step. You can use more than one data source at a time, though in our case, we will be sticking to just one. Move all the items (or the ones that you like) to the box on the right and press next. You can provide filtering and sorting options in the next two forms but we will just give them a miss for now and simply click “Next” which brings us to the final screen.
Here’s where the difference between Excel 2003 and Excel 2007 lie in terms of creating OLAP cubes. While in Excel 2003, you will see three options (shown below), in Excel 2007 you only be presented with two options with the option to “Create an OLAP cube from this query” having been taken away.
Fortunately for us, we still have access to Excel 2003 and we choose the third option – the one to create the OLAP cube. On pressing Ok, the OLAP Cube wizard opens up.
Press ok and move onto the next step where we specify the field that we would like to summarize. Think of this as the data field in a pivot table. In our case, it is sales since we would like to analyze sales across product, regions and date ranges.
Press “Next”. Specify the fields that we would like to use to make the OLAP cube. Choose only the ones that you are sure that you would like to use – choosing everything, especially if you have a large number of fields, can slow down the cube’s response time.
Press Next where we provide the location where we can store the cube.
Press Ok and the application will provide a new form asking for the path where to store the OLAP query . The moment you press ok, the OLAP cube creation process begins and may take a few seconds. This is the points where Excel pack the cube with information from the raw data file. The moment the process is complete, the application would open up the familiar Pivot Table Creation Wizard form. At this point, if the sole purpose was to make an OLAP cube, we can cancel (press esc key). If you would like to proceed, you can use specify the pivot table layout and save the file. In our case, since we wanted Excel to create the cube for us, we click Cancel. Let’s move over to Excel 2007 to see how we can use this cube to create a Pivot Table. If we go back to the specified directory, we will see a new file with a .cub extension having been created by Excel. That’s the cube !
How to use an OLAP Cube in Excel to make a Pivot Table
Open up an Excel spreadsheet and click on the Pivot Table tab from the ribbon.
Choose to “Use an external data source” and then use the “Browse for more” button to locate the .cub file that we created in the previous step.
The form picks up the OLAP cube as the data source and shows the connection name.
Save the worksheet. Now comes the good part. Just go back and look at the size of the file – in our case, we packed more than 65000 rows of data and the file size was a tiny 11 KB !!! The OLAP cube has stored all the for use by the Pivot Table reducing its footprint to a fraction of a conventional Pivot Table. You can practically do every thing that you do with a normal pivot table here – move fields around, sort, slice just as you would a normal pivot table. Another interesting thing – whenever you refresh the pivot table, the OLAP cube communicates with the raw data file and uploads fresh information but otherwise acts pretty much independent unit.
So what’s in it for the Data Miner
1. Reduced file size. You can pack a million rows in to an OLAP cube, create a pivot table out of it and have it sent to users without sweating about the file size, moving emails to offline folders or the ugly “you have used more than your authorized limit” taunts from the server
2. You can update the raw data file independent of the workbook that you may have already dispatched to users. So no worrying about having to send a fresh copy every time fresh data arrives. A single refresh of the pivot table will cause it to communicate to the OLAP cube and update itself with the newest data points. (While creating the OLAP cube, you can provide any path on the network to store it. Till the time the path is accessible to the workbook, your job is done.)
3. Showcase your skill, sit back and smile
Bonus if you’ve been able to get to this point :
Checking out if a Pivot Table is based on an OLAP Cube
Here’s how you can check if a Pivot table is based on an OLAP Cube:
Dim pvt As PivotTable
Set pvt = ActiveSheet.PivotTables(1)
MsgBox "Pivot Table uses OLAP Cube " & pvt.PivotCache.OLAP