Thursday, December 6, 2007

Tools of the Trade I: The Reporting System Architecture

Want to know more about your customers' buying behavior? What was the impact of the $5 coupons that you handed out at the metro station? Should you hand out more? Is there really a correlation between the sales of diapers and beer in your store after midnight? What other items are these grumpy dads buying? Nachos to go with the beer? Or do they prefer pretzels? In today's blog entry, I want to discuss about the tools you will be using for your analytics strategy so that you can answer questions like these. Specifically, I want to show you how to put together an analytical reporting system using SAS and Microsoft Excel to analyze your data. This system will to pull data from various sources, scrub it clean, perform your analyses, and output the result(s) in accordance to your needs. The architecture that I have shown below should be sufficient to solve a large number of your analytics problems; basically I think that you're limited only by your imagination. Most large firms have SAS licenses and almost everyone with a PC has Microsoft Excel. Both softwares work seamlessly together and create a very powerful analytics tool; however, I have noticed most companies use only a fraction of the capabilities that these two systems offer. A reason why these two products complement each other so well is because they counterbalance their weaknesses. For instance, SAS can read in hundreds of millions of records and perform very sophisticated statistical analyses on very large datasets. SAS, however, can't produce professional looking deliverables with graphs and charts that you can hand out at a meeting. Excel, on the other hand, can produce pretty looking deliverables, but is severely restrained on the number of records it can work on (65,536 rows per worksheet). The best approach, therefore, is to have SAS do the heavy lifting, and Excel do the trimmings. A note to freshman analysts. If you can, at the very least, learn and execute the components needed to implement the system shown below, there will always be food on your table and a roof over your head.



The structure above has four main components. These are:

Extracting the Data
I'll show how to extract data from different types of files. In one example, we will read in the entire data from an Excel file. In another example, we will read in data from a specific row (In case someone provided you an Excel file with headers and titles that you want to exclude when extracting data). We will also read in data from a comma delimited file and a text file. I will also tell you what file formats are preferable in what circumstances (extracting from text files gives you the greatest level of control on what you want to read into your program, Excel gives you the least. However, Excel being a very popular file format means that you may be receiving a lot of your data in Excel format). In another example, we will extract data from a DB2 database (i.e., if you want to read data from your corporate databases) using SQL code in a pass-thru program (if your organization prefers Oracle, similar code can be used to extract data from Oracle databases too). A lot of people don't know that SAS has specific code that produces a GUI to add or delete individual records from a table (think of it as a back door approach to handle one-sies and two-sies, so you don't need to run the data extraction programs again). We will utilize that code too.

Data Validation
After reading in the data, you will want to ensure that it is not corrupted or useless for your purpose. In this stage, we will employ common sense techniques and filtering procedures to ascertain the quality and integrity of our data.

The Analytics Engine
This is the heart of your system, where all the 'magic' takes place. Whether four lines of a regression datastep or 120 pages of extremely sophisticated linear programming code to execute an optimization strategy or pick stocks, think of it as a black box where raw data enters in one end, and actionable information comes out the other end. This black box will contain all your business rules.

Reporting Your Output
SAS allows us to output data in Excel, comma delimited, or text formats. Moreover, one of the nice features about SAS is that it allows us to kick off Excel VBA macros from the program itself. We will use this feature to automatically format the Excel workbook to which SAS outputs our results, eliminating the need to manually pretty up our deliverable. The idea is to have the deliverable ready as soon as the SAS program finishes its run, as well as eliminate any unnecessary manual work or possibility for human errors. SAS also allows us to output data in HTML format to a specific directory. Therefore, we can output the results to an internet directory and have a web based report available to everyone with a web browser, preferably at the same time the spreadsheet deliverables are available for distribution. Think web based Dashboards with hard copy deliverables available instantaneously for distribution.

Over the next several weeks, I'll be elaborating further on how each of these system components work. I might also be tweaking the overall system to introduce improvements, such as introducing a control program and scheduling reports to run automatically. Finally, I will be providing several actual examples where this architecture has been implemented.

1 comment:

Effort Reporting System said...

One of the principal features of the new system is that service requesters will be able to track the progress of their requests online.