The structure above has four main components. These are:
Extracting the Data
I'll show how to extract data from different types of files. In one example, we will read in the entire data from an Excel file. In another example, we will read in data from a specific row (In case someone provided you an Excel file with headers and titles that you want to exclude when extracting data). We will also read in data from a comma delimited file and a text file. I will also tell you what file formats are preferable in what circumstances (extracting from text files gives you the greatest level of control on what you want to read into your program, Excel gives you the least. However, Excel being a very popular file format means that you may be receiving a lot of your data in Excel format). In another example, we will extract data from a DB2 database (i.e., if you want to read data from your corporate databases) using SQL code in a pass-thru program (if your organization prefers Oracle, similar code can be used to extract data from Oracle databases too). A lot of people don't know that SAS has specific code that produces a GUI to add or delete individual records from a table (think of it as a back door approach to handle one-sies and two-sies, so you don't need to run the data extraction programs again). We will utilize that code too.
Data Validation
After reading in the data, you will want to ensure that it is not corrupted or useless for your purpose. In this stage, we will employ common sense techniques and filtering procedures to ascertain the quality and integrity of our data.
The Analytics Engine
This is the heart of your system, where all the 'magic' takes place. Whether four lines of a regression datastep or 120 pages of extremely sophisticated linear programming code to execute an optimization strategy or pick stocks, think of it as a black box where raw data enters in one end, and actionable information comes out the other end. This black box will contain all your business rules.
Reporting Your Output
SAS allows us to output data in Excel, comma delimited, or text formats. Moreover, one of the nice features about SAS is that it allows us to kick off Excel VBA macros from the program itself. We will use this feature to automatically format the Excel workbook to which SAS outputs our results, eliminating the need to manually pretty up our deliverable. The idea is to have the deliverable ready as soon as the SAS program finishes its run, as well as eliminate any unnecessary manual work or possibility for human errors. SAS also allows us to output data in HTML format to a specific directory. Therefore, we can output the results to an internet directory and have a web based report available to everyone with a web browser, preferably at the same time the spreadsheet deliverables are available for distribution. Think web based Dashboards with hard copy deliverables available instantaneously for distribution.
Over the next several weeks, I'll be elaborating further on how each of these system components work. I might also be tweaking the overall system to introduce improvements, such as introducing a control program and scheduling reports to run automatically. Finally, I will be providing several actual examples where this architecture has been implemented.
1 comment:
One of the principal features of the new system is that service requesters will be able to track the progress of their requests online.
Post a Comment