Indexing
An index is an optional data file created for a SAS data set to provide direct access to specific observations within the data set. The index file consists of entries that are organized in a tree structure, and connected by pointers. Your program knows where to look, and avoids a wasting a lot of time skimming through unnecessary data. Indexes can be created in a data step or an SQL statement.
Used the following code to create an index in a data step:
data flights
(index = (origin /* Creating a simple index */
dteflt = (fltDate flightID/unique)); /* Creating a composite index */
infile 'UPS_FLIGHTS.dat';
input flightID $7. routeID $7. fltDate date9. origin $3. dest $3. distance 8.
;
run;
Later, after you're finished with that piece of code and you want to delete the index:
proc datasets library = work nolist;
modify flights;
index delete origin;
index delete dteflt;
quit;
Use the following code to create an index in a SQL statement:
proc sql;
create index origin
on flights (origin);
create unique index dteFlt
on flights (fltdate, flightID);
quit;
If you want to delete the index in a SQL statement:
proc sql;
drop index origin
from flights;
drop index dteFlt
from flights;
quit;
Conditional Merges
Conditional merges are more complicated than conventional merges. In certain instances, however, conditional merges can work much faster than conventional merges.
A conditional merge matches observations in a secondary dataset with the observations in a primary dataset. The merge steps through the primary dataset one observation at a time, seeking “matching records” in the secondary dataset.
This matching is done within the context of an ‘Event Segment’. An event segment can be defined as the set of observations in the primary dataset which shares the same values with the observations belonging to the secondary dataset for a given number of variables (i.e., trade date, issue, firm, etc.). A set of rules within the event segment allows the records from the secondary dataset to merge with the records from the primary dataset.
In our example, we created five new variables to facilitate the conditional merge logic:
FirstZ – The FirstZ variable indicates the start of an event segment.
NextZ – The NextZ variable indicates where the pointer within a particular event segment.
Zloc The Zloc variable indicates whether the record for the secondary dataset is before, in or after a particular event segment.
Continue – The Continue variable tells the whether to process the data (i.e., execute event segment rules) within an event segment or move to the next event segment. Continue set to ‘1’ implies that the secondary record is still within the event segment, and the processing should continue. Continue set to ‘0’ implies the pointer for the secondary record is after the event segment, and processing should start for the next event segment.
Noobs - Number of records in the secondary dataset for a particular event segment.
In the conditional merge, the records in the secondary dataset are merged to that of the primary dataset by the set of common variables (which also determine the event segment). If the values belonging to any of these variables in the secondary dataset are less than that of the primary dataset, Zloc is given a value of ‘-1’. If the value is greater, Zloc is given a value of ‘1’. If the values are the same, Zloc is given a value of ‘0’.
Zloc equals ‘-1’ implies that the pointer for the secondary record is before the event segment. The value for NextZ is incremented until it matches the value for FirstZ (Start of the event segment). When NextZ exceeds the number of records in the secondary dataset for the event segment, Continue is reset from ‘1’ to ‘0’ prompting the pattern to evaluate the next event segment. FirstZ (i.e., start of the next event segment) is reset to correspond with NextZ.
If Zloc equals ‘1’, the pointer for the secondary record is below the end to the event segment. Continue is reset to ‘0’ prompting the pattern to evaluate the next event segment. NextZ is reset to correspond with FirstZ (i.e., start of the next event segment).
If Zloc equals ‘0’, the pointer for the secondary record is within the event segment. During this stage, the merge will execute any rules specified within the event segment. For example, if there is a rule to put a new value for a field (i.e., Change ‘N’ to ‘Y’ for a Flag) when the primary and secondary datasets variables (i.e., Sell Date = Buy Date) are the same, that rule will be executed within the event segment. The pointer for the secondary record is also incremented by 1, causing the pointer to move to the subsequent secondary record.
At the end of the conditional merge, we should have a dataset which has all the primary records aligned with the secondary records that satisfy the rules within a particular event segment.
The code of a condition merge is given below:
data merged_data (drop=firstZ continue Zloc [any columns you don’t need to output]);
retain nextZ 1 firstZ 1 Zloc break 0;
set primary_data;
by pdate pitem ptime pprice;
continue = 1;
do while (continue=1 and nextZ<=Zcount);
set secondary data
nobs=Zcount point=nextZ;
if sdate LT pdate then Zloc = -1;
else if sdate GT pdate then Zloc = 1;
else do;
if sitem LT pitem then Zloc = -1;
else if sitem GT pitem then Zloc = 1;
else do;
if stime LT pitem then Zloc = -1;
else if stime GT pitem then Zloc=1;
else Zloc = 0;
end;
end;
/* before window, step forward in secondary unless at end */
if Zloc = -1 then do;
if nextZ LT Zcount then nextZ + 1; else continue = 0;
firstZ = nextZ;
end;
/* after window, reset the cursor to start of window and do next primary */
if Zloc = 1 then do;
continue = 0;
nextZ = firstZ;
end;
/* in window, step thru, check logic, but leave the cursor as is */
if Zloc = 0 then do;
/*Put business logic here*/
If sprice GE pprice + (pprice*0.1)
if nextZ LT Zcount then nextZ + 1; else continue = 0;
end;
end;
if sprice GE (pprice + (pprice*0.1)) then do;
output;
end;
run;
Hash Objects
Hash objects are a new addition to SAS, and are supposed to speed up your program while making the most efficient use of memory. An unfortunate drawback of this technique is that has hash objects cannot be used in SQL statements, only in data steps. In the example below, we are trying to merge the variables for the participant data set with those in the weight table. The merged data set is called results. If we try to perform a conventional merge procedure, we would have create two additional data steps to sort the participants and weight data sets by name, thus causing the program to require more memory as well as increasing its run time.
data participants;
input name $ gender $1. treatment $;
datalines;
John M Placebo
Ronald M Drug_A
Barbara F Drug_B
Alice F Drug_C
;
data weight (drop = i)
input data DATE9. @;
do i to 4;
input name $ weight @;
output;
datalines;
05MAY2006 Barbara 125 Alice 130 Ronald 170 John 160
04JUN2006 Barbara 122 Alice 133 Ronald 168 John 155
;
data results;
length name treatment $ 8 gender $ 1;
if _n_ = 1 then do;
declare hash h (dataset: 'participants');
h.defineKey('name');
h.defineData('gender','treatment');
h.defineDone();
end;
Set weight;
if h.find() = 0 then
output;
run;
proc print data = results;
format date DATE9.;
var date name gender weight treatment;
run;
Format tables
Format tables are similar to hash objects, although slightly trickier to implement. This method requires an additional step, which is creating the format table. Subsequently, the values of the format table can be appended to a dataset based on a key variable that resides in both format table and dataset, similar to the hash object shown above. In the example below, we use the format procedure to add the names of the stock symbols to the indat dataset, and calculate the count of stock issues. Alternatively, we could have sorted both the indat dataset and a dataset containing the issue names, merged them together and performed a summary or means procedure to get the number of issues. Our method eliminated two sort procedures, one merge procedure and one summary/means procedure. When analyzing millions of records, time and memory space savings from eliminating four procedures can be significant.
First, read in your data set.
data indat;
input @1 name $3. @5 ask 5.2 @11 bid 5.2; cards;
IBM 16.25 16.12
AOL 17.00 17.06
AOL 16.25 13.02
IBM 16.25 16.05
IBM 18.25 17.02
FNM 18.00 18.06
FNM 18.25 17.02
FRE 18.25 17.02
;
run;
Second, create your format table.
proc format;
value $Symbol
'IBM'='IBM'
'AOL'='America Online'
'FNM'='Fannie Mae'
'FRE'='Freddie Mac';
run;
When performing your analyses, recall the values from your format table.
proc freq data=indat;
tables name /list;
format name $Symbol.;
title "Issue Count";
run;
14 comments:
شركة كشف تسربات المياه بالرياض
لكافة خدمات التسليك وكشف التسربات ستجدون شركتنا فى المقام الاول كأفضل شركة كشف تسربات المياه بالرياض والتى تتخصص فى الكشف السريع والدقيق لمكان التسربات فى شبكات المياه ومواسير الحمامات والصنابير المنزلية وخطوط الغاز كشف تسربات المياه , كما تعتمد الشركة على افضل عماله بشرية كفء فى هذا المجال وكافة الخدمات المنزلية الاخرى بأرخص الاسعار والتكاليف المتاحه فى سوق الرياض
افضل شركة كشف تسربات المياه بالرياض
نظرا لان شركتنا بالرياض هى الافضل فهى تنصح عملاؤها الكرام بتجنب الاتى لعدم تعرضهم لمشكلة التسربات حيث
اسعار تكييفات 2017
اسعار تكييفات 2018
عنوان مبيعات يونيو اير
5شارع عامر امين القومية العربية بجوار الدائري
12561 إمبابة
مبيعات يونيو اير
يونيون اير
صيانة يونيون اير
تكييف يونيون اير
تكييف يونيون اير
خدمة عملاء يونيون ايرجروب
رقم خدمة عملاء unionaire
شركة يونيون اير
تكيفات يونيون ايير
اسعار تكيفات يونيون ايير
شاشات يونيون اير
رقم يونيون اير 19058
صيانة unionaire
توكيل يونيون اير
تكييفات بالتقسيط
تكييفات يونيون اير
الخطوط الارضية 0233580534/0233564318/0233564320/0233565322/0233564325/0233509275/023509267/0233509268
عنوان مبيعات شارب
4شارع عامر امين القومية العربية بجوار الدائري
12561 إمبابة
مبيعات شارب
شارب
صيانة شارب
تكييف شارب
سعر تكييف شارب
خدمة عملاء شارب
رقم خدمة عملاء شارب
شركة شارب
تكيفات شارب
اسعار تكييفات شارب
شاشات شارب
رقم شارب 19058
صيانة شارب
توكيل شارب
تكييفات بالتقسيط
تكييفات شارب
مبيعات شارب
الخطوط الارضية 0233580534/0233564318/0233564320/0233565322/0233564325/0233509275/023509267/0233509268
شراء سيارات تشليح
شراء سيارة تشليح
تشليح سيارات
اطلب الان خدماتنا المميزة في تشليح السيارات
شراء سيارات تشليح
تعرف على خدمات سطحة الرياض لنقل السيارات المصدومة والعطلانة والتالفة داخل المملكة العربية السعودية بافضل سعر
" like Do you have a vacation car in the Kingdom of Saudi Arabia and you are looking for distinctive services such as تشليح سيارات
We offer you the best sites specialized in the field of cars in Saudi Arabia, especially since this field requires expertise and efficiency and this is what we are completely distinguished by."
Do you have a vacation car in the Kingdom of Saudi Arabia and you are looking for distinctive services such as تشليح سيارات
We offer you the best sites specialized in the field of cars in Saudi Arabia, especially since this field requires expertise and efficiency and this is what we are completely distinguished by.
يمكنك التعرف على المزيد من الخدمات المميزة التي نقدمها لك في هذا الاطار داخل المملكة العربة السعودية ك عزيزي العميل شراء سيارات عطلانة والتي تعتبر واحدة من افضل خدمات البيع والشراء على الاطلا.
تشليح مكة شارع الحج
An alternative title for this post could have been ways to avoid sorting your data. Sort procedures in SAS are notorious memory hogs, and can radically slow down your program.
تشاليح جدة لشراء السيارات
An index is an optional data file created for a SAS data set to provide direct access to specific observations within the data set. The index file consists of entries that are organized in a tree structure
صيانة بيكو
صيانة بوش
صيانة براندت
صيانة باناسونيك
صيانة اندست
صيانة فريجيدير
صيانة فاجور
صيانة طومسون
صيانة سيمنس
صيانة سيلتال
صيانة ميلا
صيانة كارير
صيانة كلفينيتور
صيانة كريازي
صيانة كاندي
Post a Comment