PDV (Program Data Vector)
Default operation
(1) The data step is an implicit loop: it is rerun until it is out of data. On each pass, data are first read into the Program Data Vector (PDV).
Data A; Put _all_; Input X1 X2 Y; put _ALL_//; put "Next Pass";
Datalines;
5 6 10
2 8 20 35
4 Z 30
1
2
3
;
Proc Print noobs data=A; run;
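(2) At the end of each pass, the contents of the PDV are written to the data set and control returns to the top of the step. There _N_ is incremented, variables created by INPUT or assignment statements are set to missing, and the input is checked for more data. Most of these defaults can be overridden.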
DROP and RETAIN
(1) Surprisingly, DROP and RETAIN have little to do with each other.
DROP: Do not write this variable to the data set (it is still in the PDV).
RETAIN: Do not reset this variable to missing (.) on each pass.
Note: RETAIN is a declarative statement. It takes effect at compile time, not at execution time.
Data A; put _all_; input X; drop X; XSQ = X*X; put _all_ //;
Datalines;
1
2
3
;
Proc Print; Run;
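(2) RETAIN can also give a variable an initial value. This is done at compile time rather than execution time, so the initial value is in place before the first pass even if the RETAIN statement appears at the end of the step.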
Data A; put _all_; input Y;
drop X; XSQ = X*X; X = X+1;
put _all_ //; retain X 3;
Datalines;
1
2
3
;
Proc Print; Run;
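A further sketch of RETAIN, carrying a value from one pass to the next. The data set name RunMax and the variable MaxSoFar are made up for illustration. Because MaxSoFar is retained, it still holds the previous maximum when the next X is read; the MAX function ignores the missing value on the first pass, so MaxSoFar should come out as 3, 7, 7.

```sas
/* Sketch only: RunMax and MaxSoFar are hypothetical names. */
data RunMax;
  retain MaxSoFar;               /* starts missing, not reset on each pass */
  input X;
  MaxSoFar = max(MaxSoFar, X);   /* MAX ignores missing arguments */
  put _all_;                     /* show the PDV on each pass */
datalines;
3
7
5
;
proc print data=RunMax; run;
```

Remove the RETAIN statement and rerun: MaxSoFar is then reset to missing at the top of each pass and simply equals X.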
The OUTPUT statement:
(1) When you issue an OUTPUT statement, the PDV contents are output at that point and nowhere else (unless there is another OUTPUT statement).
Data A; Input X; do i=1 to 10;
Y = X+I; *output;
end;
Datalines;
5
;
Proc Print; Run;
(2) You can output conditionally:
Data A; Input lbs price_per_lb;
Price = lbs*price_per_lb;
If Price < 2.00 then output;
Datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
Proc Print; Run;
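OUTPUT can also name the data set to write to, which lets one pass route an observation to one of several data sets. A minimal sketch using the same data; the data set names Cheap and Costly are made up for illustration.

```sas
/* Sketch only: Cheap and Costly are hypothetical data set names. */
data Cheap Costly;
  input lbs price_per_lb;
  Price = lbs*price_per_lb;
  if Price < 2.00 then output Cheap;   /* PDV written to Cheap only */
  else output Costly;                  /* PDV written to Costly only */
datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
proc print data=Cheap; run;
proc print data=Costly; run;
```

With these data, three observations should land in Cheap and one (Price = 6.00) in Costly.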
USING _N_
(1) _N_ exists only in the PDV; it is never written to the data set. Use it inside the data step.
Data A; Input lbs price_per_lb;
If _N_=1 then bill=0; retain bill;
Price = lbs*price_per_lb; Bill=Bill+Price*(1.07); *Tax;
Datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
Proc Print; Format bill dollar6.2; Run;
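Since _N_ is not written to the data set, one way to keep the pass number is to copy it into an ordinary variable. A small sketch; the variable name PassNo is made up for illustration.

```sas
/* Sketch only: PassNo is a hypothetical variable name. */
data A;
  input lbs price_per_lb;
  PassNo = _N_;        /* ordinary variable, so it is written to the data set */
datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
proc print; run;
```

The printed data set should show PassNo but no column named _N_.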
(2) A sum statement such as T+1; automatically implies a "Retain T 0;".
Data A; Input lbs price_per_lb;
Price = lbs*price_per_lb; Bill+Price*(1.07); *Tax;
Datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
Proc Print; Format bill dollar6.2; Run;
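For comparison, the sum statement can be written out the long way with an explicit RETAIN and the SUM function. This sketch should produce the same Bill values as the step above; the data set name B is made up for illustration.

```sas
/* Sketch only: the long form of  Bill + Price*(1.07);  */
data B;
  input lbs price_per_lb;
  retain Bill 0;                    /* initial value 0, never reset */
  Price = lbs*price_per_lb;
  Bill = sum(Bill, Price*1.07);     /* SUM ignores missing arguments */
datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
proc print; format Bill dollar6.2; run;
```

SUM is used rather than a plain + so that a missing Price does not turn the running Bill into missing.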
KEEP
(1) The KEEP statement really means “Drop everything but …” so nothing else in the PDV gets
into the data set.
Data A; Input X1 X2 X3 X4 X5;
Keep X3 X4 X5; Drop X2 X3;
Datalines;
5 10 15 20 25
;
Proc Print; Run;
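A simpler sketch with KEEP on its own: every variable is in the PDV and can be used in calculations, but only the variables listed in KEEP are written out. The data set name K and the variable Total are made up for illustration.

```sas
/* Sketch only: K and Total are hypothetical names. */
data K;
  input X1 X2 X3;
  Total = X1 + X2 + X3;   /* X2 and X3 are still available in the PDV */
  keep X1 Total;          /* only these two reach the data set */
datalines;
5 10 15
;
proc print data=K; run;
```

The printed data set should contain only X1 = 5 and Total = 30.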