Program Data Vector (PDV)

Default operation

(1) The DATA step is an implicit loop: it reruns until it is out of data. On each pass, data are first read into the Program Data Vector (PDV).

Data A; Put _all_; Input X1 X2 Y; put _ALL_ //; put "Next Pass";

Datalines;

5 6 10

2 8 20 35

4 Z 30

1

2

3

;

Proc Print noobs data=A; run;

(2) At the end of each pass, the contents of the PDV are written to the data set, the variables in the PDV are set to missing, _N_ is incremented, and the input file is checked for more data. Most of these defaults can be overridden.
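The reset to missing is easiest to see by printing the same variables at the top and at the bottom of each pass. The following is a minimal sketch; the data set name reset_demo and the variable Total are made up for illustration.

```sas
data reset_demo;
  /* Top of the pass: X and Total are missing again, _N_ has gone up by one */
  put "Top of pass: " _N_= X= Total=;
  input X;
  Total = X * 2;
  /* Bottom of the pass: these are the values written to the data set */
  put "End of pass: " _N_= X= Total=;
datalines;
1
2
3
;
run;
```

The log should show one extra "Top of pass" line with _N_=4: the step starts a fourth pass and stops when INPUT finds no more data.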

Drop and Retain.

(1) Surprisingly, DROP and RETAIN have little to do with each other.

DROP: Do not write this to the data set (still in PDV).

RETAIN: Do not reset this to missing (.) on each pass.

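Note: RETAIN is a declarative statement that is processed at compile time, not executed on each pass, so its position within the data step does not matter.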

Data A; put _all_; input X; drop X; XSQ = X*X; put _all_ //;

Datalines;

1

2

3

;

Proc Print; Run;

(2) RETAIN can also give a variable an initial value. The value is set once, at compile time, rather than on each pass at execution time.

Data A; put _all_; input Y;

drop X; XSQ = X*X; X = X+1;

put _all_ //; retain X 3;

Datalines;

1

2

3

;

Proc Print; Run;

The OUTPUT statement:

(1) When a data step contains an OUTPUT statement, the automatic output at the end of each pass is turned off. The PDV contents are written at the point where OUTPUT executes and nowhere else (unless there is another OUTPUT statement).

Data A; Input X; do i=1 to 10;

Y = X+I; *output;

end;

Datalines;

5

;

Proc Print; Run;
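The "another OUTPUT statement" case can be sketched as follows: two OUTPUT statements in one pass write the PDV twice, so each input line produces two observations. The data set name twice and the variable Stage are hypothetical.

```sas
data twice;
  input X;
  Stage = "before";
  output;            /* writes the PDV as it stands now */
  X = X * 10;
  Stage = "after";
  output;            /* writes the PDV again, with the updated X */
datalines;
5
7
;
run;

proc print data=twice noobs; run;
```

Two input lines go in and four observations should come out.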

(2) You can output conditionally:

Data A; Input lbs price_per_lb;

Price = lbs*price_per_lb;

If Price < 2.00 then output;

Datalines;

19 0.10

5 0.25

8 0.75

20 0.07

;

Proc Print; Run;
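A conditional OUTPUT can also name its target when the DATA statement lists more than one data set. This is a small sketch that reuses the same data; the data set names cheap and costly are invented for the example.

```sas
data cheap costly;
  input lbs price_per_lb;
  Price = lbs*price_per_lb;
  /* Each observation goes to exactly one of the two data sets */
  if Price < 2.00 then output cheap;
  else output costly;
datalines;
19 0.10
5 0.25
8 0.75
20 0.07
;
run;

proc print data=cheap; run;
proc print data=costly; run;
```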

USING _N_

(1) _N_ is an automatic variable: it exists in the PDV but is never written to the data set. Use it inside the data step.

Data A; Input lbs price_per_lb;

If _N_=1 then bill=0; retain bill;

Price = lbs*price_per_lb; Bill=Bill+Price*(1.07); *Tax;

Datalines;

19 0.10

5 0.25

8 0.75

20 0.07

;

Proc Print; Format bill dollar6.2; Run;
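Because _N_ itself is never written out, one way to keep the pass number is to copy it into an ordinary variable. A minimal sketch, where the data set name numbered and the variable Pass are hypothetical:

```sas
data numbered;
  input lbs price_per_lb;
  Pass = _N_;   /* Pass is an ordinary variable, so it is written to the data set */
datalines;
19 0.10
5 0.25
;
run;

proc print data=numbered noobs; run;
```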

(2) A sum statement such as T+1; automatically implies “Retain T 0”, so T starts at 0 and keeps its value from one pass to the next.

Data A; Input lbs price_per_lb;

Price = lbs*price_per_lb; Bill+Price*(1.07); *Tax;

Datalines;

19 0.10

5 0.25

8 0.75

20 0.07

;

Proc Print; Format bill dollar6.2; Run;
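To see what the implied RETAIN buys, a sum statement can be placed next to a plain assignment that has no RETAIN. This is a sketch only; the names compare, Running, and NotKept are made up.

```sas
data compare;
  input Price;
  Running + Price;             /* sum statement: retained, starts at 0 */
  NotKept = NotKept + Price;   /* NotKept is reset to missing each pass,
                                  and missing + Price is missing */
datalines;
1.90
1.25
6.00
;
run;

proc print data=compare noobs; run;
```

Running should accumulate across observations, while NotKept should stay missing and the log should note that missing values were generated.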

KEEP

(1) The KEEP statement really means “Drop everything but …”, so nothing else in the PDV gets into the data set.

Data A; Input X1 X2 X3 X4 X5;

Keep X3 X4 X5; Drop X2 X3;

Datalines;

5 10 15 20 25

;

Proc Print; Run;
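As with DROP, variables left out of KEEP are still in the PDV and can be used in calculations. A minimal sketch, with the data set name kept and the variable Total invented for illustration:

```sas
data kept;
  input X1 X2 X3 X4 X5;
  Total = X1 + X2;         /* X1 and X2 are still in the PDV here */
  keep X3 X4 X5 Total;     /* only these four reach the data set */
datalines;
5 10 15 20 25
;
run;

proc print data=kept noobs; run;
```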