Notes: FACTSET Variable Code Selection and Alpha Tester Conventions
In-sample date range: 1/31/1987 – 11/31/2001
Out-of-sample date range: 12/31/2001 – 12/31/2004
(Note that using 31 as the last day of the month when specifying the date range is necessary—even when there is no 31st day of the specified month. If not used in this way, lagged variables may not work properly in alpha tester.)
(Also note that FactSet’s convention is to associate the returns from the following month with the data- and portfolio selected for the final day of a given month. For example, the first in-sample portfolio is formed on Jan. 31, 1987 based on data available on Jan. 31, 1987 and the first return recorded in-sample is the return realized in February 1987 – though this return is of course associated with the date Jan. 31, 1987.)
Standard universe screens:
NYSE, NASDAQ, AMEX only
Top 60% of these by market cap.
Fuqua Investment Analytics added the following screen:
Average daily dollar volume over the past month must be greater than $500,000.
(An improved approach might scale this threshold downward in earlier periods because a portfolio manager would have had lesser dollar-liquidity requirements because he would have been managing a lesser number of nominal dollars.)
In alpha tester, always select low fractile = low values. This convention will maintain consistency across our screens and alpha tests. (This convention was used in the work of Duke Investment Analytics for Global Asset Allocation and Stock Selection. For the second course, Quantitative Stock Selection, Fuqua Investment Analytics has chosen to associate the low numbered fractiles with high factor values.)
Also resolve ties in ranking factor by choose midpoint.
Standard formulas to use for common variables:
Note that one area of remaining concern regards whether these selected variables are properly adjusted to reflect stock splits (and dividends, etc.). Further research is presently underway to verify that these phenomena are not introducing distortions into our research.
It is also noteworthy that a dollar-volume screen is sought to add to the existing screening criteria governing universe definition. Thus far, a robust implementation of such a screen in FactSet’s Alpha Testing has proved challenging and has not yet been achieved (due to limitations on parameter availability, historical parameter accuracy, and computational capability).
Price per share:
MP(0)
Also considered— CM_P, P(0), @AVAIL(P(0),MP(0))
P(0) gives all or almost all NAs
Shares Outstanding:
MSHS(0)
Though this reflects as-of-fiscal-period-end share counts that were not necessarily available (ie not as-reported), it eliminates other problems with non-contemporaneous lagged data being used in the current period. For example, companies for which shares outstanding should have been NA were finding an old (lagged 45 days) value that, multiplied with their current day price, pushed them into the market cap consideration set of companies— this caused some 80000% returns to enter our data- problems.
Also considered— CM_SHS(0), CM_SHS(0 L45D)
For future investigation- IH_SHRS_OUT
Market Capitalization:
MP(0) * MSHS(0)
Also considered—RI_MKTCAP(0)/1000, CM_MKT_VALUE(0), @AVAIL(P(0),MP(0)) * CM_SHS(0 L45D)
Average Daily Dollar Volume Over the Past Month:
???
SUM20(P_PRICE(0)*P_VOLUME(0))
???
Problem with P_VOLUME(-1), P_VOLUME(-2), etc. is that it appears to increment backwards by calendar days, not trading days as desired. Same problem with P_VOLUME(-1,0,1000), P_VOLUME(-2,0,1000), etc. These problems appear to occur in the Universal Screening Report output, but not necessarily in the Excel environment.
The following (in blue) works in Excel, but not in Universal Screening
AVG(P_PRICE(-1M,0D,D)*P_VOLUME(-1M,0D,D))
Note that -1M bounds the date range at the last day of the preceding month. -1AM would have picked out the same day as the day referenced by 0D, but a month earlier. See FactSet Help PageID 1964.
I tested this code with regard to stock splits for one stock in a recent time period and found that historical P_PRICE and P_VOLUME numbers had indeed been accurately revised in the P_ database to align with the post-split share levels. …at least this was my initial conclusion when comparing historical volume quotes for NYSE:SM in Yahoo! Finance with those outputted from FactSet to an Excel spreadsheet. Note that, at least in the case of the ERICY reverse split, it is evident that the historical volumes listed in Yahoo! Finance ARE NOT changed to reflect the new number of shares outstanding. This matter requires further attention/review.
See FactSet Help PageID 614 for info on handling of Dividends, Stock Splits, and Spinoffs.
IB_VOL_1D does not appear to increment backwards through time (at least not in Universal Screening Report (i.e. IB_VOL_1D gives same value as IB_VOL_1D(-1) or IB_VOL_1D(-100)).
Note that median dollar volume would be preferable to mean dollar volume so that our estimated liquidity of a given stock is not thrown off by a one day spike in volume that enters the trailing window sample period.
Book value per share:
AVAIL(CM_BK(0 L45D), G_BOOK_PS_USD(0 L45D))
CM_BK(0) appears to update quarterly as-of-fiscal-period-end
CM_BK(0) appears to have less missing data (NAs) than G_BOOK_PS_USD(0)
G_BOOK_PS_USD appears to update annually as-of-fiscal-period-end
After finding problems with the G_ dividend variables using current rather than historical values in alpha tester, I re-checked these G_ variables and fortunately did not find this same problem.
For future investigation: CQ_ (book value numbers that begin with “CQ_”)
Book Value-to-Price:
AVAIL(CM_BK(0 L45D), G_BOOK_PS_USD(0 L45D))/MP(0)
It looks as if CM_BK and G_BOOK_PS_USD may only update annually in the 88-89 timeframe. Ideally, we would find a variable that updates quarterly.
Also considered— CM_PBK(0), G_PBK(0)
CM_PBK and G_PBK appear to contain the same data series
Trailing EPS:
AVAIL(IH_EPS_ACT_LTM(0), CM_EPS(0 L45D))
Forward EPS Estimate:
IH_MEAN_NTM(0) (A)
Alternates:
AVAIL(IH_MEAN_NTM(0), AVAIL((G_IBES_FY1_MEAN_USD(0)+IH_MEAN_NTMYR(0))/2, AVAIL(IH_MEAN_NTMYR(0), G_IBES_FY1_MEAN_USD(0)))) (B)
AVAIL(IH_MEDIAN_NTM(0), G_IBES_FY1_MED_USD(0)) (C)
IH_MEDIAN_FY2(0) (D)
After finding problems with the G_ dividend variables using current rather than historical values in alpha tester, I re-checked these G_ variables and fortunately did not find this same problem.
Also considered—These don’t look like they offer any additional beneficial information over those already considered above: IH_MEAN_FY1, IH_MEAN_FY1R, G_PRICE_USD(0)/G_PE_IBES_EST_FY1, G_PRICE_USD(0)/G_PE_IBES_EST_FY1D, G_PRICE_USD(0)/G_PE_IBES_EST_NTM, G_PRICE_USD(0)/G_PE_IBES_EST_NTMD, CM_P(0)/CM_PE_IBES_FY1, CM_P(0)/CM_PE_IBES_FY1R
Dividend Yield:
AVAIL(CM_DIV_YLD(0), IH_DIV(0)/MP(0)*100)
CM_DIV_YLD has fewer NAs than IH_DIV. Many IH_DIV NAs are 0’s in CM_DIV_YLD. CM_DIV_YLD also appears to contain better data for ADRs than IH_DIV. Still, there are rare instances where CM_DIV_YLD has NA when IH_DIV has a value. Thus, AVAIL function is used.
Also considered—CQ_DIVS_TOT
G_DIVS_PS_USD, G_COM_DIVS_TOTAL, G_DIV_YLD appeared to return current value only, not historical values as desired in alpha testing. Dangerous! Looks like IC_ variables in this respect.
CQ_DIVS_TOT_LTM, CQ_DIV_YLD, CQ_DIV_RATE_ANNUAL all returned all NAs.
CM_DIV, CQ_DIVS_PS_PDATE and CQ_DIVS_PS may report double the actual quarterly dividend amount because if two dividend ex-dates or payments happen to occur in the same fiscal quarter, it will report the sum of them rather than one or the other.
Implied Cost of Capital(ICC):
(A)Uses Forward EPS Estimate method C (median estimate NTM or FY1) for E1 in approximation of implied cost of capital.
(D)Uses Forward EPS Estimate method D (median estimate FY2) for E1 in approximation of implied cost of capital.
Interest Expense:
CA_INT_EXP_LTD(0) too many NAs.
CQ_INT_EXP_LTM(R22 0), CQ_INT_EXP_LTM(0), I_INTEREST_EXP(USD) return NAs.
SUE:
Note that it appears SUE data is not present in database until 8/31/89.
Large anomalous-appearing returns that checked out real:
IDCC 19991130 641%
OCCF 19960430 376%
REGN 20000130 359%
ARXX 20000131 283%
Large anomalous-appearing returns that checked out bad or questionable:
Cusip Ticker Date
90011120 TKC 20010928 bad 1536%
03512820 AU 19980630 questionable 442%
22025Y40 CXW 20000831 bad 389%