Using a Screening Instrument for Domestic Violence in Welfare Reform Programs

Screening for AOD/MH/DV Page 1

The CalWORKs Project


Substance Abuse, Mental Health and Domestic Violence Issues in Welfare Reform Programs:

Technical Manual

By Daniel Chandler

March 2000




California Institute for Mental Health

2030 J Street

Sacramento, CA 95814

916-556-3480 (x-106)

916-446-4519 FAX

Sandra Naylor Goodwin, Executive Director/Project Director

Joan Meisel, Policy and Practice Consultant

Daniel Chandler, Research Director

Pat Jordan, Consultant

Shelley Kushner, Project Coordinator

Debbie Richardson Cox, Project Assistant

Children & Family Futures

4940 Irvine Blvd., Suite 202

Irvine, CA 92620


714-505-3626 FAX

Nancy K. Young, Director

Sid Gardner, President

Karen Sherman & Shaila Simpson, Associates

Family Violence Prevention Fund

383 Rhode Island Street, Suite 304

San Francisco, CA 94133


415-252-8991 FAX

Janet Carter, Managing Director

Kelly Mitchell-Clark, Program Manager

Cindy Marano, Consultant

Carol Ann Peterson, Consultant

Acknowledgement: We appreciate the generous financial support of the National Institute of Justice, Violence Against Women Office. Additional funding has been provided by California counties, the Wellness Foundation and the David and Lucile Packard Foundation.



I. An introduction to screening

A. Screening and trust

B. Screening instruments as a method of identification

II. Screening instruments, gold standards, and cut-points

A. Screening for mental health diagnoses

Recommendations for mental health screening instrument and cut-points

B. Screening for problems with alcohol

Recommendations for alcohol screening instrument and cut-points

C. Screening for problems with illicit drugs

Recommendations for illicit drug screening instrument and cut-points

D. Screening for domestic violence

Recommendations for domestic violence screening instrument and cut-points

Appendix 1: Calculation of specificity, sensitivity and positive predictive value

Appendix 2: Age and partner specific tables for domestic violence




Screening. This report provides technical information to administrators who are considering using screening instruments to assist in identifying women with AOD/MH/DV issues. It should be used in conjunction with the Screening Guide being published at the same time and available from the California Institute for Mental Health or on the web at:

Screening needs to be distinguished from assessment and “appraisal.”

  • Screening is the use of a simple, usually brief, set of questions that can indicate the need for a thorough assessment of AOD/MH/DV issues. The questions can be self-administered or administered by a staff member who may or may not be an AOD/MH/DV specialist. The outcome of a “positive screen” is a referral to a specialist for an assessment. We are also specifically referring to “instruments,” that is sets of questions that have been scientifically validated.[2]
  • Assessment is the detailed evaluation of diagnosis, severity, and history that is necessary in order to determine whether treatment or services are appropriate and if so to design a treatment or service plan.
  • Appraisal refers to one of the formal steps in the CalWORKs Welfare-to-Work process. We do not use this term in relationship to AOD, MH, or DV issues.

One of the key reasons that these short screening instruments have been little used[3] is that their psychometric properties have never been described for a population receiving welfare, particularly in a welfare reform context. This report uses information from CalWORKs Project research in two California counties to choose among and validate screening instruments for AOD, mental health, and domestic violence.

This Manual provides the background and rationale for recommendations we offer to administrators and practitioners regarding which instruments to use and which cut-points to use to achieve different ends.

Trust. These instruments only “work” in a context of trust and helpfulness. Although the questions are not “direct”—such as how often do you smoke marijuana—their purpose can be surmised by respondents. Honest answers can only be expected if the screens are administered in a setting in which CalWORKs applicants believe that if they divulge sensitive information it will be used to help them. There are many possible ways of establishing such a trusting setting, including using the screening instruments in the context of a home visit.

A caution is also in order: the insensitive use of screening instruments may well cause distress or distrust among CalWORKs applicants or recipients. In fact, it is better to avoid screening instruments entirely if a trusting context cannot be established. The issue of establishing trust is discussed at considerable length in the Guide.

The CalWORKs Project research in Kern and Stanislaus counties tested screening instruments for AOD, MH and DV.

The CalWORKs Project research in Kern and Stanislaus counties offered an opportunity to validate screening instruments for AOD/MH/DV conditions with welfare recipients. The research included interviews with 347 women who had applied for cash aid (CalWORKs) in Stanislaus County and 356 women who were on-going recipients of cash aid under CalWORKs in Kern County. The Kern sample included 18 undocumented workers and 31 women with a disability—none of whom were required to participate in Welfare to Work activities. We have included these women in this analysis because a) their prevalence rates are very similar to the Welfare to Work group and b) we have recommended that counties provide Welfare to Work service opportunities to sanctioned or exempted women.[4] Since the research interview classed each person as having or not having domestic violence issues within the past 12 months, mental health issues within the past 12 months, or AOD abuse or dependence within the past 12 months, these classifications are used as a “gold standard” with which to compare the results of screening instruments. The methodology for this research, and the characteristics of the samples, are described in a recent CIMH report available on the web.[5] The overall prevalence rates we found are summarized in Table 1.

Please note that the respondents in the research are all women, so the properties of the tests described below apply only to women. In general, it has been found that prevalence and cut-points differ for men and women, so extrapolating to male CalWORKs recipients may not be valid. And the domestic violence screening instruments we tested were specifically designed for use with women.


There are many approaches to identifying CalWORKs recipients who have AOD/MH/DV issues. They range from direct personal questions to observation of signs or symptoms to self-disclosure by the recipient. How counties are approaching the issue of identification is considered at some length in the CalWORKs Project Six County Case Study.[6] The use of brief screening instruments should be considered in the larger context of a county’s identification efforts.

Table 1: Overall prevalence of AOD/MH/DV diagnoses/issues in CalWORKs Project Research in Kern County and Stanislaus County: Summer of 1999

Number of Conditions / Kern Recipients (N=356) / Stanislaus Applicants (N=347)
None / 45% / 30%
One Only / 34% / 38%
Two Only / 19% / 26%
Three / 2% / 6%
TOTAL / 100% / 100%

Properties of Screening Instruments

Our companion report, the Screening Guide, discusses the advantages and disadvantages of using screening instruments. It also provides step-by-step factors to consider in designing a screening program. In this Technical Manual we present the considerations relevant to a choice of screening instruments and the data that back up our recommendations for which instruments to choose and which threshold or “cut point” indicates need for a formal assessment.

Costs of screening. Screening instruments have three types of costs: 1) the cost of administering the instruments; 2) the cost of false positives, since assessments are expensive; and 3) the most important cost, that of false negatives, persons who need assistance but whom the test wrongly identifies as not needing help. These costs depend in large part on the psychometric properties of the test. An ideal test would be cheap to administer, identify virtually all persons in need, and not wrongly identify many of those not in need. The ideal is rarely realized in any context, and it is very unlikely to be realized in screening for AOD/MH/DV issues. Administrators will therefore need to use the properties of the screening instruments we describe below to balance the three costs and attain an optimum for their particular agency. The advantage of having known psychometrics, which we present below, is that this balancing can be conducted knowledgeably rather than blindly.

Instrument requirements. In looking for screening instruments we wanted a very short set of questions (generally no more than five) that do not arouse anxiety by their directness. Instruments also had to be readily scoreable and interpretable to recipients.[7]

Context for administration. Although we will consider this issue more later on, it is important to recognize that the context for validation of the screening instruments did not match that which would be involved in their actual use. In a research interview in which absolute confidentiality and privacy is promised, CalWORKs recipients may be more ready to answer screening questions than they would be in a CalWORKs office or community based organization. The second stage of this process is developing—with some pilot counties—procedures that will maximize the efficiency of the screening instruments that this phase has validated with the population.

Even in the research context there was one important difference between how we administered screens for domestic violence and mental health compared to those for AOD. The screening questions for the former were asked prior to the gold standard questions—the usual way of testing instruments. However, we were concerned that the alcohol and drug screening questions might bias answers to the gold standard instruments, so we put the screening questions at the end. Although they “fit” better there, they may be affected by the participant having already answered a series of related questions.

The effects of age and race/ethnicity. Because test properties, as well as prevalence, may change from subpopulation to subpopulation we also explore the effects of age and race on the properties of screening tests. It is important to know how stable the properties of the screening instruments are and how to adjust their cut-offs to take account of subpopulation differences.

What counts as a “no” answer on a screening question. One important qualification is that all questions are converted into a “yes” or “no” answer, with the “yes” being those who were asked the question and said “yes” and “no” being all others. The reason for this is two-fold:

  • Because of the survey's skip patterns, some questions were not asked of everyone: a previous response could route a respondent past an entire section. For those who were not asked, we used the skip-out question in lieu of a "no." For example, women who skipped the alcohol section because they had not had at least 12 drinks during the previous year were presumed to answer no to all of the questions on the screen for alcohol abuse.
  • The alternative would have been to have used the responses only from those who answered the specific questions. This means the prevalence rate would be higher than we found overall, since only those at risk were asked all the screening items. It also does not correspond to the screening situation we think will be most common: a set of questions asked of all CalWORKs participants. If the screening were limited to persons for whom there was already reason to suspect high risk, then our approach would not be as useful.
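The skip-logic conversion described above can be sketched in code. This is a minimal illustration, not the study's actual data processing; the gate and item names ("drank_12_plus" and the rest) are hypothetical, not the survey's variable names.

```python
# Sketch of the skip-pattern conversion described above: respondents who were
# routed past a section (for example, because they reported fewer than 12
# drinks in the past year) are coded as answering "no" to every item in that
# section. All field names here are illustrative, not the survey's own.

def binarize_screen(record, gate_field, item_fields):
    """Return a True ("yes") / False ("no") answer for each screening item.

    If the gate question was answered "no", the respondent skipped the
    section, so every item in it is treated as a "no".
    """
    if not record.get(gate_field, False):
        return {item: False for item in item_fields}
    # Only an explicit "yes" counts as a positive answer; anything else is "no".
    return {item: record.get(item) == "yes" for item in item_fields}

# A respondent who skipped the alcohol section is coded "no" on every item:
skipped = binarize_screen({"drank_12_plus": False}, "drank_12_plus",
                          ["unable_to_cut_down", "morning_drinking"])
```

This mirrors the first bullet above: everyone receives a yes/no value on every screening item, whether or not the item was actually asked.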

What makes a screen good. In comparing each screening instrument with its “gold standard” equivalent there are several factors we look at[8]:

  • How good is the instrument at detecting persons who are “positive” for the condition? This is usually termed the “sensitivity” of the screening instrument. Those who have the condition but are not detected are called “false negatives.”
  • How good is the instrument at screening out those who really do not have the condition? This is called the “specificity” of the instrument. Those who do not have the condition but the screening instrument says they do are called “false positives.” If the rate of false positives is high it means the instrument will be inefficient, referring many to assessments who will turn out negative.
  • In actual usage, we would not know in advance how a person scored on the “gold standard,” so a very important measure is the “positive predictive value” of the screening instrument. This tells us what percentage of those who score positive on the screening instrument are likely to score positive on a more detailed assessment. (The positive predictive value is directly related to sensitivity and specificity.)
  • If instead of trying to identify persons with a condition, we were interested in “ruling out” persons (saying, for example, here is a group for which we do not have to worry about mental health problems), then we look at negative predictive value. It tells us the percentage of those screening negative who would actually be negative if assessed. Generally, we would want a very high negative predictive value before saying definitely someone does not have a AOD/MH/DV condition.
  • Each of these values is a “raw” or uncalibrated measure. We also calculated calibrated versions of the test performance statistics. These are most suited to choosing between tests.[9] Calibration is important because it removes the effects of the level of the test and the prevalence rate. For example, if a test were set to be positive for 99 percent of the sample, it would also include 99 percent of those with a diagnosis—but it would not be a good test. Similarly, even a random sample will generate a “hit” rate equal to the prevalence rate. To deal with this issue, the values are recalibrated to range from zero, when the test does no better than a random sample of the population (with a given prevalence), to 1.00, when it identifies all those with a diagnosis over and above a random sample.

  • We also report the Receiver Operating Curve (ROC) proportion for each test. This is a measure of the overall power of the screening questions to predict a diagnosis, independent of cut-point. It can range from .5, if the model has no predictive power, to 1.00, if it predicts 100 percent of the gold standard cases. The ROC proportions range from about .70 to .98 for the screening instruments presented here, so there is a wide range of performance across screens and diagnoses.

Inconsistent items. If a four-item test is to be scored using a cut-off of 1, 2, or 3, then the items should be consistent: either each item has the same likelihood of predicting a positive on the gold standard, or the probabilities of the different items are “nested,” so that all those who answer yes to D will also have answered yes to C, B, and A. Not all of the instruments met this criterion.
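The nesting condition can be checked mechanically. A small sketch, assuming the items are ordered from most to least commonly endorsed; the response patterns are invented for illustration.

```python
# Checks the "nested" criterion described above: with items ordered A, B, C, D
# from most to least endorsed, a "yes" on a later item should imply a "yes"
# on every earlier item (a Guttman-type pattern). Data here are invented.

def is_nested(patterns):
    """patterns: one tuple of True/False answers per respondent, items in
    order A, B, C, D. Returns False if any respondent answers "yes" to a
    later item after a "no" on an earlier one."""
    for pattern in patterns:
        seen_no = False
        for answer in pattern:
            if not answer:
                seen_no = True
            elif seen_no:  # a "yes" following a "no" breaks the nesting
                return False
    return True
```

A respondent answering yes only to A and B fits the pattern; a respondent answering yes to B but no to A does not.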

A practical example

As an aid to understanding these test properties, try figuring out the results for a population of 1000 new applicants for aid. For example, using the statistics below taken from Table 2 (showing information for the MH5 with “Major Depression” as the gold standard and a cut-point of .20), we would expect:

1) Out of the 1000 persons there would be 270 who were “truly” depressed. (27% prevalence times 1000)

2) Using the .20 cut-off in this illustration, however, we would identify 480 as positive on the test and needing to be assessed. (48% is the percent positive, or level of the test; so 48% times 1000)

3) Those scoring positive on the test would include 89 percent of the 270 who are depressed, or 240 persons. (Sensitivity of 89% times the 270 true positives.)

4) Since there are 270 true positives there are 730 true negatives. The 63 percent specificity means that the test correctly identifies as “negative” 63 percent of the true negatives (460 persons).

5) Out of the 480 who scored positive, 45 percent, or 216, would be identified as being depressed upon an assessment (positive predictive value).

6) Of those with a negative on the test, 90 percent would be found to be true negatives if we were to do an assessment on them (negative predictive value).

7) The ROC of .80 indicates an overall moderate predictive capacity.

Illustration taken from Table 2.

Prevalence / Percent Positive on Test / Sensitivity / Specificity / Positive Predictive Value / Negative Predictive Value / Receiver Operating Curve
Cutoff = .20 / 27% / 48% / 89% / 63% / 45% / 90% / .80
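The seven-step arithmetic above can be reproduced directly from the prevalence, sensitivity, and specificity figures. Note that because the published percentages are rounded, values derived from them do not exactly match the illustration (for example, the derived positive predictive value comes out near 47 percent rather than the reported 45).

```python
# Reproduces the worked example above for 1,000 applicants, using the rounded
# figures from the illustration (prevalence 27%, sensitivity 89%, specificity
# 63%). Rounded inputs mean the derived positive rate, PPV, and NPV differ
# slightly from the reported 48%, 45%, and 90%.

n = 1000
prevalence, sensitivity, specificity = 0.27, 0.89, 0.63

truly_depressed = prevalence * n                  # step 1: 270 persons
truly_not = n - truly_depressed                   # 730 persons
detected = sensitivity * truly_depressed          # step 3: ~240 flagged by the screen
ruled_out = specificity * truly_not               # step 4: ~460 correctly negative
false_pos = truly_not - ruled_out                 # ~270 referred unnecessarily
missed = truly_depressed - detected               # ~30 false negatives

screen_positive = detected + false_pos
ppv = detected / screen_positive                  # ~0.47 (positive predictive value)
npv = ruled_out / (ruled_out + missed)            # ~0.94 (negative predictive value)
```

Working through the counts this way makes clear how the three costs of screening trade off: lowering the cut-point raises sensitivity but inflates the number referred unnecessarily.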



Choice of screening instruments to be tested

Although there are numerous mental health screening instruments available, the psychometric properties of most are quite similar, with longer instruments not performing significantly better than short ones.[10] We selected the mental health instrument that has seemed to perform best in a variety of trials (it is sometimes called the Mental Health Inventory or MHI-5).[11] It is a five-item screen developed in the Medical Outcomes Study. Three of these items are used as well in the SF-12 Health Survey, also derived from the Medical Outcomes Study and a very widely used instrument for the brief assessment of functional health status.[12] Thus we have two instruments to test: the five-item MH5 screen and the mental health subscore from the SF-12 Health Survey (which uses other SF-12 Health Survey items as well as the three mental health specific items and is weighted based on population studies).

For each of the five MH5 questions there are six possible responses, ranging from “all of the time” to “none of the time.” The three questions included in the SF-12 Health Survey are italicized.

  1. How much of the time during the past 4 weeks…have you been a very nervous person?
  2. How much of the time during the past 4 weeks…have you felt so down in the dumps that nothing could cheer you up?
  3. How much of the time during the past 4 weeks…have you felt calm and peaceful?
  4. How much of the time during the past 4 weeks…did you have a lot of energy?
  5. How much of the time during the past 4 weeks…have you felt downhearted and blue?
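One common convention in the wider MHI-5 literature for scoring these five items is sketched below. This is an assumption for illustration, not necessarily the scoring used in this study, and the cut-points discussed in this Manual are expressed on their own scale.

```python
# A common MHI-5 scoring convention (an assumption drawn from the wider
# literature, not necessarily this study's procedure). Each answer is coded
# 1 ("all of the time") through 6 ("none of the time"); the two positively
# worded items ("calm and peaceful", "a lot of energy") are reverse-coded so
# a higher total always means better mental health; the 5-30 raw sum is then
# rescaled to 0-100.

POSITIVE_ITEMS = {2, 3}  # zero-based positions of questions 3 and 4 above

def score_mh5(answers):
    """answers: five responses coded 1-6, in the question order above.
    Returns a 0-100 score; higher means better mental health."""
    if len(answers) != 5 or not all(1 <= a <= 6 for a in answers):
        raise ValueError("expected five answers coded 1-6")
    adjusted = [7 - a if i in POSITIVE_ITEMS else a
                for i, a in enumerate(answers)]
    raw = sum(adjusted)          # ranges from 5 (worst) to 30 (best)
    return (raw - 5) / 25 * 100
```

Under this convention a respondent who is never nervous, never down, always calm, always energetic, and never blue scores 100, and the reverse pattern scores 0.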

Properties of the instruments

Choice of a “Gold Standard” for Mental Health. Because depression has been shown to be correlated with welfare status, we chose as one “gold standard” the diagnosis of major depression (overall prevalence 28 percent) as generated by the Composite International Diagnostic Interview—a part of our research interview. A second gold standard was a very broad criterion of “any mental health diagnosis” (overall prevalence 39 percent). Two narrower standards were “two or more mental health disorders” (overall prevalence 19 percent) and “has a mental health diagnosis and reports ‘a lot’ of interference with daily activities” (overall prevalence 27 percent). In the recommendations section there is further consideration of an appropriate gold standard. At this stage we want to be sure we look at all the possible standards.