Process Used to Link Case Reports in Haiti

Haiti’s HIV/AIDS Surveillance System (HASS) has been collecting case reports on HIV diagnoses and HIV-related sentinel events (entry to care, CD4 values, ART prescription, HIV clinical stage, death) for more than four years. On average, some 25,000 case reports are received each year via paper-based reporting and automated EMR data extraction. The automated EMR data extraction process also brought in data on historical HIV cases and disease progression (defined sentinel events). As a result, Haiti’s HASS database contains more than 250,000 case reports.

Haiti has a population of approximately 10 million people and an estimated HIV prevalence of 2.1 percent. UNAIDS estimates that approximately 150,000 [130,000-160,000] people are living with HIV in Haiti as of 2012.

Because the HASS can receive multiple case reports for each person with HIV (e.g., duplicate case reports from the same clinic, multiple case reports from different clinics, or new records for the same patient due to a sentinel event), the 250,000 case reports represent fewer individual people. To review each case report and attribute it to a new or existing case of HIV, a process was designed to assess and link cases where indicated.

Preparatory Phase

Before the matching process begins, the case report data are verified and cleaned to assure quality. The data manager verifies the completeness of the submitted data variable by variable and contacts the data managers who submitted problematic files to follow up. Data cleaning includes removing special characters in names (e.g., dashes) and converting abbreviations such as “JN” to “JEAN”. Year of birth is calculated when only age is reported, and vice versa. Also, records with completely missing names are not permitted to match, to avoid “false” matches. (Remember, the database interprets two blanks as the same name!)
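The cleaning rules above can be illustrated with a minimal Python sketch. It is not the HASS implementation: the field names (first_name, last_name, age, birth_year), the abbreviation table beyond “JN”, the choice to replace special characters with a space rather than delete them, and the use of the report year to convert between age and year of birth are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical abbreviation table; only "JN" -> "JEAN" is taken from the text.
ABBREVIATIONS = {"JN": "JEAN"}


def clean_name(raw: Optional[str]) -> str:
    """Uppercase a name, replace special characters with spaces,
    and expand known abbreviations. Missing names become ""."""
    if not raw:
        return ""
    # Keep letters (including accented ones), digits and whitespace;
    # dashes, apostrophes, underscores, etc. become a space (assumption).
    text = re.sub(r"[^\w\s]|_", " ", raw).upper()
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def fill_age_or_birth_year(record: dict, report_year: int) -> dict:
    """Derive birth_year from age, or age from birth_year, when one is missing."""
    out = dict(record)  # do not modify the caller's record
    if out.get("birth_year") is None and out.get("age") is not None:
        out["birth_year"] = report_year - out["age"]
    elif out.get("age") is None and out.get("birth_year") is not None:
        out["age"] = report_year - out["birth_year"]
    return out


def is_matchable(record: dict) -> bool:
    """A record with completely missing names must never enter matching,
    because two blank names would otherwise compare as equal."""
    first = clean_name(record.get("first_name"))
    last = clean_name(record.get("last_name"))
    return bool(first or last)
```

For example, clean_name("Jn-Baptiste") returns "JEAN BAPTISTE", and a record whose first and last names are both blank is excluded by is_matchable before any comparison takes place.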

Once the case data are of expected quality, they are entered into the database. The reported case then goes through three steps to assess for uniqueness.

Step 1: Deterministic Matching

Deterministic matching describes the use of variables reported via the case report form to determine with high certainty if multiple case reports represent the same patient. The HASS database is programmed to implement this step automatically, and uses the patient’s first name, last name, birth month, birth year, sex and mother’s name, as well as the first four letters of the first name, the reporting clinic, and the birth place (district) to identify patient records when they are imported into the central database. These variables were selected to screen cases for uniqueness via a process of expert review and pilot testing. In Haiti’s HASS, 50% of follow-up case reports are linked via this step; remaining cases are sent to Step 2.
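As an illustration of deterministic matching, the Python sketch below links an incoming report to an existing case only when every field in a composite key agrees exactly. The text does not state how HASS combines its variables, so the single key used here (first name, last name, birth month, birth year, sex, mother’s name), the field names, and the case_id and report_id identifiers are assumptions; a real system might use several keys in sequence.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed key; the actual HASS rule set is not described in detail.
KEY_FIELDS = ("first_name", "last_name", "birth_month",
              "birth_year", "sex", "mother_name")


def match_key(record: dict) -> Optional[Tuple[str, ...]]:
    """Build the exact-match key, or None if any key field is missing."""
    values = tuple(record.get(field) for field in KEY_FIELDS)
    if any(v is None or v == "" for v in values):
        return None  # incomplete records are left for the next step
    return tuple(str(v).strip().upper() for v in values)


def link_deterministic(
    existing: Iterable[dict], incoming: Iterable[dict]
) -> Tuple[List[Tuple[str, str]], List[dict]]:
    """Return (links, unresolved): links pair a report_id with a case_id;
    unresolved reports go on to probabilistic matching."""
    index: Dict[Tuple[str, ...], str] = {}
    for rec in existing:
        key = match_key(rec)
        if key is not None:
            index.setdefault(key, rec["case_id"])

    links: List[Tuple[str, str]] = []
    unresolved: List[dict] = []
    for rec in incoming:
        key = match_key(rec)
        if key is not None and key in index:
            links.append((rec["report_id"], index[key]))
        else:
            unresolved.append(rec)
    return links, unresolved
```

Requiring every key field to be present and equal trades recall for certainty, which is consistent with this step resolving only part of the follow-up reports and passing the rest to Step 2.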

Step 2: Probabilistic Matching

Probabilistic matching describes the use of case-surveillance variables to match multiple case reports with a higher, but still acceptable, degree of uncertainty than deterministic matching. This process can be completed using software such as the U.S. Centers for Disease Control and Prevention’s LinkPlus; however, HASS uses human judgment as the probabilistic matching “engine”.


First, to find records that probably match, HASS scans for records with the same (1) pseudo-unique HIV case reporting code or (2) first and last name. This step provides the team with a large set of records that probably represent the same patient. Second, these records are displayed on the secure HASS website for evaluation by epidemiologic staff based in Haiti. Staff are asked to decide whether the records represent the same or different persons, and they are provided with a larger set of variables available in the case-report form (e.g., telephone number, commune/department of residence, marital status, occupation, month and year of HIV diagnosis). In Haiti’s HASS, another 50% of follow-up case reports are linked via this step.
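The candidate-finding scan described above can be sketched in Python as follows. This is only an illustration: the field names (case_code, first_name, last_name, report_id) are hypothetical, and the sketch produces pairs of reports for human review rather than making any linkage decision itself.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def candidate_pairs(records: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return pairs of report_ids that share a case reporting code
    or share both first and last name."""
    by_code: Dict[str, List[str]] = defaultdict(list)
    by_name: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    for rec in records:
        code = rec.get("case_code")
        if code:
            by_code[code].append(rec["report_id"])
        first, last = rec.get("first_name"), rec.get("last_name")
        if first and last:  # blank names never form a group
            by_name[(first.upper(), last.upper())].append(rec["report_id"])

    pairs: Set[Tuple[str, str]] = set()
    for group in list(by_code.values()) + list(by_name.values()):
        # Every pair inside a group is a candidate for manual review.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return sorted(pairs)
```

Each returned pair would then be shown to reviewers together with the additional variables (telephone number, residence, marital status, and so on) so that they can judge whether the two reports describe the same person.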

Expert Review and Validation

Even the best matching algorithm will make errors and may not be able to predict or discern all linked cases. For this reason, it is important to use local experts to evaluate the variables likely to produce the best matches in a pre-implementation pilot phase, as well as for routine quality assurance. In Haiti, three members of the case-based surveillance staff participate in Step 2. These staff are able to leverage their local knowledge to assist in the human-driven probabilistic matching process. For example, they know the relative geographic distance between clinics, or the short form of a given name, which helps in “mental calculations” of the match probability. On average, the Haiti staff review 500 flagged case reports for manual case matching each month, and use a “two-out-of-three” decision-making process.
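One way to read the “two-out-of-three” rule is as a simple majority vote among the three reviewers. The Python sketch below assumes exactly that: each reviewer independently records “same” or “different” for a flagged pair, and the decision is whichever answer at least two of them gave. The vote labels and function name are hypothetical, and the actual HASS workflow may differ (for example, a third reviewer might only be consulted when the first two disagree).

```python
from collections import Counter
from typing import Sequence

VALID_VOTES = ("same", "different")


def two_out_of_three(votes: Sequence[str]) -> str:
    """Return the decision supported by at least two of three reviewers."""
    if len(votes) != 3:
        raise ValueError("exactly three reviewer votes are required")
    if any(v not in VALID_VOTES for v in votes):
        raise ValueError("each vote must be 'same' or 'different'")
    # With three votes and two options, one option always has >= 2 votes.
    decision, _count = Counter(votes).most_common(1)[0]
    return decision
```

With this rule, the votes ("same", "different", "same") resolve to "same", so a single dissenting reviewer cannot block or force a link.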

Finally, it is valuable to periodically review the entire matching process via a small validation study. The case-based surveillance team in Haiti completed a validation study with a major clinical partner. We selected a set of records that were matched by the processes listed above and asked our clinical partner to review the cases (n=100) in their full electronic medical records. Overall, we found that their physician reviewer agreed with HASS’ deterministic matches in 99/100 cases and with its probabilistic matches in 91/100 cases.