Online Supplemental File 1


Section A: Points to consider when establishing a biologics register

Purpose of the biologics register

Observational drug registers have many potential benefits, including real-life assessment of drug effectiveness and safety, and are hence appealing to investigators, the pharmaceutical industry and regulators. However, before establishing a drug register, investigators should be clear about why they are setting up their register/study, and should only seek to address scientific questions that are achievable and practicable. Safety studies in particular require large populations: because many serious adverse events have low background rates, an undersized study is prone to type II errors, that is, to missing a true increase in risk. Investigators should therefore perform power calculations to estimate the sample size and duration of follow-up required to address their questions.

Example: “The BSRBR was designed to have sufficient power to detect a doubling in lymphoma risk between patients treated with a specific anti-TNF agent and those treated with standard therapy over five years of follow up. Such a design required the recruitment of 4000 patients in each cohort to be followed up for 5 years, allowing for drop outs and switches, say from non-biological to biological treatment. The aim therefore is to recruit 4000 patients for each of the anti-TNF agents based on first use of that drug, as well as 4000 comparison patients.” (1)
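As a rough illustration of the kind of power calculation described above, the sketch below estimates the person-years needed per cohort to detect a given rate ratio. It is not the BSRBR's own calculation, which is not detailed here. It assumes Poisson-distributed event counts, equal person-time in the two cohorts, a two-sided Wald test on the log rate ratio, and a hypothetical background rate of 1 event per 1000 person-years; it makes no allowance for drop-outs or treatment switches.

```python
import math
from statistics import NormalDist


def person_years_per_cohort(rate_ratio, background_rate, alpha=0.05, power=0.80):
    """Approximate person-years needed in EACH of two equally sized cohorts.

    Assumes Poisson event counts, equal person-time in both cohorts and a
    two-sided Wald test on the log rate ratio. `background_rate` is the event
    rate in the comparator cohort (events per person-year).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # With T person-years per cohort, expected events are T*r0 and T*r0*RR, so
    # Var(log RR_hat) is roughly 1/(T*r0) + 1/(T*r0*RR) = (1 + 1/RR) / (T*r0).
    # Setting log(RR)**2 / Var equal to (z_alpha + z_beta)**2 and solving for T:
    return (z_alpha + z_beta) ** 2 * (1 + 1 / rate_ratio) / (
        background_rate * math.log(rate_ratio) ** 2
    )


# Hypothetical inputs: detect a doubling of a 1-per-1000 person-year event rate.
py_needed = person_years_per_cohort(rate_ratio=2.0, background_rate=1 / 1000)
patients_needed = math.ceil(py_needed / 5)  # if each patient contributes 5 years
print(round(py_needed), patients_needed)
```

Halving the background rate doubles the required person-time, which is why very rare events quickly outstrip what a single register can supply.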

For very rare events of interest, such as site-specific malignancy, individual registers may lack statistical power to generate robust risk estimates. This may be addressed by conducting pooled analyses or combined nested case-control studies across registers.
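A pooled analysis may combine individual patient data across registers. A simpler summary-level alternative, sketched below, is fixed-effect inverse-variance pooling of register-specific log rate ratios. This technique is swapped in for illustration and is not prescribed by the text. It assumes a common true effect across registers (a random-effects model may suit heterogeneous registers better), and the input estimates are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def pool_fixed_effect(estimates):
    """Fixed-effect, inverse-variance pooling of register-specific log rate ratios.

    `estimates` is a list of (rate_ratio, lower_95, upper_95) tuples. The standard
    error of each log rate ratio is recovered from the width of its 95% CI.
    """
    weights, weighted_logs = [], []
    for rr, lo, hi in estimates:
        se = (math.log(hi) - math.log(lo)) / (2 * Z95)
        w = 1 / se ** 2
        weights.append(w)
        weighted_logs.append(w * math.log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z95 * pooled_se),
        math.exp(pooled_log + Z95 * pooled_se),
    )


# Hypothetical register-level results, each too imprecise on its own.
registers = [(1.8, 0.7, 4.6), (1.3, 0.5, 3.4), (2.1, 0.8, 5.5)]
print(pool_fixed_effect(registers))
```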

Apart from issues related to size, duration and data collection, there are several reasons for considering the purpose of a register a priori. Clearly stating a primary objective, whether broadly or narrowly defined, reduces post hoc use of register data to test hypotheses for which it was never intended, designed or suitable. Moreover, experience shows that it is easier to launch a register enthusiastically than to maintain it. If all stakeholders (sponsors, data collectors, clinical medicine and academia) fully understand the purpose of the register, the chances of uniform, diligent and sustained data collection increase, and the risk of collecting data elements that will never be used, and that only slow down registration, is reduced.

Population to be targeted

“Incident” users are defined and registered because of initiation of a treatment, whereas “prevalent” users are defined (and entered into the register) on the basis of ongoing treatment. By definition, “prevalent” users have survived on their drug until the time point of registration, and are depleted of individuals who have developed adverse events or stopped treatment for other reasons (2). Thus, safety studies based on “prevalent” users run the risk of overlooking safety signals, especially in the short term. If “prevalent” users are allowed into the register, their true time point of treatment start should also be recorded.
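Recording the true treatment start for prevalent users allows their survived time on drug to be handled explicitly. A minimal sketch, assuming a hypothetical 30-day grace period for calling a user "incident": on a time-since-treatment-start scale, a prevalent user would enter the risk set only at the time already survived (delayed entry, or left truncation), rather than at time zero. All class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegisteredPatient:
    patient_id: str
    treatment_start: date      # true start of the biologic, recorded for all users
    registration_date: date    # date the patient entered the register


def classify_user(p, grace_days=30):
    """'incident' if registered within `grace_days` of starting treatment,
    otherwise 'prevalent'. The grace period is a hypothetical choice."""
    gap = (p.registration_date - p.treatment_start).days
    return "incident" if gap <= grace_days else "prevalent"


def entry_time_days(p):
    """Time on drug already survived at registration. On the time-since-start
    scale, the patient enters the risk set at this time (delayed entry), so the
    events they could not have contributed before registration are not missed silently."""
    return max(0, (p.registration_date - p.treatment_start).days)


cohort = [
    RegisteredPatient("A", date(2005, 3, 1), date(2005, 3, 10)),
    RegisteredPatient("B", date(2002, 6, 1), date(2005, 3, 10)),
]
for p in cohort:
    print(p.patient_id, classify_user(p), entry_time_days(p))
```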

It also needs to be made clear whether the biologics register is intended as a register of specific drugs (drug X, Y, and Z), of all biologics, or of all drugs in one class of biologics (e.g., TNF alpha inhibitors).

The choice of comparator subtly but importantly affects the exact safety hypothesis being tested. Comparing patients who start treatment A with patients who start an alternative treatment, or with patients who are stable on another treatment or on no treatment, will provide different estimates of risk (see example, OSF2). Comparing risks before vs after treatment allows intra-individual contrasts, but is generally not recommended if pre-treatment experiences may influence the decision to start treatment.

Data items to be collected – treatment and treated condition

There must be no doubt about how data are intended to be reported and registered, so each data item needs a definition. For instance, what is the difference between a temporary treatment “suspension” and a treatment “stop”? Likewise, the level of detail required when reporting drug exposure needs to be clear to the data collector. When treatment is discontinued, it is advisable to record the reason for stopping. Whereas inefficacy, adverse events and remission are obvious reasons, the increasing number of treatment options calls for additional categories such as “partial response”. Irrespective of the reason for discontinuation, investigators should carefully consider collecting outcome data beyond the stop date (see below).
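Definitions of this kind can be encoded directly in the data dictionary. The sketch below uses a coded list of stop reasons and a hypothetical rule separating a "suspension" from a "stop". The 90-day threshold and the category list are illustrative assumptions, not a published standard.

```python
from datetime import date
from enum import Enum
from typing import Optional


class StopReason(Enum):
    # Coded stop reasons; the list is illustrative, not a published standard.
    INEFFICACY = "inefficacy"
    PARTIAL_RESPONSE = "partial response"
    ADVERSE_EVENT = "adverse event"
    REMISSION = "remission"
    OTHER = "other"


# Hypothetical register rule: a gap of up to 90 days before restarting the same
# drug counts as a temporary suspension; a longer gap, or no restart, is a stop.
MAX_SUSPENSION_DAYS = 90


def classify_interruption(last_dose: date, restart: Optional[date]) -> str:
    if restart is None:
        return "stop"
    gap = (restart - last_dose).days
    return "suspension" if gap <= MAX_SUSPENSION_DAYS else "stop"


print(classify_interruption(date(2006, 1, 10), date(2006, 3, 1)))
print(StopReason("partial response").name)
```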

If only a selection of the patient's medications is recorded (e.g., only those targeting the rheumatic disease), a definition of the drugs that constitute such therapy may be necessary. Where feasible, treatment data should also be collected for drugs that might confound the relation between the drugs of primary interest and the outcome. For example, when investigating the risk of congestive heart failure (CHF), it is important to know not only the exposure dates for the index drug but also the start and stop dates and dosages of glucocorticoids, since this concomitant treatment might influence the risk of CHF. Data on concomitant medications may also help identify adverse events or changing co-morbidities during follow-up, e.g., the initiation of insulin for diabetes.
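Recording concomitant drugs as dated episodes with doses makes it possible to establish exposure at the time of an event, as in the CHF example. A minimal sketch: the record layout, drug, dates and doses are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MedicationEpisode:
    drug: str
    start: date
    stop: Optional[date]   # None = ongoing at last follow-up
    daily_dose_mg: float


def dose_on(episodes, drug, on_date):
    """Total recorded daily dose of `drug` on `on_date` (0.0 if not exposed)."""
    total = 0.0
    for ep in episodes:
        if ep.drug != drug:
            continue
        if ep.start <= on_date and (ep.stop is None or on_date <= ep.stop):
            total += ep.daily_dose_mg
    return total


episodes = [
    MedicationEpisode("prednisolone", date(2004, 1, 1), date(2004, 6, 30), 10.0),
    MedicationEpisode("prednisolone", date(2004, 7, 1), None, 5.0),
]
chf_event_date = date(2004, 5, 15)  # hypothetical event date
print(dose_on(episodes, "prednisolone", chf_event_date))
```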

With respect to the treated condition, data on diagnosis, disease duration and phenotype, as available (e.g., rheumatoid factor status, erosive disease), will be important not only for identifying predictors of risk but also for assessing channelling bias. The same argument applies to co-morbidities. Again, however, a transparent definition of, e.g., “cardiovascular disease” is needed to avoid lumping uncomplicated hypertension together with serious ischemic heart disease. Non-clinical data items such as educational level or socio-economic status may also provide useful information, particularly in settings without universal subsidised health care. Since response to prior treatments often predicts the outcome of future treatments, the treatment history for the disorder under observation may be an important source of information for assessing channelling bias.

Data items to be collected – outcomes

For safety, results from a register are based on the incidence of events during the follow-up period. In RCTs, primary outcomes are tightly defined, using either an internationally accepted definition, such as the EULAR response based on DAS28, or an explicit definition of what is meant by, e.g., "gastro-intestinal bleeding". The same rule should apply to biologics registers, to ensure that a true increase in risk is not diluted by masses of less clearly specified events that have no association with treatment.

Ideally, a clear-cut definition of each outcome should be available to the person responsible for entering data into the register (e.g., the clinician). In a computerised data system, this could be achieved, for example, with pop-up windows describing the criteria required to qualify for the outcome in question. Experience shows that comprehensive information on a specific safety outcome is often difficult to obtain from a single care provider’s report. Register holders therefore need to be prepared to staff a study secretariat or similar body that can request additional information on events of interest. For example, if an incident case of multiple sclerosis is reported, information should be gathered on whether the diagnosis was made by a neurologist and which methods of ascertainment (such as MRI) were applied. This enables identification of “definite”, or verified, cases.
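The follow-up information requested by a secretariat can feed a simple, explicit grading rule. The sketch below uses hypothetical "definite / probable / unverified" levels based only on the two items named in the text. It is not a clinical diagnostic criterion for multiple sclerosis.

```python
from dataclasses import dataclass


@dataclass
class MSReport:
    # Follow-up details a study secretariat might request for a reported case.
    diagnosed_by_neurologist: bool
    mri_performed: bool
    mri_consistent_with_ms: bool


def case_status(report: MSReport) -> str:
    """Hypothetical grading rule: 'definite' requires a neurologist's diagnosis
    supported by a consistent MRI; a neurologist's diagnosis alone is 'probable';
    anything else remains 'unverified' pending further information."""
    if (report.diagnosed_by_neurologist and report.mri_performed
            and report.mri_consistent_with_ms):
        return "definite"
    if report.diagnosed_by_neurologist:
        return "probable"
    return "unverified"


print(case_status(MSReport(True, True, True)))
```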

Certain adverse drug reactions have an induction and/or latency period, meaning that clinical presentation can post-date drug initiation, or even drug discontinuation, by many months. Investigators should therefore consider collecting adverse event information beyond the drug stop date. This is particularly important if the outcome of interest, such as malignancy, is known to have an induction and latency period. In these cases, an appropriate risk attribution model must also be selected (see OSF2).
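Risk attribution models differ in how long after a drug is stopped an event is still counted against it. The sketch below shows three common families: on-drug only, on-drug plus a lag window, and ever-exposed. The 90-day lag is a hypothetical value; the models actually discussed are described in OSF2.

```python
from datetime import date, timedelta
from typing import Optional

MODELS = ("on_drug", "lag", "ever")


def attributed_to_drug(event: date, start: date, stop: Optional[date],
                       model: str, lag_days: int = 90) -> bool:
    """Decide whether an event counts towards a drug's exposure.

    "on_drug": only events between start and stop (stop None = still on drug).
    "lag":     as "on_drug", but the window extends `lag_days` beyond stop.
    "ever":    any event on or after start, however long after stopping.
    """
    if model not in MODELS:
        raise ValueError(f"unknown attribution model: {model}")
    if event < start:
        return False
    if model == "ever" or stop is None:
        return True
    if model == "on_drug":
        return event <= stop
    return event <= stop + timedelta(days=lag_days)


start, stop = date(2003, 1, 1), date(2004, 1, 1)
for model in MODELS:
    print(model, attributed_to_drug(date(2004, 2, 15), start, stop, model))
```

For outcomes with long latency, such as malignancy, the choice between these models can change which drug an event is attributed to, which is why the model should be fixed before analysis.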

A particular challenge for biologics registers of long-term safety is to ensure that reporting rates, which may initially be high, do not decline over time. As a register ages, reporters may lose interest, and the connection between a safety event and a particular therapy (which may have been discontinued years before the event) may not be obvious. Methods of outcome collection that are independent of the treating physician, care provider or register reporter, and of treatment status, such as death notifications from national statistics, are superior in this regard, but may not exist for all events of interest. An important task for holders of biologics registers is therefore to maintain reporting, not just of new treatment starts, but also of the accumulating follow-up data on patients already registered. Whether this is achieved through reimbursement, useful feedback of data, or other means may differ between registers.
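One simple way to watch for declining reporting is to tabulate crude event rates by year of follow-up. A falling rate may reflect a genuine change in risk or reporting fatigue, so it is a prompt for checking rather than a conclusion. The sketch below assumes a hypothetical input format: each patient's total observed follow-up in years, plus the times of reported events, which are assumed to fall within that follow-up.

```python
from collections import defaultdict


def rates_by_follow_up_year(patients):
    """Crude event rate per 1000 person-years in each year since registration.

    `patients` is a list of (follow_up_years, event_times) pairs; event_times are
    years since registration and are assumed to fall before follow_up_years.
    """
    person_years = defaultdict(float)
    events = defaultdict(int)
    for follow_up, event_times in patients:
        year = 0
        while year < follow_up:
            # Full years contribute 1.0; the final partial year contributes the remainder.
            person_years[year + 1] += min(1.0, follow_up - year)
            year += 1
        for t in event_times:
            events[int(t) + 1] += 1
    return {y: 1000 * events[y] / person_years[y] for y in sorted(person_years)}


# Hypothetical mini-cohort of three patients.
patients = [(3.0, [0.5]), (3.0, []), (2.0, [1.5])]
print(rates_by_follow_up_year(patients))
```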

Safety often needs to be viewed in light of effectiveness. It is therefore advisable to consider collecting core data that can be used to assess effectiveness. Any such measurement in observational studies requires the same outcome data that are generally agreed for measuring efficacy in clinical trials (e.g., by OMERACT). In RA, for instance, this implies the variables required to calculate DAS28 or ACR responses, such as swollen and tender joint counts and acute phase reactants, as well as patient-reported data on pain, functional status and global assessment of health status. To capture more specific outcomes, it may be advisable to collect data on fatigue, psychological state or self-assessed disease activity, again using robust and uniform methods of outcome assessment (3). To avoid overloading reporters, collection of more detailed information could be restricted to subsets of patients in the register, defined by geography, treatment or other factors.
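To show why the individual components are worth storing, the sketch below computes the commonly published four-variable DAS28 using ESR (28 tender and swollen joint counts, ESR in mm/h, patient global health on a 0 to 100 mm scale) and classifies the EULAR response from baseline and current scores. It is written from the widely used formula and cut-offs. It should be checked against the official definitions before use, and it omits any rounding conventions.

```python
import math


def das28_esr(tender28: int, swollen28: int, esr_mm_h: float,
              patient_global_0_100: float) -> float:
    """DAS28-ESR = 0.56*sqrt(TJC28) + 0.28*sqrt(SJC28) + 0.70*ln(ESR) + 0.014*GH."""
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_h)   # ESR must be > 0
            + 0.014 * patient_global_0_100)


def eular_response(baseline: float, current: float) -> str:
    """EULAR response category from baseline and current DAS28."""
    improvement = baseline - current
    if improvement > 1.2:
        return "good" if current <= 3.2 else "moderate"
    if improvement > 0.6:
        return "moderate" if current <= 5.1 else "none"
    return "none"


score = das28_esr(tender28=4, swollen28=4, esr_mm_h=20, patient_global_0_100=50)
print(round(score, 2), eular_response(baseline=5.5, current=score))
```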

Follow-up

In routine care, it may be difficult to schedule follow-up visits on the exact pre-defined dates (e.g., a six-month visit). Biologics registers may therefore need a system whereby each visit is recorded at its true date but also tagged to the nearest eligible follow-up time point. If a patient has multiple visits within such a time window (e.g., any visit between 60 and 120 days counts as the 90-day visit), there must be a rule defining which visit takes priority. This situation does not arise in registers based on, say, questionnaires mailed to patients or physicians; in that case, however, it is important to record the date on which each questionnaire was completed.
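The tagging rule can be made explicit in code. In the sketch below, each visit keeps its true date and is also tagged to a scheduled time point. The 90-day window follows the 60 to 120 day example in the text; the other windows and the priority rule (closest to the target day wins, ties go to the earlier visit) are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical schedule: target day after baseline -> (first day, last day) of window.
# The 90-day entry mirrors the 60-120 day example in the text; the others are invented.
WINDOWS = {
    90: (60, 120),
    180: (121, 270),
    365: (271, 455),
}


def tag_visits(baseline: date, visits: List[date]) -> Dict[int, date]:
    """Tag each visit (kept at its true date) to a scheduled time point.

    Assumed priority rule: within a window, the visit closest to the target day
    is used; ties go to the earlier visit. Visits outside every window are ignored.
    """
    chosen: Dict[int, date] = {}
    for visit in sorted(visits):
        day = (visit - baseline).days
        for target, (first, last) in WINDOWS.items():
            if first <= day <= last:
                current = chosen.get(target)
                # Strict '<' keeps the earlier visit when two are equally close.
                if current is None or abs(day - target) < abs((current - baseline).days - target):
                    chosen[target] = visit
    return chosen


baseline = date(2005, 1, 1)
visits = [baseline + timedelta(days=d) for d in (70, 95, 100, 200)]
for target, visit in sorted(tag_visits(baseline, visits).items()):
    print(target, visit.isoformat(), (visit - baseline).days)
```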

As follow-up increases, patients in the register die, move, or become lost to follow-up for other reasons. It is important to keep these losses to a minimum, and to define, on an individual basis, why a follow-up was censored (death/emigration/migration/patient decision/other, ...).

Finally, it should be acknowledged that biologics registers may employ different “time units” depending on how data are recorded at each follow-up visit. Either all exact dates of events, treatment episodes and so forth are recorded (as advocated above), or the status at each follow-up is recorded irrespective of the exact date of occurrence. In the latter case, time can only be analysed in units as broad as the interval between two visits. The benefit of this method is quicker data collection, as only “snapshots” are taken at each follow-up time point. The drawback is that it may introduce substantial imprecision and misclassification of true exposure and follow-up time.
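The size of the misclassification can be illustrated with a toy example. In the sketch below, a drug is truly stopped on day 200, and the only visits are at days 0, 180 and 365. It assumes the common but crude convention that status observed at a visit holds until the next visit; other conventions, such as imputing a change at the interval midpoint, would give different errors.

```python
def exposure_days_exact(start_day: int, stop_day: int) -> int:
    """Exposure time when the true start and stop days are recorded."""
    return stop_day - start_day


def exposure_days_snapshot(visit_days, on_drug_at_visit):
    """Exposure time reconstructed from visit snapshots only.

    Status observed at a visit is carried forward until the next visit, so any
    change between visits is resolved only to the width of the visit interval.
    """
    total = 0
    for i in range(len(visit_days) - 1):
        if on_drug_at_visit[i]:
            total += visit_days[i + 1] - visit_days[i]
    return total


# Visits at 0, 180 and 365 days; the drug was truly stopped on day 200.
visits = [0, 180, 365]
status = [True, True, False]
print(exposure_days_exact(0, 200), exposure_days_snapshot(visits, status))
```

Here the snapshot approach attributes 165 extra days of exposure, and any event in that period would be misclassified as occurring on drug.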

Data collection process and data collectors

The choice of data collectors will affect the cost and quality of the data in the register. Whereas external study nurses may provide highly accurate and complete information, their costs may not be negligible, especially for multi-site biologics registers of long-term safety. Clinic nurses or other care providers may also collect data in great detail, perhaps at a lower cost. Rheumatologists who report data themselves will never be able to incorporate extensive forms into clinical practice, but on the other hand are often in a better position to report on treatment changes and medical events. Splitting the reporting between multiple parties may sometimes be advisable, although such a division will only succeed if the responsibility of each reporting party is clearly defined and if the administrative burden does not increase (as would be the case for mailed questionnaires).

Systems for electronic data entry are advancing. They offer several potential advantages: minimising missing data by refusing advancement to subsequent screens until data are complete; limiting impossible values by applying validation rules; and removing one step of data entry, thereby reducing transcription errors. However, investigators should ensure that they work with experienced IT groups and that their systems comply with standards for information governance (the security of identifiable patient-level data; see below). To be successful, systems must be easy to use, fast enough to keep pace with clinical activity, and reliable. The system should support any emerging national standards for the usability, consistency and integration of systems used to capture study data.
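The two mechanisms mentioned, refusing advancement until a screen is complete and rejecting impossible values, can be sketched as a per-screen validator. The field names and plausible ranges below are hypothetical examples; a real register would define them in its data dictionary.

```python
# Hypothetical field definitions for one entry screen: name -> (required, min, max).
FIELDS = {
    "tender_joints_28": (True, 0, 28),
    "swollen_joints_28": (True, 0, 28),
    "esr_mm_h": (True, 1, 150),
    "patient_global_vas": (True, 0, 100),
    "weight_kg": (False, 25, 250),
}


def validate_screen(entry: dict) -> list:
    """Return a list of problems; an empty list means the screen may be submitted."""
    problems = []
    for name, (required, lo, hi) in FIELDS.items():
        value = entry.get(name)
        if value is None:
            if required:
                problems.append(f"{name}: missing")
            continue
        if not lo <= value <= hi:
            problems.append(f"{name}: {value} outside {lo}-{hi}")
    return problems


def can_advance(entry: dict) -> bool:
    # Block progression to the next screen until the current one is complete and valid.
    return not validate_screen(entry)


entry = {"tender_joints_28": 6, "swollen_joints_28": 30, "patient_global_vas": 55}
print(validate_screen(entry))
```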

Data handling and storage; ethical and legal considerations

Successful scientific studies require a pre-defined architecture in which all parties are aware of the inter-relationships between the various stakeholders. Although the data are provided from multiple sources, ownership of the collated data must be clearly described. Models of ownership include ‘decentralised’ ownership, whereby each reporting site/doctor owns the data on his/her patients; ownership by national rheumatology societies or by independent academic groups; and arrangements in which external sponsors (e.g., pharmaceutical companies) are de facto owners of the data held by a register. National legislation may, in this regard, restrict who can be a data owner. The database may be anchored in a scientific society or in a group of scientists. Access to raw and summary data for researchers, investors and participating physicians must be carefully described, and can be tiered. A database review board may act as a ‘gatekeeper’ for data access. Changes to the data should be allowed, but should also be systematically tracked.
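Systematic tracking of data changes is usually implemented as an audit trail. In the sketch below, edits are permitted, but each one is appended to a log recording who made the change, when, and why, so the history of any data item can be reconstructed. The class and field names are hypothetical, and a production system would keep the log in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    record_id: str
    field_name: str
    old_value: object
    new_value: object
    changed_by: str
    reason: str
    timestamp: datetime


@dataclass
class AuditedStore:
    """Values may be edited, but every edit is appended to a log that is never
    rewritten, so earlier values remain traceable."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def update(self, record_id, field_name, new_value, user, reason):
        old = self.data.get((record_id, field_name))
        self.log.append(Change(record_id, field_name, old, new_value, user, reason,
                               datetime.now(timezone.utc)))
        self.data[(record_id, field_name)] = new_value

    def history(self, record_id, field_name):
        return [c for c in self.log
                if c.record_id == record_id and c.field_name == field_name]


store = AuditedStore()
store.update("P001", "esr_mm_h", 20, "nurse1", "initial entry")
store.update("P001", "esr_mm_h", 25, "monitor2", "corrected against source record")
for change in store.history("P001", "esr_mm_h"):
    print(change.old_value, "->", change.new_value, change.changed_by, change.reason)
```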

Management of large registers is time-consuming and costly, and therefore requires significant financial support. Several of the existing RA biologics registers are funded indirectly by the pharmaceutical industry. Contracts should be drawn up between the sponsors and the register holder(s), carefully detailing the rights and responsibilities of both parties. The relationship should be established in a way that maintains the academic independence of the investigators. In exchange for funding, sponsors will likely request provision of data, so investigators should carefully detail the conditions for, and extent of, access to data or reports. Access may be provided at set intervals or on a more ad hoc basis. Sponsors may also request sight of manuscripts or slides prior to public presentation. The route for feedback to the investigators should be well defined and may be via a third party such as a steering committee.

Patients and clinicians provide data because they believe the study has a clear purpose and potential benefit. It is therefore important to outline the two-way relationship between those providing and those analysing data. Results may be fed back to patients and clinicians in a variety of ways, including newsletters, websites, presentations and scientific publications.

Collecting patient-identifiable information carries a responsibility to ensure that the data are processed securely and with proper regard for their confidentiality, integrity and availability. In daily work, access rights to patient-level data should be strictly controlled. Researchers should be aware of national standards and guidelines for information governance. Collection of drug-related adverse event data is often governed by national and international regulatory guidelines, such as Good Clinical Practice (GCP) and the EMEA guidelines on pharmacovigilance for medicinal products for human use. Investigators should be aware of their local guidelines to ensure that they comply with their legal framework, rules for informed consent, et cetera.
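One widely used technique for limiting exposure of identifiable data is pseudonymisation with a keyed hash. The same identifier always maps to the same pseudonym, so follow-up records can still be linked. Without the secret key, however, pseudonyms cannot feasibly be recomputed from a list of candidate identifiers. The sketch below uses Python's standard `hmac` module; the normalisation rule and the key handling are assumptions, and whether this suffices depends on national information-governance rules.

```python
import hashlib
import hmac


def pseudonym(patient_identifier: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalised patient identifier.

    The key should be held separately from the research data (for example by a
    trusted third party), since anyone holding it can regenerate pseudonyms.
    """
    normalised = patient_identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


key = b"example-key-held-by-trusted-party"  # hypothetical; never hard-code in practice
print(pseudonym("ab123", key))
```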

Systems and procedures that assure the quality of every aspect of the study should be implemented, such as audits of data quality, identification and scrutiny of ‘unusual’ outlying values, and methods for limiting missing data. It is equally important to devote sufficient resources to assessing the coverage of the biologics register, inclusion rates, and time trends in these. In multi-site biologics registers, some sites or regions may perform better than others. It is then important to have a strategy either for the exit of poorly performing sites or for allocating additional resources to them. Restricting data collection at some sites may be one way of maintaining high coverage.
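Coverage can be monitored per site as the proportion of eligible treatment starts that are actually registered. The sketch below assumes that counts of eligible starts are available from some external source, such as pharmacy or prescribing records, and flags sites below a hypothetical 80% threshold. The site names and counts are invented.

```python
def site_coverage(eligible: dict, registered: dict, threshold: float = 0.8):
    """Coverage per site (registered / eligible starts) and sites below threshold.

    `eligible` is assumed to come from an external source; sites with no
    eligible starts are skipped. The 0.8 threshold is a hypothetical example.
    """
    coverage = {}
    for site, n_eligible in eligible.items():
        if n_eligible > 0:
            coverage[site] = registered.get(site, 0) / n_eligible
    flagged = sorted(site for site, c in coverage.items() if c < threshold)
    return coverage, flagged


eligible = {"North": 120, "South": 80, "East": 50}
registered = {"North": 110, "South": 48, "East": 45}
coverage, flagged = site_coverage(eligible, registered)
print({site: round(c, 2) for site, c in coverage.items()}, flagged)
```

Repeating the calculation per calendar period gives the time trends in coverage that the text recommends tracking.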