Safety/Efficacy Data Masked and De-identified in FDA Proposal

On June 4, FDA announced in the Federal Register that it was considering the release of de-identified data sets in an effort to give researchers greater access to clinical and preclinical study data. This collaboration with the National Institutes of Health (“NIH”) would presumably work like other public databases NIH has supported over the years to provide access to medical data for research purposes. As a researcher, I (Eric) used NIH databases such as the Lung Image Database Consortium (“LIDC”) and Reference Image Database to Evaluate Response (“RIDER”) sets of lung computed tomography (“CT”) scans to gather anatomic information about the potential patient population and to make product design choices. FDA’s proposal offers both potential benefits and harms to consider.

FDA recognizes that clinical study data sets are underutilized relative to the scientific understanding that they can provide. As a rule, preclinical and clinical studies are very expensive undertakings that produce high-quality data but receive limited publication, giving few in the research community access to the raw data. This is understandable, in part, because sponsors wish both to protect their clinical data from competitors who may copy or undermine it and to protect their intellectual property. This limited access prevents that data from being included in certain meta-analyses. Meta-analysis provides the numbers needed to identify lower-frequency events and weaker correlations in the data. If a correlation does not reach the widely recognized p-value threshold of 0.05 or lower, the scientific community is hard pressed to consider it a proven hypothesis. Meta-analysis combines several studies and can produce “significant” results based on the increased size of the dataset. FDA has access to the raw data from clinical studies submitted with new product applications, as well as other databases; however, it does not have the resources or mandate to perform this type of research.
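To make the point about pooled data concrete, here is a minimal sketch of how combining studies can cross the 0.05 threshold when no single study does. It assumes a simple fixed-effect, inverse-variance pooling, which is only one of several meta-analysis methods, and the four (effect, standard error) pairs are invented for illustration; they are not taken from any real trial. The function names are our own.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def pool_fixed_effect(studies):
    """Inverse-variance (fixed-effect) pooling of (effect, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * eff for w, (eff, _) in zip(weights, studies)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical (effect estimate, standard error) pairs; NOT real trial data.
studies = [(0.30, 0.20), (0.25, 0.18), (0.35, 0.22), (0.28, 0.19)]

# Each study alone: the effect is small relative to its uncertainty.
for effect, se in studies:
    print(f"single study: effect={effect:.2f}, p={two_sided_p(effect / se):.3f}")

# Pooled: same effect size, but a much smaller standard error.
pooled, pooled_se = pool_fixed_effect(studies)
print(f"pooled: effect={pooled:.2f}, p={two_sided_p(pooled / pooled_se):.4f}")
```

In this made-up example each study on its own falls short of significance, while the pooled estimate clears the threshold because the combined standard error shrinks as data accumulate. A real analysis would also need to check that the studies are comparable enough to pool.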

FDA notes that there are potential hazards to this disclosure. FDA’s call for comments states that FDA is not a covered entity for the purposes of the Health Insurance Portability and Accountability Act of 1996 (“HIPAA”). However, patients who have signed up for clinical trials generally have agreed to have their information used for certain purposes, often including publication. For the most part, patients understand that such data may be reported in the aggregate.

As anyone who has tried to extract the values from a chart in a published article knows, there is a difference between a picture and the raw numbers that underlie it. FDA argues that “the contribution of patients who participate in clinical trials should be maximized for the benefit of society.” However, there are potential additional privacy costs to patients regarding their medical information. Privacy challenges may stem from a variety of different sources, including: 1) evolving technology that may allow reidentification from previously disparate, non-identifying sources and 2) the impact of this information on related persons who have not consented to the release of their information.

For instance, current pattern-matching technology is severely limited in its ability to recognize faces in photographs. Thus, broad distribution of photographs of people may not compromise their identities, because the search cost to reidentify the data does not justify the expenditure of time or effort. However, if the technology for automated facial recognition becomes available, the cost of reidentification drops, and at some point reidentification may become cost effective. The release of information in a format that is not currently reidentifiable does not preclude reidentification in the future, even the near future. With the development of large commercial databases, so-called “Big Data”, the potential to reidentify patients becomes increasingly viable.

A typical chest CT, for example, will include physiological information that allows estimation of the height, weight, sex, and age of the patient (like a prenatal ultrasound). Such estimates are not currently automated but could reasonably be automated in the future. Identification of the scanner and the scan date will further limit the number of potential individuals. If the facility is a cancer facility, then a search through the commercial databases may find the names of persons with those characteristics who rented a hotel room or made credit card purchases for parking nearby only around the period in question. If the person has mentioned having cancer on a blog or Facebook page, then that could be correlated with any diagnosis from the CT. This reidentification can be achieved even if all textual patient information is scrubbed from the record. The very power of data correlation that allows extraction of useful medical hypotheses from these data sets, therefore, may also drive the destruction of privacy.
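The narrowing process described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "outside" records stand in for a hypothetical commercial database, every name and value is invented, and the attribute names (`sex`, `age_band`, `height_band`, `near_facility_week`) are our own labels for the kinds of estimates a scrubbed scan might still yield.

```python
# Hypothetical outside records (e.g. commercial purchase data).
# All names and values are invented for illustration.
outside_records = [
    {"name": "Person A", "sex": "F", "age_band": "60-69", "height_band": "160-169", "near_facility_week": "W10"},
    {"name": "Person B", "sex": "F", "age_band": "60-69", "height_band": "160-169", "near_facility_week": "W22"},
    {"name": "Person C", "sex": "F", "age_band": "60-69", "height_band": "170-179", "near_facility_week": "W10"},
    {"name": "Person D", "sex": "F", "age_band": "40-49", "height_band": "160-169", "near_facility_week": "W10"},
    {"name": "Person E", "sex": "M", "age_band": "60-69", "height_band": "160-169", "near_facility_week": "W10"},
    {"name": "Person F", "sex": "M", "age_band": "40-49", "height_band": "170-179", "near_facility_week": "W22"},
]

# Attributes estimated from a scan with all textual identifiers removed (hypothetical).
scan = {"sex": "F", "age_band": "60-69", "height_band": "160-169", "near_facility_week": "W10"}

def candidates(records, target, keys):
    """Names of records that agree with the target on every listed attribute."""
    return [r["name"] for r in records if all(r[k] == target[k] for k in keys)]

# Add one attribute at a time and watch the candidate pool shrink.
keys = []
for key in ["sex", "age_band", "height_band", "near_facility_week"]:
    keys.append(key)
    matches = candidates(outside_records, scan, keys)
    print(f"matching on {len(keys)} attribute(s): {len(matches)} candidate(s)")
```

No single attribute identifies anyone here, but each additional one shrinks the pool until a single name remains. Real populations are far larger, so more attributes or finer estimates would be needed, but the mechanism is the same.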

Informed consent is a basic principle of ethical, modern medical research. However, as data accumulates, the information begins to bear not just on the person who provided the consent but on their children, siblings, and parents. Someone with late-stage cancer may have very different privacy concerns about the use of the data than that person’s grandchild, who bears a gene putting them at heightened risk for the same disease. The Genetic Information Nondiscrimination Act (“GINA”), for example, provides some protections for personal information, and the harms to date have been more theoretical than practical.

Given the potential risks, additional precautions may be proposed to help further protect patient-identifiable information. For example, FDA could impose a registration requirement so that access to and use of the datasets is more limited and traceable to identified users. Such a registration requirement would seem unlikely to deter legitimate researchers in academia or industry, who would have little or no interest in reidentifying patients and would view reidentification as an ethical violation. FDA has asked for comments to be submitted by August 5, 2013; commenters may propose other safeguards that still permit more disclosure of clinical data to help further medical research.