The HapMap reconsent form says “Because the database will be public, people who do identity testing, such as for paternity testing or law enforcement, may also use the samples, the database, and the HapMap, to do general research. However, it will be very hard for anyone to learn anything about you personally from any of this research because none of the samples, the database, or the HapMap will include your name or any other information that could identify you or your family.”
[See http://sev.prnewswire.com/null/20060208/NYW12408022006-1.html , NIH Seeks Input on Proposed Repository for Genetic Information , Diagnostic Multivariate Index Assays ]
There are at least 10 scenarios in which an ‘anonymous’ human-subjects consent design can be compromised; precedents for each are given below.
(10) Re-identification after “de-identification” using other public data.
A redacted Group Insurance Commission list containing only birth date, gender, and zip code was sufficient to re-identify the full medical records of Governor Weld and his family via voter-registration records.
[See Sweeney 1998 http://ncvhs.hhs.gov/980128tr.htm ]
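The linkage described above can be sketched in a few lines. This is only an illustration under stated assumptions: every record, name, and field label below is hypothetical toy data, not anything taken from the Group Insurance Commission list or a real voter roll. It shows how a "de-identified" record is re-identified whenever its combination of birth date, gender, and zip code matches exactly one named person in a public list.

```python
from collections import defaultdict

# Hypothetical toy data; none of these values come from real records.
deidentified_medical = [
    {"birth_date": "1950-01-15", "sex": "M", "zip": "02100", "diagnosis": "condition A"},
    {"birth_date": "1962-07-04", "sex": "F", "zip": "02100", "diagnosis": "condition B"},
    {"birth_date": "1962-07-04", "sex": "F", "zip": "02199", "diagnosis": "condition C"},
]
voter_roll = [
    {"name": "Person One", "birth_date": "1950-01-15", "sex": "M", "zip": "02100"},
    {"name": "Person Two", "birth_date": "1962-07-04", "sex": "F", "zip": "02100"},
    {"name": "Person Three", "birth_date": "1971-03-09", "sex": "F", "zip": "02199"},
]

# The three fields left in the redacted list act as a combined key.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def quasi_key(record):
    """Build the (birth date, sex, zip) tuple used to join the two lists."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(medical, voters):
    """Return (name, diagnosis) pairs for medical records whose key
    matches exactly one person in the named list."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[quasi_key(voter)].append(voter["name"])

    matches = []
    for record in medical:
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(deidentified_medical, voter_roll)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

In this toy set, two of the three "anonymous" records link to a single named person; the third has no counterpart in the public list and stays unlinked.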
(9) Hacking.
“Drug Records, Confidential Data vulnerable via Harvard ID number & PharmaCare loophole”.
“A hacker gained access to confidential medical information at the University of Washington Medical Center, using the Internet to download thousands of files containing patient names, conditions, home addresses and Social Security numbers”
[See Harvard Crimson 2005 http://seclists.org/lists/isn/2005/Jan/0074.html ]
(8) Combination of surnames from genotype with geographical information.
An anonymous sperm donor was traced on the internet in 2005 by his 15-year-old son, who used his own Y-chromosome genealogy to access surname relations.
[See http://www.newscientist.com/article.ns?id=mg18825244.200 ]
(7) Inferring phenotype from genotype
Markers for eye, skin, and hair color, height, weight, racial features, dysmorphologies, etc. are known and the list is undergoing rapid growth and refinement.
[See table below]
(6) Unexpected self-identification.
An example of this at Celera undermined confidence in the investigators.
[See Kennedy D. Science. 2002 297:1237. Not wicked, perhaps, but tacky.]
(5) A tiny amount of DNA data in the public domain with a name leverages the rest.
This would allow the vast amount of DNA data in the HapMap (or other study) to be identified. This can happen for example in court cases even if the suspect is acquitted. [See http://www.law.umkc.edu/faculty/projects/ftrials/Simpson/Dna.htm]
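A minimal sketch of how a few named genotypes can single out one record in a much larger anonymous set. The SNP identifiers, sample IDs, and genotype calls are all hypothetical, and the matching rule (exact agreement at every known marker) is a simplifying assumption that ignores genotyping error and relatedness.

```python
def normalize(genotype):
    """Treat genotypes as unordered allele pairs, so 'GA' equals 'AG'."""
    return "".join(sorted(genotype.upper()))


def consistent_samples(known_markers, database):
    """Return IDs of anonymous samples that agree with every known marker."""
    known = {snp: normalize(gt) for snp, gt in known_markers.items()}
    return sorted(
        sample_id
        for sample_id, calls in database.items()
        if all(snp in calls and normalize(calls[snp]) == gt
               for snp, gt in known.items())
    )


# Hypothetical anonymous database: sample ID -> {SNP ID: genotype}.
anonymous_db = {
    "sample_001": {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "AG"},
    "sample_002": {"snp1": "AG", "snp2": "CT", "snp3": "TT", "snp4": "GG"},
    "sample_003": {"snp1": "AA", "snp2": "CC", "snp3": "CT", "snp4": "AG"},
}

# Hypothetical profile that became public together with a name.
named_public_profile = {"snp1": "GA", "snp2": "CT", "snp3": "TT"}

one_marker_hits = consistent_samples({"snp1": "AG"}, anonymous_db)
all_marker_hits = consistent_samples(named_public_profile, anonymous_db)
print(one_marker_hits)   # candidates left after one marker
print(all_marker_hits)   # candidates left after three markers
```

Each extra marker shrinks the candidate set. Once a single sample remains, every other genotype stored under that sample ID (here `snp4`) is attached to the name as well.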
(4) Laptop theft.
Twenty-six million veterans' medical records, including SSNs and disabilities, were stolen (Jun 2006). Similar incidents were reported at Hewlett-Packard, Ford, Ameriprise, and Verizon.
(3) Unauthorized access to DNA-bearing samples (e.g. hair, dandruff, hand-prints or lip-prints on glasses, etc.).
(2) Government subpoena. False positive IDs can be very disruptive to the affected family.
(1) Identification by phenotype.
For example, if CT or MR imaging data is part of a genetic study, it may not look identifiable, yet it is becoming increasingly easy to reconstruct a person's appearance from such data. Even blood chemistry can be identifying in some cases.
[See Modeling Age, Obesity, and Ethnicity in a Computerized 3-D Facial Reconstruction ,
King Tut's New Face: Behind the Forensic Reconstruction
Walker helps on cold case and
Forensics experts recreate face from bone fragments ]
There are no doubt other scenarios. Any one of these could have psycho-social, health, or economic impact on unprepared or unwilling human research subjects and/or their families. These scenarios could also cause significant loss of trust or a public-relations backlash, and a serious setback for NHGRI and the investigators involved. Even though scenario #6 may not sound like it would have a high impact, it did cause a significant amount of alarm in ELSI, IRB, corporate, and editorial circles. Variations on that theme could lead to identification of other members of an ‘anonymous’ pooled cohort. Discussing a plan for release of the identifying information in advance would have been preferable in that case (and is probably advisable in general).
For further discussion see the Personal Genome Project (PGP) editorial and web page.
Trait                    Gene     Chr.  Location      Notes
Hair/iris color          ASIP     20    q11.2
Hair/iris color          DCT      13    q32
Green/blue iris          EYCL1    19    p13.1-q13.11
Brown/blue iris          EYCL3    15    q11-q15       *
Height                   GH1      17    q22-q24
Height (Laron)           GHR      5     p13-p12
Brown/blond hair         HCL1     19    p13.1-q13.11
Brown/blond hair         HCL3     15    q11-q15       *
Brown/red hair           HCL2     4     q28-q31
Hair/iris color          HPS1     10    q23.1-23.3
Hair/iris color          HPS2     10    q24.32
Skin&hair color          MC1R     16    q24.3
Height (Marfan)          MFS      15    q21.1
Hair/iris color          MITF     3     p12.3-14.1
Hair/iris color          MYO5A    15    q21
Ocular albinism          OA1      X     p22.3
Ocular albinism          OA2      X     p11.4-p11.23
Oculocutaneous albinism  OCA2     15    q11.2-q12     * R305W, R419Q: blue to brown & green, resp.
Hair/iris color          POMC     2     p23.3
Hair/iris color          RAB27A   15    q15-21.1
Hair/iris color          SILV     12    q13-q14
Skin color               SLC24A5  15    q21.1         A111T: dark to light skin
Short stature            SS       X&Y   p
Hair/iris color          TYR      11    q14-q21
Hair/iris color          TYRP1    9     p23
The human genome project sequence is largely from one man from
Out of ten volunteers in 1997, one male was “selected at random … Unfortunately, the attempt to prepare EBV-transformed cells for the RPCI-11 donor failed. As a consequence of the double-blind donor selection procedure, it was impossible to obtain a second sample from the same male donor for a second attempt to establish transformed cells.” See (Osoegawa et al 2001). Another donor identified himself in 2002.
Gene      DNA (biallelic bp, shown as an upper-case IUPAC code, in central codon)  One RP11 allele
SLC24A5   atgttgcaggc Rca actttcatggcagcgg                                         R=g (darker skin)
OCA2_305  tccatcagcat cYg ggcctccctgcagcag                                         Y=c (bluer eyes)
OCA2_419  accggctctcc cRg ggacgggtgtgggcca                                         R=g (bluer eyes)
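The table above can be read mechanically. The sketch below assumes that the upper-case letters are standard IUPAC ambiguity codes (R = a or g, Y = c or t) and copies the codons, alleles, and trait notes exactly as they appear in the table. The function and dictionary names are made up for illustration, and the lookup is a reading aid, not a validated phenotype predictor.

```python
from itertools import product

# Standard IUPAC codes needed here; plain bases map to themselves.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "R": "ag", "Y": "ct"}

# Central codon, noted allele, and trait note as listed in the table.
MARKERS = {
    "SLC24A5": {"codon": "Rca", "allele": "g", "note": "darker skin"},
    "OCA2_305": {"codon": "cYg", "allele": "c", "note": "bluer eyes"},
    "OCA2_419": {"codon": "cRg", "allele": "g", "note": "bluer eyes"},
}


def expand(codon):
    """List every concrete codon encoded by a codon with IUPAC codes."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in codon))]


def annotate(marker, observed_base):
    """Return the table's trait note if the observed base is the noted
    allele, or None if it is the other allele of that biallelic site."""
    entry = MARKERS[marker]
    ambiguity = next(ch for ch in entry["codon"] if ch.isupper())
    if observed_base not in IUPAC[ambiguity]:
        raise ValueError(f"{observed_base!r} is not an allele of {marker}")
    return entry["note"] if observed_base == entry["allele"] else None


for name, entry in MARKERS.items():
    print(name, expand(entry["codon"]), annotate(name, entry["allele"]))
```

Given a genotype call at each of these three sites, the same lookup yields a rough skin-color and eye-color description of an otherwise unnamed sample.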
that the reference human genome represents only one of the two alleles in RP11
(above), but both alleles will be available for the HapMap individuals from