Abstract
Significant associations have been found between specific human leukocyte antigen (HLA) alleles and organ transplant rejection, autoimmune disease development, and the response to infection. Searches for disease associations have conventionally measured the risk associated with the presence of individual HLA alleles. However, given the high level of HLA polymorphism, the pattern of amino acid variability, and the fact that most HLA variation occurs at functionally important sites, combinations of variable amino acid sites shared by several alleles (shared epitopes) may be better descriptors of the actual causative genetic variants. Here we describe a novel approach to genetic association analysis in which genes/proteins are broken down into smaller sequence features, and variant types are then defined for each feature, allowing disease association to be analyzed independently for each sequence feature variant type. We have used this approach to analyze a cohort of systemic sclerosis patients and show that a sequence feature composed of specific amino acid residues in peptide binding pockets 4 and 7 of HLA-DRB1 explains much of the molecular basis of risk for systemic sclerosis.
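The idea of sequence feature variant types can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the allele names, sequences, feature positions, and function names below are hypothetical toy values, and the odds ratio with a 0.5 continuity correction is an assumed stand-in for whatever association statistic is actually applied. The sketch shows only how alleles that differ overall can share one variant type of a feature, and how that variant type can then be tested on its own.

```python
from collections import Counter

# Hypothetical toy data: allele name -> aligned amino acid sequence.
# These are NOT real HLA alleles or sequences.
ALLELE_SEQS = {
    "A*01": "KDRAE",
    "A*02": "KDQAE",
    "A*03": "RDRAV",
    "A*04": "RDQAV",
}


def variant_type(allele, positions):
    """Residues of `allele` at the 1-based positions that define a sequence feature."""
    seq = ALLELE_SEQS[allele]
    return "".join(seq[p - 1] for p in positions)


def variant_counts(alleles, positions):
    """Count occurrences of each variant type of the feature in a list of alleles."""
    return Counter(variant_type(a, positions) for a in alleles)


def odds_ratio(case_alleles, control_alleles, positions, vt):
    """Odds ratio for one variant type versus all others.

    Adds 0.5 to every cell (an assumed continuity correction) so that
    empty cells do not cause division by zero.
    """
    cases = variant_counts(case_alleles, positions)
    controls = variant_counts(control_alleles, positions)
    a = cases[vt] + 0.5                          # cases carrying vt
    b = len(case_alleles) - cases[vt] + 0.5      # cases not carrying vt
    c = controls[vt] + 0.5                       # controls carrying vt
    d = len(control_alleles) - controls[vt] + 0.5
    return (a * d) / (b * c)


# A hypothetical feature made of alignment positions 1 and 5.
feature = (1, 5)
case_alleles = ["A*01", "A*02", "A*01", "A*03"]
control_alleles = ["A*03", "A*04", "A*04", "A*01"]

for vt in sorted(variant_counts(case_alleles + control_alleles, feature)):
    print(vt, round(odds_ratio(case_alleles, control_alleles, feature, vt), 2))
```

In this toy example, alleles A*01 and A*02 differ at position 3 but share the variant type `KE` for the chosen feature, so they are pooled when that variant type is tested. An analysis by individual allele would count them separately.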