Genetics in sarcoidosis

Purpose of review: Epidemiological and clinical observations as well as familial clustering support the existence of a genetic predisposition to sarcoidosis. In this article, we review the most recent findings in the genetics of sarcoidosis and discuss how the identification of risk alleles may help advance our understanding of disease etiology and development.
Recent findings: Genetic studies of sarcoidosis phenotypes have identified novel and ancestry-specific associations. Gene-environment interaction studies highlighted the importance of integrating genetic information when assessing the relationship between sarcoidosis and environmental exposures. A case-control-family study revealed that the heritability of sarcoidosis is only 39%, suggesting the existence of additional important contributors to disease risk. The application of whole-exome sequencing has identified associations with disease activity and prognosis. Finally, gene expression studies of circulating immune cells have identified shared and unique pathways between sarcoidosis and other granulomatous diseases.
Summary: Sarcoidosis genetic research has led to the identification of a number of associations with both sarcoidosis per se and disease phenotypes. Newer sequencing technologies are likely to increase the number of genetic variants associated with sarcoidosis. However, studying phenotypically and ethnically homogeneous patient subsets remains critically important regardless of the genetic approach used.


INTRODUCTION
Sarcoidosis is a multisystem disease of unknown etiology characterized by the presence of noncaseating epithelioid granulomas along with the accumulation of T-lymphocytes and macrophages in affected organs [1,2]. Clinical manifestations and disease behavior are highly variable and unpredictable, with some individuals having an acute disease, which often remits, and others experiencing a progressive course and life-threatening manifestations [3]. Due to its protean clinical presentation and outcomes, many believe that sarcoidosis is a number of 'syndromes', each possibly due to different exposures or etiologies.
Familial clustering of disease, variations in susceptibility to sarcoidosis in different ethnic groups, high sibling relative risk, and data from candidate gene studies and genome-wide association studies (GWAS) suggest the existence of a genetic predisposition to sarcoidosis and some of its clinical phenotypes [4]. Many of the factors pivotal to the immunopathogenesis of sarcoidosis, such as antigen presentation and granuloma formation, have been located in the human leukocyte antigen (HLA) Class II genes, although HLA Class I and Class III alleles as well as a number of non-HLA genes have also been associated with disease risk, phenotype and outcome [5]. However, the studies conducted to date have provided varying, and often conflicting, results. There are two main reasons: the highly variable linkage disequilibrium (LD) (i.e., the nonrandom association of alleles at different loci) in the HLA region, which makes it difficult to pinpoint risk (or protective) loci, and the existence of racial-, ethnic- and phenotype-specific genetic associations.
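To make the notion of LD concrete, the sketch below computes the standard pairwise measures D, D' and r² for two biallelic loci from haplotype and allele frequencies. It is a minimal illustration only: the function name and the example frequencies are hypothetical and are not taken from any of the studies reviewed here.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, D' and r^2 for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")

    # D is the departure of the observed haplotype frequency from the value
    # expected if the two alleles were inherited independently.
    d = p_ab - p_a * p_b

    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Hypothetical frequencies, chosen only to illustrate the calculation.
example = linkage_disequilibrium(p_ab=0.40, p_a=0.50, p_b=0.50)
print({name: round(value, 3) for name, value in example.items()})
```

When r² between a tested SNP and a neighboring HLA allele is high, an association signal at one cannot be attributed to either locus on statistical grounds alone, which is the difficulty described above.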
In this review, we summarize the most relevant genetic studies of sarcoidosis published in the last 18-24 months (Table 1) and discuss how advanced sequencing techniques may improve our knowledge and understanding of the genetics of this fascinating disease.

GENETIC ASSOCIATIONS WITH SPECIFIC DISEASE PHENOTYPES
Two studies have evaluated the role of genetic factors in the predisposition of sarcoidosis patients to develop ocular disease. Garman et al. performed the first GWAS in an ethnically diverse cohort of African-American (AA) and European-American (EA) patients with ocular sarcoidosis (OS) and healthy controls (n = 260 and n = 35 vs. n = 1551 and n = 2046, respectively) [7]. The authors confirmed previously reported associations between OS and HLA-DRB1*04:01 in EA, an association that was found to be ancestry-specific, and between sarcoidosis as a whole and the HLA-DRB1*03:01, HLA-DRB1*11:01, and HLA-DRB1*12:01 alleles in AA. Eight non-HLA variants were also associated with OS in AA, including seven within MAGI1 (membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1), a scaffolding protein that is expressed in many tissues, including the lens and retina, where it ensures integrity of the epithelial barrier [14]. Notably, while none of the MAGI1 OS-specific single nucleotide polymorphisms (SNPs) achieved genome-wide statistical significance (i.e., P < 5 × 10⁻⁸) in EA cases, their odds ratios (ORs) in EA patients were similar to those in AA patients, suggesting that the lack of significance was potentially due to the small sample size and lack of power.
The mutant (A) rs2076530 allele within Butyrophilin-like protein 2 (BTNL2), a member of the immunoglobulin superfamily involved in T cell activation [15], has consistently been associated with susceptibility to sarcoidosis [16][17][18]. A recent study compared the genotype frequencies of BTNL2 rs2076530 in patients with sarcoid uveitis (n = 135), sarcoidosis without uveitis (n = 196), uveitis without sarcoidosis (n = 81) and healthy controls (n = 271) [11]. As expected, carriage of the rs2076530 A allele was associated with increased risk of sarcoidosis. Specifically, among patients without uveitis, carriage of the AA or AG genotype increased the risk of developing sarcoidosis by 5.06-fold and 2.63-fold, respectively, compared to carriage of the GG genotype. Conversely, rs2076530 A was not associated with increased susceptibility to sarcoid uveitis, suggesting that sarcoidosis with and without uveitis may represent genetically distinct disease subsets.
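Genotype-specific risk estimates of this kind are typically obtained by comparing each risk genotype with a reference genotype in a 2 × 2 table. The sketch below shows one common way to do this, an odds ratio with a Woolf (log-based) 95% confidence interval. The genotype counts are hypothetical placeholders; the per-genotype counts from the study above are not given in this review, and the study's actual statistical method is not stated here.

```python
import math


def odds_ratio(case_risk: int, case_ref: int, ctrl_risk: int, ctrl_ref: int,
               z: float = 1.96) -> tuple:
    """Odds ratio for a risk genotype vs. a reference genotype, with a Woolf CI.

    case_risk / case_ref: cases carrying the risk / reference genotype
    ctrl_risk / ctrl_ref: controls carrying the risk / reference genotype
    z: normal quantile for the interval (1.96 gives an approximate 95% CI)
    """
    counts = (case_risk, case_ref, ctrl_risk, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")

    estimate = (case_risk * ctrl_ref) / (case_ref * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts (NOT from the cited study): AA vs. GG genotype.
or_aa, lo_aa, hi_aa = odds_ratio(case_risk=60, case_ref=20, ctrl_risk=40, ctrl_ref=60)
print(f"OR = {or_aa:.2f} (95% CI {lo_aa:.2f}-{hi_aa:.2f})")
```

A confidence interval that excludes 1 indicates an association at the chosen level; with small genotype groups the interval widens quickly, which is one reason subset analyses such as sarcoid uveitis are hard to interpret.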

GENE-ENVIRONMENT INTERACTIONS
Epidemiological studies suggest that sarcoidosis results from a complex interplay between genetic and environmental factors, including smoking [4], although the role of smoking in disease development remains controversial. Indeed, some studies have reported a decreased risk of developing sarcoidosis in smokers compared to nonsmokers [19,20], an effect attributed to the ability of nicotine to suppress T-lymphocyte and macrophage phagocytic activity [21]; however, the negative correlation between sarcoidosis and smoking does not necessarily imply causation. In addition, a case-control study in India found no association [22]. To clarify the effect of smoking in sarcoidosis, Rivera et al. performed a gene-environment interaction study in 747 Swedish sarcoidosis cases (292 with Löfgren's syndrome [LS] and 455 without LS) and 2,966 healthy controls [23]. Cases and controls were classified based on their smoking status and genotyped using the Immunochip platform. Assessment of smoking effects with inclusion of genetic information revealed 53 SNP-smoking additive interactions in LS and 34 in non-LS patients, with an increased disease risk of 56% and 62% in LS and non-LS patients, respectively. This study suggests that the effect of smoking as a risk factor for sarcoidosis is modulated by carriage of certain genetic variants, and highlights the importance of integrating genetic information when assessing the relationship between sarcoidosis and environmental exposures.

KEY POINTS
Sarcoidosis is believed to result from a complex interplay between genetic and environmental factors. Genetics is also likely to contribute to the wide variety of clinical manifestations and prognosis observed in sarcoidosis.
Smoking has been associated with a reduced risk of developing sarcoidosis, but not all studies have supported this association. Gene-environment interaction studies suggest that smoking modulates sarcoidosis risk mainly in the presence of certain genetic variants.
Gene expression studies of circulating immune cells and micro-dissected granulomas within lung and mediastinal lymph nodes have identified shared and unique pathways between sarcoidosis and other granulomatous diseases.
Newer sequencing technologies are likely to increase the number of genetic variants associated with sarcoidosis. However, studying phenotypically and ethnically homogeneous patient subsets remains critically important regardless of the genetic approach used.

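An additive gene-environment interaction means that the joint effect of a risk variant and an exposure such as smoking exceeds the sum of their separate effects. The sketch below computes two widely used summary measures, the relative excess risk due to interaction (RERI) and the attributable proportion due to interaction (AP). This review does not state which measure Rivera et al. reported, so the choice of measures, the function name, and the input values are assumptions made purely for illustration.

```python
def additive_interaction(rr_both: float, rr_gene_only: float,
                         rr_exposure_only: float) -> tuple:
    """Return (RERI, AP) on the additive scale.

    All relative risks (or odds ratios used as approximations) are expressed
    against the doubly unexposed group: no risk variant and no exposure.
    """
    if rr_both <= 0:
        raise ValueError("rr_both must be positive")

    # RERI > 0: the joint effect is larger than the sum of the separate effects.
    reri = rr_both - rr_gene_only - rr_exposure_only + 1.0
    # AP: share of the risk in the doubly exposed group attributable to interaction.
    ap = reri / rr_both
    return reri, ap


# Hypothetical relative risks, chosen only to illustrate the arithmetic.
reri, ap = additive_interaction(rr_both=4.0, rr_gene_only=1.5, rr_exposure_only=1.2)
print(f"RERI = {reri:.2f}, AP = {ap:.3f}")
```

With these illustrative inputs, more than half of the risk among exposed variant carriers would be attributed to the interaction itself rather than to either factor alone, which shows how an exposure can appear neutral or protective overall yet still act as a risk factor in carriers of specific variants.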
Multiple environmental and occupational exposures have been associated with disease development [19]. Following the World Trade Center (WTC) attack, the incidence of sarcoidosis increased significantly among firefighters of the New York City Fire Department [24]. Cleven et al. performed a case-control study that explored the genetic variations between WTC-exposed firefighters who developed sarcoidosis (n = 55) and firefighters with similar demographics, smoking history and levels of exposure to WTC-associated dust who did not (n = 100) [6]. The authors sequenced at high density the enhancer/promoter, exonic and 5′ untranslated regions of 51 candidate genes related to immune response, inflammation and granuloma formation and identified 17 variants associated with sarcoidosis, all on chromosomes 1 and 6. As expected, many of the associations were within or in close proximity to HLA genes and had been previously reported in sporadic sarcoidosis cases without known environmental exposures [25]. However, several novel variants associated with both sarcoidosis as a whole and extrathoracic disease were also identified. Interestingly, no interaction was found between the associated variants and the degree of WTC exposure, measured as arrival time at the WTC site, suggesting the following hypotheses: 1) it is the exposure to dust particles itself, rather than its amount, that triggers the disease in genetically predisposed individuals; 2) the exposures were so high regardless of arrival time that a difference could not be ascertained; or 3) these exposure measures were not able to differentiate the actual exposures relevant to disease pathogenesis.

Table 1
Gene/allele | Function | Association | Reference
HLA-DRB1*0301 | Presentation of antigens to elicit T cell response | Susceptibility to sarcoidosis | [7]
HLA-DRB1*0401 | Presentation of antigens to elicit T cell response | Susceptibility to ocular sarcoidosis | [7]
HLA-DRB1*1101 | Presentation of antigens to elicit T cell response | Susceptibility to sarcoidosis | [7]
HLA-DRB1*1201 | Presentation of antigens to elicit T cell response | Susceptibility to sarcoidosis | [7]
HLA-DRB1*14

ASSOCIATIONS WITHIN THE HLA REGION
A number of studies have confirmed the association between HLA alleles and sarcoidosis, suggesting that antigen-presenting molecules and immune mediators play a key role in disease pathogenesis. Wolin et al. used 89 tag SNPs to cover the HLA Class III region (i.e., HLA-DR alpha [HLA-DRA], lymphotoxin alpha, tumor necrosis factor [TNF], receptor for advanced glycation end products [AGER], and BTNL2) and look for associations with sarcoidosis susceptibility and disease subsets (LS, non-LS and disease behavior) in four European populations of patients (n = 805) and controls (n = 870) [26]. The discovery cohort consisted of Finnish patients and controls, whereas the Swedish, Dutch, and Czech datasets served as replication cohorts. The associations found in the discovery and replication samples were verified in the joint analysis of the four study populations. Overall, 7 SNPs were associated with non-LS, 8 with LS, and 5 with disease course, with the strongest associations being observed with rs3177928 (located downstream of HLA-DRA), rs3129843 (located between BTNL2 and HLA-DRA), and rs3129843, respectively. Owing to the high and variable LD existing in the HLA region, the authors sought to determine whether the observed associations were secondary to HLA-DRB1. When these associations and the HLA-DRB1 alleles were analyzed together, four variants located in the HLA-DRA/BTNL2 region were found to be independently associated, namely rs3135365 and rs3177928 with non-LS, rs6937545 with LS, and rs5007259 with disease activity. Notably, these four SNPs act as expression quantitative trait loci (eQTLs) for HLA-DRB1 and/or HLA-DRB5, suggesting a role in the regulation of gene expression and potential functionality of these variants.

ASSOCIATIONS OUTSIDE THE HLA REGION
To date, consistent associations with variants outside the HLA region have been limited, likely due to the different phenotypes and ethnicity of the populations studied. Meguro et al. recently performed a GWAS in a Japanese cohort of 700 sarcoidosis cases and 886 controls with replication in independent samples from Japan (931 cases and 1,042 controls) and the Czech Republic (265 cases and 264 controls) [12]. Three loci outside the HLA region (CCL24 and STYXL1-SRRM3, which were novel, and C1orf141-IL23R, which had been previously described) were associated with sarcoidosis. Notably, the disease-risk alleles in CCL24 and IL23R were associated with reduced CCL24 and IL23R expression, whereas the disease-risk allele in STYXL1-SRRM3 was associated with increased expression of POR (cytochrome P450 oxidoreductase) in many of the organs affected by sarcoidosis, including the lungs, skin, spleen, heart, and liver. POR is a flavoprotein with a direct role in steroidogenesis. Enhanced endogenous steroidogenesis may induce immunosuppression, thus diminishing host defense against pathogens [27,28], which are arguably among the most plausible triggers of sarcoidosis. Interestingly, the STYXL1-SRRM3 risk allele was also associated with LS and radiographic stages 0-II, which generally have a good prognosis and may even resolve spontaneously [29,30]. In this regard, increased levels of endogenous steroid hormones may contribute to the improvement of sarcoidosis that is observed during pregnancy (and is frequently lost after delivery) [31]. Therefore, the STYXL1-SRRM3 risk allele might have a dual effect, i.e., confer risk of disease development and contribute to spontaneous sarcoidosis remission by increasing endogenous steroid levels.

WHOLE-EXOME SEQUENCING
GWAS are hypothesis-free methods for identifying associations between genetic regions (loci) and phenotypic traits. However, because of the limited resolution of microarray-based genotyping platforms, the vast majority of the human genome is not genotyped directly in GWAS. In addition, in order to achieve adequate statistical power, GWAS require large sample sizes that are difficult to collect in uncommon diseases like sarcoidosis. Exomes encompass about 1-2% of the human genome but harbor about 85% of all disease-causing variants [32], making relatively small datasets sufficient to identify novel disease-associated genes. Lahtela et al. [9] performed whole-exome sequencing (WES) in Finnish patients with sarcoidosis (n = 72) with the aim of identifying genetic markers that could predict disease activity (i.e., disease resolving within 2 years vs. persistent disease). Associations with resolving disease were found with the AADACL3 and C1orf158 genes on chromosome 1p36.21, a region that has recently been associated with familial sarcoidosis in German [33] but not in French patients [34], and with LILRB4 on chromosome 19q13.42. Of note, the associations with AADACL3 and C1orf158 were independent of HLA-DRB1*03:01 and HLA-DRB1*04:01, two markers of good prognosis [5], although their functional relevance, if any, remains to be elucidated. LILRB4 encodes an inhibitory receptor that interacts with the nonclassical HLA class I antigen HLA-G and limits the activation of dendritic cells [35], and is, therefore, a plausible contributor to resolving disease.

FAMILIAL SARCOIDOSIS
Familial sarcoidosis, defined as the presence or history of sarcoidosis in one or more family members of index patients with the disease, has been reported for decades with rates ranging between 1% and 19% [36]. Calender et al. used WES to genotype 14 affected individuals from a cohort of five families with non-Löfgren's sarcoidosis as well as eight family-matched nonaffected first-degree relatives [34]. Exome sequencing and pathogenicity network analysis identified mammalian target of rapamycin (mTOR) signaling and autophagy as potential contributors to the pathogenesis of familial sarcoidosis. Notably, mTORC1 signaling has been implicated in a mouse model of sarcoidosis [37] and has been identified in RNA-seq gene set enrichment data from a patient with cutaneous sarcoidosis who was successfully treated with the mTOR inhibitor rapamycin, suggesting this pathway as a therapeutic target [38]. These results overlap only partially with those from a WES study of 22 sarcoidosis cases from six German families by Kishore et al. [33]. In this study, the identified variants were prioritized using linkage and high-penetrance approaches and filtered to identify novel and rare variants. Further selection based on functional properties and validation resulted in a panel of 40 functional mutations that suggest regulation of immune/inflammatory response, leukocyte proliferation, and response to pathogens as the major mechanisms implicated in the pathogenesis of familial sarcoidosis. Phenotypic complexity and locus heterogeneity may account for the inconsistent results between these two studies.
Rossides et al. estimated the familial aggregation and heritability of sarcoidosis using a case-control-family study design and population-based Swedish registers [39]. They found that having at least one first-degree relative with sarcoidosis increased the risk of developing sarcoidosis by 3.7-fold, with the relative risk increasing further in those with two or more relatives (relative risk 4.7) and in Löfgren's syndrome (relative risk 4.1). The heritability of the disease, however, was only 39%, suggesting that factors other than genetics, such as environmental exposures, are important contributors to sarcoidosis risk.

THE FUTURE OF GENETIC RESEARCH IN SARCOIDOSIS
GWAS have provided insights into the mechanisms underlying a number of complex diseases, including sarcoidosis, through the identification of loci associated with disease risk (or protection) [40]. However, this study design remains controversial for several reasons, including the modest fraction of the estimated heritability explained by the variants identified in GWAS for most complex traits; the difficulty of pinpointing causal variants and genes (i.e., GWAS identify loci, not genes); the difficulty of determining the functional implications of the observed associations, the vast majority of which map to noncoding regions of the genome; the inability to detect rare and ultra-rare variants associated with the disease; and the difficulty of identifying gene-gene interactions (i.e., epistasis), owing primarily to lack of statistical power and methodological challenges [41]. Many of these limitations might be resolved with a shift from SNP arrays to whole-genome sequencing (WGS).
The term 'epigenetics' refers to any process that alters gene activity without changing the DNA sequence. Epigenetic processes are believed to be implicated in the pathogenesis of sarcoidosis by altering the response to environmental antigens, although only one epigenomics study has been performed in this disease to date [42]. Moreover, as compared to chronic beryllium disease (CBD), a granulomatous disease with a well-established genetic background, DNA methylation changes in sarcoidosis are subtle and variable, possibly due to heterogeneity in disease manifestations and/or inciting antigens [42]. Direct genomic interrogation of sarcoid tissues may lead to the identification of dysregulated gene pathways or biomarker signatures. A recent study that compared gene expression profiles of sarcoidosis granulomas with those of pathogen-specific granulomas (coccidioidomycosis and tuberculosis) identified a number of sarcoidosis-specific immunological signaling-related pathways, thus providing important information for the future development of genomic-derived biomarkers for disease diagnosis and prognosis [43].
Circulating immune cells might reflect gene expression of sarcoid granulomas [44,45]. However, gene expression studies that use heterogeneous cell mixtures, and thus dilute cell-specific transcriptional signatures, are unable to detect immune cell-specific pathways that are dysregulated in the periphery of sarcoidosis patients. In the first single-cell RNA-sequencing (scRNA-seq) study using peripheral immune cells from 35 patients and 13 controls, Garman et al. demonstrated that immune dysregulation of sarcoidosis involves persistent hyperactivation of innate and adaptive immunity via classical monocytes and CD4 naïve T cells, regulatory T cell dysfunction, and effector T cell anergy [46]. More recently, Liao et al. used scRNA-seq focusing on macrophages to explore the shared and unique pathways between sarcoidosis and CBD [47]. Pathway analysis revealed significant commonality between sarcoidosis and CBD with regard to a novel network centered on AP-1 complexes, a transcription factor that regulates a number of cellular processes, including proliferation, differentiation and apoptosis [48], as well as pathways related to the immune system, including Immune Response Antigen Presentation by MHC class II. However, some pathways were unique to sarcoidosis phenotypes (progressive [SarcP] vs. remitting [SarcR] disease) and CBD. For instance, the MAN receptor was downregulated in SarcR and CBD; CDC42, a GTPase related to micropinocytosis, was upregulated in SarcR; and MHC Class II beta chain expression was increased in SarcP compared to SarcR. Differential expression of these pathways, which are involved in the regulation of CD4+ T cells and stimulation of the immunogenic response, may contribute to the variable disease course and outcomes in sarcoidosis and CBD.

CONCLUSION
In recent years, genetic studies of complex diseases have evolved from GWAS to next-generation sequencing, genome-wide WES, transcriptomics, single-cell RNA sequencing, and whole-genome sequencing, but these new technologies have rarely been used in sarcoidosis. Though only partially understood, the genetic susceptibility to sarcoidosis is complex and multifactorial, with various genetic architectures and exposures/triggers likely to be involved in disease pathogenesis. Studying phenotypically and ethnically homogeneous patient subsets is critically important regardless of the genetic approach used. The observed associations may then be searched for in other phenotypic and ethnic groups to determine whether shared or distinct genes are implicated in disease and phenotype risk and pathogenesis.