In silico analysis of hepatitis C virus: development of a novel fusion process hypothesis and study of drug resistance

Piano, M. A.

Worldwide, between 200 and 300 million people are chronically infected with the hepatitis C virus (HCV). In up to 20% of infected patients, chronic infection can lead to cirrhosis and hepatocellular carcinoma. HCV is a member of the Flaviviridae family, like Dengue virus (DENV) and West Nile virus (WNV), but has been classified into its own genus, Hepacivirus, because of major differences in genomic organization and amino acid sequence. The HCV genome is a positive-strand RNA of 9.6 kb encoding a polyprotein that is post-translationally processed into structural (Core, E1, E2 and p7) and non-structural (NS2, NS3, NS4A, NS4B, NS5A and NS5B) proteins. In the present work, a variety of computational methods and approaches are applied to investigate HCV proteins involved in the fusion process (the E1 and E2 envelope glycoproteins) and in drug resistance (the NS3 protease). The E1/E2 glycoprotein complex forms the surface of the virus, is largely responsible for its antigenicity, and is involved in important viral processes including attachment and cell entry. The NS3/4A protease is responsible for several important biological functions in the HCV life cycle, including polyprotein cleavage, viral replication and inhibition of the host antiviral response. The protease domain is one of the main candidate targets for rational drug design. At present, structural knowledge of both the E1 and E2 glycoproteins is very limited. This is due to the lack of feasible heterologous expression systems for these proteins, which has made attempts at crystallization unsuccessful. However, two models of the three-dimensional structure of E2, both obtained by fold recognition, have been proposed: one by Yagnik et al. in 2000 (Model-1) and one by Spiga et al. in 2006 (Model-2). In both cases, the molecular model is based on the E protein of Tick-Borne Encephalitis Virus (TBEV), a virus belonging to the Flaviviridae family and thus evolutionarily closely related to HCV. 
These models are compared and their reliability is evaluated against experimentally derived functional information. Model-1 appears to be the more consistent with the functions of E2 as supported by the collected evidence. However, this model has some weak points, the most noteworthy being that it does not take into account the location of the strictly conserved cysteine residues forming nine disulphide bonds. The recently acquired knowledge of the E2 disulphide bonds, together with experimentally derived findings, provides sufficient constraints to build a new model not only of this protein but also of the E1/E2 complex. The current E1E2 model is constructed using the E1 glycoprotein of the alphavirus Semliki Forest Virus (SFV), a Class II fusion protein, as template. Class II fusion proteins are elongated molecules composed almost entirely of β-strands and containing three domains. Domain I is connected to domain II by a highly flexible hinge region. The fusion loop, which is crucial for the fusion mechanism, is located at the tip of domain II. The immunoglobulin-like domain III is located in a lateral position and is followed by a stem region connecting the protein ectodomain to its transmembrane domain. Our HCV E2 model matches well with domains I and II of the template fusion protein, while HCV E1 matches domain III. The most important feature of our model is that it takes into account the location of the strictly conserved cysteine residues forming nine disulphide bonds. The model is validated by mapping the most important functional sites, whose localization is often in agreement with the experimental data obtained so far. Furthermore, the model suggests some novel features of the HCV envelope proteins, which support a structural hypothesis for the architecture of the viral membrane fusion machinery. This new E1E2 model shares the main features of Class II fusion proteins. 
In HCV, however, the fusion process is promoted by a dimer of two proteins rather than by a single protein as in Class II fusion proteins. In the proposed E1E2 fusion complex, E1 acts as an anchor for the entire structure and contacts E2 through their respective transmembrane and stem regions. An α-helix insertion in E1 also increases the interaction surface between the two proteins. In E2, the flexible loop together with the hinge region plays a principal role in the conformational changes that occur during the fusion process. Moreover, the new model places the fusion loop at the tip of the elongated domain II, where the GWG motif (highly conserved among the E2 sequences of HCV genotypes and in other members of the same family) is highly exposed. This is important because we think that this motif is the principal structural feature of the fusion loop, directly involved in insertion into the host cell membrane and able to bridge the gap between the viral and cell membranes, thereby promoting their fusion. In the last part of this thesis, attention is focused on an emerging method, residue interaction networks (RINs), in which each node represents a residue in the protein and the connections indicate different interaction types (van der Waals contacts, salt bridges, π-π stacking or simple hydrophobic contacts). We used residue interaction networks to investigate the molecular effects underlying drug resistance in the NS3 protease, one of the current candidate targets for inhibitor development. All published NS3 data on both naturally occurring variants and drug-induced mutations associated with decreased susceptibility to protease inhibitors were collected. Attention was focused on the mechanisms of resistance to two inhibitors, Telaprevir and Boceprevir, currently in phase III clinical trials. Two variants, V36M and R155K, were analyzed. 
The V36M variant affects the local conformation and the geometry of the hydrophobic cavity, as a consequence of a higher number of interactions, which makes this site more rigid than in the wild-type (WT) strain. The effect of the mutation extends to the adjacent active-site binding pocket. The R155K variant affects both the local conformation near the β-barrel domain involved in substrate binding and the active-site binding pocket. In this case, RIN analysis showed the importance of G140, located in the same loop as S139 (a catalytic triad amino acid directly involved in inhibitor binding). G140 probably plays an important role in maintaining the flexibility of this loop in the WT strain, while in the mutated protein this flexibility is lost because of an increased number of interactions, such as a new hydrogen bond with F154, an amino acid responsible for substrate binding that interacts directly with S139. By applying filters based on residue conservation and degree (the number of interactions per residue), it was possible to identify functionally and structurally important residues. As expected, some of these are part of functionally important sites such as the catalytic triad, the hydrophobic cavity and the substrate binding sites. Others are not involved in the known functions of NS3 but, on the basis of these results, are probably critical for maintaining the NS3 structure.
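
The RIN analysis summarized above can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis: the edge lists, the conservation scores, the thresholds and the function names (`build_rin`, `degree_changes`, `key_residues`) are all hypothetical and chosen only to show how a residue's degree can be compared between WT and mutant networks and how a conservation-plus-degree filter might be applied. Residues are identified by sequence position so that the two networks can be compared directly.

```python
from collections import defaultdict

# Toy data, invented for illustration only (NOT taken from the thesis).
# Each edge is (position_a, position_b, interaction_type).
WT_EDGES = [
    (57, 81, "hbond"),
    (57, 139, "hbond"),
    (139, 140, "vdw"),
    (139, 154, "vdw"),
    (154, 155, "vdw"),
]
# Hypothetical mutant network: one extra hydrogen bond between 140 and 154.
MUT_EDGES = WT_EDGES + [(140, 154, "hbond")]

# Hypothetical per-position conservation scores in the range [0, 1].
CONSERVATION = {57: 1.0, 81: 1.0, 139: 1.0, 140: 0.95, 154: 0.9, 155: 0.6}


def build_rin(edges):
    """Build an undirected network: position -> set of (neighbour, type)."""
    rin = defaultdict(set)
    for a, b, kind in edges:
        rin[a].add((b, kind))
        rin[b].add((a, kind))
    return rin


def degree(rin, residue):
    """Number of interactions recorded for one residue (0 if absent)."""
    return len(rin.get(residue, ()))


def degree_changes(wt, mut):
    """Positions whose degree differs between the networks (mutant - WT)."""
    changes = {}
    for residue in set(wt) | set(mut):
        delta = degree(mut, residue) - degree(wt, residue)
        if delta != 0:
            changes[residue] = delta
    return changes


def key_residues(rin, conservation, min_degree, min_conservation):
    """Positions passing both the degree and the conservation threshold."""
    return sorted(
        r for r in rin
        if degree(rin, r) >= min_degree
        and conservation.get(r, 0.0) >= min_conservation
    )


wt = build_rin(WT_EDGES)
mut = build_rin(MUT_EDGES)
print(degree_changes(wt, mut))
print(key_residues(wt, CONSERVATION, min_degree=2, min_conservation=0.9))
```

In this toy setting a gain in degree marks positions that acquire interactions in the mutant, which is the kind of signal the abstract associates with loss of loop flexibility. In practice the edges would come from a tool that derives interactions from a 3D structure, and the thresholds would need to be justified for the protein under study.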

Worldwide, about 200-300 million people are chronically infected with the hepatitis C virus (HCV). In 20% of cases, chronic infection can lead to cirrhosis and hepatocellular carcinoma. HCV belongs to the Flaviviridae family, like Dengue virus and West Nile virus (WNV), and is classified in the genus Hepacivirus because of differences in its amino acid sequence. The HCV genome consists of a single positive-sense RNA strand of 9.6 kb encoding a single polyprotein, which is subsequently processed into the structural proteins (Core, E1, E2 and p7) and the non-structural proteins (NS2, NS3, NS4A, NS4B, NS5A and NS5B). In this thesis, a range of computational methods and approaches were applied to study HCV proteins involved in the fusion process (the envelope glycoproteins E1 and E2) and in the mechanism of drug resistance. The E1 and E2 glycoproteins form the surface of the virus and are responsible for its antigenic properties. They are also involved in the interaction with the host cell membrane and in viral entry. The NS3/4A protein (the viral protease) is responsible for several important functions in the viral replication cycle, including processing of the polyprotein into the respective structural and non-structural proteins, viral replication and inhibition of the host cell antiviral response. The protease is one of the candidate targets for antiviral drug design. At present, knowledge of the structural features of the E1 and E2 glycoproteins is very limited. This is due to the lack of a heterologous expression system for these proteins, which makes crystallization difficult. Despite the lack of a crystal structure, two three-dimensional models have been proposed for the E2 protein. 
These models were obtained by the bioinformatics method of fold recognition and were proposed by Yagnik's group in 2000 (Model-1) and by Spiga's group in 2006 (Model-2). In both cases, the model is based on the envelope glycoprotein E of Tick-Borne Encephalitis Virus (TBEV), a virus belonging to the Flaviviridae family and therefore evolutionarily related to HCV. The two models were compared and their reliability was evaluated against experimentally derived information. On the basis of these results, Model-1 appears to be more consistent with the functions of E2 as supported by experimental evidence. However, this model has weak points, the most important being that it does not take into account the pattern of cysteines forming nine disulphide bonds. The recent identification of the cysteines forming the disulphide bonds, together with the experimental evidence, provided a sufficient basis for building a new model, not only of the E2 protein but of the E1E2 complex. The current model was built using as template the E1 glycoprotein of Semliki Forest Virus (SFV), a virus belonging to the genus Alphavirus and the family Togaviridae. The SFV E1 glycoprotein belongs to the class II viral fusion proteins, which are elongated structures composed almost entirely of β-sheets and containing three domains. Domain I is connected to domain II by a highly flexible hinge region. The fusion loop, which plays a fundamental role in the fusion process, is located at the tip of domain II. The immunoglobulin-like domain III is located laterally and is followed by a region called the "stem region", which connects the ectodomain to the transmembrane domain. 
Our model of the HCV E2 protein fits domains I and II very well, while E1 fits domain III of the fusion protein used as template, and the model agrees with the pattern of cysteines forming the disulphide bonds. In addition, the quality of the resulting structure was assessed by mapping the most important functional sites, whose location is often in agreement with experimental data. The E1E2 model also made it possible to propose a new hypothesis explaining the mechanism of fusion of the virus with the host cell membrane. This model shares a number of features with class II fusion proteins, except that in the latter the fusion process is promoted by a single protein (E1), whereas in HCV it is promoted by two proteins, E1 and E2. The E1E2 fusion complex presented in this work suggests that E1 may act as an anchor for the entire structure and also contact the E2 protein through their respective transmembrane and stem regions. The presence of an α-helix in the E1 model increases the interaction between the two proteins. In E2, the flexible loop located in the stem region, together with the hinge region, plays a major role in the conformational changes that these proteins undergo during the fusion process. In the model, the fusion loop is correctly positioned, and the GWG motif within it (highly conserved in the E2 sequences of the different genotypes) is highly exposed. This is very important because we think that the GWG motif is the most important structural and functional feature of the fusion loop, being directly involved in insertion into the host cell membrane and therefore able to bridge the gap between the host cell membrane and the viral membrane, promoting their fusion. 
In the last part of the work, attention was focused on an emerging method, residue interaction networks (RINs), in which each node corresponds to an amino acid of the protein and the connections represent the different types of interaction (van der Waals contacts, salt bridges, π-π interactions or simple hydrophobic contacts). Interaction networks were used to study the molecular effects underlying drug resistance in the NS3 protein (the HCV protease), one of the targets for inhibitor development. For this study, all mutations associated with resistance induced by two drugs, Telaprevir and Boceprevir, currently in phase III clinical trials, were collected, and two variants, V36M and R155K, were analyzed. The V36M variant affects the local conformation and the geometry of the hydrophobic cavity of the protein; this effect is a consequence of the mutation establishing a larger number of interactions in the mutated protein than in the WT protein. The effect also extends to the active-site pocket located nearby. In the R155K variant, by contrast, the effect of the mutation is reflected in a conformational change at the β-barrel domain involved in substrate binding and, consequently, in the nearby active-site pocket. In this latter analysis, the residue interaction networks highlighted the importance of residue G140, located in the same loop as residue S139 (a catalytic-site amino acid that is also directly involved in inhibitor binding). G140 probably plays a fundamental role in maintaining the flexibility of this loop. In the mutated protein this flexibility is lost because G140 has a larger number of interactions, in particular a new hydrogen bond with amino acid F154, which is directly involved in substrate binding. 
Residue F154 interacts directly with S139. By applying filters based on conservation and node degree (the total number of interactions of each residue in the network), it was possible to identify residues that are important both functionally and structurally. As expected, some of these residues belong to known functional sites; others are not known to be functionally important but, on the basis of the results obtained, could be critical for maintaining the structural conformation.

In silico analysis of hepatitis C virus: development of a novel fusion process hypothesis and study of drug resistance / Piano, M. A.. - (2011 Jan 30).