Sequences containing at least four runs of repeated cytosines or guanines can form tetra-helical nucleic acid structures called i-Motifs and G-quadruplexes, respectively. Folding into these tetra-helices relies on non-canonical base pairs formed within the runs: cytosine-cytosine+ pairs in i-Motifs and guanine-guanine Hoogsteen pairs in G-quadruplexes. At present, the general patterns for potential i-Motif and G-quadruplex forming sequences, commonly reported as C2-5-N1-7-C2-5-N1-7-C2-5-N1-7-C2-5 and G2-5-N1-7-G2-5-N1-7-G2-5-N1-7-G2-5, are considered perfectly complementary, which places the two structures at the same genomic locations. On this basis, bioinformatics tools were developed to search for these secondary structures and, interestingly, a significant enrichment of potential quadruplex-forming sequences was found at telomeres and in the promoters of oncogenes. Moreover, these structures have been shown to fold in vivo and to play a role in regulating transcription in vitro, making them promising targets for the development of new anti-cancer therapies. We therefore started by screening the proximal promoter of the EGFR oncogene for potential quadruplex-forming sequences with at least three guanines or cytosines in every run. Two sequences were found: EGFR-272 and EGFR-37, located 272 and 37 nucleotides upstream of the transcription start site of the oncogene, respectively. Focusing on the cytosine-rich strand of EGFR-37, we observed that it folds into an i-Motif that forms fewer cytosine-cytosine+ base pairs than predicted. According to the in silico analysis, the central loop should have been one nucleotide long to allow all six cytosine-cytosine+ base pairs to form, but our results indicated that two cytosines, one from each of the second and third runs, remained unpaired, thereby lengthening the central loop. This result suggested that one-nucleotide loops may not be compatible with i-Motifs as they are with G-quadruplexes.
We therefore developed a novel step-by-step pipeline for the systematic screening of i-Motif models and applied it to determine the minimal loop lengths that allow folding into an intramolecular i-Motif, focusing on structures comprising only three cytosine-cytosine+ base pairs. Our data indicate that two and three nucleotides are required to connect the strands across the major and the minor grooves of the i-Motif, respectively. Moreover, as in the i-Motif of EGFR-37, we noticed that thymine-thymine base pairs are very often stacked on the outermost cytosine-cytosine+ base pairs of i-Motifs. We therefore tested whether, in addition to capping the i-Motif core, thymine-thymine base pairs can also form within it, between cytosine-cytosine+ base pairs. To this end, we studied CT dinucleotide repeats and showed that they fold into i-Motif structures in which cytosine-cytosine+ and thymine-thymine base pairs intercalate in alternation. These results show that the sequence pattern of a potential i-Motif forming sequence differs from that of a G-quadruplex, which potentially places these secondary structures at different genomic locations and may give rise to different biological functions. This knowledge is a step towards prediction tools that properly identify bio-functional i-Motifs, and towards the rational design of these secondary structures for technological applications.
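As an illustration of how the general pattern quoted above (four runs of two to five cytosines or guanines separated by loops of one to seven nucleotides) can be turned into a sequence search, the following minimal Python sketch builds a regular expression from that pattern. It is not the pipeline developed in the thesis, whose implementation is not described here; the function names `quadruplex_pattern` and `find_candidates`, the `run` and `loop` parameters, and the example sequences are hypothetical choices made for this sketch.

```python
import re


def quadruplex_pattern(base, run=(2, 5), loop=(1, 7)):
    """Compile a regex for four runs of `base` separated by three loops.

    `run` and `loop` are (min, max) lengths. The defaults mirror the
    commonly reported C2-5-N1-7 / G2-5-N1-7 pattern; other values are
    assumptions the caller is free to change.
    """
    run_re = f"{base}{{{run[0]},{run[1]}}}"
    loop_re = f"[ACGT]{{{loop[0]},{loop[1]}}}"
    # The lookahead lets overlapping candidates be reported as well.
    return re.compile(f"(?=({run_re}(?:{loop_re}{run_re}){{3}}))")


def find_candidates(seq, base="C", run=(2, 5), loop=(1, 7)):
    """Return (start, matched_sequence) for every candidate in `seq`."""
    pattern = quadruplex_pattern(base, run, loop)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Illustrative input: repeats of CCCTAA with flanking thymines.
    example = "TTCCCTAACCCTAACCCTAACCCTT"
    for start, hit in find_candidates(example, base="C"):
        print(start, hit)
```

Because the loop length bounds are parameters, the same search can be rerun with stricter minimum loop lengths, for example `loop=(2, 7)` or `loop=(3, 7)`, to see how many candidates survive. Note that a loop position written as N also matches cytosine, so a sequence with nominal one-nucleotide loops can still match a longer-loop pattern by leaving some cytosines of the runs unpaired.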

DNA i-Motif: from structure and thermodynamic to genome distribution / Ghezzo, Michele. - (2023 Apr 12).

DNA i-Motif: from structure and thermodynamic to genome distribution.

GHEZZO, MICHELE
2023

12-apr-2023
Files in this item:
tesi_definitiva_Michele_Ghezzo.pdf (doctoral thesis, open access, Adobe PDF, 4.99 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3476990