Identification of the full complement of genes in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a crucial step towards gaining a fuller understanding of its molecular biology. However, short and/or overlapping genes can be difficult to detect using conventional computational approaches, whereas high-throughput experimental approaches – such as ribosome profiling – cannot distinguish translation of functional peptides from regulatory translation or translational noise. By studying regions showing enhanced conservation at synonymous sites in alignments of SARS-CoV-2 and related viruses (subgenus Sarbecovirus) and correlating the results with the conserved presence of an open reading frame (ORF) and a plausible translation mechanism, a putative new gene – ORF3c – was identified. ORF3c overlaps ORF3a in an alternative reading frame. A recently published ribosome profiling study confirmed that ORF3c is indeed translated during infection. ORF3c is conserved across the subgenus Sarbecovirus, and encodes a 40–41 amino acid predicted transmembrane protein.
Although synonymous site conservation can result from overlapping non-coding or coding elements, the conserved presence and conserved positions of the ORF3c start and stop codons suggest the latter interpretation. Moreover, the ribosome profiling study confirms that ORF3c is indeed translated during infection. The combination of comparative genomics showing purifying selection (which to a large extent is synonymous with functional importance) and ribosome profiling showing expression strongly suggests that 3c is a functional protein, conserved throughout sarbecoviruses. While the known SARS-CoV-2 genes have already been investigated in SARS-CoV-1, 3c has never before been studied. Additional work with SARS-CoV reverse genetics systems will be required to elucidate the function of the 3c protein, which may eventually provide a new target for vaccine or antiviral strategies. The synplot2 analysis has also revealed other functional elements embedded within the viral protein-coding genes (e.g. in ORF1a), which may also be worthy of experimental investigation.
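The "conserved presence and conserved positions" criterion for the start and stop codons can be illustrated with a minimal Python sketch. This is not synplot2 and not the analysis used in the study: the function names `find_orfs` and `orf_is_conserved` are hypothetical, the sequences are invented toy strings rather than sarbecovirus data, and the alignment is assumed to be gap-free so that the same coordinates apply to every sequence.

```python
# Toy illustration only: hypothetical helpers, invented sequences, gap-free alignment assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, frame, min_codons=3):
    """Return (start, end) index pairs of ATG-initiated ORFs in one forward frame.

    Indices are 0-based, `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first in-frame ATG opens a candidate ORF
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                orfs.append((start, i + 3))
            start = None  # look for the next ORF downstream
    return orfs


def orf_is_conserved(aligned_seqs, start, end):
    """Check that every sequence keeps the ATG, the stop codon, and an open frame between them."""
    for seq in aligned_seqs:
        codons = [seq.upper()[i:i + 3] for i in range(start, end, 3)]
        if codons[0] != "ATG":
            return False  # start codon lost or moved
        if codons[-1] not in STOP_CODONS:
            return False  # stop codon lost or moved
        if any(c in STOP_CODONS for c in codons[1:-1]):
            return False  # premature stop interrupts the ORF
    return True


# Invented toy sequences; the ORF sits in frame 2 relative to the sequence start.
reference = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
variant = "CC" + "ATG" + "GCC" * 5 + "TAG" + "GG"  # substitutions, ORF intact
truncated = "CC" + "ATG" + "GCT" * 2 + "TGA" + "GCT" * 2 + "TAA" + "GG"  # premature stop

orfs = find_orfs(reference, frame=2)
start, end = orfs[0]
print(orfs)
print(orf_is_conserved([reference, variant], start, end))
print(orf_is_conserved([reference, truncated], start, end))
```

A check of this kind only establishes that an ORF is present at the same position in every sequence. It says nothing about synonymous-site conservation in the overlapping frame, which requires a statistical model of the sort synplot2 implements.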
During preparation of this manuscript, 3c was independently discovered by another group (who term it 3h); they performed a similar analysis with synplot2 but used far fewer sarbecovirus sequences, and hence achieved lower statistical significance for conserved elements. More recently, 3c was also independently discovered (and again termed 3c) using PhyloCSF, in which ORF3c-frame codon substitutions are compared with coding and non-coding evolutionary models, thus representing an independent approach that differs completely from that used in synplot2.
Reference & Source information: https://www.microbiologyresearch.org/