
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), a novel, evolutionarily divergent RNA virus, is responsible for the present devastating COVID-19 pandemic. To explore its genomic signatures, we comprehensively analyzed 2,492 complete and/or near-complete genome sequences of SARS-CoV-2 strains reported from across the globe to the GISAID database up to 30 March 2020. Genome-wide annotations revealed 1,516 nucleotide-level variations at different positions throughout the entire genome of SARS-CoV-2. Moreover, nucleotide (nt) deletion analysis found twelve deletion sites throughout the genome other than the previously reported deletions in the coding sequences of the ORF8 (open reading frame 8), spike, and ORF7a proteins, specifically in polyprotein ORF1ab (n = 9), ORF10 (n = 1), and the 3´-UTR (n = 2). Evidence from the systematic gene-level mutational and protein profile analyses revealed a large number of amino acid (aa) substitutions (n = 744), demonstrating that the viral proteins are heterogeneous. Notably, residues of the receptor-binding domain (RBD) showing crucial interactions with angiotensin-converting enzyme 2 (ACE2) and a cross-reacting neutralizing antibody were found to be conserved among the analyzed virus strains, except for the replacement of lysine with arginine at position 378 of the cryptic epitope in a Shanghai isolate, hCoV-19/Shanghai/SH0007/2020 (EPI_ISL_416320). Furthermore, our preliminary epidemiological data on SARS-CoV-2 infections revealed that the frequency of aa mutations was relatively higher in the SARS-CoV-2 genome sequences from Europe (43.07%), followed by Asia (38.09%) and North America (29.64%), while case fatality rates remained higher in the European temperate countries, such as Italy, Spain, the Netherlands, France, England and Belgium.
Thus, the present method of genome annotation employed at this early pandemic stage could be a promising tool for monitoring and tracking the continuously evolving pandemic situation, the associated genetic variants, and their implications for the development of effective control and prophylaxis strategies.
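The nucleotide-level comparison summarized above can be illustrated with a minimal sketch. It assumes each genome has already been pairwise-aligned to a reference so that both strings have equal length and `-` marks a gap. The function name `call_variants`, the toy sequences, and the choice to skip ambiguous `N` bases are illustrative assumptions, not the annotation pipeline used in the study.

```python
from typing import List, Tuple

Substitution = Tuple[int, str, str]   # (reference position, ref base, query base)
Deletion = Tuple[int, int]            # (reference start position, length)


def call_variants(ref: str, query: str) -> Tuple[List[Substitution], List[Deletion]]:
    """List substitutions and deletions in `query` relative to `ref`.

    Both strings must come from the same pairwise alignment (equal length,
    '-' for gaps). Positions are 1-based and count ungapped reference bases.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")

    substitutions: List[Substitution] = []
    deletions: List[Deletion] = []
    ref_pos = 0        # position in the ungapped reference
    del_start = None   # start of the deletion run currently open, if any
    del_len = 0

    for r, q in zip(ref.upper(), query.upper()):
        if r == "-":
            # Insertion relative to the reference: ignored in this sketch.
            continue
        ref_pos += 1
        if q == "-":
            # Open a deletion run or extend the current one.
            if del_start is None:
                del_start, del_len = ref_pos, 0
            del_len += 1
            continue
        if del_start is not None:
            # A non-gap base closes the open deletion run.
            deletions.append((del_start, del_len))
            del_start = None
        if q != r and q != "N":
            # Assumption: ambiguous 'N' calls are not counted as variants.
            substitutions.append((ref_pos, r, q))

    if del_start is not None:
        deletions.append((del_start, del_len))
    return substitutions, deletions


# Toy aligned sequences (made up for illustration only).
subs, dels = call_variants("ATGGCTAGCTAA", "ATGACT---TAA")
print(subs)  # one substitution at reference position 4
print(dels)  # one 3-nt deletion starting at reference position 7
```

Counting the distinct positions returned over many genomes would give figures of the same kind as the variation and deletion-site counts reported here, although real pipelines also handle insertions, low-quality sequence, and annotation of the affected gene.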
In this study, only four core aa substitutions (Q57H, D614G, L3606F, and P4714L) were found across all the climatic zones (Fig. 3b). Notably, three replacements (L3606F, P4714L, and D614G) were found in all geographic and climatic zones, indicating their special importance for universal infectivity and for whole-virus vaccine development for the world population. Conversely, 379, 187, 57, 27, and 4 unique nonsynonymous mutations were found in the SARS-CoV-2 genomes belonging to temperate, diverse, tropical, dry and continental climatic conditions, respectively. Moreover, 66, 52, 30, 26, and 5 residue positions carried mutations that occurred in at least two climatic zones. Temperate European countries such as Italy, Spain, France, the Netherlands and England are major COVID-19 infected countries with higher mortality rates at this initial stage. These findings are in line with Deshwal, who also reported the highest SARS-CoV-2 infections and case fatality rates in European countries. Besides, two previously reported nonsynonymous mutations (R203K and L3606F; ref. 11) were shared across ORFs of the SARS-CoV-2 genomes of six continents, and co-occurrence of those mutations was also common in different countries along with unique mutations. However, these unique or accessory mutations are driven by geographic location, which may have an impact on the differential divergence of the virus strains, possibly due to the changed environment, varied demography, the lower fidelity of the RNA-dependent RNA polymerase (RdRp), and/or the relatively less efficient proof-reading activity of NSP14.
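The distinction drawn above between core substitutions (present in every climatic zone), unique substitutions (present in exactly one zone), and substitutions found in at least two zones comes down to counting zone membership. The sketch below shows one way to compute this. The function name `classify_mutations` and the labels `hypA`, `hypB`, and `hypC` are hypothetical placeholders; only the four core substitution names are taken from the text.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def classify_mutations(
    by_zone: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]], Set[str]]:
    """Split mutations into core, zone-unique, and multi-zone groups.

    core       : mutations observed in every zone
    unique     : per zone, mutations observed in that zone only
    multi_zone : mutations observed in at least two zones (includes core)
    """
    # Each zone holds a set, so a mutation is counted once per zone.
    zone_counts = Counter(m for muts in by_zone.values() for m in muts)
    n_zones = len(by_zone)

    core = {m for m, c in zone_counts.items() if c == n_zones}
    unique = {
        zone: {m for m in muts if zone_counts[m] == 1}
        for zone, muts in by_zone.items()
    }
    multi_zone = {m for m, c in zone_counts.items() if c >= 2}
    return core, unique, multi_zone


# Toy input: the four core names come from the text; 'hyp*' labels are made up.
CORE = {"Q57H", "D614G", "L3606F", "P4714L"}
zones = {
    "temperate": CORE | {"hypA", "hypB"},
    "tropical": CORE | {"hypB"},
    "dry": CORE | {"hypC"},
}

core, unique, multi_zone = classify_mutations(zones)
print(sorted(core))
print({zone: sorted(muts) for zone, muts in unique.items()})
print(sorted(multi_zone))
```

With real data, each zone's set would hold the nonsynonymous mutations called from all genomes assigned to that zone, and the sizes of the `unique` sets would correspond to the per-zone counts of unique mutations.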
By comparing the mortality rates of SARS-CoV-2 infections in 45 countries across the globe according to different climatic conditions, we found that most of the temperate countries had higher mortality rates than countries in other climatic zones (Fig. 3c). Among the temperate countries, the highest mortality rate was found in Italy (11.04%), followed by Spain (8.29%), the Netherlands (7.08%), France (6.56%), England (6.29%), China (4.02%), Belgium (3.98%), Hungary (3.36%), and Denmark (3.01%); the rest of the countries had SARS-CoV-2 case fatality rates below 3.0% (Fig. 3c, Supplementary Data 1). The death rates from SARS-CoV-2 infections varied greatly in the diverse climatic conditions of Brazil, Pakistan, and Australia, where we found death rates of 2.92%, 1.11%, and 0.4%, respectively. Similarly, in the tropical climatic conditions of India, Ecuador, Panama, Mexico, Peru, and Malaysia, we found mortality rates of 2.71%, 2.62%, 1.89%, 1.89%, 1.64%, and 0.38%, respectively. The continental countries, the USA and Lithuania, had observed mortality rates of 1.72% and 1.45%, respectively, from SARS-CoV-2 infections. However, mortality rates in dry countries like Saudi Arabia were much lower (< 1.0%) than in other climatic conditions (Fig. 3c, Supplementary Data 1). The comparatively higher mortality rates in European temperate countries might be correlated with the higher number of unique mutations found in the viruses reported from this geo-climatic region. Despite higher numbers of unique mutations in Asian and North American strains, these regions showed lower case fatality rates than the European countries. These findings suggest that the European unique mutations may be associated with higher pathogenicity of the virus. However, it is worth noting that reported disease severity (which may not represent the actual severity) might be affected by several other factors, for example, health care facilities, the average age of the population, the genetic context of the population, and the control strategies adopted by the countries.
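The percentages above are case fatality rates, that is, reported deaths divided by reported confirmed cases, expressed as a percentage. A minimal sketch of that arithmetic and of ranking countries by it follows. The function names and the counts for "Country A", "Country B", and "Country C" are made-up placeholders, not data from the study or from any country.

```python
from typing import Dict, List, Tuple


def case_fatality_rate(deaths: int, confirmed: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed cases must be positive")
    if deaths < 0 or deaths > confirmed:
        raise ValueError("deaths must lie between 0 and confirmed cases")
    return 100.0 * deaths / confirmed


def rank_by_cfr(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Rank countries by case fatality rate, highest first.

    `counts` maps a country name to (deaths, confirmed cases).
    """
    rates = [
        (name, round(case_fatality_rate(deaths, confirmed), 2))
        for name, (deaths, confirmed) in counts.items()
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Hypothetical counts for illustration only.
example = {
    "Country A": (110, 1000),
    "Country B": (5, 1000),
    "Country C": (30, 1500),
}
for name, cfr in rank_by_cfr(example):
    print(f"{name}: {cfr}%")
```

Because the denominator counts only confirmed cases, the rate depends on testing coverage and reporting delays as well as on the factors listed above.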
Despite the importance of geography in COVID-19 epidemiology, the effects of global mobility on the genetic diversity and molecular evolution of SARS-CoV-2 are underappreciated and only beginning to be understood. Moreover, the recent monograph on the spatial epidemiology of COVID-19 makes no reference to the genetic variation of SARS-CoV-2.
The limitations of this study stem from the nature of the SARS-CoV-2 genomic data: sample collection dates might not reflect the actual infection dates, and not all countries that faced SARS-CoV-2 infections uploaded their sequences to the GISAID database in a timely manner. Thus, the mutation patterns reported here might be approximate. Moreover, many countries have not sequenced enough virus samples (such as African and Sub-Saharan countries), and some countries uploaded sequences collected from a single source or zone of infection (Japan); hence, the mutation pattern may be biased for specific countries or continents. Nevertheless, our study included the most complete set of SARS-CoV-2 sequences available up to March 30, 2020. This study, therefore, opens up new perspectives for determining whether any of these frequent mutations lead to biological differences, and whether they correlate with different case fatality rates.
Conclusions
This study reveals a number of unreported mutations, covering both mismatches and deletions in translated and untranslated regions of the SARS-CoV-2 genomes. Moreover, the geo-climatic distribution of the mutations revealed higher numbers of unique mutations, as well as greater disease severity, in the European temperate countries. Further investigations should focus on structural validation of the deletions and/or mismatches, on their phenotypic consequences for the transmission dynamics of the current epidemic, and on the immediate implications of these genomic markers for developing prophylaxis and mitigation measures against the COVID-19 pandemic. Moreover, identifying the conformational changes in mutated protein structures and untranslated cis-acting elements is important for studying the virulence, pathogenicity and transmissibility of SARS-CoV-2. This mutational diversity should be investigated in further studies, including analyses of metabolic functional pathways and of intra-viral and virus-host interactions.
Reference & Source information: https://www.nature.com/