SARS-CoV and SARS-CoV-2 do not appear to have functions of a hemagglutinin and neuraminidase. This is a mystery, because sugar binding activities appear essential to many other viruses including influenza and even most other coronaviruses in order to bind to and escape from the glycans (sugars, oligosaccharides or polysaccharides) characteristic of cell surfaces and saliva and mucin. The S1 N terminal Domains (S1-NTD) of the spike protein, largely responsible for the bulk of the characteristic knobs at the end of the spikes of SARS-CoV and SARS-CoV-2, are here predicted to be “hiding” sites for recognizing and binding glycans containing sialic acid. This may be important for infection and the ability of the virus to locate ACE2 as its known main host cell surface receptor, and if so it becomes a pharmaceutical target. It might even open up the possibility of an alternative receptor to ACE2. The prediction method developed, which uses amino acid residue sequence alone to predict domains or proteins that bind to sialic acids, is naïve, and will be advanced in future work. Nonetheless, it was surprising that such a very simple approach was so useful, and it can easily be reproduced in a very few lines of computer program to help make quick comparisons between SARS-CoV-2 sequences and to consider the effects of viral mutations.
Significance of the present work
The significance and innovation of the present work is that it proposes a sialic acid glycan binding function for the SARS-CoV-2 spike protein that has been largely neglected by other workers, apparently on the rationale that ACE2 binding is the important first step in cell entry. The predicted sites in the characteristic cap or knob of the spike protein appear at least partially persuasive in the light of the role of that region in binding to host cells. There is a further possible site towards the base of the external part of the spike protein, which seems less likely by virtue of its position and weaker prediction. Interaction with sialic acid glycans, with or without associated catalytic activity, would be consistent with such functions observed in many respiratory and alimentary tract viruses, and not least in many or most other coronaviruses, and so such a function must be important to these viruses. On these grounds, it may be a target for therapeutic agents against SARS-CoV-2, particularly perhaps preventatives as well as means of impeding spread from lung cell to lung cell, and an exposed target for antibodies raised by synthetic vaccines. Although other authors have recently touched on such a glycan binding ability in SARS (as discussed in this paper above and particularly below), it has not, to the present author's knowledge, been analyzed in comparable detail, and those studies do not appear to relate to the same site. Nor do they propose a general prediction method for sialic acid glycan binding as described in the present paper. Of course, in the present paper this is still a prediction and not an experimental result, but it will hopefully encourage experimental researchers to investigate the glycan binding properties of SARS-CoV-2 more extensively. A further innovative feature is the predictive method itself, which is expected to be worthy of investigation for the proteins of other viruses and even of other organisms. Like many predictive methods in bioinformatics it is not perfect, i.e. there are false positives and false negatives in prediction, so it is actually conceivable that the method is useful even if it is not correct in the particular case of SARS-CoV-2. In that sense, it may emerge as the more important contribution.
The quality of predictions by the current SABR-P algorithm and future work
The current SABR-P predictive algorithm is naïve, and it is not expected that it will closely resemble the final refined form of the algorithm, which will be based more rigorously on principles closer to those of the GOR method, the Hyperbolic Dirac Net, the associated Q-UEL language, and the BioIngine implementation including its new algorithms. The impression of good performance for the current SABR-P method largely arises from the fact that it is only required to predict the sialic acid glycan binding properties of whole domains or proteins, not highly localized subsequences or surface patches. In essence, the method is doing little more than capturing and quantifying in an algorithm the visual inspection of sugar binding domains and proteins and the observations of other workers as discussed above. However, the method was only required to help explore potential non-covalent sialic acid glycan binding sites in the spike glycoprotein, and in that regard it has proven adequate and valuable for present purposes. It also suggests that a more refined approach may perform well, because false positives and false negatives were mainly just over the decision boundary and just under it respectively. In future work the resolution should be increased, so that predictions can address more localized subsequences or surface patches rather than only whole domains.
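The kind of whole-domain, sequence-only scoring with a decision threshold described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the SABR-P algorithm: the residue weights, the threshold, and the margin used to flag borderline cases are hypothetical placeholders, since the actual parameters are not given in this section.

```python
# Hypothetical per-residue weights: placeholders for illustration only,
# NOT the SABR-P parameters (this section does not list them).
ILLUSTRATIVE_WEIGHTS = {
    "W": 1.0, "Y": 0.8, "N": 0.5, "S": 0.3, "T": 0.3,
    "L": -0.4, "A": -0.3,
}


def domain_score(sequence, weights=ILLUSTRATIVE_WEIGHTS):
    """Mean per-residue weight over a whole domain or protein.

    Residues without an entry in `weights` contribute 0.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(weights.get(residue, 0.0) for residue in seq) / len(seq)


def classify(sequence, threshold=0.1, margin=0.05):
    """Return (label, score, borderline) for one domain.

    `threshold` and `margin` are assumed values; `borderline` flags
    scores lying close to the decision boundary, where false positives
    and false negatives would be expected to cluster.
    """
    score = domain_score(sequence)
    label = "binder" if score >= threshold else "non-binder"
    borderline = abs(score - threshold) <= margin
    return label, score, borderline
```

With real parameters in place of the placeholders, the same two functions could be applied to a reference sequence and a mutated sequence to see whether a mutation moves the domain score towards or across the boundary.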
Distinguishing proteins binding different glycans and sugars
The current predictions also indicate a research direction to explore. The parameters for the general sugar binding capacity of amino acid residues are very different to those used by other workers, and here the focus has been on sialic acid glycans versus other saccharide-based molecules. In this, perhaps the most surprising finding of all is the apparent ability of the method to distinguish between sialic acid containing glycans and other sugars in the case of lectins. This is surprising because non-sialic sugars such as mannose and fucose can occur in sialic acid glycans, and the prediction results hint that there is likely to be some distinguishing feature that, in a majority of cases, allows specific recognition. In this respect it would seem initially of concern for the sensibleness of the predictions that, for example, mannose-binding lectin binds to a range of sugars that also include N-acetyl-d-glucosamine, N-acetyl-mannosamine, fucose and glucose. It is therefore possible that research in this direction will not be so profitable, because the above distinguishing behavior of the algorithm might be to some extent coincidental. Be that as it may, the predictions are remarkably better than expected, and should certainly be challenged by researchers in order to improve such methods.
Specifically, a larger sample may require a threshold adjustment or corresponding rescaling, perhaps resulting in a deterioration of performance, particularly in regard to distinguishing lectins. Nonetheless, it is noteworthy that ultimately human glycan binding proteins have to overcome the same problem as the above kind of prediction algorithm. While this broad range of sugar recognition by the mannose-binding lectin permits that lectin to interact with a wide selection of pathogens (viruses, bacteria, yeasts, fungi and protozoa) decorated with such sugars, there must be some kind of distinguishing aspect such that it is not decoyed by the sialic acid glycans of the human host. A more mundane problem in extending the study is that the correct state, as sialic acid glycan binding, other sugar binding, or not binding any kind of sugar, may be uncertain or a matter of degree. Further studies at time of writing suggested only about 70% for each of accuracy, sensitivity, and specificity, but this larger set is, as yet, of dubious quality for the purpose. Some proteins were believed, rather than known, to bind sialic acid glycans; binding might be weak, less specific, or of multiple types; or the domain or approximate location of the binding site can be unclear. Related to that is a difficulty that the performance of any prediction method of this kind is defensible, and possibly unfairly defensible, in regard to false positives: it may be that experiment shows that a particular virus predicted to bind sialic acid glycans does not specifically do so, but perhaps it once did, in evolutionary terms. This is particularly relevant in regard to studying coronaviruses because, as discussed above, many coronaviruses certainly do bind sialic acid glycans. Of course, the prediction method would then still be subject to the criticism that it is insufficiently sophisticated to manage the impact of small changes. For purely theoretical methods, that may be an issue for some time: in the present author's experience even simulations of binding of sugars to proteins in atomic detail tend to be difficult in view of the complex role of water molecules. For example, water molecules commonly represent protein-to-sugar bridges, as discussed in this paper.
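For readers wishing to check figures such as the accuracy, sensitivity, and specificity quoted above against their own test sets, the standard definitions can be computed as in the following minimal sketch. The function name and the boolean encoding (True for "binds sialic acid glycans") are assumptions made for illustration; the example labels are invented and are not data from this study.

```python
def binary_metrics(truth, predicted):
    """Accuracy, sensitivity and specificity for a two-class prediction.

    `truth` and `predicted` are equal-length sequences of booleans,
    where True means "binds sialic acid glycans".
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true binders found
        "specificity": tn / (tn + fp),  # fraction of non-binders rejected
    }


# Invented example labels, for illustration only.
truth = [True, True, False, False, True]
predicted = [True, False, False, True, True]
metrics = binary_metrics(truth, predicted)
```

Reporting all three values matters here because a test set dominated by one class can show high accuracy while sensitivity or specificity remains poor.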
Potential biological implications arguably support the above prediction for SARS-CoV-2 spike protein. That is, the story “makes sense”. Although the involvement of SARS-CoV and SARS-CoV-2 with sialic acid glycans has been rather neglected in the literature (but see below), such involvement represents a prominent and well known feature in the life history of influenza and other viruses, and appears no less important in the life of many other coronaviruses. Admittedly, HIV and many other enveloped viruses do not encode hemagglutinin for sialic acid binding. Instead, they interact using N-terminal sialic acid bound to envelope-associated proteins, like gp120 on HIV-1. However, the mode of infection is different. The SARS-CoV-2 coronavirus cannot jump in a magical way from contaminated surfaces to the lung, and it is doubtful that infective small loads of virus rely on chance to travel from the infecting person to the lung cell ACE2 receptor of the next human host. It is as yet unclear how many virus particles of SARS-CoV-2 are needed for infection, but the virus is clearly very contagious, and this may be because rather few particles are needed for infection.
In any event, the virus has to survive, and ideally even benefit from, the stages of a complex journey in the mucus of a sneeze or on hands, face, eye, nose, or mouth, and in the various stages of the airway. Initial cell entry points are unlikely to be only the lung epithelium. SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes. Viral mechanisms relating to these various surfaces could be fairly sophisticated. In biology in general, flexibility in carbohydrate recognition contributes to the targeting efficiency of carbohydrate-active enzymes in environments where there is a diverse range of saccharides. In a virus, more than one saccharide-binding site or multiple sugar binding sites in a protein could act to increase or decrease the overall affinity and increase or decrease virus mobility at different locations, while conformational changes that make some sites available and not others could regulate the extent of movement of the virus. Some binding sites have evolved to distinguish not just the sugar residue components but several types of monosaccharide or glycosidic bond linkage.
Once having reached the vicinity of a cell with an ACE2 receptor, the virus still needs to recognize the cell surface and raft across the cell surface to reach the ACE2 receptor. Fantini et al. argued that a new type of ganglioside-binding domain exists at the tip of the N-terminal domain of the SARS-CoV-2 S protein, and that the subsequence 111–158, conserved among clinical isolates, may improve attachment of the virus to lipid rafts and facilitate contact with the ACE2 receptor. That study also showed that, in the presence of chloroquine (CLQ) or its more active derivative, hydroxychloroquine, the spike protein is no longer able to bind gangliosides. The present study does not support (nor necessarily refute) their conclusions in terms of such specific details, but the general argument concerning guidance to the ACE2 receptor is compatible. Very recently Milanetti and colleagues have made available a preprint that is in tune with such ideas, and that specifically states that binding sialic acids provides a second means of entry, other than ACE2. Rather like that approach, this is based more on interactions between surfaces of molecules in three dimensional space.
The results and conclusions of this study are speculative in the sense that they are applications of computers (using the techniques of bioinformatics and a new predictive method), and hence they are essentially theoretical. Their role has been to highlight the likelihood that the SARS-CoV-2 spike has a biological function of binding host cell sialic acid glycans (and probably of moving across cell surfaces by that means, as discussed below). In particular, a domain in the cap or knob of the SARS-CoV-2 spike, which has so far been somewhat neglected, is predicted to be involved in the non-covalent binding of host sialic acid glycans. It is perhaps curious that subsequences found as conserved by use of bioinformatics tools such as BLASTp and Clustal Omega (also used here as described in Methods Section 4.1), or detected as a known or new functional motif, often seem in the literature to be considered as having the status of experiment or observation, while consideration of more complex patterns with more sequence options tends to be treated as theory and prediction. This caution is justified in the present study because further study and confirmation is required along the lines discussed in Discussion Section 5 above. To the extent that it is a prediction, it is a prediction for SARS-CoV-2 made in advance of experiment in order to provide an objective and fair test of the methodology, and it is hoped that it will stimulate experimental study in this area whether the experiments confirm or refute that prediction. Either result would likely be of ultimate medical importance. This is essentially typical of the more interesting roles of computers in biomedical research, although the general infrastructure and support that they provide for more routine tasks is of course of great importance.
The present paper possibly still stands as the first reported attempt to establish means of making use of sequence motifs that could be recognized between strains, albeit that the order and to some extent precise nature of the amino acid residues appears less important, or perhaps more subtle, than has been considered in previous papers in this series, which is why it required the development of the predictive technique (SABR-P). This will be advanced in future work, but the present method already helps to make quick comparisons between SARS-CoV-2 sequences and to consider the effects of viral mutations. However, it was surprising that a very simple approach was so useful, and it can easily be reproduced in a very few lines of computer program. The important consequence of the present study, however, is that there are already a variety of inhibitors of sialic acid binding that may serve as anti-viral agents, and this will be examined elsewhere.
Reference & source information: ScienceDirect.com | Science, health and medical journals, full text articles and books.