Journal of Molecular Biology
The nature of the accessible and buried surfaces in proteins
Abstract
The accessible surface areas have been calculated for the individual residues in 12 proteins, and for the extended chains, the secondary structures and tertiary structure of six proteins. The results include the following:
(1) The formation of α-helices and β-pleated sheets from an extended chain buries a greater proportion of polar surface than non-polar and gives 2 to 3 kcal/mol of hydrophobic free energy per residue.
(2) The surfaces buried between the secondary structures are very hydrophobic: being two-thirds non-polar and having more than half the polar part formed by groups that hydrogen bond within their own piece of secondary structure, or which are partially accessible to the solvent.
(3) As the six proteins increase in molecular weight they bury an increasing proportion of their non-polar surface (60 to 79%), but a constant proportion of their polar surface (75%).
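The abstract does not say how the areas were computed. As a rough illustration of the quantities behind results (1) to (3), the sketch below estimates per-atom accessible surface area with the Shrake–Rupley point-sampling approximation. This is a swapped-in numerical method, not necessarily the paper's procedure. It then splits the area into polar and non-polar parts, computes the area buried when two groups (for example, two pieces of secondary structure) pack together, and computes the fraction of each class buried on folding. The 1.4 Å probe radius, the caller-supplied atomic radii and the `polar` flag are all assumptions, as are the names `Atom`, `area_by_class`, `buried_between` and `fraction_buried`.

```python
import math
from dataclasses import dataclass

PROBE_RADIUS = 1.4  # Å; water-sized probe, an assumed value


@dataclass(frozen=True)
class Atom:
    x: float
    y: float
    z: float
    radius: float  # van der Waals radius in Å, supplied by the caller
    polar: bool    # hypothetical label: True for N/O, False for C/S


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    step = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(step * k), r * math.sin(step * k), z))
    return points


def atom_areas(atoms, n_points=960):
    """Shrake-Rupley estimate of each atom's accessible surface area in Å^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, a in enumerate(atoms):
        big_r = a.radius + PROBE_RADIUS
        free = 0
        for ux, uy, uz in unit:
            px, py, pz = a.x + big_r * ux, a.y + big_r * uy, a.z + big_r * uz
            # A point is inaccessible if it lies inside any other probe-expanded sphere.
            occluded = any(
                (px - b.x) ** 2 + (py - b.y) ** 2 + (pz - b.z) ** 2
                < (b.radius + PROBE_RADIUS) ** 2
                for j, b in enumerate(atoms)
                if j != i
            )
            if not occluded:
                free += 1
        areas.append(4.0 * math.pi * big_r ** 2 * free / n_points)
    return areas


def area_by_class(atoms):
    """Total accessible area split into polar and non-polar parts."""
    totals = {"polar": 0.0, "nonpolar": 0.0}
    for atom, area in zip(atoms, atom_areas(atoms)):
        totals["polar" if atom.polar else "nonpolar"] += area
    return totals


def buried_between(group_a, group_b):
    """Area buried when two groups (e.g. two secondary-structure units) pack together."""
    sep_a, sep_b = area_by_class(group_a), area_by_class(group_b)
    together = area_by_class(list(group_a) + list(group_b))
    return {k: sep_a[k] + sep_b[k] - together[k] for k in together}


def fraction_buried(extended, folded):
    """Fraction of each surface class buried on folding (0.75 means 75%)."""
    return {k: 1.0 - folded[k] / extended[k] for k in extended}
```

For example, passing the polar and non-polar areas of an extended chain and of the folded protein to `fraction_buried` gives the kind of percentages quoted in result (3). Point sampling trades accuracy for simplicity; more points give smoother estimates at a quadratic cost in the number of atoms.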
The implications of these results for the theory of protein structure are discussed.
In the Appendix it is shown that the accessible surface area of folded proteins is simply proportional to the two-thirds power of their molecular weight.
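The Appendix result, A ∝ M^(2/3), means the proportionality constant cancels when two proteins are compared, and the exponent can be checked with a least-squares fit on log-log axes. The sketch below shows both. The function names are hypothetical, and any constant used to generate test data is arbitrary, not the paper's fitted value.

```python
import math


def area_ratio(mw_a, mw_b):
    """Ratio A_a / A_b predicted by A ∝ M^(2/3); the constant cancels."""
    return (mw_a / mw_b) ** (2.0 / 3.0)


def fit_power_law(masses, areas):
    """Least-squares fit of log A = log k + n log M; returns (n, k)."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(a) for a in areas]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    n = sxy / sxx
    k = math.exp(y_mean - n * x_mean)
    return n, k
```

Under this relation, doubling the molecular weight multiplies the accessible area by 2^(2/3), about 1.59. Accessible area per unit mass therefore falls as M^(-1/3).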
Cited by (1078)
Sequence-based machine learning method for predicting the effects of phosphorylation on protein-protein interactions
2023, International Journal of Biological Macromolecules
Protein phosphorylation, catalyzed by kinases, is an important biochemical process that plays an essential role in multiple cell signaling pathways, and these pathways are made up of protein-protein interactions (PPI). Abnormal phosphorylation can alter protein function through PPI and lead to severe diseases such as cancer and Alzheimer's disease. Experimental evidence is limited, and identifying new cases of phosphorylation regulating PPI experimentally is costly. An accurate and user-friendly artificial intelligence method for predicting the effect of phosphorylation on PPI is therefore needed. Here, we propose a novel sequence-based machine learning method named PhosPPI, which achieved better identification performance (accuracy and AUC) than the competing predictive methods of Betts, HawkDock and FoldX. PhosPPI is freely available as a web server (https://phosppi.sjtu.edu.cn/). The tool can help users identify functional phosphorylation sites that affect PPI and explore phosphorylation-associated disease mechanisms and drug development.
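The abstract reports accuracy and AUC without defining them. The sketch below uses the standard definitions: AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), and accuracy at a decision threshold. The 0.5 threshold and the function names are assumptions, not details of PhosPPI.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; labels are 1 for positive, 0 for negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

The pairwise loop is O(P·N), which is fine for small benchmarks; a rank-based formula computes the same value in O(n log n) for large ones.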
Druggable protein prediction using a multi-canal deep convolutional neural network based on autocovariance method
2022, Computers in Biology and Medicine
Drug targets must be identified and positioned correctly to research and manufacture new drugs. In this study, rather than using traditional methods for drug expansion, the drug target is determined using machine learning. Owing to its low cost and speed, machine learning has attracted significant interest and extensive research in recent years, so developing an intelligent classification system for drug proteins is critical. This study proposes two distinct deep learning models for predicting druggable protein classes. Drug-protein sequences are translated into numerical form using six physicochemical properties of amino acids. After the autocovariance method is applied, the converted sequences are used as fixed-length input vectors for a deep stacked sparse auto-encoder (DSSAE) network. The coded protein sequences are also used as a six-channel input vector for the deep convolutional neural network model. In the experiments, the deep convolutional model classified druggable proteins more efficiently than previous studies. The proposed approach achieved a sensitivity of 96.92%, a specificity of 99.51%, and an accuracy of 98.29%.
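The autocovariance step turns a variable-length sequence into a fixed-length vector. The abstract does not give the formula. The sketch below uses a commonly used form: for each property j and lag 1..max_lag, average the product of mean-centred property values at positions i and i + lag. The input is assumed to be already encoded as one row of property values per residue. The six properties themselves are not specified here, so the caller must supply them.

```python
def autocovariance(encoded, max_lag):
    """Fixed-length autocovariance features.

    encoded: list of L rows, each a list of d property values for one residue.
    Returns max_lag * d values, ordered by lag, then by property.
    """
    length = len(encoded)
    if not 0 < max_lag < length:
        raise ValueError("max_lag must be between 1 and len(encoded) - 1")
    d = len(encoded[0])
    means = [sum(row[j] for row in encoded) / length for j in range(d)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            total = sum(
                (encoded[i][j] - means[j]) * (encoded[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

Because the output length depends only on `max_lag` and the number of properties, sequences of any length map to vectors of the same size, which is what a fixed-input network needs.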
Protein folding in vitro and in the cell: From a solitary journey to a team effort
2022, Biophysical Chemistry
Correct protein folding is essential for the health and function of living organisms. Yet, it is not well understood how unfolded proteins reach their native state and avoid aggregation, especially within the cellular milieu. Some proteins, especially small, single-domain and apparent two-state folders, successfully attain their native state upon dilution from denaturant. Yet, many more proteins undergo misfolding and aggregation during this process, in a concentration-dependent fashion. Once formed, native and aggregated states are often kinetically trapped relative to each other. Hence, the early stages of protein life are absolutely critical for proper kinetic channeling to the folded state and for long-term solubility and function. This review summarizes current knowledge on protein folding/aggregation mechanisms in buffered solution and within the bacterial cell, highlighting early stages. Remarkably, teamwork between nascent chain, ribosome, trigger factor and Hsp70 molecular chaperones enables all proteins to overcome aggregation propensities and reach a long-lived bioactive state.
We used Moran's I index of global spatial autocorrelation to study the distribution of the physicochemical or biological properties of amino acids within the genetic code table. First, using this index, we identify the amino acid property (among the 530 analyzed) that best correlates with the organization of the genetic code in the set of amino acid permutation codes. We then considered a model suggested by the coevolution theory of the origin of the genetic code, which takes into account the physicochemical properties of amino acids in addition to their biosynthetic relationships. With it, we investigated the level of optimization achieved by these properties on the entire genetic code table, on its columns only, or on its rows only. Specifically, we estimated the optimization achieved in the restricted set of amino acid permutation codes subject to the constraints derived from the biosynthetic classes of amino acids, and identified the most optimized amino acid property among all those in the database. Contrary to claims in the literature, it appears that the genetic code was structured not by the polarity of amino acids but possibly by their partition energy. Partition energy reaches an optimization level of about 96% on the whole table of the genetic code and 98% on its columns. Since this result was obtained for amino acid permutation codes subject to biosynthetic constraints, that is, for a model of the genetic code consistent with the coevolution theory, the following conclusions seem reasonable. (i) The coevolution theory might be corroborated by these observations, because the model used is based on the biosynthetic relationships between amino acids, which this theory holds to have been fundamental in structuring the genetic code.
(ii) The very high optimization on the columns of the genetic code would be not only compatible with the coevolution theory but would further corroborate it. It suggests that, while the rows of the code were structured by the biosynthetic relationships of amino acids, strong selective pressure might have acted on its columns to minimize, for example, the deleterious effects of translation errors. (iii) The finding that partition energy could be the most optimized property of amino acids in the genetic code would in turn be consistent with one of the main predictions of the coevolution theory. Since partition energy reflects protein structure and therefore enzymatic catalysis, the latter might have been the main selective pressure that promoted the origin of the genetic code. Indeed, we observe that β-strands show an optimization percentage of 95.45%, so it is possible to hypothesize that they became the object of selection during the origin of the genetic code, conditioning the choice of biosynthetic relationships between amino acids. (iv) The finding that the polarity of amino acids is less optimized than their partition energy in the genetic code table might be interpreted as evidence against the physicochemical theories of the origin of the genetic code. These theories would suggest, for example, that a very high optimization of amino acid polarity in the code could reflect interactions between amino acids and codons or anticodons that promoted its origin. This now seems less sustainable, given that very high optimization is observed for partition energy but not for polarity. Finally, (v) the very high optimization of the partition energy of amino acids would seem to make a neutral origin of error minimization very unlikely, that is, a neutral origin of the ability of the genetic code to buffer, for example, the deleterious effects of translation errors.
Indeed, an optimization of about 100% seems unlikely to have been achieved by a simple neutral process; this ability was more probably generated by natural selection. In fact, we show that the neutral theory of the origin of error minimization is falsified for the model analyzed here. We therefore discuss our observations within the theories proposed to explain the origin of the organization of the genetic code, and conclude that the coevolution theory is the most strongly corroborated.
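Global Moran's I measures whether similar values sit next to each other: I = (N / W) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all weights. Positive values mean neighbours tend to be alike, and negative values mean they tend to differ. The abstract does not say how neighbouring cells of the code table were defined. The sketch below therefore assumes simple rook adjacency (shared edge) on a rows × cols grid. The function names are hypothetical.

```python
def rook_weights(rows, cols):
    """Binary weights: cells sharing an edge on a rows x cols grid are neighbours."""
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[r * cols + c][rr * cols + cc] = 1.0
    return w


def morans_i(values, weights):
    """Global Moran's I for values laid out in row-major order."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    den = sum(d * d for d in dev)
    if total_w == 0 or den == 0:
        raise ValueError("need non-zero weights and non-constant values")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_w) * (num / den)
```

For instance, a smooth gradient along a 1 × 4 strip gives a positive I (1/3), while a strictly alternating pattern gives −1. Assessing optimization against permutation codes, as the abstract describes, would mean recomputing the statistic for each permuted assignment of property values.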
Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework
2021, Analytical Biochemistry
The accurate prediction of the relative solvent accessibility of a protein is critical to understanding its 3D structure and biological function. In this study, we developed a novel deep multi-view feature learning (DMVFL) framework to predict the relative solvent accessibility of proteins accurately. It integrates three neural network units (a bidirectional long short-term memory recurrent neural network, squeeze-and-excitation, and a fully-connected hidden layer) with four sequence-based single-view features (position-specific scoring matrix, position-specific frequency matrix, predicted secondary structure, and a roughly predicted three-state relative solvent accessibility probability). On the basis of this framework, we built a new relative solvent accessibility predictor, DMVFL-RSA, which employs a customized multiple feedback mechanism that helps extract discriminative information embedded in the four single-view features. In benchmark tests on the TEST524 and CASP14-derived (CASP14set) datasets, DMVFL-RSA outperforms other state-of-the-art protein relative solvent accessibility predictors when predicting two-state (exposure threshold of 25%), three-state (exposure thresholds of 9% and 36%), and four-state (exposure thresholds of 4%, 25%, and 50%) discrete values. For real-valued prediction on TEST524 and CASP14set, DMVFL-RSA also achieves high Pearson correlation coefficients, indicating a positive correlation between predicted and native relative solvent accessibility. Detailed analyses show that the major advantages of DMVFL-RSA lie in the high efficiency of the DMVFL framework, the applied multiple feedback mechanism, and the strong sensitivity of the sequence-based features. The DMVFL-RSA web server is freely available for academic use at https://jun-csbio.github.io/DMVFL-RSA/.
The standalone package of DMVFL-RSA is downloadable at https://github.com/XueQiangFan/DMVFL-RSA.
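Relative solvent accessibility is usually a residue's accessible area divided by a reference maximum for that residue type. The reference maxima are not given here and must be supplied by the caller. The sketch below maps a real-valued RSA onto the two-, three- and four-state classes using the exposure thresholds quoted in the abstract. The rule that a value exactly on a threshold falls into the higher (more exposed) state is an assumption, and the names are hypothetical.

```python
from bisect import bisect_right

# Exposure thresholds quoted in the abstract, as fractions of the maximum area.
THRESHOLDS = {
    2: (0.25,),
    3: (0.09, 0.36),
    4: (0.04, 0.25, 0.50),
}


def relative_accessibility(asa, max_asa):
    """RSA = observed accessible area / reference maximum for that residue type."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def rsa_state(rsa, n_states):
    """Discrete state index: 0 = most buried, n_states - 1 = most exposed.

    Assumes a value exactly equal to a threshold belongs to the higher state.
    """
    return bisect_right(THRESHOLDS[n_states], rsa)
```

`bisect_right` keeps the lookup to one line per scheme. Switching the boundary convention only requires `bisect_left` instead.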
Toward mechanistic understanding of asphaltene adsorption onto quartz surface: The roles of size, concentration, and hydrophobicity of quartz, asphaltene composition, flow condition, and aqueous phase
2021, Journal of Petroleum Science and Engineering
The nature of asphaltene and the mineralogy of reservoir rock affect the adsorption of asphaltene onto the rock surface. The basic goal of the present study is to assess the adsorption of different asphaltenes on quartz, the main mineral of sandstone formations. For this purpose, asphaltenes were separated from four crude oil samples from different Iranian oil reservoirs and characterized using various methods, including Fourier-transform infrared spectroscopy, elemental analysis, dynamic light scattering, and field emission scanning electron microscopy. The quartz mineral was analyzed by X-ray fluorescence and the Brunauer–Emmett–Teller method. Adsorption experiments were performed both statically and dynamically, and the effects of the water phase and of quartz hydrophobicity on asphaltene adsorption were evaluated. The calculated H/C ratios of the asphaltenes are in the range of 1.2–1.3, and their average particle size is between 46 and 380 nm. The static tests showed asphaltene uptake by the quartz adsorbents of 1.16 to 6.78 mg/m². The amount of asphaltene adsorbed on the quartz surface is directly related to the nitrogen content and polarity of the asphaltene, but not to the average particle diameter or the aromatic nature of the asphaltenes. Doubling the flow rate in the dynamic adsorption process reduces asphaltene uptake by quartz by about 35%, which could indicate that asphaltene adsorption on quartz is physical in nature. Finally, compared with the two-phase system, asphaltene uptake in the three-phase system decreases by more than 30% on average, which indicates that water competes with asphaltene for the surface-active sites.
Asphaltene uptake onto the quartz samples also increased with increasing initial asphaltene concentration and decreasing quartz particle size. The effect of wettability on asphaltene adsorption in two- and three-phase systems was investigated using hydrophilic and hydrophobic quartz nanoparticles. In both systems, asphaltene adsorption on the hydrophobic nanoparticles was higher. For these particles, the adsorption reduction in the three-phase system was between 5 and 10%, very small compared with the 30–60% reduction for hydrophilic nanoparticles. The wettability of the nanoparticles proved more important than their specific surface area for asphaltene adsorption.
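The abstract reports uptake per unit surface area (mg/m²) and percentage reductions but does not give the formulas. The sketch below assumes the usual batch mass balance for static tests: adsorbed mass (C0 − Ce) · V divided by adsorbent mass times specific surface area, for example from a BET measurement. It also computes a signed percentage change for comparing conditions. Function names, units and any numbers passed in are illustrative assumptions, not the study's data.

```python
def uptake_per_area(c0_mg_per_l, ce_mg_per_l, volume_l, adsorbent_g, area_m2_per_g):
    """Batch mass balance: adsorbed mass divided by total adsorbent surface (mg/m^2)."""
    if adsorbent_g <= 0 or area_m2_per_g <= 0:
        raise ValueError("adsorbent mass and specific area must be positive")
    adsorbed_mg = (c0_mg_per_l - ce_mg_per_l) * volume_l
    return adsorbed_mg / (adsorbent_g * area_m2_per_g)


def percent_change(before, after):
    """Signed percentage change, e.g. -35.0 for a 35% reduction."""
    return 100.0 * (after - before) / before
```

With hypothetical inputs of 1000 mg/L initial and 800 mg/L equilibrium concentration, 0.01 L of solution and 0.5 g of quartz at 1 m²/g, `uptake_per_area` returns 4 mg/m². `percent_change` expresses a drop in uptake, such as the one observed at the higher flow rate, as a negative percentage.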