Which amino acid dimerizes




















The minute amounts obtained from inclusion bodies mimicked the properties of the hydrophobic core variants rather than those of the hinge loop variants, displaying low thermal stability and a high propensity first to dimerize and then to precipitate.

Proteins of the cystatin family are counted among the amyloidogenic proteins (Staniforth et al.). This description applies to both stefins (Zerovnik et al.). The mechanism of protein oligomerization and fibrillization is still debated, especially the transformation of soluble oligomeric species into mature fibrils (Uversky) and the role and structure of oligomers and folding intermediates in the aggregation process (Jahn and Radford; Skerget et al.).

To complicate the picture further, different models of the ordered aggregation of proteins into fibrillar structures have been proposed (Zerovnik et al.). Direct proof of the connection between these two processes in the case of cystatin C was provided by Wahlbom and coworkers, who showed that inhibiting domain swapping by covalently linking the exchanging parts of the protein also abolishes its fibrillization.

According to Wahlbom, cystatin C incubated in an acidic buffer pH 4. The hCC hinge loop mutants were also subjected to similar studies. Protein samples were incubated for 2 h under conditions promoting oligomerization, and the results were checked by gel filtration on a Superdex 75 column equilibrated in the assay buffer.

Analysis of the data revealed that all studied proteins are capable of forming oligomers larger than dimers (Figure 3).

For all three analyzed mutants the main oligomeric form had a retention time of about 11 min, which, based on the column calibration, is expected for a hexameric assembly of the protein Figure 3, peak H, MW ca. The highest content of monomers in the incubated samples was observed for the V57N protein, which was shown to be the least dimerization-prone. In the case of the initially dimeric proline mutant, the majority of the protein was in the form of a putative hexamer.

Interestingly, a significant amount of the dimeric form was observed only for native hCC, for which the H peak was much lower than for the other studied proteins. Additionally, formation of higher oligomers was observed for the wt protein (Figure 3, peak V). The exact molecular weight of this species could not be determined because it eluted in the void volume of the SEC column used in this experiment.

Using a column with a higher exclusion limit (Superdex) also did not provide reliable information.

Figure 3. The inset shows chromatograms of the protein samples before incubation.

Under the same conditions the proline mutant formed a few circular and irregular aggregates (Figure 4D), whereas for the native protein a new type of structural object was observed (Figure 4A).

The population of these oligomers seems to be quite homogeneous. The dimensions of the observed species, namely an outer diameter larger than that of a typical amyloid fibril and a much shorter length, suggest that an additional conformational change would be necessary for mature fibril formation. Additionally, these structures did not display the increase in ThT fluorescence observed for amyloid fibrils (data not shown).

Figure 4. The scale bar represents 50 nm. A previously unpublished image is included here to support the discussion.

The lack of analogous structures for the other studied hCC variants may suggest an important role for the domain swapping capability of a particular protein in their formation. Additionally, the lack of rod-like oligomers for hCC V57P, which is predominantly dimeric, may strengthen the hypothesis that the domain exchange process proceeds in a propagated manner.

Changes within these regions increase the propensity of hCC toward dimerization and oligomerization, which cannot easily be compensated by stabilization of the hinge loop sequence. However, in the case of the wild-type protein, its stability in the biologically active, monomeric form may be easily manipulated by changes in the hinge region sequence. Access to stable monomeric and dimeric forms of cystatin C, with a marginally changed primary structure and available in high amounts, is beneficial for more detailed studies of the mechanism of domain swapping in cystatin C as well as of the relationship between domain swapping, dimerization and oligomerization.

Our approach based on rational mutagenesis yielded a pool of data that encourages us to perform further studies of the hCC dimerization mechanism, especially of the early stages of this process and the role of the helical part. The next steps are therefore folding studies of all obtained cystatin C variants and dissection of the mechanism leading either to a functional outcome or to an aberrant one that ends in protein fibrillization.

Knowledge of the thermodynamics and kinetics of hCC folding, especially the number and nature of intermediates, should provide information crucial for a better understanding of the hCC aggregation process.

Additionally, in the future work we intend to compare the folding mechanism and fibrillization propensities of the cystatin C mutants with the data obtained for other amyloidogenic proteins, not only stefins Jenko et al.

The outcome of such studies could be used to verify the hypothesis about the connection between protein stability, the folding mechanism and fibrillization propensities.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Prof. Anders Grubb, Prof. Krzysztof Liberek, and Dr. Piotr Skowron and their groups for their great help, valuable discussions and granting us free access to their laboratory resources.

Abrahamson, M.
Alvarez-Fernandez, M. Inhibition of mammalian legumain by some cystatins is due to a novel second reactive site.
Behrendt, I.
Bergdoll, M. Proline-dependent oligomerization with arm exchange. Structure 5, -
Biernacka, D.
Ceru, S. Size and morphology of toxic oligomers of amyloidogenic proteins: a case study of human stefin B. Amyloid 15, -
Clausen, J. Proteins in normal cerebrospinal fluid not found in serum.
Craig-Schapiro, R. Multiplexed immunoassay panel identifies novel CSF biomarkers for Alzheimer's disease diagnosis and prognosis.
Curhan, G. Cystatin C: a marker of renal function or something more?
Dehouck, Y. Sequence-structure signals of 3D domain swapping in proteins.
Ding, F. Topological determinants of protein domain swapping. Structure 14, 5-
Ekiel, I. Folding-related dimerization of human cystatin C.
Internalization of cystatin C in human cell lines. FEBS J.
Emekli, U. HingeProt: automated prediction of hinges in protein structures. Proteins 70, -
Gronenborn, A. Protein acrobatics in pairs-dimerization via domain swapping.
Grubb, A. Diagnostic value of analysis of cystatin C and protein HC in biological fluids.
Human gamma-trace, a basic microprotein: amino acid sequence and presence in the adenohypophysis.
Serum concentration of cystatin C, factor D and beta 2-microglobulin as a measure of glomerular filtration rate. Acta Med.
Disulfide mapping reveals the domain swapping as the crucial process of the structural conversion of prion protein. Prion 5, 56-
Jahn, T. Folding versus aggregation: polypeptide conformations on competing pathways.
Jankowska, E. Thermal and guanidine hydrochloride-induced denaturation of human cystatin C.
Janowski, R. Human cystatin C, an amyloidogenic protein, dimerizes through three-dimensional domain swapping.
Jaskolski, M. Acta Biochim.
Jelinska, C. Modulation of contact order effects in the two-state folding of stefins A and B.
Jenko, Kokalj S. Essential role of proline isomerization in stefin B tetramer formation.
Jenko, S. Different propensity to form amyloid fibrils by two homologous proteins - Human stefins A and B: searching for an explanation. Proteins 55, -
Jerala, R.

In general, reasonable correlations are found between all three force fields, with deviations on the order of 1 kT in aqueous solvent.

Interestingly, even in cases where the dimerization free energies are similar, the binding mode may differ substantially between the force fields. This was found to be especially the case for aromatic residues. In addition to the inter-force-field comparison, we compared the various force fields to a knowledge-based potential. The two independent approaches show good correlation in aqueous solvent, with the exception of aromatic residues, for which the interaction strength is lower in the knowledge-based potentials.


The interactions between amino acid side chains govern protein secondary, tertiary, and quaternary structure formation.

When comparing DCA predictions with the contact maps of the two nucleotide-bound states, we observe a high number of true positives (TPs), i.

To provide a more refined appraisal of the DCA results, each predicted contact is associated with the length of the shortest path (SP) between the corresponding residues, computed over the contact map of the crystallographic structure (see Material and Methods).

In this context, the SP provides a topological measure of the distance between two residues that further characterizes our prediction.

As shown by Burger and van Nimwegen [58] in the context of mutual information coevolutionary networks, the shortest paths efficiently capture the mediation of coevolution along chains of residues. We expect many of these seemingly wrong predictions to be correct, due to the definition of native contacts that depends on an arbitrary threshold here 8.

In the contact maps (A-C), the lower triangular parts are the structure contacts at threshold 8. Panels D and F show 8 strongly allosteric contacts. Contacts that are true in the conformation are shown in green, and contacts that are false in the conformation in red. The shortest paths are taken as the minimum between the corresponding shortest paths in the two states. (D) Set of 8 strong allosteric contacts in the ATP state. Each histogram refers to the corresponding contact map. Counts are reported in log-scale. Bins are coloured corresponding to the colour scheme of the contact maps. (F) Set of 8 strong allosteric contacts in the ATP state.

However, since DCA analysis is expected to capture contacts present in all the functionally relevant conformers [48, 59], the most appropriate strategy is to compare DCA predictions with a contact map corresponding to the union of the contact maps of the single states.

In the following, we refer to such contacts as allosteric contacts. Furthermore, it must be noted that the predicted allosteric contacts appear early in the score ranking (S1 Table), indicating their evolutionary relevance. Among these first six predicted DCA contacts, four display clear electrostatic interactions. We can evaluate the probability that the observed allosteric contacts are the result of random errors by calculating the corresponding p-value (see Material and Methods).

The resulting value of 1.

(A) DCA contact map: the lower triangular part is the structure contacts at threshold 8. (B) Histogram of shortest paths of the predicted DCA contacts. Each monomer is coloured by domains, with the Nucleotide Binding Domain in darker shade and the Substrate Binding Domain in lighter tones.

As discussed in the introduction, Hsps are eukaryotic remote homologues of Hsp70s that have retained high sequence similarity [60] and are known to form functional hetero-dimers with Hsp70s with a dimerization pattern extremely similar to that observed in DnaK crystals.

Several arguments can be brought forward to discard this possibility. Furthermore, to exclude the possibility that the observed dimerization pattern is a consequence of the presence of Hsp sequences, we performed a more stringent filtering of our MSA.

To this aim we limited our MSA to sequences explicitly tagged in Uniprot by the canonical Hsp70 gene names hspa1a, hspa1b, hsp70, ssa1 and DnaK, resulting in a subset containing sequences. DCA performed on this reduced set (see S5 Fig) resulted in an overall higher noise level, due to lower statistics.

However, all six originally predicted dimeric contacts were retained in the reduced set, and an additional dimer-compatible contact appeared in the top predictions. We can therefore safely conclude that coevolutionary analysis predicts Hsp70 homo-dimerization with a quaternary arrangement similar to that observed in the HspHsp complex (S4 Fig). We further investigated whether this feature of the Hsp70 family is equally present in the different domains of life.

To this aim, we performed DCA artificially varying the relative weights of sequences belonging to eukaryotes and prokaryotes in the MSA. Following this approach, we measured the relative strength of the dimeric contacts as the ratio between their average DCA score and that of the original predicted contacts and we report in Fig 4A this quantity as a function of the weight of eukaryotic sequences in the MSA.

The dependence of the TP rate on the same quantity is shown in Fig 4B in order to check whether sequence reweighting perturbs the overall quality of the structural predictions. We observe that the relative strength of the dimeric contacts decreases as the eukaryotic weight increases, suggesting that Hsp70 homo-dimerization has a bacterial origin. This behaviour is observed in a range of relative weights W_E in 0. Moving away from this region, the relative weights are too unbalanced in either direction, resulting in poorer statistics and less reliable predictions.

All these observations strongly suggest that the predicted homo-dimerization of Hsp70s emerges mainly from bacterial sequences whereas this feature is absent or significantly less conserved in eukaryotes.

In red are the points corresponding to the unweighted cases. The relative contribution W_E is then dictated by the relative abundance of eukaryotic to total sequences. (A) Average DCA score for the 6 predicted dimer contacts, normalized by the average over the top predictions. The abscissa is the relative weight of the eukaryotic sequences in the total alignment.

The function of Hsp70s depends on multiple conformational changes. The structure of the NBD varies with the nature of the bound nucleotide, and the conformational changes induced by nucleotide binding or hydrolysis are propagated to the SBD, thus modulating the Hsp70 interactions with the substrate during the chaperone's biochemical cycle.

The functional necessity of these orchestrated gymnastics has left a profound footprint in the evolutionary history of these chaperones. It is thus not surprising that sequence analysis methods based on coevolution can effectively provide structural information on all the functional conformers, thanks to the taxonomic breadth of the available Hsp70 sequences.

It is noteworthy that even though DCA has already been used to detect multiple conformers in proteins [44, 48, 53, 56], the analysis of the Hsp70 family presented here resulted in an unprecedented characterization of a large-scale conformational transition, owing to the identification of an appreciable fraction of relevant allosteric contacts.

Furthermore, we introduced here a topological measure inspired by graph theory to further characterize the quality of DCA predictions. This measure allows a finer appraisal than the binary true or false classification of contacts based on a hard cut-off. These results suggest that the remaining apparently wrong predictions may actually correspond to as yet uncharacterized structural features. In this respect, we observe that DCA predicts a group of 6-7 contacts that are compatible with the interface between the two Hsp70 molecules in the DnaK crystal.

While it has been noted that the two DnaK monomers in the crystal possess an interface reflecting a molecular dimer, the weak propensity for dimerization in vitro called into question the functional relevance of the dimer in the chaperone cycle [22].

Our results indicate that in the Hsp70 family this dimerization interface is evolutionarily conserved in a statistically significant way, thus strongly suggesting an important role for the homo-dimer in the cellular function of Hsp70s. This intriguing hypothesis is corroborated by our finding that the co-evolutionary conservation of the dimer interfaces is stronger if bacterial sequences are assigned more statistical weight than eukaryotic ones.

Indeed, because bacterial genomes are likely under evolutionary pressure to remain short [61, 62], we hypothesize that bacteria cannot afford to have too many specialized versions of the same protein. As a consequence, Hsp70 monomers in the homo-dimer may have to play the same role that the specialized Hsps perform in the eukaryotic hetero-dimer.

In this work, we based our analysis on the a priori knowledge of the existence of multiple conformers, and the availability of their respective structures. Furthermore, we had at hand a crystallographic homo-dimeric arrangement of two Hsp70 monomers. These data allowed us an in-depth analysis of coevolutionary conservation of structural contacts in DnaK and the prediction of the functional relevance of such a quaternary complex of two bacterial Hsp70s.

In general, the blind prediction of multiple conformations or multimeric arrangements of proteins based solely on coevolutionary information remains an important and challenging problem, whose solution would greatly improve the predictive capabilities of DCA and other coevolutionary methods in the study of previously uncharacterized protein families.

DCA has already shown its impressive potential in reproducing known structural information both at the single protein and at the protein-protein interaction level. The rapid growth of the number of available protein sequences, combined with the improvement of inference algorithms, foreshadows a near future when evolutionary information will be fully exploited as a powerful predictive tool.

Thanks to its ubiquity and its evolutionary conservation, together with the state-of-the-art Pseudo-Likelihood optimization method, the Hsp70 family offers a glimpse of these opportunities.

The added sequences were chosen to cover a wide range of organisms, stemming from different taxonomy (see S3 Table). All utilities were run with default parameters. An alternative consisting of reweighting the sequences based on their mutual identity leads to nearly identical results.

As the DCA computation time grows linearly with the number of sequences in the MSA, we chose to filter the sequences by identity rather than reweighting them.

Direct Coupling Analysis (DCA) was performed using the symmetric version of the pseudo-likelihood method described in [52], which was first introduced in the context of protein contact prediction by Balakrishnan et al.
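The identity filtering mentioned above can be sketched as follows. This is a minimal illustration, not the utilities actually used in the study: the function names, the greedy keep-first strategy, and the `max_identity=0.9` default are all assumptions introduced here, since the source does not state the threshold or the tool.

```python
def sequence_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_by_identity(msa, max_identity=0.9):
    """Greedy filter over aligned sequences (hypothetical helper).

    A sequence is kept only if its identity to every previously kept
    sequence is strictly below max_identity (an assumed threshold).
    """
    kept = []
    for seq in msa:
        if all(sequence_identity(seq, k) < max_identity for k in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    toy_msa = ["ACDEF", "ACDEF", "ACDEG", "WWWWW"]
    print(filter_by_identity(toy_msa))  # the exact duplicate is dropped
```

Filtering shrinks the alignment itself, so every later DCA run is cheaper, whereas reweighting keeps all sequences and only changes their statistical weight.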

DCA is based on the maximum entropy principle, constrained to reproduce the observed single- and two-site amino-acid frequencies, leading to a 21-state Potts model (one state for each of the 20 natural amino acids plus the gap state) defined by

$$P(S) = \frac{1}{Z} \exp\left( \sum_{i=1}^{N} h_i(s_i) + \sum_{1 \le i < j \le N} J_{ij}(s_i, s_j) \right)$$

where S is a sequence in the MSA, s_i the amino acid at position i, N the sequence length, Z the partition function, and h_i and J_ij the model parameters to be optimized.

The parameters h_i and J_ij are efficiently but approximately learned through the numerical optimization of the induced Pseudo-Likelihood with respect to the observed sequences in the MSA [52].

The use of the approximate Maximum Pseudo-Likelihood, in contrast to full Maximum-Likelihood method, allows avoiding the computation of the intractable full partition function Z.
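To make explicit why the pseudo-likelihood avoids Z, the standard form of the objective can be written out. This is a sketch in the usual notation of pseudo-likelihood DCA, not a formula quoted from the source: the symbol M (number of sequences in the MSA), the superscript m indexing sequences, and the omission of the regularization terms are assumptions made here for brevity, and J_ij(a, b) for j < i is understood as J_ji(b, a).

```latex
% Conditional probability of residue s_i given the rest of the sequence:
% the normalization runs over the 21 states of a single site only.
P\left(s_i \mid S_{\setminus i}\right)
  = \frac{\exp\left( h_i(s_i) + \sum_{j \neq i} J_{ij}(s_i, s_j) \right)}
         {\sum_{a=1}^{21} \exp\left( h_i(a) + \sum_{j \neq i} J_{ij}(a, s_j) \right)}

% Pseudo-log-likelihood over an MSA of M sequences S^{(1)}, \dots, S^{(M)}:
\ell_{\mathrm{PL}}(h, J)
  = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{N}
    \log P\left(s_i^{(m)} \mid S^{(m)}_{\setminus i}\right)
```

Each denominator is a sum over 21 terms, in contrast to the full partition function, which sums over all 21^N sequences.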

DCA results take the form of N x N matrices. Each entry S_ij is computed as the Frobenius norm of the local 21x21 coupling matrix J_ij and represents the intensity of the evolutionary coupling between residues i and j. An average product correction [63] is finally applied to correct for entropic effects.
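The scoring step described above can be sketched in plain Python. The function names are hypothetical, the couplings are assumed to be stored as nested lists with `J[i][j]` defined for i < j, and the averages in the correction are taken over off-diagonal entries, which is one common convention and not necessarily the one used in the study.

```python
import math


def frobenius_scores(J):
    """Return the N x N matrix of Frobenius norms of the coupling blocks.

    J[i][j] (for i < j) is a q x q nested list; entries with i >= j are ignored.
    """
    n = len(J)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            norm = math.sqrt(sum(v * v for row in J[i][j] for v in row))
            S[i][j] = S[j][i] = norm
    return S


def average_product_correction(S):
    """Subtract (row mean * column mean / overall mean) from each off-diagonal score."""
    n = len(S)
    row_mean = [sum(S[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    total_mean = sum(row_mean) / n
    C = [[0.0] * n for _ in range(n)]
    if total_mean == 0.0:
        return C  # all scores are zero, nothing to correct
    for i in range(n):
        for j in range(n):
            if i != j:
                C[i][j] = S[i][j] - row_mean[i] * row_mean[j] / total_mean
    return C


def top_pairs(C, n_top):
    """Return the n_top residue pairs (i, j), i < j, with the highest corrected score."""
    pairs = [(C[i][j], i, j) for i in range(len(C)) for j in range(i + 1, len(C))]
    pairs.sort(reverse=True)
    return [(i, j) for _, i, j in pairs[:n_top]]


if __name__ == "__main__":
    # Toy example with 3 positions and 2 x 2 blocks instead of 21 x 21.
    J = [[None, [[3, 0], [0, 4]], [[1, 0], [0, 0]]],
         [None, None, [[0, 2], [0, 0]]],
         [None, None, None]]
    C = average_product_correction(frobenius_scores(J))
    print(top_pairs(C, 2))
```

In practice a minimum sequence separation is often imposed when ranking pairs; that filter is left out here because the source does not describe one.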

In our analysis, we retained the N top pairs having the highest coupling scores S_ij. The optimal regularization parameters of the original method by Ekeberg et al.

The top N DCA predictions are compared to the binary contact maps of the available crystal structures. Contact maps are built by considering two residues in contact if the smallest distance between their heavy (non-hydrogen) atoms is lower than 8.
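The contact definition above can be sketched as follows. The data layout (each residue as a list of heavy-atom coordinates) and the function names are assumptions made for illustration, and the threshold is left as a parameter because its exact value is cut off in the text; the `8.0` in the usage example is a placeholder only.

```python
import math


def min_heavy_atom_distance(res_a, res_b):
    """Smallest distance between any heavy atom of res_a and any of res_b."""
    return min(math.dist(p, q) for p in res_a for q in res_b)


def contact_map(residues, threshold):
    """Binary, symmetric contact map.

    residues: list of residues, each a list of (x, y, z) heavy-atom coordinates.
    Two residues are in contact if their minimum heavy-atom distance is
    strictly below threshold (same length unit as the coordinates).
    """
    n = len(residues)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if min_heavy_atom_distance(residues[i], residues[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


if __name__ == "__main__":
    toy = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
    for row in contact_map(toy, threshold=8.0):  # illustrative threshold
        print(row)
```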

The DCA predictions are thus aligned to the contact maps of the structures, considering only DCA scores where the crystal structure contains residues. This implies that not all residues in the structures have corresponding positions in the DCA predictions.

Conversely, not all DCA predictions correspond to residues in all structures. In order to assess the quality of predicted contacts, we compute the shortest path between the two residues of the contact. The shortest path SP is computed considering the binary contact map as an adjacency matrix of an unweighted and undirected network. Each residue corresponds to a node in the graph, and a link connects two nodes if the corresponding residues are in contact in the protein structure.

The shortest path between two residues is the smallest number of links in the graph needed to join two nodes. By definition, physical contacts have SP of 1, while higher SPs indicate a higher topological separation between the residues.
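The shortest-path measure described above amounts to a breadth-first search on the contact graph. The sketch below assumes the contact map is a symmetric list of lists of booleans; the function name and the choice to return `None` for disconnected residues are illustrative, not taken from the source.

```python
from collections import deque


def shortest_path_length(cmap, source, target):
    """Number of links on the shortest path between two residues.

    cmap is treated as the adjacency matrix of an unweighted, undirected
    graph. Returns None when no path connects the two residues.
    """
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(len(cmap)):
            if cmap[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                if v == target:
                    return dist[v]
                queue.append(v)
    return None


if __name__ == "__main__":
    # Chain 0-1-2-3 plus an isolated residue 4.
    n = 5
    cmap = [[False] * n for _ in range(n)]
    for a, b in [(0, 1), (1, 2), (2, 3)]:
        cmap[a][b] = cmap[b][a] = True
    print(shortest_path_length(cmap, 0, 1))  # direct contact
    print(shortest_path_length(cmap, 0, 3))  # mediated by two intermediate residues
```

A predicted pair with a shortest path of 2 or 3 is close in the contact network even though it fails the hard distance cut-off, which is the finer distinction the measure is meant to capture.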

The SP analysis helps highlight the number of intermediary contacts that would be needed to explain an observed DCA prediction. DCA may not be fully capable of disentangling all indirect correlations in the data, and consequently some residual strong co-evolutionary correlations between residues not in contact in the structure might be found in the predictions.


