Co-reporter:Manasi A. Pethe, Aliza B. Rubenstein, Sagar D. Khare
Journal of Molecular Biology (20 January 2017) Volume 429(Issue 2) pp:220-236
Publication Date(Web):20 January 2017
DOI:10.1016/j.jmb.2016.11.031
•Develop a general, structure-based approach for predicting protease substrate specificity using Rosetta and AMBER MMPBSA.
•Recapitulate known protease specificity profiles with accuracy comparable to sequence-only methods.
•Combining sequence and structure energy features using machine learning helps increase discrimination performance.
•Validated approach experimentally in yeast cells.
•Discovered novel sequence specificities for HCV NS3/4A protease using our computational approach.
Characterizing the substrate specificity of protease enzymes is critical for illuminating the molecular basis of their diverse and complex roles in a wide array of biological processes. Rapid and accurate prediction of their extended substrate specificity would also aid in the design of custom proteases capable of selectively and controllably cleaving biotechnologically or therapeutically relevant targets. However, current in silico approaches for protease specificity prediction rely on, and are therefore limited by, machine learning of sequence patterns in known experimental data. Here, we describe a general approach for predicting peptidase substrates de novo using protein structure modeling and biophysical evaluation of enzyme–substrate complexes. We construct atomic-resolution models of thousands of candidate substrate–enzyme complexes for each of five model proteases belonging to the four major protease mechanistic classes (serine, cysteine, aspartyl, and metallo-proteases) and develop a discriminatory scoring function using enzyme design modules from Rosetta and AMBER's MMPBSA. We rank putative substrates based on calculated interaction energy with a modeled near-attack conformation of the enzyme active site.
We show that the energetic patterns obtained from these simulations can be used to robustly rank and classify known cleaved and uncleaved peptides and that these structural-energetic patterns have greater discriminatory power compared to purely sequence-based statistical inference. Combining sequence and energetic patterns using machine-learning algorithms further improves classification performance, and analysis of structural models provides physical insight into the structural basis for the observed specificities. We further tested the predictive capability of the model by designing and experimentally characterizing the cleavage of four novel substrate motifs for the hepatitis C virus NS3/4A protease using an in vivo assay. The presented structure-based approach is generalizable to other protease enzymes with known or modeled structures, and complements existing experimental methods for specificity determination.
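As a rough illustration of the ranking and classification step described in the abstract above, the following Python sketch sorts candidate peptides by a precomputed interaction energy and measures how well that ordering separates cleaved from uncleaved peptides using a ROC AUC. It is not the authors' pipeline: the peptide labels, energy values, and function names are hypothetical, and the energies are assumed to come from an upstream modeling step in which lower values mean more favorable binding.

```python
from typing import Dict, List, Set, Tuple


def rank_by_energy(energies: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order peptides from most to least favorable (lowest energy first)."""
    return sorted(energies.items(), key=lambda item: item[1])


def roc_auc(energies: Dict[str, float], cleaved: Set[str]) -> float:
    """Fraction of (cleaved, uncleaved) pairs in which the cleaved peptide
    has the lower energy; ties count as half a correct pair."""
    pos = [e for p, e in energies.items() if p in cleaved]
    neg = [e for p, e in energies.items() if p not in cleaved]
    if not pos or not neg:
        raise ValueError("need at least one cleaved and one uncleaved peptide")
    correct = 0.0
    for e_pos in pos:
        for e_neg in neg:
            if e_pos < e_neg:
                correct += 1.0
            elif e_pos == e_neg:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy values for illustration only (hypothetical labels and energies).
toy_energies = {"pep1": -12.0, "pep2": -9.5, "pep3": -3.0, "pep4": -10.0}
toy_cleaved = {"pep1", "pep2"}

ranking = rank_by_energy(toy_energies)
auc = roc_auc(toy_energies, toy_cleaved)
print([name for name, _ in ranking])
print(auc)
```

An AUC of 1.0 would mean every cleaved peptide scores more favorably than every uncleaved one, while 0.5 corresponds to no discrimination. The pairwise formulation is chosen here only to keep the sketch dependency-free.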
Co-reporter:Sagar D. Khare, Sarel J. Fleishman
FEBS Letters (17 April 2013) Volume 587(Issue 8) pp:1147-1154
Publication Date(Web):17 April 2013
DOI:10.1016/j.febslet.2012.12.009
Recent years have seen the first applications of computational protein design to generate novel catalysts, binding pairs of proteins, protein inhibitors, and large oligomeric assemblies. At their core these methods rely on a similar hybrid energy function, composed of physics-based and database-derived terms, while different sequence and conformational sampling approaches are used for each design category. Although these are first steps for the computational design of novel function, crystal structures and biochemical characterization already point out where success and failure are likely in the application of protein design. Contrasting failed and successful design attempts has been used to diagnose deficiencies in the approaches and in the underlying hybrid energy function. In this manner, design provides an inherent mechanism by which crucial information is obtained on pressing areas where focused efforts to improve methods are needed. Of the successful designs, many feature pre-organized sites that are poised to perform their intended function, and improvements often result from disfavoring alternative functionally suboptimal states. These rapid developments and fundamental insights obtained thus far promise to make computational design of novel molecular function general, robust, and routine.