In this tutorial we will
To this end we will use the software CYANA.
For viewing the molecular structures, we will use VMD.
Please follow these steps carefully (the exact terminal commands are given below; you may copy them into a terminal):
cd "download path"/cyana-3.98.15
# optional, macOS only: remove the quarantine attribute from the executable
sudo xattr -rd com.apple.quarantine cyanaexe.macarm-gfortran
./cyana
CYANA 3.98.15 (macarm-gfortran)
Copyright (c) 2002-22 Peter Guentert. All rights reserved.
___________________________________________________________________
Demo license valid for specific sequences until 2024-12-31.
Library file "/Users/julienorts/Downloads/cyana-3.98.15/lib/cyana.lib" read, 41 residue types.
To be able to execute cyana in any directory, one option is to create an alias. For that,
Hint: More information on the CYANA commands etc. is in the CYANA 3.0 Reference Manual.
Remark: CYANA is proprietary software. For any installation problem, contact Peter Güntert, the author of CYANA.
In a real experimental situation, we never have a complete set of distances between all pairs of protons. NOE cross-peaks are detectable for distances up to around 5 Å. Among these, many signals share the same frequency in the spectrum, so the assignment of a signal to an atom (or atom pair) can be made only within some group, or not at all. Furthermore, the experimentally derived distances contain various sources of error. Part of the error is due to random noise, part to incompletely resolved relayed transfer, and part to different (local) dynamics influencing the cross-relaxation rate.
Let's start anyway with the unrealistic situation in which we know all the distances within 5.5 Å accurately.
They are written in an upper distance limit file (here PxP.upl), which is used by CYANA.
It used to be common simply to classify the experimental NOE intensities as strong, medium and weak, and to assign corresponding distance ranges, from around 2 Å for the strongest to 5.5 Å for the weakest.
Here we use a technique that is a starting point for more accurate methods. It uses spectra recorded at different mixing times. As a NOESY spectrum of a protein can take days to record, recording a series of them is a large investment. In this technique, the NOE cross-peak volume (intensity) is divided by the geometric mean of the corresponding diagonal volumes, and the slope of the resulting buildup curve is taken. This slope is a better approximation of the cross-relaxation rate than the slope obtained without this "linearization" or "normalization". In the case of two isolated nuclei, such a buildup curve is approximately linear over longer mixing times.
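As a hedged illustration of the slope extraction, the sketch below fits a straight line through the origin to normalized cross-peak intensities measured at several mixing times. The function name `buildup_slope` and the example numbers are hypothetical, and forcing the line through the origin is an assumption that only holds while the buildup is still close to linear.

```python
def buildup_slope(mixing_times, normalized_intensities):
    """Least-squares slope of a line through the origin (y = slope * t)."""
    num = sum(t * y for t, y in zip(mixing_times, normalized_intensities))
    den = sum(t * t for t in mixing_times)
    if den == 0:
        raise ValueError("need at least one non-zero mixing time")
    return num / den


# Hypothetical data: mixing times in seconds, normalized intensities (unitless).
times = [0.02, 0.04, 0.06]
values = [0.004, 0.008, 0.012]

# The slope serves as an estimate of the cross-relaxation rate (in 1/s).
slope = buildup_slope(times, values)
print(f"estimated buildup slope: {slope:.3f} 1/s")
```

With real data, the later mixing times would typically deviate from the line, and one would restrict the fit to the early, linear part.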
Here we try three simple options to obtain a better set of distances. The first comes from the idea that short distances are much less likely to be affected by relayed transfer (spin diffusion), so we keep only those within 2.5 Å.
In the next simple attempt, we multiply all the distances by a constant factor (1.75) and again use those within 5.5 Å.
Next, we may suspect that the number of (inaccurate) restraints is simply too large. Commonly we would expect up to around 10 restraints per amino acid residue. In the previous example, there were still around 12800/97 > 100 restraints per residue. Here we try using only every 10th restraint.
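The three options above can be sketched as simple filters on a list of restraints. This is only an illustration: the tuple layout `(atom1, atom2, distance)`, the function names and the demo values are hypothetical, and applying the 5.5 Å cutoff after the scaling is one reading of the second option.

```python
def keep_short(restraints, cutoff=2.5):
    """Option 1: keep only restraints with a distance up to `cutoff` (in Å)."""
    return [r for r in restraints if r[2] <= cutoff]


def scale_and_cut(restraints, factor=1.75, cutoff=5.5):
    """Option 2: multiply every distance by `factor`, then keep those within `cutoff`."""
    scaled = [(a, b, d * factor) for a, b, d in restraints]
    return [r for r in scaled if r[2] <= cutoff]


def every_nth(restraints, n=10):
    """Option 3: keep only every n-th restraint, starting with the first."""
    return restraints[::n]


# Hypothetical demo restraints: (atom name 1, atom name 2, distance in Å).
demo = [("H1", "H2", 2.0), ("H1", "H3", 3.0), ("H2", "H3", 4.0)]

print(keep_short(demo))
print(scale_and_cut(demo))
print(every_nth(demo * 10))
```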
We will investigate the effect of exchange of the ligand between the free state in the solution and bound to the protein.
The NOESY spectrum of the ligand (the intramolecular NOEs) is formed as a population average of the bound and free forms. In the free form, the cross-relaxation rate is commonly negative, whereas in the bound form it is positive, the same as for the protein. Moreover, the bound-form rate is commonly much larger in absolute value.
What is even more important is the population weighting of the intermolecular cross-relaxation rates (and hence also of the NOEs).
The "Re weighted" matrix includes the correction, whereas the "weight averaged over P, L, PL" matrix does not.
We have exact distances (assume we are able to obtain them), but we ignore the populations, so the intermolecular calibration is wrong.
Here the populations are taken into account correctly.
Download the zip file for the workshop: workshopData.zip
Open the simulation_data.xlsx file in the part1/task1_to_3 directory. These data are taken as the experimental data for this task. We will start with the structure calculation of the ligand (drug-like) molecule. From the NOESY spectra, we obtain the cross-relaxation rates using the intensity (volume) of the cross-peaks at a known (experimentally set) mixing time. These rates are directly related to the interatomic distances, which determine the conformation of the molecule.
The NOESY spectrum also contains diagonal peaks. These correspond to the decay of the (non-equilibrium Z-) magnetisation of the spins themselves: autorelaxation.
Rii = ρi = (b^2/dij^6) (J(0) + 3J(ω) + 6J(2ω))   # contribution of one neighboring spin "j" at distance "dij"
And the cross-relaxation rate between H-spins "i" and "j" is:
Rij = σij = (b^2/dij^6) (-J(0) + 6J(2ω)),
where "b" is the dipole-dipole interaction strength, and "J(ω)" is the spectral density at angular frequency "ω". The spectral density J is the Fourier transform of the rotational correlation function, and it describes the distribution of frequencies of the molecular rotational motion.
The initial buildup rate of a NOESY cross-peak is directly proportional to 1/distance^6 between the respective spins (protons). The other dependence comes from the rotational correlation time of the molecule, which depends on temperature, solvent viscosity, the solvation shell and the shape of the molecule. In the case of a small molecule partly bound to a larger protein, the effective correlation time is also modulated by this partial binding (chemical exchange). Therefore, it is practical to leave these dependencies aside and to calibrate the relation between the NOE buildup rate (cross-relaxation rate) and the interproton distance using a known distance. Fortunately, there are many proton pairs in the molecule with a fixed distance, simply due to the covalent structure.
For the reference pair of protons, we now also have the reference cross-relaxation rate σRef.
Use the formula
rij = rRef * (σRef/σij)^(1/6). [Eq. 1] Vogeli 2014, Eq. 63b
to calculate the other distances between other atoms (in a separate column).
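The same calculation can be sketched outside the spreadsheet. The snippet below applies Eq. 1 to a dictionary of cross-relaxation rates; the function name, the atom-pair labels and all numerical values are hypothetical and serve only to illustrate the formula.

```python
def calibrate_distances(sigmas, sigma_ref, r_ref):
    """Apply r_ij = r_ref * (sigma_ref / sigma_ij)^(1/6) to every atom pair.

    sigmas    : dict mapping an atom-pair label to its cross-relaxation rate
    sigma_ref : cross-relaxation rate of the reference pair (same units)
    r_ref     : known distance of the reference pair in Å
    """
    distances = {}
    for pair, sigma in sigmas.items():
        ratio = sigma_ref / sigma
        if ratio <= 0:
            # Rates of opposite sign (or zero) cannot be calibrated this way.
            raise ValueError(f"cannot calibrate pair {pair}: ratio {ratio}")
        distances[pair] = r_ref * ratio ** (1.0 / 6.0)
    return distances


# Hypothetical values: a reference pair at 1.78 Å with a rate of -6.4 1/s.
sigma_ref = -6.4
r_ref = 1.78
sigmas = {("H1", "H2"): -6.4, ("H1", "H5"): -0.1}

for pair, r in calibrate_distances(sigmas, sigma_ref, r_ref).items():
    print(pair, f"{r:.2f} Å")
```

Because of the sixth root, a rate 64 times weaker than the reference corresponds to a distance that is only twice as long.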
There is no closed-form formula to calculate the conformation (structure) from a set of distances. The setup starts with defining an energy penalty for every experimental distance not fulfilled by the molecular conformation. These are also called distance restraints. Starting from one chosen conformation and trying to minimize the energy with steepest descent or another local method, in order to fulfill the distances measured by NOE (or by any other means), would fail: the structure would end up in a local minimum. Instead we have to search for a global minimum. A commonly used algorithm for finding a global minimum is simulated annealing, in which the molecule is heated up so that high energy barriers (due to van der Waals clashes) can be crossed. During the subsequent cooling, the imposed distance restraints drive the molecule towards the conformation with minimal violation of the distance restraints. Many attempts will nevertheless end up in different local minima, and hence only a subset of the resulting conformers, those with the lowest energy, is likely to represent the global minimum.
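To make the idea concrete, here is a toy sketch of a restraint penalty and a simulated annealing loop. It is not CYANA's algorithm: it moves points in Cartesian space with random Monte Carlo steps, ignores bonds and van der Waals terms, and all names, parameters and coordinates are hypothetical.

```python
import math
import random


def penalty(coords, restraints):
    """Sum of squared violations of upper distance limits (i, j, upper)."""
    total = 0.0
    for i, j, upper in restraints:
        d = math.dist(coords[i], coords[j])
        total += max(0.0, d - upper) ** 2
    return total


def anneal(coords, restraints, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis simulated annealing; returns the best coordinates found."""
    rng = random.Random(seed)
    current = [list(p) for p in coords]
    e_cur = penalty(current, restraints)
    best, e_best = [p[:] for p in current], e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        k = rng.randrange(len(current))
        old = current[k][:]
        current[k] = [c + rng.gauss(0.0, 0.5) for c in old]
        e_new = penalty(current, restraints)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = [p[:] for p in current], e_cur
        else:
            current[k] = old
    return best, e_best


# Three hypothetical atoms, far apart, that should all end up within 3 Å.
start = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
limits = [(0, 1, 3.0), (1, 2, 3.0), (0, 2, 3.0)]
best, e_best = anneal(start, limits)
print(f"penalty before: {penalty(start, limits):.2f}, after: {e_best:.2f}")
```

Running the loop from several different random seeds and keeping only the lowest-penalty results mirrors the selection of the lowest-energy conformers described above.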
In practice, we have to input the knowledge about the covalent (bonding) structure of the molecule, and the distance restraints. The bonding structure can be as simple as the chain of amino acids, as the standard programs would have libraries of the actual atomic bonding (topology) for those. For an unknown molecule, we have to supply a full topology ourselves. These would be different for different programs.
We will use a program specialized in structure calculation from NMR restraints: CYANA by Prof. Dr. Peter Güntert.
CYANA can obtain the bonding topology from a .mol2 molecular structure file, converting it into its own (library) format, a .lib file. This library file contains information about one molecule, but since biopolymers such as proteins consist of a chain (sequence) of building blocks like amino acid residues or nucleotides, there also has to be information about the sequence of those building blocks. In our case, it contains only one line: the name and the index of our ligand molecule.
We will use a ready-made mol.lib file in our exercise. Besides the physical H atoms, it also contains pseudoatoms Q, created to replace chemically equivalent H atoms. We complete the information with a .seq file whose "sequence" contains only one line: the name and number of the residue (MOL 999).
The remaining information, the distance restraints, is read by CYANA from a separate text file, in which each pair of atoms is identified by three columns per atom, followed by the distance in ångströms:
residueNumber1 residueName1 atomName1 residueNumber2 residueName2 atomName2 distance
In our first calculation, of a single molecule (residue), only atomName1 and atomName2 differ. Further instructions for CYANA are read from the .cya file.
Note that the atom numbers and names have to exactly match the MOL.lib file.
Note also, that the distance has to be in a numerical format using "dot" as a decimal separator and not a "comma"!
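A small helper can produce such lines and guard against the decimal-comma problem. This is a sketch only: the function names are hypothetical, and the column widths are an assumption chosen for readability, not a documented CYANA requirement.

```python
def parse_distance(text):
    """Convert a distance string to float, accepting a decimal comma as input."""
    return float(text.strip().replace(",", "."))


def upl_line(res1, name1, atom1, res2, name2, atom2, distance):
    """Format one restraint line: three columns per atom, then the distance in Å."""
    # Python's float formatting always writes a dot as the decimal separator.
    return (f"{res1:4d} {name1:<5s}{atom1:<5s} "
            f"{res2:4d} {name2:<5s}{atom2:<5s} {distance:7.2f}")


# Hypothetical atom names; they must match the names in the library file.
line = upl_line(999, "MOL", "H1", 999, "MOL", "H2", parse_distance("2,5"))
print(line)
```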
After a little while, the calculation finishes.
We now have the structure file: demo.pdb
and the overview file of the calculation: demo.ovw
From the theoretical introduction to the NOE, we know that a cross-peak between two spins does not have to be caused by direct through-space transfer of magnetisation between them. Instead, magnetisation transfer via a third nucleus can occur. This is called spin diffusion.
Compare the structures, check the .ovw file.
Instead of the first mixing time in Task 1,
choose the last mixing time and proceed all the way to calculate the structure in CYANA.
Note the differences in extracted distances and in the resulting structures.
In this short exercise, we will calculate the protein structure using ready-made distances stored in the final_protein.upl file. We do not need any extra library file, as this time the sequence file (demoShort.seq) contains only standard amino acid residues.
Look at the structures using the Chimera molecular viewer.
cat final_protein.upl mol.upl intermolecular.upl > complex.upl
Note: if you use VMD as the molecular viewer, it will not recognize large parts of the secondary structure. In other words, the secondary structure of the resulting protein-ligand complex deviates further from the standard definition.
The calibration of the distances from the measured NOE intensities, using a known fixed distance, can be inaccurate due to several reasons. In this exercise, we correct some of them in two steps. First, the NOE intensities will be normalized using the diagonal intensities (their geometric mean) corresponding to the NOE cross-peak. This corrects for
This normalization is not final for peaks stemming from multiplets of equivalent spin groups 1 and 2, containing N1 and N2 spins. In such cases, the average normalized buildup (→ σi,j) is obtained by further multiplication by (N1 x N2)^(1/2) / (N1 x N2). (See the full formula below.)
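The two normalization steps described so far can be sketched as follows. The function name and the example intensities are hypothetical, and the code implements only the factors stated in the text above, not necessarily the complete formula used in the workshop sheets.

```python
import math


def normalized_intensity(i_cross, i_diag_1, i_diag_2, n1=1, n2=1):
    """Normalize a cross-peak by the geometric mean of its two diagonal peaks,
    then apply the multiplicity factor (n1*n2)^(1/2) / (n1*n2)."""
    geometric_mean = math.sqrt(i_diag_1 * i_diag_2)
    normalized = i_cross / geometric_mean
    multiplicity = n1 * n2
    return normalized * math.sqrt(multiplicity) / multiplicity


# Hypothetical intensities (arbitrary units).
single = normalized_intensity(2.0, 4.0, 16.0)           # two single protons
methyl = normalized_intensity(2.0, 4.0, 16.0, n1=3)     # methyl group to a single proton
print(single, methyl)
```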
The step above puts the cross-relaxation rates σi,j in the correct mutual proportion. The actual values can still be inaccurate because of an inaccurate reference distance rRef in Eq. 1, or rather because σRef is not correctly proportional to rRef owing to spin diffusion or other effects. It is therefore advisable to correct the median of the measured distances so that it corresponds to the values conserved among these organic molecules: around 4.2 Å for intramolecular distances and 4.4 Å for intermolecular distances. To do so, we multiply the derived distances by a constant chosen to give the desired median.
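A minimal sketch of this median correction is shown below. The function name and the example distances are hypothetical; the target of 4.2 Å is the intramolecular value quoted above.

```python
import statistics


def scale_to_median(distances, target_median):
    """Multiply all distances by one constant so that their median equals the target."""
    factor = target_median / statistics.median(distances)
    return [d * factor for d in distances], factor


# Hypothetical calibrated intramolecular distances in Å.
raw = [2.1, 2.8, 3.0, 3.6, 4.4]
scaled, factor = scale_to_median(raw, target_median=4.2)
print(f"scaling factor: {factor:.3f}")
print([round(d, 2) for d in scaled])
```

Since every distance is multiplied by the same constant, the mutual proportions obtained from the normalization are preserved.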
There is an important technical detail about how CYANA handles sites with multiple spins, such as -CH3, with three spins sharing the same chemical shift. These three spins are combined into a pseudoatom Q, but the distance between this pseudoatom and another atom is not the average distance. Instead it is a shorter distance, calculated from the three-fold intensity of the NOE cross-peak. Similarly, for a cross-peak between two methyl groups, the cross-peak is 9x larger than expected for a single atom pair, and it is translated into the very short distance expected by CYANA.
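The size of this effect follows from the 1/r^6 dependence: an N-fold intensity shortens the apparent distance by a factor N^(-1/6). The sketch below assumes, as a simplification, that all equivalent protons are at the same distance from the partner; the function name and the 4 Å example are hypothetical.

```python
def apparent_distance(r, n_pairs):
    """Distance deduced from an n_pairs-fold NOE intensity, given a true distance r."""
    return r * n_pairs ** (-1.0 / 6.0)


true_distance = 4.0  # Å, hypothetical
methyl_to_proton = apparent_distance(true_distance, 3)   # roughly 3.33 Å
methyl_to_methyl = apparent_distance(true_distance, 9)   # roughly 2.77 Å
print(f"{methyl_to_proton:.2f} Å, {methyl_to_methyl:.2f} Å")
```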
In the previous parts, the protein-ligand complex had a low dissociation constant of 0.1 nM. With the concentrations given, this means that there is almost no free ligand and no free protein. In this exercise, we assume that the dissociation constant is 500 µM, equal to the concentration of the components. The free fraction of the ligand has much higher mobility, and therefore its autorelaxation rate is smaller. The cross-relaxation rate is also strongly affected, particularly the intermolecular NOE, which builds up only for the fraction residing in the complex. On the other hand, the relaxation rates of the protein remain very similar, with some exceptions near the binding site, but those will not be analyzed here.
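The populations follow from the standard 1:1 binding equilibrium. The sketch below assumes total protein and ligand concentrations of 500 µM each (our reading of "equal to the concentration of the components"); the function name is hypothetical.

```python
import math


def bound_fraction(p_total, l_total, kd):
    """Fraction of ligand bound in a 1:1 complex (all concentrations in the same unit).

    Solves [PL]^2 - (P_t + L_t + Kd) [PL] + P_t * L_t = 0 for the physical root.
    """
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / l_total


# Concentrations in µM; 0.1 nM = 1e-4 µM.
tight = bound_fraction(500.0, 500.0, 1e-4)
weak = bound_fraction(500.0, 500.0, 500.0)
print(f"Kd = 0.1 nM: {tight:.4f} bound")
print(f"Kd = 500 µM: {weak:.4f} bound")
```

Under these assumptions, only about 38% of the ligand is bound in the weak-binding case, so the majority of the ligand signal comes from the free form.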
NMR2 runs via the platform SAMSON.
Further workshop contributors:
Colin Schmoll contributed by providing the simulated data (.xlsx sheets) and plots, Dr. Jiří Mareš assembled part of the text and workshop materials. (https://bionmr.univie.ac.at/people/)