1. G J Kleywegt, "Validation of protein crystal structures" (Topical review), Acta Crystallographica, D56, 249-265 (2000).
2. G J Kleywegt, "On vital aid: the why, what and how of validation", Acta Crystallographica, D65, 134-139 (2009).
3. R J Read, P D Adams, W Bryan Arendall III, A T Brunger, P Emsley, R P Joosten, G J Kleywegt, E B Krissinel, T Lütteke, Z Otwinowski, A Perrakis, J S Richardson, W H Sheffler, J L Smith, I J Tickle, G Vriend and P H Zwart, "A new generation of crystallographic validation tools for the Protein Data Bank", Structure, 19, 1395-1412 (2011).
The use of Rmerge can be a thorny subject; just look at the CCP4 bulletin boards (if you do crystallography and are not signed up with CCP4, I would strongly recommend it). Since Rmerge (in very simple terms) measures the difference between measurements of the same reflection collected at different times, from symmetry-related reflections, from a large angular spread of data, or from a different crystal (Rsym vs Rmerge, which some people use interchangeably), the redundancy is very important, as it shows the number of times the reflection has been collected. Higher redundancy can sometimes mean a higher Rmerge, even though you will have a more precise averaged intensity. As a very broad rule of thumb, people (some reviewers) like to use 0.5 as a cut-off for the highest resolution shell.
Of course I/sigI is also very important, to look at the signal-to-noise of the data as well. The main thing is: do the higher-resolution terms improve the map?
This CCP4bb posting was a good one, and also gives the reference for Phil Evans paper describing the various "R"s: http://www.mail-archive.com/[email protected]/msg04478.html
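As a small illustration of the redundancy point above, here is a toy Python sketch (my own example, assuming pure Poisson counting noise and the unweighted Rmerge formula): as the redundancy grows, the unweighted Rmerge creeps up slightly, while the merged intensity becomes more precise.

# Toy sketch: unweighted Rmerge vs redundancy for reflections of ~100 counts,
# assuming pure Poisson counting noise (no background, no systematic errors).
# Rmerge = sum_i |I_i - <I>| / sum_i I_i, accumulated over many reflections.
import numpy as np

rng = np.random.default_rng(0)
true_I, n_refl = 100, 20000

for redundancy in (2, 4, 12):
    counts = rng.poisson(true_I, size=(n_refl, redundancy)).astype(float)
    mean_I = counts.mean(axis=1, keepdims=True)
    rmerge = np.abs(counts - mean_I).sum() / counts.sum()
    sigma_of_mean = np.sqrt(true_I / redundancy)   # theoretical precision of <I>
    print(f"redundancy {redundancy:2d}: Rmerge ~ {rmerge:.3f}, "
          f"sigma(<I>) ~ {sigma_of_mean:.1f} counts")

This is one reason why multiplicity-independent statistics such as Rmeas and Rpim (discussed in the Phil Evans paper mentioned just above) are often quoted alongside Rmerge.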
Just a short example (before I have time to write a longer message on this topic).
Suppose we measure diffraction intensities using a single-photon counter. This is more or less so-called counting statistics (in schools where physics is properly studied, it is one of the laboratory exercises that requires very simple equipment), which implies that a measurement of I counts will have a standard deviation of sqrt(I) if no background radiation exists. In the presence of a background the relation is more complicated, but for the sake of argument, backgroundless counting statistics will do.
Let us say we measure one reflection of 100 counts, standard deviation 10 counts (meaning a statistical error of 10%), with a redundancy of 12. Our unweighted Rmerge(sym) will be about 10%. Is it good, is it bad, is it the right value, is it properly related to redundancy? When would we expect an Rmerge/sym of 2%, and what happens when we have systematic errors? What would be the source of these systematic errors?
A simple question becomes a complicated one, a simple answer becomes a complicated one, and this is our life.... After we have diffracting crystals, we have to measure and process our best data! Complicated.....
BTW, my best ever data set was 0.67% (less than 1%, fractional 0.0067). It was data collected for a mixture of asparagine-aspartic acid crystals using the High Flux Neutron Source in 1983. Needless to say, we were able to distinguish between O and N and their protonation.
So a 10% Rmerge/sym can be excellent or terrible - it depends on the circumstances.
On the other hand, if we are playing the same game, 10000 counts will give (without systematic errors) about 1% Rmerge/sym. That is what we see in good measurements at low resolution, where reflections are strong.
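Continuing in the same toy-simulation spirit (again my own sketch, assuming backgroundless Poisson counting and the unweighted Rmerge formula): the statistic scales roughly as sqrt(I)/I, so 100-count reflections land near the ~10% quoted above while 10000-count reflections come out around 1% (the exact prefactor depends on how the average is taken).

# Toy sketch: unweighted Rmerge as a function of the true intensity, with a
# redundancy of 12 and pure Poisson counting noise, as in the example above.
import numpy as np

rng = np.random.default_rng(1)

def unweighted_rmerge(counts):
    mean_I = counts.mean(axis=1, keepdims=True)
    return np.abs(counts - mean_I).sum() / counts.sum()

for true_I in (100, 10_000):
    counts = rng.poisson(true_I, size=(20000, 12)).astype(float)
    print(f"I = {true_I:6d}: sqrt(I)/I = {np.sqrt(true_I) / true_I:.3f}, "
          f"Rmerge ~ {unweighted_rmerge(counts):.3f}")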
For the definition of terms, look at the books or web resources mentioned above. However, the entire issue needs to be seen through two lenses: a simple lay-person lens and an advanced-user lens.
From the simple lay-person perspective, the slogans are:
(A) On the data and crystal level (a toy calculation of some of these statistics follows this list):
1) The lower the R(merge or sym), the better (in both small molecules and macromolecules, stuff above 10% is not so good)
2) The higher the redundancy, the better (above 3 is acceptable)
3) The higher signal to noise ratio I/sigma(I), the better (above 2 in the last resolution shell is widely recommended)
4) The higher the resolution of the data (routinely we collect ~0.7Å for small molecules and ~2Å for proteins), the better (the best protein data stands at ~0.4Å)
5) The higher the completeness of the data, the better (100% is ideal)
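For the lay-person level, here is a minimal sketch of how the numbers in (A1)-(A3) and (A5) are typically computed from a table of individual observations (my own toy example with made-up values; real data-processing programs differ in detail, e.g. merged vs unmerged I/sigma(I) and per-resolution-shell binning):

# hypothetical observations: (h, k, l) -> list of (intensity, sigma) pairs
obs = {
    (1, 0, 0): [(105.0, 10.2), (98.0, 9.9), (101.0, 10.1)],
    (2, 1, 0): [(52.0, 7.4), (47.0, 7.1)],
    (3, 2, 1): [(9.0, 3.2), (12.0, 3.5), (10.0, 3.3), (8.0, 3.1)],
}
n_unique_possible = 4            # hypothetical number of unique reflections expected

num = den = 0.0
for refl in obs.values():
    mean_I = sum(I for I, s in refl) / len(refl)
    num += sum(abs(I - mean_I) for I, s in refl)      # |I_i - <I>| terms for Rmerge
    den += sum(I for I, s in refl)

n_obs = sum(len(refl) for refl in obs.values())
rmerge       = num / den                                                      # (A1)
multiplicity = n_obs / len(obs)                                               # (A2)
mean_i_sig   = sum(I / s for refl in obs.values() for I, s in refl) / n_obs   # (A3)
completeness = len(obs) / n_unique_possible                                   # (A5)

print(f"Rmerge {rmerge:.3f}  multiplicity {multiplicity:.1f}  "
      f"<I/sigma(I)> {mean_i_sig:.1f}  completeness {completeness:.0%}")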
(B) On the refinement level (a toy sketch of the R-factor and map-correlation statistics follows this list):
1) The lower the R(factor) and Rfree, the better
2) The smaller the maxima of the difference electron density maps (negative as well as positive), the better
3) The better fit to the density as measured by a direct space R (or the correlation coefficient), the better
4) The more the geometrical features conform to the standard dictionaries (regardless of whether we deal with small molecules or macromolecules), the better
5) The more complete the structure (including the solvent), the better
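And here is a minimal sketch of the quantities behind (B1) and (B3) (toy numbers only; the ~5% "model error", the ~5% test-set fraction and the grid size are arbitrary assumptions of the sketch):

import numpy as np

rng = np.random.default_rng(1)

# (B1) R = sum ||Fobs| - |Fcalc|| / sum |Fobs|; Rfree is the same formula
# evaluated on a small set of reflections excluded from the refinement.
f_obs  = rng.uniform(10, 100, size=1000)
f_calc = f_obs * (1 + 0.05 * rng.standard_normal(1000))   # hypothetical ~5% model error
free   = rng.random(1000) < 0.05                          # ~5% test set

def r_factor(fo, fc):
    return np.abs(fo - fc).sum() / fo.sum()

print("Rwork", round(r_factor(f_obs[~free], f_calc[~free]), 3),
      "Rfree", round(r_factor(f_obs[free],  f_calc[free]),  3))

# (B3) real-space correlation coefficient between an "observed" map and a
# model map sampled on the same grid.
rho_obs   = rng.standard_normal((8, 8, 8))
rho_model = rho_obs + 0.3 * rng.standard_normal((8, 8, 8))
cc = np.corrcoef(rho_obs.ravel(), rho_model.ravel())[0, 1]
print("real-space CC", round(cc, 2))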
For the advanced user, every single one of the above statements comes with a lot of caveats.
There is certainly no time and space here to discuss in depth even the most important caveats, but several examples can serve to alert moderately advanced users to the limited validity of some of the commonly used "rules of thumb".
For instance, (A1) assumes that we are dealing with a well-formed and macroscopically well-defined "single crystal". The inverted commas mean that we can face a lot of situations to which the common understanding of a single crystal does not apply. Powder diffraction from microscopic crystals must definitely border on violating this assumption, because the mathematics of all of classical crystallography is built on the assumption of infinite crystals, and powders mean that individual crystallites contain only a few (depending on the size of the molecules in question) unit cells. Assuming a 10 Å = 1 nm unit cell, a crystallite smaller than 1 micron would have fewer than 1000 unit cells along each edge. So we can safely assume that the topological identity of the lattice is destroyed after 1000 repeats. This is not such an abstract situation, because in protein crystallography a crystal 100 microns across can have fewer repeats than that. The smaller the number of unit cells and molecules, the higher the error of the measurement of the diffracted X-rays, neutrons or electrons. Additionally, Rmerge becomes a measure of symmetry ideality, and in macromolecular crystallography we know that the solvent most likely disobeys the general symmetry considerations.
For instance, (A2) is not a good indicator because it strongly depends on the crystal symmetry. Moreover, one has to assume a perfect data-collection protocol and a perfect crystal to really benefit from this recommendation. Normal crystals have defects and inhomogeneities (regardless of whether it is a small molecule, a mineral sample or a biological molecule). There are always subsections that produce better data; therefore, lower redundancy from the better sectors is by definition better.
For instance, (A3): from the statistical point of view this is an absolute recommendation. But from a practical point of view, doubling the number of observations already lowers your sigma significantly (by roughly a factor of sqrt(2) for independent measurements). So in the real world, the I/sigI threshold is an arbitrary number used as a prop for so-called "good data". In reality, every good experimenter will strive to collect the best data possible, which involves tweaking a lot of experimental parameters, including exposure time, redundancy dependent on the symmetry, integration box, profile matching, etc. etc.
For instance, (A4) is a deceitful recommendation, as with more, and more precise, data we run up against handling more and more details in our models. And models, as one of the giants said, should be simple and comprehensive, but not simpler. The same invocation also applies in the other direction: not too complex, either. The general observation I have is that more data in macromolecular crystallography leads to more disorder (not less) and less well-defined models, as the geometrical constraints are somewhat inadequate for closely spaced atoms.
For instance, (A5) appears to be a universal recommendation, but at least two caveats apply. (i) Not for all crystal symmetries and orientations is it practical to collect all the data, particularly when doing so does not lead to any additional physical or biological insight. (ii) More data takes more time, which results inexorably in sample damage, and sample damage ends up obscuring the very observations we are seeking. So be judicious in selecting the time, completeness, quality and resolution of your data to fit your goals and needs.
Very similar caveats apply to model refinement and to extracting useful results. Even a simple multiple structure determination of the same object can provide a useful learning experience about the validity of the main observations.
Boguslav, in the past (I was part of this past) we "estimated" the X-ray intensities of diffraction spots by comparing them with calibrated intensity strips.
Neighbouring intensities on the strip differed by a factor of sqrt(2) ≈ 1.4, which means we were prone to make a mistake of 40% whenever we chose the wrong neighbouring value. If we calculate the Rsym for those data (I did it much later, post mortem, with more modern programs), it was about 30%. Is that good or bad? It depends what for. We were able to refine our small structures, with anisotropic B's, to an R-factor of about 5%.
And it is clear why: the intensities span a wide dynamic range. We can normalise our intensities onto a coarser increment and replace a continuous dynamic range with 12 values, and we will still be able to refine our structure.
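To make that last point concrete, here is a toy Python sketch (my own example, assuming the calibrated strip is a ladder of 12 intensities spaced by factors of sqrt(2)): quantising a continuous range of intensities onto such a ladder changes the derived amplitudes by only a few percent on average, which is why such "eye-balled" data could still refine to a low R-factor.

import numpy as np

rng = np.random.default_rng(2)
true_I = rng.uniform(1.0, 45.0, size=5000)        # continuous intensities

ladder = np.sqrt(2.0) ** np.arange(12)            # 12 strip values: 1, 1.41, 2, ... ~45
# assign each intensity to the nearest ladder step (nearest in log space)
idx = np.abs(np.log(true_I)[:, None] - np.log(ladder)[None, :]).argmin(axis=1)
est_I = ladder[idx]

rel_I = np.abs(est_I - true_I) / true_I
rel_F = np.abs(np.sqrt(est_I) - np.sqrt(true_I)) / np.sqrt(true_I)
print(f"mean |dI|/I   ~ {rel_I.mean():.2f}")      # roughly 0.08-0.09
print(f"mean |dF|/|F| ~ {rel_F.mean():.2f}")      # roughly 0.04, cf. the ~5% R-factor above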
So calling anything larger than 10% "bad" again depends what for. Gaining precision does not always mean gaining accuracy. If our intensities are not strong, Rsym can be as high as 14% and still be excellent. We have made our way into Science and Nature with such Rsym's sometimes. :-)