Merging reflection intensities in xia2 — when a WAVELENGTH is not a wavelength
Many crystallographers who use DIALS to integrate their rotation data don’t actually interact directly with the DIALS commands.
Instead, they often use xia2 to co-ordinate the different components of the DIALS pipeline automatically.
Here, we explore the concept of WAVELENGTH in xia2 and discover that it doesn’t necessarily mean what it says.
In fact, it turns out simply to be a label denoting a group of sweeps that should be merged together.
We also learn how to manually assign sweeps of images to such groups, when we want to control how reflection intensities should be merged.
By exploring what WAVELENGTH is, we will also learn a little about the key features of xia2’s logical model of a diffraction experiment.
The structure of a project in xia2
To use the terminology of the .xinfo file, which forms the instruction list for a data processing job, xia2 organises data by PROJECT, CRYSTAL, WAVELENGTH and SWEEP.
Each xia2 run consists of a PROJECT, which has one or more CRYSTALs, each of which may have one or more WAVELENGTHs, each of which in turn may have one or more SWEEPs.
This much is detailed in the xia2 paper.
What isn’t so clear from the paper is that integrated data are scaled per CRYSTAL and merged per WAVELENGTH.
This probably won’t surprise you if you do multiple-wavelength studies like MAD, but may not be so obvious if you don’t have that background.
If your work typically involves a single sample measured at a single wavelength, perhaps with multiple sweeps, the distinction (scaling by sample but merging by X-ray wavelength) may have passed you by.
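As a rough illustration of that distinction, here is a minimal Python sketch of the hierarchy. The class and function names are hypothetical and are not taken from the xia2 code base; the sketch only shows which sweeps end up together for scaling (everything in a CRYSTAL) and for merging (everything in a WAVELENGTH).

```python
from dataclasses import dataclass, field


# Hypothetical model of the PROJECT > CRYSTAL > WAVELENGTH > SWEEP hierarchy.
# These classes are illustrative only; they are not xia2's own classes.
@dataclass
class Sweep:
    name: str


@dataclass
class Wavelength:
    name: str  # a label for a merging group, e.g. "NATIVE"
    sweeps: list = field(default_factory=list)


@dataclass
class Crystal:
    name: str
    wavelengths: list = field(default_factory=list)


@dataclass
class Project:
    name: str
    crystals: list = field(default_factory=list)


def scaling_groups(project):
    """Sweeps scaled together: every SWEEP belonging to a CRYSTAL."""
    return {
        crystal.name: [s.name for w in crystal.wavelengths for s in w.sweeps]
        for crystal in project.crystals
    }


def merging_groups(project):
    """Sweeps merged together: the SWEEPs of each WAVELENGTH in a CRYSTAL."""
    return {
        (crystal.name, w.name): [s.name for s in w.sweeps]
        for crystal in project.crystals
        for w in crystal.wavelengths
    }


project = Project("AUTOMATIC", [
    Crystal("DEFAULT", [
        Wavelength("WAVE1", [Sweep("SWEEP1"), Sweep("SWEEP2")]),
        Wavelength("WAVE2", [Sweep("SWEEP3")]),
    ]),
])

print(scaling_groups(project))
print(merging_groups(project))
```

In this example all three sweeps are scaled together because they share a CRYSTAL, but they are merged as two separate groups, one per WAVELENGTH.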
Exploiting WAVELENGTH to customise merging
There are really two different ways of running xia2:
1. The standard way.
Simply pass in data and allow xia2 (through xia2setup.py) to automatically construct an automatic.xinfo instruction list for integration and reduction of the data. The PROJECT (named AUTOMATIC, by default) will have a single CRYSTAL (named DEFAULT, by default). Each sweep of images will be assigned to a SWEEP (by default, named SWEEP1–SWEEP<n> for n sweeps). The image headers will be interrogated to ascertain the set of unique measurement wavelengths used, according to some tolerance for determining equivalence. The SWEEPs will be grouped by wavelength, each group being assigned, unsurprisingly, to a separate WAVELENGTH. By default, if there is only one WAVELENGTH, it is called NATIVE, unless you have told xia2 that the data are anomalous, in which case it is called SAD; if there are N > 1 wavelengths, they are named WAVE1–WAVE<N>. There is an implied equivalence here between the label WAVELENGTH and the physical wavelength at which all the constituent SWEEPs were measured.
2. The I-know-what-I’m-doing-get-out-of-my way.
Write or edit a .xinfo file yourself. This allows your PROJECT to have multiple CRYSTALs and also allows you to apportion SWEEPs to different WAVELENGTH labels, irrespective of the actual physical wavelength used to collect the data. For example, this means that you can retain a distinction between different groups of sweeps that you wish to merge group-by-group, rather than all together. You can retain this distinction even though all the data may have been collected using the same physical X-ray wavelength. As such, WAVELENGTH becomes a misnomer; it is really just a categorisation meaning ‘group of SWEEPs to be merged together’.
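The automatic grouping used by the standard way can be sketched roughly as follows. This is a simplified illustration, not xia2's actual implementation: the function name, the tolerance value and the first-come grouping strategy are all assumptions. Only the default labels (NATIVE, SAD, WAVE1–WAVE<N>) come from the description above.

```python
def assign_wavelength_labels(sweep_wavelengths, tolerance=1e-4, anomalous=False):
    """Group sweeps by measured wavelength and label each group.

    sweep_wavelengths: dict mapping SWEEP name -> wavelength in angstroms.
    tolerance: hypothetical equivalence threshold (not xia2's real value).
    """
    groups = []  # list of (representative wavelength, [sweep names])
    for sweep, wavelength in sweep_wavelengths.items():
        for representative, members in groups:
            if abs(wavelength - representative) <= tolerance:
                members.append(sweep)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((wavelength, [sweep]))

    if len(groups) == 1:
        names = ["SAD" if anomalous else "NATIVE"]
    else:
        names = [f"WAVE{i}" for i in range(1, len(groups) + 1)]
    return {name: members for name, (_, members) in zip(names, groups)}


labels = assign_wavelength_labels(
    {"SWEEP1": 0.9795, "SWEEP2": 0.97951, "SWEEP3": 0.9184}
)
print(labels)
```

Here SWEEP1 and SWEEP2 differ by less than the assumed tolerance and so share a WAVELENGTH, while SWEEP3 gets its own. Writing the .xinfo file by hand is what lets you override this purely physical grouping.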
Method (1) is pretty familiar if you’ve ever used xia2.
But you may not have been aware that the .xinfo file allowed for method (2).
In fact, if you knew about it at all, you might have thought the .xinfo instruction list was simply a legacy feature and was just part of the plumbing of method (1).
That’s understandable.
Beyond the description here, method (2) is neither well documented nor sufficiently well tested. In fact, at the time of writing, it is actually a bit broken for the standard all-DIALS pipeline (see xia2/xia2#560).
Method (2) affords a great deal of flexibility for data reduction in less straightforward experiments.
For example, you could specify that several sweeps of images collected from the same sample should be indexed together with the command line parameter multi_sweep_indexing=True but, by separating the SWEEPs into different WAVELENGTH groups in the .xinfo file, keep the reflection intensities in each group separate for the purposes of merging.
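To make this concrete, the following Python sketch writes out an instruction list in which two sweeps, collected at the same physical wavelength, are placed in separate WAVELENGTH groups. The helper function, the group labels GROUP_A and GROUP_B, the paths and the image names are all hypothetical, and the BEGIN/END block layout is an assumption based on the general shape of .xinfo files; compare it with a file generated by xia2 itself before relying on it.

```python
def render_xinfo(project, crystal, wavelengths, sweeps):
    """Render an .xinfo-style instruction list as text (assumed layout).

    wavelengths: dict of group label -> physical wavelength in angstroms.
    sweeps: list of dicts with keys name, wavelength (a group label),
            directory, image, start and end.
    """
    lines = [f"BEGIN PROJECT {project}", f"BEGIN CRYSTAL {crystal}", ""]
    for label, value in wavelengths.items():
        lines += [
            f"BEGIN WAVELENGTH {label}",
            f"WAVELENGTH {value:.6f}",
            f"END WAVELENGTH {label}",
            "",
        ]
    for s in sweeps:
        if s["wavelength"] not in wavelengths:
            raise ValueError(f"unknown WAVELENGTH label: {s['wavelength']}")
        lines += [
            f"BEGIN SWEEP {s['name']}",
            f"WAVELENGTH {s['wavelength']}",  # the group label, not a number
            f"DIRECTORY {s['directory']}",
            f"IMAGE {s['image']}",
            f"START_END {s['start']} {s['end']}",
            f"END SWEEP {s['name']}",
            "",
        ]
    lines += [f"END CRYSTAL {crystal}", f"END PROJECT {project}"]
    return "\n".join(lines)


# Two merging groups that share one physical wavelength (hypothetical data).
text = render_xinfo(
    "AUTOMATIC",
    "DEFAULT",
    {"GROUP_A": 0.9795, "GROUP_B": 0.9795},
    [
        {"name": "SWEEP1", "wavelength": "GROUP_A", "directory": "/data/a",
         "image": "a_00001.cbf", "start": 1, "end": 900},
        {"name": "SWEEP2", "wavelength": "GROUP_B", "directory": "/data/b",
         "image": "b_00001.cbf", "start": 1, "end": 900},
    ],
)
print(text)
```

Both sweeps belong to the same CRYSTAL, so they would still be scaled together, but each would be merged within its own group.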
There is a tool, xia2.setup, which imports the image data and constructs a .xinfo file from normal xia2 input without performing any processing.
Surely that’s quite complicated enough?
As an aside, just to confuse the terminology further, CCP4 uses a logical model of a diffraction experiment with a slightly simpler hierarchy, PROJECT▸CRYSTAL▸DATASET, where DATASET combines the concepts of WAVELENGTH and SWEEP.
In order to support the CCP4 scaling and merging tools in xia2, this model creeps in from time to time, such as here:
https://github.com/xia2/xia2/blob/c8e4b8fc6fd6f70b881f3ac55c85d71d0994b334/src/xia2/Schema/Interfaces/Integrater.py#L357-L364