This documentation page refers to a previous release of DIALS (2.2).

# dials.cosym

## Introduction

This program implements the methods of Gildea, R. J. & Winter, G. (2018). Acta Cryst. D74, 405-410 for determination of Patterson group symmetry from sparse multi-crystal data sets in the presence of an indexing ambiguity.

The program takes as input a set of integrated experiments and reflections, either as one file per experiment or with all experiments combined in a single models.expt and observations.refl file. It analyses the symmetry elements present in the datasets and, where necessary, reindexes the experiments and reflections so that all output is indexed consistently.

Examples:

dials.cosym models.expt observations.refl

dials.cosym models.expt observations.refl space_group=I23

dials.cosym models.expt observations.refl space_group=I23 lattice_group=I23
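When the command is driven from a script, the same argument list can be assembled programmatically. The following is a minimal sketch, not part of DIALS: the helper name `build_cosym_command` and the double-underscore convention for nested PHIL scopes are illustrative assumptions. It relies only on the `name=value` override syntax shown in the examples above.

```python
import shlex


def build_cosym_command(experiments, reflections, **phil_overrides):
    """Assemble a dials.cosym argument list (hypothetical helper, not a DIALS API)."""
    command = ["dials.cosym", *experiments, *reflections]
    for name, value in phil_overrides.items():
        # Double underscores stand in for the dots of nested PHIL scopes,
        # because dots are not valid in Python keyword argument names.
        command.append(f"{name.replace('__', '.')}={value}")
    return command


command = build_cosym_command(
    ["models.expt"],
    ["observations.refl"],
    space_group="I23",
    lattice_group="I23",
)
# Print the command as a single shell-quoted line.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where DIALS is installed.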


## Basic parameters

partiality_threshold = 0.99
unit_cell_clustering {
  threshold = 5000
  log = False
}
normalisation = kernel quasi *ml_iso ml_aniso
d_min = Auto
min_i_mean_over_sigma_mean = 4
min_cc_half = 0.6
lattice_group = None
space_group = None
lattice_symmetry_max_delta = 5.0
dimensions = Auto
use_curvatures = True
weights = count standard_error
min_pairs = 3
termination_params {
  max_iterations = 100
  max_calls = None
  drop_convergence_test_n_test_points = 5
  drop_convergence_test_max_drop_eps = 1.e-5
  drop_convergence_test_iteration_coefficient = 2
}
cluster {
  method = dbscan bisect minimize_divide agglomerative *seed
  n_clusters = auto
  dbscan {
    eps = 0.5
    min_samples = 5
  }
  bisect {
    axis = 0
  }
  seed {
    min_silhouette_score = 0.2
  }
}
nproc = 1
relative_length_tolerance = 0.05
absolute_angle_tolerance = 2
min_reflections = 10
seed = 230
output {
  suffix = "_reindexed"
  log = dials.cosym.log
  experiments = "symmetrized.expt"
  reflections = "symmetrized.refl"
  json = dials.cosym.json
  html = dials.cosym.html
}
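In the listing above, parameters such as `normalisation` and `cluster.method` are PHIL choices: all allowed values are listed on one line and an asterisk marks the value selected by default. The short sketch below illustrates how such a line can be read; `parse_choice` is a hypothetical helper written for this page, not part of DIALS or PHIL.

```python
def parse_choice(line):
    """Split a PHIL choice line into (name, options, selected).

    Illustrative helper only: it handles the simple one-line form
    shown in the parameter listing above.
    """
    name, _, rhs = line.partition("=")
    words = rhs.split()
    # Every word is an allowed option; a leading '*' flags the selected one.
    options = [word.lstrip("*") for word in words]
    selected = [word[1:] for word in words if word.startswith("*")]
    return name.strip(), options, selected


print(parse_choice("normalisation = kernel quasi *ml_iso ml_aniso"))
print(parse_choice("weights = count standard_error"))
```

A choice with no asterisk, such as `weights` above, has no value selected by default.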


## Example

Run dials.cosym, providing the integrated experiment (.expt) and reflection (.refl) files as input:

dials.cosym experiments_0.expt experiments_1.expt experiments_2.expt experiments_3.expt reflections_0.refl reflections_1.refl reflections_2.refl reflections_3.refl


The first step is to analyse the metric symmetry of the input unit cells, and perform hierarchical unit cell clustering to identify any outlier datasets that aren’t consistent with the rest of the datasets. The largest common cluster is carried forward in the analysis. You can modify the threshold that is used for determining outliers by setting the unit_cell_clustering.threshold parameter.

Hierarchical clustering of unit cells
Using Andrews-Bernstein distance from Andrews & Bernstein J Appl Cryst 47:346 (2014)
Distances have been calculated
Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)

0 singletons:

Point group    a           b           c          alpha        beta         gamma

1 cluster:

Cluster_id       N_xtals  Med_a         Med_b         Med_c         Med_alpha    Med_beta     Med_gamma   Delta(deg)
4 in P422.
cluster_1        4        68.36 (0.01 ) 68.36 (0.01 ) 103.95(0.02 ) 90.00 (0.00) 90.00 (0.00) 90.00 (0.00)
P 4/m m m (No. 123)  68.36         68.36         103.95        90.00        90.00        90.00         0.0

Standard deviations are in brackets.
Each cluster:
Input lattice count, with integration Bravais setting space group.
Cluster median with Niggli cell parameters (std dev in brackets).
Highest possible metric symmetry and unit cell using LePage (J Appl Cryst 1982, 15:255) method, maximum delta 3deg.


In this case, the unit cell analysis found 1 cluster of 4 datasets in $$P\,4\,2\,2$$. As a result, all datasets will be carried forward for symmetry analysis.
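The idea of clustering unit cells and carrying forward the largest cluster can be illustrated with a small, self-contained sketch. It is not the DIALS implementation: it assumes a plain Euclidean distance on the six cell parameters as a stand-in for the Andrews-Bernstein distance, uses single-linkage grouping at a fixed threshold, and the fifth cell is a hypothetical outlier added for illustration.

```python
from itertools import combinations


def cell_distance(a, b):
    """Euclidean distance between two 6-parameter cells (stand-in metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage_clusters(cells, threshold):
    """Group cells linked by chains of pairwise distances below threshold."""
    parent = list(range(len(cells)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cells)), 2):
        if cell_distance(cells[i], cells[j]) < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(cells)):
        clusters.setdefault(find(i), []).append(i)
    # Largest cluster first.
    return sorted(clusters.values(), key=len, reverse=True)


cells = [
    (68.36, 68.36, 103.95, 90, 90, 90),
    (68.35, 68.37, 103.97, 90, 90, 90),
    (68.37, 68.36, 103.93, 90, 90, 90),
    (68.36, 68.35, 103.95, 90, 90, 90),
    (75.10, 75.10, 110.20, 90, 90, 90),  # hypothetical outlier
]
clusters = single_linkage_clusters(cells, threshold=1.0)
largest = clusters[0]
print("Datasets carried forward:", largest)
```

The threshold here is in the units of the stand-in distance and is not comparable to the value of unit_cell_clustering.threshold.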

Each dataset is then normalised using maximum likelihood isotropic Wilson scaling, reporting an estimated Wilson $$B$$ value and scale factor for each dataset:

Normalising intensities for dataset 1

ML estimate of overall B value:
13.46 A**2
ML estimate of  -log of scale factor:
-3.04

--------------------------------------------------------------------------------

Normalising intensities for dataset 2

ML estimate of overall B value:
11.06 A**2
ML estimate of  -log of scale factor:
-3.50

--------------------------------------------------------------------------------

Normalising intensities for dataset 3

ML estimate of overall B value:
11.45 A**2
ML estimate of  -log of scale factor:
-2.96

--------------------------------------------------------------------------------

Normalising intensities for dataset 4

ML estimate of overall B value:
12.14 A**2
ML estimate of  -log of scale factor:
-2.67
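For orientation, the standard isotropic Wilson model that underlies this kind of normalisation can be written as follows. This is the textbook form, given as a sketch: the symbols $$k$$ (scale factor), $$B$$ (overall B value), $$f_j$$ (atomic scattering factors) and $$d$$ (resolution) are notation chosen here, and the exact parameterisation and sign conventions used by the program may differ.

$$\langle I(d) \rangle \approx k \, \exp\!\left(-\frac{B}{2 d^{2}}\right) \sum_j f_j^{2}(d), \qquad E^{2} = \frac{I}{\langle I(d) \rangle}$$

Here $$\exp(-B / 2d^{2})$$ is the usual intensity fall-off $$\exp(-2B \sin^{2}\theta / \lambda^{2})$$ rewritten using $$\sin\theta / \lambda = 1 / (2d)$$, and $$E^{2}$$ is the normalised intensity.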


A high resolution cutoff is then determined by analysis of CC½ and <I>/<σ(I)> as a function of resolution:

Estimation of resolution for Laue group analysis

Resolution estimate from <I>/<σ(I)> > 4.0 : 2.10
Resolution estimate from CC½ > 0.60: 1.80
High resolution limit set to: 1.80
Selecting 188689 reflections with d > 1.80
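The way the two criteria combine can be sketched as follows. This is a simplified illustration rather than the DIALS algorithm: it assumes per-bin statistics are already available, walks the bins from low to high resolution without any curve fitting, and takes the smaller of the two d values, which is consistent with the log above (2.10 and 1.80 giving 1.80). The bin values are hypothetical.

```python
def resolution_estimate(bins, key, minimum):
    """Return d_min of the last bin before the statistic drops to `minimum`.

    `bins` must be ordered from low resolution (large d) to high
    resolution (small d). Returns None if even the first bin fails.
    """
    estimate = None
    for resolution_bin in bins:
        if resolution_bin[key] <= minimum:
            break
        estimate = resolution_bin["d_min"]
    return estimate


# Hypothetical per-bin statistics, for illustration only.
bins = [
    {"d_min": 3.0, "i_over_sigma": 20.0, "cc_half": 0.99},
    {"d_min": 2.4, "i_over_sigma": 9.0, "cc_half": 0.95},
    {"d_min": 2.1, "i_over_sigma": 4.5, "cc_half": 0.85},
    {"d_min": 1.9, "i_over_sigma": 2.5, "cc_half": 0.70},
    {"d_min": 1.8, "i_over_sigma": 1.5, "cc_half": 0.62},
    {"d_min": 1.7, "i_over_sigma": 0.8, "cc_half": 0.40},
]

estimates = [
    resolution_estimate(bins, "i_over_sigma", 4.0),  # min_i_mean_over_sigma_mean
    resolution_estimate(bins, "cc_half", 0.6),  # min_cc_half
]
# Keep the estimate that extends to the highest resolution (smallest d).
d_min = min(e for e in estimates if e is not None)
print("High resolution limit:", d_min)
```

Setting d_min explicitly on the command line bypasses this estimate.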


Next, the program automatically determines the number of dimensions to use for the analysis. It calculates the functional of equation 2 of Gildea, R. J. & Winter, G. (2018) for each number of dimensions up to the number of symmetry operations in the lattice group (1 to 8 in the table below). The analysis needs enough dimensions to separate any indexing ambiguities that may be present, but using too many dimensions reduces the sensitivity of the procedure. In this case, 6 dimensions are selected for the analysis:

Automatic determination of number of dimensions for analysis
+--------------+--------------+
|   Dimensions |   Functional |
|--------------+--------------|
|            1 |      14.5345 |
|            2 |      15.6716 |
|            3 |      14.608  |
|            4 |      15.0478 |
|            5 |      15.2777 |
|            6 |      14.7404 |
|            7 |      14.8183 |
|            8 |      14.8374 |
+--------------+--------------+
Best number of dimensions: 6


Once the analysis has been performed in the chosen number of dimensions, the results are used to score every possible symmetry element with algorithms similar to those of POINTLESS. The scoring relies on the fact that the angles between vectors represent genuine systematic differences between datasets, as made clear by equation 5 of Diederichs, K. (2017).

Scoring individual symmetry elements
+--------------+--------+------+-----+-----------------+
|   likelihood |   Z-CC |   CC |     | Operator        |
|--------------+--------+------+-----+-----------------|
|        0.947 |   9.93 | 0.99 | *** | 2 |(1, 1, 0)    |
|        0.946 |   9.92 | 0.99 | *** | 2 |(-1, 1, 0)   |
|        0.947 |   9.93 | 0.99 | *** | 2 |(0, 1, 0)    |
|        0.946 |   9.91 | 0.99 | *** | 4^-1 |(0, 0, 1) |
|        0.947 |   9.93 | 0.99 | *** | 4 |(0, 0, 1)    |
|        0.946 |   9.92 | 0.99 | *** | 2 |(0, 0, 1)    |
|        0.946 |   9.92 | 0.99 | *** | 2 |(1, 0, 0)    |
+--------------+--------+------+-----+-----------------+
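The role of the angle between vectors can be illustrated with a short sketch. It is only a geometric illustration under stated assumptions, not the scoring code: each dataset (or symmetry-related copy of a dataset) is assumed to be represented by a coordinate vector, and the example coordinates are hypothetical. A cosine close to 1 indicates vectors pointing the same way, that is, no systematic difference, which is what is expected when an operator is a true symmetry element.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two coordinate vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2D coordinates for a dataset and two transformed copies.
original = (0.9, 0.1)
copy_consistent = (0.85, 0.15)  # nearly parallel: small systematic difference
copy_inconsistent = (0.1, 0.9)  # large angle: systematic difference

print("consistent copy:", cosine(original, copy_consistent))
print("inconsistent copy:", cosine(original, copy_inconsistent))
```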


Scores for the possible Laue groups are obtained by analysing the scores for the symmetry elements that are present or absent from each group, and the groups are ranked by their likelihood.

Scoring all possible sub-groups
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
| Patterson group   |     |   Likelihood |   NetZcc |   Zcc+ |   Zcc- |   delta | Reindex operator   |
|-------------------+-----+--------------+----------+--------+--------+---------+--------------------|
| P 4/m m m         | *** |            1 |     9.92 |   9.92 |   0    |       0 | -a,b,-c            |
| P m m m           |     |            0 |     0    |   9.93 |   9.92 |       0 | -a,b,-c            |
| C m m m           |     |            0 |     0    |   9.92 |   9.92 |       0 | a-b,a+b,c          |
| P 4/m             |     |            0 |    -0.01 |   9.92 |   9.93 |       0 | -a,b,-c            |
| C 1 2/m 1         |     |            0 |     0.01 |   9.93 |   9.92 |       0 | a-b,a+b,c          |
| P 1 2/m 1         |     |            0 |     0.01 |   9.93 |   9.92 |       0 | -a,b,-c            |
| P 1 2/m 1         |     |            0 |    -0    |   9.92 |   9.92 |       0 | -b,-a,-c           |
| P 1 2/m 1         |     |            0 |    -0    |   9.92 |   9.93 |       0 | -a,-c,-b           |
| C 1 2/m 1         |     |            0 |    -0.01 |   9.92 |   9.93 |       0 | a+b,-a+b,c         |
| P -1              |     |            0 |    -9.92 |   0    |   9.92 |       0 | -a,b,-c            |
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
Best solution: P 4/m m m
Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)
Reindex operator: -a,b,-c
Laue group probability: 1.000
Laue group confidence: 1.000
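The Zcc+, Zcc- and NetZcc columns can be reproduced approximately from the per-element Z-CC values. The sketch below assumes, following the POINTLESS convention, that Zcc+ is the mean Z-CC of the symmetry elements present in a group, Zcc- is the mean for the elements absent from it, and NetZcc is their difference. This assumption is consistent with the tables above to within rounding but is not taken from the DIALS source; `score_subgroup` is a hypothetical helper.

```python
# Z-CC values copied from the "Scoring individual symmetry elements" table.
z_cc = {
    "2 |(1, 1, 0)": 9.93,
    "2 |(-1, 1, 0)": 9.92,
    "2 |(0, 1, 0)": 9.93,
    "4^-1 |(0, 0, 1)": 9.91,
    "4 |(0, 0, 1)": 9.93,
    "2 |(0, 0, 1)": 9.92,
    "2 |(1, 0, 0)": 9.92,
}


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def score_subgroup(z_cc, present):
    """Return (Zcc+, Zcc-, NetZcc) for the operators in `present`."""
    zcc_plus = mean([z for op, z in z_cc.items() if op in present])
    zcc_minus = mean([z for op, z in z_cc.items() if op not in present])
    return zcc_plus, zcc_minus, zcc_plus - zcc_minus


# P 4/m m m contains every operator; P -1 contains none of them.
p4mmm = score_subgroup(z_cc, present=set(z_cc))
p1bar = score_subgroup(z_cc, present=set())
print("P 4/m m m:", [round(x, 2) for x in p4mmm])
print("P -1:", [round(x, 2) for x in p1bar])
```

A large positive NetZcc therefore means that the elements in the group are well supported and few supported elements are left out, which is why P 4/m m m ranks first here.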


The program then concludes by reporting any reindexing operations that are necessary to ensure consistent indexing between datasets. In this case, no indexing ambiguity is present, so the reindexing operator is simply the identity operator for all datasets.

Reindexing operators:
x,y,z
[0, 1, 2, 3]


The reindexed experiments and reflections are then saved to file, along with an HTML report and a JSON file:

Writing html report to: dials.cosym.html
Writing json to: dials.cosym.json
Saving reindexed experiments to symmetrized.expt
Saving reindexed reflections to symmetrized.refl


The full log file is reproduced below:

DIALS 2.2.9-g061426e04-release
The following parameters have been modified:

input {
experiments = experiments_0.expt
experiments = experiments_1.expt
experiments = experiments_2.expt
experiments = experiments_3.expt
reflections = reflections_0.refl
reflections = reflections_1.refl
reflections = reflections_2.refl
reflections = reflections_3.refl
}

Hierarchical clustering of unit cells
Using Andrews-Bernstein distance from Andrews & Bernstein J Appl Cryst 47:346 (2014)
Distances have been calculated
Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)

0 singletons:

Point group    a           b           c          alpha        beta         gamma

1 cluster:

Cluster_id       N_xtals  Med_a         Med_b         Med_c         Med_alpha    Med_beta     Med_gamma   Delta(deg)
4 in P422.
cluster_1        4        68.36 (0.01 ) 68.36 (0.01 ) 103.95(0.02 ) 90.00 (0.00) 90.00 (0.00) 90.00 (0.00)
P 4/m m m (No. 123)  68.36         68.36         103.95        90.00        90.00        90.00         0.0

Standard deviations are in brackets.
Each cluster:
Input lattice count, with integration Bravais setting space group.
Cluster median with Niggli cell parameters (std dev in brackets).
Highest possible metric symmetry and unit cell using LePage (J Appl Cryst 1982, 15:255) method, maximum delta 3deg.
Filtering reflections for dataset 0
Selected 54367 reflections integrated by profile and summation methods
Combined 1127 partial reflections with other partial reflections
Removed 491 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 12 intensity.prf.value reflections with I/Sig(I) < -5
Filtering reflections for dataset 1
Selected 54845 reflections integrated by profile and summation methods
Combined 1284 partial reflections with other partial reflections
Removed 554 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 10 intensity.prf.value reflections with I/Sig(I) < -5
Filtering reflections for dataset 2
Selected 54461 reflections integrated by profile and summation methods
Combined 1404 partial reflections with other partial reflections
Removed 541 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 7 intensity.prf.value reflections with I/Sig(I) < -5
Filtering reflections for dataset 3
Selected 53877 reflections integrated by profile and summation methods
Combined 1062 partial reflections with other partial reflections
Removed 514 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 3 intensity.prf.value reflections with I/Sig(I) < -5
Patterson group: P 4/m m m

--------------------------------------------------------------------------------

Normalising intensities for dataset 1

ML estimate of overall B value:
13.46 A**2
ML estimate of  -log of scale factor:
-3.04

--------------------------------------------------------------------------------

Normalising intensities for dataset 2

ML estimate of overall B value:
11.06 A**2
ML estimate of  -log of scale factor:
-3.50

--------------------------------------------------------------------------------

Normalising intensities for dataset 3

ML estimate of overall B value:
11.45 A**2
ML estimate of  -log of scale factor:
-2.96

--------------------------------------------------------------------------------

Normalising intensities for dataset 4

ML estimate of overall B value:
12.14 A**2
ML estimate of  -log of scale factor:
-2.67

--------------------------------------------------------------------------------

Estimation of resolution for Laue group analysis

Resolution estimate from <I>/<σ(I)> > 4.0 : 2.10
Resolution estimate from CC½ > 0.60: 1.80
High resolution limit set to: 1.80
Selecting 188689 reflections with d > 1.80
================================================================================

Automatic determination of number of dimensions for analysis
+--------------+--------------+
|   Dimensions |   Functional |
|--------------+--------------|
|            1 |      14.5345 |
|            2 |      15.6716 |
|            3 |      14.608  |
|            4 |      15.0478 |
|            5 |      15.2777 |
|            6 |      14.7404 |
|            7 |      14.8183 |
|            8 |      14.8374 |
+--------------+--------------+
Best number of dimensions: 6
Using 6 dimensions for analysis
Principal component analysis:
Explained variance: 0.0022, 0.0016, 0.0013, 0.0011, 0.00094, 7.3e-05
Explained variance ratio: 0.31, 0.22, 0.18, 0.15, 0.13, 0.01
Scoring individual symmetry elements
+--------------+--------+------+-----+-----------------+
|   likelihood |   Z-CC |   CC |     | Operator        |
|--------------+--------+------+-----+-----------------|
|        0.947 |   9.93 | 0.99 | *** | 2 |(1, 1, 0)    |
|        0.946 |   9.92 | 0.99 | *** | 2 |(-1, 1, 0)   |
|        0.947 |   9.93 | 0.99 | *** | 2 |(0, 1, 0)    |
|        0.946 |   9.91 | 0.99 | *** | 4^-1 |(0, 0, 1) |
|        0.947 |   9.93 | 0.99 | *** | 4 |(0, 0, 1)    |
|        0.946 |   9.92 | 0.99 | *** | 2 |(0, 0, 1)    |
|        0.946 |   9.92 | 0.99 | *** | 2 |(1, 0, 0)    |
+--------------+--------+------+-----+-----------------+
Scoring all possible sub-groups
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
| Patterson group   |     |   Likelihood |   NetZcc |   Zcc+ |   Zcc- |   delta | Reindex operator   |
|-------------------+-----+--------------+----------+--------+--------+---------+--------------------|
| P 4/m m m         | *** |            1 |     9.92 |   9.92 |   0    |       0 | -a,b,-c            |
| P m m m           |     |            0 |     0    |   9.93 |   9.92 |       0 | -a,b,-c            |
| C m m m           |     |            0 |     0    |   9.92 |   9.92 |       0 | a-b,a+b,c          |
| P 4/m             |     |            0 |    -0.01 |   9.92 |   9.93 |       0 | -a,b,-c            |
| C 1 2/m 1         |     |            0 |     0.01 |   9.93 |   9.92 |       0 | a-b,a+b,c          |
| P 1 2/m 1         |     |            0 |     0.01 |   9.93 |   9.92 |       0 | -a,b,-c            |
| P 1 2/m 1         |     |            0 |    -0    |   9.92 |   9.92 |       0 | -b,-a,-c           |
| P 1 2/m 1         |     |            0 |    -0    |   9.92 |   9.93 |       0 | -a,-c,-b           |
| C 1 2/m 1         |     |            0 |    -0.01 |   9.92 |   9.93 |       0 | a+b,-a+b,c         |
| P -1              |     |            0 |    -9.92 |   0    |   9.92 |       0 | -a,b,-c            |
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
Best solution: P 4/m m m
Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)
Reindex operator: -a,b,-c
Laue group probability: 1.000
Laue group confidence: 1.000
Space groups:
P 4 2 2
[0, 1, 2, 3]
Reindexing operators:
x,y,z
[0, 1, 2, 3]
Writing html report to: dials.cosym.html
Writing json to: dials.cosym.json
Saving reindexed experiments to symmetrized.expt
Saving reindexed reflections to symmetrized.refl


## Full parameter definitions

partiality_threshold = 0.99
  .help = "Use reflections with a partiality above the threshold."
  .type = float(allow_none=True)
unit_cell_clustering {
  threshold = 5000
    .help = "Threshold value for the clustering"
    .type = float(value_min=0, allow_none=True)
  log = False
    .help = "Display the dendrogram with a log scale"
    .type = bool
}
normalisation = kernel quasi *ml_iso ml_aniso
  .type = choice
d_min = Auto
  .type = float(value_min=0, allow_none=True)
min_i_mean_over_sigma_mean = 4
  .type = float(value_min=0, allow_none=True)
min_cc_half = 0.6
  .type = float(value_min=0, value_max=1, allow_none=True)
lattice_group = None
  .type = space_group
space_group = None
  .type = space_group
lattice_symmetry_max_delta = 5.0
  .type = float(value_min=0, allow_none=True)
dimensions = Auto
  .type = int(value_min=2, allow_none=True)
use_curvatures = True
  .type = bool
weights = count standard_error
  .type = choice
min_pairs = 3
  .help = "Minimum number of pairs for inclusion of correlation coefficient in"
          "calculation of Rij matrix."
  .type = int(value_min=1, allow_none=True)
termination_params {
  max_iterations = 100
    .type = int(value_min=0, allow_none=True)
  max_calls = None
    .type = int(value_min=0, allow_none=True)
    .type = bool
    .type = float(allow_none=True)
  drop_convergence_test_n_test_points = 5
    .type = int(value_min=2, allow_none=True)
  drop_convergence_test_max_drop_eps = 1.e-5
    .type = float(value_min=0, allow_none=True)
  drop_convergence_test_iteration_coefficient = 2
    .type = float(value_min=1, allow_none=True)
}
cluster {
  method = dbscan bisect minimize_divide agglomerative *seed
    .type = choice
  n_clusters = auto
    .type = int(value_min=1, allow_none=True)
  dbscan {
    eps = 0.5
      .type = float(value_min=0, allow_none=True)
    min_samples = 5
      .type = int(value_min=1, allow_none=True)
  }
  bisect {
    axis = 0
      .type = int(value_min=0, allow_none=True)
  }
  seed {
    min_silhouette_score = 0.2
      .type = float(value_min=-1, value_max=1, allow_none=True)
  }
}
nproc = 1
  .help = "The number of processes to use."
  .type = int(value_min=1, allow_none=True)
relative_length_tolerance = 0.05
  .type = float(value_min=0, allow_none=True)
absolute_angle_tolerance = 2
  .type = float(value_min=0, allow_none=True)
min_reflections = 10
  .help = "The minimum number of reflections per experiment."
  .type = int(value_min=1, allow_none=True)
seed = 230
  .type = int(value_min=0, allow_none=True)
output {
  suffix = "_reindexed"
    .type = str
  log = dials.cosym.log
    .type = str
  experiments = "symmetrized.expt"
    .type = path
  reflections = "symmetrized.refl"
    .type = path
  json = dials.cosym.json
    .type = path
  html = dials.cosym.html
    .type = path
}