dials.cosym

Introduction

This program implements the methods of Gildea, R. J. & Winter, G. (2018). Acta Cryst. D74, 405-410 for determination of Patterson group symmetry from sparse multi-crystal data sets in the presence of an indexing ambiguity.

The program takes as input a set of integrated experiments and reflections, either as one file per experiment or with all experiments combined in a single models.expt and observations.refl file. It analyses the symmetry elements present in the datasets and, if necessary, reindexes the experiments and reflections so that all output experiments and reflections are indexed consistently.

Examples:

dials.cosym models.expt observations.refl

dials.cosym models.expt observations.refl space_group=I23

dials.cosym models.expt observations.refl space_group=I23 lattice_group=I23

Basic parameters

partiality_threshold = 0.99
unit_cell_clustering {
  threshold = 5000
  log = False
}
normalisation = kernel quasi *ml_iso ml_aniso
d_min = Auto
min_i_mean_over_sigma_mean = 4
min_cc_half = 0.6
lattice_group = None
space_group = None
lattice_symmetry_max_delta = 5.0
best_monoclinic_beta = True
dimensions = Auto
use_curvatures = True
weights = count standard_error
min_pairs = 3
termination_params {
  max_iterations = 100
  max_calls = None
  traditional_convergence_test = True
  traditional_convergence_test_eps = 1
  drop_convergence_test_n_test_points = 5
  drop_convergence_test_max_drop_eps = 1.e-5
  drop_convergence_test_iteration_coefficient = 2
}
cluster {
  method = dbscan bisect minimize_divide agglomerative *seed
  n_clusters = auto
  dbscan {
    eps = 0.5
    min_samples = 5
  }
  bisect {
    axis = 0
  }
  seed {
    min_silhouette_score = 0.2
  }
}
nproc = 1
relative_length_tolerance = 0.05
absolute_angle_tolerance = 2
min_reflections = 10
seed = 230
output {
  suffix = "_reindexed"
  log = dials.cosym.log
  experiments = "symmetrized.expt"
  reflections = "symmetrized.refl"
  json = dials.cosym.json
  html = dials.cosym.html
}

Example

Run dials.cosym, providing the integrated experiment (.expt) and reflection (.refl) files as input:

dials.cosym experiments_0.expt experiments_1.expt experiments_2.expt experiments_3.expt reflections_0.refl reflections_1.refl reflections_2.refl reflections_3.refl
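
With many datasets, typing the file list by hand becomes error-prone. The following is a minimal sketch, not part of DIALS, of assembling the same command line in Python. The helper name cosym_command is hypothetical, the file names follow the experiments_N.expt / reflections_N.refl pattern used above, and extra parameters use the name=value syntax shown in the earlier examples.

```python
def cosym_command(n_datasets, params=None):
    """Build a dials.cosym argument list for one-file-per-experiment input.

    Hypothetical helper for illustration; file names follow the
    experiments_N.expt / reflections_N.refl pattern used in this example.
    """
    args = ["dials.cosym"]
    args += [f"experiments_{i}.expt" for i in range(n_datasets)]
    args += [f"reflections_{i}.refl" for i in range(n_datasets)]
    # Optional parameter overrides are passed as name=value strings.
    for name, value in (params or {}).items():
        args.append(f"{name}={value}")
    return args


command = cosym_command(4, {"space_group": "P422"})
print(" ".join(command))
```

If DIALS is installed, the resulting list can be handed to subprocess.run(command, check=True).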

The first step is to analyse the metric symmetry of the input unit cells, and perform hierarchical unit cell clustering to identify any outlier datasets that aren’t consistent with the rest of the datasets. The largest common cluster is carried forward in the analysis. You can modify the threshold that is used for determining outliers by setting the unit_cell_clustering.threshold parameter.

Hierarchical clustering of unit cells
Using Andrews-Bernstein distance from Andrews & Bernstein J Appl Cryst 47:346 (2014)
Distances have been calculated
                       Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)

0 singletons:

Point group    a           b           c          alpha        beta         gamma       

1 cluster:

Cluster_id       N_xtals  Med_a         Med_b         Med_c         Med_alpha    Med_beta     Med_gamma   Delta(deg)
4 in P422.
cluster_1        4        68.36 (0.01 ) 68.36 (0.01 ) 103.95(0.02 ) 90.00 (0.00) 90.00 (0.00) 90.00 (0.00)
     P 4/m m m (No. 123)  68.36         68.36         103.95        90.00        90.00        90.00         0.0   


Standard deviations are in brackets.
Each cluster:
Input lattice count, with integration Bravais setting space group.
Cluster median with Niggli cell parameters (std dev in brackets).
Highest possible metric symmetry and unit cell using LePage (J Appl Cryst 1982, 15:255) method, maximum delta 3deg.

In this case, the unit cell analysis found 1 cluster of 4 datasets in \(P\,4\,2\,2\). As a result, all datasets will be carried forward for symmetry analysis.
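
The idea of keeping only the largest cluster can be illustrated with a small sketch. It is not the DIALS implementation: it assumes a plain Euclidean distance between cell parameters as a stand-in for the Andrews-Bernstein distance, groups cells by single linkage, and uses a made-up threshold that is unrelated to the scale of unit_cell_clustering.threshold. The fifth cell is a hypothetical outlier added for illustration.

```python
import math


def largest_cluster(cells, threshold):
    """Return indices of the largest single-linkage cluster.

    Assumption: Euclidean distance on (a, b, c, alpha, beta, gamma) stands in
    for the Andrews-Bernstein distance used by the real program.
    """
    parent = list(range(len(cells)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of cells closer than the threshold.
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if math.dist(cells[i], cells[j]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(cells)):
        clusters.setdefault(find(i), []).append(i)
    return max(clusters.values(), key=len)


# Four cells close to the median cell above, plus one hypothetical outlier.
cells = [
    (68.36, 68.36, 103.95, 90, 90, 90),
    (68.35, 68.35, 103.93, 90, 90, 90),
    (68.37, 68.37, 103.97, 90, 90, 90),
    (68.36, 68.36, 103.96, 90, 90, 90),
    (78.00, 78.00, 37.00, 90, 90, 90),
]
kept = largest_cluster(cells, threshold=1.0)
print(kept)
```

Here the first four cells form one cluster and the outlier is left as a singleton, so only indices 0 to 3 would be carried forward.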

Each dataset is then normalised using maximum likelihood isotropic Wilson scaling, reporting an estimated Wilson \(B\) value and scale factor for each dataset:

Normalising intensities for dataset 1

ML estimate of overall B value:
   13.46 A**2
ML estimate of  -log of scale factor:
  -3.04

--------------------------------------------------------------------------------

Normalising intensities for dataset 2

ML estimate of overall B value:
   11.06 A**2
ML estimate of  -log of scale factor:
  -3.50

--------------------------------------------------------------------------------

Normalising intensities for dataset 3

ML estimate of overall B value:
   11.45 A**2
ML estimate of  -log of scale factor:
  -2.96

--------------------------------------------------------------------------------

Normalising intensities for dataset 4

ML estimate of overall B value:
   12.14 A**2
ML estimate of  -log of scale factor:
  -2.67

A high-resolution cutoff is then determined by analysis of CC½ and <I>/<σ(I)> as a function of resolution:

Estimation of resolution for Laue group analysis

Resolution estimate from <I>/<σ(I)> > 4.0 : 2.12
Resolution estimate from CC½ > 0.60: 1.83
High resolution limit set to: 1.83
Selecting 188527 reflections with d > 1.83
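
The way the two estimates combine in the log above can be sketched as follows. This assumes that the final limit is the smallest d-spacing among the available estimates, which is consistent with the values reported here but is not confirmed by this page. The list of d-spacings is hypothetical and only illustrates the d > d_min selection.

```python
# Resolution estimates reported in the log above (in Angstrom).
estimates = {"i_mean_over_sigma_mean": 2.12, "cc_half": 1.83}

# Assumption: the smallest d-spacing among the estimates becomes the limit.
d_min = min(estimates.values())

# Hypothetical d-spacings for a handful of reflections (illustrative only).
d_spacings = [3.50, 2.40, 2.00, 1.85, 1.80, 1.70]

# Keep reflections with d-spacing strictly greater than the limit.
selected = [d for d in d_spacings if d > d_min]
print(d_min, selected)
```

Setting d_min explicitly on the command line would bypass this automatic estimate.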

Next, the program performs automatic determination of the number of dimensions for analysis. This calculates the functional of equation 2 of Gildea, R. J. & Winter, G. (2018) for each dimension from 2 up to the number of symmetry operations in the lattice group. The analysis needs to use sufficient dimensions to be able to separate any indexing ambiguities that may be present, but using too many dimensions reduces the sensitivity of the procedure. In this case, it is determined that 3 dimensions will be used for the analysis:

Automatic determination of number of dimensions for analysis
+--------------+--------------+
|   Dimensions |   Functional |
|--------------+--------------|
|            1 |      14.5337 |
|            2 |      15.676  |
|            3 |      14.6061 |
|            4 |      15.0468 |
|            5 |      14.8961 |
|            6 |      14.739  |
|            7 |      14.8166 |
|            8 |      14.83   |
+--------------+--------------+
Best number of dimensions: 3

Once the analysis has been performed in the appropriate number of dimensions, the results are used to score all possible symmetry elements with algorithms similar to those of POINTLESS. The scoring relies on the fact that the angles between vectors represent genuine systematic differences between datasets, as made clear by equation 5 of Diederichs, K. (2017).

Scoring individual symmetry elements
+--------------+--------+------+-----+-----------------+
|   likelihood |   Z-CC |   CC |     | Operator        |
|--------------+--------+------+-----+-----------------|
|        0.944 |   9.88 | 0.99 | *** | 2 |(-1, 1, 0)   |
|        0.942 |   9.82 | 0.98 | *** | 4^-1 |(0, 0, 1) |
|        0.941 |   9.82 | 0.98 | *** | 4 |(0, 0, 1)    |
|        0.945 |   9.9  | 0.99 | *** | 2 |(0, 0, 1)    |
|        0.944 |   9.87 | 0.99 | *** | 2 |(1, 1, 0)    |
|        0.943 |   9.85 | 0.98 | *** | 2 |(0, 1, 0)    |
|        0.941 |   9.81 | 0.98 | *** | 2 |(1, 0, 0)    |
+--------------+--------+------+-----+-----------------+
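
To make the role of the angle concrete, the sketch below computes the cosine of the angle between pairs of coordinate vectors. The coordinates are hypothetical and the function is not taken from DIALS; it only illustrates that vectors pointing in the same direction give a cosine of 1 regardless of their lengths, while vectors at right angles give 0.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two coordinate vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional coordinates (illustrative only).
x_a = (0.09, 0.01, 0.00)
x_b = (0.18, 0.02, 0.00)  # same direction as x_a, twice the length
x_c = (-0.01, 0.09, 0.00)  # perpendicular to x_a

same_direction = cosine(x_a, x_b)
perpendicular = cosine(x_a, x_c)
print(round(same_direction, 3), round(perpendicular, 3))
```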

Scores for the possible Laue groups are obtained by analysing the scores for the symmetry elements that are present or absent from each group, and the groups are ranked by their likelihood.

Scoring all possible sub-groups
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
| Patterson group   |     |   Likelihood |   NetZcc |   Zcc+ |   Zcc- |   delta | Reindex operator   |
|-------------------+-----+--------------+----------+--------+--------+---------+--------------------|
| P 4/m m m         | *** |            1 |     9.85 |   9.85 |   0    |       0 | -a,b,-c            |
| C m m m           |     |            0 |     0.06 |   9.88 |   9.82 |       0 | a+b,-a+b,c         |
| P m m m           |     |            0 |     0.01 |   9.85 |   9.85 |       0 | -a,b,-c            |
| P 4/m             |     |            0 |    -0    |   9.85 |   9.85 |       0 | -a,b,-c            |
| P 1 2/m 1         |     |            0 |     0.06 |   9.9  |   9.84 |       0 | b,c,a              |
| C 1 2/m 1         |     |            0 |     0.03 |   9.88 |   9.84 |       0 | a+b,-a+b,c         |
| C 1 2/m 1         |     |            0 |     0.03 |   9.87 |   9.85 |       0 | a-b,a+b,c          |
| P 1 2/m 1         |     |            0 |    -0    |   9.85 |   9.85 |       0 | -a,b,-c            |
| P 1 2/m 1         |     |            0 |    -0.05 |   9.81 |   9.86 |       0 | -b,-a,-c           |
| P -1              |     |            0 |    -9.85 |   0    |   9.85 |       0 | -a,b,-c            |
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
Best solution: P 4/m m m
Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)
Reindex operator: -a,b,-c
Laue group probability: 1.000
Laue group confidence: 1.000
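
The relationship between the two tables can be checked by hand. The sketch below assumes that Zcc+ is the mean Z-CC of the symmetry elements present in a group, Zcc- is the mean Z-CC of the elements absent from it, and NetZcc is their difference. This reading reproduces the tabulated values for this example to within rounding, but it is an inference from the numbers rather than a statement of the implementation. The assignment of operators to C m m m (the twofold along c plus the two diagonal twofolds) is likewise an assumption.

```python
from statistics import mean

# Z-CC scores copied from the "Scoring individual symmetry elements" table.
z_cc = {
    "2 |(-1, 1, 0)": 9.88,
    "4^-1 |(0, 0, 1)": 9.82,
    "4 |(0, 0, 1)": 9.82,
    "2 |(0, 0, 1)": 9.90,
    "2 |(1, 1, 0)": 9.87,
    "2 |(0, 1, 0)": 9.85,
    "2 |(1, 0, 0)": 9.81,
}


def net_zcc(present):
    """Return (Zcc+, Zcc-, NetZcc) for a group containing `present` operators.

    Assumption: simple means over present and absent elements.
    """
    z_for = [z for op, z in z_cc.items() if op in present]
    z_against = [z for op, z in z_cc.items() if op not in present]
    zcc_plus = mean(z_for) if z_for else 0.0
    zcc_minus = mean(z_against) if z_against else 0.0
    return zcc_plus, zcc_minus, zcc_plus - zcc_minus


p4mmm = net_zcc(set(z_cc))  # all seven elements present
cmmm = net_zcc({"2 |(0, 0, 1)", "2 |(-1, 1, 0)", "2 |(1, 1, 0)"})
p1bar = net_zcc(set())  # no rotational elements present

for name, scores in [("P 4/m m m", p4mmm), ("C m m m", cmmm), ("P -1", p1bar)]:
    print(name, [round(s, 2) for s in scores])
```

A group is favoured when the elements it contains score highly and the elements it omits do not, which is why only P 4/m m m has a large positive NetZcc here.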

The program then concludes by reporting any reindexing operations that are necessary to ensure consistent indexing between datasets. In this case, no indexing ambiguity is present, so the reindexing operator is simply the identity operator for all datasets.

Reindexing operators:
x,y,z
[0, 1, 2, 3]

The correctly reindexed experiments and reflections are then saved to file, along with an HTML report:

Writing html report to: dials.cosym.html
Writing json to: dials.cosym.json
Saving reindexed experiments to symmetrized.expt
Saving reindexed reflections to symmetrized.refl

The full log file can be viewed here:

DIALS 3.dev.232-ga04c60b29
The following parameters have been modified:

input {
  experiments = experiments_0.expt
  experiments = experiments_1.expt
  experiments = experiments_2.expt
  experiments = experiments_3.expt
  reflections = reflections_0.refl
  reflections = reflections_1.refl
  reflections = reflections_2.refl
  reflections = reflections_3.refl
}

Hierarchical clustering of unit cells
Using Andrews-Bernstein distance from Andrews & Bernstein J Appl Cryst 47:346 (2014)
Distances have been calculated
                       Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)

0 singletons:

Point group    a           b           c          alpha        beta         gamma       

1 cluster:

Cluster_id       N_xtals  Med_a         Med_b         Med_c         Med_alpha    Med_beta     Med_gamma   Delta(deg)
4 in P422.
cluster_1        4        68.36 (0.01 ) 68.36 (0.01 ) 103.95(0.02 ) 90.00 (0.00) 90.00 (0.00) 90.00 (0.00)
     P 4/m m m (No. 123)  68.36         68.36         103.95        90.00        90.00        90.00         0.0   


Standard deviations are in brackets.
Each cluster:
Input lattice count, with integration Bravais setting space group.
Cluster median with Niggli cell parameters (std dev in brackets).
Highest possible metric symmetry and unit cell using LePage (J Appl Cryst 1982, 15:255) method, maximum delta 3deg.
Filtering reflections for dataset 0
Read 76079 predicted reflections
Selected 54367 reflections integrated by profile and summation methods
Combined 1127 partial reflections with other partial reflections
Removed 491 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 12 intensity.prf.value reflections with I/Sig(I) < -5
Filtering reflections for dataset 1
Read 75607 predicted reflections
Selected 54845 reflections integrated by profile and summation methods
Combined 1284 partial reflections with other partial reflections
Removed 554 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 10 intensity.prf.value reflections with I/Sig(I) < -5
Filtering reflections for dataset 2
Read 77983 predicted reflections
Selected 54461 reflections integrated by profile and summation methods
Combined 1404 partial reflections with other partial reflections
Removed 541 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 7 intensity.prf.value reflections with I/Sig(I) < -5
Filtering reflections for dataset 3
Read 76468 predicted reflections
Selected 53877 reflections integrated by profile and summation methods
Combined 1062 partial reflections with other partial reflections
Removed 514 reflections below partiality threshold
Removed 0 intensity.sum.value reflections with I/Sig(I) < -5
Removed 3 intensity.prf.value reflections with I/Sig(I) < -5
Patterson group: P 4/m m m

--------------------------------------------------------------------------------

Normalising intensities for dataset 1

ML estimate of overall B value:
   13.46 A**2
ML estimate of  -log of scale factor:
  -3.04

--------------------------------------------------------------------------------

Normalising intensities for dataset 2

ML estimate of overall B value:
   11.06 A**2
ML estimate of  -log of scale factor:
  -3.50

--------------------------------------------------------------------------------

Normalising intensities for dataset 3

ML estimate of overall B value:
   11.45 A**2
ML estimate of  -log of scale factor:
  -2.96

--------------------------------------------------------------------------------

Normalising intensities for dataset 4

ML estimate of overall B value:
   12.14 A**2
ML estimate of  -log of scale factor:
  -2.67

--------------------------------------------------------------------------------

Estimation of resolution for Laue group analysis

Resolution estimate from <I>/<σ(I)> > 4.0 : 2.12
Resolution estimate from CC½ > 0.60: 1.83
High resolution limit set to: 1.83
Selecting 188527 reflections with d > 1.83
================================================================================

Automatic determination of number of dimensions for analysis
+--------------+--------------+
|   Dimensions |   Functional |
|--------------+--------------|
|            1 |      14.5337 |
|            2 |      15.676  |
|            3 |      14.6061 |
|            4 |      15.0468 |
|            5 |      14.8961 |
|            6 |      14.739  |
|            7 |      14.8166 |
|            8 |      14.83   |
+--------------+--------------+
Best number of dimensions: 3
Using 3 dimensions for analysis
Principal component analysis:
Explained variance: 0.0082, 0.0052, 8.2e-05
Explained variance ratio: 0.61, 0.39, 0.0061
Scoring individual symmetry elements
+--------------+--------+------+-----+-----------------+
|   likelihood |   Z-CC |   CC |     | Operator        |
|--------------+--------+------+-----+-----------------|
|        0.944 |   9.88 | 0.99 | *** | 2 |(-1, 1, 0)   |
|        0.942 |   9.82 | 0.98 | *** | 4^-1 |(0, 0, 1) |
|        0.941 |   9.82 | 0.98 | *** | 4 |(0, 0, 1)    |
|        0.945 |   9.9  | 0.99 | *** | 2 |(0, 0, 1)    |
|        0.944 |   9.87 | 0.99 | *** | 2 |(1, 1, 0)    |
|        0.943 |   9.85 | 0.98 | *** | 2 |(0, 1, 0)    |
|        0.941 |   9.81 | 0.98 | *** | 2 |(1, 0, 0)    |
+--------------+--------+------+-----+-----------------+
Scoring all possible sub-groups
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
| Patterson group   |     |   Likelihood |   NetZcc |   Zcc+ |   Zcc- |   delta | Reindex operator   |
|-------------------+-----+--------------+----------+--------+--------+---------+--------------------|
| P 4/m m m         | *** |            1 |     9.85 |   9.85 |   0    |       0 | -a,b,-c            |
| C m m m           |     |            0 |     0.06 |   9.88 |   9.82 |       0 | a+b,-a+b,c         |
| P m m m           |     |            0 |     0.01 |   9.85 |   9.85 |       0 | -a,b,-c            |
| P 4/m             |     |            0 |    -0    |   9.85 |   9.85 |       0 | -a,b,-c            |
| P 1 2/m 1         |     |            0 |     0.06 |   9.9  |   9.84 |       0 | b,c,a              |
| C 1 2/m 1         |     |            0 |     0.03 |   9.88 |   9.84 |       0 | a+b,-a+b,c         |
| C 1 2/m 1         |     |            0 |     0.03 |   9.87 |   9.85 |       0 | a-b,a+b,c          |
| P 1 2/m 1         |     |            0 |    -0    |   9.85 |   9.85 |       0 | -a,b,-c            |
| P 1 2/m 1         |     |            0 |    -0.05 |   9.81 |   9.86 |       0 | -b,-a,-c           |
| P -1              |     |            0 |    -9.85 |   0    |   9.85 |       0 | -a,b,-c            |
+-------------------+-----+--------------+----------+--------+--------+---------+--------------------+
Best solution: P 4/m m m
Unit cell: (68.3603, 68.3603, 103.953, 90, 90, 90)
Reindex operator: -a,b,-c
Laue group probability: 1.000
Laue group confidence: 1.000
Space groups:
P 4 2 2
[0, 1, 2, 3]
Reindexing operators:
x,y,z
[0, 1, 2, 3]
Writing html report to: dials.cosym.html
Writing json to: dials.cosym.json
Saving reindexed experiments to symmetrized.expt
Saving reindexed reflections to symmetrized.refl

Full parameter definitions

partiality_threshold = 0.99
  .help = "Use reflections with a partiality above the threshold."
  .type = float(allow_none=True)
unit_cell_clustering {
  threshold = 5000
    .help = "Threshold value for the clustering"
    .type = float(value_min=0, allow_none=True)
  log = False
    .help = "Display the dendrogram with a log scale"
    .type = bool
}
normalisation = kernel quasi *ml_iso ml_aniso
  .type = choice
d_min = Auto
  .type = float(value_min=0, allow_none=True)
min_i_mean_over_sigma_mean = 4
  .type = float(value_min=0, allow_none=True)
min_cc_half = 0.6
  .type = float(value_min=0, value_max=1, allow_none=True)
lattice_group = None
  .type = space_group
space_group = None
  .type = space_group
lattice_symmetry_max_delta = 5.0
  .type = float(value_min=0, allow_none=True)
best_monoclinic_beta = True
  .help = "If True, then for monoclinic centered cells, I2 will be preferred"
          "over C2 if it gives a more oblique cell (i.e. smaller beta angle)."
  .type = bool
dimensions = Auto
  .type = int(value_min=2, allow_none=True)
use_curvatures = True
  .type = bool
weights = count standard_error
  .type = choice
min_pairs = 3
  .help = "Minimum number of pairs for inclusion of correlation coefficient in"
          "calculation of Rij matrix."
  .type = int(value_min=1, allow_none=True)
termination_params {
  max_iterations = 100
    .type = int(value_min=0, allow_none=True)
  max_calls = None
    .type = int(value_min=0, allow_none=True)
  traditional_convergence_test = True
    .type = bool
  traditional_convergence_test_eps = 1
    .type = float(allow_none=True)
  drop_convergence_test_n_test_points = 5
    .type = int(value_min=2, allow_none=True)
  drop_convergence_test_max_drop_eps = 1.e-5
    .type = float(value_min=0, allow_none=True)
  drop_convergence_test_iteration_coefficient = 2
    .type = float(value_min=1, allow_none=True)
}
cluster {
  method = dbscan bisect minimize_divide agglomerative *seed
    .type = choice
  n_clusters = auto
    .type = int(value_min=1, allow_none=True)
  dbscan {
    eps = 0.5
      .type = float(value_min=0, allow_none=True)
    min_samples = 5
      .type = int(value_min=1, allow_none=True)
  }
  bisect {
    axis = 0
      .type = int(value_min=0, allow_none=True)
  }
  seed {
    min_silhouette_score = 0.2
      .type = float(value_min=-1, value_max=1, allow_none=True)
  }
}
nproc = 1
  .help = "The number of processes to use."
  .type = int(value_min=1, allow_none=True)
relative_length_tolerance = 0.05
  .type = float(value_min=0, allow_none=True)
absolute_angle_tolerance = 2
  .type = float(value_min=0, allow_none=True)
min_reflections = 10
  .help = "The minimum number of reflections per experiment."
  .type = int(value_min=1, allow_none=True)
seed = 230
  .type = int(value_min=0, allow_none=True)
output {
  suffix = "_reindexed"
    .type = str
  log = dials.cosym.log
    .type = str
  experiments = "symmetrized.expt"
    .type = path
  reflections = "symmetrized.refl"
    .type = path
  json = dials.cosym.json
    .type = path
  html = dials.cosym.html
    .type = path
}