DIALS Meeting 2021-05-12

Agenda

Experiment identifiers - Graeme’s proposed changes
CCP4 APS workshop (mid June) - in CDT timezone (UTC-5), (BST-6)
dials.scale PRs adding bulk absorption corrections parameters
CBFlib and pycbf
Discussions

Dials Scaling Absorption Corrections

James discussed how DIALS doesn’t do much for absorption correction by default. There are weight parameters in place, but they are somewhat complicated for users, so he wanted to add some sensible “low/medium/high” high-level configuration options.
Have been added to dials.scale and xia2 in dials/dials#1688 and xia2/xia2#592, but not making them turned on by default yet.
Everyone was asked if they were generally happy with having this sort of broad percentage setting.
A discussion on what the use case for “Medium” correction would be - people seemed generally satisfied that it was worthwhile
Putting in an automatic helper e.g. in xia2 is more complicated, not planning to get in for this release, only a user option for now
Richard asks if we should set it automatically if requesting anomalous scaling - James agreed that it would be worth looking at, but we should spend some time using it first, so leaving off by default.

CCP4 APS Workshop

CCP4 is running a workshop at APS week of 14th June(?). Graeme is giving several talks - wants to know if David has any more details from inside CCP4.
Configuration is possibly more challenging than other workshops - Argonne security means that we cannot run workstations for people to NX into, so there are discussions about deploying and using machines on the cloud.
Graeme asks if several people are available for assistance during the practical data sessions afternoon of Wednesday 16th June - raised as possibly useful for James/Elena
They start 8am (CDT/UTC-5) - 14:00 local time (UTC+1/BST)
Nick: When running BAG?
- Ask in beamlines slack channel - no response

Experiment Identifiers

Graeme has been delving into our experiment and ExperimentList handling as part of dials/dials#1694 and found it intersecting again with the proposals described in dials/dials#1029.
Prepared slides for discussion…

Experiment identifier not list position
                           ┌──────┐
     ┌──────┐   ┌──────┐┌─▶│      │
     │      │◀──┤      ├┘  ├──────┤
     ├──────┤   ├──────┤┌─▶│      │
     │      │◀──┤      ├┘  └──────┘
     ├──────┤   ├──────┤             Nope!
     │      │◀──┤      ├┐  ┌──────┐
     ├──────┤   ├──────┤└─▶│      │
     │      │◀──┤      ├┐  ├──────┤
     └──────┘   └──────┘└─▶│      │
                           └──────┘
       Refl       Expt     Parallel
                              Expt

We use the id column in the reflection table as corresponding to the experiment's position in the list
This is the worst possible way - we cannot split into subsets, as the numbering is then wrong if you decide to only refine one experiment out of a list of 4
Moving to ID as an internal reflection table value, with a corresponding internal map to identifiers was the original motivation - but never got fully implemented - for convenience, existing experiments were labelled as their index, and this was then relied upon
When doing work in dials/dials#1694 it was pointed out that this could be considered Yet Another Identifier
Richard pointed out that identifiers were supposed to help with bookkeeping problems with multicrystal/multiplex - but never really clarified how it was supposed to work
Ben noted that the intent has been here before - not a new idea, still good intent - but the difficulty is that plumbing is torturous
Graeme estimates the effort levels to fix this are ~150 places in the code. With a couple of days and couple of people could be done as a development push
Zeroth order task is coming up with pattern to iterate over experiment list - efficient way of coming up with reflections that map to identifier
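
The list-position problem and the zeroth-order iteration pattern discussed above can be sketched in plain Python. This is only an illustration under stated assumptions: experiments are represented by hypothetical identifier strings and reflections by dicts with an integer "id" column; none of these names are DIALS API.

```python
from collections import defaultdict

# Hypothetical stand-ins: experiments are identifier strings, and each
# reflection is a dict carrying an integer "id" column.
experiments = ["expt-a", "expt-b", "expt-c", "expt-d"]
reflections = [{"id": 0}, {"id": 2}, {"id": 2}, {"id": 3}]

# Fragile: treat "id" as a position in the experiment list.
subset = [experiments[2]]  # decide to refine only the third experiment
# Any reflection whose id is beyond the end of the subset now points nowhere.
dangling = [r for r in reflections if r["id"] >= len(subset)]

# More robust: "id" is an opaque table-internal value resolved through a map.
id_to_identifier = {0: "expt-a", 1: "expt-b", 2: "expt-c", 3: "expt-d"}


def group_by_identifier(reflections, id_to_identifier):
    """Bucket reflections by experiment identifier in a single pass."""
    groups = defaultdict(list)
    for refl in reflections:
        groups[id_to_identifier[refl["id"]]].append(refl)
    return groups


groups = group_by_identifier(reflections, id_to_identifier)
for identifier in subset:
    print(identifier, len(groups[identifier]))
```

Grouping once and then looking up by identifier avoids rescanning the reflections for every experiment, and the lookup keeps working however the experiment list is subset or reordered.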

    Create new experiment, not overwrite

      Import                Index
     ┌──────┐              ┌──────┐
     │      │        ╔════▶│      │
     └──────┘        ║     └──────┘
╔════════════════╗   ║
║                ╠═══╝
║    ┌──────┐    ║         ┌──────┐
║    │      │    ║         │      │
║    └──────┘    ║         └──────┘
║            Didn't
╚════════════index,
              Gone!

AKA dials/dials#1029
When we import experiments and assign reflections - we currently throw away lots of information about where a reflection came from - if we import multiple experiments then index - these are now gone! If we index, then improve geometry to the point where unindexed reflections could be indexed - this information is now gone!
Issues with data model breaking because no guaranteed mapping between reflections and experiments
Believe that we should create new experiments with crystals in for indexing, without overwriting or discarding the existing experiment
Just need to filter out experiments without lattices in downstream processing
We can’t just have experiments with no crystal because of multiple lattices - each lattice needs to be in a separate experiment
When asked if there were any objections to this philosophy, there was broad ~~silence~~agreement
The question was raised if there would ever be a reason to go beyond two layers - the danger of it becoming a tree. Graeme elaborated that the intention was that it might not even need a pointer to the parent - would be derivable through shared models
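
A minimal sketch of this philosophy, assuming a toy Experiment record rather than the real DIALS classes: indexing appends one new experiment per lattice, the imported experiment is kept, downstream code filters on whether a crystal is present, and the parent is recovered through the shared model instead of a stored pointer. All names here (Experiment, index, with_lattice, parent_of) are hypothetical.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Experiment:
    identifier: str
    imageset: str                  # stands in for the shared beam/detector/scan models
    crystal: Optional[str] = None  # None until a lattice has been found


def import_experiment(imageset):
    return Experiment(identifier=str(uuid.uuid4()), imageset=imageset)


def index(experiments, lattices_found):
    """Append one new experiment per lattice; never overwrite the imported one."""
    result = list(experiments)
    for expt in experiments:
        for crystal in lattices_found.get(expt.imageset, []):
            result.append(replace(expt, identifier=str(uuid.uuid4()), crystal=crystal))
    return result


def with_lattice(experiments):
    """Downstream filter: keep only experiments that have a crystal."""
    return [e for e in experiments if e.crystal is not None]


def parent_of(child, experiments):
    """Recover the imported experiment via the shared model, with no parent pointer."""
    return next(
        e for e in experiments if e.crystal is None and e.imageset == child.imageset
    )


imported = [import_experiment("sweep_1"), import_experiment("sweep_2")]
# sweep_1 contains two lattices; sweep_2 did not index at all.
indexed = index(imported, {"sweep_1": ["lattice_A", "lattice_B"]})
```

In this sketch the experiment for sweep_2 survives indexing even though no lattice was found, so its spots still have something to refer to.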

   Create new Experiment, not overwrite

 Import
┌──────┐         ┌──────┐
│      ├───┬─┬──▶│      │  Experiment still
└──────┘   │ │   └──────┘       exists
┌──────┐   │ │   ┌──────┐    -> Spots not
│      ├────────▶│      │      orphaned
└──────┘   │ │   └──────┘
           │ │   ┌──────┐
           │ └──▶│      │
           │     └──────┘
           │     ┌──────┐
           └────▶│      │
                 └──────┘

Point of principle: No orphaned reflections
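
The principle can be expressed as a simple invariant check. The sketch below assumes identifiers are plain strings; find_orphans is a hypothetical helper, not an existing DIALS function.

```python
def find_orphans(reflection_identifiers, experiment_identifiers):
    """Return identifiers that reflections refer to but no experiment carries."""
    return sorted(set(reflection_identifiers) - set(experiment_identifiers))


# Imported experiments are kept alongside the new indexed ones.
expts = ["import-1", "import-2", "index-1a", "index-1b"]
refls = ["index-1a", "index-1b", "import-1", "import-2", "import-2"]
print(find_orphans(refls, expts))  # []

# If indexing had overwritten import-1, its unindexed spots would be orphaned.
overwritten = ["index-1a", "index-1b", "import-2"]
print(find_orphans(refls, overwritten))  # ['import-1']
```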

Better data container - HDF5

  /expt/...
  /refl/...
  
Create @ spotfind, integrate
Update Otherwise

If we kept all of our data in HDF5 files - reflections and Experiments - we could just read data you needed to read - would not need to read all of it - shoeboxes being the main example here
Makes threat of problems from exploding our experiment count with the previous changes go away
Some discussion about whether using HDF5 would put it at risk of being co-opted or coerced into strict NxMX compliance. Although there are some potential areas where we could overlap, deficiencies in the standard and things that we use that aren’t represented (e.g. shoeboxes, scan-varying models) would mean that it wouldn’t really be suitable for our purposes
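
As an illustration of reading only what is needed, here is a small sketch using h5py and NumPy. The /expt and /refl groups follow the slide; the dataset names and shapes inside them are assumptions made up for the example and are not a DIALS or NXmx schema.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "processing.h5")

# Write a toy file with the /expt and /refl layout from the slide.
# Dataset names and shapes are illustrative assumptions only.
with h5py.File(path, "w") as f:
    expt = f.create_group("expt")
    expt.create_dataset("identifiers", data=np.array([b"expt-a", b"expt-b"]))
    refl = f.create_group("refl")
    refl.create_dataset("id", data=np.array([0, 0, 1, 1, 1], dtype="i4"))
    refl.create_dataset("shoebox", data=np.zeros((5, 3, 11, 11), dtype="f4"))

with h5py.File(path, "r") as f:
    ids = f["refl/id"][:]          # small column, read in full
    shoeboxes = f["refl/shoebox"]  # dataset handle only, no pixel data read yet
    n_boxes = shoeboxes.shape[0]   # shape comes from metadata
    first_box = shoeboxes[0]       # reads a single shoebox from disk
```

A program that only needs the id column never touches the shoebox data, which is the main saving described above.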

            Reflections with Experiments

                        ┌──────┐
          import───────▶│ expt │
                        └───┬──┘ ┌─────▶refine
             ┌──────────────┘    │       │
             ▼             ┌─────┴┐      │
        find_spots────────▶│ rflx │◀─────┘
                           └──────┘
                            │ ▲ │
           index ◀──────────┘ │ │
             │                │ │
             └────────────────┘ │
                       ┌────────┘
                       ▼          ┌──────┐
                 integrate───────▶│ rflx │
                                  └──────┘

If we have reflections and data in HDF5 format, can put in same file
Could give serious thought to not making copies of data files, only updating.
Some discussion over whether this would work as an option or be the only way to interact. It was discussed as a user story, whether users would end up just saving _1, _2, _final copies of the data files
Some discussions about how this would affect reproducibility, and what user tooling would be required to make this manageable
The topic of external data (e.g. masks) was discussed
Two points where we create these files: Spotfinding and integration

Would reduce working disk space
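
A hedged sketch of "create at spot finding, update otherwise" with h5py: the file is written once, and a later step reopens it in read/write mode and changes one column in place. The dataset names ("id", "indexed") and the "last_step" attribute are hypothetical choices for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "processing.h5")

# "Create" step (e.g. spot finding): write the reflection columns once.
with h5py.File(path, "w") as f:
    refl = f.create_group("refl")
    refl.create_dataset("id", data=np.zeros(4, dtype="i4"))
    refl.create_dataset("indexed", data=np.zeros(4, dtype="u1"))

# "Update" step (e.g. indexing): reopen read/write and modify in place.
with h5py.File(path, "r+") as f:
    flags = f["refl/indexed"][:]
    flags[[0, 2]] = 1
    f["refl/indexed"][...] = flags           # overwrite the existing column
    f["refl"].attrs["last_step"] = "index"   # record which step touched the file

with h5py.File(path, "r") as f:
    n_indexed = int(f["refl/indexed"][:].sum())
    last_step = f["refl"].attrs["last_step"]
```

Note that the previous contents of the column are gone after the update, which is the reproducibility concern raised in the discussion; a user who wants the earlier state would have to copy the file first.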

┌───────────────────────┬───────────────────────┐
│                       │                       │
│                       │                       │
│      Reflection/      │.─.     Create         │
│     Expt Identity         )  Experiments      │
│                       │`─'                    │
│         .───.         │         .───.         │
│        (     )        │        (     )        │
├─────────`   '─────────┼─────────`   '─────────┤
│                       │                       │
│                       │                       │
│      Update not       │.─.    New data        │
│     create files          )   container       │
│                       │`─'                    │
│                       │                       │
│                       │                       │
└───────────────────────┴───────────────────────┘

Four things going on! All interconnected!
A short discussion was had over how much work would be involved in doing this. Graeme thought that it would take four people a few days.
Nick thought that was optimistic, but that it might be worth spending that time as a task force - even if it failed, we would have a much better handle on the problem
