DIALS core meeting 2020-12-09
Previous Actions
- MG+ND: write a proposal for ‘overall architecture discussion’ and ‘new installer’
- MG+ND: Come up with a proposal to move away from all code being in header files and consolidate into a single library
- ND: to review xia2 DC1 (#528)
- ND: update CONTRIBUTING in a (separate) pull request to include a soft-imposition of stable master, dials#1353.
- MG: organise a typing intro lecture.
- cbflib conda-forge package
- ND to set up an agenda for a code-review meeting
- then: ASB to set up a meeting with HJB, BKP, ASB, ND
- ASB to explore what should happen to cbflib maintenance in the future
- ND to set up a cbflib_adaptbx pull request for MAR detector fixes. Verify with labelit_regression.
Agenda
This meeting instance is ring-fenced to discuss pull requests with the word ‘range’ and/or ‘lazy’ in the subject, and performance issues baked into dxtbx.
Definition of trusted_range
Discuss PR Definition of trusted_range (dxtbx#182)
image_range vs. array_range baked in off-by-one errors
Discuss image_range vs. array_range baked in off-by-one errors (dxtbx#186)
- DW made a proposal and started work on that, but it proved messy
- There is a new, simpler, proposal but this version kills off the image_range concept rather than array_range
- Discuss, again, what we want here and whether the new proposal is useful
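For reference, a minimal sketch of the two conventions under discussion, assuming image_range is 1-based and inclusive while array_range is 0-based and half-open. The helper names are hypothetical and not part of dxtbx; they only show where the off-by-one conversions sit.

```python
def image_range_to_array_range(image_range):
    """Convert a 1-based inclusive (first, last) to a 0-based half-open (start, stop)."""
    first, last = image_range
    return (first - 1, last)


def array_range_to_image_range(array_range):
    """Convert a 0-based half-open (start, stop) to a 1-based inclusive (first, last)."""
    start, stop = array_range
    return (start + 1, stop)


def n_images(array_range):
    """With a half-open range the image count is a plain difference."""
    start, stop = array_range
    return stop - start


# Assumed example: a scan of 100 images numbered 1..100 on disk
image_range = (1, 100)
array_range = image_range_to_array_range(image_range)
```

Keeping only one of the two conventions removes the conversion step entirely, which is the motivation for dropping either image_range or array_range.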
dxtbx json/msgpack performance
ASB reports json being problematic when importing e.g. 50,000 experiments, as everything is stored in a single file, so everything needs to be read at once.
- GW: cf. Load before heat death (dxtbx#118). At the core of it are bad design choices in dxtbx.
- Future discussion topics:
- treatment of large serial datasets
- using an HDF5 backend for reflection files
- dials#1407 specific proposal
- ASB: I want check_format to go away.
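The single-file problem reported above can be illustrated with a small sketch. This is not the dials#1407 proposal or any existing dxtbx format; the layout, function names, and record contents are assumptions made purely to show the access pattern. Each record is written as its own blob, followed by an offset table, so that one record can be read by seeking rather than by parsing the whole file.

```python
import io
import json
import struct


def write_indexed(records, fh):
    """Write each record as a separate JSON blob, then an offset table and footer."""
    offsets = []
    for rec in records:
        blob = json.dumps(rec).encode("utf-8")
        offsets.append((fh.tell(), len(blob)))
        fh.write(blob)
    table_start = fh.tell()
    for offset, length in offsets:
        fh.write(struct.pack("<QQ", offset, length))
    # Footer: position of the offset table and number of entries
    fh.write(struct.pack("<QQ", table_start, len(offsets)))


def read_one(fh, index):
    """Read a single record by seeking; no other record is parsed."""
    fh.seek(-16, io.SEEK_END)
    table_start, count = struct.unpack("<QQ", fh.read(16))
    if not 0 <= index < count:
        raise IndexError(index)
    fh.seek(table_start + 16 * index)
    offset, length = struct.unpack("<QQ", fh.read(16))
    fh.seek(offset)
    return json.loads(fh.read(length).decode("utf-8"))


# Hypothetical stand-in for 50,000 experiments
experiments = [{"id": i, "wavelength": 1.0} for i in range(50000)]
buf = io.BytesIO()
write_indexed(experiments, buf)
one = read_one(buf, 31415)
```

The per-record payload is still JSON here only to keep the sketch short; the point is that the cost of reading one experiment no longer scales with the number of experiments in the file.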
Previous discussion outcomes:
- ASB:
- General consensus that this is needed.
- Make models and store them
- Load them in parallel with minimal memory requirements
- Version all models and rewind them
- This needs a lot of thought and at this time we don’t have agreement here.
- No need to copy data like shoeboxes when they aren’t being changed; operate by reference
GW dump of thoughts on above
Long list of pertinent (and unresolved) issues indicates we probably have some rather fundamental work to do here. This will boil down to essentially a dx2bx which would have some features listed below.
- dxtbx#219 - Eiger reading unnecessarily slow
- dxtbx#222 - get raw data should return native data types
- dxtbx#187 - remove the idea of check_format
- dxtbx#186 - baked in off-by-one errors
- dxtbx#159 - record format class in experiments
- dials#1238 - experiments contain references to pickle files
- dials#1407 - reflections and models in single (reflex) file
- dxtbx#118 - [merged] load before heat death
- dials#1014 - [merged] import still image sequences as M experiments
- dials#1153 - spot counts per image does weird things with images starting from 0
- dxtbx#141 - [merged] handle slicing when images start from 0
- dials#1104 - eliminate legacy imageset_id in reflection tables
- dials#1023 - images should do nothing but get raw data
- dials#845 - sensitivity to mask types?
- dials#615 - rationalise experiment id assumptions
- dials#504 - dials import no work well for large numbers of images
- dxtbx#13 - dxtbx assumes you have no more than one scan in a file
Smaller issues which could still be fixed “properly”
- dxtbx#255 - mark classes as abstract
dx2bx features:
- not constrained by need to be back-compatible with dxtbx → dials 4.x series by implication
- only read the metadata on import
- use C++ file readers for the experiment data (e.g. CBF, HDF5, mmap → TIFF-like, SMV)
- Format classes become factory functions → create the experiment models, which are saved and used henceforth
- faster, random access, non-text container for experiment models (and ideally reflection data)
- dxtbx project ‘Design limitations’ ← fix all of these
This is likely to be a big body of work but I think is on the critical path for what we need to do.
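As a rough illustration of the "Format classes become factory functions" and "only read the metadata on import" items, the sketch below builds models from header metadata once, saves them, and reloads them without going back to the image file. All names here (ExperimentModels, models_from_header, the header keys, the file name) are hypothetical and do not correspond to any existing or agreed dx2bx API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentModels:
    """Hypothetical container for the models created at import time."""
    beam: dict
    detector: dict
    source: str


def models_from_header(header, source):
    """Hypothetical factory function: uses header metadata only, never pixel data."""
    return ExperimentModels(
        beam={"wavelength": header["wavelength"]},
        detector={"image_size": [header["n_fast"], header["n_slow"]]},
        source=source,
    )


# Assumed header values for illustration
header = {"wavelength": 0.9795, "n_fast": 2463, "n_slow": 2527}
models = models_from_header(header, "example_0001.cbf")

# Save once; later steps reload the models instead of re-running the factory
saved = json.dumps(asdict(models))
restored = ExperimentModels(**json.loads(saved))
```

Once the models are stored, later processing steps would not need to know which factory produced them, which is what makes check_format unnecessary in this picture.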
Discussion
Features of dxtbx we want to preserve in a re-write:
- Most of the models (beam, detector, crystal, gonio, scan)
- The registry
- In-memory representation of data (like what we get from MemImageSet). Supports streaming.
Things that should go away
- Imageset/imagesweep
- Lazy will be unnecessary because models are random access and read on demand
- Datablocks
- Detectorbase
- check_format (implicit in random access/read on demand)
New features that are desired
- Retain more details of the goniometer stack
- Proper definition of scan
- ImageSetData, Reader
- numpy back end as option?
- would likely enable pybind11
- Match the crystal B matrix convention to IUCr convention.
- get_image_size is fast/slow but get raw data is slow/fast
- Count everything from zero
- Array dimensions are in C order
- Remove magic from option parser
- Assume filenames are “sensible” i.e. .nxs are nexus files, etc.
- Assume .expt is experiments, .refl is reflections
- Formats have list of supported filename extensions :thinking_face:
- Define ‘half object’ conventions (pixel coordinates, U matrix rotations)
- Fast deserialization
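The fast/slow versus slow/fast mismatch and the "C order" and "count from zero" items can be made concrete with a short sketch. It assumes, as the list states, that the image size is reported as (fast, slow) while the raw data is indexed as (slow, fast); the helper names are hypothetical.

```python
def array_shape_from_image_size(image_size):
    """image_size is (n_fast, n_slow); a C-ordered array has shape (n_slow, n_fast)."""
    n_fast, n_slow = image_size
    return (n_slow, n_fast)


def flat_index(slow, fast, shape):
    """Zero-based offset of pixel (slow, fast) in a C-ordered (row-major) buffer."""
    n_slow, n_fast = shape
    if not (0 <= slow < n_slow and 0 <= fast < n_fast):
        raise IndexError((slow, fast))
    return slow * n_fast + fast


# Assumed toy detector: 4 pixels along the fast axis, 3 along the slow axis
image_size = (4, 3)
shape = array_shape_from_image_size(image_size)
```

In C order the last index (fast) varies quickest in memory, so reporting sizes in the same (slow, fast) order as the array shape would remove the need for the swap in the first helper.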
Conclusion of discussion: consensus was to take this forward to a project proposal to active collaborators, with an explicit rename such that dxtbx continues along its existing path for non-DIALS users.
Deferred Agenda Items
Aaron’s PR Requests
Aaron requested that a slew of pull requests be reviewed by the next meeting after 2020-11-25:
dxtbx:
- Allow certain classes to be labeled as ‘abstract’ (#255)
- Deprecate h5rawdata and use Format.ignore() in the Registry (#261)
- Make FormatHDF5SaclaMPCCD lazy (#227)
Dials:
- Add is_stills parameter to the spotfinder API to allow dials.stills_process to work with scans (#1508)
- Redo test to support new h5 file parsing behavior in dxtbx (#1499)
- Multiple stills view (#1463)
- dials.import: individually select models from reference expt (#1371)
renaming master branch → main
not discussed in this meeting
- Nothing much will happen on this any time soon - either way we’ll wait for GitHub support on this one
- General assent that this is useful
- since 1st of October new GitHub repositories have main as the default
- proposed outcome: Agree to migrate to main. Turn this into dxtbx/dials/xia2 issues and move once GitHub provides a migration path. Use xia2 as a test-bed and move dials and dxtbx two weeks later.
DIALS documentation build
not discussed in this meeting
Should the DIALS documentation build outside a cctbx environment?
- Currently you can only build the DIALS documentation inside a cctbx environment. This means a remote documentation build needs to build cctbx first, so we can’t easily run the documentation build on every commit. When things break they are difficult to fix. We can’t use readthedocs.
- MG: I believe the main obstacles are three cctbx Sphinx plugins that may need to be extracted or removed from DIALS documentation, and some phil parsing logic.
- MG: the cctbx conda-forge package may be useful here
move the active dxtbx repository into the DIALS organisation
not discussed in this meeting
Having the main repository in cctbx_project has a few distinct disadvantages. The most annoying is the eternal fight against the lockdown bot, but we also don’t get LGTM output.
- proposed outcome: move the active dxtbx repository into the DIALS organisation. Do it in such a way that the impact on existing installations, issues, pull requests is minimal.
dials.index: create new experiments
not discussed in this meeting
Should dials.index create new experiments containing crystals, rather than modifying existing experiments in place, as proposed in this issue?
Next meeting
Thursday, January 14th, 4pm UK time, 8am PST.