diff options
Diffstat (limited to 'doc/man/indexamajig.1')
-rw-r--r-- | doc/man/indexamajig.1 | 220 |
1 files changed, 220 insertions, 0 deletions
diff --git a/doc/man/indexamajig.1 b/doc/man/indexamajig.1 new file mode 100644 index 00000000..fcb1afc4 --- /dev/null +++ b/doc/man/indexamajig.1 @@ -0,0 +1,220 @@ +.\" +.\" indexamajig man page +.\" +.\" (c) 2009-2011 Thomas White <taw@physics.org> +.\" +.\" Part of CrystFEL - crystallography with a FEL +.\" + +.TH INDEXAMAJIG 1 +.SH NAME +indexamajig \- bulk indexing and data reduction program +.SH SYNOPSIS +.PP +.B indexamajig +[options] + +.SH DESCRIPTION + +The "indexamajig" program takes as input a list of diffraction image files, +currently in HDF5 format. For each image, it attempts to find peaks and then +index the pattern. If successful, it will measure the intensities of the peaks +at Bragg locations and produce a list in the form "h k l I", with some extra +information about the locations of the peaks. + +For minimal basic use, you need to provide the list of diffraction patterns, +the method which will be used to index, a file describing the geometry of the +detector, a PDB file which contains the unit cell which will be used for the +indexing, and that you'd like the program to output a list of intensities for +each successfully indexed pattern. Here is what the minimal use might look like +on the command line: + +indexamajig -i mypatterns.lst -j 10 -g mygeometry.geom --indexing=mosflm,dirax --peaks=hdf5 --cell-reduction=reduce -b myxfel..beam -o test.stream -p mycell.pdb --record=integrated + +More typical use includes all the above, but might also include a noise or +common mode filter (--filter-noise or --filter-cm respectively) if detector +noise causes problems for the peak detection. The HDF5 files might be in some +folder a long way from the current directory, so you might want to specify a +full pathname to be added in front of each filename. You'll probably want to +run more than one indexing job at a time (-j <n>). + +You can include a table of saturation values for in the HDF5 file, if you have +a method for estimating the intensities of saturated peaks. It goes in +/processing/hitfinder/peakinfo_saturated, and should be an n*3 two dimensional +array, where the first two columns contain fast scan and slow scan coordinates +(in that order) and the third contains the value which should belong in a peak +at the given location. The value will be spread in a small cross centred on +that location. + +See doc/geometry for information about how to create a geometry description +file. + +You can control what information is included in the output stream using +' --record=<flags>'. Possible flags are: + + pixels Include a list of sums of pixel values within the + integration domain, correcting for individual pixel + solid angles. + + integrated Include a list of reflection intensities, produced by + integrating around predicted peak locations. + + peaks Include peak locations and intensities from the peak + search. + + peaksifindexed As 'peaks', but only if the pattern could be indexed. + + peaksifnotindexed As 'peaks', but only if the pattern could NOT be indexed. + +So, if you just want the integrated intensities of indexed peaks, use +"--record=integrated". If you just want to check that the peak detection is +working, used "--record=peaks". If you want the integrated peaks for the +indexable patterns, but also want to check the peak detection for the patterns +which could not be indexed, you might use +"--record=integrated,peaksifnotindexed" and then use "check-peak-detection" from +the "scripts" folder to visualise the results of the peak detection. + +.SH PEAK DETECTION + +You can control the peak detection on the command line. Firstly, you can choose +the peak detection method using "--peaks=<method>". Currently, two possible +values for "method" are available. "hdf5" will take the peak locations from the +HDF5 file. It expects a two dimensional array at /processing/hitfinder/peakinfo +where size in the first dimension is the number of peaks and the size in the +second dimension is three. The first two columns contain the x and y +coordinate (see the "Note about data orientation" in geometry.txt for details), +the third contains the intensity. However, the intensity will be ignored since +the pattern will always be re-integrated using the unit cell provided by the +indexer on the basis of the peaks. + +The "zaef" method uses a simple gradient search after Zaefferer (2000). You can +control the overall threshold and minimum gradient for finding a peak using the +"--threshold" and "--min-gradient" options. Both of these have units of "ADU" +(i.e. units of intensity according to the contents of the HDF5 file). + +A minimum peak separation can also be provided in the geometry description file +(see geometry.txt for details). This number serves two purposes. Firstly, +it is the maximum distance allowed between the peak summit and the foot point +(where the gradient exceeds the minimum gradient). Secondly, it is the minimum +distance allowed between one peak and another, before the later peak will be +rejected "by proximity". + +You can suppress peak detection altogether for a panel in the geometry file by +specifying the "no_index" value for the panel as non-zero. + + +.SH INDEXING METHODS + +You can choose between a variety of indexing methods. You can choose more than +one method, in which case each method will be tried in turn until the later cell +reduction step says that the cell is a "hit". Choose from: + + dirax : invoke DirAx + mosflm : invoke MOSFLM (DPS) + +Depending on what you have installed. For "dirax" and "mosflm", you need to +have the dirax or ipmosflm binaries in your PATH. + +Example: --indexing=dirax,mosflm + +.SH CELL REDUCTION + +You can choose from various options for cell reduction with the +"--cell-reduction=" option. The choices are "none", "reduce" and "compare". +This choice is important because all autoindexing methods produce an "ab +initio" estimate of the unit cell (nine parameters), rather than just finding +the orientation of the target cell (three parameters). It's clear that this is +not optimal, and will hopefully be fixed in future versions. + +With "none", the raw cell from the autoindexer will be used. The cell probably +won't match the target cell, but it'll still get used. Use this option to test +whether the patterns are basically "indexable" or not, or if you don't know the +cell parameters. In the latter case, you'll need to plot some kind of histogram +of the resulting parameters from the output stream to see which are the most +popular. If you're lucky, this will reveal the true unit cell. + +With "reduce", linear combinations of the raw cell will be checked against the +target cell. If at least one candidate is found for each axis of the target +cell, the angles will be checked to correspondence. If a match is found, this +cell will be used for further processing. This option should generate the most +matches, but might produce spurious results in many cases. The predicted peaks +are always checked to verify that at least 10% of the predicted peaks are close +to peaks located by the peak search. If not, the next candidate unit cell is +tried until there are no more options. + +The "compare" method is like "reduce", but linear combinations are not taken. +That means that the cell must either match or match after a simple permutation +of the axes. This is useful when the target cell is subject to reticular +twinning, such as if one cell axis length is close to twice another. With +"reduce", there is a possibility that the axes might be confused in this +situation. This happens for lysozyme (1VDS), so watch out. + +The tolerance for matching with "reduce" and "compare" is hardcoded as 5% in +the reciprocal axis lengths and 1.5 degrees in the (reciprocal) angles. Cells +from these reduction routines are further constrained to be right-handed. The +unmatched raw cell might be left-handed: CrystFEL doesn't check this for you. +Always using a right-handed cell means that the Bijvoet pairs can be told +apart. + +If the unit cell is centered (i.e. if the space group begins with I, R, C, A or +F), you should be careful when using "compare" for the cell reduction, since +(for example) DirAx will always find a primitive unit cell, and this cell must +be converted to the non-primitive conventional cell from the PDB. + + +.SH TUNING CPU AFFINITIES FOR NUMA HARDWARE + +If you are running indexamajig on a NUMA (non-uniform memory architecture) +machine, a performance gain can sometimes be made by preventing the kernel from +allowing a process or thread to run on a CPU which is distant from the one on +which it started. Distance, in this context, might mean that the CPU is able to +access all the memory visible to the original CPU, but perhaps only relatively +slowly via a cable link. In many cases a group of CPUs will have direct access +to a certain region of memory, and so the process may be scheduled on any CPU in +that group without any penalty. However, scheduling the process to any CPU +outside the group may be slow. When running under Linux, indexamajig is able to +avoid such sub-optimal process scheduling by setting CPU affinities for its +threads. The CPU affinities are also inherited by subprocesses (e.g. MOSFLM or +DirAx). + +To do this usefully, you need to give indexamajig some information about your +hardware's architecture. Specify the size of the CPU groups using +"--cpugroup=<n>". You also need to specify the overall number of CPUs, so that +the program knows when to 'wrap around'. Using "--cpuoffset=<n>", where "n" is +a group number (not a CPU number), allows you to manually skip a few CPUs, which +may be useful if you do not want to use all the available CPUs but want to avoid +running all your jobs on the same ones. + +Note that specifying the above options is NOT the same thing as giving the +number of analyses to run in parallel (the 'number of threads'), which is done +with "-j <n>". The CPU tuning options provide information to indexamajig about +how to set the CPU affinities for its threads, but it does not specify how many +threads to use. + +Example: 72-core Altix UV 100 machine at the author's institution + +This machine consists of six blades, each containing two 6-core CPUs and some +local memory. Any CPU on any blade can access the memory on any other blade, +but the access will be slow compared to accessing memory on the same blade. +When running two instances of indexamajig, a sensible choice of parameters might +be: + +1: --cpus=72 --cpugroup=12 --cpuoffset=0 -j 36 +2: --cpus=72 --cpugroup=12 --cpuoffset=36 -j 36 + +This would dedicate half of the CPUs to one instance, and the other half to the +other. + + +.SH A NOTE ABOUT UNIT CELL SETTINGS + +CrystFEL's core symmetry module only knows about one setting for each unit cell. +You must use the same setting. That means that the unique axis (for cells which +have one) must be "c". + + +.SH KNOWN BUGS + +Don't run more than one indexamajig jobs simultaneously in the same working +directory - they'll overwrite each other's DirAx or MOSFLM files, causing subtle +problems which can't easily be detected. |