Using the pipeline
Basics
The primary user interface for this code is a command line tool, util_RIFT_pseudo_pipe.py, which is available after following the installation instructions. To see the help for this tool, run
$ util_RIFT_pseudo_pipe.py --help
If you leave options unspecified, the pipeline will often try to guess the options you want, based on previous experience performing real GW inference. We strongly recommend that most users fully specify the options via an ini file. The two invocations below contrast ini-file use with options specified on the command line:
$ util_RIFT_pseudo_pipe.py --use-ini my.ini --use-coinc coinc.xml
# or, for a freestanding example
$ util_RIFT_pseudo_pipe.py --gracedb-id G329473 --approx IMRPhenomD --calibration C01 --make-bw-psds --l-max 2 --choose-data-LI-seglen
Note that the code uses selected environment variables to identify optional external dependencies needed for various features. For most users, the most important of these carry their accounting access information:
export LIGO_USER_NAME=albert.einstein
export LIGO_ACCOUNTING=ligo.dev.o4.rift
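Before building a workflow, it can help to confirm that these variables are visible to the process that will launch the pipeline. The following is a minimal sketch, not part of RIFT: the helper name missing_env_vars is hypothetical, and the variable list contains only the two names shown above.

```python
import os

# Variables named in the text above; extend this tuple for your own setup.
REQUIRED_VARS = ("LIGO_USER_NAME", "LIGO_ACCOUNTING")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Unset environment variables:", ", ".join(missing))
    else:
        print("Accounting environment variables are set.")
```

Running this in the same shell in which you intend to call the pipeline shows whether the export lines took effect.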
When you run util_RIFT_pseudo_pipe.py, the pipeline will create a directory structure as follows:
long_directory_name_here/
-> local.cache
-> iteration_0_ile/
-> iteration_0_cip/
-> iteration_0_con/
-> iteration_0_test/
-> iteration_1_ile/
-> ...
-> ILE.sub
-> CIP*.sub
-> marginalize_intrinsic_parameters_BasicIterationWorkflow.dag
Inside each iteration directory is a logs subdirectory. The iteration directories will initially be empty, except for the logfile locations. The top-level directory contains several *.sub submission scripts, along with the top-level DAG submission script.
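To check that the layout was created as expected, one option is to list the iteration directories programmatically. This is an illustrative sketch only: the function name iteration_stages is hypothetical, and the naming pattern iteration_<number>_<stage> is inferred from the listing above.

```python
import re
from pathlib import Path

# Pattern inferred from the directory listing: iteration_<number>_<stage>
ITERATION_DIR = re.compile(r"^iteration_(\d+)_(\w+)$")


def iteration_stages(run_dir):
    """Map each iteration number to the sorted stage names found in run_dir."""
    stages = {}
    for entry in Path(run_dir).iterdir():
        match = ITERATION_DIR.match(entry.name)
        if entry.is_dir() and match:
            stages.setdefault(int(match.group(1)), []).append(match.group(2))
    return {number: sorted(names) for number, names in sorted(stages.items())}


if __name__ == "__main__":
    # Run from inside the generated run directory.
    for number, names in iteration_stages(".").items():
        print(f"iteration {number}: {', '.join(names)}")
```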
The standard RIFT pipeline only works within an HTCondor scheduling environment. To submit the workflow, use
$ condor_submit_dag marginalize_intrinsic_parameters_BasicIterationWorkflow.dag
Before you submit a workflow, however, we recommend you first confirm you’ve set it up correctly by running one of the worker jobs interactively from the command line. This is a great way to catch common configuration errors. Within the directory, there should be a script called command-single.sh. This contains a single worker job, so you can simply run this script to confirm that your worker jobs will proceed smoothly.
$ ./command-single.sh
This command will run anywhere; however, it will only test the GPU configuration if you run it on a machine with a GPU, like pcdev13 or pcdev11 at CIT. You’ll see a lot of output about reading in data, defining parameters, et cetera. Wait until you start seeing large arrays of numbers interspersed with the words Weight entropy (after histogram). At this point you may kill the script and submit the DAG as described above. Feel free to watch the job progress with:
$ watch condor_q
The workflow loosely consists of two parts: worker ILE jobs, which evaluate the marginalized likelihood; and fitting/posterior jobs, which fit the marginalized likelihood and estimate the posterior distribution. Other nodes help group the output of individual jobs and iterations together.
As your run proceeds, files will begin to appear in your directory. A description of some of the files is as follows:
overlap-grid-0.xml.gz: The initial grid used in the iterative analysis. You’re free to use any grid you want (e.g., the output of some previous analysis), and the workflow can also do the initial grid creation.
ILE.sub: The submit file for the individual worker ILE jobs. This contains the command line arguments passed to integrate_likelihood_extrinsic. If something is going wrong when your ILE jobs run, this file is a good place to check to ensure the code is using the settings you intended.
CIP.sub: The submit file for the individual fitting jobs.
iteration_*: Directories holding the output of each iteration, including log files.
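When checking the settings in ILE.sub, it can be convenient to pull out just the argument string. The sketch below assumes the submit file stores them in a single-line "arguments = ..." entry, as is common in HTCondor submit files. The helper name submit_file_arguments is hypothetical and not part of RIFT.

```python
def submit_file_arguments(path):
    """Return the value of the 'arguments' entry in a submit file, or None."""
    with open(path) as handle:
        for line in handle:
            # Split on the first '=' only, so '=' inside the arguments is kept.
            key, sep, value = line.partition("=")
            if sep and key.strip().lower() == "arguments":
                return value.strip().strip('"').strip()
    return None


if __name__ == "__main__":
    arguments = submit_file_arguments("ILE.sub")
    if arguments is None:
        print("No single-line 'arguments' entry found in ILE.sub")
    else:
        # Print one whitespace-separated token per line for easier reading.
        print("\n".join(arguments.split()))
```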
As the workflow progresses, you’ll see the following additional files:
consolidated_*: These files (particularly those ending in .composite) are the output of each iteration’s ILE jobs. Each file is a list of intrinsic parameters and the value of the marginalized likelihood at those parameters. (The remaining files provide provenance for how the .composite file was produced.)
output-grid-?.xml.gz: These files are inferred intrinsic, detector-frame posterior distributions from that iteration, expressed as an XML file.
posterior-samples-*.dat: These files are reformatted versions of the corresponding XML file, using the command convert_output_format_ile2inference. This data format should be compatible with LALInference and related postprocessing tools. The final output posterior samples are used to create PP plots. Corner plots for a user-specified number of iterations are also created using these files.
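One rough way to follow progress is to count how many likelihood evaluations each iteration has accumulated. The sketch below assumes that each .composite file is plain text with one evaluated point per line and that lines starting with # are comments; neither detail is stated above, so check one of your own files first. The function name composite_point_counts is hypothetical.

```python
from pathlib import Path


def composite_point_counts(run_dir):
    """Count non-empty, non-comment lines in each consolidated_*.composite file."""
    counts = {}
    for path in sorted(Path(run_dir).glob("consolidated_*.composite")):
        with path.open() as handle:
            counts[path.name] = sum(
                1
                for line in handle
                if line.strip() and not line.lstrip().startswith("#")
            )
    return counts


if __name__ == "__main__":
    # Run from inside the run directory.
    for name, count in composite_point_counts(".").items():
        print(f"{name}: {count} evaluated points")
```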
Understanding ILE and CIP
ILE.sub
The ILE.sub file contains the call to and arguments for integrate_likelihood_extrinsic_batchmode. This step integrates the prefactored likelihood function over the extrinsic parameters. The marginalized likelihood is calculated by individual parallel worker jobs and passed to the next (CIP) step.
CIP.sub
The file called CIP.sub contains the call to and arguments for util_ConstructIntrinsicPosterior_GenericCoordinates.py. During this step, the log-likelihood data is loaded and the peak is fitted using some particular coordinate system. The fit is passed as an input to the Monte Carlo sampler, where samples are drawn from the posterior distribution. These samples become the inputs for the next iteration.
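The alternation between the two steps can be illustrated with a one-dimensional toy. This is a conceptual sketch only and is not RIFT's implementation: the Gaussian log-likelihood stands in for an ILE worker, a least-squares quadratic fit followed by Gaussian sampling stands in for CIP, and every function name here is hypothetical.

```python
import math
import random


def toy_marginalized_lnL(x):
    """Stand-in for an ILE worker: Gaussian log-likelihood peaked at x = 1.5."""
    return -0.5 * ((x - 1.5) / 0.2) ** 2


def quadratic_fit(xs, ys):
    """Least-squares coefficients (a, b, c) of y ~ a*x**2 + b*x + c."""
    rows = [[x * x, x, 1.0] for x in xs]
    # Augmented normal equations [A^T A | A^T y] for design rows [x^2, x, 1].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_and_sample(points, lnL_values, n_samples, rng):
    """Stand-in for CIP: fit the peak with a quadratic, then sample from it."""
    a, b, _ = quadratic_fit(points, lnL_values)
    mean = -b / (2.0 * a)                # location of the fitted peak
    sigma = math.sqrt(-1.0 / (2.0 * a))  # width implied by the curvature
    return [rng.gauss(mean, sigma) for _ in range(n_samples)]


def run_toy_iterations(n_iterations=3, n_points=20, seed=0):
    """Alternate likelihood evaluation and fit/sample, starting from a wide grid."""
    rng = random.Random(seed)
    grid = [-5.0 + 10.0 * i / (n_points - 1) for i in range(n_points)]
    for _ in range(n_iterations):
        lnL_values = [toy_marginalized_lnL(x) for x in grid]  # "ILE" step
        grid = fit_and_sample(grid, lnL_values, n_points, rng)  # "CIP" step
    return grid


if __name__ == "__main__":
    samples = run_toy_iterations()
    print("mean of final samples:", sum(samples) / len(samples))
```

In the toy, each pass places the next set of evaluation points where the fitted posterior has support, which is the role the samples play as inputs to the next iteration.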
Initialization: PSDs and grids
The RIFT pipeline by default can attempt to generate a PSD using existing tools. However, most users will want to supply a contemporary PSD for analysis on real GW events. RIFT uses an XML format, requiring some conversion for the input PSDs.
Strongly recommended dependencies
We strongly recommend that you install CUDA and cupy, and define the corresponding environment variables:
# should be provided by igwn
export CUDA_DIR=/usr/local/cuda # only needed for GPU code
export PATH=${PATH}:${CUDA_DIR}/bin # only needed for GPU code
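A quick way to see whether the GPU-related pieces are visible from Python is sketched below. It only checks that CUDA_DIR is set, that the nvcc compiler can be found on PATH, and that a cupy package is installed; it does not test that a GPU is actually usable. The function name gpu_environment_report is hypothetical.

```python
import importlib.util
import os
import shutil


def gpu_environment_report(environ=None):
    """Report which GPU-related settings are visible, without importing cupy."""
    env = os.environ if environ is None else environ
    return {
        "CUDA_DIR set": bool(env.get("CUDA_DIR")),
        "nvcc on PATH": shutil.which("nvcc", path=env.get("PATH")) is not None,
        "cupy importable": importlib.util.find_spec("cupy") is not None,
    }


if __name__ == "__main__":
    for check, ok in gpu_environment_report().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```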
Additional environment variables are needed if you want to use waveforms through a non-lalsimulation interface. Such waveforms include the python implementation of surrogate waveforms; NR waveforms; or the C++ implementation of TEOBResumS. While we provide the necessary environment variables below, please contact one of the developers for appropriate settings, and keep in mind some surrogates and/or simulations and/or waveforms may be provided in advance of publication or release to the broader community.
export NR_BASE=/home/oshaughn/unixhome/PersonalNRArchive/Archives/
export GW_SURROGATE= # your installation of gwsurrogate goes here
export PYTHONPATH=${PYTHONPATH}:${GW_SURROGATE}
util_RIFT_pseudo_pipe.py help
For reference, here is the full output of
$ util_RIFT_pseudo_pipe.py --help
usage: util_RIFT_pseudo_pipe.py [-h] [--use-production-defaults] [--use-subdags] [--use-ini USE_INI] [--use-rundir USE_RUNDIR]
[--use-online-psd-file USE_ONLINE_PSD_FILE] [--use-coinc USE_COINC] [--manual-ifo-list MANUAL_IFO_LIST] [--online]
[--extra-args-helper EXTRA_ARGS_HELPER] [--manual-postfix MANUAL_POSTFIX] [--gracedb-id GRACEDB_ID] [--gracedb-exe GRACEDB_EXE]
[--use-legacy-gracedb] [--internal-use-gracedb-bayestar] [--event-time EVENT_TIME] [--calibration CALIBRATION] [--playground-data]
[--approx APPROX] [--use-gwsurrogate] [--l-max L_MAX] [--no-matter] [--assume-nospin] [--assume-precessing]
[--assume-nonprecessing] [--assume-matter] [--assume-lowlatency-tradeoffs] [--assume-highq] [--assume-well-placed]
[--internal-marginalize-distance] [--internal-marginalize-distance-file INTERNAL_MARGINALIZE_DISTANCE_FILE]
[--internal-distance-max INTERNAL_DISTANCE_MAX] [--internal-correlate-default]
[--internal-force-iterations INTERNAL_FORCE_ITERATIONS] [--internal-flat-strategy] [--internal-use-amr]
[--internal-use-amr-bank INTERNAL_USE_AMR_BANK] [--internal-use-amr-puff] [--internal-use-aligned-phase-coordinates]
[--external-fetch-native-from EXTERNAL_FETCH_NATIVE_FROM] [--internal-propose-converge-last-stage] [--add-extrinsic]
[--batch-extrinsic] [--fmin FMIN] [--fmin-template FMIN_TEMPLATE] [--data-LI-seglen DATA_LI_SEGLEN] [--choose-data-LI-seglen]
[--fix-bns-sky] [--ile-sampler-method ILE_SAMPLER_METHOD] [--ile-n-eff ILE_N_EFF] [--cip-sampler-method CIP_SAMPLER_METHOD]
[--cip-fit-method CIP_FIT_METHOD] [--cip-internal-use-eta-in-sampler] [--ile-jobs-per-worker ILE_JOBS_PER_WORKER] [--ile-no-gpu]
[--ile-force-gpu] [--fake-data-cache FAKE_DATA_CACHE] [--spin-magnitude-prior SPIN_MAGNITUDE_PRIOR]
[--force-chi-max FORCE_CHI_MAX] [--force-mc-range FORCE_MC_RANGE] [--force-eta-range FORCE_ETA_RANGE]
[--force-hint-snr FORCE_HINT_SNR] [--force-initial-grid-size FORCE_INITIAL_GRID_SIZE] [--hierarchical-merger-prior-1g]
[--hierarchical-merger-prior-2g] [--link-reference-pe] [--link-reference-psds] [--make-bw-psds] [--link-bw-psds]
[--use-online-psd] [--ile-retries ILE_RETRIES] [--general-retries GENERAL_RETRIES]
[--ile-runtime-max-minutes ILE_RUNTIME_MAX_MINUTES] [--fit-save-gp] [--cip-explode-jobs CIP_EXPLODE_JOBS]
[--cip-explode-jobs-last CIP_EXPLODE_JOBS_LAST] [--cip-quadratic-first] [--n-output-samples N_OUTPUT_SAMPLES]
[--internal-cip-cap-neff INTERNAL_CIP_CAP_NEFF] [--internal-cip-temper-log] [--internal-ile-sky-network-coordinates]
[--internal-ile-freezeadapt] [--internal-ile-adapt-log] [--manual-initial-grid MANUAL_INITIAL_GRID]
[--manual-extra-ile-args MANUAL_EXTRA_ILE_ARGS] [--verbose] [--use-quadratic-early] [--use-gp-early] [--use-cov-early] [--use-osg]
[--use-osg-file-transfer] [--condor-local-nonworker] [--condor-nogrid-nonworker] [--use-osg-simple-requirements]
[--archive-pesummary-label ARCHIVE_PESUMMARY_LABEL] [--archive-pesummary-event-label ARCHIVE_PESUMMARY_EVENT_LABEL]
Each of the optional arguments is described below:
- -h, --help
show this help message and exit
- --use-production-defaults
Use production defaults. Intended for use with tools like asimov or by nonexperts who just want something to run on a real event. Will require manual setting of other arguments!
- --use-subdags
Use CEPP_Alternate instead of CEPP_BasicIteration
- --use-ini USE_INI
Pass ini file for parsing. Intended to reproduce lalinference_pipe functionality. Overrides most other arguments. Full path recommended
- --use-rundir USE_RUNDIR
Intended to reproduce lalinference_pipe functionality. Must be absolute path.
- --use-online-psd-file USE_ONLINE_PSD_FILE
Provides specific online PSD file, so no downloads are needed
- --use-coinc USE_COINC
Intended to reproduce lalinference_pipe functionality
- --manual-ifo-list MANUAL_IFO_LIST
Overrides IFO list normally retrieved by event ID. Use with care (e.g., glitch studies) or for events specified with --event-time.
- --online
online
- --extra-args-helper EXTRA_ARGS_HELPER
Filename with arguments for the helper. Use to provide alternative channel names and other advanced configuration (--channel-name, data type)!
- --manual-postfix MANUAL_POSTFIX
manual postfix
- --gracedb-id GRACEDB_ID
event id from gracedb
- --gracedb-exe GRACEDB_EXE
exe from gracedb
- --use-legacy-gracedb
gracedb legacy
- --internal-use-gracedb-bayestar
Retrieve BS skymap from gracedb (bayestar.fits), and use it internally in integration with --use-skymap bayestar.fits.
- --event-time EVENT_TIME
Event time. Intended to override use of GracedbID. MUST provide --manual-initial-grid
- --calibration CALIBRATION
calibration
- --playground-data
Passed through to helper_LDG_events, and changes name prefix
- --approx APPROX
Approximant. REQUIRED
- --use-gwsurrogate
Attempt to use gwsurrogate instead of lalsuite.
- --l-max L_MAX
lmax
- --no-matter
Force analysis without matter. Really only matters for BNS
- --assume-nospin
Force analysis with zero spin
- --assume-precessing
Force analysis with transverse spins
- --assume-nonprecessing
Force analysis without transverse spins
- --assume-matter
Force analysis with matter. Really only matters for BNS
- --assume-lowlatency-tradeoffs
Force analysis with various low-latency tradeoffs (e.g., drop spin 2, use aligned, etc)
- --assume-highq
Force analysis with the high-q strategy, neglecting spin2. Passed to ‘helper’
- --assume-well-placed
If present, the code will adopt a strategy that assumes the initial grid is very well placed, and will minimize the number of early iterations performed. Not as extreme as --propose-flat-strategy
- --internal-marginalize-distance
If present, the code will marginalize over the distance variable. Passed directly to helper script. Default will be to generate d_marg script on the fly
- --internal-marginalize-distance-file INTERNAL_MARGINALIZE_DISTANCE_FILE
Filename for marginalization file. You MUST make sure the max distance is set correctly
- --internal-distance-max INTERNAL_DISTANCE_MAX
If present, the code will use this as the upper limit on distance (overriding the distance maximum in the ini file, or any other setting). Required to use internal-marginalize-distance in most circumstances
- --internal-correlate-default
Force joint sampling in mc, delta_mc, s1z and possibly s2z
- --internal-force-iterations INTERNAL_FORCE_ITERATIONS
If integer provided, overrides internal guidance on number of iterations, attempts to force prolonged run. By default puts convergence tests on
- --internal-flat-strategy
Use the same CIP options for every iteration, with convergence tests on. Passes --test-convergence,
- --internal-use-amr
Changes refinement strategy (and initial grid) to use. PRESENTLY WE CAN’T MIX AND MATCH AMR, CIP ITERATIONS, so this is fixed for the whole run right now; use continuation and ‘fetch’ to augment
- --internal-use-amr-bank INTERNAL_USE_AMR_BANK
Bank used for template
- --internal-use-amr-puff
Use puffball with AMR (as usual). May help with stalling
- --internal-use-aligned-phase-coordinates
If present, instead of using mc…chi-eff coordinates for aligned spin, will use SM’s phase-based coordinates. Requires spin for now
- --external-fetch-native-from EXTERNAL_FETCH_NATIVE_FROM
Directory name of run where grids will be retrieved. Recommend this is for an ACTIVE run, or otherwise producing a large grid so the retrieved grid changes/isn’t fixed
- --internal-propose-converge-last-stage
Pass through to helper
- --add-extrinsic
add extrinsic params
- --batch-extrinsic
batch
- --fmin FMIN
Minimum frequency for integration. Template minimum frequency (we hope) so all modes resolved at this frequency
- --fmin-template FMIN_TEMPLATE
Minimum frequency for template. If provided, then overrides automated settings for fmin-template = fmin/Lmax
- --data-LI-seglen DATA_LI_SEGLEN
If specified, passed to the helper. Uses data selection appropriate to LI. Must specify the specific LI seglen used.
- --choose-data-LI-seglen
choose data seglen
- --fix-bns-sky
fix bns
- --ile-sampler-method ILE_SAMPLER_METHOD
choose ile sampler method
- --ile-n-eff ILE_N_EFF
ILE n_eff passed to helper/downstream. Default internally is 50; lower is faster but less accurate, going much below 10 could be dangerous
- --cip-sampler-method CIP_SAMPLER_METHOD
choose cip sampler method
- --cip-fit-method CIP_FIT_METHOD
choose cip fit method
- --cip-internal-use-eta-in-sampler
Use ‘eta’ as a sampling parameter. Designed to make GMM sampling behave particularly nicely for objects which could be equal mass
- --ile-jobs-per-worker ILE_JOBS_PER_WORKER
Default will be 20 per worker usually for moderate-speed approximants, and more for very fast configurations
- --ile-no-gpu
not using gpu during ile stage
- --ile-force-gpu
force gpu use for ile jobs
- --fake-data-cache FAKE_DATA_CACHE
fake data cache
- --spin-magnitude-prior SPIN_MAGNITUDE_PRIOR
options are default [volumetric for precessing,uniform for aligned], volumetric, uniform_mag_prec, uniform_mag_aligned, zprior_aligned
- --force-chi-max FORCE_CHI_MAX
Provide this value to override the value of chi-max provided
- --force-mc-range FORCE_MC_RANGE
Pass this argument through to the helper to set the mc range
- --force-eta-range FORCE_ETA_RANGE
Pass this argument through to the helper to set the eta range
- --force-hint-snr FORCE_HINT_SNR
Pass this argument through to the helper to control source amplitude effects
- --force-initial-grid-size FORCE_INITIAL_GRID_SIZE
Only used for automated grids. Passes --force-initial-grid-size down to helper
- --hierarchical-merger-prior-1g
As in 1903.06742
- --hierarchical-merger-prior-2g
As in 1903.06742
- --link-reference-pe
If present, creates a directory ‘reference_pe’ and adds symbolic links to fiducial samples. These can be used by the automated plotting code. Requires LVC_PE_SAMPLES environment variable defined!
- --link-reference-psds
If present, uses the variable LVC_PE_CONFIG to find a ‘reference_pe_config_map.dat’ file, which provides the location for reference PSDs. Will override PSDs used / setup by default
- --make-bw-psds
If present, adds nodes to create BW PSDs to the dag. If at all possible, avoid this and re-use existing PSDs
- --link-bw-psds
If present, uses the script retrieve_bw_psd_for_event.sh to find a precomputed BW psd, and convert it to our format
- --use-online-psd
If present, will use the online PSD estimates
- --ile-retries ILE_RETRIES
number retries for ile jobs
- --general-retries GENERAL_RETRIES
number retries general, for DAG
- --ile-runtime-max-minutes ILE_RUNTIME_MAX_MINUTES
If not none, kills ILE jobs that take longer than the specified integer number of minutes. Do not use unless an expert
- --fit-save-gp
If true, pass this argument to CIP. GP plot for each iteration will be saved. Useful for followup investigations or reweighting. Warning: lots of disk space (1G or so per iteration)
- --cip-explode-jobs CIP_EXPLODE_JOBS
explode jobs cip
- --cip-explode-jobs-last CIP_EXPLODE_JOBS_LAST
Number of jobs to use in last stage. Hopefully in future auto-set
- --cip-quadratic-first
cip quadratic
- --n-output-samples N_OUTPUT_SAMPLES
Number of output samples generated in the final iteration
- --internal-cip-cap-neff INTERNAL_CIP_CAP_NEFF
Largest value for CIP n_eff to use for non-final iterations. ALWAYS APPLIED.
- --internal-cip-temper-log
Use temper_log in CIP. Helps stabilize adaptation for high q for example
- --internal-ile-sky-network-coordinates
Passthrough to ILE
- --internal-ile-freezeadapt
Passthrough to ILE
- --internal-ile-adapt-log
Passthrough to ILE
- --manual-initial-grid MANUAL_INITIAL_GRID
Filename (full path) to initial grid. Copied into proposed-grid.xml.gz, overwriting any grid assignment done here
- --manual-extra-ile-args MANUAL_EXTRA_ILE_ARGS
Avenue to adjoin extra ILE arguments. Needed for unusual configurations (e.g., if channel names are not being selected, etc)
- --verbose
verbose print everything
- --use-quadratic-early
If provided, use a quadratic fit in the early iterations
- --use-gp-early
If provided, use a gp fit in the early iterations
- --use-cov-early
If provided, use cov fit in the early iterations
- --use-osg
Restructuring for ILE on OSG. The code by default will use CVMFS
- --use-osg-file-transfer
Restructuring for ILE on OSG. The code will NOT use CVMFS, and instead will try to transfer the frame files.
- --condor-local-nonworker
Provide this option if job will run in non-NFS space.
- --condor-nogrid-nonworker
NOW STANDARD, auto-set if you pass use-osg. Causes flock_local for ‘internal’ jobs
- --use-osg-simple-requirements
Provide this option if job should use a more aggressive setting for OSG matching
- --archive-pesummary-label ARCHIVE_PESUMMARY_LABEL
If provided, creates a ‘pesummary’ directory and fills it with this run’s final output at the end of the run
- --archive-pesummary-event-label ARCHIVE_PESUMMARY_EVENT_LABEL
Label to use on the pesummary page itself