Backend for columnar, fully orchestrated HEP analyses with pure Python, law and order.
The 0.3 release introduces many performance fixes and new features such as
- a new interface for all task array functions (calibrators, selectors, producers, etc.; see the sketch after this list),
- support for plotting data of multiple data taking campaigns at once,
- a simplified machine learning interface, and
- statistical inference models with support for merging data of different campaigns.
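For orientation, here is a minimal sketch of one such task array function: a producer that adds a single column to an event chunk. It follows the decorator pattern documented for columnflow, but since the 0.3 release reworks this interface, treat the exact names and signatures as illustrative rather than authoritative.

```python
# Illustrative producer sketch based on columnflow's documented decorator
# pattern; v0.3 may differ in details.
from columnflow.production import Producer, producer
from columnflow.columnar_util import set_ak_column
from columnflow.util import maybe_import

ak = maybe_import("awkward")


@producer(
    uses={"Jet.pt"},       # columns read from the input chunk
    produces={"jet1_pt"},  # columns added to the output chunk
)
def jet1_pt(self: Producer, events: ak.Array, **kwargs) -> ak.Array:
    # leading-jet pt; events without jets get a placeholder value
    pt = ak.fill_none(ak.firsts(events.Jet.pt), 0.0)
    return set_ak_column(events, "jet1_pt", pt)
```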
However, some of these changes may break existing code. Check out the v0.2 → v0.3 transition guide as well as the release notes for a detailed overview of the changes and how to adapt your code.
Version 0.2 continues to be available via the legacy/v0.2
branch, with the latest release being v0.2.5.
This project is in an advanced beta phase. The project setup, suggested workflows, definitions of particular tasks, and the signatures of various helper classes and functions are mostly frozen but could still be subject to change in the near future. Various large-scale analyses based on columnflow have been performed, and others are in development; in the process, they help test and verify various aspects of the framework.
To create an analysis using columnflow, it is recommended to start from a predefined template (located in analysis_templates). The following command (no previous git clone required) interactively asks for a handful of names and settings, and creates a minimal, yet fully functioning project structure for you!
bash -c "$(curl -Ls https://raw.githubusercontent.com/columnflow/columnflow/master/create_analysis.sh)"
At the end of the setup, you will see further instructions and suggestions to run your first analysis tasks (example below).
Setup successful! The next steps are:
1. Set up the repository and install the environment.
> cd
> source setup.sh [recommended_yet_optional_setup_name]
2. Run local tests & linting checks to verify that the analysis is set up correctly.
> ./tests/run_all
3. Create a GRID proxy if you intend to run tasks that need one.
> voms-proxy-init -rfc -valid 196:00
4. Check out the 'Getting started' guide to run your first tasks.
https://columnflow.readthedocs.io/en/stable/start.html
Suggestions for tasks to run:
a) Run the 'calibration -> selection -> reduction' pipeline for the first file of the
default dataset using the default calibrator and default selector
(enter the command below and 'tab-tab' to see all arguments or add --help for help)
> law run cf.ReduceEvents --version dev1 --branch 0
Verify what you just ran by adding '--print-status -1' (-1 = fully recursive)
> law run cf.ReduceEvents --version dev1 --branch 0 --print-status -1
b) Create the jet1_pt distribution for the single top datasets
(if you have an image/pdf viewer installed, add it via '--view-cmd <binary>')
> law run cf.PlotVariables1D --version dev1 --datasets 'st*' --variables jet1_pt
Again, verify what you just ran, now with recursion depth 4
> law run cf.PlotVariables1D --version dev1 --datasets 'st*' --variables jet1_pt --print-status 4
c) Include the ttbar dataset and also plot jet1_eta (both variables are defined in the analysis config, as sketched below)
> law run cf.PlotVariables1D --version dev1 --datasets 'tt*,st*' --variables jet1_pt,jet1_eta
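The plotted quantities (jet1_pt, jet1_eta) are not hard-coded in the tasks but registered as variables on the analysis config. A minimal sketch, assuming the order-based config API used by the analysis templates; the campaign/config objects, binning, and titles below are made up for illustration:

```python
# Illustrative sketch of registering a plottable variable; in a real
# analysis, the campaign and config objects come from the template's
# config setup rather than being created ad hoc like this.
import order as od

campaign = od.Campaign(name="example_campaign", id=1, ecm=13)
config = od.Config(name="example_config", id=1, campaign=campaign)

config.add_variable(
    name="jet1_pt",
    expression="Jet.pt[:,0]",  # leading-jet pt via awkward slicing
    binning=(40, 0.0, 400.0),  # (n_bins, x_min, x_max)
    unit="GeV",
    x_title=r"Leading jet $p_{T}$",
)
```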
For a better overview of the tasks that are triggered by the commands above, check out the current (yet stylized) task graph.
Analysis projects based on columnflow include:
- hh2bbtautau: HH → bb𝜏𝜏 analysis with CMS.
- hh2bbww: HH → bbWW analysis with CMS.
- topmass: Top quark mass measurement with CMS.
- mttbar: Search for heavy resonances in ttbar events with CMS.
- analysis playground: TODO
- topsf: Top tagging scale factor measurement.
- hto4l: H → ZZ → 4l analysis with CMS.
- DiJetJERC: Di-jet analysis with CMS.
This project follows the all-contributors specification.
- Source hosted at GitHub
- Report issues, questions, feature requests on GitHub Issues