General Guide and Examples:
Is something missing from this guide? Please post your questions on the discussions page!
Features of all (or most) functions:
Automatic handling of the Uproot duplicate-counter issue: when a hepconvert function goes ROOT -> ROOT (both the input and output files are ROOT) and the data are stored in jagged arrays, hepconvert automatically groups branches that share the same “fLeafCount”, so that Uproot does not create a separate counter branch for each branch.
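The grouping idea can be illustrated with a minimal sketch. It is not hepconvert's implementation: the branch names, the `leaf_counts` mapping, and the helper `group_by_counter` are hypothetical, standing in for the information read from each branch's fLeafCount.

```python
from collections import defaultdict


def group_by_counter(leaf_counts):
    """Group branch names by the counter they share.

    leaf_counts maps a branch name to the name of its counter
    (None for branches that are not jagged).
    """
    groups = defaultdict(list)
    for branch, counter in leaf_counts.items():
        groups[counter].append(branch)
    return dict(groups)


# Hypothetical branches: the two jagged Jet branches share the counter "nJet".
leaf_counts = {
    "Jet_Px": "nJet",
    "Jet_Py": "nJet",
    "Muon_Px": "nMuon",
    "event": None,
}
groups = group_by_counter(leaf_counts)
print(groups)
```

Branches that land in the same group can then be written together, so a single counter serves the whole group.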
Quick Modifications of ROOT files and TTrees:
The functions copy_root, merge_root, and root_to_parquet have a few options for applying quick modifications to ROOT files and TTree data.
- Branch slimming:
Parameters keep_branches or drop_branches (str, list, or dict) control branch slimming. Examples:
>>> hepconvert.copy_root("out_file.root", "in_file.root", keep_branches="x*", progress_bar=True, force=True)
# Before:
# name                 | typename                 | interpretation
# ---------------------+--------------------------+-------------------------------
# x1                   | int64_t                  | AsDtype('>i8')
# x2                   | int64_t                  | AsDtype('>i8')
# y1                   | int64_t                  | AsDtype('>i8')
# y2                   | int64_t                  | AsDtype('>i8')
# After:
# name                 | typename                 | interpretation
# ---------------------+--------------------------+-------------------------------
# x1                   | int64_t                  | AsDtype('>i8')
# x2                   | int64_t                  | AsDtype('>i8')
>>> hepconvert.copy_root("out_file.root", "in_file.root", keep_branches={"tree1": ["branch2", "branch3"], "tree2": ["branch2"]}, progress_bar=True, force=True)
# Before:
# Tree1:
# name                 | typename                 | interpretation
# ---------------------+--------------------------+-------------------------------
# branch1              | int64_t                  | AsDtype('>i8')
# branch2              | int64_t                  | AsDtype('>i8')
# branch3              | int64_t                  | AsDtype('>i8')
# Tree2:
# name                 | typename                 | interpretation
# ---------------------+--------------------------+-------------------------------
# branch1              | int64_t                  | AsDtype('>i8')
# branch2              | int64_t                  | AsDtype('>i8')
# branch3              | int64_t                  | AsDtype('>i8')
# After:
# Tree1:
# name                 | typename                 | interpretation
# ---------------------+--------------------------+-------------------------------
# branch2              | int64_t                  | AsDtype('>i8')
# branch3              | int64_t                  | AsDtype('>i8')
# Tree2:
# name                 | typename                 | interpretation
# ---------------------+--------------------------+-------------------------------
# branch2              | int64_t                  | AsDtype('>i8')
- Branch skimming:
Parameters cut and expressions control branch skimming. Both of these parameters are passed to Uproot’s iterate function. See Uproot’s documentation for more details. Basic example:
hepconvert.copy_root("skimmed_HZZ.root", "HZZ.root", keep_branches="Jet_*", force=True, expressions="Jet_Px", cut="Jet_Px >= 10")
- Remove TTrees:
Use parameters keep_trees or drop_trees to remove TTrees.
import numpy as np
import uproot
import hepconvert

# Creating example data:
with uproot.recreate("two_trees.root") as file:
    file["tree"] = {"x": np.array([1, 2, 3])}
    file["tree1"] = {"x": np.array([1, 2, 3])}

# Keep only the TTree named "tree":
hepconvert.copy_root("one_tree.root", "two_trees.root", keep_trees="tree", force=True)
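To make the branch-selection rules above concrete, here is a minimal pure-Python sketch of how wildcard patterns and per-tree dicts could pick branches. It is an illustration, not hepconvert's code: the helper `select_branches` is hypothetical, and glob-style matching via `fnmatch` is an assumption based on the "x*" example.

```python
from fnmatch import fnmatchcase


def select_branches(branches, keep=None, drop=None):
    """Return the branch names that survive slimming.

    keep / drop may be a single pattern or a list of patterns;
    this sketch accepts only one of the two at a time.
    """
    if keep is not None and drop is not None:
        raise ValueError("pass keep or drop, not both")
    patterns = keep if keep is not None else drop
    if patterns is None:
        return list(branches)
    if isinstance(patterns, str):
        patterns = [patterns]
    matched = [b for b in branches if any(fnmatchcase(b, p) for p in patterns)]
    if keep is not None:
        return matched
    return [b for b in branches if b not in matched]


branches = ["x1", "x2", "y1", "y2"]
kept = select_branches(branches, keep="x*")
dropped = select_branches(branches, drop=["y1"])

# Per-tree selection with a dict, mirroring the two-tree example above.
trees = {
    "tree1": ["branch1", "branch2", "branch3"],
    "tree2": ["branch1", "branch2", "branch3"],
}
keep_by_tree = {"tree1": ["branch2", "branch3"], "tree2": ["branch2"]}
slimmed = {
    name: select_branches(names, keep=keep_by_tree[name])
    for name, names in trees.items()
}
print(kept, dropped, slimmed)
```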
How hepconvert works with ROOT
hepconvert uses Uproot for reading and writing ROOT files, so it shares Uproot’s limitations. It currently works only with flat TTrees (nanoAOD-like data) and cannot yet read or write RNTuples.
As described in Uproot’s documentation:
Note
A small but growing list of data types can be written to files:
- strings: TObjString
- histograms: TH1*, TH2*, TH3*
- profile plots: TProfile, TProfile2D, TProfile3D
- NumPy histograms created with np.histogram, np.histogram2d, and np.histogramdd with 3 dimensions or fewer
- histograms that satisfy the Universal Histogram Interface (UHI) with 3 dimensions or fewer; this includes boost-histogram and hist
- PyROOT objects
Memory Management
Each hepconvert function has automatic and customizable memory management for working with large files.
Functions reading ROOT files will read in batches controlled by the parameter step_size.
Set step_size to either an int, to set the batch size to a number of entries, or a string giving a memory size in the form “100 MB”.
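As a rough illustration of the two forms of step_size, the sketch below turns either form into entry ranges for batched reading. It is not hepconvert's or Uproot's code: the helper names, the fixed number of bytes per entry, and the choice of 1 MB = 1024**2 bytes are all assumptions made for the example.

```python
import re

# Assumed unit table for this sketch (binary multiples).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def step_size_to_entries(step_size, bytes_per_entry):
    """Convert step_size (int entries or a string like "100 MB") to entries."""
    if isinstance(step_size, int):
        return step_size
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", step_size)
    if match is None or match.group(2).upper() not in UNITS:
        raise ValueError(f"cannot parse step_size {step_size!r}")
    num_bytes = float(match.group(1)) * UNITS[match.group(2).upper()]
    return max(1, int(num_bytes // bytes_per_entry))


def entry_ranges(num_entries, step):
    """Yield (start, stop) pairs covering all entries in batches of step."""
    for start in range(0, num_entries, step):
        yield start, min(start + step, num_entries)


# An int step_size is used directly as the number of entries per batch.
by_count = list(entry_ranges(10, step_size_to_entries(4, bytes_per_entry=8)))
# A string step_size is converted using the assumed size of one entry.
by_memory = step_size_to_entries("1 MB", bytes_per_entry=8)
print(by_count, by_memory)
```

A memory-based step size keeps the batch footprint roughly constant regardless of how many branches are read, which is why it is convenient for large files.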
Progress Bars
hepconvert uses the package tqdm for progress bars. If you do not have the package installed, an error message will provide installation instructions.
Progress bars are controlled with the progress_bar argument.
For example, to use a default progress bar with copy_root, set progress_bar to True:
hepconvert.copy_root("out_file.root", "in_file.root", progress_bar=True)
Some functions can handle a customized tqdm progress bar. To use one, make a progress bar object and pass it to the hepconvert function like so:
>>> import tqdm
>>> bar_obj = tqdm.tqdm(colour="GREEN", desc="Description")
>>> hepconvert.add_histograms("out_file.root", "path/in_files/", progress_bar=bar_obj)

Some types of tqdm progress bar objects may not work in this way.
Command Line Interface
All functions can be run from the command line. See the “Command Line Interface Instructions” section of the documentation for CLI instructions on individual functions.
Adding Histograms
hepconvert.add_histograms adds the values of many histograms and writes the summed histograms to an output file (like ROOT’s hadd, but limited to histograms).
Parameters of note:
- union: If True, adds the histograms that have the same name and appends all others to the new file.
- append: If True, appends histograms to an existing file. force and append cannot both be True.
- same_names: If True, only adds together histograms which have the same name (key). If False, histograms are added together based on TTree structure (bins must be equal).
Memory:
add_histograms currently has no memory customization available. To maintain performance, it stores the summed histograms in memory until all files have been read, and then writes them to the output file. Only one input ROOT file is read and kept in memory at a time.
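The accumulation strategy described above can be sketched in plain Python. This is an illustration under assumptions, not hepconvert's implementation: histograms are modelled as lists of bin contents keyed by name, `read_histograms` is a hypothetical stand-in for reading one ROOT file, and only the union=True behaviour is shown.

```python
def add_histograms_sketch(file_names, read_histograms):
    """Sum histograms by name, holding one input file's contents at a time."""
    sums = {}
    for file_name in file_names:
        histograms = read_histograms(file_name)  # only this file is in memory
        for name, bins in histograms.items():
            if name not in sums:
                # union=True: a histogram seen for the first time is kept as-is
                sums[name] = list(bins)
                continue
            if len(bins) != len(sums[name]):
                raise ValueError(f"binning of {name!r} differs in {file_name}")
            sums[name] = [a + b for a, b in zip(sums[name], bins)]
    return sums  # written to the output file once all inputs are read


# Hypothetical inputs: bin contents keyed by histogram name.
fake_files = {
    "a.root": {"pt": [1, 2, 3], "eta": [0, 1]},
    "b.root": {"pt": [10, 20, 30], "phi": [5]},
}
summed = add_histograms_sketch(fake_files, fake_files.__getitem__)
print(summed)
```

Memory use grows with the number of distinct histograms rather than with the number of input files, which is the trade-off the paragraph above describes.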
Merging TTrees
hepconvert.merge_root merges TTrees in multiple ROOT files together. The end result is a single file containing data from all input files (again like ROOT’s hadd, but limited to flat TTrees and histograms).
Warning
At the moment, hepconvert.merge_root can only merge TTrees that have the same number of branches, with the same names and datatypes. We are working on adding backfill capabilities for mismatched TTrees.
Features:
merge_root has parameters cut, expressions, drop_branches, keep_branches, drop_trees, and keep_trees.
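The restriction in the warning above amounts to a schema check before concatenation. The sketch below shows one way such a check could look. It is hypothetical, not the check hepconvert performs: the helpers `same_schema` and `merge_columns`, the branch names, and the representation of a TTree's schema as a dict from branch name to type name are all assumptions.

```python
def same_schema(first, second):
    """Return True if two TTrees have identical branch names and datatypes.

    Each schema maps a branch name to a type name, for example "int64_t".
    """
    return first == second


def merge_columns(trees):
    """Concatenate per-branch data from trees that share one layout."""
    merged = {name: [] for name in trees[0]}
    for tree in trees:
        if set(tree) != set(merged):
            raise ValueError("TTrees have different branches; cannot merge")
        for name, values in tree.items():
            merged[name].extend(values)
    return merged


# Hypothetical schemas: the third one is missing branch "y".
schema_a = {"x": "int64_t", "y": "double"}
schema_b = {"y": "double", "x": "int64_t"}
schema_c = {"x": "int64_t"}
compatible = same_schema(schema_a, schema_b)
incompatible = same_schema(schema_a, schema_c)

merged = merge_columns([
    {"x": [1, 2], "y": [0.5, 1.5]},
    {"x": [3], "y": [2.5]},
])
print(compatible, incompatible, merged)
```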
Copying TTrees
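hepconvert.copy_root copies the TTrees from one ROOT file into a new ROOT file, optionally applying the modifications described in “Quick Modifications of ROOT files and TTrees”.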
Features:
copy_root has parameters cut, expressions, drop_branches, keep_branches, drop_trees, and keep_trees.
Parquet to ROOT
Writes the data from a single Parquet file to one TTree in a ROOT file.
This function creates a new TTree (name the new tree with the parameter tree).
ROOT to Parquet
Writes the data from one TTree in a ROOT file to a single Parquet file.
If there are multiple TTrees in the file, specify one TTree to write to the Parquet file using the tree parameter.
Features:
root_to_parquet has parameters cut, expressions, drop_branches, and keep_branches.