Dump an H5 File
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How do I get data into Keras?
Objectives
Get a data file you can use
Get your credentials etc
The script in the analysisbase-launcher expects your key to be in ~/.ssh/id_rsa. If you don’t want to use that key, you can run make-key.sh and copy the output into your GitLab SSH keys.
Set up your working directory
You’ll need to make your input data available within the image. Copy it into the work/data directory.
Building the dumper
You can load up the image by running the ./run-docker.sh script. This should (hopefully) drop you into a working image. I’ve added most of the annoying setup stuff to the .bashrc, so you should be ready to clone and build.
Clone the dumper from
ssh://git@gitlab.cern.ch:7999/deep-sets-example/higgs-regression-dumper.git
Then make a build directory and build it.
Producing an output file
You’ll have to run something like
./AnalysisPayload ../data/DAOD_FTAG5.p3870.pool.root
Your path may be something else.
This will produce an output file called jets.h5. Let’s take a look at it:
h5ls -v jets.h5
This prints a lot of information:
Opened "jets.h5" with sec2 driver.
jets                     Dataset {18757/Inf}
    Location:  1:1552
    Links:     1
    Chunks:    {2048} 24576 bytes
    Storage:   225084 logical bytes, 201354 allocated bytes, 111.79% utilization
    Filter-0:  deflate-1 OPT {7}
    Type:      struct {
                   "GhostHBosonsPt"       +0    native float
                   "GhostBHadronsFinalPt" +4    native float
                   "pt"                   +8    native float
               } 12 bytes
subjets                  Dataset {18757/Inf, 5/5}
    Location:  1:800
    Links:     1
    Chunks:    {2048, 5} 122880 bytes
    Storage:   1125420 logical bytes, 517620 allocated bytes, 217.42% utilization
    Filter-0:  deflate-1 OPT {7}
    Type:      struct {
                   "pt"                   +0    native float
                   "rnnip_pb"             +4    native float
                   "JetFitter_mass"       +8    native float
               } 12 bytes
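If the numbers in the listing look opaque, a quick sanity check can help. The sketch below recomputes the "logical bytes", chunk sizes, and utilization figures from the dataset shapes and the 12-byte record size. The helper names are made up for illustration. Utilization is logical size divided by allocated size, so a value above 100% means the deflate filter is saving space.

```python
# Numbers copied from the h5ls listing above.
N_JETS = 18757
SUBJETS_PER_JET = 5
CHUNK_ROWS = 2048
RECORD_BYTES = 12  # three native floats at 4 bytes each


def logical_bytes(n_records, record_bytes=RECORD_BYTES):
    """Uncompressed size of n_records fixed-size records."""
    return n_records * record_bytes


def utilization_percent(logical, allocated):
    """Logical size as a percentage of the bytes allocated on disk."""
    return 100.0 * logical / allocated


jets_logical = logical_bytes(N_JETS)
subjets_logical = logical_bytes(N_JETS * SUBJETS_PER_JET)
jets_chunk = logical_bytes(CHUNK_ROWS)
subjets_chunk = logical_bytes(CHUNK_ROWS * SUBJETS_PER_JET)

print("jets:   ", jets_logical, "logical bytes,", jets_chunk, "bytes per chunk")
print("subjets:", subjets_logical, "logical bytes,", subjets_chunk, "bytes per chunk")
print("jets utilization:    %.2f%%" % utilization_percent(jets_logical, 201354))
print("subjets utilization: %.2f%%" % utilization_percent(subjets_logical, 517620))
```

The subjets dataset compresses better than the jets dataset, which is what the higher utilization figure reflects.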
We’ll be trying to learn GhostHBosonsPt from the other variables. Note that subjets is a 2D array: we’re saving all the subjets in each large-R Higgs jet.
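As a rough sketch of what comes next, the file could be read in Python and split into a regression target and input arrays. This assumes the h5py and NumPy packages, which this lesson has not introduced. The function name is hypothetical, and the field and dataset names are taken from the h5ls listing.

```python
import h5py
import numpy as np

TARGET = "GhostHBosonsPt"  # field name taken from the h5ls listing


def load_inputs_and_target(path):
    """Read a jets.h5-style file and split the target from the inputs."""
    with h5py.File(path, "r") as h5file:
        jets = h5file["jets"][...]        # structured array, shape (n_jets,)
        subjets = h5file["subjets"][...]  # structured array, shape (n_jets, 5)

    target = jets[TARGET]

    # Stack the remaining per-jet fields into a plain float array
    # of shape (n_jets, n_jet_fields).
    jet_fields = [name for name in jets.dtype.names if name != TARGET]
    jet_inputs = np.stack([jets[name] for name in jet_fields], axis=1)

    # Same idea for the subjets: each field has shape (n_jets, 5), so the
    # stacked result has shape (n_jets, 5, n_subjet_fields).
    subjet_inputs = np.stack(
        [subjets[name] for name in subjets.dtype.names], axis=2
    )
    return jet_inputs, subjet_inputs, target
```

Calling `load_inputs_and_target("jets.h5")` should return plain float arrays, the kind of input a Keras model typically takes. How you combine the per-jet and per-subjet inputs depends on the network you build.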
Key Points
We always reformat our data before M’Learnin