Very much a work in progress
To determine how (power) efficient our new Jetson is for numerical computations, we shall consider a few things, one at a time:
- A single core of the ARM Cortex CPU
- The low power core of the Cortex
- The NVIDIA GPU
- All the CPU+GPU cores on the Jetson
As well as what to measure, there are questions over how to measure:
- using a mains power meter to get the macro measurement of energy consumed by the Jetson
- component-based measurement, particularly for the GPU or the CPU (but not for the fan?), e.g. via the voltage drop
But things are never simple. For example, there are a lot of questions along the way:
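Whichever way the power is sampled, the quantity we want in the end is energy. As a minimal sketch (not a description of any particular meter or of the Jetson's own sensors), the snippet below integrates timestamped power readings with the trapezoidal rule, and converts a voltage drop into power. The function names, the idea of a shunt resistor of known resistance, and all the numbers in the usage note are assumptions made for illustration.

```python
def energy_joules(times_s, power_w):
    """Integrate power samples (watts) over time (seconds), trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def power_from_voltage_drop(v_drop, r_shunt_ohm, v_supply):
    """Power drawn through a shunt resistor (hypothetical set-up).

    Current is I = v_drop / r_shunt_ohm (Ohm's law) and power is P = v_supply * I.
    """
    current_a = v_drop / r_shunt_ohm
    return v_supply * current_a
```

For example, three made-up samples of a steady 10 W taken one second apart, `energy_joules([0.0, 1.0, 2.0], [10.0, 10.0, 10.0])`, give 20 J. Dividing the floating point operations performed during the run by this figure gives operations per joule.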
- Is our code achieving good compute performance (ie are we efficiently using the cycles)?
- What is the theoretical peak performance of a Cortex core?
- How do we ensure the Jetson OS isn't burning energy on housekeeping, X11 and so on?
- Which code should we use for such benchmarking?
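The first two questions are linked: how well we use the cycles can be expressed as achieved performance divided by theoretical peak. A rough sketch of that arithmetic follows, assuming the usual back-of-envelope model of peak = cores x clock rate x floating point operations per cycle. The function names are hypothetical, and the real values for the Jetson would have to come from the vendor documentation.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in floating point operations per second.

    Simple model (an assumption): every core issues flops_per_cycle
    operations on every clock cycle.
    """
    return cores * clock_hz * flops_per_cycle


def fraction_of_peak(flops_done, seconds, peak):
    """Achieved rate (flops_done / seconds) as a fraction of the peak rate."""
    if seconds <= 0 or peak <= 0:
        raise ValueError("seconds and peak must be positive")
    return (flops_done / seconds) / peak
```

With placeholder inputs (not measured Jetson figures), `peak_flops(1, 1.0e9, 2)` is 2.0e9 operations per second, and a run that performs 1.0e9 operations in one second reaches `fraction_of_peak(1.0e9, 1.0, 2.0e9)`, i.e. half of that peak.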
Let's start with the last question: which code should we use for this benchmarking?
This is perhaps the least important question: once we have a methodology for a given code, we can re-apply all the steps to a suite of codes and then examine the results to obtain profiles of energy consumption per algorithmic implementation.
Therefore, for this initial experiment, we have chosen to use matrix-matrix multiplication. For square matrices with each dimension of length n, matrix-matrix multiplication requires on the order of n-cubed operations on n-squared data. This compute-to-data ratio of n will come in useful later.
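To make the counting concrete, here is a minimal sketch of the textbook triple-loop multiplication in plain Python, with helpers that count the work and the data. It is only an illustration of where the n-cubed and n-squared come from, not the benchmark code itself. The function names are our own, and one "operation" is taken to be one multiply-add in the innermost loop.

```python
def matmul(a, b):
    """Multiply two n x n matrices given as lists of lists (naive algorithm)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                # One multiply-add; this line runs n * n * n times in total.
                c[i][j] += aik * b[k][j]
    return c


def multiply_adds(n):
    """Number of innermost-loop operations: n cubed."""
    return n ** 3


def matrix_elements(n):
    """Number of elements in one n x n matrix: n squared."""
    return n ** 2


def compute_to_data(n):
    """Operations per matrix element, which works out to n."""
    return multiply_adds(n) / matrix_elements(n)
```

For n = 100 this gives 1,000,000 multiply-adds on 10,000 elements per matrix, a ratio of 100. Counting the multiply and the add separately, or counting all three matrices as data, changes the ratio by a constant factor but not its growth with n.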