Estimating model energy¶
One of the main motivations for using spiking methods is the potential for significant energy savings over standard techniques. Thus it is useful to be able to estimate how much energy would be used by a model on different devices, so that we can get an idea of how different model/device parameters affect the energy usage before pursuing a full deployment.
[1]:
import warnings
import numpy as np
import tensorflow as tf
import keras_spiking
warnings.simplefilter("ignore")
tf.get_logger().addFilter(lambda rec: "Tracing is expensive" not in rec.msg)
Assumptions¶
It is important to keep in mind that actual power usage will be heavily dependent on the specific details of the underlying software and hardware implementation. The numbers provided by KerasSpiking should be taken as very rough estimates only, and they rely on a number of assumptions:
Device specifications: In order to estimate the energy used by a model on a particular device, we need to know how much energy is used per synaptic operation/neuron update. We rely on published data for these numbers (see our sources for CPU/GPU/ARM, Loihi, and SpiNNaker 1/2). Energy numbers in practice can differ significantly from published results.
Overhead: We do not account for any overhead in the energy estimates (e.g., the cost of transferring data on and off a device). We only estimate the energy usage of internal model computations (synaptic operations and neuron updates). In practice, overhead can be a significant contributor to the energy usage of a model.
Spiking implementation: When estimating the energy usage for spiking devices, such as Loihi and SpiNNaker, we assume that the model being estimated can be fully converted to a spiking implementation for deployment on the device (even if the input model has non-spiking elements). For example, if the model contains tf.keras.layers.Activation("relu") layers (non-spiking), we assume that on a spiking device those layers will be converted to something equivalent to keras_spiking.SpikingActivation("relu"), and that any connecting layers (e.g. tf.keras.layers.Dense) are applied in an event-based fashion (i.e., processing only occurs when the input layer emits a spike). In practice, it is not trivial to map a neural network to a spiking device in this way, and implementation details can significantly affect energy usage. Nengo and NengoDL are designed to make this easier.
On non-spiking devices, such as CPUs and GPUs, we assume that the network runs as a traditional (non-spiking) ANN, computing its output with non-spiking neurons in a single pass rather than iterating over time.
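To make the event-based assumption concrete, here is a toy NumPy sketch (purely illustrative; it is not how these devices, or ModelEnergy, are implemented) of why a spiking implementation performs far fewer synaptic operations when activity is sparse:

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(968, 128))

# non-spiking ANN: every one of the 968 * 128 connections is used on every inference
x = rng.random(968)
dense_output = weights.T @ x

# event-based: only the rows of `weights` belonging to neurons that spiked on this
# timestep are touched, so the number of synaptic operations scales with activity
spikes = rng.random(968) < 0.05  # suppose ~5% of input neurons spike this timestep
event_output = weights[spikes].sum(axis=0)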
Using ModelEnergy¶
The keras_spiking.ModelEnergy class provides the entry point for energy estimation. It takes a Keras model as input and computes relevant statistics for that model.
[2]:
# build an example model
inp = x = tf.keras.Input((28, 28, 1))
x = tf.keras.layers.Conv2D(filters=2, kernel_size=(7, 7))(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(units=128)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(units=10)(x)
model = tf.keras.Model(inp, x)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0
conv2d (Conv2D) (None, 22, 22, 2) 100
re_lu (ReLU) (None, 22, 22, 2) 0
flatten (Flatten) (None, 968) 0
dense (Dense) (None, 128) 124032
re_lu_1 (ReLU) (None, 128) 0
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 125,422
Trainable params: 125,422
Non-trainable params: 0
_________________________________________________________________
[3]:
# estimate model energy
energy = keras_spiking.ModelEnergy(model)
energy.summary(print_warnings=False)
Layer (type) |Output shape |Param #|Conn #|Neuron #|J/inf (cpu)
--------------------|-------------------|-------|------|--------|-----------
input_1 (InputLayer)|[(None, 28, 28, 1)]| 0| 0| 0| 0
conv2d (Conv2D) | (None, 22, 22, 2)| 100| 47432| 0| 0.00041
re_lu (ReLU) | (None, 22, 22, 2)| 0| 0| 968| 8.3e-06
flatten (Flatten) | (None, 968)| 0| 0| 0| 0
dense (Dense) | (None, 128)| 124032|123904| 0| 0.0011
re_lu_1 (ReLU) | (None, 128)| 0| 0| 128| 1.1e-06
dense_1 (Dense) | (None, 10)| 1290| 1280| 0| 1.1e-05
============================================================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
The first three columns show the layer name/type, the output shape, and the number of parameters in each layer, and are identical to the corresponding columns in model.summary().
The next column shows the number of connections; two units are connected if a change in the input unit’s value changes the output unit’s value (assuming non-zero parameters). In a dense connection, the number of connections is the input size times the output size (since each output unit is connected to each input unit); in a convolutional connection, it equals the kernel size times the number of input filters times the output size. Note that the number of connections can be quite different from the number of parameters, particularly for layers like Conv2D where parameters are shared between many connections.
The next column shows the number of neurons in a layer; for activation layers, this equals the number of output units (i.e. the output size), otherwise it is zero.
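As a quick sanity check on those definitions, the Conn # and Neuron # columns above can be reproduced by hand from the layer shapes (this is just arithmetic, not part of the ModelEnergy API):

# Conv2D: kernel size * input filters * output size
print(7 * 7 * 1 * (22 * 22 * 2))  # 47432 connections

# Dense: input size * output size
print(968 * 128)  # 123904 connections
print(128 * 10)   # 1280 connections

# activation layers: one neuron per output element
print(22 * 22 * 2)  # 968 neurons in re_lu
print(128)          # 128 neurons in re_lu_1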
The last column shows the estimated energy consumption in Joules per inference on a CPU (specifically an Intel i7-4960X). All comparisons made by ModelEnergy are done using energy per inference, to account for the fact that spiking devices must iterate over a number of timesteps to get an accurate inference, whereas non-spiking devices (such as the CPU here) do not require such iteration. This number represents a lower bound on the amount of energy that might be used by a CPU, since it does not include any overhead, such as energy required to get data on and off the device.
We can customize the summary by specifying the columns we want displayed (see the documentation for the available options, and here for the built-in devices).
[4]:
energy.summary(
columns=(
"name",
"energy cpu",
"energy gpu",
"synop_energy cpu",
"synop_energy gpu",
"neuron_energy cpu",
"neuron_energy gpu",
),
print_warnings=False,
)
Layer (type) |J/inf (cpu)|J/inf (gpu)|Synop J/inf (|Synop J/inf (|Neuron J/inf (|Neuron J/inf (
----------------|-----------|-----------|-------------|-------------|--------------|--------------
input_1 (InputLa| 0| 0| 0| 0| 0| 0
conv2d (Conv2D) | 0.00041| 1.4e-05| 0.00041| 1.4e-05| 0| 0
re_lu (ReLU) | 8.3e-06| 2.9e-07| 0| 0| 8.3e-06| 2.9e-07
flatten (Flatten| 0| 0| 0| 0| 0| 0
dense (Dense) | 0.0011| 3.7e-05| 0.0011| 3.7e-05| 0| 0
re_lu_1 (ReLU) | 1.1e-06| 3.8e-08| 0| 0| 1.1e-06| 3.8e-08
dense_1 (Dense) | 1.1e-05| 3.8e-07| 1.1e-05| 3.8e-07| 0| 0
==================================================================================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (gpu): 5.21e-05
Here, we can see the individual components contributing to the energy usage on each device. The energy spent on synops (short for “synaptic operations”) is used to multiply values by connection weights; on non-spiking hardware, this has to be done for all connections, but on spiking hardware it is only done when a pre-synaptic neuron spikes. The energy spent on neurons is used to compute neural non-linearities; these neuron updates must happen for all neurons, regardless of input.
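For a non-spiking device this decomposition is simply the connection and neuron counts multiplied by the device’s per-operation energies. A minimal sketch, using made-up per-operation energies rather than the built-in cpu/gpu specifications:

ENERGY_PER_SYNOP = 1e-9   # hypothetical Joules per synaptic operation
ENERGY_PER_NEURON = 2e-9  # hypothetical Joules per neuron update

def non_spiking_layer_energy(connections, neurons):
    # on non-spiking hardware every connection and every neuron is
    # processed once per inference, regardless of activity
    return connections * ENERGY_PER_SYNOP + neurons * ENERGY_PER_NEURON

print(non_spiking_layer_energy(connections=123904, neurons=0))  # a dense-like layer
print(non_spiking_layer_energy(connections=0, neurons=128))     # an activation layer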
ModelEnergy has one other parameter, example_data. This data will be passed to the model and used to compute the average firing rate of each layer. This is necessary information for estimating the energy usage of spiking devices, as the number of synaptic updates that need to be performed will be proportional to the firing rates (but has no impact on non-spiking devices, as they perform all synaptic updates every timestep regardless).
[5]:
energy = keras_spiking.ModelEnergy(model, example_data=np.ones((32, 28, 28, 1)))
energy.summary(
columns=(
"name",
"rate",
"synop_energy cpu",
"synop_energy loihi",
"neuron_energy cpu",
"neuron_energy loihi",
),
print_warnings=False,
)
1/1 [==============================] - 1s 867ms/step
Layer (type) |Rate [Hz]|Synop J/inf (cp|Synop J/inf (loih|Neuron J/inf (cp|Neuron J/inf (loih
------------------|---------|---------------|-----------------|----------------|------------------
input_1 (InputLaye| 1| 0| 0| 0| 0
conv2d (Conv2D) | 1| 0.00041| 1.3e-09| 0| 0
re_lu (ReLU) | 0.5| 0| 0| 8.3e-06| 7.8e-08
flatten (Flatten) | 0.28| 0| 0| 0| 0
dense (Dense) | 0.28| 0.0011| 9.5e-10| 0| 0
re_lu_1 (ReLU) | 0.36| 0| 0| 1.1e-06| 1e-08
dense_1 (Dense) | 0.2| 1.1e-05| 7e-12| 0| 0
We can see that if we increase the magnitude of the input (thereby increasing the firing rates), the energy estimate increases for the spiking device (Loihi), but not for the CPU. Note that only the synaptic energy increases; the neuron energy is unaffected (since it does not depend on input activity).
[6]:
energy = keras_spiking.ModelEnergy(model, example_data=np.ones((32, 28, 28, 1)) * 5)
energy.summary(
columns=(
"name",
"rate",
"synop_energy cpu",
"synop_energy loihi",
"neuron_energy cpu",
"neuron_energy loihi",
),
print_warnings=False,
)
1/1 [==============================] - 0s 51ms/step
Layer (type) |Rate [Hz]|Synop J/inf (cp|Synop J/inf (loih|Neuron J/inf (cp|Neuron J/inf (loih
------------------|---------|---------------|-----------------|----------------|------------------
input_1 (InputLaye| 5| 0| 0| 0| 0
conv2d (Conv2D) | 5| 0.00041| 6.4e-09| 0| 0
re_lu (ReLU) | 2.5| 0| 0| 8.3e-06| 7.8e-08
flatten (Flatten) | 1.4| 0| 0| 0| 0
dense (Dense) | 1.4| 0.0011| 4.7e-09| 0| 0
re_lu_1 (ReLU) | 1.8| 0| 0| 1.1e-06| 1e-08
dense_1 (Dense) | 1| 1.1e-05| 3.5e-11| 0| 0
Adding custom devices¶
We can use ModelEnergy.register_device to add the specification for new devices, thereby allowing ModelEnergy to provide energy estimates for those devices. This function takes four parameters:
name: An identifying name for the device.
energy_per_synop: The energy (in Joules) required for one synaptic update. A synaptic update is the computation that occurs whenever some input is received by a neuron and multiplied by a weight.
energy_per_neuron: The energy (in Joules) required for one neuron update. A neuron update is the computation that occurs in a neuron every timestep (regardless of whether or not it has received some input).
spiking: Whether or not this is a spiking, or event-based, device. That is, do all synaptic updates occur every timestep (i.e. all the output of one layer is communicated to the next layer every timestep), or do synaptic updates only occur when a neuron in the input layer emits a spike?
In addition to registering new devices, this can be used to modify the assumptions for existing devices. For example, if you think the gpu device specs are too optimistic, you could increase the energy estimates and see what effect that has.
[7]:
keras_spiking.ModelEnergy.register_device(
"my-gpu", energy_per_synop=1e-9, energy_per_neuron=2e-9, spiking=False
)
energy.summary(columns=("name", "energy gpu", "energy my-gpu"), print_warnings=False)
Layer (type) |J/inf (gpu)|J/inf (my-gpu)
--------------------|-----------|--------------
input_1 (InputLayer)| 0| 0
conv2d (Conv2D) | 1.4e-05| 4.7e-05
re_lu (ReLU) | 2.9e-07| 1.9e-06
flatten (Flatten) | 0| 0
dense (Dense) | 3.7e-05| 0.00012
re_lu_1 (ReLU) | 3.8e-08| 2.6e-07
dense_1 (Dense) | 3.8e-07| 1.3e-06
===============================================
Total energy per inference [Joules/inf] (gpu): 5.21e-05
Total energy per inference [Joules/inf] (my-gpu): 1.75e-04
Temporal processing¶
Whenever we are working with spiking models it is important to think about how time affects the model. For example, spiking models often need to run for multiple timesteps in order to get an accurate estimate of the model’s output (see this example for more details). So, in order to make a fair comparison between spiking and non-spiking devices (which only need a single timestep to compute their output), we can specify how many timesteps per inference we expect to run on spiking devices.
[8]:
energy.summary(
columns=("name", "energy cpu", "energy loihi"),
timesteps_per_inference=10,
print_warnings=False,
)
Layer (type) |J/inf (cpu)|J/inf (loihi)
--------------------|-----------|-------------
input_1 (InputLayer)| 0| 0
conv2d (Conv2D) | 0.00041| 6.4e-08
re_lu (ReLU) | 8.3e-06| 7.8e-07
flatten (Flatten) | 0| 0
dense (Dense) | 0.0011| 4.7e-08
re_lu_1 (ReLU) | 1.1e-06| 1e-07
dense_1 (Dense) | 1.1e-05| 3.5e-10
==============================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (loihi): 1.00e-06
Note that using more timesteps per inference increases the energy estimate for the spiking device, but not for the non-spiking one:
[9]:
energy.summary(
columns=("name", "energy cpu", "energy loihi"),
timesteps_per_inference=20,
print_warnings=False,
)
Layer (type) |J/inf (cpu)|J/inf (loihi)
--------------------|-----------|-------------
input_1 (InputLayer)| 0| 0
conv2d (Conv2D) | 0.00041| 1.3e-07
re_lu (ReLU) | 8.3e-06| 1.6e-06
flatten (Flatten) | 0| 0
dense (Dense) | 0.0011| 9.5e-08
re_lu_1 (ReLU) | 1.1e-06| 2.1e-07
dense_1 (Dense) | 1.1e-05| 7e-10
==============================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (loihi): 2.00e-06
We also need to consider the simulation timestep, dt, being used in each of those inference timesteps. This will affect the number of spike events observed, since longer timesteps will result in more spikes (the number of spikes is proportional to firing_rate * timesteps_per_inference * dt). Note that the dt used on the device could be different than the dt used when training/running the model in KerasSpiking; however, it will default to the same value as keras_spiking.default.dt.
[10]:
energy.summary(
columns=("name", "energy cpu", "energy loihi"), dt=0.001, print_warnings=False
)
Layer (type) |J/inf (cpu)|J/inf (loihi)
--------------------|-----------|-------------
input_1 (InputLayer)| 0| 0
conv2d (Conv2D) | 0.00041| 6.4e-09
re_lu (ReLU) | 8.3e-06| 7.8e-08
flatten (Flatten) | 0| 0
dense (Dense) | 0.0011| 4.7e-09
re_lu_1 (ReLU) | 1.1e-06| 1e-08
dense_1 (Dense) | 1.1e-05| 3.5e-11
==============================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (loihi): 1.00e-07
[11]:
energy.summary(
columns=("name", "energy cpu", "energy loihi"), dt=0.002, print_warnings=False
)
Layer (type) |J/inf (cpu)|J/inf (loihi)
--------------------|-----------|-------------
input_1 (InputLayer)| 0| 0
conv2d (Conv2D) | 0.00041| 1.3e-08
re_lu (ReLU) | 8.3e-06| 7.8e-08
flatten (Flatten) | 0| 0
dense (Dense) | 0.0011| 9.5e-09
re_lu_1 (ReLU) | 1.1e-06| 1e-08
dense_1 (Dense) | 1.1e-05| 7e-11
==============================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (loihi): 1.11e-07
We can see that increasing dt increases the energy estimate on the spiking device, but not on the non-spiking one (since the output of a non-spiking neuron is not affected by dt). Note that increasing dt is not exactly equivalent to increasing timesteps_per_inference, because dt only increases the number of synaptic updates; it leaves the number of neuron updates unchanged.
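Putting the temporal parameters together, the spiking estimates above behave roughly as sketched below (a simplified illustration with made-up per-operation energies; the built-in devices use their own published numbers):

def spiking_layer_energy(
    connections,
    neurons,
    rate,                     # average input firing rate (Hz)
    dt,                       # simulation timestep on the device (s)
    timesteps_per_inference,
    energy_per_synop=1e-9,    # hypothetical device constants
    energy_per_neuron=2e-9,
):
    # synaptic updates only occur when an input neuron spikes, so they
    # scale with rate * dt * timesteps_per_inference
    synops = connections * rate * dt * timesteps_per_inference
    # neuron updates happen on every timestep regardless of activity,
    # so they scale with timesteps_per_inference but not with rate or dt
    neuron_updates = neurons * timesteps_per_inference
    return synops * energy_per_synop + neuron_updates * energy_per_neuron

# doubling dt doubles the synaptic energy ...
print(spiking_layer_energy(47432, 0, rate=5, dt=0.001, timesteps_per_inference=10))
print(spiking_layer_energy(47432, 0, rate=5, dt=0.002, timesteps_per_inference=10))
# ... but leaves the neuron energy unchanged
print(spiking_layer_energy(0, 968, rate=5, dt=0.001, timesteps_per_inference=10))
print(spiking_layer_energy(0, 968, rate=5, dt=0.002, timesteps_per_inference=10))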
One final factor to keep in mind regarding temporal models is how time is represented in the Keras model itself. The above models did not have a temporal component; they were simply single-step feedforward models. ModelEnergy assumes that a non-temporal model represents the computations that will be performed each timestep on a spiking device. But we can also directly define a Keras model that operates over time, which gives us more control over how time is represented. For example, the following is equivalent to our original model definition above, but with an added time dimension:
[12]:
# add a new input dimension (None) representing
# temporal data of unknown length
inp = x = tf.keras.Input((None, 28, 28, 1))
# the TimeDistributed wrapper can be used to apply
# non-temporal layers to temporal inputs
x = tf.keras.layers.TimeDistributed(
tf.keras.layers.Conv2D(filters=2, kernel_size=(7, 7))
)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(x)
# some layers, like Dense, can operate on temporal data
# without requiring a TimeDistributed wrapper
x = tf.keras.layers.Dense(units=128)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(units=10)(x)
temporal_model = tf.keras.Model(inp, x)
temporal_model.summary()
Model: "model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer)                  [(None, None, 28, 28, 1)]  0
time_distributed (TimeDistributed)    (None, None, 22, 22, 2)    100
re_lu_2 (ReLU)                        (None, None, 22, 22, 2)    0
time_distributed_1 (TimeDistributed)  (None, None, 968)          0
dense_2 (Dense)                       (None, None, 128)          124032
re_lu_3 (ReLU)                        (None, None, 128)          0
dense_3 (Dense)                       (None, None, 10)           1290
=================================================================
Total params: 125,422
Trainable params: 125,422
Non-trainable params: 0
_________________________________________________________________
If we compare the energy estimates of the temporal and non-temporal models, we can see that they are essentially the same, because KerasSpiking automatically assumes that the non-temporal model will be translated into a temporal model:
[13]:
energy = keras_spiking.ModelEnergy(model, example_data=np.ones((32, 28, 28, 1)))
energy.summary(
columns=("name", "energy cpu", "energy loihi"),
timesteps_per_inference=10,
print_warnings=False,
)
1/1 [==============================] - 0s 50ms/step
Layer (type) |J/inf (cpu)|J/inf (loihi)
--------------------|-----------|-------------
input_1 (InputLayer)| 0| 0
conv2d (Conv2D) | 0.00041| 1.3e-08
re_lu (ReLU) | 8.3e-06| 7.8e-07
flatten (Flatten) | 0| 0
dense (Dense) | 0.0011| 9.5e-09
re_lu_1 (ReLU) | 1.1e-06| 1e-07
dense_1 (Dense) | 1.1e-05| 7e-11
==============================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (loihi): 9.10e-07
[14]:
# note that we add a temporal dimension to our example data (which does not need to be
# the same length as timesteps_per_inference)
energy = keras_spiking.ModelEnergy(
temporal_model, example_data=np.ones((32, 5, 28, 28, 1))
)
energy.summary(
columns=("name", "energy cpu", "energy loihi"),
timesteps_per_inference=10,
print_warnings=False,
)
1/1 [==============================] - 0s 78ms/step
Layer (type) |J/inf (cpu)|J/inf (loihi)
------------------------------------|-----------|-------------
input_2 (InputLayer) | 0| 0
time_distributed (TimeDistributed) | 0.00041| 1.3e-08
re_lu_2 (ReLU) | 8.3e-06| 7.8e-07
time_distributed_1 (TimeDistributed)| 0| 0
dense_2 (Dense) | 0.0011| 1.3e-08
re_lu_3 (ReLU) | 1.1e-06| 1e-07
dense_3 (Dense) | 1.1e-05| 1.1e-10
==============================================================
Total energy per inference [Joules/inf] (cpu): 1.49e-03
Total energy per inference [Joules/inf] (loihi): 9.14e-07
In the above example the model was assumed to be temporal because it had None as the shape of the first (non-batch) axis. However, in some cases the Keras model definition can be ambiguous as to whether it represents a temporal or non-temporal model.
For example, consider the following model:
[15]:
inp = tf.keras.Input((28, 28))
x = tf.keras.layers.ReLU()(inp)
model = tf.keras.Model(inp, x)
model.summary()
Model: "model_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 28, 28)] 0
re_lu_4 (ReLU) (None, 28, 28) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
Is this a temporal model, with 28 neurons being applied for 28 timesteps? Or is it a non-temporal model, with 784 neurons being applied to a 28x28 2D input? The definition is ambiguous, so ModelEnergy will assume that this is a non-temporal model:
[16]:
energy = keras_spiking.ModelEnergy(model)
energy.summary(
columns=("name", "output_shape", "neurons", "energy cpu"), print_warnings=False
)
Layer (type) |Output shape |Neuron #|J/inf (cpu)
--------------------|----------------|--------|-----------
input_3 (InputLayer)|[(None, 28, 28)]| 0| 0
re_lu_4 (ReLU) | (None, 28, 28)| 784| 6.7e-06
==========================================================
Total energy per inference [Joules/inf] (cpu): 6.74e-06
You can signal to ModelEnergy that the ReLU layer should be considered temporal by wrapping it in a TimeDistributed layer:
[17]:
inp = tf.keras.Input((28, 28))
x = tf.keras.layers.TimeDistributed(tf.keras.layers.ReLU())(inp)
model = tf.keras.Model(inp, x)
energy = keras_spiking.ModelEnergy(model)
energy.summary(
columns=("name", "output_shape", "neurons", "energy cpu"), print_warnings=False
)
Layer (type) |Output shape |Neuron #|J/inf (cpu)
------------------------------------|----------------|--------|-----------
input_4 (InputLayer) |[(None, 28, 28)]| 0| 0
time_distributed_2 (TimeDistributed)| (None, 28, 28)| 28| 2.4e-07
==========================================================================
Total energy per inference [Joules/inf] (cpu): 2.41e-07
Alternatively, we could have changed the shape of the first dimension to None, in which case ModelEnergy will assume that that dimension represents time, without the need for a TimeDistributed wrapper.
[18]:
inp = tf.keras.Input((None, 28))
x = tf.keras.layers.ReLU()(inp)
model = tf.keras.Model(inp, x)
energy = keras_spiking.ModelEnergy(model)
energy.summary(
columns=("name", "output_shape", "neurons", "energy cpu"), print_warnings=False
)
Layer (type) |Output shape |Neuron #|J/inf (cpu)
--------------------|------------------|--------|-----------
input_5 (InputLayer)|[(None, None, 28)]| 0| 0
re_lu_6 (ReLU) | (None, None, 28)| 28| 2.4e-07
============================================================
Total energy per inference [Joules/inf] (cpu): 2.41e-07
Using SpikingActivation layers¶
You may have noticed above that we have been silencing some warnings. Let’s see what those warnings are:
[19]:
inp = tf.keras.Input((None, 32))
x = tf.keras.layers.Dense(units=64)(inp)
x = tf.keras.layers.ReLU()(x)
model = tf.keras.Model(inp, x)
energy = keras_spiking.ModelEnergy(model, example_data=np.ones((8, 10, 32)))
energy.summary(columns=("name", "output_shape", "energy loihi"), print_warnings=True)
1/1 [==============================] - 0s 53ms/step
Layer (type) |Output shape |J/inf (loihi)
--------------------|------------------|-------------
input_6 (InputLayer)|[(None, None, 32)]| 0
dense_4 (Dense) | (None, None, 64)| 5.6e-11
re_lu_7 (ReLU) | (None, None, 64)| 5.2e-09
=====================================================
Total energy per inference [Joules/inf] (loihi): 5.24e-09
* These are estimates only; see the documentation for a list of the assumptions being made.
https://bit.ly/3c3aKKH
* This model contains non-spiking activations that would not actually behave in the manner we
assume in these calculations; we assume these layers will be converted to spiking equivalents.
Consider using `keras_spiking.SpikingActivation` to make this conversion explicit.
The first warning highlights that these energy estimates are highly dependent on certain assumptions being made (which we discussed above).
The second warning is due to the fact that we are estimating energy on a spiking device but our model contains non-spiking activation functions (ReLU). When estimating energy on spiking devices we assume that neurons will be outputting spikes (in order to compute the number of synaptic updates that need to occur). But if we were to directly map this model to a spiking device, 1) that may not even be possible, since many spiking devices can only simulate spiking neurons, and 2) these neurons would trigger synaptic updates on every timestep, not at the rates displayed above.
In order to provide a useful estimate for spiking devices, we assume that any non-spiking neurons will be converted to spiking neurons when the model is mapped to the device. However, that may not be a safe assumption; it is better to be explicit and directly convert the Keras model to a spiking one using keras_spiking.SpikingActivation:
[20]:
inp = tf.keras.Input((None, 32))
x = tf.keras.layers.Dense(units=64)(inp)
x = keras_spiking.SpikingActivation("relu")(x)
model = tf.keras.Model(inp, x)
energy = keras_spiking.ModelEnergy(model, example_data=np.ones((8, 10, 32)))
energy.summary(columns=("name", "output_shape", "energy loihi"))
1/1 [==============================] - 0s 61ms/step
Layer (type) |Output shape |J/inf (loihi)
--------------------------------------|------------------|-------------
input_7 (InputLayer) |[(None, None, 32)]| 0
dense_5 (Dense) | (None, None, 64)| 5.6e-11
spiking_activation (SpikingActivation)| (None, None, 64)| 5.2e-09
=======================================================================
Total energy per inference [Joules/inf] (loihi): 5.24e-09
* These are estimates only; see the documentation for a list of the assumptions being made.
https://bit.ly/3c3aKKH
Deploying to real devices¶
Once we’ve gotten an idea of what the energy usage might be for our model on different devices, we likely want to actually deploy the model on one of those devices and see how it performs in the real world. For this we can use Nengo, which provides a suite of tools for running neural models on different hardware platforms.
For example, suppose we would like to run the above model on Loihi. First, we can use the NengoDL converter to automatically convert our Keras model to a Nengo model:
[21]:
# pylint: disable=wrong-import-order
import nengo_dl
import nengo_loihi
converter = nengo_dl.Converter(model, temporal_model=True, inference_only=True)
The advantage of the Nengo ecosystem is that once we have a Nengo model, we can run that model on any Nengo-supported hardware platform. For example, if we would like to run on Loihi, we just create a nengo_loihi.Simulator and run our model:
[22]:
with nengo_loihi.Simulator(converter.net) as sim:
    sim.run_steps(10)
    print(sim.data[converter.outputs[model.output]].shape)
(10, 64)
Since we don’t have an actual Loihi board hooked up here this is just running in an emulator, but if we had a physical board attached the code would be the same (and NengoLoihi would automatically use the board). And that’s all that would be required to deploy your model to a spiking device, and start seeing how it performs in the real world!
Summary¶
We can use ModelEnergy to estimate the energy usage of a Keras model on different hardware platforms. We have looked at the various parameters of these estimates (example data, device specifications, the number of timesteps per inference, and the hardware simulation timestep), as well as how we can customize the input Keras model in different ways (adding temporal features or SpikingActivation layers).
As we mentioned at the start, it is important to keep in mind that these numbers are only rough estimates; actual energy usage will be heavily dependent on the details of the hardware and software implementation when mapping your model to a physical device.
After you have explored different options using ModelEnergy, you will likely want to actually deploy your model on one of these devices to see how it performs in the real world. This is where the Nengo ecosystem can be very helpful, as it allows you to run a neural model on any Nengo-supported platform (non-spiking devices like standard CPUs and GPUs, or spiking devices like Loihi or SpiNNaker). You can use the NengoDL Converter to automatically convert a Keras model (including KerasSpiking layers) to a Nengo network, and then use any Nengo backend (e.g. NengoDL, NengoOCL, or NengoLoihi) to run that network on different hardware platforms. See this example for an end-to-end walkthrough of deploying a Keras model to Loihi.