This is the class that allows users to access the nengo_dl backend. It can be used as a drop-in replacement for nengo.Simulator (i.e., simply replace any instance of nengo.Simulator with nengo_dl.Simulator and everything will continue to function as normal).
In addition, the Simulator exposes features unique to the NengoDL backend, such as Simulator.train.
The full class documentation can be viewed in the API Reference; here we will explain the practical usage of the Simulator in more depth.
The NengoDL Simulator has a number of optional arguments, beyond those in nengo.Simulator, which control features specific to the nengo_dl backend.
The device argument specifies the computational device on which the simulation will run. The default is None, which means that operations will be assigned according to TensorFlow’s internal logic (generally speaking, this means that things will be assigned to the GPU if tensorflow-gpu is installed, otherwise everything will be assigned to the CPU). The device can be set manually by passing the TensorFlow device specification to this parameter. For example, setting device="/cpu:0" will force everything to run on the CPU. This may be worthwhile for small models, where the extra overhead of communicating with the GPU outweighs the actual computations. On systems with multiple GPUs, device="/gpu:0", device="/gpu:1", etc. will select which one to use.
The unroll_simulation argument controls how many simulation iterations are executed each time through the outer simulation loop. That is, we could run 20 timesteps as
for i in range(20):
    <run 1 step>
or
for i in range(5):
    <run 1 step>
    <run 1 step>
    <run 1 step>
    <run 1 step>
This is an optimization process known as “loop unrolling”, and unroll_simulation controls how many simulation steps are unrolled. The first example above would correspond to unroll_simulation=1, and the second would be unroll_simulation=4.
Unrolling the simulation will result in faster simulation speed, but increased build time and memory usage.
In general, unrolling the simulation will have no impact on the output of a simulation. The only case in which unrolling may have an impact is if the number of simulation steps is not evenly divisible by unroll_simulation. In that case extra simulation steps will be executed, and then data will be truncated to the correct number of steps. However, those extra steps could still change the internal state of the simulation, which will affect any subsequent calls to sim.run. So it is recommended that the number of steps always be evenly divisible by unroll_simulation.
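The rounding behaviour described above can be illustrated with a small sketch in plain Python. This is not NengoDL's implementation; plan_unrolled_run is a hypothetical helper that only does the arithmetic implied by the text, assuming each pass through the outer loop always executes unroll_simulation steps.

```python
import math


def plan_unrolled_run(n_steps, unroll_simulation):
    """Return (outer_iterations, steps_executed, extra_steps).

    Hypothetical helper: illustrates the arithmetic only, assuming each
    pass through the outer loop always executes `unroll_simulation` steps.
    """
    if n_steps < 0 or unroll_simulation < 1:
        raise ValueError("n_steps must be >= 0 and unroll_simulation >= 1")
    outer_iterations = math.ceil(n_steps / unroll_simulation)
    steps_executed = outer_iterations * unroll_simulation
    return outer_iterations, steps_executed, steps_executed - n_steps


# 20 steps divide evenly by 4, so no extra steps are executed
print(plan_unrolled_run(20, 4))  # (5, 20, 0)
# 10 steps do not: 12 steps run, and the last 2 are truncated from the data
print(plan_unrolled_run(10, 4))  # (3, 12, 2)
```

Choosing the number of steps as a multiple of unroll_simulation keeps the extra step count at zero, which matches the recommendation above.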
nengo_dl allows a model to be simulated with multiple simultaneous inputs, processing those values in parallel through the network. For example, instead of executing a model three times with three different inputs, the model can be executed once with those three inputs in parallel. minibatch_size specifies how many inputs will be processed at a time. The default is None, meaning that this feature is not used and only one input will be processed at a time (as in standard Nengo simulators).
In order to take advantage of the parallel inputs, multiple inputs need to be passed to Simulator.run via the data argument. This is discussed in more detail below.
When using Simulator.train, this parameter controls how many items from the training data will be used for each optimization iteration.
The tensorboard argument can be used to specify an output directory if you would like to export data from the simulation in a format that can be visualized in TensorBoard.
To view the collected data, run the command
tensorboard --logdir <tensorboard_dir>
(where tensorboard_dir is the directory name passed to the tensorboard argument), then open a web browser and navigate to http://localhost:6006.
By default the TensorBoard output will only contain a visualization of the TensorFlow graph constructed for this Simulator. However, TensorBoard can also be used to track various aspects of the simulation throughout the training process; see the sim.train documentation for details.
Repeated Simulator calls with the same output directory will be organized into subfolders according to run number (e.g., <tensorboard_dir>/run_0).
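As a rough sketch of how the arguments in this section fit together, the snippet below collects them into one set of keyword arguments. It assumes the parameter names match the NengoDL version described here (device, unroll_simulation, minibatch_size, tensorboard). The values, the "tb_output" directory name, and the run_model helper are illustrative only and not part of nengo_dl.

```python
# Keyword arguments described in this section; the values are illustrative.
sim_kwargs = {
    "device": "/cpu:0",          # force everything onto the CPU
    "unroll_simulation": 4,      # unroll 4 steps per outer loop iteration
    "minibatch_size": 3,         # process 3 inputs in parallel
    "tensorboard": "tb_output",  # hypothetical output directory name
}

n_steps = 20
# keep n_steps evenly divisible by unroll_simulation, as recommended above
assert n_steps % sim_kwargs["unroll_simulation"] == 0


def run_model(net, probe, n_steps, **kwargs):
    """Simulate `net` for `n_steps` and return the data recorded by `probe`.

    Hypothetical helper: `net` is assumed to be an existing nengo.Network
    and `probe` a nengo.Probe defined inside it.
    """
    import nengo_dl  # imported lazily, so the settings above can be inspected without it

    with nengo_dl.Simulator(net, **kwargs) as sim:
        sim.run_steps(n_steps)
        return sim.data[probe]
```

With a network and probe at hand, this would be called as run_model(net, p, n_steps, **sim_kwargs).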
Simulator.run (and its variations Simulator.step / Simulator.run_steps) also has some optional parameters beyond those in the standard Nengo simulator.
The data parameter can be used to override the value of any input Node in a model (an input node is defined as a node with no incoming connections). For example,
import nengo
import nengo_dl
import numpy as np

n_steps = 5
with nengo.Network() as net:
    node = nengo.Node([0])
    p = nengo.Probe(node)
with nengo_dl.Simulator(net) as sim:
    sim.run_steps(n_steps)
will execute the model in the standard way, and if we check the output of node
print(sim.data[p])
>>> [[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
we see that it is all zero, as defined.
data is specified as a dictionary of {my_node: override_value} pairs, where my_node is the Node to be overridden and override_value is a numpy array with shape (minibatch_size, n_steps, my_node.size_out) that gives the Node output value on each simulation step. For example, if we instead run the model via
with nengo_dl.Simulator(net) as sim:
    sim.run_steps(n_steps, data={node: np.ones((1, n_steps, 1))})
print(sim.data[p])
>>> [[ 1.] [ 1.] [ 1.] [ 1.] [ 1.]]
we see that the output of node is all ones, which is the override value we specified.
data is usually used in concert with the minibatching feature of nengo_dl (see above). nengo_dl allows multiple inputs to be processed simultaneously, but when we construct a Node we can only specify one value. For example, if we use minibatching on the above network
mini = 3
with nengo_dl.Simulator(net, minibatch_size=mini) as sim:
    sim.run_steps(n_steps)
print(sim.data[p])
>>> [[[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
     [[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
     [[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]]
we see that the output is an array of zeros with shape (mini, n_steps, 1). That is, we simulated 3 inputs simultaneously, but those inputs all had the same value (the one we defined when the Node was constructed), so it wasn’t very useful. To take full advantage of the minibatching we need to override the node values, so that we can specify a different value for each item in the minibatch:
with nengo_dl.Simulator(net, minibatch_size=mini) as sim:
    sim.run_steps(n_steps, data={
        node: np.zeros((mini, n_steps, 1)) + np.arange(mini)[:, None, None]})
print(sim.data[p])
>>> [[[ 0.] [ 0.] [ 0.] [ 0.] [ 0.]]
     [[ 1.] [ 1.] [ 1.] [ 1.] [ 1.]]
     [[ 2.] [ 2.] [ 2.] [ 2.] [ 2.]]]
Here we can see that 3 independent inputs have been processed during the simulation. In a simple network such as this, minibatching will not make much difference. But for larger models it will be much more efficient to process multiple inputs in parallel rather than one at a time.
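The override array in the last example relies on numpy broadcasting. As a sketch that uses only numpy (no simulator is needed), the same array can be built explicitly and checked against the (minibatch_size, n_steps, my_node.size_out) shape described above. make_constant_inputs is a hypothetical helper name, not part of nengo_dl, and it assumes a node with size_out equal to 1.

```python
import numpy as np


def make_constant_inputs(values, n_steps):
    """Build an array of shape (len(values), n_steps, 1).

    Hypothetical helper: item i of the minibatch holds values[i] on every
    timestep, for a node with size_out == 1.
    """
    values = np.asarray(values, dtype=float)
    # add timestep and size_out axes, then repeat along the timestep axis
    return np.tile(values[:, None, None], (1, n_steps, 1))


mini, n_steps = 3, 5
inputs = make_constant_inputs(np.arange(mini), n_steps)

# the broadcasting expression used in the example above, for comparison
reference = np.zeros((mini, n_steps, 1)) + np.arange(mini)[:, None, None]
assert inputs.shape == (mini, n_steps, 1)
assert np.array_equal(inputs, reference)
```

An array built this way would be passed as data={node: inputs} to sim.run_steps, in the same way as the broadcasting expression.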
If profile is set to True, profiling data will be collected while the simulation runs. This will significantly slow down the simulation, so it should be left on False (the default) in most cases. It is mainly used by developers, in order to help identify performance bottlenecks.
Profiling data will be saved to a file named nengo_dl_profile.json. It can be viewed by opening a Chrome browser, navigating to chrome://tracing and loading the nengo_dl_profile.json file. A filename can be passed instead of True, to change the output filename.
Note that in order for GPU profiling to work, you need to manually add <cuda>/extras/CUPTI/libx64 to the LD_LIBRARY_PATH environment variable (where <cuda> is your CUDA installation directory).