Defining output variables to reduce memory usage

[1]:

%pip install "pybamm[plot,cite]" -q    # install PyBaMM if it is not installed
import time
import tracemalloc

import numpy as np

import pybamm



Solution storage and output variables

A PyBaMM model is defined as a set of differential equations. For an ordinary differential equation (ODE) model, the model is defined as

\[\begin{align*} \frac{d\mathbf{y}}{dt} = \mathbf{f}(\mathbf{y}, t) \end{align*}\]

where \(\mathbf{y}\) is the state vector and \(\mathbf{f}\) is the right-hand side function. (PyBaMM actually uses a semi-explicit DAE form, but for the purposes of this notebook we can assume we’re dealing with an ODE.) The state vector \(\mathbf{y}\) contains all the variables that are integrated over time by the solver.

For example, we can create a DFN model with PyBaMM and look at each of the state variables individually and all together in the final concatenated state vector \(\mathbf{y}\):

[2]:

model = pybamm.lithium_ion.DFN()
print("The model state variables are:", [var.name for var in model.rhs.keys()])
sim = pybamm.Simulation(model)
sim.build()
print(
    "The concatenated state vector is a vector of shape",
    sim.built_model.concatenated_rhs.shape,
)

The model state variables are: ['Discharge capacity [A.h]', 'Throughput capacity [A.h]', 'Negative particle concentration [mol.m-3]', 'Positive particle concentration [mol.m-3]', 'Porosity times concentration [mol.m-3]']
The concatenated state vector is a vector of shape (862, 1)
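As a rough illustration of what "concatenated" means here, the sketch below stacks a few named state variables into one flat vector and records the slice each one occupies. The variable names and sizes are made up for the example and are not the DFN discretisation; PyBaMM builds the real vector internally.

```python
# Toy illustration only: names and sizes are hypothetical, not PyBaMM's.
state_variables = {
    "capacity": [0.0],                       # a scalar state
    "particle concentration": [1.0] * 4,     # a state discretised on 4 points
    "electrolyte concentration": [2.0] * 3,  # a state discretised on 3 points
}

y = []       # the concatenated state vector
slices = {}  # where each variable lives inside y
for name, values in state_variables.items():
    start = len(y)
    y.extend(values)
    slices[name] = slice(start, len(y))

print("Length of y:", len(y))
print("Slice of 'particle concentration':", slices["particle concentration"])
```

Keeping a slice per variable is what lets an individual variable be read back out of the flat vector later.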

The user is also interested in a number of output variables \(\mathbf{g}\). Once the simulation is complete and the model has been solved to get \(\mathbf{y}\), the output variables \(\mathbf{g}\) can be calculated from \(\mathbf{y}\) using a function \(\mathbf{h}\):

\[\begin{align*} \mathbf{g} = \mathbf{h}(\mathbf{y}) \end{align*}\]

where \(\mathbf{h}\) is a function that extracts the variables of interest from the state vector. Each model in PyBaMM has many such variables of interest, and the user can choose which ones to extract and plot. For example, suppose the user wants to solve the DFN model and plot the positive electrode capacity. In this case, the variable of interest is a function \(\mathbf{h}\) of the state vector \(\mathbf{y}\), and this function is defined in the DFN model using a PyBaMM expression tree:

[3]:

print(model.variables["Positive electrode capacity [A.h]"])

0.0002777777777777778 * yz-average(x-average(Positive electrode active material volume fraction)) * Positive electrode thickness [m] * Electrode width [m] * Electrode height [m] * Number of electrodes connected in parallel to make a cell * Maximum concentration in positive electrode [mol.m-3] * F

PyBaMM doesn’t evaluate this expression until the user extracts the variable from the solution object. Instead, it stores the solution vector \(\mathbf{y}\) at each time point, and evaluates the expression for the variable of interest \(\mathbf{g}\) at each time point only when it is requested. In the code below, the evaluation of the positive electrode capacity is triggered by calling solution["Positive electrode capacity [A.h]"].

[4]:

# Solve the model, storing the state vector at each time step
solution = sim.solve([0, 3600])

# Extract the positive electrode capacity using the function $h(y)$ and the stored state vector $y$
pos_elec_capacity = solution["Positive electrode capacity [A.h]"]
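The deferred evaluation described above can be mimicked in a few lines. The `ToySolution` class below is a hypothetical stand-in, not PyBaMM's `Solution`: it stores every state vector and only applies a function h when a variable is requested by name.

```python
class ToySolution:
    """Hypothetical stand-in for a solution object (not PyBaMM's API)."""

    def __init__(self, states, variables):
        self.states = states        # list of state vectors, one per time point
        self.variables = variables  # name -> function h(y)
        self.evaluations = 0        # counts how often h has been called

    def __getitem__(self, name):
        h = self.variables[name]
        result = []
        for y in self.states:
            self.evaluations += 1
            result.append(h(y))
        return result


# Three stored time points of a 3-element state vector.
states = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
toy = ToySolution(states, {"mean of y": lambda y: sum(y) / len(y)})

print(toy.evaluations)   # nothing has been evaluated yet
print(toy["mean of y"])  # h is applied to every stored y on request
```

The price of this flexibility is that every state vector has to be kept until the user decides which variables to look at.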

During a solve, the PyBaMM solver needs to store the solution vector \(\mathbf{y}\) at each time step. By default, PyBaMM stores the entire vector, so if the state vector is of size \(n\) and the solver steps through \(m\) time points, the memory usage is \(O(nm)\). This can be a problem for large models, especially when running on a machine with limited memory, or when running a very long-running simulation that needs to store many time points.
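To put a rough number on this, the estimate below assumes each entry is stored as an 8-byte float and uses the 862-element state vector from the DFN example earlier in this notebook. The number of stored time points is a made-up figure for illustration, and the estimate ignores any overhead from the solution object itself.

```python
BYTES_PER_FLOAT64 = 8


def stored_bytes(n_values_per_point, n_time_points):
    """Bytes needed to keep n values at each of m time points as float64."""
    return n_values_per_point * n_time_points * BYTES_PER_FLOAT64


n_states = 862           # size of the DFN state vector printed earlier
n_time_points = 100_000  # hypothetical number of stored time points

full_state = stored_bytes(n_states, n_time_points)
one_output = stored_bytes(1, n_time_points)

print(f"Full state vector: {full_state / 1e6:.1f} MB")
print(f"Single scalar output: {one_output / 1e6:.1f} MB")
```

Under these assumptions the full state needs 862 times more storage than a single scalar output, which is the ratio of their sizes.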

Calculating output variables on-the-fly

However, it is likely that the user is only interested in a small subset of the variables, and that the size of these output variables \(\mathbf{g}\) is much smaller than the size of the state vector \(\mathbf{y}\). Therefore, it could be wasteful to store the entire state vector \(\mathbf{y}\). To address this, PyBaMM has a feature called “output variables”. This allows the user to specify a list of variables that they are interested in up front (i.e. before the solve). During the solve, PyBaMM will evaluate the output variables of interest at each time point and discard the rest of the state vector. At the end of the solve, the pybamm.Solution object will only contain the output variables of interest, as well as the full state vector at the last time point (so that other simulations can be run from this point).
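The idea can be sketched with a toy integrator. The code below is not how PyBaMM or IDAKLU implement the feature; it is a fixed-step explicit Euler loop for dy/dt = -y with a hypothetical output function `h`, written only to show the storage pattern: evaluate the output at each step, drop the state, and keep just the final state.

```python
def f(y, t):
    """Right-hand side of the toy ODE dy/dt = -y, element by element."""
    return [-yi for yi in y]


def h(y):
    """Hypothetical output variable: the mean of the state vector."""
    return sum(y) / len(y)


def solve_with_outputs(y0, t_end, n_steps):
    """Explicit Euler that stores h(y) per step instead of y itself."""
    dt = t_end / n_steps
    y, t = list(y0), 0.0
    outputs = [h(y)]  # one scalar per time point
    for _ in range(n_steps):
        dy = f(y, t)
        # The previous state is overwritten rather than appended to a history.
        y = [yi + dt * dyi for yi, dyi in zip(y, dy)]
        t += dt
        outputs.append(h(y))
    return outputs, y  # the outputs plus the final state only


outputs, y_final = solve_with_outputs([1.0] * 50, t_end=1.0, n_steps=100)
print("Stored output values:", len(outputs))
print("Stored state values:", len(y_final))
```

Returning the final state alongside the outputs mirrors the point made above: a later simulation can still be started from where this one ended.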

Let’s see how this works in practice. We’ll start by solving a DFN model with a long-running experimental protocol, and assume we are only interested in the terminal voltage. We set up a PyBaMM simulation as normal and solve it, and once the solver has finished we extract the terminal voltage from the solution object. To evaluate the memory usage of this simulation, we will use the tracemalloc library, which allows us to track the total amount of memory allocated by the Python interpreter. We’ll also keep track of the time taken to solve the model using the time library.
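The measurement pattern used in the cells that follow can be wrapped in a small helper. `measure` is a hypothetical name introduced here, not part of PyBaMM; it relies only on the standard-library `time` and `tracemalloc` modules. Note that `tracemalloc` reports memory allocated through Python's memory allocators, so the figures are an indication rather than the total footprint of the process.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, traced bytes still allocated)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    # Sum the sizes of all traced memory blocks that are still alive.
    snapshot = tracemalloc.take_snapshot()
    total = sum(stat.size for stat in snapshot.statistics("lineno"))
    tracemalloc.stop()
    return result, elapsed, total


# Stand-in workload: build a list of one million floats.
data, seconds, n_bytes = measure(lambda: [float(i) for i in range(1_000_000)])
print(f"Time: {seconds:.3f} s, allocated: {n_bytes / 1e6:.1f} MB")
```

The snapshot is taken while `result` is still referenced, so the memory held by the returned object is included in the total.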

[5]:

model = pybamm.lithium_ion.DFN()
solver = pybamm.IDAKLUSolver()
experiment = pybamm.Experiment(
    [
        "Discharge at 0.1 A for 1 hour",
        "Charge at 0.1 A for 1 hour",
    ]
    * 100
)

tracemalloc.start()
sim = pybamm.Simulation(model, solver=solver, experiment=experiment)
time_start = time.perf_counter()
sol = sim.solve()
t_eval = np.linspace(0, 3600 * 10, 100)
voltage = sol["Terminal voltage [V]"](t_eval)
time_end = time.perf_counter()
print("Time to solve: ", time_end - time_start)

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
total_size = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f MB" % (total_size / 10**6))
tracemalloc.stop()

Time to solve:  2.17429095800253
Total allocated size: 11.6 MB

So about 2 seconds of compute has already produced roughly 11.6 MB of allocated data held in memory. For longer-running simulations this can quickly become a problem. We will now use the output variables feature to reduce the memory usage. When the solver is created, we pass in a list of output variables that we are interested in. The solver will then only store these variables at each time point, rather than the entire state vector.

[6]:

import time
import tracemalloc

import numpy as np

import pybamm

model = pybamm.lithium_ion.DFN()
solver = pybamm.IDAKLUSolver(output_variables=["Voltage [V]"])
experiment = pybamm.Experiment(
    [
        "Discharge at 0.1 A for 1 hour",
        "Charge at 0.1 A for 1 hour",
    ]
    * 100
)

tracemalloc.start()
sim = pybamm.Simulation(model, solver=solver, experiment=experiment)
time_start = time.perf_counter()
sol = sim.solve()
t_eval = np.linspace(0, 3600 * 10, 100)
voltage = sol["Voltage [V]"](t_eval)
time_end = time.perf_counter()
print("Time to solve: ", time_end - time_start)

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
total_size = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f MB" % (total_size / 10**6))
tracemalloc.stop()

Time to solve:  2.2499296669993782
Total allocated size: 14.7 MB

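The solve time is essentially unchanged. In the run shown here, the totals reported by tracemalloc are also of similar size (11.6 MB without output variables and 14.7 MB with them), so no memory saving is visible at this problem size. The benefit of output variables comes from no longer storing the full state vector at every time point, so it should become more pronounced as the state vector and the number of stored time points grow.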

Limitations

An obvious downside is that we can only plot the variables that we have stored. If any other variable is accessed, the solution object will raise an error.

[7]:

try:
    sol["X-averaged positive particle surface concentration [mol.m-3]"]
except KeyError as e:
    print("Error:", e)

Error: "Cannot process variable 'X-averaged positive particle surface concentration [mol.m-3]' as it was not part of the solve. Please re-run the solve with `output_variables` set to include this variable."

Storing only the first and last sample of each step

For very long ageing experiments, even with output_variables set, storing one sample per intra-step time point still grows the solution linearly with the number of steps. If post-processing only needs the value at the start and end of each step, the IDAKLU solver can be told to keep only those two samples per integration window via store_first_last=True.

This option composes with output_variables. The two flags address different axes (which variables are kept, and how many samples per window are kept), so they can be combined for the largest memory savings on long experiments.

Note: with store_first_last=True, IDAKLU’s Hermite interpolation is disabled and queries at intra-step times fall back to linear interpolation across the whole step, so this is not appropriate when post-processing reads non-endpoint times within a step.

[8]:

model = pybamm.lithium_ion.DFN()
solver = pybamm.IDAKLUSolver(
    output_variables=["Voltage [V]"],
    store_first_last=True,
)
experiment = pybamm.Experiment(
    [
        "Discharge at 0.1 A for 1 hour",
        "Charge at 0.1 A for 1 hour",
    ]
    * 100
)

tracemalloc.start()
sim = pybamm.Simulation(model, solver=solver, experiment=experiment)
time_start = time.perf_counter()
sol = sim.solve()
time_end = time.perf_counter()
print("Time to solve: ", time_end - time_start)

# Each sub-solution now holds just two samples (start and end of step).
print("Samples per step:", {sub.t.size for sub in sol.sub_solutions})

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
total_size = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f MB" % (total_size / 10**6))
tracemalloc.stop()

Time to solve:  2.265932290996716
Samples per step: {2}
Total allocated size: 13.7 MB
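The interpolation caveat noted for store_first_last can be seen with a toy calculation that does not involve PyBaMM. Suppose a quantity follows exp(-t) over a step from t = 0 to t = 1 (an arbitrary curve chosen for illustration) and only the two endpoint samples are kept. Linear interpolation between them is exact at the endpoints but off in the middle of the step.

```python
import math


def linear_interp(t, t0, y0, t1, y1):
    """Straight-line interpolation between two stored samples."""
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


t0, t1 = 0.0, 1.0
y0, y1 = math.exp(-t0), math.exp(-t1)  # the only two samples kept

t_mid = 0.5
approx = linear_interp(t_mid, t0, y0, t1, y1)
exact = math.exp(-t_mid)
print(f"interpolated: {approx:.4f}, exact: {exact:.4f}")
```

The size of the gap depends on how curved the true trajectory is within the step, so endpoint-only storage suits post-processing that reads only start-of-step and end-of-step values.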