Generate Battery Aging and EIS AI Datasets with PyBaMM

Generating large, properly labeled battery datasets with PyBaMM is more than a computational for-loop; it is a numerical design-of-experiments (DoE) problem. To generate data that actually helps AI models, without teaching them simulator-induced artifacts, one must control the model formulation and its boundary conditions, the structure of the sampled parameter space, the temperature conditions, the electrochemical state snapshots that are recorded, the frequency grid used for impedance, and the solver tolerances.

The companion lab provides a starting implementation; scaling it up means running batch jobs with large --samples values. Keep in mind that these outputs are deterministic numerical solutions of systems of partial differential equations (PDEs). They are purely physics-based synthetic data, and the underlying model must be calibrated, for example by non-linear least-squares parameter identification, before its outputs can be expected to match real-world battery behavior.

Figure: Sample SOH and local ECM resistance labels versus cycle. The SOH and internal-resistance trajectories come from modeled degradation mechanisms, which must be calibrated against experimental cycle-aging data.

1. The Formalized Dataset Objective

A mathematically rigorous AI battery dataset is a collection of tuples:

$$ \mathcal{D} = \{(\mathbf{x}_i, \mathbf{y}_i, \mathbf{m}_i)\}_{i=1}^{N} $$

Here, $\mathbf{x}_i \in \mathbb{R}^p$ is the observable feature vector (time-series voltage transients, complex impedance $Z(\omega)$, operating protocol). The target vector $\mathbf{y}_i \in \mathbb{R}^q$ holds internal, unobservable quantities (SOH, Loss of Lithium Inventory [LLI], Loss of Active Material [LAM]). Crucially, $\mathbf{m}_i$ is the metadata record that captures the model's structural assumptions, its thermodynamic and kinetic parameters, and the solver configuration. Without $\mathbf{m}_i$, the mapping $f: \mathbf{x} \rightarrow \mathbf{y}$ cannot be traced back to the physics that generated it, so neither its causal interpretation nor the dataset itself is reproducible.
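The following is a minimal sketch of one tuple as a Python record. The class name `BatterySample`, its fields, and the `REQUIRED_METADATA` keys are hypothetical (the keys loosely follow the manifest fields listed for the companion lab, plus solver settings). The point it illustrates is that $\mathbf{x}_i$, $\mathbf{y}_i$, and $\mathbf{m}_i$ are stored together, and that incomplete metadata can be detected.

```python
import json
from dataclasses import dataclass, field

import numpy as np

# Hypothetical metadata keys: manifest-style fields plus solver settings.
REQUIRED_METADATA = (
    "model_family", "parameter_set", "protocol", "temperature_K",
    "soc", "cycle", "split_group", "label_source",
    "solver", "rtol", "atol", "seed",
)


@dataclass
class BatterySample:
    """One tuple (x_i, y_i, m_i) of the dataset D."""
    sample_id: str
    x: np.ndarray                          # observable features, shape (p,)
    y: np.ndarray                          # unobservable labels, shape (q,)
    m: dict = field(default_factory=dict)  # metadata m_i

    def missing_metadata(self):
        """Required keys absent from m_i; a non-empty list means the sample is not reproducible."""
        return [key for key in REQUIRED_METADATA if key not in self.m]

    def to_json(self):
        """Serialize the whole tuple so features, labels, and metadata never get separated."""
        return json.dumps(
            {"sample_id": self.sample_id, "x": self.x.tolist(),
             "y": self.y.tolist(), "m": self.m},
            sort_keys=True,
        )
```

A pipeline could reject any sample for which `missing_metadata()` is non-empty before it is written to disk.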

2. Degradation Kinetics and State Labels

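When extracting labels, we are querying the time-integrated state of specific degradation models. For example, the growth rate of the Solid Electrolyte Interphase (SEI) thickness $L_{SEI}$, which drives LLI, is often modeled with reaction-limited solvent-reduction kinetics:

$$ \frac{\partial L_{SEI}}{\partial t} = \frac{M_{SEI}}{\rho_{SEI} z F} \, j_{SEI} \exp\left( -\frac{E_a}{R T} \right) \exp\left( -\frac{\alpha F (\phi_s - \phi_e - U_{SEI})}{R T} \right) $$

where $M_{SEI}$ and $\rho_{SEI}$ are the molar mass and density of the SEI, $z$ is the number of electrons transferred, $F$ is Faraday's constant, $j_{SEI}$ is a reference (pre-exponential) SEI current density, $E_a$ is the activation energy, $\alpha$ is the transfer coefficient, $\phi_s$ and $\phi_e$ are the solid and electrolyte potentials, and $U_{SEI}$ is the equilibrium potential of the SEI reaction.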
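Integrating this rate over a simulated temperature and overpotential history gives $L_{SEI}(t)$. The following is a standalone NumPy illustration, not PyBaMM's SEI submodel. It treats $j_{SEI}$ as a constant pre-exponential current density, and all default parameter values (molar mass, density, $E_a$, $\alpha$, initial thickness) are assumed, illustrative numbers.

```python
import numpy as np

F = 96485.33212  # Faraday constant [C/mol]
R = 8.314462618  # gas constant [J/(mol K)]


def sei_growth_rate(T, eta, molar_mass=0.162, density=1690.0, z=2,
                    j_sei=1e-3, Ea=3.0e4, alpha=0.5):
    """dL_SEI/dt [m/s] from the Arrhenius/Tafel expression.

    T: temperature [K]; eta = phi_s - phi_e - U_SEI [V].
    All default parameter values are assumed, illustrative numbers.
    """
    T = np.asarray(T, dtype=float)
    eta = np.asarray(eta, dtype=float)
    prefactor = molar_mass / (density * z * F) * j_sei  # [m^3/C] * [A/m^2] = [m/s]
    arrhenius = np.exp(-Ea / (R * T))
    tafel = np.exp(-alpha * F * eta / (R * T))  # grows as eta becomes more negative
    return prefactor * arrhenius * tafel


def integrate_sei(t, T, eta, L0=5e-9, **params):
    """Cumulative SEI thickness L_SEI(t) [m] by trapezoidal integration.

    t: time grid [s]; T and eta may be scalars or arrays on the same grid.
    L0: assumed initial SEI thickness [m].
    """
    t = np.asarray(t, dtype=float)
    rate = np.broadcast_to(sei_growth_rate(T, eta, **params), t.shape)
    increments = 0.5 * (rate[1:] + rate[:-1]) * np.diff(t)
    return L0 + np.concatenate(([0.0], np.cumsum(increments)))
```

In a real pipeline, `T` and `eta` would be read from the solved model at each time step rather than held constant. The LLI label then follows from the charge consumed to build that SEI.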

  • SOH: The integrated discharge capacity at cycle $k$ divided by the nominal capacity, $\mathrm{SOH}_k = Q_k / Q_0$.
  • RUL-to-80: The number of remaining cycles until the SOH trajectory first crosses $\mathrm{SOH} = 0.8$.
  • LLI: The cumulative charge consumed by parasitic side reactions (the time integral of their currents), usually normalized by the initial lithium inventory.
  • LAM: Often governed by particle fracture mechanics and driven by diffusion-induced stress, for example when the maximum tangential stress $\sigma_{t,\max}$ exceeds a critical fracture stress of the material.
  • EIS vectors: Generated by linearizing the full DFN model about an operating point and solving the resulting system in the frequency domain. This captures the charge-transfer semicircles and the Warburg diffusion tail.
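The following is a minimal sketch of how the SOH, RUL-to-80, and LLI labels above could be computed from per-cycle capacities and a side-reaction current. The function names are hypothetical. RUL is linearly interpolated between the two cycles that bracket the threshold. Trajectories that never reach 0.8 are returned as NaN (right-censored) instead of being extrapolated.

```python
import numpy as np


def soh_labels(Q, Q0=None):
    """SOH_k = Q_k / Q_0; Q0 defaults to the first-cycle capacity if no nominal value is given."""
    Q = np.asarray(Q, dtype=float)
    return Q / (Q[0] if Q0 is None else Q0)


def rul_to_threshold(cycles, soh, threshold=0.8):
    """Remaining cycles until SOH first reaches `threshold` (NaN if never reached)."""
    cycles = np.asarray(cycles, dtype=float)
    soh = np.asarray(soh, dtype=float)
    below = np.flatnonzero(soh <= threshold)
    if below.size == 0:
        return np.full(cycles.shape, np.nan)  # right-censored trajectory
    k = below[0]
    if k == 0:
        end_of_life = cycles[0]
    else:
        # Linear interpolation between the two cycles that bracket the threshold
        c0, c1 = cycles[k - 1], cycles[k]
        s0, s1 = soh[k - 1], soh[k]
        end_of_life = c0 + (s0 - threshold) * (c1 - c0) / (s0 - s1)
    return np.maximum(end_of_life - cycles, 0.0)


def lli_fraction(t, i_side, Q_li0):
    """LLI as cumulative side-reaction charge divided by the initial lithium inventory.

    t: time [s]; i_side: parasitic current [A] (positive = lithium consumed);
    Q_li0: initial cyclable lithium expressed as charge [Ah].
    """
    t = np.asarray(t, dtype=float)
    i_side = np.asarray(i_side, dtype=float)
    increments = 0.5 * (i_side[1:] + i_side[:-1]) * np.diff(t)  # trapezoid, [A s]
    charge_As = np.concatenate(([0.0], np.cumsum(increments)))
    return charge_As / 3600.0 / Q_li0  # A s -> Ah, then normalize
```

Keeping censored trajectories as NaN, rather than dropping them, lets the training step decide explicitly how to handle cells that never reached end of life within the simulated horizon.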

3. Non-Linear Least Squares Parameter Identification

Before synthetic data can be trusted, the underlying parameters must be fitted to experimental data. This is typically done by minimizing an objective function $J(\mathbf{p}) = \sum_k \left[ V_{exp}(t_k) - V_{sim}(t_k; \mathbf{p}) \right]^2$, the sum of squared residuals between the experimental voltage $V_{exp}$ and the PyBaMM-simulated voltage $V_{sim}(\mathbf{p})$. Optimizers such as L-BFGS-B or Nelder-Mead are used for this, e.g., via PyBOP or SciPy.

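```python
import numpy as np
import pybamm
import scipy.optimize as opt

model = pybamm.lithium_ion.DFN()
base_values = pybamm.ParameterValues("Chen2020")  # built-in parameter set as the starting point

# Measured discharge, time [s] and voltage [V], using the same current protocol as the
# simulation; "cycler_discharge.csv" is a hypothetical file name
t_exp, V_exp = np.loadtxt("cycler_discharge.csv", delimiter=",", skiprows=1, unpack=True)
PENALTY = 1e6  # objective value for parameter sets that cannot be simulated


def objective(params_array):
    # Map the optimizer's array onto PyBaMM parameter names (on a copy, not the baseline)
    values = base_values.copy()
    values.update({
        "Negative electrode active material volume fraction": params_array[0],
        "Positive electrode active material volume fraction": params_array[1],
    })
    sim = pybamm.Simulation(model, parameter_values=values)
    try:
        solution = sim.solve(t_eval=t_exp)
        # Evaluate on the experimental time grid; recent PyBaMM calls this "Voltage [V]"
        V_sim = solution["Voltage [V]"](t=t_exp)
    except pybamm.SolverError:
        return PENALTY
    if not np.all(np.isfinite(V_sim)):
        return PENALTY  # e.g. the solve stopped early at a voltage cut-off
    # Sum of squared residuals J(p)
    return np.sum((V_exp - V_sim) ** 2)


initial_guess = [0.75, 0.665]            # Chen2020 baseline values
param_bounds = [(0.5, 0.9), (0.5, 0.9)]  # assumed physical bounds
# Bounded quasi-Newton minimization (L-BFGS-B)
res = opt.minimize(objective, initial_guess, method="L-BFGS-B", bounds=param_bounds)
```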

Without calibrating the parameters through this kind of optimization, the sampled parameter ranges are arbitrary. An AI model trained on such a dataset will merely learn the distribution chosen for the Latin Hypercube Sampling (LHS) design rather than the electrochemistry of real cells.
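Latin Hypercube Sampling divides each parameter range into $n$ equal strata and places exactly one sample in each stratum per dimension. A minimal NumPy sketch follows; SciPy's `scipy.stats.qmc.LatinHypercube` is an alternative. The bounds in the usage line are assumed values. In practice, they should be centered on calibrated parameters rather than chosen arbitrarily.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, seed=0):
    """Latin Hypercube design: one point per stratum in every dimension.

    bounds: sequence of (low, high) pairs, one per parameter.
    """
    rng = np.random.default_rng(seed)
    lows = np.array([b[0] for b in bounds], dtype=float)
    highs = np.array([b[1] for b in bounds], dtype=float)
    n_dims = len(bounds)
    # An independent random permutation of the strata 0..n-1 for each dimension
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(n_dims)])
    # Uniform jitter inside each stratum, scaled to the unit interval [0, 1)
    unit = (strata + rng.random((n_samples, n_dims))) / n_samples
    return lows + unit * (highs - lows)
```

For example, `latin_hypercube(200, [(0.70, 0.80), (0.62, 0.71)], seed=7)` would draw 200 designs for the two volume fractions fitted above, using hypothetical bounds around their calibrated values.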

4. Dataset Generation Pipeline Execution

Solving thousands of these discretized DAE systems requires careful management of parallel workers. Extreme parameter combinations can make the solver's Newton iterations fail to converge, and such failures must be caught and recorded rather than allowed to crash the batch or silently disappear.

```bash
cd pybamm-ai-data-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Generate 200 samples with 4 workers and a fixed seed, using the PyBaMM backend (CasADi solvers)
python src/run_all.py --samples 200 --workers 4 --seed 7 --backend pybamm --output /tmp/pybamm-ai-dataset
```
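One way to keep a batch running when individual samples fail is to catch exceptions inside each worker and return an explicit status. This sketch uses hypothetical names: `safe_solve`, `run_batch`, and a `solve_fn` that maps a sample dict to a dict of scalar labels. It is not the companion lab's `run_all.py`. In a real run, `solve_fn` would build and solve a `pybamm.Simulation`, and it must be defined at module level so that `ProcessPoolExecutor` can pickle it.

```python
import math
from concurrent.futures import ProcessPoolExecutor, as_completed


def safe_solve(solve_fn, sample):
    """Run one simulation; return a record with an explicit status instead of raising."""
    record = {"sample_id": sample["sample_id"], "status": "ok", "error": None, "result": None}
    try:
        result = solve_fn(sample)
    except Exception as exc:  # e.g. pybamm.SolverError for a non-convergent sample
        record.update(status="failed", error=f"{type(exc).__name__}: {exc}")
        return record
    if not all(math.isfinite(value) for value in result.values()):
        # The solver returned, but some labels are NaN/inf: keep the record and flag it
        record.update(status="non_finite", result=result)
        return record
    record["result"] = result
    return record


def run_batch(samples, solve_fn, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Solve all samples in parallel; failures are recorded, never silently dropped."""
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(safe_solve, solve_fn, sample) for sample in samples]
        records = [future.result() for future in as_completed(futures)]
    # as_completed yields in finish order; restore a deterministic order
    return sorted(records, key=lambda record: record["sample_id"])
```

Keeping failed and non-finite records in the output lets a quality report count them. It also prevents a downstream model from silently training on a design that the solver filtered.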

5. Orthogonal Splitting and Information Leakage

Suppose cycle snapshots from the same modeled aging trajectory (e.g., cycles 10, 50, and 100 of one cell parameter set) are randomly partitioned between train and test sets. The model is then evaluated on points it has effectively already seen. Because each trajectory is a deterministic solution of the model equations, the model can memorize and interpolate along it rather than learn a degradation function that generalizes to unseen cells, and the test score overstates its accuracy.

  • Split strictly along group boundaries, such as cell_design_id or randomly assigned groups of kinetic parameter sets, so that every snapshot of a given trajectory lands in the same partition.
  • The points of an EIS spectrum are structurally correlated: the Kramers-Kronig relations link the real and imaginary parts across the whole frequency range. All frequencies from a single impedance sweep must therefore stay on the same side of the train/test boundary.
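A grouped split can be sketched in a few lines of NumPy: shuffle the unique group IDs rather than the rows, then assign whole groups to the test set. `group_split` and `leaked_groups` are hypothetical helpers; scikit-learn's `GroupShuffleSplit` implements the same idea. For EIS data, the group should be an ID shared by every frequency point of a spectrum, such as the cell design or sample ID.

```python
import numpy as np


def group_split(group_ids, test_fraction=0.2, seed=0):
    """Assign whole groups (e.g. cell_design_id) to train or test.

    Returns (train_idx, test_idx) as integer index arrays into group_ids.
    """
    group_ids = np.asarray(group_ids)
    groups = np.unique(group_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(groups)
    n_test = max(1, int(round(test_fraction * len(groups))))
    is_test = np.isin(group_ids, shuffled[:n_test])
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)


def leaked_groups(group_ids, train_idx, test_idx):
    """Groups that appear on both sides of the split (should always be empty)."""
    group_ids = np.asarray(group_ids)
    return set(group_ids[train_idx].tolist()) & set(group_ids[test_idx].tolist())
```

Running `leaked_groups` as a final check before training turns the rule above into an automated quality gate rather than a convention.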

6. Connecting Synthetic Distributions to Real-World Physics

Synthetic data is highly useful for pre-training neural networks, testing active-learning acquisition functions, and running ablation studies on network architectures. However, closing the sim-to-real gap requires:

  • Repeated calibration of the underlying PyBaMM model against Differential Voltage Analysis (DVA, $dV/dQ$) and Incremental Capacity Analysis (ICA, $dQ/dV$) curves measured on physical cyclers.
  • Global sensitivity analysis (e.g., Sobol indices) to quantify which unobservable parameters actually influence the generated voltage features.
  • Strict documentation of solver settings and outcomes. If the DAE solver (for example SUNDIALS IDAS, which underlies PyBaMM's IDAKLU solver and the CasADi integrator) cannot meet its absolute tolerance (e.g., $10^{-6}$) during severe degradation steps, those samples must be flagged as non-physical rather than fed blindly to the AI.
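DVA and ICA are derivatives of the same voltage-capacity curve: DVA is $dV/dQ$ and ICA is $dQ/dV$. The sketch below computes both with `np.gradient`. It also adds a hypothetical `dva_mismatch` score for comparing a simulated curve with a measured one on a shared capacity grid. The sketch assumes clean, monotonic data; real cycler data usually needs smoothing or voltage binning first.

```python
import numpy as np


def ica_dva(Q, V, min_slope=1e-3):
    """Return (dQ/dV, dV/dQ) for a voltage curve V sampled at capacities Q.

    Q: capacity [Ah], strictly monotonic; V: voltage [V].
    Where |dV/dQ| < min_slope (flat plateaus), dQ/dV is set to NaN
    because the reciprocal would blow up there.
    """
    Q = np.asarray(Q, dtype=float)
    V = np.asarray(V, dtype=float)
    dV_dQ = np.gradient(V, Q)  # DVA: central differences inside, one-sided at the ends
    with np.errstate(divide="ignore"):
        dQ_dV = np.where(np.abs(dV_dQ) >= min_slope, 1.0 / dV_dQ, np.nan)  # ICA
    return dQ_dV, dV_dQ


def dva_mismatch(Q, V_exp, V_sim):
    """RMSE between experimental and simulated dV/dQ on the same capacity grid."""
    _, dva_exp = ica_dva(Q, V_exp)
    _, dva_sim = ica_dva(Q, V_sim)
    return float(np.sqrt(np.mean((dva_exp - dva_sim) ** 2)))
```

Because DVA ignores constant voltage offsets, a score like `dva_mismatch` emphasizes the electrode-specific peak positions that calibration aims to match, rather than the absolute voltage level.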

FAQ

Who is this article for?

This article is for readers who want a PhD-level guide to generating battery aging and EIS AI datasets with PyBaMM. It takes about 15 minutes to read and focuses on PyBaMM, battery aging, SOH, and RUL.

What should I read next?

The recommended next step is Training a Battery AI Model with PyBaMM: Predicting SOH and RUL, so the article connects into a longer learning route instead of ending as an isolated note.

Does this article include runnable code or companion resources?

Yes. Use the run notes, resource cards, and download links on the page to reproduce the example or inspect the companion files.

How does this article fit into the larger site?

It is connected to the article context block, learning routes, resources, and project timeline so readers can move from concept to implementation.

Article context

Battery Modeling for AI

A reproducible path from PyBaMM, EIS, and aging simulation to labeled battery datasets for AI training.

Level: PhD level. Reading time: 15 min.
  • PyBaMM
  • Battery Aging
  • SOH
  • RUL
  • Data Quality


Project timeline

Published posts

  1. Reading PyBaMM Fast: Architecture for Battery Modeling and AI Data A PhD-level guide to PyBaMM expression trees, Simulation, model options, metadata, and AI dataset design.
  2. PyBaMM EIS Data Generation: Impedance Features and AI Labels Use PyBaMM core EISSimulation to generate impedance spectra, extract features, and align them with aging labels.
  3. Generate Battery Aging and EIS AI Datasets with PyBaMM Build a reproducible PyBaMM data factory for SOH, RUL, LLI, LAM, plating, and impedance-feature labels.
  4. Training a Battery AI Model with PyBaMM: Predicting SOH and RUL Train scikit-learn regressors on PyBaMM-style EIS features and operating metadata to predict battery SOH and RUL.

Published resources

  1. PyBaMM AI Data Lab README Setup, quick run, backend behavior, and output schemas for the PyBaMM battery AI data pipeline.
  2. PyBaMM AI Data Lab full bundle Bundles design generation, aging sweeps, EIS sweeps, label building, validation checks, sample CSVs, and figures.
  3. PyBaMM sample manifest Stores sample id, model family, parameter set, protocol, temperature, SOC, cycle, split group, and label source.
  4. PyBaMM EIS sample spectra CSV Frequency-level impedance output with frequency, Z_re, Z_im, magnitude, phase, backend, and solver status.
  5. Battery aging and EIS labels CSV Stores SOH, RUL proxy, LLI, LAM, plating, local resistance, and EIS features.
  6. PyBaMM AI data quality report Records duplicate samples, duplicate spectrum points, missing labels, split leakage, and backend usage.
  7. PyBaMM to AI data pipeline figure Shows design grid, aging solve, EIS solve, label build, quality gate, and AI split.
  8. EIS feature and label schema figure Connects frequency points, impedance features, operating metadata, and SOH/RUL/degradation labels.
  9. Aging label sample figure Sample figure showing cycle snapshots, SOH, and local ECM resistance labels.
  10. SOH/RUL training metrics CSV Stores group split, MAE, RMSE, R2, label source, and backend used for auditing model results.
  11. SOH/RUL held-out predictions CSV Stores held-out true values, predictions, and absolute errors.
  12. SOH/RUL feature importance CSV Records random-forest feature importance values for each target model.
  13. SOH/RUL training results figure Shows held-out SOH/RUL prediction scatter plots and SOH feature importance.
  14. Battery Modeling for AI share card OG share card for the PyBaMM battery modeling, EIS, aging simulation, and AI data hub.

Next notes

  1. Add experimental calibration and identifiability notes
  2. Add revalidated PyBOP/SEIS comparison notes