T10: Deep learning and surrogate methods

In this topic we will advance the fundamental mathematical understanding of artificial neural networks, e.g., through the design and rigorous analysis of stochastic gradient descent methods for their training. Combining data-driven machine learning approaches with model order reduction methods, we will develop fully certified multi-fidelity modelling frameworks for parameterised PDEs, design and study higher-order deep learning-based approximation schemes for parametric SPDEs, and construct cost-optimal multi-fidelity surrogate methods for PDE-constrained optimisation and inverse problems.

  • Mathematical fields

    • Differential geometry
    • Stochastic analysis
    • Theory of stochastic processes
    • Optimisation and calculus of variations
    • Numerical analysis, machine learning and scientific computing
  • Collaborations with other Topics

  • Selected publications and preprints

    since 2019

    • Arnulf Jentzen and Adrian Riekert. Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks. arXiv e-prints, February 2024. arXiv:2402.05155.

    • Ilya Chevyrev, Andris Gerasimovičs, and Hendrik Weber. Feature engineering with regularity structures. Journal of Scientific Computing, 98(1):13, January 2024. doi:10.1007/s10915-023-02401-4.

    • Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, and Philippe von Wurstemberger. Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing. Mathematical Finance, 34(1):90–150, January 2024. doi:10.1111/mafi.12405.

    • Arnulf Jentzen, Benno Kuckuck, and Philippe von Wurstemberger. Mathematical introduction to deep learning: methods, implementations, and theory. arXiv e-prints, October 2023. arXiv:2310.20360.

    • Hendrik Kleikamp, Martin Lazar, and Cesare Molinari. Be greedy and learn: Efficient and certified algorithms for parametrized optimal control problems. arXiv e-prints, July 2023. arXiv:2307.15590.

    • Bernard Haasdonk, Hendrik Kleikamp, Mario Ohlberger, Felix Schindler, and Tizian Wenzel. A new certified hierarchical and adaptive RB-ML-ROM surrogate model for parametrized PDEs. SIAM J. Sci. Comput., pages A1039–A1065, June 2023. doi:10.1137/22M1493318.

    • Josua Sassen, Klaus Hildebrandt, Martin Rumpf, and Benedikt Wirth. Parametrizing product shape manifolds by composite networks. arXiv e-prints, February 2023. arXiv:2302.14665.

    • Steffen Dereich, Arnulf Jentzen, and Sebastian Kassing. On the existence of minimizers in shallow residual ReLU neural network optimization landscapes. arXiv e-prints, February 2023. arXiv:2302.14690.

    • Tim Keil, Hendrik Kleikamp, Rolf J. Lorentzen, Micheal B. Oguntola, and Mario Ohlberger. Adaptive machine learning-based surrogate modeling to accelerate PDE-constrained optimization in enhanced oil recovery. Advances in Computational Mathematics, 48(6):Paper No. 73, 35 pp., November 2022. doi:10.1007/s10444-022-09981-z.

    • Patrick Cheridito, Arnulf Jentzen, and Florian Rossmannek. Landscape analysis for shallow neural networks: Complete classification of critical points for affine target functions. J. Nonlinear Sci., 32(5):64, July 2022. doi:10.1007/s00332-022-09823-8.

    • Steffen Dereich and Sebastian Kassing. Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes. arXiv e-prints, February 2021. arXiv:2102.09385.

    • Steffen Dereich and Thomas Müller-Gronbach. General multilevel adaptations for stochastic approximation algorithms of Robbins–Monro and Polyak–Ruppert type. Numer. Math., 142(2):279–328, June 2019. doi:10.1007/s00211-019-01024-y.

    • Sebastian Becker, Patrick Cheridito, and Arnulf Jentzen. Deep optimal stopping. J. Mach. Learn. Res., 20(74):1–25, April 2019. doi:10.5555/3322706.3362015.


  • Recent publications and preprints

    since 2023

    • Patrick Cheridito, Arnulf Jentzen, and Florian Rossmannek. Gradient descent provably escapes saddle points in the training of shallow ReLU networks. Journal of Optimization Theory and Applications, September 2024. doi:10.1007/s10957-024-02513-3.

    • Raphael Lafargue, Luke Smith, Franck Vermet, Mathias Löwe, Ian Reid, Vincent Gripon, and Jack Valmadre. Oops, I sampled it again: Reinterpreting confidence intervals in few-shot learning. arXiv e-prints, September 2024. arXiv:2409.02850.

    • Lukas Gonon, Arnulf Jentzen, Benno Kuckuck, Siyu Liang, Adrian Riekert, and Philippe von Wurstemberger. An overview on machine learning methods for partial differential equations: from physics informed neural networks to deep operator learning. arXiv e-prints, August 2024. arXiv:2408.13222.

    • Steffen Dereich and Arnulf Jentzen. Convergence rates for the Adam optimizer. arXiv e-prints, July 2024. arXiv:2407.21078.

    • Steffen Dereich, Robin Graeber, and Arnulf Jentzen. Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates. arXiv e-prints, July 2024. arXiv:2407.08100.

    • Fabian Hornung, Arnulf Jentzen, and Diyora Salimova. Space-time deep neural network approximations for high-dimensional partial differential equations. Journal of Computational Mathematics, June 2024. doi:10.4208/jcm.2308-m2021-0266.

    • Julia Ackermann, Arnulf Jentzen, Benno Kuckuck, and Joshua Lee Padgett. Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for space-time solutions of semilinear partial differential equations. arXiv e-prints, June 2024. arXiv:2406.10876.

    • Steffen Dereich, Arnulf Jentzen, and Adrian Riekert. Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses. arXiv e-prints, June 2024. arXiv:2406.14340.

    • Anselm Hudde, Martin Hutzenthaler, Arnulf Jentzen, and Sara Mazzonetto. On the Itô–Alekseev–Gröbner formula for stochastic differential equations. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, May 2024. doi:10.1214/21-aihp1199.

    • Tizian Wenzel, Bernard Haasdonk, Hendrik Kleikamp, Mario Ohlberger, and Felix Schindler. Application of deep kernel models for certified and adaptive RB-ML-ROM surrogate modeling. In Ivan Lirkov and Svetozar Margenov, editors, Large-Scale Scientific Computations, 117–125. Springer Nature Switzerland, May 2024. doi:10.1007/978-3-031-56208-2_11.

    • Sonja Cox, Martin Hutzenthaler, and Arnulf Jentzen. Local Lipschitz continuity in the initial value and strong completeness for nonlinear stochastic differential equations. Memoirs of the American Mathematical Society, April 2024. doi:10.1090/memo/1481.

    • Arnulf Jentzen and Adrian Riekert. Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks. arXiv e-prints, February 2024. arXiv:2402.05155.

    • Ilya Chevyrev, Andris Gerasimovičs, and Hendrik Weber. Feature engineering with regularity structures. Journal of Scientific Computing, 98(1):13, January 2024. doi:10.1007/s10915-023-02401-4.

    • Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, and Philippe von Wurstemberger. Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing. Mathematical Finance, 34(1):90–150, January 2024. doi:10.1111/mafi.12405.

    • Juliane Braunsmann, Marko Rajković, Martin Rumpf, and Benedikt Wirth. Convergent autoencoder approximation of low bending and low distortion manifold embeddings. ESAIM: Mathematical Modelling and Numerical Analysis, 58(1):335–361, January 2024. doi:10.1051/m2an/2023088.

    • Arnulf Jentzen, Benno Kuckuck, and Philippe von Wurstemberger. Mathematical introduction to deep learning: methods, implementations, and theory. arXiv e-prints, October 2023. arXiv:2310.20360.

    • Arnulf Jentzen and Timo Welti. Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation. Applied Mathematics and Computation, 455:127907, October 2023. doi:10.1016/j.amc.2023.127907.

    • Julia Ackermann, Arnulf Jentzen, Thomas Kruse, Benno Kuckuck, and Joshua Lee Padgett. Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense. arXiv e-prints, September 2023. arXiv:2309.13722.

    • Philipp Grohs, Shokhrukh Ibragimov, Arnulf Jentzen, and Sarah Koppensteiner. Lower bounds for artificial neural network approximations: A proof that shallow neural networks fail to overcome the curse of dimensionality. J. Complexity, 77:101746, 53 pp., August 2023. doi:10.1016/j.jco.2023.101746.

    • Hendrik Kleikamp, Martin Lazar, and Cesare Molinari. Be greedy and learn: Efficient and certified algorithms for parametrized optimal control problems. arXiv e-prints, July 2023. arXiv:2307.15590.

    • Bernard Haasdonk, Hendrik Kleikamp, Mario Ohlberger, Felix Schindler, and Tizian Wenzel. A new certified hierarchical and adaptive RB-ML-ROM surrogate model for parametrized PDEs. SIAM J. Sci. Comput., pages A1039–A1065, June 2023. doi:10.1137/22M1493318.

    • Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, and Benno Kuckuck. An overview on deep learning-based approximation methods for partial differential equations. Discrete and Continuous Dynamical Systems - B, 28(6):3697–3746, June 2023. doi:10.3934/dcdsb.2022238.

    • Philipp Grohs, Fabian Hornung, Arnulf Jentzen, and Philippe von Wurstemberger. A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. Memoirs of the American Mathematical Society, April 2023. doi:10.1090/memo/1410.

    • Arnulf Jentzen and Adrian Riekert. Strong overall error analysis for the training of artificial neural networks via random initializations. Commun. Math. Stat., 50 pp., March 2023. doi:10.1007/s40304-022-00292-9.

    • Simon Eberle, Arnulf Jentzen, Adrian Riekert, and Georg S. Weiss. Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation. Electron. Res. Arch., 31(5):2519–2554, March 2023. doi:10.3934/era.2023128.

    • Josua Sassen, Klaus Hildebrandt, Martin Rumpf, and Benedikt Wirth. Parametrizing product shape manifolds by composite networks. arXiv e-prints, February 2023. arXiv:2302.14665.

    • Steffen Dereich, Arnulf Jentzen, and Sebastian Kassing. On the existence of minimizers in shallow residual ReLU neural network optimization landscapes. arXiv e-prints, February 2023. arXiv:2302.14690.

    • Philipp Grohs, Fabian Hornung, Arnulf Jentzen, and Philipp Zimmermann. Space-time error estimates for deep neural network approximations for differential equations. Advances in Computational Mathematics, 49(1):4, January 2023. doi:10.1007/s10444-022-09970-2.

    • Lukas Gonon, Robin Graeber, and Arnulf Jentzen. The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality. arXiv e-prints, January 2023. arXiv:2301.08284.

    • Arnulf Jentzen and Adrian Riekert. Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation. J. Math. Anal. Appl., 517(2):126601, January 2023. doi:10.1016/j.jmaa.2022.126601.




Training of deep neural networks


Investigators: Böhm, Dereich, Jentzen, Kuckuck

This research unit aims to establish convergence rates for the stochastic gradient descent (SGD) methods used to train artificial neural networks (ANNs), using techniques from stochastic analysis, martingale theory, real algebraic geometry, and Lyapunov functions. Despite the great practical success of SGD methods in training ANNs, proving (or disproving) their convergence, with rates, remains a fundamental open research problem, and answering it is one of our key goals. Additionally, we plan to develop new, improved variants of existing optimizers.
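The following is a minimal, self-contained sketch of the kind of training procedure analysed here: plain mini-batch SGD applied to a shallow ReLU network for a toy 1D regression problem. The target function, network width, and hyperparameters are illustrative choices and are not taken from the publications above.

```python
# Illustrative sketch only: plain mini-batch SGD for a shallow ReLU network
# on a toy 1D regression problem (hypothetical data and hyperparameters).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth target function.
x = rng.uniform(-1.0, 1.0, size=(512, 1))
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(x.shape)

# Shallow ReLU network R -> R with one hidden layer of width m.
m = 32
W1 = rng.standard_normal((1, m))                # input-to-hidden weights
b1 = np.zeros(m)
W2 = rng.standard_normal((m, 1)) / np.sqrt(m)   # hidden-to-output weights
b2 = np.zeros(1)

lr, batch_size, steps = 1e-2, 32, 5000
for step in range(steps):
    idx = rng.integers(0, len(x), size=batch_size)
    xb, yb = x[idx], y[idx]

    # Forward pass.
    z = xb @ W1 + b1            # pre-activations, shape (batch, m)
    a = np.maximum(z, 0.0)      # ReLU activations
    pred = a @ W2 + b2          # network output, shape (batch, 1)

    # Mean-squared-error loss and manual backpropagation.
    resid = pred - yb
    grad_pred = 2.0 * resid / batch_size
    gW2 = a.T @ grad_pred
    gb2 = grad_pred.sum(axis=0)
    grad_z = (grad_pred @ W2.T) * (z > 0.0)
    gW1 = xb.T @ grad_z
    gb1 = grad_z.sum(axis=0)

    # Plain SGD update with constant learning rate.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    if step % 1000 == 0:
        print(f"step {step:5d}  mini-batch MSE {np.mean(resid**2):.4f}")
```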

Learning methods for PDEs


Investigators: Engwer, Jentzen, Ohlberger, Rave, Weber

This research unit aims to design and analyze deep learning approximation methods for various partial differential equations (PDEs), including low-regularity equations such as stochastic PDEs (SPDEs). We will develop new error estimation and certification frameworks for machine learning (ML)-based surrogate methods for PDEs, with a focus on improved initialization and unsupervised learning performance. Additionally, we plan to advance higher-order approximation schemes for SPDEs by combining rough path theory and regularity structures with ML approaches.
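As a deliberately simple illustration of a deep learning approximation method for a PDE, the following sketch trains a physics-informed neural network for the 1D Poisson problem -u''(x) = π² sin(πx) on (0, 1) with homogeneous Dirichlet boundary conditions, whose exact solution is u(x) = sin(πx). It uses PyTorch; the architecture, collocation sampling, and loss weighting are hypothetical choices and do not reproduce any particular scheme from the publications above.

```python
# Illustrative PINN sketch for -u'' = pi^2 sin(pi x) on (0, 1), u(0) = u(1) = 0.
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def pde_residual(x):
    """Residual -u''(x) - f(x) at collocation points x (with requires_grad=True)."""
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = torch.pi**2 * torch.sin(torch.pi * x)
    return -d2u - f

x_bnd = torch.tensor([[0.0], [1.0]])                 # boundary points
for step in range(5000):
    x_col = torch.rand(128, 1, requires_grad=True)   # interior collocation points
    loss_pde = pde_residual(x_col).pow(2).mean()     # PDE residual loss
    loss_bc = net(x_bnd).pow(2).mean()               # Dirichlet boundary loss
    loss = loss_pde + loss_bc
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:5d}  loss {loss.item():.3e}")

# Rough accuracy check against the exact solution u(x) = sin(pi x).
x_test = torch.linspace(0.0, 1.0, 101).reshape(-1, 1)
err = (net(x_test) - torch.sin(torch.pi * x_test)).abs().max()
print(f"max abs error vs exact solution: {err.item():.3e}")
```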

Surrogate methods for optimal control and inverse problems


Investigators: Jentzen, Ohlberger, Rave, Wirth

This research unit focusses on designing and studying deep learning and other surrogate methods for inverse problems and, more generally, optimal control problems. A key challenge is the integration of data-driven and model-based approaches by means of machine learning. We aim to learn dynamics or operators from data and to incorporate prior knowledge as regularisation. A central part is the development of cost-optimal multi-fidelity approximation frameworks that combine deep learning models with classical approximation schemes such as finite element and model order reduction methods. Finally, we aim to develop numerical techniques for solving high-dimensional optimal control problems.
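To make the intended offline/online splitting concrete, the following minimal sketch replaces an "expensive" forward model (a finite-difference solve of a one-dimensional diffusion problem) by a cheap surrogate of the parameter-to-observation map, which is then inverted for a measured observation and checked a posteriori with a single full-order solve. The model, the polynomial surrogate (standing in for the reduced-basis and deep learning surrogates developed in this unit), and all parameter ranges are hypothetical choices.

```python
# Illustrative sketch of surrogate-based parameter identification
# (hypothetical model, surrogate class, and parameter ranges).
import numpy as np

def forward_model(mu, n=200):
    """'Expensive' full-order model: solve -mu u'' = 1 on (0, 1), u(0) = u(1) = 0,
    by finite differences and return the midpoint value of u as the observation."""
    h = 1.0 / (n + 1)
    A = (mu / h**2) * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    u = np.linalg.solve(A, np.ones(n))
    return u[n // 2]

# Offline phase: a few full-order solves, then fit a cheap surrogate of mu -> observation.
mu_train = np.linspace(0.5, 2.0, 8)
obs_train = np.array([forward_model(mu) for mu in mu_train])
coeffs = np.polyfit(mu_train, obs_train, deg=4)
surrogate = lambda mu: np.polyval(coeffs, mu)

# Online phase: invert the surrogate for a (synthetic) measured observation.
mu_true = 1.3
obs_meas = forward_model(mu_true)
mu_grid = np.linspace(0.5, 2.0, 2001)
mu_est = mu_grid[np.argmin((surrogate(mu_grid) - obs_meas) ** 2)]

# A posteriori check with one additional full-order solve.
print(f"true mu                      : {mu_true:.4f}")
print(f"estimated mu (surrogate)     : {mu_est:.4f}")
print(f"full-model misfit at estimate: {abs(forward_model(mu_est) - obs_meas):.2e}")
```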