Koopman Spectrum RL for Bifurcation Control: Data-Driven Policy Optimization in Spectral Subspaces


Dipesh D., Dhatterwal J. S., ÖZDEN AYNA H.

Mathematics, vol.14, no.11, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 14 Issue: 11
  • Publication Date: 2026
  • DOI Number: 10.3390/math14111847
  • Journal Name: Mathematics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, zbMATH, Directory of Open Access Journals, Academic Search Ultimate (EBSCO), Materials Science & Engineering Collection (ProQuest), Technology Collection (ProQuest)
  • Keywords: Koopman operator, Lyapunov exponents, Markov Decision Process, reinforcement learning, Runge–Kutta scheme
  • Affiliated with Bursa Uludağ University: Yes

Abstract

This paper presents a reinforcement learning (RL) framework based on the Koopman operator for high-dimensional nonlinear control. By leveraging nonlinear eigenvalue dynamics, the approach enables scalable and efficient policy optimization. We address the challenge of controlling complex systems by embedding high-dimensional states (Formula presented.) into a Koopman-invariant subspace (Formula presented.), where evolution becomes linear under the Koopman operator (Formula presented.). By spectrally decomposing (Formula presented.), the eigenvalue dynamics are obtained, and (Formula presented.) is reconstructed iteratively via dominant eigenpairs (Formula presented.). A policy network (Formula presented.) selects actions (Formula presented.), while a value function (Formula presented.), expressed in Koopman eigenfunction coordinates, guides gradient-based policy updates. The framework integrates spectral stability constraints (Formula presented.) and Lyapunov-based analysis to ensure convergence. We derive perturbation bounds for Koopman eigenvalues under policy updates and establish conditions for nonlinear mode interactions in the lifted space. We also state a spectral policy gradient theorem for Koopman RL that links eigenvalue dynamics to policy optimization, give a constrained Bellman formulation in Koopman coordinates, and analyze bifurcations of learning-induced eigenvalue shifts.
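
The lifting and spectral decomposition steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy map `step`, the dictionary of observables `lift`, the least-squares (EDMD-style) fit, and the unit-disc stability test are all assumptions chosen for illustration, since the record does not reproduce the paper's formulas.

```python
import numpy as np


def lift(x):
    """Hypothetical dictionary of observables (x1, x2, x1^2).

    For the toy map below, these three functions span a
    Koopman-invariant subspace, so the lifted dynamics are exactly linear.
    """
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1, x2, x1**2], axis=-1)


def step(x, a=0.9, b=0.5, c=0.3):
    """Toy nonlinear discrete-time map (an assumption, not from the paper)."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([a * x1, b * x2 + c * x1**2], axis=-1)


def fit_koopman(X, Y):
    """Least-squares estimate of K such that lift(y) ~= K @ lift(x)."""
    Zx, Zy = lift(X), lift(Y)
    # Solve Zx @ K.T = Zy in the least-squares sense.
    Kt, *_ = np.linalg.lstsq(Zx, Zy, rcond=None)
    return Kt.T


def dominant_eigenpairs(K, r):
    """Return the r eigenpairs of K with the largest eigenvalue modulus."""
    vals, vecs = np.linalg.eig(K)
    order = np.argsort(-np.abs(vals))[:r]
    return vals[order], vecs[:, order]


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))  # sampled states
Y = step(X)                                # their one-step successors

K = fit_koopman(X, Y)
eigvals, eigvecs = dominant_eigenpairs(K, r=2)

# Assumed stability criterion for a discrete-time system:
# all retained eigenvalues lie in the closed unit disc.
is_stable = bool(np.all(np.abs(eigvals) <= 1.0 + 1e-9))
```

For this toy map the exact lifted matrix is upper triangular with diagonal entries `a`, `b`, and `a**2`, so the fitted spectrum can be checked against known values. In a learning setting the same fit would be repeated on data collected under each updated policy, which is where eigenvalue shifts of the kind the abstract analyzes would appear.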