Xiaonan Gao, Ziyi Wu, Xianchao Zhu, Lei Cai, Soft actor-critic algorithm with adaptive normalization, Vol. 2025 (2025), No. 6, pp. 1-10

DOI: 10.23952/jnfa.2025.6

Received December 24, 2024; Accepted March 1, 2025; Published March 13, 2025


Abstract. In recent years, breakthroughs have been made in the field of deep reinforcement learning; however, real-world applications remain seriously hindered by algorithmic instability and the difficulty of guaranteeing convergence. Although the soft actor-critic (SAC) algorithm, a representative reinforcement learning method, enhances robustness and the agent's exploration ability by introducing the maximum entropy objective, it still suffers from instability during training. To address this problem, this paper proposes an Adaptive Normalization-based SAC (AN-SAC) algorithm. By introducing an adaptive reward-normalization mechanism into the SAC algorithm, our method dynamically adjusts the normalization parameters of the reward during training so that the rewards have zero mean and unit variance. The algorithm thus adapts better to the reward distribution, which improves its performance and stability. Experimental results demonstrate that the performance and stability of the AN-SAC algorithm are significantly improved compared with the SAC algorithm.
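The abstract describes rescaling rewards online so they have approximately zero mean and unit variance. The following is a minimal sketch of such an adaptive reward normalizer, assuming a Welford-style running estimate of the mean and variance; the paper's exact update rule and its integration into the SAC replay/update loop may differ.

```python
class RunningRewardNormalizer:
    """Tracks a running mean and variance of observed rewards and rescales
    each reward to approximately zero mean and unit variance.

    This is an illustrative sketch, not the paper's exact AN-SAC mechanism.
    """

    def __init__(self, epsilon: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations from the running mean
        self.epsilon = epsilon  # avoids division by zero early in training

    def update(self, reward: float) -> None:
        # Welford's online algorithm: incrementally update mean and variance.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward: float) -> float:
        variance = self.m2 / self.count if self.count > 1 else 1.0
        std = (variance + self.epsilon) ** 0.5
        return (reward - self.mean) / std


if __name__ == "__main__":
    # Toy usage: normalize a short stream of raw environment rewards.
    normalizer = RunningRewardNormalizer()
    for raw_reward in [10.0, 12.0, 8.0, 50.0, 11.0]:
        normalizer.update(raw_reward)
        print(raw_reward, "->", round(normalizer.normalize(raw_reward), 3))
```

In a SAC-style training loop, the normalized reward would typically replace the raw reward in the Bellman target for the critic update, while the normalization statistics are refreshed as new transitions arrive.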


How to Cite this Article:
X. Gao, Z. Wu, X. Zhu, L. Cai, Soft actor-critic algorithm with adaptive normalization, J. Nonlinear Funct. Anal. 2025 (2025) 6.