Authors
Amar Ahmad, Yvonne Valles, and Youssef Idaghdour, New York University, UAE
Abstract
We study how fundamental statistical limits in reinforcement learning change when multiple real-world challenges interact. Focusing on sample inefficiency, nonstationarity, partial observability, and high-dimensional observations, we synthesise existing lower-bound arguments and show that their effects are generally non-additive. We formalise three structure-conditioned mechanisms: multiplicative complexity penalties in partially observable nonstationary environments, memory collapse under low-rank observation structure, and explicit finite-horizon safety guarantees via probabilistic shielding. Rather than proposing new algorithms, the paper clarifies how exploitable structure reshapes worst-case guarantees and motivates a shift from pessimistic minimax analysis toward conditional complexity frontiers that tighten as structure is detected online.
Keywords
Reinforcement Learning, Sample Complexity, Partial Observability, Nonstationarity, Safe RL, Control, Statistical Limits