1 decade ago: Reinforcement Learning Prompt Engineer in Sec. 5.3 of «Learning to Think …» [2]. Adaptive Chain of Thought! An RL net learns to query another net for abstract reasoning & decision making. Going beyond the 1990 World Model for millisecond-by-millisecond planning [1].

[2] J. Schmidhuber (JS, 2015). «On Learning to Think: Algorithmic Information Theory for Novel Combinations of RL Controllers and Recurrent Neural World Models.» ArXiv 1210.0118

[1] JS (1990). «Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments.» TR FKI-126-90, TUM. (This report also introduced artificial curiosity and intrinsic motivation through generative adversarial networks.)