You cannot learn that which you cannot sample Crank up the temperature to train more curious agents. Simple and effective. From “Training a Generally Curious Agent”: We design a diverse set of tasks where an LLM agent needs strategic information gathering to succeed, then train an LLM on self-generated data to prefer higher performing trajectories. The resulting behavior learned can transfer zero-shot to unseen tasks, showcasing its potential to build general decision making agents.
219