I’m curious where everyone’s agents fail. Demos only get shared when they’re impressive, so there’s an inherent selection bias. The real problems show up in production under load. Is it contextual memory handling? Reliability during long workflows? Please share 🦞