After 2 years of iteration at scale, here's everything we learned about engineering Fluid. The most desirable (yet seemingly hardest to attain) property of a compute system is *efficient and predictable performance at any scale*.

Serverless as pioneered by Lambda came with an unusual set of tradeoffs. It can suffer cold starts in production, which is a no-go. What's lesser known, though, is that it solved the Achilles' heel of servers: noisy neighbors, congestion, and inadequate load balancing. It did so at tremendous expense (one "computer" per concurrent request), but it was glorious. I sometimes compare it to you and a friend leaving work for the exact same restaurant, at the same time, but each ordering your own Uber XL. You'll have an amazing experience, but it's wasteful.

When we built Fluid, we wanted to have our cake and eat it too. I'm particularly happy with our algorithm, which uses 'every available seat in the car' (great for AI apps that spend a lot of time waiting on tokens) but spins up more computers horizontally as needed. This is exceedingly hard to get right, even for experienced DevOps teams.
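To make the 'every available seat' idea concrete, here's a toy sketch of the routing logic in TypeScript. This is my illustration of the concept, not Fluid's actual scheduler; `MAX_CONCURRENCY`, the instance shape, and every name here are invented:

```typescript
// Illustrative sketch only: fill every available "seat" on a warm
// instance before paying for a new one; scale out when all are full.

interface Instance {
  id: string;
  activeRequests: number;
}

const MAX_CONCURRENCY = 32; // seats per instance; made-up value
const instances: Instance[] = [];
let nextId = 0;

// Boot a fresh instance (the cold start we're trying to avoid).
function spawnInstance(): Instance {
  const instance = { id: `inst-${nextId++}`, activeRequests: 0 };
  instances.push(instance);
  return instance;
}

// Route a request: prefer the warm instance with the most free seats;
// only spawn a new one when every seat everywhere is taken.
function route(): Instance {
  const candidate = instances
    .filter((i) => i.activeRequests < MAX_CONCURRENCY)
    .sort((a, b) => a.activeRequests - b.activeRequests)[0];
  const target = candidate ?? spawnInstance();
  target.activeRequests++;
  return target;
}

// Free the seat when the request completes.
function release(instance: Instance): void {
  instance.activeRequests--;
}
```

The reason this pays off for AI workloads: a request that is blocked waiting on model tokens holds a seat but barely uses CPU, so packing many such requests onto one instance is nearly free.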
Vercel:
Fluid makes it possible to ship AI and backend workloads without the gotchas of serverless: it prevents cold starts, adds streaming and post-response compute support, and massively improves cost-efficiency. Here's how we engineered it.
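For the curious, here's roughly what 'streaming and post-response compute' looks like from the application side. A minimal sketch, not production code: it assumes a Next.js-style route handler running on Vercel and the `waitUntil` helper from `@vercel/functions`; `logUsage` and its endpoint URL are hypothetical:

```typescript
import { waitUntil } from '@vercel/functions';

// Hypothetical helper standing in for any post-response work
// (analytics, billing, cache warming, ...).
async function logUsage(tokens: number): Promise<void> {
  await fetch('https://example.com/usage', {
    method: 'POST',
    body: JSON.stringify({ tokens }),
  });
}

export async function GET(request: Request): Promise<Response> {
  const encoder = new TextEncoder();

  // Stream tokens to the client as they're produced, instead of
  // buffering the whole response in memory first.
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const token of ['Hello', ' ', 'world']) {
        controller.enqueue(encoder.encode(token));
      }
      controller.close();
      // Post-response compute: this work continues after the
      // response finishes, without blocking the client.
      waitUntil(logUsage(3));
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain' },
  });
}
```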