• ygouzerh 16 minutes ago

How does this work when a project has many external dependencies, like an S3 bucket, a secrets manager, a third-party API, etc.?

• deet an hour ago

We've found that methods like this substantially increase the quality and reliability of coding agent output. Running code in a sandbox, driving an interactive session through a browser, API calls, or other apps, and visually confirming output with vision models together plug a big hole in the feedback loop for an agent modifying a complex codebase.

We've had agents go as far as interactively testing how our product responds in video calls. They launched our full stack as a set of Docker containers (app, API, DB, queues, etc.) inside a larger sandbox, populated test data, connected the mock system to a real video call solution like Google Meet, and injected audio and video to test the response. End-to-end, like a real user flow.

It's not perfect yet, but if you are skeptical about the ability of AI agents to productively modify a complex product, I'd highly encourage you to play with a setup like this before ossifying your conclusions.

• Elzair 2 hours ago

I wonder how long it will take for someone to pwn this?