Create a commercial benchmarking service that uses real-world coding tasks and scenarios to assess AI code generation tools more accurately.