Tom proposes encoding video as code. Because LLMs are adept at code, they could then import the video, manipulate its 3D scene directions, and re-render the content.
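
To make the idea concrete, here is a minimal sketch of what "video as code" might look like. It is not Tom's actual format: the `Camera` and `Shot` types, the `rotate_camera` helper, and the text-only `render` function are all hypothetical names chosen for illustration. The sketch assumes a scene can be described as a list of shots, each with a camera direction, so that an edit becomes an ordinary code transformation followed by a re-render.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Camera:
    # Hypothetical camera direction, in degrees.
    yaw_deg: float
    pitch_deg: float


@dataclass(frozen=True)
class Shot:
    # Hypothetical unit of "video as code": a time span, a camera, and content.
    start_s: float
    end_s: float
    camera: Camera
    description: str


def rotate_camera(shot: Shot, delta_yaw_deg: float) -> Shot:
    """Return a copy of the shot with the camera yaw shifted, wrapped to [0, 360)."""
    new_yaw = (shot.camera.yaw_deg + delta_yaw_deg) % 360.0
    return replace(shot, camera=replace(shot.camera, yaw_deg=new_yaw))


def render(shots: list[Shot]) -> list[str]:
    """Stand-in for a real renderer: emit one text line per shot."""
    return [
        f"{s.start_s:.1f}-{s.end_s:.1f}s "
        f"yaw={s.camera.yaw_deg:.0f} pitch={s.camera.pitch_deg:.0f}: "
        f"{s.description}"
        for s in shots
    ]


# "Import" a scene, edit its directions, and re-render it.
scene = [
    Shot(0.0, 2.5, Camera(0.0, 10.0), "wide shot of a street"),
    Shot(2.5, 5.0, Camera(350.0, 0.0), "close-up on a door"),
]
edited = [rotate_camera(s, 20.0) for s in scene]
rendered = render(edited)
```

In a real system the `render` step would hand the edited scene to a 3D or video renderer. The point of the sketch is only that the edit itself is a small, inspectable change to structured code, which is the kind of task the proposal expects an LLM to handle well.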