Tom stresses that the first priority should be a solid UI interaction and a basic generation method, which together form the foundation for later work.