If you have a high-quality evaluation set, you can iterate on prompts and inference strategies instead of fine-tuning the base model to achieve great user outcomes.