Use a small offline LLM, even one with a limited context window, to handle upfront tasks such as categorization, then route each request to a larger model based on the result.
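A minimal sketch of this pattern in Python follows. Every name in it is hypothetical: `stub_small_model` is a keyword-based stand-in for a real local model call, `SMALL_MODEL_MAX_CHARS` is an assumed character budget standing in for the small model's context window, and the labels in `CATEGORIES` and the model names in `ROUTES` are placeholders. The input is truncated before classification so it fits the small model, and any unexpected label falls back to a default route.

```python
from typing import Callable, Dict, Tuple

# Assumed budget standing in for the small model's context window.
# A real setup would count tokens with the model's own tokenizer.
SMALL_MODEL_MAX_CHARS = 2000

# Hypothetical category labels and the larger models they map to.
CATEGORIES = ("code", "math", "general")
ROUTES: Dict[str, str] = {
    "code": "large-code-model",
    "math": "large-reasoning-model",
    "general": "large-general-model",
}


def truncate_for_small_model(text: str, max_chars: int = SMALL_MODEL_MAX_CHARS) -> str:
    """Keep only the leading part of the input so it fits the small model."""
    return text[:max_chars]


def stub_small_model(prompt: str) -> str:
    """Stand-in for a local LLM call; replace with a real inference call."""
    lowered = prompt.lower()
    if "traceback" in lowered or "def " in lowered:
        return "code"
    if "integral" in lowered or "solve for" in lowered:
        return "math"
    return "general"


def categorize(text: str, small_model: Callable[[str], str] = stub_small_model) -> str:
    """Ask the small model for a label, falling back to 'general' if it is unknown."""
    label = small_model(truncate_for_small_model(text)).strip().lower()
    return label if label in CATEGORIES else "general"


def route(text: str, small_model: Callable[[str], str] = stub_small_model) -> Tuple[str, str]:
    """Return the category and the name of the larger model that should handle it."""
    category = categorize(text, small_model)
    return category, ROUTES[category]


if __name__ == "__main__":
    for request in ("Why does this raise a Traceback?", "Tell me a joke"):
        print(request, "->", route(request))
```

Because the classifier only sees the truncated prefix, a request whose distinguishing content sits past the budget will land on the default route. If that matters, one option is to classify a summary or the last part of the input instead.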