An infrastructure layer that routes AI inference requests between local models and cloud APIs based on task complexity, latency requirements, and cost. An LLM-as-judge evaluates output quality in real time.
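
Below is a minimal sketch of how such a router could be wired together. It is not the project's actual API: the class names (`Router`, `Model`), the length-based complexity heuristic, and the thresholds are all hypothetical placeholders chosen to illustrate the flow of route-then-judge-then-escalate.

```python
"""Hypothetical sketch of complexity-based routing with an LLM-as-judge check."""
from dataclasses import dataclass
from typing import Callable, Protocol


class Model(Protocol):
    """Any backend (local model or cloud API client) that can generate text."""
    def generate(self, prompt: str) -> str: ...


@dataclass
class Route:
    model_name: str
    output: str
    judge_score: float


class Router:
    def __init__(
        self,
        local: Model,
        cloud: Model,
        judge: Callable[[str, str], float],  # (prompt, output) -> quality score in [0, 1]
        complexity_threshold: int = 500,     # hypothetical: send long prompts to the cloud
        quality_threshold: float = 0.7,      # hypothetical: escalate if judged below this
    ) -> None:
        self.local = local
        self.cloud = cloud
        self.judge = judge
        self.complexity_threshold = complexity_threshold
        self.quality_threshold = quality_threshold

    def infer(self, prompt: str) -> Route:
        # Cheap complexity heuristic: prompt length. A fuller router could also
        # weigh per-request latency budgets and per-token cost of each backend.
        use_cloud = len(prompt) > self.complexity_threshold
        model, name = (self.cloud, "cloud") if use_cloud else (self.local, "local")

        output = model.generate(prompt)

        # LLM-as-judge: score the output in real time; if a local result falls
        # below the quality threshold, escalate the same prompt to the cloud.
        score = self.judge(prompt, output)
        if name == "local" and score < self.quality_threshold:
            output = self.cloud.generate(prompt)
            score = self.judge(prompt, output)
            name = "cloud (escalated)"

        return Route(model_name=name, output=output, judge_score=score)
```

In this sketch the judge is just a callable returning a score, so it can be backed by another model call, a heuristic, or a cached verdict; the design choice it illustrates is that quality checking sits on the routing path rather than being an offline evaluation step.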