PreferredMaxLatency - Go SDK
PreferredMaxLatency type definition
The Go SDK and docs are currently in beta. Report issues on GitHub.
Preferred maximum latency, in seconds. The value can be a number (applied to p50) or an object with percentile-specific cutoffs. Endpoints above the threshold(s) may still be used, but they are deprioritized in routing. When fallback models are configured, this may cause a fallback model that meets the threshold to be used instead of the primary model.
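To illustrate what "deprioritized, not excluded" can mean, here is a minimal sketch that reorders candidate endpoints against a single p50 threshold. The `endpoint` type, the `prioritize` function, and the sample latencies are hypothetical and not part of the SDK; the real routing logic runs server-side and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// endpoint is a hypothetical stand-in for a routable provider endpoint.
type endpoint struct {
	Name       string
	P50Latency float64 // observed median latency, in seconds
}

// prioritize returns a copy of eps ordered so that endpoints at or below
// maxP50 come first. Endpoints above the threshold are kept, only moved back.
func prioritize(eps []endpoint, maxP50 float64) []endpoint {
	out := make([]endpoint, len(eps))
	copy(out, eps)
	// A stable sort preserves the original order within each group.
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].P50Latency <= maxP50 && out[j].P50Latency > maxP50
	})
	return out
}

func main() {
	eps := []endpoint{
		{Name: "a", P50Latency: 4.0},
		{Name: "b", P50Latency: 1.2},
		{Name: "c", P50Latency: 2.5},
	}
	// With a 3-second threshold, "a" stays in the list but moves to the end.
	for _, e := range prioritize(eps, 3.0) {
		fmt.Println(e.Name, e.P50Latency)
	}
}
```

The slow endpoint is still returned, so it remains available if the faster ones fail.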
Supported Types
PercentileLatencyCutoffs
Union Discrimination
Use the Type field to determine which variant is active, then access the corresponding field.
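The pattern can be sketched as a switch on the Type field. This example defines its own stand-in types so that it runs without the SDK. Every identifier in it (`preferredMaxLatency`, `latencyType`, the constant names, and the `P50`/`P90` fields) is an assumption made for illustration, so check the SDK source for the actual names.

```go
package main

import "fmt"

// All names below are hypothetical and only mirror the Type-plus-fields
// pattern described above; they are not the SDK's identifiers.
type latencyType string

const (
	typeNumber  latencyType = "number"
	typeCutoffs latencyType = "PercentileLatencyCutoffs"
)

// percentileLatencyCutoffs stands in for the object variant.
// The field names are assumed for illustration.
type percentileLatencyCutoffs struct {
	P50 *float64
	P90 *float64
}

// preferredMaxLatency holds exactly one populated variant, selected by Type.
type preferredMaxLatency struct {
	Type    latencyType
	Number  *float64
	Cutoffs *percentileLatencyCutoffs
}

// describe checks Type first, then reads only the matching field.
func describe(l preferredMaxLatency) string {
	switch l.Type {
	case typeNumber:
		if l.Number == nil {
			return "number variant with no value"
		}
		return fmt.Sprintf("p50 cutoff: %gs", *l.Number)
	case typeCutoffs:
		if l.Cutoffs == nil {
			return "cutoffs variant with no value"
		}
		return "percentile-specific cutoffs"
	default:
		return "unknown variant"
	}
}

func main() {
	n := 2.5
	fmt.Println(describe(preferredMaxLatency{Type: typeNumber, Number: &n}))
}
```

Guarding against a nil field inside each case keeps the code safe if Type and the populated field ever disagree.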