Intelligent Rate Limiting
Go beyond basic IP limits. Apply rules based on authenticated User IDs, API Keys, or Session IDs. Restrict abusers without penalizing legitimate users, set concurrency limits, and block obvious misuse with basic input analysis.
Apply precise limits based on authenticated User IDs, API Keys, or session data, not just IP addresses.
Limit the number of simultaneous requests per user or globally to prevent resource exhaustion and downstream API overload.
Perform basic checks on request size or prompt length to block obviously malicious or wasteful requests early.
Target restrictions specifically at abusive actors without penalizing legitimate high-volume users on shared IPs.
Go Beyond Simple IP Blocking with Intelligent Rate Limiting
Traditional IP-based rate limiting often falls short in the era of LLM applications. Shared IPs, dynamic addresses, and authenticated user sessions mean you might unfairly block legitimate users or fail to stop targeted abuse from a single authenticated entity. Prompt Shield provides intelligent rate limiting capabilities that understand the context of each request.
Key Capabilities
Identifying the Requester
- Flexible Identification: Configure Prompt Shield to identify unique requesters using:
  - Authenticated User IDs (e.g., from JWT `sub` claims)
  - API Keys (passed in headers or request bodies)
  - Session IDs
  - Custom Headers or Request Attributes
  - IP Address (as a fallback or specific rule)
- Targeted Rules: Apply limits accurately to the entity making the request, not just their network address.
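As an illustration of the identification order described above, here is a minimal Python sketch. It is not Prompt Shield's actual API: the `Request` shape, the `x-api-key` header name, and the `extract_identifier` function are all hypothetical, and the sketch assumes the JWT has already been verified upstream and that header names are lowercased.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class Request:
    """Hypothetical request shape, used only for this sketch."""
    headers: Mapping[str, str] = field(default_factory=dict)
    # Claims from a JWT that was already verified upstream (assumption).
    jwt_claims: Optional[Mapping[str, str]] = None
    session_id: Optional[str] = None
    remote_ip: str = "0.0.0.0"


def extract_identifier(req: Request) -> str:
    """Return a namespaced key for the most specific identifier available."""
    if req.jwt_claims and req.jwt_claims.get("sub"):
        return f"user:{req.jwt_claims['sub']}"
    api_key = req.headers.get("x-api-key")  # hypothetical header name
    if api_key:
        return f"key:{api_key}"
    if req.session_id:
        return f"session:{req.session_id}"
    # IP address is only the fallback when nothing more specific is present.
    return f"ip:{req.remote_ip}"
```

Prefixing each key with its type (`user:`, `key:`, `session:`, `ip:`) keeps a User ID from colliding with an API Key or IP address that happens to have the same value.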
Setting Granular Limits
- User-Specific Quotas: Define precise request limits (per minute, hour, day, month) for individual users or API keys.
- Tiered Application: Combine with Tiered Usage & Cost Limits to apply different rate limits based on the identified user’s subscription plan or role.
- Fairness: Ensure high-volume legitimate users aren’t penalized due to noisy neighbours on the same IP.
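To make per-identifier, tier-aware limits concrete, the sketch below uses a simple fixed-window counter. This is one common technique, chosen here for brevity; the source does not say which algorithm Prompt Shield uses. The tier names, the limits, and the `FixedWindowLimiter` class are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical tiers: requests allowed per window.
TIER_LIMITS: Dict[str, int] = {"free": 2, "pro": 5}


class FixedWindowLimiter:
    """Counts requests per identifier within fixed time windows."""

    def __init__(self, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        # identifier -> (window index, requests counted in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def allow(self, identifier: str, tier: str) -> bool:
        limit = TIER_LIMITS[tier]
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counts.get(identifier, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the count resets
        if count >= limit:
            return False
        self._counts[identifier] = (window, count + 1)
        return True
```

Because the counter is keyed by identifier rather than IP address, two users behind the same shared IP get independent budgets. A fixed window allows short bursts at window boundaries; a sliding window or token bucket would smooth those out at the cost of more state.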
Concurrency Control
- Limit Simultaneous Requests: Prevent a single user or service from overwhelming your application resources or hitting downstream LLM API concurrency limits by capping simultaneous connections or function invocations.
- Maintain Stability: Ensure your application remains responsive under load.
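A concurrency cap differs from a rate limit: it bounds how many requests are in flight at once rather than how many arrive per time window. The following sketch shows one way to track that per identifier and globally. The `ConcurrencyLimiter` class and its parameters are hypothetical, and it assumes a single process using threads; a multi-instance deployment would need shared state instead.

```python
import threading
from collections import defaultdict
from typing import Dict


class ConcurrencyLimiter:
    """Caps in-flight requests per identifier and across all identifiers."""

    def __init__(self, per_user_limit: int, global_limit: int):
        self.per_user_limit = per_user_limit
        self.global_limit = global_limit
        self._lock = threading.Lock()
        self._in_flight: Dict[str, int] = defaultdict(int)
        self._total = 0

    def try_acquire(self, identifier: str) -> bool:
        """Reserve a slot without blocking; return False if a cap is hit."""
        with self._lock:
            if self._total >= self.global_limit:
                return False
            if self._in_flight[identifier] >= self.per_user_limit:
                return False
            self._in_flight[identifier] += 1
            self._total += 1
            return True

    def release(self, identifier: str) -> None:
        """Free a slot once the request has finished."""
        with self._lock:
            if self._in_flight[identifier] > 0:
                self._in_flight[identifier] -= 1
                self._total -= 1
            if self._in_flight[identifier] == 0:
                del self._in_flight[identifier]  # drop idle entries
```

In use, a handler would call `try_acquire` before invoking the LLM and `release` in a `finally` block so that slots are returned even when the downstream call fails.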
Basic Input Analysis
- Pre-emptive Blocking: Configure basic checks on incoming requests before executing expensive logic or LLM calls.
- Block Obvious Misuse: Reject requests with excessively long prompts or unusually large payloads designed solely to maximize token usage or cause errors.
- Cost Savings: Prevent wasteful requests from incurring unnecessary processing or LLM API costs.
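The checks described above can be as simple as comparing sizes against configured ceilings before any expensive work starts. A minimal sketch, where the thresholds, the reason strings, and the `check_input` function are illustrative assumptions rather than Prompt Shield defaults:

```python
from typing import Optional

MAX_BODY_BYTES = 64 * 1024   # hypothetical payload ceiling
MAX_PROMPT_CHARS = 8_000     # hypothetical prompt ceiling


def check_input(body: bytes, prompt: str) -> Optional[str]:
    """Return a rejection reason, or None if the request passes."""
    if len(body) > MAX_BODY_BYTES:
        return "payload_too_large"
    if len(prompt) > MAX_PROMPT_CHARS:
        return "prompt_too_long"
    return None
```

Character count is only a rough proxy for token usage; a stricter setup might run a tokenizer, but a length check is cheap enough to reject oversized requests before any LLM call is made.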
How It Works
- Request Intercepted: Prompt Shield receives the incoming request.
- Identifier Extraction: Extracts the configured identifier (User ID, API Key, IP Address, etc.).
- Limit Check: Checks the request against the specific rate limits (e.g., requests per minute, monthly quota) defined for that identifier or its associated tier.
- Concurrency Check: Verifies if accepting the request would exceed configured concurrency limits.
- Input Analysis (Optional): Performs basic checks on request size/length if configured.
- Enforcement: Allows the request to proceed only if all applicable checks pass; otherwise, blocks it.
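The steps above amount to running an ordered chain of checks and blocking on the first failure. The sketch below shows that shape only; `Decision`, `enforce`, and the two example checks are hypothetical names, and the quota check is a deliberately simplified stand-in for a real rate limiter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    allowed: bool
    reason: Optional[str] = None


# A check takes (identifier, prompt) and returns a rejection reason or None.
Check = Callable[[str, str], Optional[str]]


def make_quota_check(limit: int) -> Check:
    """Simplified stand-in for a rate limiter: a fixed quota per identifier."""
    used: Dict[str, int] = {}

    def check(identifier: str, prompt: str) -> Optional[str]:
        if used.get(identifier, 0) >= limit:
            return "rate_limited"
        used[identifier] = used.get(identifier, 0) + 1
        return None

    return check


def make_length_check(max_chars: int) -> Check:
    def check(identifier: str, prompt: str) -> Optional[str]:
        return "prompt_too_long" if len(prompt) > max_chars else None

    return check


def enforce(identifier: str, prompt: str, checks: List[Check]) -> Decision:
    """Run checks in order; block on the first one that fails."""
    for check in checks:
        reason = check(identifier, prompt)
        if reason is not None:
            return Decision(False, reason)
    return Decision(True)
```

Note that in this ordering a request rejected by a later check has already consumed quota in the earlier one. Whether rejected requests should count against a user's limit is a policy decision; running stateless checks first avoids the question.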
Benefits
- Targeted Protection: Accurately limit specific users or keys, not just IPs.
- Improved Fairness: Avoid blocking legitimate users on shared networks.
- Enhanced Stability: Prevent resource exhaustion with concurrency limits.
- Early Abuse Detection: Block simple, wasteful attacks before they cost you money.
- Reduced False Positives: More accurate identification leads to fewer wrongly blocked requests.
Leverage Prompt Shield’s Intelligent Rate Limiting to secure your LLM applications effectively and fairly.
See it in action
Visualize how Prompt Shield identifies different users and applies specific rate limits to ensure fairness and stability.
