It costs what?! A few things to know before you develop with Gemini | Prompt Shield

It’s easy to start hacking with Gemini’s APIs, especially the powerful 2.5 Pro models, and it’s just as easy to give yourself a massive problem and rack up a giant bill. Here are a few things to bear in mind before you generate yourself an API key and start generating.

The Meter is Always Running (And It Runs Fast!)

Unlike traditional software development where costs might be more predictable (server instances, database storage), LLM costs are often usage-based. Gemini APIs, for instance, are typically priced per unit of data processed – often per character or per token (think of tokens as pieces of words; roughly 1 token ≈ 4 characters in English).

Gemini API Pricing: Models like Gemini 1.5 Pro charge different rates for the text you send in (input/prompt) and the text the model generates (output). Rates could be around $1.25 per million input tokens and $5.00 per million output tokens for standard prompts, doubling if your prompts exceed 128k tokens. While a million tokens sounds like a lot, complex tasks, long conversations, multi-modal input, or high user traffic can consume them surprisingly quickly.
Scaling Costs: A simple chatbot might have negligible costs during testing, but deploy it to thousands of users, and those per-token charges multiply rapidly. An inefficient prompt design that uses more tokens than necessary can dramatically inflate costs without providing extra value.
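To make the scaling point concrete, here is a rough back-of-the-envelope estimator. It is only a sketch: the rates are the illustrative figures quoted above (check the current price list before relying on them), and the traffic numbers, function names, and the 4-characters-per-token heuristic are assumptions for the example.

```python
# Illustrative rates from the figures quoted above; real pricing may differ.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 5.00
LONG_PROMPT_THRESHOLD = 128_000  # tokens; quoted rates double above this
CHARS_PER_TOKEN = 4              # rough heuristic for English text


def estimate_tokens(text: str) -> int:
    """Very rough token count derived from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the illustrative rates."""
    multiplier = 2.0 if input_tokens > LONG_PROMPT_THRESHOLD else 1.0
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) * multiplier


def estimate_monthly_cost(users: int, requests_per_user_per_day: int,
                          input_tokens: int, output_tokens: int,
                          days: int = 30) -> float:
    """Estimated USD cost per month for a steady, uniform workload."""
    per_request = estimate_request_cost(input_tokens, output_tokens)
    return users * requests_per_user_per_day * days * per_request


if __name__ == "__main__":
    # Hypothetical traffic: 5,000 users, 10 requests a day each,
    # 2,000 input tokens and 500 output tokens per request.
    monthly = estimate_monthly_cost(5_000, 10, 2_000, 500)
    print(f"Estimated monthly cost: ${monthly:,.2f}")
```

With those made-up numbers, a request costing half a cent turns into roughly $7,500 a month, which is why trimming a bloated prompt is worth the effort.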

Beware the Denial-of-Wallet (DoW) Attack

This isn’t your standard Denial-of-Service (DoS) attack aimed at taking your application offline. A Denial-of-Wallet (DoW) attack specifically targets the financial aspect of your cloud-hosted services, especially consumption-based ones like LLMs.

How does it work? Malicious actors can intentionally send carefully crafted prompts to your Gemini-powered application designed to maximize resource consumption. They might ask for extremely long, complex analyses, recursive summaries, or exploit vulnerabilities to make the model work much harder (and use far more tokens) than usual. The goal isn’t necessarily to break the app but to run up your cloud bill astronomically. Because you pay per token, a successful DoW attack can lead to unexpected charges amounting to hundreds or even thousands of pounds/dollars.
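A quick worked example shows how fast this adds up. The sketch below assumes illustrative rates of $1.25 per million input tokens and $5.00 per million output tokens, and a hypothetical attacker who sends one large request per second to an unprotected endpoint; every number here is an assumption for illustration, not a measurement.

```python
# Assumed illustrative rates (USD per million tokens); check real pricing.
INPUT_RATE = 1.25
OUTPUT_RATE = 5.00


def attack_cost_per_hour(requests_per_minute: int,
                         input_tokens: int,
                         output_tokens: int) -> float:
    """Estimated USD billed per hour if every attack request is served."""
    per_request = (input_tokens / 1_000_000 * INPUT_RATE
                   + output_tokens / 1_000_000 * OUTPUT_RATE)
    return per_request * requests_per_minute * 60


if __name__ == "__main__":
    # Hypothetical attack: 60 requests a minute, each stuffed with
    # 120,000 input tokens and coaxing out 8,000 output tokens.
    hourly = attack_cost_per_hour(60, 120_000, 8_000)
    print(f"Estimated cost per hour of attack: ${hourly:,.2f}")
```

Under those assumptions a single script costs you several hundred dollars an hour, and it can keep running overnight.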

Warning: Google Cloud Doesn’t Have an Automatic Billing “Kill Switch”

You might think, “I’ll just set a budget in Google Cloud, and it’ll stop everything if costs get too high.” Think again.

Google Cloud Budgets are primarily alerting mechanisms. You can configure them to send you an email or trigger a Pub/Sub notification when spending reaches certain thresholds (e.g., 50%, 90%, 100% of your budget). However, they do not automatically stop your services or disable billing to prevent exceeding the budget.

Delayed Reporting: There’s often a delay (hours, sometimes days) between when resources are consumed and when that usage is reflected in billing reports and triggers budget alerts. By the time you get notified, you might have already significantly overspent.
Manual/Custom Intervention Required: Stopping runaway costs requires manual intervention (shutting down projects/services) or building your own custom solution. One common approach is using the budget’s Pub/Sub alerts to trigger a Cloud Function that programmatically disables billing for the project. Alternatively, third-party services exist specifically to address this gap; for example, Flame Shield (flameshield.com) offers a service that acts as a billing kill switch for Firebase and Google Cloud projects by automatically detaching the billing account when a pre-set limit is breached. It’s crucial to understand that abruptly disabling billing, whether through custom code or a third-party service, carries risks, including potential data loss for services that are stopped.
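The custom kill-switch approach can be sketched as follows. This is a minimal outline, not a drop-in implementation: it assumes the Pub/Sub message carries a base64-encoded JSON payload with `costAmount` and `budgetAmount` fields (verify the field names against the budget notification format), and `disable_billing` is a hypothetical callable you would supply yourself, for example a wrapper around the Cloud Billing API that detaches the billing account from the project.

```python
import base64
import json
from typing import Callable


def should_disable_billing(notification: dict) -> bool:
    """True when reported spend has reached or passed the budget amount."""
    return float(notification["costAmount"]) >= float(notification["budgetAmount"])


def handle_budget_message(event: dict,
                          disable_billing: Callable[[], None]) -> bool:
    """Decode a budget notification and trip the kill switch if needed.

    `event["data"]` is assumed to be the base64-encoded JSON payload of
    the Pub/Sub message. `disable_billing` is a hypothetical callable
    that performs the actual detach. Returns True if it was invoked.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if should_disable_billing(payload):
        disable_billing()
        return True
    return False


if __name__ == "__main__":
    # Simulated notification: spend of 120 against a budget of 100.
    body = json.dumps({"costAmount": 120.0, "budgetAmount": 100.0})
    fake_event = {"data": base64.b64encode(body.encode("utf-8")).decode("ascii")}
    tripped = handle_budget_message(
        fake_event, lambda: print("billing would be disabled here"))
    print("kill switch tripped:", tripped)
```

Keeping the decision logic separate from the billing call makes it easy to test with fake messages before wiring it up to anything destructive. Remember that the reporting delay still applies: the function only fires once the delayed cost data reaches the budget.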

This lack of a built-in, automatic “hard stop” makes DoW attacks even more dangerous and puts the onus entirely on you to monitor and react very quickly, or implement a custom/third-party kill switch mechanism.

Rate Limiting: Your First and Best Line of Defense

Given the risks, how do you protect yourself? The single most crucial step is implementing robust rate limiting on your application’s API endpoints that call Gemini.

Rate limiting restricts how many requests a user (or API key) can make within a specific time window (e.g., 10 requests per minute).

Cost Control: It prevents individual users (or bots) from generating excessive costs accidentally or maliciously.
DoW Mitigation: It directly thwarts DoW attacks by capping the number of expensive requests an attacker can send.
Fair Usage: It ensures your application remains responsive for all users by preventing resource hogging.

Implementing rate limiting shouldn’t be treated as an optional extra; consider it essential infrastructure for any public-facing LLM application.
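As one way to picture it, here is a minimal in-memory sliding-window limiter keyed by user ID or API key. It is a sketch under stated assumptions: the class name and parameters are made up for this example, and the state lives in a single process, so a real deployment behind multiple servers would need a shared store or a gateway-level limiter instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per key within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                 # injectable so tests can fake time
        self._hits = defaultdict(deque)     # key -> timestamps of recent requests

    def allow(self, key: str) -> bool:
        """Record a request for `key` and return True if it is within the limit."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False                    # over the limit: reject, do not call Gemini
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowRateLimiter(max_requests=10, window_seconds=60)
    results = [limiter.allow("user-123") for _ in range(12)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

The check belongs before the call to Gemini, so a rejected request costs you nothing. Rejected requests are not recorded, which means a client hammering the endpoint does not extend its own lockout; whether that is the behaviour you want is a design choice.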

Conclusion: Build Smart, Build Secure

Developing with Gemini offers incredible potential, but it’s crucial to go in with your eyes open to the financial risks. Costs can escalate rapidly, malicious actors can exploit the pay-per-use model through DoW attacks, and you can’t rely on cloud budgets to automatically cap your spending.

Before launching your Gemini-powered app:

Understand the pricing model thoroughly.
Set up budget alerts as an early warning system, but don’t rely on them as a safety net.
Implement rate limiting per user or API key.
Monitor your usage and costs vigilantly.

By taking these precautions, you can harness the power of Gemini while keeping your cloud bills under control and protecting your application from abuse. Don’t let runaway costs turn your innovative project into a financial nightmare.
