A client running an IoT-powered recycled-bag vending machine network came to us with a simple question: "Our cloud bill is ₹30,000 every month. Can you check why?"
It didn't sound dramatic. Many fast-growing IoT setups hit five-figure monthly cloud bills, and the assumption is usually "we need to grow into it."
But ₹30K/month for a network of vending machines that mostly sit idle, dispensing a recycled bag every few hours? That's not a scaling cost. That's an architecture problem.
We took two weeks to audit the platform end to end. The fix dropped their cloud spend from ₹30,000 per month to under ₹10,000 per year — a 97% reduction — without losing a single feature. Here's what we found, what we changed, and what every IoT founder should learn from it.
What was actually running
The platform was sitting on Google Cloud Platform with a few worker VMs, a managed database, and a handful of Cloud Functions handling business logic. On paper, a clean stack.
The problem was hidden in two places.
1. Every machine was pinging the cloud every 10 seconds
Each vending machine sent a heartbeat, a status snapshot, and a stock-level packet to the backend on a fixed 10-second interval — whether anything had happened or not.
That's 8,640 pings per machine per day, each ping carrying three packets. For a network with dozens of machines, the platform was processing millions of packets per month, most of them carrying zero new information: the same "machine is fine, stock at 87%, last dispense 4 hours ago" message, over and over.
GCP charges per request, per function invocation, per database write, and for network egress. None of those prices are generous when every device hits them 8,640 times a day.
2. The platform had an admin portal — but no operations layer
There was a working admin portal. The tech team could see machine logs, online/offline status, and pull up event history. That part wasn't broken.
What was missing was the operational layer — the day-to-day tooling that vendors, refill staff, and the business owner actually need to run the network.
Specifically:
- No vendor-friendly dashboard. Vendors handling refills and field operations had no clean view of "which machines need attention, where, and how urgently." The data was buried in admin-grade screens.
- No automated alerts. Failure logs and offline events were visible in the admin portal — but only if someone went looking. There was no proactive notification when a machine went offline or a critical error occurred.
- No automated refill management. Stock levels were trackable, but threshold-based refill alerts and inventory planning had to be done manually.
- No business-owner reporting. No daily/weekly digest of dispense volumes, machine uptime, revenue per machine, or refill cadence.
The platform was built like an engineer's tool — fine for the dev team, poor for the people actually running the business.
Why polling architectures bleed money
Polling-based IoT systems were the default ten years ago because edge devices couldn't reliably hold open persistent connections. So devices "phoned home" on a schedule.
Today, that's no longer true. Cheap edge hardware (ESP32, Raspberry Pi, embedded Linux boards) handles MQTT, WebSocket, and CoAP without breaking a sweat. The infrastructure to support it (Mosquitto, EMQX, AWS IoT Core) is mature and cheap.
But teams keep building polling systems because:
- It's the architecture they're familiar with from web apps
- It feels "simpler" upfront — easier to reason about
- The cost penalty doesn't show up until you've deployed at scale
By the time you're at 30 machines sending three packets every 10 seconds, you're processing over 22 million packets a month for the privilege of confirming that mostly nothing is happening.
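The arithmetic is worth sketching. This back-of-the-envelope model takes the fleet size, interval, and three-packets-per-interval figures from above; the event-driven comparison assumes roughly ten genuine events per machine per day, which is an illustrative guess, not measured data:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def monthly_packets(machines: int, interval_s: int,
                    packets_per_interval: int = 3, days: int = 30) -> int:
    """Packets the backend ingests per month under fixed-interval polling."""
    intervals_per_day = SECONDS_PER_DAY // interval_s  # 8,640 at 10 s
    return machines * intervals_per_day * packets_per_interval * days

polling = monthly_packets(machines=30, interval_s=10)
print(f"{polling:,} packets/month")  # 23,328,000 packets/month

# Event-driven alternative: one daily heartbeat per machine, plus an
# assumed ~10 genuine events (dispense, threshold, error) per machine/day.
event_driven = monthly_packets(30, SECONDS_PER_DAY, packets_per_interval=1) \
               + 30 * 10 * 30
print(f"{event_driven:,} packets/month, a {1 - event_driven / polling:.2%} drop")
# 9,900 packets/month, a 99.96% drop
```

Even with generous assumptions about event frequency, the polling design moves three orders of magnitude more traffic for the same information.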
What we changed
The fix had three parts. None of them were exotic.
1. Moved from polling to event-driven communication
Machines now stay connected over MQTT and only transmit when something actually happens — a dispense event, a stock-level threshold crossing, a hardware error, a daily heartbeat. The cloud no longer asks "are you alive?" 8,640 times a day. The machine tells the cloud when it's not.
Result: ping volume dropped by over 99%.
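The gating logic behind that drop can be sketched in a few lines. This is an illustration only: `MachineState`, `should_publish`, and the 10% stock band are example names and values, not the client's firmware, and the real devices publish these events over MQTT.

```python
from dataclasses import dataclass
from typing import Optional

STOCK_BAND = 10.0  # report stock only when it crosses a 10% band

@dataclass
class MachineState:
    online: bool
    stock_pct: float
    error_code: Optional[int] = None

def should_publish(prev: MachineState, curr: MachineState) -> bool:
    """Decide whether the latest reading is worth a packet."""
    if prev.online != curr.online:                 # connectivity changed
        return True
    if curr.error_code is not None and curr.error_code != prev.error_code:
        return True                                # new hardware error
    # A dispense batch or a refill moves stock into a new band;
    # fractional drift within a band generates no traffic.
    if prev.stock_pct // STOCK_BAND != curr.stock_pct // STOCK_BAND:
        return True
    return False

# Steady state: same band, no error, still online -> nothing to send
print(should_publish(MachineState(True, 87.0), MachineState(True, 86.4)))  # False
```

A once-a-day heartbeat covers the "silent but alive" case, so silence between events is itself information rather than an outage.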
2. Migrated off GCP onto a single tuned VPS
For an IoT platform of this size, the compute requirements are actually modest. Most of the cost was coming from the request-pricing model and a constellation of "managed" services that each carried their own line item.
We moved to a properly sized VPS running:
- An MQTT broker for device communication
- A Postgres database
- A FastAPI backend
- An Angular admin/vendor portal
- Nightly off-server backups
One bill. One server. Predictable costs. Total: ~₹10,000/year.
For a network of this size, this is the right move. (When the network grows past a certain threshold, we'll re-architect to horizontal scaling — but premature cloud-native is the most expensive thing in IoT.)
3. Built the operational layer
While we were re-architecting the data flow, we added the missing operations tooling:
- A real-time vendor portal (machine status, stock levels, dispense history per machine — designed for non-technical users)
- Automated failure alerts pushed to the right person, not just logged
- Inventory and refill threshold alerts with refill planning views
- A separate clean admin layer for the operations team
These weren't extras. Without them, the platform was technically running but operationally invisible.
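The refill-threshold piece, for instance, reduces to a small sorting problem. The machine IDs, the 20% threshold, and the return shape below are assumptions for the sketch, not the production portal's API:

```python
REFILL_THRESHOLD_PCT = 20.0  # illustrative threshold, not the client's setting

def refill_alerts(stock_levels: dict[str, float],
                  threshold: float = REFILL_THRESHOLD_PCT) -> list[str]:
    """Return machine IDs at or below the threshold, emptiest first,
    so refill routes can be planned by urgency."""
    low = [(pct, mid) for mid, pct in stock_levels.items() if pct <= threshold]
    return [mid for _, mid in sorted(low)]

alerts = refill_alerts({"BLR-001": 87.0, "BLR-002": 12.5, "BLR-003": 4.0})
print(alerts)  # ['BLR-003', 'BLR-002']
```

The hard part was never the logic; it was routing the result to the right person automatically instead of leaving it buried in a log.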
The result
Before:
- ₹30,000/month cloud bill (₹3.6L/year)
- ~22 million pings/month
- Admin-only visibility, no operational tooling for vendors
- Manual ops work to translate logs into action
After:
- ~₹10,000/year cloud bill
- ~99% reduction in network traffic
- Real-time vendor portal designed for non-technical users
- Automated alerts for failures and low-stock conditions
- Two weeks to audit, six weeks to migrate
Net savings to the business: ₹3.5 lakh per year, every year, plus a platform that's actually maintainable.
What every IoT founder should take from this
1. Polling is almost always the wrong default in 2026.
Unless you have a hard reason your devices can't hold a persistent connection, build event-driven from day one. MQTT, WebSocket, and message queues are mature and cheap.
2. "Cloud-native" is not the same as "cheap" — and is often the opposite at small scale.
Managed services have a fixed cognitive and financial overhead per service. For a small-to-medium IoT platform, a single well-tuned VPS often beats a Kubernetes-on-managed-cloud setup by 80–90% on cost without losing reliability.
3. An admin portal is not an operations layer.
If your vendors and field staff can't act from the screen they look at every day, you have a tech tool, not a platform. Operational dashboards designed for the actual users are part of v1, not v2.
4. Audit before you scale.
If you're spending more than ₹15,000/month on cloud for an IoT platform of fewer than 100 devices, something is wrong. Get the architecture audited before you raise the next round and bake the leak into a bigger budget.
Need an architecture audit?
If you're running an IoT platform — vending, fleet, agriculture, smart-building, anything — and the cloud bill is climbing, we're happy to take a look. Two weeks, fixed scope, written report with concrete recommendations. No obligation to engage further.
Bengaluru-based, working with clients across India and globally.
Get in touch · WhatsApp: +91 9677749648