Drizz raises $2.7M in seed funding
Featured on Forbes
Blue Green Deployment: Testing During Cutover

Blue-green deployment explained: how it works, how mobile teams use it for backend APIs, how it compares with canary and rolling deployments, and how to test during cutover.
Author:
Asad Abrar
Posted on:
June 22, 2026

Blue green deployment (sometimes written green-blue deployment) runs two identical production environments side by side. One environment (blue) serves live traffic. The other (green) hosts the new release. You test on green, then switch the router so green becomes live. If something breaks, you switch back to blue in seconds.

For mobile teams, blue-green deployment applies to your backend infrastructure, not your app binary. Users control when they update their app, so you can't "switch" their phones to a new version. You can, however, blue-green deploy the API servers and microservices that power your mobile app. You can also ship database schema changes this way, provided they are made backward-compatible. That lets backend releases ship without user-visible downtime, while the app release itself follows a canary (staged rollout) strategy.

How does blue green deployment work?

The blue green deployment strategy has five steps.

Step 1: Two identical environments. Blue and green run identical application infrastructure (servers or containers, configuration, and caches). A router or load balancer sits in front of both and decides which one receives traffic. The database is typically shared between both environments, because duplicating a production database is operationally complex. Blue handles all production traffic. Green sits idle or handles internal testing.

Step 2: Deploy to green. Push the new release to the green environment. Blue continues serving users. No user sees the new code yet.

Step 3: Test on green. Run your full test suite against green: integration tests, smoke tests, load tests. Green has production-equivalent infrastructure, so test results are representative. This is where quality gates enforce pass/fail criteria before the switch.

Step 4: Switch traffic. Once green passes all tests, update the load balancer or DNS to route traffic from blue to green. Users now hit the new code. A load balancer switch takes effect in seconds. A DNS switch depends on record TTLs and resolver caching.

Step 5: Keep blue as rollback. If green shows problems in production (elevated error rates, latency spikes), switch traffic back to blue. Rollback is instant because blue still runs the previous version.

After green is stable, blue becomes the staging environment for the next release. The cycle repeats.
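The five steps reduce to two slots plus a pointer to whichever slot is live. Below is a minimal Python sketch of that idea. It uses a toy in-memory router. The class name, the `v1`/`v2` version strings, and the boolean `tests_passed` gate are hypothetical stand-ins for a real load balancer and test suite.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlueGreenRouter:
    """Toy model of the cycle: two slots plus a pointer to the live one."""

    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": "v1", "green": None}
    )
    live: str = "blue"

    @property
    def idle(self) -> str:
        # Whichever slot is not serving production traffic.
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str) -> None:
        # Step 2: the new release lands only on the idle slot.
        self.versions[self.idle] = version

    def switch(self, tests_passed: bool) -> bool:
        # Steps 3-4: flip the pointer only if the pre-switch quality gate passed.
        if not tests_passed:
            return False
        self.live = self.idle
        return True

    def rollback(self) -> None:
        # Step 5: the previous slot still runs the old version, so flip back.
        self.live = self.idle

    def serving(self) -> Optional[str]:
        return self.versions[self.live]


if __name__ == "__main__":
    router = BlueGreenRouter()
    router.deploy("v2")               # green holds v2; users still get v1
    router.switch(tests_passed=True)  # green is live
    print(router.serving())           # v2
    router.rollback()                 # back to blue
    print(router.serving())           # v1
```

Cutover and rollback are each a single pointer update, which is why both take seconds. Calling `deploy` again after a stable switch writes to the previously live slot. That models blue becoming the staging environment for the next release.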

How does blue green deployment compare with canary deployment?

These are the two most common strategies for production releases, and they solve different problems.

| Aspect | Blue-Green Deployment | Canary Deployment |
|---|---|---|
| Traffic switch | All at once (100% cutover) | Gradual (1% → 10% → 50% → 100%) |
| Infrastructure | Two full environments running simultaneously | One environment, traffic split by percentage |
| Cost | Higher (double infrastructure during cutover) | Lower (shared infrastructure, split traffic) |
| Rollback speed | Instant (switch back to blue) | Fast (reduce canary to 0%) |
| Risk exposure | All users hit new code at once after the switch | Only the canary % is exposed to new code |
| Testing approach | Full testing on green before the switch | Monitor live metrics from the canary group |
| Mobile backend fit | Strong (API servers, microservices) | Strong (API servers, feature flags) |
| Mobile app fit | Not applicable (users control their version) | Strong (staged rollout via app store) |

The choice between blue-green and canary depends on your risk tolerance. Blue-green gives you a clean cutover with instant rollback. Canary gives you gradual exposure with real-user monitoring. Many mobile teams use blue-green for backend APIs and canary (staged rollout) for app releases. The choice isn't either/or: it's both, at different layers.
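The difference in traffic handling can be sketched in a few lines of Python. The sketch models a backend canary that buckets users by hashing a user ID. That is a common technique, but the function names here are hypothetical. App-store staged rollouts work differently: the store picks which users receive the update.

```python
import hashlib


def canary_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in 0-99, so each user keeps seeing one version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route_canary(user_id: str, canary_percent: int) -> str:
    # Canary: only users whose bucket is below the current percentage get the new version.
    return "new" if canary_bucket(user_id) < canary_percent else "old"


def route_blue_green(user_id: str, live_color: str) -> str:
    # Blue-green: the user is irrelevant; everyone goes to whichever environment is live.
    return live_color


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for pct in (1, 10, 50, 100):
        exposed = sum(route_canary(u, pct) == "new" for u in users)
        print(f"canary at {pct}%: {exposed} of {len(users)} users on the new version")
```

Each user's bucket is fixed, so raising the percentage only adds users to the new version. Nobody flips back and forth between versions as the canary grows.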

Where does rolling deployment fit?

A rolling deployment updates instances one at a time (or in small batches). Instead of switching all traffic at once (blue-green) or splitting traffic by percentage (canary), you replace individual servers sequentially.

| Strategy | How It Works | Rollback | Best For |
|---|---|---|---|
| Blue-Green | Switches all traffic instantly between two identical environments | Instant | APIs, microservices, zero-downtime DB migrations |
| Canary | Gradually shifts a small % of traffic to the new version | Fast | Mobile app releases, feature flags, risk mitigation |
| Rolling | Replaces instances one at a time across the cluster | Slow | Stateless services, large clusters, cost efficiency |

Rolling deployments are usually cheaper and operationally simpler (no duplicate environment). The trade-off is slower rollback, and the fleet temporarily runs mixed versions during the update. Kubernetes rolling updates (gated by readiness probes) and ECS rolling updates have made rolling safer than it used to be. For mobile backend APIs where instant rollback matters, blue-green or canary still provides more control.
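A rolling update is essentially a loop over batches with a health check between them. The sketch below uses a hypothetical `healthy(i, version)` hook in place of a real readiness probe. It shows why a halted rollout leaves a mixed-version fleet.

```python
from typing import Callable, List, Tuple


def rolling_update(
    fleet: List[str],
    new_version: str,
    batch_size: int,
    healthy: Callable[[int, str], bool],
) -> Tuple[List[str], bool]:
    """Replace instances batch by batch and halt on the first failed health check.

    `fleet[i]` is the version running on instance i. `healthy(i, version)` is a
    hypothetical hook standing in for a readiness probe or load balancer check.
    Returns the resulting fleet and whether the rollout completed.
    """
    fleet = list(fleet)  # don't mutate the caller's list
    for start in range(0, len(fleet), batch_size):
        batch = range(start, min(start + batch_size, len(fleet)))
        for i in batch:
            fleet[i] = new_version
        if not all(healthy(i, fleet[i]) for i in batch):
            # Halted mid-rollout: old and new versions now serve traffic side by side,
            # and undoing it means rolling the old version back out batch by batch.
            return fleet, False
    return fleet, True


if __name__ == "__main__":
    ok_fleet, done = rolling_update(["v1"] * 6, "v2", 2, lambda i, v: True)
    print(ok_fleet, done)   # ['v2', 'v2', 'v2', 'v2', 'v2', 'v2'] True
    bad_fleet, done = rolling_update(["v1"] * 6, "v2", 2, lambda i, v: v != "v2")
    print(bad_fleet, done)  # ['v2', 'v2', 'v1', 'v1', 'v1', 'v1'] False
```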

How do you test during a blue green cutover?

Most guides stop at "run your tests on green." Here's the full testing sequence a production blue-green deployment needs. Each test type serves a different purpose during the switch.

Pre-switch testing (on green, no live traffic):

  • Smoke tests: critical API endpoints return the expected responses
  • Integration tests: services communicate correctly (auth, payment, notifications)
  • Contract tests: API responses match the format your mobile app expects
  • Load tests: green handles the expected traffic volume with acceptable latency
  • Database migration validation: schema changes are applied correctly and data is intact
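A contract test can be as small as checking that each field the app reads is present and has the expected type. This is a hand-rolled Python sketch with a hypothetical `ORDER_CONTRACT`. Real teams often use JSON Schema or a consumer-driven contract tool such as Pact instead. The sketch assumes the response has already been fetched from green's pre-switch URL.

```python
from typing import Any, Dict, List

# Hypothetical contract for an order endpoint: field name -> expected JSON type.
ORDER_CONTRACT: Dict[str, type] = {
    "order_id": str,
    "status": str,
    "total_cents": int,
    "items": list,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """List every way a decoded JSON response breaks the contract (empty list = pass)."""
    problems = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject JSON true/false explicitly.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    # Pretend this came back from green's pre-switch URL: total_cents became a string.
    green_response = {"order_id": "A1", "status": "placed", "total_cents": "1999", "items": []}
    print(contract_violations(green_response, ORDER_CONTRACT))
    # ['total_cents: expected int, got str']
```

A change like this one, from a number to a string, is invisible to most smoke tests. It is exactly the kind of drift that breaks older mobile clients.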

During-switch testing (traffic moving from blue to green):

  • Monitor error rates in real time. Compare green's error rate to blue's baseline.
  • Monitor latency percentiles (p50, p95, p99). Latency spikes during cutover can indicate resource contention.
  • Verify that no in-flight requests are dropped during the switch. Stateful connections (such as WebSockets used for real-time features) need special handling.
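The first two checks can be expressed as a simple promotion gate. The sketch below assumes you already collect green's request and error counts and its latency samples during cutover. The nearest-rank percentile and the 1.5x `error_tolerance` are illustrative choices, not a standard.

```python
import math
from typing import List


def percentile(samples_ms: List[float], pct: float) -> float:
    """Nearest-rank percentile; `samples_ms` must be non-empty and 0 < pct <= 100."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cutover_healthy(
    green_errors: int,
    green_requests: int,
    blue_error_rate: float,
    green_latencies_ms: List[float],
    p95_threshold_ms: float,
    error_tolerance: float = 1.5,
) -> bool:
    """Return False (i.e. roll back) if green is clearly worse than blue's baseline.

    `error_tolerance` is a hypothetical multiplier: 1.5 allows green's error rate
    to reach 1.5x blue's before the gate fails.
    """
    green_error_rate = green_errors / green_requests
    if green_error_rate > blue_error_rate * error_tolerance:
        return False
    return percentile(green_latencies_ms, 95) <= p95_threshold_ms


if __name__ == "__main__":
    latencies = [120.0] * 18 + [170.0, 400.0]
    print(percentile(latencies, 95), percentile(latencies, 99))  # 170.0 400.0
    print(cutover_healthy(2, 10_000, 0.0002, latencies, 250.0))  # True
```

A production gate would also need a minimum sample size and an absolute error floor. Without them, a blue baseline of exactly zero errors makes any single green error fail this check.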

Post-switch testing (green is live):

  • Run the same smoke tests against the production URL to confirm the switch completed
  • Monitor the crash-free rate in your mobile app (backend errors can cause app crashes)
  • Check push notification delivery (a new backend version might break the notification pipeline)
  • Validate deep link resolution if URL patterns changed
  • Keep blue running for 30-60 minutes as a hot standby before decommissioning it
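One way to confirm the switch completed is to poll a version endpoint on the production URL until it consistently reports green's build. The sketch uses a hypothetical `fetch_version` hook for that request and requires several consecutive matches. A few requests can still land on blue while connections drain or DNS caches expire.

```python
import time
from typing import Callable


def confirm_switch(
    fetch_version: Callable[[], str],
    expected_version: str,
    required_consecutive: int = 3,
    attempts: int = 10,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll production until it reports green's build several times in a row.

    `fetch_version` is a hypothetical hook that requests a version endpoint on the
    production URL and returns the build identifier in the response.
    """
    streak = 0
    for _ in range(attempts):
        if fetch_version() == expected_version:
            streak += 1
            if streak >= required_consecutive:
                return True
        else:
            streak = 0  # a stray response from blue resets the count
        sleep(delay_s)
    return False


if __name__ == "__main__":
    responses = iter(["v1", "v2", "v2", "v2"])  # simulated: one late hit on blue
    print(confirm_switch(lambda: next(responses), "v2", sleep=lambda s: None))  # True
```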

How does blue green work for mobile backend APIs?

Your mobile app calls APIs. Those APIs run on servers. Blue-green deploys those servers. Here's what this looks like for a mobile team.

Your API infrastructure:

  • Authentication service
  • Product catalog service
  • Order processing service
  • Push notification service
  • Feature flag service

Each service can be blue-green deployed independently. When you deploy a new version of the order processing service, its green version runs alongside blue. Your mobile app doesn't know or care which environment it's hitting, because the load balancer handles routing.

The mobile-specific concern is backward compatibility. Your app in the wild runs multiple versions (the latest plus 2-3 older versions). Your green API must handle requests from all active app versions, not just the latest. Contract testing and API versioning protect you from breaking older clients during cutover.
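API versioning on green often comes down to shaping each response for the build that sent the request. The sketch below is a hypothetical example. It assumes the app sends its version (for instance in a custom header such as `X-App-Version`). It also assumes green introduced a new `total` object that only app 5.0.0 and later understand.

```python
from typing import Any, Dict, Tuple


def parse_app_version(raw: str) -> Tuple[int, int, int]:
    """'5.2' -> (5, 2, 0); missing parts are padded so tuples compare correctly."""
    parts = [int(p) for p in raw.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def serialize_order(order: Dict[str, Any], app_version: str) -> Dict[str, Any]:
    """Return the response shape the requesting app build understands.

    Hypothetical change shipped on green: app 5.0.0+ reads a structured `total`,
    while older builds still in users' hands expect the flat `total_cents` field.
    """
    if parse_app_version(app_version) >= (5, 0, 0):
        return {
            "order_id": order["id"],
            "total": {"amount_cents": order["total_cents"], "currency": order["currency"]},
        }
    return {"order_id": order["id"], "total_cents": order["total_cents"]}


if __name__ == "__main__":
    order = {"id": "A1", "total_cents": 1999, "currency": "USD"}
    print(serialize_order(order, "4.9.3"))  # old shape for an older build
    print(serialize_order(order, "5.0"))    # new shape for current builds
```

Padding to three parts matters. Without it, Python treats `(5, 0)` as smaller than `(5, 0, 0)` and would send the old shape to a 5.0 build.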

How do database migrations work in blue-green deployments?

This is where blue-green gets hard. Application tiers are easy to duplicate. Databases are not.

Most blue-green implementations share a single database between the blue and green environments. Running two fully synchronized production databases is operationally complex and introduces data consistency risks. A shared database means both the blue and green application versions must work against the same schema at the same time during cutover.

That constraint requires backward-compatible schema migrations. You can't rename a column and deploy green in one step, because blue still reads the old column name. The standard approach is the expand/contract pattern:

  1. Expand: Add the new column (or table) without removing the old one. Deploy green. Green writes to both the old and new columns. Blue continues reading the old column. Both versions work against the same schema.
  2. Migrate data: Backfill the new column from the old data. Verify consistency.
  3. Contract: After blue is decommissioned and only green is live, drop the old column in a separate migration during the next deployment cycle.

This means every schema change takes two deployment cycles to complete. It's slower, but it prevents the scenario where a rollback to blue fails because blue can't read the new schema.
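The expand and migrate steps can be demonstrated end to end with Python's built-in `sqlite3` module. The `customers` table and the rename of `name` to `display_name` are hypothetical. The point is that after the expand step, blue's old query and green's new query both work against one shared schema.

```python
import sqlite3

# Hypothetical table: blue reads `name`; green renames it to `display_name`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada Lovelace')")  # row written by blue


def blue_read(db: sqlite3.Connection, customer_id: int) -> str:
    # Blue (old code) only knows the original column.
    row = db.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
    return row[0]


def green_write(db: sqlite3.Connection, display_name: str) -> int:
    # Green dual-writes, so blue can still read this row during cutover or after a rollback.
    cur = db.execute(
        "INSERT INTO customers (name, display_name) VALUES (?, ?)",
        (display_name, display_name),
    )
    return cur.lastrowid


def green_read(db: sqlite3.Connection, customer_id: int) -> str:
    row = db.execute(
        "SELECT display_name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0]


# 1. Expand: purely additive, so blue's queries keep working unchanged.
conn.execute("ALTER TABLE customers ADD COLUMN display_name TEXT")
new_id = green_write(conn, "Grace Hopper")

# 2. Migrate data: backfill rows that existed before green shipped.
conn.execute("UPDATE customers SET display_name = name WHERE display_name IS NULL")

print(blue_read(conn, new_id))  # Grace Hopper  (old code reads a row green wrote)
print(green_read(conn, 1))      # Ada Lovelace  (new code reads a backfilled row)

# 3. Contract happens in the NEXT cycle, once blue is gone, e.g.
#    ALTER TABLE customers DROP COLUMN name  (SQLite needs 3.35+ for DROP COLUMN)
```

There is one gap to watch. If traffic rolls back to blue, rows that blue writes will not have `display_name` set. Re-running the backfill (or adding a database trigger) before promoting green again closes that gap.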

For mobile teams, this matters because your API often serves data directly from the database. A broken migration doesn't just affect the backend. It causes 500 errors, and those errors can crash the mobile app if the client doesn't handle them gracefully. Test schema migrations in the pre-switch phase: run green against the shared database with real query patterns before routing any production traffic.

How do you implement blue-green on AWS and Kubernetes?

On AWS, blue-green deployment has three common patterns:

  • ECS with CodeDeploy: The most production-ready approach. Create two ALB target groups (blue and green) behind an Application Load Balancer. AWS CodeDeploy manages the traffic shift between target groups, supports automatic rollback when CloudWatch alarms fire, and handles connection draining. The ECS service's deployment controller type is set to CODE_DEPLOY instead of the default ECS.
  • Elastic Beanstalk environment swap: Beanstalk maintains two environments. You deploy to the inactive one, test it, then swap their CNAMEs. Simple, but less granular than ECS + CodeDeploy.
  • Route 53 weighted routing: Shift DNS weight from blue to green. This works for any infrastructure, but it has the downside of DNS propagation delays (TTL-dependent, typically 60-300 seconds).

For most mobile backend teams on AWS, ECS + CodeDeploy + ALB target groups is the standard pattern. It gives you automated traffic shifting, CloudWatch-based rollback triggers, and no manual steps during cutover.
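With ECS + CodeDeploy, the deployment is described by an AppSpec file. It names the ECS task definition, the container and port the load balancer routes to, and optional Lambda lifecycle hooks. The Python sketch below generates one in JSON form. Every value passed in is a placeholder. The hook Lambda is hypothetical: it would run your pre-switch tests and report the result back to CodeDeploy. Check the field names against the current AWS AppSpec reference before relying on this. Alarm-based automatic rollback is configured on the CodeDeploy deployment group rather than in this file.

```python
import json


def ecs_appspec(
    task_definition_arn: str,
    container_name: str,
    container_port: int,
    test_traffic_hook: str,
) -> str:
    """Build a CodeDeploy AppSpec (JSON form) for an ECS blue/green deployment.

    All argument values are placeholders; the hook name refers to a Lambda function
    you would write to run smoke/contract tests against the test listener.
    """
    spec = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        # The container/port the ALB target groups route to.
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
        # Runs after the test listener points at green, before production traffic shifts.
        "Hooks": [{"AfterAllowTestTraffic": test_traffic_hook}],
    }
    return json.dumps(spec, indent=2)


if __name__ == "__main__":
    print(
        ecs_appspec(
            "arn:aws:ecs:us-east-1:111122223333:task-definition/order-api:42",
            "order-api",
            8080,
            "ValidateGreenBeforeTrafficShift",
        )
    )
```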

On Kubernetes, blue-green deployment typically uses one of two approaches:

  • Service label selectors: Deploy green pods with a version: green label. Run tests against a temporary Service that targets green. Once tests pass, update the production Service selector from version: blue to version: green. New connections go to green as soon as the Service's endpoints update. Existing keep-alive connections stay on blue pods until they close.
  • Argo Rollouts or Flagger: These controllers automate the full blue-green lifecycle with metrics-based promotion. Argo Rollouts supports analysis runs that check Prometheus metrics before promoting green. Flagger integrates with service meshes such as Istio or Linkerd for traffic management.

Both approaches follow the same pattern: deploy green, test green, switch traffic, monitor, and keep blue as rollback.
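For the label-selector approach, the cutover is a one-field change to the production Service. The Python sketch below builds that patch body and the equivalent `kubectl patch` command. The `order-api` Service name and the `app`/`version` label keys are hypothetical.

```python
import json
import shlex
from typing import Dict


def selector_patch(app: str, color: str) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Patch body that repoints a Service at the pods labelled version=<color>."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color}")
    return {"spec": {"selector": {"app": app, "version": color}}}


def kubectl_switch_command(service: str, app: str, color: str) -> str:
    # Equivalent CLI call; `kubectl patch` applies a strategic merge patch by default.
    body = json.dumps(selector_patch(app, color), separators=(",", ":"))
    return f"kubectl patch service {shlex.quote(service)} -p {shlex.quote(body)}"


if __name__ == "__main__":
    print(kubectl_switch_command("order-api", "order-api", "green"))
    # kubectl patch service order-api -p '{"spec":{"selector":{"app":"order-api","version":"green"}}}'
```

Rolling back is the same patch with `blue`. That is why this approach gives near-instant rollback, as long as the blue pods are still running.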

What does a blue green deployment look like for a mobile team?

A food delivery app team deploys their backend APIs using blue-green. Here's an example deployment cycle:

Monday 10 AM: the team deploys a new version of the order service to green. Green runs on the same AWS ECS cluster as blue, behind a separate target group.

Monday 10:15 AM: automated tests run against green. Smoke tests validate order creation, payment processing, and status updates. Integration tests confirm the notification service receives order events. Load tests verify green handles 3,000 orders/minute.

Monday 10:30 AM: all tests pass. The ALB switches traffic from blue to green. Green is live. Blue becomes the hot standby.

Monday 10:35 AM: monitoring shows green's error rate at 0.02% (same as blue's baseline). Latency p95 is 180ms (within threshold). No spikes.

Monday 11:30 AM: green has been stable for an hour. Blue is retired from production and becomes the staging environment for the next release.

User-visible downtime: none. The ALB shifts traffic between target groups without a DNS change and drains in-flight requests from blue. Time from deploy to live: 30 minutes. Rollback capability: instant for the first hour.

Meanwhile, the mobile app release follows a separate canary strategy through Google Play staged rollout. The backend deploys with blue-green. The app deploys with canary. That's two deployment strategies for two different layers, each optimized for its constraints.

The vibe testing layer runs on top of both. Drizz executes plain English tests against the production API (post-switch validation) and against the app on real devices (post-canary validation). Vision AI verifies that the full stack works end-to-end after both the backend and app updates ship.

FAQs

What is blue-green deployment?

Blue green deployment maintains two identical production environments. You deploy to the inactive one, test it, then switch traffic. If problems appear, you switch back instantly. It's designed to avoid user-visible downtime during releases.

What is the difference between blue-green and canary deployment?

Blue-green switches 100% of traffic at once between two environments. Canary shifts traffic gradually (1% → 100%). Mobile teams often use both: blue-green for backend APIs, canary (staged rollout) for app releases.

How do database migrations work in blue-green deployments?

Blue and green typically share one database. Schema changes must be backward-compatible so both versions work against the same schema during cutover. Use the expand/contract pattern: add new columns first, migrate data, then remove old columns in a later deployment cycle.

Is blue-green deployment expensive?

Yes, it costs more than rolling or canary because you run double infrastructure during cutover. On cloud platforms, you can mitigate this by provisioning green on-demand and tearing it down after blue is decommissioned. The cost is justified when instant rollback and pre-switch testing outweigh infrastructure spend.

When should you NOT use blue-green deployment?

Skip blue-green when your infrastructure is too large to duplicate cost-effectively, when your database migrations can't be made backward-compatible, or when gradual rollout (canary) provides better risk control. Stateful applications with complex session management also make blue-green cutover harder.

Can blue-green deployment be fully automated?

Yes. AWS CodeDeploy automates ECS blue-green deployments end-to-end: deploy green, run health checks, shift ALB traffic, and roll back on CloudWatch alarms. On Kubernetes, Argo Rollouts and Flagger automate the lifecycle with metrics-based promotion. Most production teams automate the full pipeline and only intervene on failures.


About the Author:

Asad Abrar
Co-founder & CEO, Drizz
Ex-Coinbase PM and IIT Kharagpur grad killing flaky mobile tests by day, and obsessing over F1 lap timings by night.