Mobile games and graphics-heavy apps don't expose a usable UI tree. That's why traditional test automation breaks the moment you touch them. Drizz's Game Mode runs entirely on vision, takes plain-English commands, and caches what it sees, so the second run takes seconds.
THE PROBLEM
Games don't expose what automation frameworks need to see.
Traditional mobile automation, Appium and the stack underneath it (UiAutomator2 on Android, XCUITest on iOS), works by reading the native UI hierarchy. Every native button, label, and container shows up as a node with properties you can target: a resource ID, an accessibility label, a bounding box.
Games don't render that way.
A game built on OpenGL, Unity, Cocos2D, or any canvas-style renderer paints its UI directly to a single graphics surface. To Appium, that entire surface is one node, often just a GLView or equivalent root container. There's no <Button> for "Play." No accessible label for "Collect Reward." No targetable element for the candy you're trying to swap. Even when a human can clearly see the controls, the framework can't.
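To make the contrast concrete, here's a minimal sketch. The hierarchy dumps below are made up, not real UiAutomator2 output, but they mirror its shape: the native screen exposes nodes with targetable properties, while the game screen exposes a single GL surface and nothing underneath it.

```python
import xml.etree.ElementTree as ET

# Illustrative hierarchy dumps (not real device output): a native screen
# versus a canvas-rendered game screen as an automation framework sees them.
NATIVE_DUMP = """
<hierarchy>
  <android.widget.FrameLayout>
    <android.widget.Button resource-id="com.example:id/play" text="Play"/>
    <android.widget.TextView text="Collect Reward"/>
  </android.widget.FrameLayout>
</hierarchy>
"""

GAME_DUMP = """
<hierarchy>
  <android.opengl.GLSurfaceView resource-id="com.example:id/gl_view"/>
</hierarchy>
"""

def targetable_nodes(dump: str) -> list[str]:
    """Return tags of nodes a selector could actually target
    (anything with a resource-id or a text label)."""
    root = ET.fromstring(dump.strip())
    return [el.tag for el in root.iter()
            if el.get("resource-id") or el.get("text")]

print(targetable_nodes(NATIVE_DUMP))  # the button and the label
print(targetable_nodes(GAME_DUMP))    # just the GL surface, nothing inside it
```

The game's "Play" button, "Collect Reward" label, and every other control live inside that one GLSurfaceView node, invisible to any selector.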
This isn't a niche issue. It's the architecture of every modern mobile game.
WHAT IT TAKES TODAY
Around 4 hours per screen, just to write a single test.
The standard workarounds are well-documented: image recognition layered on top of Appium, raw coordinate-based tapping, or building a custom backend that bridges your game engine to the test framework. All of them work. None of them are quick.
Teams report routinely spending around four hours per complex game screen (pulling whatever signal they can, mapping coordinates, tuning image-match thresholds) just to get one reliable test working.
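The coordinate-mapping half of that busywork looks roughly like this. Everything here (the recorded taps, the resolutions) is illustrative; the point is that every tap is pinned to pixels and must be rescaled, and then re-verified, for each device and each layout change.

```python
# A sketch of coordinate-based tapping: taps recorded on one reference
# resolution have to be remapped for every other device. Illustrative only.
REFERENCE = (1080, 1920)          # resolution the coordinates were recorded on

RECORDED_TAPS = {
    "play":  (540, 1500),
    "store": (980, 120),
}

def rescale(tap, target):
    """Map a recorded (x, y) tap onto a device with a different resolution."""
    rx, ry = REFERENCE
    tx, ty = target
    x, y = tap
    return (round(x * tx / rx), round(y * ty / ry))

# The same "Play" tap on a 720x1280 device:
print(rescale(RECORDED_TAPS["play"], (720, 1280)))  # (360, 1000)
```

And this is the easy part: it says nothing about aspect-ratio mismatches, dynamic layouts, or image-match thresholds that drift with every art update.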
Multiply that across a real game's flow: home screen, level select, gameplay, store, settings, recharge, reward popups. The hours don't add up to a budget. They add up to "we don't test this anymore."
That's how the highest-stakes user flows in many gaming apps end up with the lowest test coverage.
THE FEATURE
Game Mode: vision in, action out.
Game Mode is a vision-only execution path inside Drizz, designed for any app where the underlying UI tree isn't reliable, or doesn't exist at all.
Here's how it works:
1. You describe what to do, in plain English. "Tap Play." "Open the store." "Recharge with 50 credits." "Return to the home screen."
2. Drizz looks at the screen, not the DOM. There's no UI-tree extraction. No coordinate hardcoding. No image-match script tuning. Drizz takes the visual frame, understands what's on it, and decides where to act.
3. It executes. The action runs on the actual screen, the same way a player would tap, swipe, or drag.
That's the whole loop. Plain English in. The right action out. No tree required.
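The three steps above can be sketched as a loop. Nothing here is Drizz's actual implementation; the vision call is stubbed out and every name is ours, but the shape is the same: instruction plus screenshot in, screen action out.

```python
# A minimal, illustrative sketch of a vision-driven execution loop.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "tap", "swipe", "drag", ...
    x: int
    y: int

def locate(instruction: str, frame: bytes) -> Action:
    """Stand-in for the vision model: maps a plain-English instruction
    plus a screenshot to screen coordinates. Hardcoded for illustration."""
    known = {"Tap Play": Action("tap", 540, 1500)}
    return known[instruction]

def run_step(instruction: str, screenshot: bytes) -> Action:
    # 1. Plain English in.  2. Vision looks at the frame.  3. Act on screen.
    action = locate(instruction, screenshot)
    return action  # a real driver would dispatch this tap to the device

print(run_step("Tap Play", b"<frame pixels>"))
```

Note what's absent: no selector, no resource ID, no coordinate table keyed to a device model.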
WHERE IT'S BEING USED
Real production gaming apps, end to end.
We've been running Game Mode against real titles, Candy Crush among them, and against apps from active gaming customers, on flows that include:
- Navigating through the app
- Playing core gameplay loops
- Recharging in-app wallets
All of these are flows traditional automation either can't reach or reaches very expensively. Game Mode runs them on a vision-only path.
SPEED: THE ONE THING GAMES CAN'T COMPROMISE ON
Caching turns the second run into seconds.
Vision models are powerful, but they're not free. And games come with a hard constraint: they have to be fast. Real gameplay doesn't wait for inference.
Game Mode is built around two ideas:
1. The first run does the hard work. Vision interprets every step, understands the screen, and stores what it learned.
2. Subsequent runs hit the cache. The same script, run again, makes no AI calls. It pulls from the cache: deterministic, with no inference latency and no flakiness from model variance.
This is the same Intelligent Visual Caching layer that powers Drizz's regular execution. In Game Mode, it does even more lifting: the screens are denser, the work to interpret them the first time is heavier, and the value of skipping that work on every rerun compounds faster.
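The caching idea can be sketched in a few lines. This is an illustration of result-caching keyed on the instruction plus a fingerprint of the screen, not Drizz's actual Intelligent Visual Caching implementation:

```python
import hashlib

# Illustrative cache: (instruction, screen fingerprint) -> coordinates.
cache: dict[tuple[str, str], tuple[int, int]] = {}
inference_calls = 0

def fingerprint(frame: bytes) -> str:
    """Cheap, deterministic key for a screen's pixels."""
    return hashlib.sha256(frame).hexdigest()[:16]

def expensive_vision(instruction: str, frame: bytes) -> tuple[int, int]:
    global inference_calls
    inference_calls += 1           # stands in for a slow model call
    return (540, 1500)             # illustrative coordinates

def resolve(instruction: str, frame: bytes) -> tuple[int, int]:
    key = (instruction, fingerprint(frame))
    if key not in cache:           # first run: do the hard work
        cache[key] = expensive_vision(instruction, frame)
    return cache[key]              # reruns: deterministic cache hit

frame = b"<screen pixels>"
resolve("Tap Play", frame)
resolve("Tap Play", frame)         # second run: no model call at all
print(inference_calls)             # 1
```

The first run pays for inference once; every identical rerun is a dictionary lookup.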
We're still actively pushing first-run latency down (that's the next chunk of work), but caching already makes reruns production-grade.
WHY IT MATTERS
Coverage where coverage was a guess.
Most QA teams shipping a game have a tier of flows they describe with a shrug, "we test that manually," or "we don't really test that." Recharge flows. Mid-game state transitions. Specific reward triggers. Levels behind a paywall.
Game Mode changes which flows are reachable by automation:
- Gameplay loops that depend on visual state
- Reward and recharge flows that mix native UI with rendered canvases
- Tutorial and onboarding sequences with motion and animation
- Any flow where the "right element" is something a human eye recognizes, not something a parser exposes
You don't maintain coordinates that don't survive a layout change. You describe the goal.
GETTING STARTED
Live for customers today.
Game Mode is available now to Drizz customers. It's still maturing on the speed front (first-run latency is the next thing on our list), but caching already makes reruns fast enough for CI, and the surface of what's testable has fundamentally changed.
If your app has a canvas, a game loop, or any UI surface that traditional selectors can't see, book a walkthrough and we'll run it on yours.