
The TOPS Lie: Why Your Phone's "AI Chip" Numbers Mean Almost Nothing
Alright, let's talk silicon.
Your new phone is doing somewhere between 38 and 78 TOPS. Or maybe it's 45. The number depends entirely on which press release you read, which generation of chip they're referencing, and—crucially—whether the manufacturer counted the CPU cores, the GPU shader units, and the dedicated NPU all at once and then just... added them together.
Welcome to the TOPS Wars. It's benchmark theater, and your $1,200 is the ticket price.
What "TOPS" Actually Measures (And What It Doesn't)
TOPS stands for Tera Operations Per Second—a measure of how many trillion arithmetic operations a chip can execute in one second. Sounds rigorous. It's not.
The problem is there's no ISO standard for what counts as an "operation." Apple counts INT8 multiply-accumulate operations on its Neural Engine. Qualcomm's Snapdragon 8 Elite (Hexagon NPU) reports a mix of INT4 and INT8 precision depending on the workload, and INT4 figures can be roughly double the INT8 figures on the same silicon. These aren't equivalent numbers; they're apples-to-kumquats comparisons dressed up in the same unit. Independent coverage of MediaTek's Dimensity 9400 NPU announcement noted that reviewers couldn't fully replicate the headline figure under the same test conditions, which is not unusual when vendors define their own measurement methodology.
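To make the precision games concrete, here's a toy model of how a peak TOPS headline is typically derived: count the multiply-accumulate units, multiply by clock speed, count each MAC as two operations, and pick the precision that flatters you. Every number below is invented for illustration, not any vendor's real spec.

```python
# Toy model of how a peak TOPS headline is typically derived.
# All hardware numbers below are illustrative, not real specs.

def peak_tops(mac_units: int, clock_ghz: float, ops_per_mac: int = 2) -> float:
    """Peak tera-ops/sec: each multiply-accumulate is usually counted as 2 ops."""
    return mac_units * ops_per_mac * clock_ghz * 1e9 / 1e12

# A hypothetical NPU: 16,384 INT8 MAC units at 1.2 GHz.
int8 = peak_tops(mac_units=16_384, clock_ghz=1.2)

# The same silicon, counted at INT4: each INT8 unit can often be split
# into two INT4 operations, doubling the headline with zero new hardware.
int4 = peak_tops(mac_units=16_384 * 2, clock_ghz=1.2)

print(f"INT8 headline: {int8:.1f} TOPS")   # ~39.3 TOPS
print(f"INT4 headline: {int4:.1f} TOPS")   # ~78.6 TOPS, same chip
```

Same transistors, double the marketing number. That's the whole trick.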
Meanwhile, the comparison charts in every carrier retail store line these numbers up like they're standardized.
(Imagine if car manufacturers each got to define what "horsepower" meant. A Tesla claims 450 "horsepower." A Dodge Challenger claims 485. A bicycle claims 200 "horsepower" if you count the rider, the wind, and a favorable gradient. That's where we are.)
The Sustained Load Gap
Even if you accept the peak TOPS figure at face value, there's a second problem: sustained performance versus burst performance.
In hardware QA, this is where prototypes used to break. You could run a chip at advertised clock speeds for exactly as long as the benchmark lasted—sometimes 30 seconds, sometimes two minutes—and then thermal throttling would slam the frequency down to something sustainable. The benchmark looked great. The real-world experience didn't.
NPUs have the same problem, and it doesn't get talked about enough.
The iPhone 16 Pro's A18 Pro chip has a reported 35 TOPS Neural Engine. That figure represents peak burst. Run a sustained generative AI task—on-device image generation, real-time video processing, extended transcription—and device thermals start climbing. Gaming phone thermal throttling is the most visible version of this problem: even with intensive graphics, a $1,200 flagship can't maintain peak performance for 20 minutes. Independent sustained-load testing (GSMArena's performance endurance scores are the closest publicly available proxy) consistently shows that thermal management, not peak TOPS, determines the experience after the first few minutes. Apple doesn't publish sustained NPU performance numbers. Neither does Qualcomm. Neither does anyone.
The benchmark is a sprint. Your actual use case is a marathon.
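The sprint-versus-marathon gap can be sketched with a deliberately crude thermal model: the chip runs at peak until a temperature limit trips, then the governor caps throughput. Every constant here is invented; real SoC thermal behavior involves multiple sensors, skin-temperature limits, and governor policies far beyond this toy.

```python
# Toy thermal model: why the benchmark number and the 20-minute number differ.
# All constants are invented for illustration.

def sustained_throughput(peak: float, minutes: int,
                         heat_per_min: float = 4.0,   # deg C gained per minute at peak
                         cool_per_min: float = 1.5,   # passive dissipation per minute
                         throttle_at: float = 45.0,   # hypothetical skin-temp limit
                         throttled_frac: float = 0.55) -> list[float]:
    """Return effective TOPS for each minute of a sustained workload."""
    temp = 25.0
    trace = []
    for _ in range(minutes):
        throttled = temp >= throttle_at
        rate = peak * throttled_frac if throttled else peak
        # Throttled clocks generate proportionally less heat.
        temp += heat_per_min * (throttled_frac if throttled else 1.0)
        temp = max(temp - cool_per_min, 25.0)
        trace.append(rate)
    return trace

trace = sustained_throughput(35.0, minutes=20)
print(f"minute 1:  {trace[0]:.1f} TOPS")
print(f"minute 20: {trace[-1]:.1f} TOPS")
print(f"average:   {sum(trace) / len(trace):.1f} TOPS")
```

In this toy run the chip holds its peak for the first several minutes, which is exactly long enough to look great in a demo, then spends the rest of the session well below the spec-sheet number.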
What the NPU Actually Runs On Your Device
Here's the part of the marketing deck they skip: many of the "AI features" on your phone don't use the NPU at all, and some of what your phone advertises as AI still phones home to a server.
Let's go through the actual breakdown (this is approximate, because manufacturers don't publish task routing tables—which should tell you something):
Genuinely on-device NPU tasks:
- Face ID / fingerprint biometrics (must be local, privacy reasons)
- Photo noise reduction and computational HDR (this is where NPUs actually earn their keep)
- Live captioning and transcription (Pixel's Recorder app, iOS Live Captions)
- Basic autocorrect and next-word prediction
Marketed as "AI," often runs on CPU/GPU rather than dedicated NPU:
- Most real-time camera filters
- Photo editing sliders ("AI Enhance" buttons on Samsung Gallery, depending on tier)
- Some "smart" notification summaries on lower-tier chips
Marketed as "on-device AI," may route to cloud depending on model size and tier:
- Larger language model inference—on flagship hardware this is increasingly local (Gemini Nano on Pixel, on-device Phi-3 variants), but on mid-range chips the heavy lifting often offloads
- Generative photo editing on non-flagship hardware
- "AI search" in many third-party apps
When Apple launched Apple Intelligence, press coverage treated it as a triumph of on-device AI. The infrastructure trap of AI and cloud routing is more nuanced: the system uses on-device processing for triage and smaller model tasks, while more demanding language model requests route through Private Cloud Compute—Apple's privacy-hardened server infrastructure. That's a defensible architecture and an honest implementation. It's also not the full AI-on-chip story the headlines told. (Apple's own documentation covers this distinction in detail, for anyone who wants to go primary-source on it.)
The category isn't binary. It's a spectrum by model size, chip generation, and task type—and the marketing rarely tells you where a specific feature sits on that spectrum.
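If a vendor ever published its routing logic, it might look something like this sketch. To be clear: no manufacturer publishes its actual task-routing tables, so every tier, threshold, and task below is a made-up illustration of the spectrum, not anyone's real implementation.

```python
# Hypothetical sketch of the on-device/cloud routing spectrum described above.
# All tiers, budgets, and tasks are invented for illustration.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    model_params_m: int      # model size, millions of parameters
    privacy_critical: bool   # biometrics etc. must never leave the device

def route(task: Task, device_tier: str) -> str:
    # Rough on-device model budget by tier, in millions of parameters (invented).
    npu_budget = {"flagship": 4000, "midrange": 500, "budget": 0}[device_tier]
    if task.privacy_critical:
        return "on-device"    # non-negotiable, regardless of tier
    if task.model_params_m <= npu_budget:
        return "on-device"    # fits the local budget
    return "cloud"            # offloaded, whatever the box says

print(route(Task("face unlock", 5, True), "budget"))          # on-device
print(route(Task("photo denoise", 50, False), "midrange"))    # on-device
print(route(Task("LLM summarize", 3000, False), "midrange"))  # cloud
print(route(Task("LLM summarize", 3000, False), "flagship"))  # on-device
```

Note what the last two lines imply: the exact same feature name on the box can mean local inference on a flagship and a server round-trip on a mid-ranger.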
The 2026 NPU Arms Race in Practice
Coming out of this year's Barcelona showings, you heard a lot about "next-generation AI silicon." Qualcomm's next-gen Snapdragon previews. Samsung's Exynos 2600 positioning. Arm's silicon roadmap presentations—which covered Cortex-X925 CPU performance improvements and separate NPU block improvements, often collapsed into a single "AI performance" headline that obscures which block is doing which work.
That last one is worth noting: Cortex-X925 is a CPU core. The NPU in Arm's reference designs is a separate block (the Ethos series). When you see "Arm's AI silicon" in a headline, check whether they're citing CPU improvements, NPU improvements, or an aggregate of both—because those are different components doing different jobs.
MWC 2026 smartphone trends showed the real divide: every slide deck led with TOPS numbers. Almost no presentation led with: here are real tasks, here is measured latency, here is battery consumption per inference, here is performance at 15 minutes of sustained load.
I sat through enough vendor briefings in my QA days to recognize what that omission means. When a company wants you to evaluate their chip, they give you the number that looks best and design the demo to hit that number for the duration of the demo. When the chip performs well across the board, they show you everything.
The sustained-load data is rarely in the press deck because it's not where the headline comes from.
When On-Device AI Actually Earns Its Keep
This isn't a nihilism piece. Some of this is real.
The computational photography pipeline on modern flagships is genuinely impressive, and it runs predominantly on dedicated hardware. The difference between a Pixel 9 Pro's Night Sight and what a mid-range chip produces is measurable, visible, and attributable to the NPU doing real work on real pixel data with real low-latency constraints. Battery cost for that burst: manageable.
On-device biometrics are non-negotiable from a privacy standpoint and they work. Live transcription on Pixel and recent iPhones is legitimately good. Noise reduction on voice calls using local ML has improved call quality in ways you notice.
The issue isn't that NPUs are fake. The issue is that the TOPS figure is a marketing abstraction that doesn't help you evaluate any of that. 38 TOPS that runs computational photography efficiently is more useful than 78 TOPS where half the "AI features" route to a server anyway.
What to Actually Look For
If you're buying a phone in this cycle and trying to parse the AI chip claims, here's how to cut through:
Ignore peak TOPS. It's not standardized, not sustained, and not comparable across vendors.
Ask which features run locally vs. cloud. If the manufacturer doesn't answer this clearly in their documentation, that's usually a signal the answer involves more server-side processing than they want to headline.
Check sustained performance under load. GSMArena's performance endurance test is the most consistently available independent benchmark for this. That number is closer to what you'll actually live with.
Match the NPU to your use case. If you don't edit video on your phone, the video-optimized NPU pipeline doesn't matter to you. If you do live transcription, verify whether that specific feature is documented as local or server-routed.
Battery impact is an honest signal. Your phone's battery is being murdered by good intentions: if a feature measurably increases battery drain, it's doing real computational work on-device. If it barely touches the battery, either the model is very efficient, or the heavy computation is happening elsewhere and you're trading battery cost for latency instead.
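You can turn that signal into a number with back-of-envelope arithmetic: a battery-percentage drop over a known count of inferences converts to joules per inference. The pack size, voltage, and drain figures below are assumed for illustration, not measurements of any real phone.

```python
# Back-of-envelope: converting a measured battery-percentage drop into
# energy per inference. All inputs below are illustrative assumptions.

def energy_per_inference_j(battery_mah: float, nominal_v: float,
                           drain_pct: float, inferences: int) -> float:
    """Joules consumed per inference, from a battery-percentage drop."""
    pack_joules = battery_mah / 1000 * nominal_v * 3600  # full-pack energy
    return pack_joules * (drain_pct / 100) / inferences

# Assumed: a 5,000 mAh pack at 3.85 V nominal, and a feature that burns
# 2% of the battery over 100 on-device image generations.
j = energy_per_inference_j(5000, 3.85, 2.0, 100)
print(f"{j:.1f} J per inference")  # ~13.9 J
```

If the same feature barely registers on this math, be suspicious: either the model is tiny, or the watts are being burned in someone's data center.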
The Verdict for Your Wallet
The TOPS number on your phone's spec sheet is the new megapixel race. It's a big number designed to win a comparison chart, not to tell you whether the chip will run the features you actually use, at the speed you'll actually notice, without cooking your battery in the process.
The real AI silicon story in 2026 is about task routing (what actually runs on-device), sustained thermal performance (what the chip does after the benchmark ends), and honest power consumption data (what this costs you over a day of real use).
None of those numbers are on the box. Which means you need to go find them before you buy.
Stay wired.
You Might Also Enjoy
- MWC 2026 Smartphone Trends: AI Hype vs Thermal Reality — What vendors actually showed (and hid) at this year's biggest phone event
- AI PC 2026 Hardware Reality Check: 5 Tests Before You Buy — The same TOPS theater is happening on laptops. Here's how to see through it.
- The Infrastructure Trap: Why AI Killed Your Device's Battery (And Offline Mode) — Why marketing's AI-on-device claims often hide cloud dependency
