Why Apple's neural engine matters more than its marketing

The Apple Neural Engine doesn't get featured the way the GPU and unified memory do, yet it's the part of the chip that does the most for the on-device personal-AI story Apple is supposedly betting on. It's worth being plain about why it matters and why nobody talks about it.


The Apple Silicon marketing focuses on three things: the CPU performance, the GPU performance for graphics and gaming, and the unified memory bandwidth. The Neural Engine, the dedicated ML accelerator that's been on every Apple Silicon chip since the M1, almost never makes the keynote slides. It gets a mention as a number-of-cores footnote and disappears from the conversation.

So, three things worth being plain about: what the ANE actually does, why it's underemphasized, and why it's the part of the silicon that matters most for on-device AI.

(Diagram: how Apple Silicon routes ML workloads across the Neural Engine, the GPU via MLX, and the CPU. The ANE does most of the invisible work, the GPU gets most of the marketing energy, and the CPU serves as the substrate.)

What the Neural Engine actually does

The ANE is a dedicated neural-network accelerator. It runs ML inference workloads at very low power (roughly 1-2 W) and very low latency, and it's tuned for the matrix-multiply patterns that dominate neural-network inference. On the M4 generation it has 16 cores across M4, M4 Pro, and M4 Max; the M3 Ultra doubles that to 32 by combining two Max dies. The headline operations-per-second figure is large (Apple quotes 38 trillion ops/s for the M4-family ANE); the relevant practical number is "lots of inference compute that doesn't draw much power."

In normal use, the ANE handles:

  • Computer vision pipelines: face detection, scene analysis, object recognition, OCR. Most of the on-device CV in macOS and iOS goes through the ANE.
  • Audio processing: offline speech recognition, noise reduction, audio classification. The transcription that happens locally on Apple devices is mostly ANE-driven.
  • Image-generation tasks at the smaller-model scale: the on-device portions of Image Playground, the smart features in Photos.
  • Natural-language tasks at small-model scale: the writing tools, autocorrect-class features, the text-suggestion plumbing.
  • The on-device portions of Apple Intelligence: the ~3B-parameter model that powers the consumer-facing AI features runs on the ANE for inference where it can.

These are quiet workloads. You don't see the ANE working; you see the features it powers, and you take them for granted because the device handles them at low power without thermal throttling.

Why it's underemphasized

Three reasons the ANE doesn't get the marketing energy it deserves.

It's invisible. GPU performance shows up in benchmark scores, gaming performance, and creative-app rendering speeds. The ANE shows up in "your features work without your battery dying," which is harder to demo and harder to put on a keynote slide.

The third-party developer story is thin. Most third-party developers doing ML on Apple Silicon target the GPU via Metal or MLX, because those paths are more flexible than the Core ML / ANE pathway: you can't program the ANE directly, you hand Core ML a model and the system decides whether the ANE runs it. The ANE is Apple's dedicated accelerator for Apple's own workloads more than it is a developer surface, and that limits its visibility.

The performance-per-watt story doesn't fit the comparison narrative. When you're competing on "tokens per second" against an NVIDIA GPU, the ANE looks bad on the headline number. The ANE wins on watts-per-token by a wide margin, but watts-per-token isn't the comparison consumers know to ask for.
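To make the two competing metrics concrete, here's a back-of-the-envelope sketch. All throughput and power figures below are made-up assumptions for illustration, not measurements of any specific chip:

```python
# Illustrative comparison: the headline number (tokens/sec) vs. the
# efficiency number (joules per token). Figures are assumptions.

gpu_tokens_per_s = 120.0   # hypothetical discrete-GPU decode speed
gpu_watts = 350.0          # hypothetical board power under load

ane_tokens_per_s = 30.0    # hypothetical ANE decode speed on a small model
ane_watts = 2.0            # rough ANE power draw

# Headline: tokens per second -> the GPU wins by 4x.
# Efficiency: joules per token -> the ANE wins by more than 40x.
gpu_joules_per_token = gpu_watts / gpu_tokens_per_s   # ~2.9 J/token
ane_joules_per_token = ane_watts / ane_tokens_per_s   # ~0.07 J/token

print(f"GPU: {gpu_joules_per_token:.2f} J/token, "
      f"ANE: {ane_joules_per_token:.3f} J/token")
```

Same workload, opposite winners, depending entirely on which denominator you put under the comparison.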

The result is that the ANE is doing a lot of the actual work the Apple Silicon plus open-weights inflection point depends on, while getting almost no credit for it.

What this means for the on-device-AI story

A few specific implications for the personal-AI crowd.

The 3B-class on-device model that ships with Apple Intelligence runs on ANE for inference. That's why it's instant and doesn't drain the battery. The model is small because the ANE is well-suited to small models running fast at low power; growing the model to 7B or 12B (which would meaningfully expand what on-device AI can do) requires either a bigger ANE or moving more of the inference to the GPU, with worse power characteristics.
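A rough way to see why growing the model is hard: weight memory scales linearly with parameter count. A sketch, counting weights only (no KV cache or activations), with the 4-bit quantization width as an assumption:

```python
# Approximate weight-memory footprint at different parameter counts.
# Weights only; KV cache and activations add more on top.

def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for size in (3, 7, 12):
    print(f"{size}B @ 4-bit: ~{weight_gb(size, 4):.1f} GB")
# A 3B model is a modest working set; 12B is a meaningfully larger
# memory and bandwidth demand for the same accelerator.
```

The compute demand grows in step with the memory demand, which is why "just run a bigger model on the same ANE" isn't an option.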

Workloads that would otherwise be cloud-bound become viable on-device. Speech-to-text, real-time transcription, OCR, image recognition, all of these can run on the ANE without hitting the network. The privacy and latency benefits are real.

The GPU vs ANE routing decision matters. For larger-model inference (7B+), the GPU is the right place because the ANE doesn't have enough memory bandwidth or capacity for those workloads. For smaller-model inference and the inference-as-a-system-feature use cases (autocorrect-grade), the ANE is the right place because the GPU is overkill and would draw more power. The OS handles this routing; users don't see it; the routing is what makes the device feel responsive.
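The OS routing described above is opaque, but the shape of the decision can be sketched as a toy heuristic. The function and the 7B threshold are hypothetical illustrations, not Apple's actual scheduling policy:

```python
# Hypothetical sketch of an ANE-vs-GPU routing decision. The threshold
# and the function itself are illustrative; the real placement logic
# lives inside Core ML / the OS and isn't exposed to developers.

def route_inference(params_billions: float) -> str:
    """Pick a compute unit for an inference workload (toy heuristic)."""
    # Larger models need the GPU's memory bandwidth and capacity;
    # smaller, system-feature-grade models fit the ANE's low-power
    # sweet spot, where the GPU would be overkill.
    return "gpu" if params_billions >= 7 else "ane"

print(route_inference(3))   # system-feature scale -> ane
print(route_inference(13))  # chat-assistant scale -> gpu
```

In practice, the closest developer-facing knob is Core ML's computeUnits configuration (e.g. .cpuAndNeuralEngine), which expresses a preference rather than a guarantee; the system still makes the final placement call.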

MLX's ANE story is improving but not at parity. MLX-based training and inference default to the GPU; ANE acceleration in MLX exists but is less complete than the GPU path. As MLX matures, more workloads should be able to route to the ANE for the use cases where it's the right fit.

What changes when ANE gets a real upgrade

The ANE's ceiling is the binding constraint on Apple's on-device AI ambitions. A meaningful ANE upgrade in the M5 or M6 generation would unlock several things.

A bigger on-device model becomes viable. Apple Intelligence's 3B model is small because it has to fit within the current ANE's compute and memory budget. Doubling or tripling the ANE's capacity would let Apple ship a 7-12B on-device model, which would close most of the visible-quality gap with hosted models for consumer use cases.

Multi-agent on-device workloads become possible. Currently the on-device AI surface handles one conversation, one model, one workload at a time. A capable enough ANE could run multiple smaller models concurrently, the pattern that mature personal-AI workflows want.

The "good enough on-device" bar moves up. Workloads that today require routing to Private Cloud Compute (or to ChatGPT-as-fallback) become local-feasible. The privacy and latency benefits compound.

Third-party developer story becomes more compelling. If the ANE can credibly run useful-quality models, more developers will target it. The current "everyone targets the GPU" pattern reflects today's ANE constraints; a bigger ANE changes the calculation.

Why this matters for the strategic position

The follow-through gap I wrote about for Apple Intelligence is partly an ANE-capacity gap. Apple's commitment to running things on-device whenever possible runs into the ceiling of what the current ANE can comfortably handle. The keynote demos suggest capabilities that would require either a bigger ANE or routing more to the GPU; the shipped versions stay within the ANE's comfortable range and are accordingly more constrained.

The next ANE generation is therefore the most-watched piece of Apple's silicon roadmap from an AI perspective. Not the GPU (which is already very capable for the workloads it handles). Not the unified memory (which scales by model rather than by ANE capacity). The ANE is where Apple's on-device AI strategy lives or dies.

What to watch

Two specific things over the next twelve months.

The M5 ANE specs. When the M5 generation lands, the ANE specs are the most-important AI-relevant number. Bigger ANE in the next generation = more ambitious Apple Intelligence in the year after. Same-size ANE = continued constraint = continued gap-widening between Apple's promised features and what ships.

Whether MLX-on-ANE matures. If MLX gets first-class ANE support comparable to its GPU support, the developer story improves and more local-AI workloads can use the lower-power path. If the GPU stays the only credible MLX target, the ANE remains an Apple-internal accelerator with limited third-party reach.

The Neural Engine is the part of Apple Silicon that does the most for the on-device-AI story while getting the least marketing attention. That's a story worth understanding for anyone making bets on the Apple platform's AI trajectory. The headline numbers are about the GPU and the memory; the actual on-device AI rests on the ANE; the strategic position depends on the ANE keeping up with Apple's ambitions for consumer AI features.

Worth paying attention to a part of the silicon that nobody else is paying attention to.