Microsoft's New AI Models from Build 2026: Aion and MAI, and How to Actually Use Them

Microsoft Build 2026 had one clear theme: Windows wants to be an agent operating system. Underneath
the keynote slides, though, the part that matters to those of us who build things is a handful of
new models. And the most useful question about them isn't "how do they benchmark?" but "where do
they run, and how do I call them from my machine?" The answer is different for each model, and it
depends heavily on the hardware you're sitting in front of.

Let me cut through it. There are two families here, and they're philosophically opposite.

The two families

Aion 1.0 is Microsoft's new family of small, on-device language models — they run locally,
inside Windows, on the NPU of a capable machine. No cloud, no per-token bill.

Aion 1.0 Instruct: a small language model (SLM) for everyday text intelligence such as summarizing,
rewriting, classifying, and detecting intent. It's the kind of thing you want to fire off constantly
without a network round-trip.
Aion 1.0 Plan: a 14-billion-parameter reasoning model with a 32K-token context window and tool
calling. This is the "local brain," meant to reason over what you're trying to do and to orchestrate
sub-agents and file operations on-device.

MAI is Microsoft's own cloud model family (MAI = Microsoft AI). These models do not run on your
NPU; they live in Microsoft Foundry (formerly Azure AI Foundry), and you reach them over HTTPS:

MAI-Image-2.5 (plus a cheaper Flash variant): image generation and editing.
MAI-Transcribe-1.5: speech-to-text in 43 languages, which Microsoft pitches as best-in-class for accuracy.

So before you get excited about running any of this "locally," sort it into the right bucket: Aion =
local on Windows hardware; MAI = cloud, callable from anywhere. That distinction drives everything
below.

How to use them on a Mac

I do most of my work on Apple Silicon, so this is the honest starting point: the Aion models do not
run on a Mac today. They ship in-box on Windows and are exposed through the Windows Copilot
Runtime / Windows AI Foundry APIs. There is no macOS path right now.

There is one opening worth marking on your calendar, though: Microsoft said open weights for Aion
1.0 Instruct will land on Hugging Face in July 2026. Once they do, you should be able to download the
weights and run the Instruct model locally on a Mac via MLX or llama.cpp (for llama.cpp, as a GGUF
conversion), the same way you'd run any other open SLM, assuming those tools support its architecture
at release. Aion Plan (the 14B reasoner) has not been promised as open weights, so don't count on it.
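Both MLX and llama.cpp ship a small local server that speaks the OpenAI-style chat-completions protocol (`mlx_lm.server` and `llama-server`, each listening on port 8080 by default), so app code doesn't need to care which one you pick. The sketch below assumes one of them is already serving the Instruct weights once they exist; the system prompt and temperature are illustrative choices, not anything Microsoft has specified for Aion.

```typescript
// Minimal sketch: summarize text with a locally served model through the
// OpenAI-compatible /v1/chat/completions route that both llama.cpp's
// `llama-server` and `mlx_lm.server` expose. Requires Node 18+ (global fetch).

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model?: string; // optional; when omitted, both servers use the model they were started with
  messages: ChatMessage[];
  temperature: number;
}

// Build the request body on its own so it can be inspected or tested offline.
function buildSummarizeRequest(text: string, model?: string): ChatRequest {
  const req: ChatRequest = {
    messages: [
      { role: "system", content: "Summarize the user's text in two sentences." },
      { role: "user", content: text },
    ],
    temperature: 0.2,
  };
  if (model !== undefined) req.model = model;
  return req;
}

// POST the request to the local server (both servers default to port 8080).
async function summarizeLocally(
  text: string,
  baseUrl = "http://127.0.0.1:8080",
): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSummarizeRequest(text)),
  });
  if (!res.ok) throw new Error(`Local server returned HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

In practice you'd start `llama-server -m <path-to-gguf>` or `mlx_lm.server --model <repo-or-path>` first, then call `summarizeLocally("...")`. Leaving `model` out of the body avoids guessing an Aion model id that hasn't been published.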

The MAI models, on the other hand, are usable from a Mac today, because they're cloud APIs:

MAI-Image-2.5 / Flash lives in the Foundry Model Catalog. Flash pricing is roughly
$1.75 per 1M tokens for text+image input and $33 per 1M tokens for image output.
MAI-Transcribe-1.5 is served through Azure Speech / the LLM Speech API, at roughly
$0.36 per hour of audio.
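Because both MAI services are metered, it's worth sanity-checking a workload against the list prices above before you build around them. This sketch hard-codes those approximate rates (treat them as snapshots that will drift). The token counts in the example call are made-up inputs, since how many tokens a given image consumes isn't covered here.

```typescript
// Back-of-the-envelope MAI cost estimator using the approximate list prices
// quoted in this post. Rates are hard-coded snapshots; check current Azure pricing.
const TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36;
const IMAGE_FLASH_INPUT_USD_PER_1M_TOKENS = 1.75; // text + image input
const IMAGE_FLASH_OUTPUT_USD_PER_1M_TOKENS = 33; // image output

// Cost of transcribing a given number of audio minutes with MAI-Transcribe-1.5.
function transcribeCostUsd(audioMinutes: number): number {
  return (audioMinutes / 60) * TRANSCRIBE_USD_PER_AUDIO_HOUR;
}

// Cost of MAI-Image-2.5 Flash usage, given input and output token counts.
function imageFlashCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * IMAGE_FLASH_INPUT_USD_PER_1M_TOKENS +
    (outputTokens / 1_000_000) * IMAGE_FLASH_OUTPUT_USD_PER_1M_TOKENS
  );
}

console.log(transcribeCostUsd(90).toFixed(2)); // 90 minutes of audio
console.log(imageFlashCostUsd(2_000, 5_000).toFixed(4)); // hypothetical token counts
```

At these rates, 90 minutes of audio is 1.5 hours × $0.36, or about $0.54, and image spend is dominated by the $33-per-million output rate rather than by your prompts.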

To use either from macOS:

Create an Azure account and a Microsoft Foundry (Azure AI) resource. For Transcribe
specifically, create a Speech resource and note its key + region.
Grab the endpoint and key from the Azure portal.
Call it over REST or through the Azure SDKs for Python or Node; both work fine on macOS.
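As a concrete version of the REST route, here is a TypeScript sketch for Node 18+ (which ships `fetch`, `FormData`, and `Blob`). It follows the shape of Azure Speech's fast-transcription REST API: a multipart POST with an `audio` file and a JSON `definition`, authenticated with the `Ocp-Apim-Subscription-Key` header. The URL path, the `api-version` value, and the `combinedPhrases` response field are assumptions based on that API, and how (or whether) you select MAI-Transcribe-1.5 specifically on this endpoint isn't covered, so check the current Speech docs before relying on it.

```typescript
// Sketch: transcribe a local audio file through an Azure Speech resource (Node 18+).
// ASSUMPTIONS: the URL path, api-version, multipart fields, and response shape follow
// Azure Speech's fast-transcription REST API; selecting MAI-Transcribe-1.5 on this
// endpoint is not covered here. Verify all of it against the current Speech docs.
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

interface SpeechConfig {
  key: string; // Speech resource key from the Azure portal
  region: string; // the resource's region, e.g. "eastus"
  apiVersion: string; // assumed; use whatever version the docs currently list
}

// Build the transcription URL for a region-scoped Speech endpoint.
function buildTranscribeUrl(cfg: SpeechConfig): string {
  return (
    `https://${cfg.region}.api.cognitive.microsoft.com` +
    `/speechtotext/transcriptions:transcribe?api-version=${encodeURIComponent(cfg.apiVersion)}`
  );
}

// Upload one file and return the recognized text, one combined phrase per line.
async function transcribeFile(
  cfg: SpeechConfig,
  path: string,
  locale = "en-US",
): Promise<string> {
  const audio = await readFile(path);
  const form = new FormData();
  form.append("audio", new Blob([new Uint8Array(audio)]), basename(path));
  form.append("definition", JSON.stringify({ locales: [locale] }));

  const res = await fetch(buildTranscribeUrl(cfg), {
    method: "POST",
    headers: { "Ocp-Apim-Subscription-Key": cfg.key },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Speech API returned HTTP ${res.status}: ${await res.text()}`);
  }
  const data = (await res.json()) as { combinedPhrases?: { text: string }[] };
  return (data.combinedPhrases ?? []).map((p) => p.text).join("\n");
}
```

Usage would look like `transcribeFile({ key, region, apiVersion }, "meeting.wav")`, with `key` and `region` taken from your Speech resource. Keep the key in an environment variable or a secret store rather than in source.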

That's the whole story for a Mac: wait for July for local Aion-Instruct; use MAI in the cloud right
now.

How to use them on a Copilot+ PC

This is where Aion was actually meant to live. A Surface Copilot+ PC (or any Copilot+ machine:
Snapdragon X, or a recent qualifying Intel or AMD chip with an NPU rated at 40+ TOPS) is the target
device, and the experience flips from "not available" to "first-class."

On a Copilot+ PC the Aion models run locally, on the NPU, for free, via the Windows AI stack. Here's
the practical path:

Enroll the machine in the Windows Insider Program (Dev or Canary channel). As of Build, the Aion
models and the new Settings → AI Model Management page are rolling out through Insider builds and
26H1, not the stable channel yet.
Aion 1.0 Instruct (in preview now): the fastest way to try it is to install Edge Insider (Canary or
Dev, v150.0.4070+). The model auto-downloads on first use, and you call it straight from JavaScript
via Edge's built-in Prompt/AI APIs: zero native setup, great for experiments. For a real app, call it
through the Windows AI APIs / Windows AI Foundry, which route summarize, rewrite, and other
text-intelligence calls to the in-box model on the NPU.
Aion 1.0 Plan (the 14B reasoner) — "ships in-box on capable devices in the coming months." Once
it lands in your build it appears as an installed model, and you call it through the same Windows AI
Foundry APIs for agentic, tool-calling workloads.
Use the new AI Model Management page to see exactly what's installed (size, purpose, data usage,
install date) and to uninstall models, though uninstall support is still limited.
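For the Edge route, a page script can feature-detect the built-in model and fall back cleanly when it isn't there. This sketch assumes Edge Insider exposes Aion 1.0 Instruct through the experimental Prompt API shape that Chrome and Edge currently document (a global `LanguageModel` with `availability()`, `create()`, and a session `prompt()`). That Aion is the model behind it, and that the shape won't change, are both assumptions. The interfaces are declared locally because TypeScript's standard DOM typings don't reliably include this API yet.

```typescript
// Sketch: rewrite text with the browser's built-in model from an Edge page script.
// ASSUMPTION: Edge Insider exposes Aion 1.0 Instruct through the experimental
// Prompt API shape (global `LanguageModel`); these local interfaces describe only
// the few members used here.

type ModelAvailability = "unavailable" | "downloadable" | "downloading" | "available";

interface PromptSession {
  prompt(input: string): Promise<string>;
  destroy(): void;
}

interface LanguageModelApi {
  availability(): Promise<ModelAvailability>;
  create(): Promise<PromptSession>;
}

// Look the API up without assuming it exists (it won't in Node or most browsers).
function getLanguageModel(): LanguageModelApi | undefined {
  return (globalThis as unknown as { LanguageModel?: LanguageModelApi }).LanguageModel;
}

// Returns the rewritten text, or null when no on-device model is usable here.
async function rewriteOnDevice(text: string): Promise<string | null> {
  const lm = getLanguageModel();
  if (!lm) return null;
  if ((await lm.availability()) === "unavailable") return null;
  // In the current Prompt API, create() is where a "downloadable" model gets
  // fetched, which matches the download-on-first-use behavior described above.
  const session = await lm.create();
  try {
    return await session.prompt(`Rewrite this more concisely:\n\n${text}`);
  } finally {
    session.destroy(); // release the session's resources
  }
}
```

Returning `null` rather than throwing lets the caller fall back to a cloud call or an open model whenever the page runs anywhere other than Edge Insider on a supported machine.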

Worth underlining: you don't download Aion weights on a Copilot+ PC. Windows provisions them to the
NPU and hands you an API. That's the whole point of the "AI as a platform feature" pitch.

The MAI models stay cloud-based even here — your NPU doesn't run them — but you'll experience them
through Copilot, Paint/Designer (MAI-Image), and Windows voice typing/transcription (MAI-Transcribe),
all of which call Microsoft's cloud behind the scenes. To build against them yourself, it's the same
Azure Foundry path as on the Mac.

The cheat sheet

Model	Mac (Apple Silicon)	Copilot+ PC	How to access
Aion 1.0 Instruct	Not yet → open weights ~July 2026 (MLX/llama.cpp)	✅ Local, on-NPU	Edge Insider AI APIs or Windows AI Foundry
Aion 1.0 Plan (14B)	No	✅ Local, soon	Windows AI Foundry (in-box, "coming months")
MAI-Image-2.5 / Flash	✅ Cloud	✅ Cloud	Azure AI Foundry model catalog
MAI-Transcribe-1.5	✅ Cloud	✅ Cloud	Azure Speech / LLM Speech API

What I'd actually do

If you want offline, private, free inference: the Aion story is a "wait for July, then MLX it"
story on a Mac, and an "enroll in Insider builds" story on a Copilot+ PC. Until then, an open model you
can already run locally (Qwen, Llama, the usual suspects) gets you most of the way for summarize/rewrite
without waiting on anyone.

If you just want good results now and don't mind a cloud call: the MAI models are ready,
cross-platform, and cheap enough to experiment with. Spin up an Azure Foundry resource and you can be
calling MAI-Transcribe or MAI-Image from your Mac in an afternoon.

The headline from Build 2026 was "Windows is an agent OS." The quieter, more useful takeaway is that
Microsoft is now shipping models in two distinct ways (baked into the OS on capable hardware, and
rented from the cloud for everyone else), and knowing which is which tells you exactly what you can
run, where, and for how much.