The AI that never phones home.
Halfmoon is full-featured AI chat that runs entirely in your browser. Open-source models on your own device — no cloud, no account, nothing to leak.
Why Halfmoon
Privacy you don't have to take on faith.
Private by physics
Not by promise. The model runs on your GPU, inside your browser. Your words never cross the network, so there's nothing to intercept, store, or subpoena.
Offline is a feature
Add it to your home screen like an app. Once a model is downloaded, Halfmoon works on a plane, in a tunnel, off the grid.
Bring any mind
Llama, Qwen, Gemma, Phi, Mistral — 160+ open models from phone-sized to desktop-class, with full control over system prompt and sampling.
How it works
Three steps. Zero accounts.
Pick a model
Small ones fly on phones; big ones think harder on desktops. Swap anytime.
Wait once
The model downloads to your browser's cache — a first-time-only wait.
Chat forever
Streaming replies, code, markdown, history. All local, all free.
Questions
The fine print, unfined.
Is it really free?
Yes. Your device does the computing, so there's nothing for us to meter. No subscription, no token limits, no ads.
How is this private, exactly?
Halfmoon uses WebGPU to run the language model inside your browser tab. Prompts and responses are processed on your hardware and stored only in your browser. There is no backend — the page is just files.
What do I need?
A browser with WebGPU: Chrome or Edge 113+, Safari 26+ (including iPhones on iOS 26+), or Firefox 141+ on Windows. You also need 1–2 GB of free space for a small model; larger models need more. Recent phones handle the small models comfortably.
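For the curious, both requirements above can be checked from a page's own script. This is a minimal sketch, not Halfmoon's actual code: the names `NavigatorLike`, `freeBytes`, `canFitModel`, and `checkRequirements` are made up for illustration, and it assumes only the standard `navigator.gpu.requestAdapter()` and `navigator.storage.estimate()` browser APIs.

```typescript
// Hypothetical shapes mirroring the parts of `navigator` this sketch reads.
interface StorageEstimateLike {
  quota?: number; // bytes the browser may grant this origin
  usage?: number; // bytes this origin already uses
}

interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
  storage?: { estimate(): Promise<StorageEstimateLike> };
}

const GIB = 1024 ** 3;

// Free bytes according to an estimate, or null when the browser reports nothing.
function freeBytes(estimate: StorageEstimateLike): number | null {
  if (estimate.quota === undefined || estimate.usage === undefined) return null;
  return Math.max(0, estimate.quota - estimate.usage);
}

// True when the estimate shows room for a model of the given size.
// Unknown free space cannot be confirmed, so it counts as "no".
function canFitModel(estimate: StorageEstimateLike, modelBytes: number): boolean {
  const free = freeBytes(estimate);
  return free !== null && free >= modelBytes;
}

// Checks WebGPU availability and storage headroom in one call.
async function checkRequirements(nav: NavigatorLike, modelBytes: number) {
  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = nav.gpu ? await nav.gpu.requestAdapter() : null;
  const estimate: StorageEstimateLike = nav.storage
    ? await nav.storage.estimate()
    : {};
  return {
    webgpu: adapter !== null && adapter !== undefined,
    hasSpace: canFitModel(estimate, modelBytes),
  };
}
```

In a browser, a page might call `checkRequirements(navigator, 2 * GIB)` and show a friendly message if either field comes back false. Storage estimates are approximate, so treat the result as a hint rather than a guarantee.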
Do I have to install anything?
No. It's a web page. But if you tap “Add to Home Screen,” it behaves like a native app — full-screen, with an icon, working offline.
Chat like nobody's watching.
Because nobody is.