v0.3.4 public beta · Windows · macOS · Linux

Grok, Claude Code, Codex.
One desktop app.

Pick your agent per tab and run them side by side — chat, build apps, browse the web, and create, all in one window. No terminal required.

Download for Windows → Download for macOS → github ↗

Free · open source · Windows, macOS & Linux

shellX — one desktop app for Grok Build, Claude Code, Codex and Antigravity CLI, shown running an in-app image-generation session with the projects rail and the agent-tools panel

At a glance Feature list

Everything it does,
in plain terms.

A factual rundown of what shellX v0.3.4 does today — no jargon, updated every release.

Pick any agent, then chat

Choose Grok Build, Claude Code, or Codex CLI per tab — and run several side by side, each its own agent with its own folder and history
Antigravity and other ACP-compatible agents plug in the same way
Decide per tab how much of shellX each agent can reach — its own native tools, a chosen set of shellX tools, the full host toolset, or none at all
Hand work from one agent to another with your approval — let one plan, another make an image, a third build

Talk, attach, and preview

Talk to it by voice and hear it answer back, hands-free
Drag files onto the chat, paste them in, or right-click any file in Windows to send it straight to shellX — they land as tidy attachment chips, never raw paths
Preview code, PDFs, images, videos and web pages right inside the app, with everything it makes collected in one Assets board

Build a whole app, then watch it run

Run the website or Expo app it just built in a built-in preview — no setup
It screenshots the running app and looks at the result itself to catch what's broken
One /build command keeps it working across many steps until the job is verified done
Ask Fix sends any preview error straight back to the agent

Browse the web — you or your agent

Open shellX's own browser for everyday browsing, or hand a tab to the agent to do the web work for you
Three profiles: your Personal browser, a persistent Agent Work profile, and throwaway Task Disposable tabs
The agent navigates, reads, clicks and extracts pages — and logs a reviewable receipt for everything it does
Bookmarks with folders, history, downloads, and a privacy / ad-block mode

Make images and video

Generate images and short video clips from a prompt, in-app
Powered by Grok Imagine through the same account you sign in with — no extra key
Everything it makes lands in a gallery you can browse and reuse

Give it the tools it needs

It reads, writes and searches your project files
It can take a screenshot and actually see your screen to check its work
Add more tools in one click from the built-in marketplace
Web search, web fetch and X search when your Grok account includes them
Review the session's Git — status, diffs, checkpoints and new worktrees — without leaving the app

Vault your secrets — and let the agent use them

One encrypted vault for API keys, passwords, profile cards, email accounts and agent wallets
The agent and the browser fill them by name or through a scoped grant; the value never shows in chat, logs, or page source
Approve, deny or revoke every request from the Vault Request Center; sign-ins, purchases and account changes come to you first
Recovery kit, remembered-device unlock and a password generator built in; keep it entirely local, or run it in connected mode

Run it anywhere, connect anything

Your Windows PC with no setup — or inside Linux (WSL), or a remote server over SSH
Message your agent from Telegram or Discord, with sender allowlists
Back up or hand off any session as a single zip file
Drive shellX from your own scripts with a secure, on-your-machine API

Flagship Browser & Vault

Its own browser.
Its own vault.

shellX now ships a real browser the agent can drive — and an encrypted vault it uses without ever seeing the secret. The agent signs in, fills a form, reads a one-time code, even pays from a scoped wallet — and every sensitive step is gated by you and logged as a receipt you can review.

See everything the Browser does →

shellX 0.3.0 Integrated Browser — user and agent web work in one desktop shell, showing Personal / Agent Work / Disposable profiles, vault fill and secret capture, bookmarks, downloads and history, and observe / click / extract automation

The ShellX Browser open on a page, with the agent-automation side panel — autonomy control, Vault-ready badge, and Chat / Requests / Actions / Errors tabs

The browser

A browser you
share with the agent.

Browse normally in your own Personal profile, or hand a tab to the agent and let it do the web work — research, sign-ins, form fills, checkouts. It's a native webview, not a remote-controlled Chrome, so pages load fast and the agent reads the real DOM.

Three profiles: your Personal browser, a persistent Agent Work profile, and throwaway Task Disposable tabs
The agent navigates, observes, clicks and extracts pages — deterministic controls, clean Markdown out
Bookmarks with folders and a toolbar, history, downloads, and a privacy / ad-block mode
Every agent action is a task receipt you can open, review, and replay

The vault

It uses your secrets.
It never sees them.

ShellX Vault is the encrypted home for your API keys, passwords, profile cards, email accounts and agent wallets. The agent and the browser fill them by name or through a scoped grant; the value never appears in chat, in logs, or in the page source. You stay in control of every use.

Approve, deny, review or revoke every request from the Vault Request Center — secrets, profile fills, email codes, wallet spend, write-only deposits
Profile cards autofill name, address and card; agent wallets cap what a checkout can spend
Recovery kit, remembered-device unlock, and a built-in password generator
Keep it entirely local, or run it in connected mode — your call

shellX running a web app it built, in the Work Preview panel, with logs and a live status pill

01 Live preview

It builds the app.
Then it looks at it.

Ask shellX to build a website or an Expo app and it runs it for you — a local preview server, right inside the window. Then it does what a chat box never can: it takes a screenshot of the running app and inspects it with its own eyes. A blank screen, a button in the wrong place, a console error — the agent sees it and fixes it before it ever tells you it's done.

Static sites, web apps, and Expo web — one click to run
Preview Doctor checks HTTP status, server logs, and a real first-page screenshot
Loopback only — the preview never leaves your machine
Hit Ask Fix and the failure goes straight back to the agent

shellX /build cockpit — a receipt log of completed gates next to the live build scratchboard

02 /build

Hand it the goal.
Walk away.

Type /build "make this production-ready" and shellX runs a long-horizon build: it writes a plan, works through it, and refuses to claim it's finished until the work is verified. A reviewer subagent checks the code, a verifier runs the gates, and anything with a UI has to come back clean from the live preview. Every step is logged as a receipt — plan writes, file changes, checkpoints, completions — so you can read exactly what happened.

The plan is a plain file, build.md — edit it, share it, version it
Approval gate up front; /pause and /resume any time
Type a note while it's running — it folds into the build at the next safe step, no restart
Won't mark itself complete while a check fails or a blocker is open
Close the laptop mid-build, reopen tomorrow — it picks up from the plan
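
The completion rule in the list above can be pictured as a small gate. This is a minimal sketch, not shellX's actual implementation: the `BuildState` class, the check names and the `can_mark_complete` function are all hypothetical, chosen to mirror the reviewer, verifier and live-preview checks described in this section.

```python
from dataclasses import dataclass, field


@dataclass
class BuildState:
    # Hypothetical state: check name -> whether it passed
    checks: dict[str, bool] = field(default_factory=dict)
    # Hypothetical list of blockers that are still open
    open_blockers: list[str] = field(default_factory=list)


def can_mark_complete(state: BuildState) -> bool:
    """Allow completion only if checks ran, all passed, and no blocker is open."""
    return bool(state.checks) and all(state.checks.values()) and not state.open_blockers


state = BuildState(checks={"review": True, "gates": True, "preview": False})
print(can_mark_complete(state))  # the preview check failed, so this is False
```

An empty check list is treated as "not verified" here, on the assumption that a build with no evidence should not count as done.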

shellX showing a generated image in its media view

03 Create

Borrow one agent's
superpower in another.

Image and video generation are built into every session — so you can be deep in a Claude Code or Codex build, ask for a picture, and get it without switching tabs. shellX routes the request to Grok Imagine on your Grok account, or to GPT Image through Codex, drops the result straight into your conversation, and the agent uses it right away. Grok Imagine also makes short clips with native synced audio — all collected into their own gallery.

Call Grok Imagine or GPT Image from inside any agent's session — no tab-switching, you just ask
Grok Imagine rides your Grok subscription; Codex sessions can use GPT Image — nothing extra to wire up
Cinematic stills plus short video with native synced audio, callable as first-class host tools
Generated media resolves across local, WSL and SSH, and carries between agents

04 Trace

See everything
the agent did.

Open Trace on any session and shellX draws its activity as a live graph — every file it read, searched, wrote or deleted, every git move, every subagent it spawned, every build receipt, with the media it made linked right in place. Resize the evidence panels for a records-heavy run, and search the whole trace by path, command, query, tool, source, or timestamp. Running several agents at once? Trace rolls their activity into one report you can watch from outside the window.

shellX Trace — the Session Activity Browser, showing a session's file, search, git, subagent and build activity as a connected node graph above resizable evidence panels

05 What the agent can reach

It can see your screen,
write your files,
talk to your processes.

Eighty-six built-in tools the agent can call directly — the same set whether it's running on your PC, in WSL, or over SSH. Voice, vision, files, processes, secrets, the browser, and more. Toggle them per session.

surface · voice in / voice out

Talk to your agent. Hear it back.

Real microphone in, real spoken answers out — not a dictation gimmick. Push-to-talk for one-shot prompts; voice-chat mode re-arms the mic after every reply so you have an actual conversation while your hands stay on the keyboard. Uses your Grok account for speech, and now reads back Claude and Codex replies too — nothing extra to set up.

→ push-to-talk · voice chat mode · per-tab

tool · vision_describe

Give the agent eyes.

shellX captures your screen, sends it to Grok's multimodal model, and returns a description the agent can act on. Verify a deploy. Catch the dialog blocking your terminal. Audit a UI you just shipped — the same eyes that check its own live previews.

→ full desktop, active window, or a named window

tool · fs_read / fs_write / fs_grep

Your files, in full.

Read any file. Write any file. Fast search across whole project trees. Scoped to a working folder, and every call is auditable.

tool · process_list / process_signal

Processes are first-class.

List, inspect, signal, and read output from any process the agent can see. A build hung? It finds it and stops it without you ever leaving the chat.

★ screenshot / preview_diagnose

See, then act.

Grab the desktop, a window, or the live app preview, then pipe it into vision_describe for a closed loop: see → reason → fix. This is the engine behind the live preview check.

$ vault_get / vault_list

Secrets the agent can use.

An encrypted vault backed by your OS keyring. The agent calls a secret by name — the value never appears in chat, logs, or transcripts.

⇄ net_fetch / net_post

HTTP, with an audit trail.

Every request is logged with response codes and byte counts. Allowlist hosts per session. The agent stays on-rails.
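
A per-session host allowlist of the kind described above usually comes down to an exact hostname match. The sketch below shows that general technique in Python; it is not shellX's code, and the `host_allowed` function and the example hosts are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def host_allowed(url: str, allowlist: set[str]) -> bool:
    """Return True only when the URL's host exactly matches an allowlisted host."""
    # .hostname is lower-cased and excludes the port and any user:password part
    host = urlsplit(url).hostname
    return host is not None and host in allowlist


# Example hosts, chosen for illustration only
session_allowlist = {"api.github.com", "registry.npmjs.org"}

print(host_allowed("https://api.github.com/repos", session_allowlist))
print(host_allowed("https://api.github.com.evil.example/x", session_allowlist))
```

Matching the parsed hostname exactly, rather than testing whether the URL string starts with or contains an allowed name, is what rejects the look-alike host in the second call.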

⊞ subagents · implementer / reviewer / verifier

Parallelize without context bleed.

Dispatch isolated subagents from the running session. Each gets a fresh context window and a role-baked prompt, and reports back to the parent. The foundation under /build.

⊕ MCP marketplace

Add tools in one click.

Discover and install MCP servers from the public registry with a one-click UI, project-scoped or global. Any MCP server you already use elsewhere works here unchanged.

≣ sessions · full-content search

Every chat, searchable.

Conversations are saved and searchable across history. Resume a week-old session by name. Export any session as a single zip for a teammate or for CI replay.

06 Runs anywhere

Local, WSL, or a server.
The same window.

shellX can run the agent on your Windows machine, inside a WSL distribution, or on a remote server over SSH, and it carries its full toolset along wherever it goes. When you do want a terminal, it's a real one: a true PTY that runs vim or htop, with no fake echo.

01 / Local Native

Local Windows

Real PTY via ConPTY. Run vim, htop, anything interactive. No fake terminal, no command echo simulator.

→ Settings · Connections · Local
→ binding ConPTY 200x60
→ host tools on 127.0.0.1
  ready  · grok-4.3
you> read C:\src\index.ts
  ok  · 142 lines

02 / WSL Tunneled

WSL Linux

Spawn the agent inside any installed distribution. The host toolset streams in — same tools, Linux paths.

→ Settings · Connections · WSL · ubuntu-24.04
→ entering distro
→ tools tunneled: Windows → WSL
  ready  · grok-4.3
you> grep -r 'TODO' ~/repo
  ok  · 7 matches

03 / SSH Remote

SSH

Any Linux box with SSH. Grok runs there; tools are tunneled back. No agent on your laptop, no inbound ports on the server.

→ Settings · Connections · SSH · prod-01
→ ssh forward: localhost ←→ remote
→ pty: 200x60 (xterm-256color)
  ready  · grok-4.3
you@prod-01> tail -f /var/log/app.log
  ok  · streaming

07 shellXagent — drive it from code

Every action,
also an API.

Everything you can click in shellX, another program can drive over HTTP — bearer-token gated, loopback-bound, origin-checked. Spawn a session, send a prompt, run a build, capture a screenshot, archive the workspace as a zip — without a human ever touching the window. The foundation for CI hooks and headless agent fleets.

shellxagent.orchestrate.curl

# spawn a session against the WSL transport
curl -X POST http://127.0.0.1:5757/connect \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"tabId":"ci-build","cwd":"/home/me/app","connectionId":"wsl-1"}'

# start a /build run and stream its receipts
curl -X POST .../build/start -d '{"objective":"ship it","tabId":"ci-build"}'
websocat ws://127.0.0.1:5757/events | jq '.kind'

# diagnose the live app preview
curl .../preview/work/state | jq '.status'

# archive the entire session workspace as a zip
curl -X POST .../tabs/ci-build/archive > bundle.zip

status: ok · 14 events · 312 ms
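
The first call in the transcript can also be assembled from a script. This minimal Python sketch assumes that /connect accepts a JSON body with the tabId, cwd and connectionId fields shown in the curl example. The helper name `build_connect_request` and the Content-Type header are assumptions, not part of the documented API. It only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5757"  # loopback address taken from the curl example


def build_connect_request(token: str, tab_id: str, cwd: str, connection_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /connect request.

    Hypothetical helper: the field names mirror the curl example, and the
    Content-Type header is an assumption about what the server expects.
    """
    body = json.dumps({"tabId": tab_id, "cwd": cwd, "connectionId": connection_id})
    return urllib.request.Request(
        f"{BASE_URL}/connect",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_connect_request("example-token", "ci-build", "/home/me/app", "wsl-1")
print(req.get_method(), req.full_url)
```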

surface · 110+ endpoints

Every UI action, scriptable.

/connect · /prompt · /build/* · /preview/work/* · /browser/* · /autonomy · /screenshot · /state/{header,sessions,subagents} · /permissions/:reqId/respond · /diagnostics · /tabs/:id/archive · vault · plugins · sessions/history. If a human can click it, an agent can drive it from outside.

WS /events (WebSocket)

Real-time event stream.

Subscribe with a single WebSocket. Every agent frame, every tool call, every build receipt, every permission request, every prompt completion lands typed and tagged with tabId.
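
Once frames arrive, an orchestrator typically sorts them by tab. The sketch below assumes each frame is a JSON object with the `kind` field used in the curl example and the `tabId` tag mentioned above. The kind values and the extra fields in the sample frames are hypothetical, and the WebSocket connection itself is left out so the example stays self-contained.

```python
import json
from collections import defaultdict
from typing import Iterable


def events_by_tab(frames: Iterable[str], kind: str) -> dict[str, list[dict]]:
    """Group events of one kind by their tabId."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for frame in frames:
        event = json.loads(frame)
        if event.get("kind") == kind:
            grouped[event.get("tabId", "")].append(event)
    return dict(grouped)


# Hypothetical frames; real payloads may carry more or different fields.
sample = [
    '{"kind": "build_receipt", "tabId": "ci-build", "step": "plan"}',
    '{"kind": "tool_call", "tabId": "ci-build", "tool": "fs_read"}',
    '{"kind": "build_receipt", "tabId": "docs", "step": "checkpoint"}',
]
grouped = events_by_tab(sample, "build_receipt")
print(sorted(grouped))
```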

⏎ async permission gate

Approve from outside the window.

When the agent asks for permission and autonomy is "Confirm", the request lands on the event stream with a reqId. POST a decision to /permissions/:reqId/respond — your orchestrator is the user.
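
An outside orchestrator could answer those requests with a simple policy. In this sketch only the reqId field and the /permissions/:reqId/respond path come from the text above. The `tool` field on the event, the `{"decision": ...}` body shape, the Content-Type header and the read-only policy are all assumptions for illustration. The request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:5757"  # loopback address used in the curl example
ALLOWED_TOOLS = {"fs_read", "fs_grep"}  # hypothetical policy: read-only tools only


def decide(event: dict) -> str:
    """Return "approve" or "deny" for a permission request event."""
    return "approve" if event.get("tool") in ALLOWED_TOOLS else "deny"


def build_response(token: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) the POST to /permissions/:reqId/respond."""
    body = json.dumps({"decision": decide(event)})  # assumed body shape
    req_id = quote(str(event["reqId"]), safe="")  # keep the id path-safe
    return urllib.request.Request(
        f"{BASE_URL}/permissions/{req_id}/respond",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


# Hypothetical event: every field except reqId and tabId is invented for the example.
event = {"kind": "permission_request", "reqId": "r-17", "tabId": "ci-build", "tool": "fs_write"}
req = build_response("example-token", event)
print(req.full_url, req.data.decode())
```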

⤓ /tabs/:tabId/archive

Reproducible session bundles.

One POST captures the working tree, the session scratch dir, every emitted event, and the active plan as a single zip. Drop the bundle in CI, replay deterministically.

⚿ bearer + origin gate

Loopback by default.

shellXagent binds to 127.0.0.1, requires a per-install bearer token minted from the OS random source, and enforces an origin allowlist server-side. Your machine, no inbound ports.
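
For readers curious what a bearer-plus-origin gate involves, here is a minimal sketch of the general technique in Python. It is not shellXagent's code: the function names, the example origin and the choice to let requests without an Origin header through (as a curl call would be) are all assumptions.

```python
import hmac
import secrets


def mint_token() -> str:
    """Mint a random bearer token from the OS random source."""
    return secrets.token_urlsafe(32)


def is_authorized(headers: dict[str, str], token: str, allowed_origins: set[str]) -> bool:
    """Check the bearer token and, when present, the Origin header."""
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {token}"
    # compare_digest runs in constant time, so it does not leak how much matched
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return False
    origin = headers.get("Origin")
    # Assumption: non-browser clients send no Origin header and are allowed
    return origin is None or origin in allowed_origins


token = mint_token()
origins = {"tauri://localhost"}  # hypothetical allowed origin
print(is_authorized({"Authorization": f"Bearer {token}"}, token, origins))
```

The token check stops other local processes that lack the token, while the origin check stops a web page in a browser from reaching the loopback port.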

≡ POST /diagnostics

One call, every check.

Self-test the running install — files, tools, screenshot, vault, sessions, connections, settings, auth, and preview setup. Returns a structured pass/fail report your CI can gate on.
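
A CI job could turn that report into an exit code. The report's real schema is not described here, so this sketch assumes a hypothetical shape: a `checks` list whose entries carry a `name` and a `status` of "pass" or "fail". Adjust the field names to whatever docs/API.md specifies.

```python
def ci_exit_code(report: dict) -> int:
    """Return 0 when every check passed, 1 otherwise (including an empty report)."""
    checks = report.get("checks", [])
    failed = [c["name"] for c in checks if c.get("status") != "pass"]
    for name in failed:
        print(f"diagnostic failed: {name}")
    return 0 if checks and not failed else 1


# Hypothetical report shape, for illustration only
report = {
    "checks": [
        {"name": "files", "status": "pass"},
        {"name": "vault", "status": "fail"},
        {"name": "preview", "status": "pass"},
    ]
}
code = ci_exit_code(report)
print(code)
```

Treating an empty report as a failure is a deliberate choice in this sketch, so a misconfigured call cannot pass the gate silently.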

full surface @ docs/API.md

08 What shellX is not

The discipline
of refusal.

Every design choice has a corresponding rejection. These are ours.

/ 01 Another web wrapper
Not a re-skinned chat tab in an Electron shell. Native Tauri 2: real windows, real file access, real process control. It behaves like an app because it is one.

/ 02 Chat-only client
No emoji-padded text-only loop. The agent has hands (files, processes, vision, voice, image and video gen) and it uses them.

/ 03 Slop
No mock terminals. No gradient glassmorphism that costs 16 MB of GPU. No animated mascot. No upsell modals. No telemetry beacons.

/ 04 A walled garden
Open source. Bring your own agents and accounts (Grok, Claude Code, Codex, Antigravity), your own API keys, your own SSH keys. shellX is the workspace; the agents and the clouds are yours.

/ 05 For terminal people only
You never have to open a shell. Preview, voice, vision, image and video gen, autonomous builds: all point-and-click. The raw terminal and scriptable API are there underneath for when you want them.

09 Technical specifications

Small surface.
Heavy lift.

Windows: available now (signed installer · WebView2 · 64-bit)
macOS: available (signed · notarized · Apple silicon · .app bundle)
Linux: available (deb · rpm · AppImage on each release · less deep-tested than Windows)
Framework: Tauri 2 (Rust core · React + TS · no Electron)
Installer: ~14 MB (NSIS · signed · auto-updating)
Footprint: < 90 MB (RAM idle · single window · single process tree)

Ready

Install once.
Build anything.

Free. Open source. No account required to install. Bring an agent — sign in to Grok, Claude Code, Codex, or Antigravity to start talking; new Grok accounts receive free credits.

Download for Windows → Download for macOS →

signed · auto-updating · Windows 11 · macOS (Apple silicon) · v0.3.4 public beta

Linux: .deb · .rpm · .AppImage · view release
