Native Tools

29 built-in tools available to the agent out of the box. No extensions needed.

Overview

Native tools are built into the Omni runtime and available to the agent immediately. The LLM can call these tools during conversations to interact with the filesystem, make web requests, manage memory, send messages, and more.

Every native tool is permission-gated. The required capability is listed next to each tool. If the agent hasn't been granted the capability, a permission prompt appears in the UI.
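The gating rule can be pictured as a lookup from tool name to required capability. The following is a minimal sketch, not Omni's implementation: the table copies a few tool and capability pairs from this page, while `check_permission` and the `"allow"` / `"prompt"` results are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of capability gating; not Omni's actual code.
# Tool -> capability pairs are taken from this reference page.
REQUIRED_CAPABILITY = {
    "exec": "process.spawn",
    "read_file": "filesystem.read",
    "write_file": "filesystem.write",
    "web_fetch": "network.http",
}


def check_permission(tool: str, granted: set[str]) -> str:
    """Return "allow" if the capability is granted, else "prompt"."""
    capability = REQUIRED_CAPABILITY[tool]
    return "allow" if capability in granted else "prompt"


granted = {"filesystem.read"}
print(check_permission("read_file", granted))   # allow
print(check_permission("write_file", granted))  # prompt
```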

Total Tools: 29
System: 7 tools
Web: 3 tools
Dev Tools: 8 tools
Other: 11 tools

System Tools

exec (capability: process.spawn)

Execute shell commands on the host system. Returns stdout, stderr, and exit code.

Params

command (string), args (string[], optional), cwd (string, optional)

read_file (capability: filesystem.read)

Read the contents of a file as text. Supports any text-based file format.

Params

path (string)

write_file (capability: filesystem.write)

Write content to a file. Creates the file if it doesn't exist, overwrites if it does.

Params

path (string), content (string)

edit_file (capability: filesystem.write)

Edit a file by replacing a specific string with new content. Fails if the old string is not found.

Params

path (string), old_string (string), new_string (string)
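The replace-or-fail behavior of edit_file can be sketched on an in-memory string. This is an illustration only: `edit_text` is a hypothetical helper, and replacing just the first occurrence is an assumption, since this page does not say how multiple matches are handled.

```python
def edit_text(text: str, old_string: str, new_string: str) -> str:
    """Replace old_string with new_string, failing if it is absent."""
    if old_string not in text:
        raise ValueError("old_string not found")
    # Assumption: only the first occurrence is replaced.
    return text.replace(old_string, new_string, 1)


print(edit_text("port = 80", "80", "8080"))  # port = 8080
```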

list_files (capability: filesystem.read)

List files and directories at a given path. Returns names, types, and sizes.

Params

path (string), recursive (bool, optional)

apply_patch (capability: filesystem.write)

Apply a unified diff patch to one or more files. Supports standard patch format.

Params

patch (string)
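A unified diff suitable for the patch parameter can be produced with Python's standard difflib module. The file name and the `a/` and `b/` prefixes below are assumptions for illustration; this page does not state which path conventions apply_patch expects.

```python
import difflib

old = "hello\nworld\n"
new = "hello\nOmni\n"

# Build a unified diff between the two versions of a hypothetical file.
patch = "".join(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/greeting.txt",
        tofile="b/greeting.txt",
    )
)
print(patch)
```

The resulting string contains the `---` and `+++` file headers, one `@@` hunk header, and the removed and added lines.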

grep_search (capability: filesystem.read)

Search file contents using regex patterns. Returns matching lines with file paths and line numbers.

Params

pattern (string), path (string, optional), include (string, optional)
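To make the return shape concrete, here is a rough sketch of a regex search that yields file paths, line numbers, and matching lines. It is not the tool's implementation: treating include as a glob, and the `file` / `line` / `text` keys, are assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path


def grep_search(pattern: str, path: str = ".", include: str = "*") -> list[dict]:
    """Return matches as dicts with file, line number, and line text."""
    regex = re.compile(pattern)
    matches = []
    for file in sorted(Path(path).rglob(include)):
        if not file.is_file():
            continue
        try:
            lines = file.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                matches.append({"file": str(file), "line": number, "text": line})
    return matches


with tempfile.TemporaryDirectory() as root:
    Path(root, "main.py").write_text("import os\n# TODO: tidy up\n", encoding="utf-8")
    results = grep_search(r"TODO", root, include="*.py")
    print([(Path(m["file"]).name, m["line"]) for m in results])
```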

Web Tools

web_fetch (capability: network.http)

Fetch content from a URL via HTTP. Supports GET, POST, PUT, DELETE with custom headers and body.

Params

url (string), method (string, optional), headers (object, optional), body (string, optional)

web_search (capability: search.web)

Search the web and return results. Returns titles, URLs, and snippets.

Params

query (string), num_results (integer, optional)

web_scrape (capability: browser.scrape)

Scrape web content with 3 modes: extract (fast HTML parsing), browser (Puppeteer with anti-bot stealth), or crawl (BFS multi-page). Converts HTML to Markdown.

Params

url (string), mode (string), selector (string, optional), max_pages (integer, optional), max_depth (integer, optional), url_pattern (string, optional)

Memory Tools

memory_save (capability: storage.persistent)

Save text to the agent's memory store. Persists across sessions for long-term recall.

Params

key (string), content (string), tags (string[], optional)

memory_search (capability: storage.persistent)

Search saved memories by keyword or tag. Returns matching entries sorted by relevance.

Params

query (string), limit (integer, optional)

memory_get (capability: storage.persistent)

Retrieve a specific memory entry by its key.

Params

key (string)

Vision Tools

image_analyze (capability: ai.inference)

Analyze an image using the LLM's vision capabilities. Describe, extract text, or answer questions about the image.

Params

image_path (string), prompt (string, optional)

Messaging Tools

send_message (capability: messaging.chat)

Send a message through a connected channel. The channel instance and recipient are specified by the agent. Checks channel bindings before sending.

Params

channel_id (string), recipient (string), text (string), media_url (string, optional)

list_channels (capability: messaging.chat)

List all connected channel instances with their status and features.

Params

None

Notifications & Scheduling Tools

notify (capability: system.notifications)

Send a system notification to the user's desktop. Returns structured JSON for the UI to display.

Params

title (string), body (string)

cron_schedule (capability: system.scheduling)

Schedule a recurring task using a cron expression. The task is stored and executed at the specified intervals.

Params

name (string), cron_expression (string), action (string)

Sessions Tools

session_list (capability: storage.persistent)

List all chat sessions with their IDs, creation time, and metadata. Requires database access.

Params

limit (integer, optional)

session_history (capability: storage.persistent)

Retrieve the full message history for a specific session. Requires database access.

Params

session_id (string), limit (integer, optional)

Desktop Automation Tools

app_interact (capability: app.automation)

Launch and control desktop applications via Windows UI Automation APIs. Supports 11 actions: launch, list_windows, find_element, find_elements, click, type_text, read_text, get_tree, get_subtree, screenshot, and close. Security-hardened with LOLBIN blocklist, password field protection, rate limiting, and audit logging.

Params

action (string), executable (string, optional), window_title (string, optional), process_name (string, optional), element_name (string, optional), element_type (string, optional), automation_id (string, optional), element_ref (string, optional), text (string, optional), max_depth (integer, optional), max_results (integer, optional), timeout_ms (integer, optional), args (string[], optional)

Version Control Tools

git (capability: vcs.operations)

Version control operations returning structured JSON. 10 actions: status, diff, log, commit, branch, checkout, stash, merge, show_conflict, resolve. Includes automatic secret scanning before commits and conflict marker parsing.

Params

action (string), repo_path (string, optional), message (string, for commit), files (string[], for commit), branch (string), name (string), create (bool), delete (bool), list (bool), staged (bool), file (string), content (string), count (integer), since (string), author (string), pop (bool)

Testing Tools

test_runner (capability: process.spawn)

Run tests with automatic framework detection and structured output. 3 actions: run (execute tests and parse results), list (discover available tests), coverage (run with coverage enabled). Auto-detects: cargo test (Rust), jest/vitest/mocha (JS/TS), pytest (Python), go test (Go), dotnet test (.NET).

Params

action (string), framework (string, optional — auto-detected), file (string, optional), pattern (string, optional), coverage (bool, optional), working_dir (string, optional)

Clipboard Tools

clipboard (capability: clipboard.read)

Read from or write to the system clipboard. 2 actions: read (get current clipboard text) and write (set clipboard text). Maximum content size: 1 MB.

Params

action (string: read | write), content (string, required for write)

Code Intelligence Tools

code_search (capability: filesystem.read)

Offline code intelligence using syntax-aware regex analysis. 4 actions: index (build symbol index for a project), search (query symbols by name with type/language filters), symbols (list all symbols in a file), dependencies (show imports/uses for a file). Supports 9 languages: Rust, TypeScript, JavaScript, Python, Go, C, C++, Java, C#. Works without a language server.

Params

action (string), root_path (string), languages (string[], optional), query (string), type (string, optional), language (string, optional), limit (integer, optional), file (string)

lsp (capability: code.intelligence)

Language Server Protocol client for real-time code intelligence. 8 actions: start (launch a language server), stop, goto_definition, find_references, hover, diagnostics, symbols (document or workspace), rename_preview. Auto-detects servers: rust-analyzer, typescript-language-server, pyright, gopls.

Params

action (string), language (string), root_path (string), file (string), position ({ line, character }), query (string, for workspace symbols)

Agent Orchestration Tools

agent_spawn (capability: agent.spawn)

Spawn a sub-agent to handle a task in parallel. The sub-agent gets its own conversation context and tool access (except agent_spawn, to prevent recursion). Set wait=true to block until the sub-agent completes, or wait=false to get a task ID for later retrieval.

Params

task (string), context_files (string[], optional), model (string, optional), max_iterations (integer, optional — default 15), wait (bool, optional — default true)

Debugging Tools

debugger (capability: debug.session)

Debug Adapter Protocol (DAP) client for controlling debug sessions. 11 actions: launch (start debug session), attach (connect to running process by PID), set_breakpoints, continue, step_over, step_into, step_out, evaluate (evaluate expression in frame), variables (list variables in scope), stack_trace, disconnect.

Params

action (string), program (string), adapter (string, optional — auto-detected), file (string), breakpoints (array of { line }), expression (string), frame_id (integer), process_id (integer, for attach)

Interactive Execution Tools

repl (capability: process.spawn)

Persistent REPL sessions for interactive code execution. 4 actions: execute (run code in a session), list (show active sessions), reset (clear session state), close (terminate session). Supports Python and Node.js. Up to 3 concurrent sessions, 30-second execution timeout.

Params

action (string), language (string: python | javascript), code (string), session_id (string, optional — auto-generated)

Web Scrape Modes

The web_scrape tool supports three modes with increasing capability and resource usage.

extract

Fast HTML parsing using the scraper crate. No browser needed. Best for static pages with predictable HTML structure.

Limits: 500 KB per page, 2 MB download

browser

Full Puppeteer browser with stealth plugins. Handles JavaScript rendering, anti-bot protection, and dynamic content. Uses Mozilla Readability + Turndown for content extraction.

Limits: 500 KB per page; randomized viewport and delays

crawl

BFS multi-page crawl. Follows links matching a URL pattern up to a configurable depth. Combines content from all visited pages.

Limits: 100 pages max, depth 5, 5 MB total
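The crawl mode's breadth-first traversal can be sketched against an in-memory link graph. Everything here is illustrative: the `SITE` map stands in for real page fetches, and treating url_pattern as a regular expression is an assumption, since this page does not specify the pattern syntax.

```python
import re
from collections import deque

# Hypothetical in-memory site: URL -> outgoing links.
SITE = {
    "https://example.com/docs": ["https://example.com/docs/a", "https://example.com/blog"],
    "https://example.com/docs/a": ["https://example.com/docs/b"],
    "https://example.com/docs/b": [],
    "https://example.com/blog": [],
}


def crawl(start: str, url_pattern: str, max_pages: int = 100, max_depth: int = 5) -> list[str]:
    """Visit pages breadth-first, following only links that match url_pattern."""
    pattern = re.compile(url_pattern)
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in SITE.get(url, []):
            if len(visited) >= max_pages:
                return visited
            if link in seen or not pattern.search(link):
                continue
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited


print(crawl("https://example.com/docs", r"/docs/"))
```

With the sample graph, the blog link is skipped because it does not match the pattern, and the two nested docs pages are visited in breadth-first order.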

App Interact Actions

The app_interact tool supports 11 actions for full desktop application control. Windows only (uses native UI Automation APIs).

launch

Start a desktop application. Returns PID and window title.

Params

executable (required), args (optional)

Returns

{ pid, executable, window_title }

list_windows

List all visible top-level windows with title, process name, PID, and bounds.

Params

process_name (optional filter)

Returns

{ windows: [...], count }

find_element

Find a single UI element by name, type, or automation ID. Returns an opaque element_ref for use in subsequent actions.

Params

window_title, process_name, element_name, element_type, automation_id, timeout_ms (default 5000)

Returns

{ element_ref, name, control_type, automation_id, is_enabled, patterns }

find_elements

Find multiple matching elements. Returns up to max_results matches.

Params

Same as find_element + max_results (default 20, max 100)

Returns

{ elements: [...], count }

click

Click a UI element using semantic patterns (InvokePattern, TogglePattern, SelectionItemPattern). Never uses screen coordinates.

Params

element_ref (required)

Returns

{ status: "clicked" }

type_text

Type text into an input element. Uses ValuePattern with SendKeys fallback. Blocked on password fields.

Params

element_ref (required), text (required)

Returns

{ status: "typed" }

read_text

Read text from an element. Tries ValuePattern, TextPattern, then element name. Blocked on password fields.

Params

element_ref (required)

Returns

{ text: "..." }

get_tree

Get the UI element tree of a window. Includes truncation reporting when element cap (500) or depth limit is hit.

Params

window_title or process_name, max_depth (default 4, max 8)

Returns

{ root: { name, control_type, children: [...] }, total_elements, depth_reached, truncated }

get_subtree

Get a subtree starting from a specific element. Useful for exploring deeper when get_tree is truncated.

Params

element_ref (required), max_depth (default 4, max 8)

Returns

Same structure as get_tree

screenshot

Capture a window as PNG. Uses Windows GDI PrintWindow (works for occluded windows) with BitBlt fallback. Capped at 4K. Returns base64 image via multimodal pipeline.

Params

window_title or process_name

Returns

{ window_title, width, height, _image_data: [{ mime_type, data }] }

close

Close a window. Tries graceful close first, then force-kills by PID if that fails.

Params

window_title or process_name

Returns

{ status: "closed" | "force_closed" }
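The truncation fields returned by get_tree and get_subtree can be illustrated with a capped tree walk. This is a sketch under assumptions: the root is counted as depth 0, the walk is depth-first, and `walk_tree` is a hypothetical helper rather than the tool's code.

```python
def walk_tree(node: dict, max_depth: int = 4, max_elements: int = 500) -> dict:
    """Copy a UI tree, stopping at max_depth levels or max_elements nodes."""
    state = {"count": 0, "depth_reached": 0, "truncated": False}

    def visit(current: dict, depth: int) -> dict:
        state["count"] += 1
        state["depth_reached"] = max(state["depth_reached"], depth)
        out = {"name": current["name"], "control_type": current["control_type"], "children": []}
        for child in current.get("children", []):
            if depth + 1 > max_depth or state["count"] >= max_elements:
                state["truncated"] = True  # something was left out
                break
            out["children"].append(visit(child, depth + 1))
        return out

    root = visit(node, 0)
    return {"root": root, "total_elements": state["count"],
            "depth_reached": state["depth_reached"], "truncated": state["truncated"]}


window = {"name": "Editor", "control_type": "Window", "children": [
    {"name": "Toolbar", "control_type": "ToolBar", "children": [
        {"name": "Save", "control_type": "Button", "children": []}]}]}
result = walk_tree(window, max_depth=1)
print(result["total_elements"], result["depth_reached"], result["truncated"])  # 2 1 True
```

When truncated is true, a follow-up get_subtree call on the element of interest is the natural way to look deeper.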

App Interact Security

Desktop app automation is a high-risk capability. The app_interact tool enforces 12 layers of defense-in-depth to prevent misuse.

01 Permission Gating

The entire tool is gated by the app.automation capability. Requires explicit user approval before any action.

02 LOLBIN Blocklist

43 dangerous Windows executables (cmd.exe, powershell.exe, rundll32.exe, certutil.exe, mshta.exe, etc.) are permanently blocked from being launched. Case-insensitive, checked against filename regardless of path.

03 Executable Allowlist

The app.automation scope can restrict which applications are launchable via allowed_apps. Only apps on the list can be opened.

04 Password Field Hard-Block

The Windows backend checks the IsPassword property before any read or write. Password fields cannot be typed into or read from.

05 Sensitive Name Guard

Regex patterns detect element names containing password, secret, token, api_key, credit_card, cvv, ssn, pin_code, 2fa, otp, and similar. These elements are blocked for click, type_text, and read_text.

06 Rate Limiting

60-second sliding window per app, default 60 actions/minute. Configurable via scope. Prevents rapid-fire automation.

07 Max Concurrent Processes

Default 3 simultaneously running managed processes. Configurable via scope. Prevents resource exhaustion.

08 Tree Depth + Element Cap

UI tree walks are capped at depth 8 and 500 elements to prevent LLM context overflow. Truncation is reported with actionable suggestions.

09 Value Redaction

Password field values are automatically replaced with "[REDACTED]" in tree output. Sensitive data never enters the LLM context.

10 Semantic Actions Only

Interactions use UI Automation patterns (InvokePattern, ValuePattern), never raw screen coordinates or simulated mouse events. No way to bypass UI structure.

11 Guardian Scanning

All text scraped from desktop apps passes through the existing 4-layer Guardian pipeline at scan point SP-5, preventing prompt injection via app content.

12 Audit Events

Every action (launch, click, type_text, screenshot, etc.) emits an AppAutomationAction audit event with action type, target app, target element, and success/failure status.
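Two of the layers above, the LOLBIN blocklist and the sensitive name guard, lend themselves to a short sketch. The blocklist here is only the five names quoted on this page (the real list has 43 entries), and the regular expression is a hypothetical pattern assembled from the listed keywords, not the one Omni uses.

```python
import ntpath
import re

# Subset of the executables named on this page; the full list is not shown here.
BLOCKED_EXECUTABLES = {"cmd.exe", "powershell.exe", "rundll32.exe", "certutil.exe", "mshta.exe"}

# Hypothetical pattern built from the keywords listed in layer 05.
SENSITIVE_NAME = re.compile(
    r"password|secret|token|api_key|credit_card|cvv|ssn|pin_code|2fa|otp",
    re.IGNORECASE,
)


def is_blocked_executable(executable: str) -> bool:
    """Compare only the filename, case-insensitively, ignoring the directory."""
    return ntpath.basename(executable).lower() in BLOCKED_EXECUTABLES


def is_sensitive_element(element_name: str) -> bool:
    return SENSITIVE_NAME.search(element_name) is not None


print(is_blocked_executable(r"C:\Windows\System32\CMD.EXE"))  # True
print(is_blocked_executable("notepad.exe"))                   # False
print(is_sensitive_element("API_Key field"))                  # True
print(is_sensitive_element("Search box"))                     # False
```

Plain substring matching like this over-blocks (for example, "ssn" also matches inside "classname"), so a production pattern would likely need word boundaries or a more careful keyword list.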

App Automation Scope

The app.automation capability accepts a scope with 4 configurable fields to restrict what the tool can do.

Field           | Default                | Description
----------------|------------------------|------------
allowed_apps    | None (all non-blocked) | Whitelist of executable names or paths that can be launched. All others are rejected.
allowed_actions | None (all 11 actions)  | Whitelist of action names that can be used. All others are rejected.
rate_limit      | 60                     | Maximum actions per minute per app. Sliding 60-second window.
max_concurrent  | 3                      | Maximum simultaneously running managed processes.
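The rate_limit field describes a sliding 60-second window per app. A minimal sketch of that technique follows; the class name, the injectable clock, and the per-app deque are illustrative choices, not Omni's implementation.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most rate_limit actions per app within any window_s-second span."""

    def __init__(self, rate_limit: int = 60, window_s: float = 60.0, clock=time.monotonic):
        self.rate_limit = rate_limit
        self.window_s = window_s
        self.clock = clock
        self.events = defaultdict(deque)  # app name -> timestamps of recent actions

    def allow(self, app: str) -> bool:
        now = self.clock()
        recent = self.events[app]
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.rate_limit:
            return False
        recent.append(now)
        return True


# Use a fake clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(rate_limit=2, clock=lambda: fake_now[0])
print(limiter.allow("notepad.exe"), limiter.allow("notepad.exe"), limiter.allow("notepad.exe"))
fake_now[0] = 61.0
print(limiter.allow("notepad.exe"))
```

The first line prints `True True False` because the third action exceeds the limit of 2; after the clock advances past the window, the next action is allowed again.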

Element References

When you call find_element or find_elements, each result includes an opaque element_ref string. This reference is used in subsequent actions like click, type_text, read_text, and get_subtree.

Element references are re-resolved on each use by re-searching the window for the matching element. This means references remain valid even if the window is restructured between calls. If the element is no longer found, the tool returns a descriptive error.

Do not parse or construct element references manually. Always obtain them from find_element, find_elements, or get_tree results.
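Re-resolution can be pictured as storing the search criteria behind each opaque reference and searching again on every use. The sketch below is a conceptual model only: the `_refs` registry, the list-of-dicts window, and both function bodies are hypothetical and say nothing about how Omni encodes references internally.

```python
import uuid

_refs: dict[str, dict] = {}  # opaque ref -> the search criteria that produced it


def find_element(window: list[dict], **criteria) -> str:
    """Return an opaque ref for the first element matching all criteria."""
    if not any(all(el.get(k) == v for k, v in criteria.items()) for el in window):
        raise LookupError(f"no element matches {criteria}")
    ref = uuid.uuid4().hex
    _refs[ref] = criteria
    return ref


def resolve(window: list[dict], ref: str) -> dict:
    """Re-search the current window contents for the element behind ref."""
    criteria = _refs[ref]
    for el in window:
        if all(el.get(k) == v for k, v in criteria.items()):
            return el
    raise LookupError(f"element for {criteria} is no longer present")


window = [{"name": "OK", "control_type": "Button"}, {"name": "Cancel", "control_type": "Button"}]
ref = find_element(window, name="Cancel")
window.reverse()  # the window is restructured between calls
print(resolve(window, ref)["name"])  # Cancel
```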

MCP Client (Model Context Protocol)

Omni includes a built-in MCP client that can connect to external MCP servers and expose their tools to the agent. MCP tools are automatically namespaced as mcp_<server>_<tool> and appear alongside native tools in the agent loop. All MCP tool output is scanned by Guardian at SP-6.

Stdio Transport

Communicates with MCP servers over stdin/stdout using JSON-RPC 2.0. No HTTP server needed — fully local, no network surface.

Auto-Connect

MCP servers listed in [mcp.servers] config with auto_start=true are launched automatically on startup.

Tool Discovery

On connection, Omni sends tools/list to discover available tools and their JSON schemas. Tools are registered dynamically.

Namespacing

Each MCP tool is prefixed with the server name (e.g., filesystem server's read tool becomes mcp_filesystem_read) to prevent collisions.

Permission Gating

MCP tool execution requires the mcp.server capability. Scoped by server name and allowed tools list.

Guardian Scanning

All MCP tool responses are scanned at SP-6 before being returned to the LLM, preventing prompt injection via external tool output.

Lifecycle Management

McpManager supports add, remove, restart, list, and shutdown operations. Servers are killed on drop if unresponsive.
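Tool discovery and namespacing can be sketched in a few lines. The request below follows the JSON-RPC 2.0 envelope and the MCP tools/list method named above; the sample response contents are made up, and `tools_list_request`, `namespaced`, and `registry` are hypothetical names for illustration.

```python
import json


def tools_list_request(request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/list method."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def namespaced(server: str, tool: str) -> str:
    """Apply the mcp_<server>_<tool> naming scheme."""
    return f"mcp_{server}_{tool}"


# Example response shaped like an MCP tools/list result (contents are made up).
response = json.loads(
    '{"jsonrpc": "2.0", "id": 1, "result": {"tools": ['
    '{"name": "read", "description": "Read a file", "inputSchema": {"type": "object"}}]}}'
)

# Register each discovered tool under its namespaced name.
registry = {namespaced("filesystem", t["name"]): t["inputSchema"] for t in response["result"]["tools"]}
print(tools_list_request(1))
print(sorted(registry))  # ['mcp_filesystem_read']
```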

Git Tool Actions

The git tool provides 10 structured version control actions. Prefer this over exec git ... for parsed, JSON-structured output.

status
diff
log
commit
branch
checkout
stash
merge
show_conflict
resolve

Secret scanning: The commit action automatically scans staged content for API keys, tokens, passwords, and other secrets before committing. If secrets are detected, the commit is blocked with a detailed warning.

Conflict resolution: The show_conflict action parses conflict markers into structured JSON (ours/theirs/ancestor sections). The resolve action writes the final resolved content.
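Parsing conflict markers into ours/theirs/ancestor sections might look roughly like the following. This is a simplified sketch, not the tool's parser: it handles a single conflict block, and the ancestor section only appears when Git writes diff3-style markers (for example with merge.conflictStyle set to diff3).

```python
def parse_conflict(text: str) -> dict:
    """Split one diff3-style conflict block into ours/ancestor/theirs sections."""
    sections = {"ours": [], "ancestor": [], "theirs": []}
    current = None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            current = "ours"
        elif line.startswith("|||||||"):
            current = "ancestor"
        elif line.startswith("======="):
            current = "theirs"
        elif line.startswith(">>>>>>>"):
            current = None
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


conflicted = """<<<<<<< HEAD
timeout = 30
||||||| base
timeout = 10
=======
timeout = 60
>>>>>>> feature
"""
print(parse_conflict(conflicted))
```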

Debugger Actions

The debugger tool implements the Debug Adapter Protocol (DAP) for controlling debug sessions across languages. It auto-detects debug adapters for Rust (codelldb), Python (debugpy), Node.js (node-debug), and Go (dlv-dap).

launch

Start a debug session for a program

attach

Attach to a running process by PID

set_breakpoints

Set breakpoints in a source file

continue

Resume execution until next breakpoint

step_over

Step over to the next line

step_into

Step into a function call

step_out

Step out of the current function

evaluate

Evaluate an expression in the current frame

variables

List variables in the current scope

stack_trace

Get the current call stack

disconnect

End the debug session

LSP Tool Actions

The lsp tool manages Language Server Protocol connections and exposes real-time code intelligence. Auto-detects servers: rust-analyzer (Rust), typescript-language-server (TS/JS), pyright (Python), gopls (Go).

start

Launch a language server for a project

stop

Shut down a running language server

goto_definition

Jump to the definition of a symbol

find_references

Find all references to a symbol

hover

Get type info and docs for a position

diagnostics

Get compiler errors and warnings

symbols

List symbols in a file or workspace

rename_preview

Preview renames across files
