Documentation
Everything you need to automate iOS and Android devices with MobAI. HTTP API reference, DSL scripting, web automation, and performance profiling.
Overview
MobAI exposes a local HTTP API on localhost:8686 that lets you control connected Android and iOS devices. Use it directly, through the MCP server, or via DSL batch scripts.
RESTful endpoints for device control. Tap, swipe, type, screenshot, and more.
JSON-based automation scripts with predicates, retries, conditionals, and assertions.
Control Safari, Chrome, and WebViews with CSS selectors and JavaScript execution.
Quick Start
Start the MobAI app, connect a device, then hit the API. The bridge auto-starts on first request.
# List connected devices
GET http://127.0.0.1:8686/api/v1/devices
# Take a screenshot
GET http://127.0.0.1:8686/api/v1/devices/{id}/screenshot
# Tap at coordinates
POST http://127.0.0.1:8686/api/v1/devices/{id}/tap
{"x": 150, "y": 300}
# Run a DSL script
POST http://127.0.0.1:8686/api/v1/devices/{id}/dsl/execute
{"version": "0.2", "steps": [{"action": "open_app", "bundle_id": "com.apple.Preferences"}]}
Base URL: http://127.0.0.1:8686/api/v1
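The requests above can be issued from any HTTP client. Below is a minimal Python sketch using only the standard library. The helper names `build_request` and `call` are hypothetical and not part of MobAI, and the sketch assumes every endpoint replies with JSON.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8686/api/v1"  # base URL from the Quick Start


def build_request(method, path, body=None):
    """Build (but do not send) a request for a MobAI endpoint.

    build_request is a hypothetical helper name, not part of MobAI.
    """
    data = None
    headers = {}
    if body is not None:
        # Request bodies in this reference are JSON objects.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, body=None, timeout=30):
    """Send the request and decode the reply (assumes a JSON response)."""
    request = build_request(method, path, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the MobAI app running, `call("GET", "/devices")` would list devices and `call("POST", "/devices/<id>/tap", {"x": 150, "y": 300})` would send a tap, where `<id>` is a device id taken from the device list.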
HTTP API Reference
REST Endpoints
Devices
/devices
List all connected devices
Returns an array of connected Android and iOS devices with their status.
Response
[{
"id": "00008101-001A0C3E1234",
"name": "iPhone 15 Pro",
"platform": "ios",
"model": "iPhone15,3",
"osVersion": "18.2",
"bridgeRunning": true,
"virtual": false
}]
/devices/{id}
Get device info
Returns information about a specific device. The id is the serial number (Android) or UDID (iOS).
Screenshots & UI Tree
/devices/{id}/screenshot
Take screenshot
Captures a screenshot. By default, the image is saved to a file and the response contains its path. Use ?format=base64 to receive the image data inline instead.
Response
{"path": "/tmp/mobai/screenshots/device-1234.png", "format": "png"}
/devices/{id}/ui-tree
Get UI accessibility tree
Returns the current UI accessibility tree with element indices, types, text, and bounds.
| Parameter | Type | Description |
|---|---|---|
| onlyVisible | bool | Filter to visible elements only (default: true) |
| includeKeyboard | bool | Include keyboard elements (default: false) |
| verbose | bool | Include elements array with bounds (default: false) |
Response
{
"tree": "[0] Button \"Settings\" (10,100 200x50)\n[1] Label \"Wi-Fi\" ...",
"activity": "com.apple.Preferences"
}
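The `tree` field is a plain string, so a client that wants structured elements has to parse it. The sketch below assumes each line follows the shape of the first sample line, `[index] Type "text" (x,y WIDTHxHEIGHT)`, and skips any line that does not match. That line format is inferred from the sample response only, and `parse_tree` and `center` are hypothetical helper names.

```python
import re

# Pattern inferred from the sample line: [0] Button "Settings" (10,100 200x50)
LINE_RE = re.compile(
    r'^\[(\d+)\]\s+(\w+)\s+"(.*)"\s+\((\d+),(\d+)\s+(\d+)x(\d+)\)\s*$'
)


def parse_tree(tree):
    """Turn the tree string into a list of element dicts.

    Lines that do not match the assumed format are skipped.
    """
    elements = []
    for line in tree.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        index, kind, text, x, y, width, height = match.groups()
        elements.append({
            "index": int(index),
            "type": kind,
            "text": text,
            "x": int(x),
            "y": int(y),
            "width": int(width),
            "height": int(height),
        })
    return elements


def center(element):
    """Center point of an element, assuming (x, y) is its top-left corner."""
    return (
        element["x"] + element["width"] // 2,
        element["y"] + element["height"] // 2,
    )
```

If the line format is unreliable for your app, requesting `verbose` output and reading the elements array may be a sturdier option than parsing text.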
/devices/{id}/ocr
OCR text recognition (iOS only)
Uses Apple Vision framework to detect text on screen. Useful for system dialogs that don't expose accessibility elements.
{
"elements": [{"text": "Sign In", "confidence": 0.98, "x": 150, "y": 400, "width": 100, "height": 30}],
"imageSize": {"width": 1170, "height": 2532},
"screenSize": {"width": 390, "height": 844},
"scale": 3.0
}
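A common follow-up is to tap the text that OCR found. The sketch below looks up an element by text and computes its center. It assumes `x` and `y` mark the top-left corner of the box. The reference does not say whether OCR coordinates are in screen points or image pixels, so the `scale` argument is left to the caller: pass the response's `scale` only if you have confirmed the values are in pixels. `find_text` and `ocr_center` are hypothetical helper names.

```python
def find_text(ocr, needle):
    """Return the first OCR element whose text contains needle (case-insensitive)."""
    needle = needle.lower()
    for element in ocr["elements"]:
        if needle in element["text"].lower():
            return element
    return None


def ocr_center(element, scale=1.0):
    """Center of an OCR box as a tap body.

    Assumes (x, y) is the top-left corner. Dividing by scale converts
    pixels to points; the default of 1.0 leaves coordinates unchanged.
    """
    center_x = element["x"] + element["width"] / 2
    center_y = element["y"] + element["height"] / 2
    return {"x": round(center_x / scale), "y": round(center_y / scale)}
```

The returned dictionary has the same shape as the coordinate form of the /tap request body.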
Actions
/devices/{id}/tap
Tap on screen
Tap at coordinates or on an element by index. Provide either index or x+y.
// By coordinates
{"x": 150, "y": 300}
// By UI tree element index
{"index": 5}
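Because a tap takes either an index or a coordinate pair, it can help to validate the choice on the client before sending. A minimal sketch follows. `tap_body` is a hypothetical helper, and rejecting a request that mixes both forms is a client-side choice here, not documented server behavior.

```python
def tap_body(index=None, x=None, y=None):
    """Build a /tap request body from either an element index or x and y."""
    if (x is None) != (y is None):
        raise ValueError("x and y must be given together")
    has_index = index is not None
    has_coords = x is not None
    if has_index == has_coords:
        # Either nothing was given or both forms were mixed.
        raise ValueError("provide either index or x and y, not both")
    if has_index:
        return {"index": index}
    return {"x": x, "y": y}
```

The same body shape applies to /double-tap, which the reference describes as sharing the /tap request format.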
/devices/{id}/double-tap
Double tap
Double tap at coordinates or on an element by index. Same request format as /tap.
{"index": 5}
// or
{"x": 150, "y": 300}
/devices/{id}/long-press
Long press
Long press at coordinates or on an element by index. Uses a fixed 0.5s hold duration.
{"index": 5}
/devices/{id}/two-finger-tap
Two-finger tap (iOS)
Two-finger tap at coordinates or element index. iOS only.
{"x": 150, "y": 300}
/devices/{id}/swipe
Swipe gesture
{"fromX": 200, "fromY": 500, "toX": 200, "toY": 100, "duration": 300}
/devices/{id}/drag
Drag gesture
Drag from one point to another. Optional duration in ms (default: 500). Optional pressDuration in ms to hold before dragging (for moving icons).
{"fromX": 100, "fromY": 200, "toX": 300, "toY": 400, "duration": 500, "pressDuration": 0}
/devices/{id}/type
Type text
Types text into the currently focused input field.
{"text": "Hello World"}
/devices/{id}/dismiss-keyboard
Dismiss keyboard
/devices/{id}/go-home
Go to home screen
Navigates to the device home screen. No request body required.
/devices/{id}/open-url
Open URL in browser
{"url": "https://example.com"}
/devices/{id}/location
Set simulated GPS location
{"lat": 40.7128, "lon": -74.0060}
Note: On Android physical devices, location simulation requires Android 12+. Emulators and iOS devices have no version restriction.
/devices/{id}/location
Reset to real GPS location
Apps
/devices/{id}/apps
List installed apps
[{"bundleId": "com.apple.Preferences", "name": "Settings"}]
/devices/{id}/launch-app
Launch application
{"bundleId": "com.apple.mobilesafari"}
// With debugger (iOS only, enables JIT for Flutter)
{"bundleId": "com.example.app", "withDebugger": true}
/devices/{id}/kill-app
Force-kill application
{"bundleId": "com.apple.mobilesafari"}
/devices/{id}/install-app
Install APK/IPA
Installs an app from a local file path. For iOS, you can optionally re-sign the app with an Apple ID.
{"path": "/path/to/app.apk"}
// iOS with re-signing
{"path": "/path/to/app.ipa", "resign": true, "appleId": "[email protected]"}
/devices/{id}/apps/{bundleId}
Uninstall application
Bridge Management
The bridge is the on-device automation agent (accessibility service on Android, WebDriverAgent on iOS). It auto-starts on first API call, but you can manage it explicitly.
/devices/{id}/bridge/start
Start device bridge
/devices/{id}/bridge/stop
Stop device bridge
AI Agent
/devices/{id}/agent/run
Run AI agent task
Runs an AI agent that autonomously executes a task on the device. Synchronous — returns when the agent completes. Requires bridge running and LLM provider configured in app settings.
{
"task": "Open Settings and enable Wi-Fi",
"agentType": "toolagent", // optional: toolagent | hierarchical | classic
"useVision": true // optional: enable screenshot analysis
}
Response
{"success": true, "result": "Wi-Fi has been enabled successfully"}
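The `//` comments in the request example are annotations; strict JSON parsers reject comments, so it is safest to send a body without them. The sketch below builds such a body and checks `agentType` against the three values listed above. `agent_body` is a hypothetical helper name.

```python
import json

# Agent types listed in the request example above.
AGENT_TYPES = ("toolagent", "hierarchical", "classic")


def agent_body(task, agent_type=None, use_vision=None):
    """Build the request body for /agent/run, omitting unset optional fields."""
    if not task:
        raise ValueError("task must be a non-empty string")
    body = {"task": task}
    if agent_type is not None:
        if agent_type not in AGENT_TYPES:
            raise ValueError("unknown agentType: %s" % agent_type)
        body["agentType"] = agent_type
    if use_vision is not None:
        body["useVision"] = bool(use_vision)
    return body


def agent_json(task, agent_type=None, use_vision=None):
    """Serialize the body as comment-free JSON text."""
    return json.dumps(agent_body(task, agent_type, use_vision))
```

Since the call is synchronous, a generous client timeout is advisable; how long a task may run is not stated in this reference.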
Web Automation
Control browser content and WebViews. iOS uses WebInspector (physical devices only). Android uses Chrome DevTools Protocol.
/devices/{id}/web/pages
List browser tabs / WebViews
/devices/{id}/web/select
Select a web page
{"pageId": 0}
/devices/{id}/web/dom
Get DOM tree
/devices/{id}/web/navigate
Navigate to URL
{"url": "https://example.com"}
/devices/{id}/web/click
Click element by CSS selector
{"selector": "button.submit"}
/devices/{id}/web/type
Type into element
{"selector": "input#email", "text": "[email protected]"}
/devices/{id}/web/execute
Execute JavaScript
{"script": "return document.title"}
/devices/{id}/web
Disconnect web session
Port Forwarding
/devices/{id}/forward
Start port forwarding
{"devicePort": 8080, "hostPort": 0} // 0 = auto-assign
Response
{"id": "fwd-abc123", "hostPort": 49152, "devicePort": 8080}
/devices/{id}/forward
List port forwards
/devices/{id}/forward/{forwardId}
Stop port forwarding
Log Streaming
/devices/{id}/logs
Stream device logs (SSE)
Streams device logs in real-time using Server-Sent Events.
| Parameter | Type | Description |
|---|---|---|
| process | string | Filter by process name |
| level | string | Min level (iOS: Debug/Info/Error, Android: V/D/I/W/E/F) |
| tag | string | Filter by tag (Android only) |
| contains | string | Filter messages containing string |
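A sketch of the client side of this stream: one helper builds the request path, and another extracts `data:` payloads from Server-Sent Events lines. It assumes the filters in the table are passed as query-string parameters, which the reference does not state explicitly, and it ignores SSE fields other than `data`. `logs_path` and `iter_sse_data` are hypothetical names.

```python
from urllib.parse import urlencode


def logs_path(device_id, **filters):
    """Build the /logs path; filters set to None are left out.

    Assumes filters are sent as query-string parameters.
    """
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    path = "/devices/%s/logs" % device_id
    return path + "?" + query if query else path


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of text lines.

    Per the SSE format, a blank line ends an event and multiple data
    lines in one event are joined with newlines.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored.
```

To consume a live stream, decode each line of the HTTP response body to text and feed the lines to `iter_sse_data`. The format of each payload (plain text or JSON) is not specified here, so inspect a few events before parsing further.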
Performance Metrics
/devices/{id}/metrics/start
Start metrics collection
/devices/{id}/metrics/stop
Stop and get summary
/devices/{id}/metrics/stream
Stream metrics (SSE)
/devices/{id}/metrics/snapshot
Get current snapshot
/devices/{id}/metrics/summary
Get session summary
/devices/{id}/dsl/execute
Execute DSL script
Executes a DSL automation script on the device. See the DSL section below for the full script format.
Automation DSL
DSL Script Reference
Script Format
DSL scripts are JSON objects sent to POST /devices/{id}/dsl/execute. Each script contains a version, an array of steps, and an optional failure strategy.
{
"version": "0.2",
"steps": [
{"action": "open_app", "bundle_id": "com.apple.Preferences"},
{"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
{"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000}
],
"on_fail": {"strategy": "abort"}
}
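Scripts like the one above can also be assembled in code rather than written by hand. The following is a small sketch; `make_script` is a hypothetical helper, and the check that every step has an `action` key is a client-side sanity check, not a statement about server validation.

```python
import json


def make_script(steps, on_fail=None, version="0.2"):
    """Assemble a DSL script dict from a list of step dicts."""
    for position, step in enumerate(steps):
        if "action" not in step:
            raise ValueError("step %d has no action" % position)
    script = {"version": version, "steps": list(steps)}
    if on_fail is not None:
        # The script-level failure strategy is optional.
        script["on_fail"] = on_fail
    return script


def wifi_script():
    """Rebuild the example script shown above."""
    return make_script(
        [
            {"action": "open_app", "bundle_id": "com.apple.Preferences"},
            {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
            {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000},
        ],
        on_fail={"strategy": "abort"},
    )
```

`json.dumps(wifi_script())` produces a body that can be posted to /devices/{id}/dsl/execute.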
Native Actions
| Action | Description |
|---|---|
| open_app | Launch app by bundle ID |
| tap | Tap element by predicate or coordinates |
| long_press | Long press (0.5s hold) |
| type | Type text, optionally clear first |
| swipe | Swipe by direction or coordinates |
| scroll | Scroll to find an element |
| press_key | Press hardware/keyboard key |
| toggle | Toggle a switch on or off |
| navigate | Go home, back, or recent apps |
| observe | Get UI tree, screenshot, activity, or OCR |
| wait_for | Wait for element or UI stability |
| delay | Fixed time delay |
| if_exists | Conditional execution |
| screenshot | Save screenshot to file |
| kill_app | Force-kill a running app |
| set_location | Simulate GPS coordinates (Android physical: 12+ only) |
| reset_location | Reset to real GPS |
tap
// By predicate
{"action": "tap", "predicate": {"text_contains": "Settings"}}
// By coordinates
{"action": "tap", "coords": {"x": 100, "y": 200}}
type
{"action": "type", "text": "Hello World"}
// With clear and predicate
{"action": "type", "text": "Hello", "predicate": {"type": "input"}, "clear_first": true}
swipe
// By direction
{"action": "swipe", "direction": "up", "distance": "medium"}
// By coordinates
{"action": "swipe", "from_coords": {"x": 200, "y": 400}, "to_coords": {"x": 200, "y": 100}}
scroll
{"action": "scroll", "direction": "down", "to_element": {"predicate": {"text": "Privacy"}}, "max_scrolls": 10}
wait_for
// Wait for element
{"action": "wait_for", "predicate": {"text": "Welcome"}, "timeout_ms": 5000}
// Wait for UI stability (animations to finish)
{"action": "wait_for", "stable": true, "timeout_ms": 5000}
if_exists
{
"action": "if_exists",
"predicate": {"text": "Allow"},
"then": [{"action": "tap", "predicate": {"text": "Allow"}}],
"else": []
}
toggle
{"action": "toggle", "predicate": {"type": "switch", "text_contains": "Wi-Fi"}, "state": "on"}
observe
{"action": "observe", "context": "native", "include": ["ui_tree", "screenshot"]}
// OCR for system dialogs (iOS only)
{"action": "observe", "include": ["ocr"]}
press_key
Android: enter, tab, delete, escape, volume_up, volume_down, home, back, recent_apps, mute, power
iOS: enter, tab, delete, home, volume_up, volume_down
{"action": "press_key", "key": "enter"}
Web Actions
Web actions use "context": "web" and CSS selectors. Select a web context first.
// Select browser tab
{"action": "select_web_context", "url_contains": "google.com"}
// Click DOM element
{"action": "tap", "context": "web", "predicate": {"css_selector": "button.submit"}}
// Type into form field
{"action": "type", "context": "web", "predicate": {"css_selector": "input#email"}, "text": "[email protected]"}
// Navigate to URL
{"action": "navigate", "url": "https://example.com"}
// Execute JavaScript
{"action": "execute_js", "script": "return document.title"}
// Wait for DOM element
{"action": "wait_for", "context": "web", "predicate": {"css_selector": "div.loaded"}, "timeout_ms": 5000}
Requirements
iOS: Physical device only (WebInspector uses usbmuxd). Android: Physical device or emulator with Chrome remote debugging.
Assertions
// Assert element exists
{"action": "assert_exists", "predicate": {"text": "Success"}, "timeout_ms": 3000}
// Assert element doesn't exist
{"action": "assert_not_exists", "predicate": {"text": "Error"}}
// Assert element count
{"action": "assert_count", "predicate": {"type": "cell"}, "count": 5}
// Assert screen changed after action
{"action": "assert_screen_changed", "threshold_percent": 15}
Predicates
Predicates match UI elements for actions. Combine multiple fields for precise targeting.
Native Predicates
| Field | Description |
|---|---|
| text | Exact text match |
| text_contains | Contains substring (case-insensitive) |
| text_starts_with | Starts with prefix |
| text_regex | Regular expression pattern |
| type | Element type: button, input, switch, text, image, cell, scrollview |
| label | Accessibility label (exact match) |
| label_contains | Accessibility label (partial match) |
| enabled | Enabled state (bool) |
| visible | Visible state (bool) |
| selected | Selected state (bool) |
| near | Spatial relation to another element |
| bounds_hint | Screen region: top_half, bottom_half, left_half, right_half, center |
| parent_of | Find parent by child predicate |
| index | Nth match (0-based) |
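To make the combination rule concrete, here is an illustrative matcher for a few of the fields above. It is not MobAI's implementation. It assumes that all fields in a predicate must match (logical AND), that `text_regex` matches anywhere in the text, and that elements are dictionaries with `text`, `type`, and `enabled` keys. Only `text_contains` is documented as case-insensitive, so the other text fields are treated as case-sensitive.

```python
import re


def matches(element, predicate):
    """Return True if the element satisfies every field in the predicate.

    Illustrative only: covers a subset of fields and assumes AND semantics.
    """
    text = element.get("text", "")
    checks = {
        "text": lambda value: text == value,
        "text_contains": lambda value: value.lower() in text.lower(),
        "text_starts_with": lambda value: text.startswith(value),
        "text_regex": lambda value: re.search(value, text) is not None,
        "type": lambda value: element.get("type") == value,
        "enabled": lambda value: element.get("enabled") == value,
    }
    for field, expected in predicate.items():
        if field not in checks:
            raise NotImplementedError("field not covered by this sketch: " + field)
        if not checks[field](expected):
            return False
    return True
```

For example, `{"type": "switch", "text_contains": "wi-fi"}` would match a switch labeled "Wi-Fi" but not a button with the same text, which is why combining fields narrows the target.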
Web Predicates
| Selector | Description |
|---|---|
| #login-btn | Element with id="login-btn" |
| .btn-primary | Elements with class "btn-primary" |
| button.submit | Button with class "submit" |
| input[type='email'] | Input with type="email" |
| a[href*='login'] | Links containing "login" in href |
Failure Strategies
Each step can have an on_fail handler to control what happens when it fails.
| Strategy | Description |
|---|---|
| abort | Stop execution immediately (default) |
| skip | Skip this step and continue |
| retry | Retry with max_retries and retry_delay_ms |
| replan | Return for replanning |
| require_user | Request user intervention |
{
"action": "tap",
"predicate": {"text": "Continue"},
"on_fail": {
"strategy": "retry",
"max_retries": 3,
"retry_delay_ms": 1000,
"fallback_strategy": {"strategy": "skip"}
}
}
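The retry handler above can be read as the loop sketched below. This is an interpretation rather than MobAI's actual executor: it assumes `max_retries` counts attempts made after the first failure, that `retry_delay_ms` is waited between attempts, and that the fallback strategy applies once retries are exhausted. `run_with_on_fail` is a hypothetical name.

```python
import time


def run_with_on_fail(step_fn, on_fail, sleep=time.sleep):
    """Run step_fn under an on_fail policy and return the outcome name.

    step_fn returns True on success. The result is "ok" or the name of
    the strategy that ends up handling the failure.
    """
    strategy = on_fail.get("strategy", "abort")
    if strategy != "retry":
        return "ok" if step_fn() else strategy

    retries = on_fail.get("max_retries", 0)
    delay_seconds = on_fail.get("retry_delay_ms", 0) / 1000
    for attempt in range(retries + 1):  # first attempt plus the retries
        if step_fn():
            return "ok"
        if attempt < retries:
            sleep(delay_seconds)
    # Assumption: abort is used when no fallback strategy is given.
    fallback = on_fail.get("fallback_strategy", {"strategy": "abort"})
    return fallback["strategy"]
```

Under these assumptions, the example handler would try the tap up to four times in total, one second apart, and skip the step if all attempts fail.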
Performance Metrics
Collect CPU, memory, FPS, network, and battery metrics during automation runs.
| Metric Type | Description |
|---|---|
| system_cpu | System-wide CPU usage (%) |
| system_memory | System memory usage (bytes, %) |
| fps | Frame rate and jank detection |
| network | Network I/O (rx/tx bytes) |
| battery | Battery level, drain rate, temperature |
| process | Per-app CPU/memory (requires bundle_id) |
{
"version": "0.2",
"steps": [
{"action": "metrics_start", "types": ["system_cpu", "system_memory", "fps"], "label": "app_launch"},
{"action": "open_app", "bundle_id": "com.example.myapp"},
{"action": "wait_for", "predicate": {"text": "Home"}, "timeout_ms": 10000},
{"action": "delay", "duration_ms": 2000},
{"action": "metrics_stop"}
]
}
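When several scripts need profiling, the metrics steps can be wrapped around existing steps programmatically. A minimal sketch follows. `with_metrics` is a hypothetical helper, the type names come from the table above, and the `process` type is left out because this reference does not show where its `bundle_id` goes.

```python
# Metric types from the table above, excluding "process".
METRIC_TYPES = {"system_cpu", "system_memory", "fps", "network", "battery"}


def with_metrics(steps, types, label=None):
    """Return steps bracketed by metrics_start and metrics_stop."""
    unknown = [name for name in types if name not in METRIC_TYPES]
    if unknown:
        raise ValueError("unsupported metric types: " + ", ".join(unknown))
    start = {"action": "metrics_start", "types": list(types)}
    if label is not None:
        start["label"] = label
    return [start] + list(steps) + [{"action": "metrics_stop"}]
```

The returned list can be used as the `steps` array of a version 0.2 script.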
Complete Examples
Open Settings and Toggle Wi-Fi
{
"version": "0.2",
"steps": [
{"action": "open_app", "bundle_id": "com.apple.Preferences"},
{"action": "delay", "duration_ms": 1000},
{"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
{"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000},
{"action": "toggle", "predicate": {"type": "switch"}, "state": "off"}
],
"on_fail": {"strategy": "abort"}
}
Web: Google Search via Safari
{
"version": "0.2",
"steps": [
{"action": "open_app", "bundle_id": "com.apple.mobilesafari"},
{"action": "delay", "duration_ms": 1000},
{"action": "tap", "predicate": {"type": "XCUIElementTypeTextField"}},
{"action": "type", "text": "google.com\n"},
{"action": "delay", "duration_ms": 2000},
{"action": "select_web_context", "url_contains": "google"},
{"action": "wait_for", "context": "web", "predicate": {"css_selector": "textarea[name='q']"}, "timeout_ms": 5000},
{"action": "type", "context": "web", "predicate": {"css_selector": "textarea[name='q']"}, "text": "weather"},
{"action": "press_key", "context": "web", "key": "enter"}
],
"on_fail": {"strategy": "retry", "max_retries": 2}
}
Webhooks
Overview
MobAI can notify external services when device and bridge events occur. Configure webhook URLs from Integrations → Webhooks in the app title bar.
Each webhook is delivered as an HTTP POST with Content-Type: application/json. Failed deliveries are retried up to 4 times with exponential backoff (1s, 2s, 4s, 8s). Each request has a 10-second timeout.
You can create multiple webhooks, each subscribed to different events, and toggle them on or off individually.
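The retry schedule is a doubling sequence, which is useful to know when sizing timeouts on the receiving side. The sketch below reproduces the documented delays and estimates the longest a single delivery could take. It assumes the four retries come in addition to the initial attempt and that every attempt runs into the full 10-second timeout; both function names are hypothetical.

```python
def backoff_delays(retries=4, base_seconds=1):
    """Delay before each retry: the base doubled on every attempt."""
    return [base_seconds * 2 ** attempt for attempt in range(retries)]


def worst_case_seconds(retries=4, base_seconds=1, timeout_seconds=10):
    """Upper bound on delivery time if every attempt times out.

    Assumes one initial attempt plus `retries` retries.
    """
    attempts = 1 + retries
    waiting = sum(backoff_delays(retries, base_seconds))
    return attempts * timeout_seconds + waiting
```

With the documented values, `backoff_delays()` gives 1, 2, 4, and 8 seconds, and the worst case under these assumptions is 5 × 10 + 15 = 65 seconds.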
Events
| Event | Description |
|---|---|
| device:connected | An iOS or Android device was detected and connected |
| device:disconnected | A previously connected device was disconnected |
| bridge:started | The on-device bridge agent was started successfully |
| bridge:stopped | The on-device bridge agent was stopped or crashed |
Payload
Every webhook delivery sends a JSON body with the following structure:
{
"event": "device:connected",
"timestamp": "2026-02-15T10:30:00Z",
"data": {
"deviceId": "00008101-000A1C3E2100001E",
"platform": "ios",
"name": "iPhone 15 Pro"
}
}
| Field | Type | Description |
|---|---|---|
| event | string | One of the event types listed above |
| timestamp | string | ISO 8601 UTC timestamp |
| data.deviceId | string | Unique device identifier (UDID for iOS, serial for Android) |
| data.platform | string | ios or android |
| data.name | string | Human-readable device name |
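On the receiving side, a handler might decode and validate the payload before acting on it. The sketch below is one way to do that with the standard library. `parse_webhook` is a hypothetical helper, and rejecting unknown event names is a design choice for this sketch rather than a requirement.

```python
import json
from datetime import datetime, timezone

# Event names from the Events table.
EVENTS = {
    "device:connected",
    "device:disconnected",
    "bridge:started",
    "bridge:stopped",
}


def parse_webhook(raw):
    """Decode a webhook body (str or bytes) into a flat dict."""
    payload = json.loads(raw)
    event = payload["event"]
    if event not in EVENTS:
        raise ValueError("unexpected event: %s" % event)
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    timestamp = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    data = payload["data"]
    return {
        "event": event,
        "timestamp": timestamp,
        "device_id": data["deviceId"],
        "platform": data["platform"],
        "name": data["name"],
    }
```

The parsed timestamp is timezone-aware, so it can be compared directly with `datetime.now(timezone.utc)` to measure delivery lag.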
Custom Headers
Each webhook can include custom HTTP headers sent with every delivery. Use this for authentication tokens, API keys, or any other headers your endpoint requires.
Headers are stored securely in your operating system's keychain (macOS Keychain, Windows Credential Manager, or encrypted file on Linux) — they never leave your machine and are not included in settings exports.