Documentation

Everything you need to automate iOS and Android devices with MobAI: an HTTP API reference, DSL scripting, web automation, and performance profiling.

Overview

MobAI exposes a local HTTP API on localhost:8686 that lets you control connected Android and iOS devices. Use it directly, through the MCP server, or via DSL batch scripts.

HTTP API

RESTful endpoints for device control. Tap, swipe, type, screenshot, and more.

DSL Scripts

JSON-based automation scripts with predicates, retries, conditionals, and assertions.

Web Automation

Control Safari, Chrome, and WebViews with CSS selectors and JavaScript execution.

Quick Start

Start the MobAI app, connect a device, then hit the API. The bridge auto-starts on first request.

# List connected devices
GET http://127.0.0.1:8686/api/v1/devices

# Take a screenshot
GET http://127.0.0.1:8686/api/v1/devices/{id}/screenshot

# Tap at coordinates
POST http://127.0.0.1:8686/api/v1/devices/{id}/tap
{"x": 150, "y": 300}

# Run a DSL script
POST http://127.0.0.1:8686/api/v1/devices/{id}/dsl/execute
{"version": "0.2", "steps": [{"action": "open_app", "bundle_id": "com.apple.Preferences"}]}

Base URL: http://127.0.0.1:8686/api/v1
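As a rough illustration of issuing the Quick Start calls from a script, the sketch below wraps the base URL in a small Python helper built on the standard library. The helper names (build_request, call) are hypothetical and not part of MobAI, the Content-Type header is an assumption about what the API expects, and DEVICE_ID is a placeholder for a real device id.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8686/api/v1"


def build_request(method, path, body=None):
    """Build (but do not send) a request for the local MobAI HTTP API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Assumption: JSON bodies are sent with a JSON content type.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, body=None, timeout=30):
    """Send the request and decode a JSON response (requires MobAI to be running)."""
    request = build_request(method, path, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw) if raw else None


# Usage, once the app is running and a device is connected:
#   devices = call("GET", "/devices")
#   call("POST", "/devices/DEVICE_ID/tap", {"x": 150, "y": 300})
```

Keeping request construction separate from sending makes it possible to inspect or log a request without a device attached.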

HTTP API Reference

REST Endpoints

Devices

GET /devices List all connected devices

Returns an array of connected Android and iOS devices with their status.

Response

[{
  "id": "00008101-001A0C3E1234",
  "name": "iPhone 15 Pro",
  "platform": "ios",
  "model": "iPhone15,3",
  "osVersion": "18.2",
  "bridgeRunning": true,
  "virtual": false
}]

GET /devices/{id} Get device info

Returns information about a specific device. The id is the serial number (Android) or UDID (iOS).

Screenshots & UI Tree

GET /devices/{id}/screenshot Take screenshot

Captures a screenshot. By default, the image is saved to a file and the response contains its path. Use ?format=base64 to receive the image data inline instead.

Response

{"path": "/tmp/mobai/screenshots/device-1234.png", "format": "png"}

GET /devices/{id}/ui-tree Get UI accessibility tree

Returns the current UI accessibility tree with element indices, types, text, and bounds.

Parameter	Type	Description
onlyVisible	bool	Filter to visible elements only (default: true)
includeKeyboard	bool	Include keyboard elements (default: false)
verbose	bool	Include elements array with bounds (default: false)

Response

{
  "tree": "[0] Button \"Settings\" (10,100 200x50)\n[1] Label \"Wi-Fi\" ...",
  "activity": "com.apple.Preferences"
}
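The tree field is a plain string, so a client may want to turn it into structured data before choosing an element index. The following is a minimal sketch that assumes every element line follows the pattern visible in the sample response, [index] Type "text" (x,y WxH), with the quoted text being optional. That line format is inferred from the sample only; lines that do not fit are skipped. The parse_tree name is hypothetical.

```python
import re

# Pattern inferred from the sample response: [0] Button "Settings" (10,100 200x50)
LINE_RE = re.compile(
    r'^\s*\[(?P<index>\d+)\]\s+(?P<type>\S+)'
    r'(?:\s+"(?P<text>.*)")?'
    r'\s+\((?P<x>-?\d+),(?P<y>-?\d+)\s+(?P<w>\d+)x(?P<h>\d+)\)\s*$'
)


def parse_tree(tree):
    """Parse the ui-tree string into a list of element dicts."""
    elements = []
    for line in tree.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like an element
        elements.append({
            "index": int(match["index"]),
            "type": match["type"],
            "text": match["text"],  # None when the line has no quoted text
            "bounds": (int(match["x"]), int(match["y"]),
                       int(match["w"]), int(match["h"])),
        })
    return elements
```

The verbose parameter is documented as returning an elements array with bounds, which is likely a more robust source than parsing text when it is available.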

GET /devices/{id}/ocr OCR text recognition (iOS only)

Uses Apple's Vision framework to detect text on screen. Useful for system dialogs that don't expose accessibility elements.

Response

{
  "elements": [{"text": "Sign In", "confidence": 0.98, "x": 150, "y": 400, "width": 100, "height": 30}],
  "imageSize": {"width": 1170, "height": 2532},
  "screenSize": {"width": 390, "height": 844},
  "scale": 3.0
}
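One possible use of the OCR result is to locate a piece of text and tap its center. The sketch below assumes that x and y are the top-left corner of the detected text and that they are expressed in the same coordinate space that /tap accepts. Neither point is stated in this reference, so verify both against a real device before relying on them. The function names are hypothetical.

```python
def find_text(ocr, needle, min_confidence=0.0):
    """Return the first OCR element whose text contains needle (case-insensitive)."""
    for element in ocr["elements"]:
        if element["confidence"] < min_confidence:
            continue
        if needle.lower() in element["text"].lower():
            return element
    return None


def center_of(element):
    """Center of an OCR element, rounded to whole units.

    Assumption: x/y are the top-left corner, in tap coordinates.
    """
    return {
        "x": round(element["x"] + element["width"] / 2),
        "y": round(element["y"] + element["height"] / 2),
    }
```

The dict returned by center_of has the same shape as the coordinate form of the /tap request body.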

Actions

POST /devices/{id}/tap Tap on screen

Tap at coordinates or on an element by index. Provide either index or x+y.

// By coordinates
{"x": 150, "y": 300}

// By UI tree element index
{"index": 5}
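Since the body takes either index or x and y, a client can reject ambiguous input before sending anything. This is a small sketch of such a check; the tap_payload helper is hypothetical, and how the server itself handles a body containing both forms is not documented here.

```python
def tap_payload(index=None, x=None, y=None):
    """Build a /tap request body from either an element index or x+y coordinates."""
    by_index = index is not None
    any_coord = x is not None or y is not None
    if by_index and any_coord:
        raise ValueError("provide either index or x+y, not both")
    if by_index:
        return {"index": index}
    if x is None or y is None:
        raise ValueError("provide either index or both x and y")
    return {"x": x, "y": y}
```

Because /double-tap uses the same request format, the same helper should apply there as well.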

POST /devices/{id}/double-tap Double tap

Double tap at coordinates or on an element by index. Same request format as /tap.

{"index": 5}
// or
{"x": 150, "y": 300}

POST /devices/{id}/long-press Long press

Long press at coordinates or on an element by index. Uses a fixed 0.5s hold duration.

{"index": 5}

POST /devices/{id}/two-finger-tap Two-finger tap (iOS)

Two-finger tap at coordinates or element index. iOS only.

{"x": 150, "y": 300}

POST /devices/{id}/swipe Swipe gesture

{"fromX": 200, "fromY": 500, "toX": 200, "toY": 100, "duration": 300}

POST /devices/{id}/drag Drag gesture

Drag from one point to another. duration is optional and given in ms (default: 500). pressDuration is also optional and sets how long, in ms, to hold before dragging (for example, to move icons).

{"fromX": 100, "fromY": 200, "toX": 300, "toY": 400, "duration": 500, "pressDuration": 0}

POST /devices/{id}/type Type text

Types text into the currently focused input field.

{"text": "Hello World"}

POST /devices/{id}/dismiss-keyboard Dismiss keyboard

POST /devices/{id}/go-home Go to home screen

Navigates to the device home screen. No request body required.

POST /devices/{id}/open-url Open URL in browser

{"url": "https://example.com"}

POST /devices/{id}/location Set simulated GPS location

{"lat": 40.7128, "lon": -74.0060}

Note: On Android physical devices, location simulation requires Android 12+. Emulators and iOS devices have no version restriction.

DELETE /devices/{id}/location Reset to real GPS location

Apps

GET /devices/{id}/apps List installed apps

[{"bundleId": "com.apple.Preferences", "name": "Settings"}]

POST /devices/{id}/launch-app Launch application

{"bundleId": "com.apple.mobilesafari"}

// With debugger (iOS only, enables JIT for Flutter)
{"bundleId": "com.example.app", "withDebugger": true}

POST /devices/{id}/kill-app Force-kill application

{"bundleId": "com.apple.mobilesafari"}

POST /devices/{id}/install-app Install APK/IPA

Installs an app from a local file path. For iOS, you can optionally re-sign the app with an Apple ID.

{"path": "/path/to/app.apk"}

// iOS with re-signing
{"path": "/path/to/app.ipa", "resign": true, "appleId": "user@example.com"}

DELETE /devices/{id}/apps/{bundleId} Uninstall application

Bridge Management

The bridge is the on-device automation agent (accessibility service on Android, WebDriverAgent on iOS). It auto-starts on first API call, but you can manage it explicitly.

POST /devices/{id}/bridge/start Start device bridge

POST /devices/{id}/bridge/stop Stop device bridge

AI Agent

POST /devices/{id}/agent/run Run AI agent task

Runs an AI agent that autonomously executes a task on the device. The call is synchronous and returns when the agent completes. Requires a running bridge and an LLM provider configured in the app settings.

{
  "task": "Open Settings and enable Wi-Fi",
  "agentType": "toolagent",    // optional: toolagent | hierarchical | classic
  "useVision": true             // optional: enable screenshot analysis
}

Response

{"success": true, "result": "Wi-Fi has been enabled successfully"}

Web Automation

Control browser content and WebViews. iOS uses WebInspector (physical devices only). Android uses Chrome DevTools Protocol.

GET /devices/{id}/web/pages List browser tabs / WebViews

POST /devices/{id}/web/select Select a web page

{"pageId": 0}

GET /devices/{id}/web/dom Get DOM tree

POST /devices/{id}/web/navigate Navigate to URL

{"url": "https://example.com"}

POST /devices/{id}/web/click Click element by CSS selector

{"selector": "button.submit"}

POST /devices/{id}/web/type Type into element

{"selector": "input#email", "text": "user@example.com"}

POST /devices/{id}/web/execute Execute JavaScript

{"script": "return document.title"}

DELETE /devices/{id}/web Disconnect web session

Port Forwarding

POST /devices/{id}/forward Start port forwarding

{"devicePort": 8080, "hostPort": 0}  // 0 = auto-assign

Response

{"id": "fwd-abc123", "hostPort": 49152, "devicePort": 8080}

GET /devices/{id}/forward List port forwards

DELETE /devices/{id}/forward/{forwardId} Stop port forwarding

Log Streaming

GET /devices/{id}/logs Stream device logs (SSE)

Streams device logs in real-time using Server-Sent Events.

Parameter	Type	Description
process	string	Filter by process name
level	string	Minimum level (iOS: Debug/Info/Error, Android: V/D/I/W/E/F)
tag	string	Filter by tag (Android only)
contains	string	Filter messages containing string
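To consume the stream from a script, a client needs to build the filtered URL and split the response into Server-Sent Events. The sketch below does both with the standard library. It follows the standard SSE framing (data: lines, a blank line ends an event) and treats each payload as an opaque string, because the shape of a log entry is not documented here. The function names are hypothetical.

```python
from urllib.parse import urlencode


def logs_url(base_url, device_id, **filters):
    """Build the /logs URL, dropping filters that are None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url}/devices/{device_id}/logs"
    return f"{url}?{query}" if query else url


def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # SSE strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.


# Usage against a running instance (not executed here):
#   import urllib.request
#   url = logs_url("http://127.0.0.1:8686/api/v1", "DEVICE_ID", level="E")
#   with urllib.request.urlopen(url) as response:
#       for payload in iter_sse_data(chunk.decode("utf-8") for chunk in response):
#           print(payload)
```

An event that is still incomplete when the stream ends is discarded, which matches how SSE clients normally behave.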

Performance Metrics

POST /devices/{id}/metrics/start Start metrics collection

POST /devices/{id}/metrics/stop Stop and get summary

GET /devices/{id}/metrics/stream Stream metrics (SSE)

GET /devices/{id}/metrics/snapshot Get current snapshot

GET /devices/{id}/metrics/summary Get session summary

DSL Execution

POST /devices/{id}/dsl/execute Execute DSL script

Executes a DSL automation script on the device. See the DSL section below for the full script format.

Automation DSL

DSL Script Reference

Script Format

DSL scripts are JSON objects sent to POST /devices/{id}/dsl/execute. Each script contains a version, an array of steps, and an optional failure strategy.

{
  "version": "0.2",
  "steps": [
    {"action": "open_app", "bundle_id": "com.apple.Preferences"},
    {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
    {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000}
  ],
  "on_fail": {"strategy": "abort"}
}
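When scripts are generated rather than written by hand, it can help to assemble them as Python data and serialize once. The sketch below rebuilds the script above that way. The make_script helper and its check that every step names an action are illustrative additions, not part of MobAI.

```python
import json


def make_script(steps, on_fail=None, version="0.2"):
    """Assemble a DSL script dict from a list of step dicts."""
    steps = list(steps)
    for position, step in enumerate(steps):
        if not isinstance(step.get("action"), str):
            raise ValueError(f"step {position} has no action")
    script = {"version": version, "steps": steps}
    if on_fail is not None:
        script["on_fail"] = on_fail
    return script


script = make_script(
    [
        {"action": "open_app", "bundle_id": "com.apple.Preferences"},
        {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
        {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000},
    ],
    on_fail={"strategy": "abort"},
)

# JSON text to send as the body of POST /devices/{id}/dsl/execute
body = json.dumps(script)
```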

Native Actions

Action	Description
open_app	Launch app by bundle ID
tap	Tap element by predicate or coordinates
long_press	Long press (0.5s hold)
type	Type text, optionally clear first
swipe	Swipe by direction or coordinates
scroll	Scroll to find an element
press_key	Press hardware/keyboard key
toggle	Toggle a switch on or off
navigate	Go home, back, or recent apps
observe	Get UI tree, screenshot, activity, or OCR
wait_for	Wait for element or UI stability
delay	Fixed time delay
if_exists	Conditional execution
screenshot	Save screenshot to file
kill_app	Force-kill a running app
set_location	Simulate GPS coordinates (physical Android devices require Android 12+)
reset_location	Reset to real GPS

tap

// By predicate
{"action": "tap", "predicate": {"text_contains": "Settings"}}

// By coordinates
{"action": "tap", "coords": {"x": 100, "y": 200}}

type

{"action": "type", "text": "Hello World"}

// With clear and predicate
{"action": "type", "text": "Hello", "predicate": {"type": "input"}, "clear_first": true}

swipe

// By direction
{"action": "swipe", "direction": "up", "distance": "medium"}

// By coordinates
{"action": "swipe", "from_coords": {"x": 200, "y": 400}, "to_coords": {"x": 200, "y": 100}}

scroll

{"action": "scroll", "direction": "down", "to_element": {"predicate": {"text": "Privacy"}}, "max_scrolls": 10}

wait_for

// Wait for element
{"action": "wait_for", "predicate": {"text": "Welcome"}, "timeout_ms": 5000}

// Wait for UI stability (animations to finish)
{"action": "wait_for", "stable": true, "timeout_ms": 5000}

if_exists

{
  "action": "if_exists",
  "predicate": {"text": "Allow"},
  "then": [{"action": "tap", "predicate": {"text": "Allow"}}],
  "else": []
}

toggle

{"action": "toggle", "predicate": {"type": "switch", "text_contains": "Wi-Fi"}, "state": "on"}

observe

{"action": "observe", "context": "native", "include": ["ui_tree", "screenshot"]}

// OCR for system dialogs (iOS only)
{"action": "observe", "include": ["ocr"]}

press_key

Android: enter, tab, delete, escape, volume_up, volume_down, home, back, recent_apps, mute, power
iOS: enter, tab, delete, home, volume_up, volume_down

{"action": "press_key", "key": "enter"}

Web Actions

Web actions use "context": "web" and CSS selectors. Select a web context first.

// Select browser tab
{"action": "select_web_context", "url_contains": "google.com"}

// Click DOM element
{"action": "tap", "context": "web", "predicate": {"css_selector": "button.submit"}}

// Type into form field
{"action": "type", "context": "web", "predicate": {"css_selector": "input#email"}, "text": "user@example.com"}

// Navigate to URL
{"action": "navigate", "url": "https://example.com"}

// Execute JavaScript
{"action": "execute_js", "script": "return document.title"}

// Wait for DOM element
{"action": "wait_for", "context": "web", "predicate": {"css_selector": "div.loaded"}, "timeout_ms": 5000}

Requirements

iOS: Physical device only (WebInspector uses usbmuxd). Android: Physical device or emulator with Chrome remote debugging.

Assertions

// Assert element exists
{"action": "assert_exists", "predicate": {"text": "Success"}, "timeout_ms": 3000}

// Assert element doesn't exist
{"action": "assert_not_exists", "predicate": {"text": "Error"}}

// Assert element count
{"action": "assert_count", "predicate": {"type": "cell"}, "count": 5}

// Assert screen changed after action
{"action": "assert_screen_changed", "threshold_percent": 15}

Predicates

Predicates match UI elements for actions. Combine multiple fields for precise targeting.

Native Predicates

Field	Description
text	Exact text match
text_contains	Contains substring (case-insensitive)
text_starts_with	Starts with prefix
text_regex	Regular expression pattern
type	Element type: button, input, switch, text, image, cell, scrollview
label	Accessibility label (exact match)
label_contains	Accessibility label (partial match)
enabled	Enabled state (bool)
visible	Visible state (bool)
selected	Selected state (bool)
near	Spatial relation to another element
bounds_hint	Screen region: top_half, bottom_half, left_half, right_half, center
parent_of	Find parent by child predicate
index	Nth match (0-based)
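To make the idea of combining fields concrete, here is a rough local model of predicate matching over a list of element dicts. It is an illustration only and not MobAI's matching engine. It assumes that all fields must match (logical AND), that text_starts_with is case-sensitive, and that text_regex is a search rather than a full match, none of which this reference confirms. It covers only the simple fields and leaves out near, bounds_hint, and parent_of.

```python
import re


def matches(element, predicate):
    """Return True if the element satisfies every field of the predicate."""
    text = element.get("text") or ""
    label = element.get("label") or ""
    checks = {
        "text": lambda v: text == v,
        "text_contains": lambda v: v.lower() in text.lower(),
        "text_starts_with": lambda v: text.startswith(v),
        "text_regex": lambda v: re.search(v, text) is not None,
        "type": lambda v: element.get("type") == v,
        "label": lambda v: label == v,
        "label_contains": lambda v: v in label,
        "enabled": lambda v: element.get("enabled") == v,
        "visible": lambda v: element.get("visible") == v,
        "selected": lambda v: element.get("selected") == v,
    }
    for field, expected in predicate.items():
        if field not in checks:
            raise ValueError(f"field not modelled in this sketch: {field}")
        if not checks[field](expected):
            return False
    return True


def select(elements, predicate):
    """Return matching elements; "index" picks the Nth match (0-based)."""
    fields = dict(predicate)
    nth = fields.pop("index", None)
    hits = [element for element in elements if matches(element, fields)]
    return hits if nth is None else hits[nth:nth + 1]
```

For example, under these assumptions {"type": "switch", "text_contains": "wi-fi"} selects only a switch whose text mentions Wi-Fi, while {"type": "switch", "index": 1} selects the second switch on screen.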

Web Predicates

Selector	Description
#login-btn	Element with id="login-btn"
.btn-primary	Elements with class "btn-primary"
button.submit	Button with class "submit"
input[type='email']	Input with type="email"
a[href*='login']	Links containing "login" in href

Failure Strategies

Each step can have an on_fail handler to control what happens when it fails.

Strategy	Description
abort	Stop execution immediately (default)
skip	Skip this step and continue
retry	Retry with max_retries and retry_delay_ms
replan	Return for replanning
require_user	Request user intervention

{
  "action": "tap",
  "predicate": {"text": "Continue"},
  "on_fail": {
    "strategy": "retry",
    "max_retries": 3,
    "retry_delay_ms": 1000,
    "fallback_strategy": {"strategy": "skip"}
  }
}
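The retry handler above can be read as a small control loop. The sketch below is a conceptual model of that loop, not MobAI's implementation. It assumes max_retries counts attempts after the first failure, that the delay is applied before each retry, and that a missing fallback_strategy behaves like abort. The function name and the step_fn callback are hypothetical.

```python
import time


def run_with_on_fail(step_fn, on_fail, sleep=time.sleep):
    """Run step_fn under an on_fail config.

    Returns "ok" on success, otherwise the name of the strategy that applies.
    """
    if step_fn():
        return "ok"
    strategy = on_fail.get("strategy", "abort")
    if strategy != "retry":
        return strategy
    for _ in range(on_fail.get("max_retries", 0)):
        sleep(on_fail.get("retry_delay_ms", 0) / 1000)
        if step_fn():
            return "ok"
    # Retries exhausted: hand over to the fallback (assumed default: abort).
    return on_fail.get("fallback_strategy", {"strategy": "abort"})["strategy"]
```

With the configuration shown above, a step that never succeeds would be attempted four times in this model (one initial attempt plus three retries, one second apart) and then skipped.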

Performance Metrics

Collect CPU, memory, FPS, network, and battery metrics during automation runs.

Metric Type	Description
system_cpu	System-wide CPU usage (%)
system_memory	System memory usage (bytes, %)
fps	Frame rate and jank detection
network	Network I/O (rx/tx bytes)
battery	Battery level, drain rate, temperature
process	Per-app CPU/memory (requires bundle_id)

{
  "version": "0.2",
  "steps": [
    {"action": "metrics_start", "types": ["system_cpu", "system_memory", "fps"], "label": "app_launch"},
    {"action": "open_app", "bundle_id": "com.example.myapp"},
    {"action": "wait_for", "predicate": {"text": "Home"}, "timeout_ms": 10000},
    {"action": "delay", "duration_ms": 2000},
    {"action": "metrics_stop"}
  ]
}
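If several scenarios need the same measurement window, the start and stop steps can be added programmatically. The with_metrics helper below is a hypothetical convenience that wraps any list of steps in metrics_start and metrics_stop, using only the fields shown in the script above.

```python
def with_metrics(steps, types, label=None):
    """Wrap DSL steps in a metrics_start / metrics_stop pair."""
    start = {"action": "metrics_start", "types": list(types)}
    if label is not None:
        start["label"] = label
    return [start, *steps, {"action": "metrics_stop"}]


script = {
    "version": "0.2",
    "steps": with_metrics(
        [
            {"action": "open_app", "bundle_id": "com.example.myapp"},
            {"action": "wait_for", "predicate": {"text": "Home"}, "timeout_ms": 10000},
        ],
        types=["system_cpu", "system_memory", "fps"],
        label="app_launch",
    ),
}
```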

Complete Examples

Open Settings and Toggle Wi-Fi

{
  "version": "0.2",
  "steps": [
    {"action": "open_app", "bundle_id": "com.apple.Preferences"},
    {"action": "delay", "duration_ms": 1000},
    {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
    {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000},
    {"action": "toggle", "predicate": {"type": "switch"}, "state": "off"}
  ],
  "on_fail": {"strategy": "abort"}
}

Web: Google Search via Safari

{
  "version": "0.2",
  "steps": [
    {"action": "open_app", "bundle_id": "com.apple.mobilesafari"},
    {"action": "delay", "duration_ms": 1000},
    {"action": "tap", "predicate": {"type": "XCUIElementTypeTextField"}},
    {"action": "type", "text": "google.com\n"},
    {"action": "delay", "duration_ms": 2000},
    {"action": "select_web_context", "url_contains": "google"},
    {"action": "wait_for", "context": "web", "predicate": {"css_selector": "textarea[name='q']"}, "timeout_ms": 5000},
    {"action": "type", "context": "web", "predicate": {"css_selector": "textarea[name='q']"}, "text": "weather"},
    {"action": "press_key", "context": "web", "key": "enter"}
  ],
  "on_fail": {"strategy": "retry", "max_retries": 2}
}

Webhooks

Overview

MobAI can notify external services when device and bridge events occur. Configure webhook URLs from Integrations → Webhooks in the app title bar.

Each webhook is delivered as an HTTP POST with Content-Type: application/json. Failed deliveries are retried up to 4 times with exponential backoff (1s, 2s, 4s, 8s). Each request has a 10-second timeout.

You can create multiple webhooks, each subscribed to different events, and toggle them on or off individually.
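The retry schedule described above doubles the delay each time. As a back-of-the-envelope sketch, the code below reproduces that schedule and estimates how long a delivery could take before MobAI gives up. The estimate assumes one initial attempt plus four retries, each of which may run into the full 10-second timeout, and that each delay starts after the previous attempt has failed. The function names are hypothetical.

```python
def backoff_delays(retries=4, base_seconds=1.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base_seconds * 2 ** attempt for attempt in range(retries)]


def worst_case_seconds(retries=4, base_seconds=1.0, timeout_seconds=10.0):
    """Upper bound on delivery time if every attempt times out (see assumptions above)."""
    attempts = retries + 1  # initial attempt plus retries
    return attempts * timeout_seconds + sum(backoff_delays(retries, base_seconds))
```

Under these assumptions the delays are 1, 2, 4, and 8 seconds, and an unreachable endpoint would be abandoned after roughly 65 seconds.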

Events

Event	Description
`device:connected`	An iOS or Android device was detected and connected
`device:disconnected`	A previously connected device was disconnected
`bridge:started`	The on-device bridge agent was started successfully
`bridge:stopped`	The on-device bridge agent was stopped or crashed

Payload

Every webhook delivery sends a JSON body with the following structure:

{
  "event": "device:connected",
  "timestamp": "2026-02-15T10:30:00Z",
  "data": {
    "deviceId": "00008101-000A1C3E2100001E",
    "platform": "ios",
    "name": "iPhone 15 Pro"
  }
}

Field	Type	Description
`event`	string	One of the event types listed above
`timestamp`	string	ISO 8601 UTC timestamp
`data.deviceId`	string	Unique device identifier (UDID for iOS, serial for Android)
`data.platform`	string	`ios` or `android`
`data.name`	string	Human-readable device name
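On the receiving side, an endpoint only needs to accept a JSON POST and read the fields listed above. The following is a minimal receiver sketch using Python's standard library. The handler class, the port in the usage comment, and the choice of 204 and 400 status codes are assumptions for illustration; this reference does not say which response codes MobAI treats as a successful delivery.

```python
import json
from http.server import BaseHTTPRequestHandler

KNOWN_EVENTS = {
    "device:connected",
    "device:disconnected",
    "bridge:started",
    "bridge:stopped",
}


def parse_webhook(body):
    """Decode a webhook body and return its documented fields as a flat dict."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("webhook body must be a JSON object")
    event = payload.get("event")
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unexpected event: {event!r}")
    data = payload.get("data") or {}
    return {
        "event": event,
        "timestamp": payload.get("timestamp"),
        "device_id": data.get("deviceId"),
        "platform": data.get("platform"),
        "name": data.get("name"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_webhook(self.rfile.read(length))
        except ValueError:  # also covers malformed JSON
            self.send_response(400)
            self.end_headers()
            return
        print(event["event"], event["device_id"])
        self.send_response(204)  # assumed to count as a successful delivery
        self.end_headers()


# To run the receiver locally (port 9000 is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 9000), WebhookHandler).serve_forever()
```

Responding quickly and deferring slow work keeps the handler inside the 10-second delivery timeout.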

Custom Headers

Each webhook can include custom HTTP headers sent with every delivery. Use this for authentication tokens, API keys, or any other headers your endpoint requires.

Headers are stored securely in your operating system's keychain (macOS Keychain, Windows Credential Manager, or an encrypted file on Linux). They never leave your machine and are not included in settings exports.