Documentation

Everything you need to automate iOS and Android devices with MobAI. HTTP API reference, DSL scripting, web automation, and performance profiling.

Overview

MobAI exposes a local HTTP API on localhost:8686 that lets you control connected Android and iOS devices. Use it directly, through the MCP server, or via DSL batch scripts.

HTTP API

RESTful endpoints for device control. Tap, swipe, type, screenshot, and more.

DSL Scripts

JSON-based automation scripts with predicates, retries, conditionals, and assertions.

Web Automation

Control Safari, Chrome, and WebViews with CSS selectors and JavaScript execution.

Quick Start

Start the MobAI app, connect a device, then hit the API. The bridge auto-starts on first request.

# List connected devices
GET http://127.0.0.1:8686/api/v1/devices

# Take a screenshot
GET http://127.0.0.1:8686/api/v1/devices/{id}/screenshot

# Tap at coordinates
POST http://127.0.0.1:8686/api/v1/devices/{id}/tap
{"x": 150, "y": 300}

# Run a DSL script
POST http://127.0.0.1:8686/api/v1/devices/{id}/dsl/execute
{"version": "0.2", "steps": [{"action": "open_app", "bundle_id": "com.apple.Preferences"}]}

Base URL: http://127.0.0.1:8686/api/v1
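
The Quick Start calls can be issued from any HTTP client. Here is a minimal Python sketch using only the standard library. The helper names `build_request` and `call` are illustrative and not part of MobAI, and sending `Content-Type: application/json` with request bodies is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8686/api/v1"  # base URL from the Quick Start


def build_request(method, path, body=None):
    """Build (but do not send) a request for a MobAI endpoint.

    Hypothetical helper, not part of MobAI itself.
    """
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed, not documented
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method, path, body=None, timeout=30):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(build_request(method, path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the MobAI app running, `call("GET", "/devices")` would list devices, and `call("POST", "/devices/<id>/tap", {"x": 150, "y": 300})` would tap, where `<id>` is one of the returned device IDs.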

HTTP API Reference

REST Endpoints

Devices

GET /devices List all connected devices

Returns an array of connected Android and iOS devices with their status.

Response

[{
  "id": "00008101-001A0C3E1234",
  "name": "iPhone 15 Pro",
  "platform": "ios",
  "model": "iPhone15,3",
  "osVersion": "18.2",
  "bridgeRunning": true,
  "virtual": false
}]

GET /devices/{id} Get device info

Returns information about a specific device. The id is the serial number (Android) or UDID (iOS).

Screenshots & UI Tree

GET /devices/{id}/screenshot Take screenshot

Captures a screenshot. By default, the image is saved to a file and the response contains its path. Use ?format=base64 to get the image data inline instead.

Response

{"path": "/tmp/mobai/screenshots/device-1234.png", "format": "png"}

GET /devices/{id}/ui-tree Get UI accessibility tree

Returns the current UI accessibility tree with element indices, types, text, and bounds.

Parameter        Type  Description
onlyVisible      bool  Filter to visible elements only (default: true)
includeKeyboard  bool  Include keyboard elements (default: false)
verbose          bool  Include elements array with bounds (default: false)

Response

{
  "tree": "[0] Button \"Settings\" (10,100 200x50)\n[1] Label \"Wi-Fi\" ...",
  "activity": "com.apple.Preferences"
}
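
The tree field is a newline-separated string, so a client usually has to parse it before choosing an index to tap. The sketch below infers the line format from the single sample line in the response above (`[index] Type "text" (x,y WxH)`). The real output may contain variants that this pattern skips, and `parse_tree` and `center` are hypothetical helper names.

```python
import re

# Pattern inferred from the one sample line in the response above;
# lines that do not match it are skipped rather than guessed at.
LINE_RE = re.compile(r'^\s*\[(\d+)\]\s+(\S+)\s+"(.*)"\s+\((-?\d+),(-?\d+)\s+(\d+)x(\d+)\)')


def parse_tree(tree):
    """Turn the tree string into a list of element dicts."""
    elements = []
    for line in tree.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        index, kind, text, x, y, w, h = m.groups()
        elements.append({
            "index": int(index),
            "type": kind,
            "text": text,
            "bounds": (int(x), int(y), int(w), int(h)),  # x, y, width, height
        })
    return elements


def center(bounds):
    """Center point of an (x, y, width, height) rectangle."""
    x, y, w, h = bounds
    return (x + w // 2, y + h // 2)
```

The index of a parsed element can be passed to /tap as {"index": n}, or the center point can be used as x and y.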

GET /devices/{id}/ocr OCR text recognition (iOS only)

Uses Apple Vision framework to detect text on screen. Useful for system dialogs that don't expose accessibility elements.

{
  "elements": [{"text": "Sign In", "confidence": 0.98, "x": 150, "y": 400, "width": 100, "height": 30}],
  "imageSize": {"width": 1170, "height": 2532},
  "screenSize": {"width": 390, "height": 844},
  "scale": 3.0
}
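
A common use of the OCR result is to find a piece of text and tap its center. The sketch below makes two assumptions that the response does not confirm: that x and y are the top-left corner of the bounding box, and that they are already in screen points (set `coords_in_pixels=True` to divide by scale if they turn out to be image pixels). The function names and the 0.5 confidence threshold are illustrative.

```python
def find_text(ocr_response, needle, min_confidence=0.5):
    """Return the first OCR element whose text contains `needle` (case-insensitive)."""
    needle = needle.lower()
    for element in ocr_response.get("elements", []):
        if needle in element["text"].lower() and element["confidence"] >= min_confidence:
            return element
    return None


def ocr_tap_point(element, scale=1.0, coords_in_pixels=False):
    """Center of the element's box, assuming x/y is its top-left corner."""
    cx = element["x"] + element["width"] / 2
    cy = element["y"] + element["height"] / 2
    if coords_in_pixels:
        # Convert image pixels to screen points using the response's scale.
        cx, cy = cx / scale, cy / scale
    return round(cx), round(cy)
```

The resulting point can be sent to /tap as {"x": ..., "y": ...}.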

Actions

POST /devices/{id}/tap Tap on screen

Tap at coordinates or on an element by index. Provide either index or both x and y.

// By coordinates
{"x": 150, "y": 300}

// By UI tree element index
{"index": 5}
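
Because the two request forms are mutually exclusive, it can help to validate on the client before sending. A small sketch follows; `tap_payload` is a hypothetical helper, and how the server reacts to a body containing both forms is not documented here.

```python
def tap_payload(index=None, x=None, y=None):
    """Build a /tap body from either an element index or an x/y pair."""
    by_index = index is not None
    by_coords = x is not None or y is not None
    if by_index == by_coords:
        raise ValueError("provide either index, or x and y (not both, not neither)")
    if by_index:
        return {"index": index}
    if x is None or y is None:
        raise ValueError("x and y must be given together")
    return {"x": x, "y": y}
```

The same body shape is accepted by /double-tap, so the helper can be reused there.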

POST /devices/{id}/double-tap Double tap

Double tap at coordinates or on an element by index. Same request format as /tap.

{"index": 5}
// or
{"x": 150, "y": 300}

POST /devices/{id}/long-press Long press

Long press at coordinates or on an element by index. Uses a fixed 0.5s hold duration.

{"index": 5}

POST /devices/{id}/two-finger-tap Two-finger tap (iOS)

Two-finger tap at coordinates or element index. iOS only.

{"x": 150, "y": 300}

POST /devices/{id}/swipe Swipe gesture

{"fromX": 200, "fromY": 500, "toX": 200, "toY": 100, "duration": 300}

POST /devices/{id}/drag Drag gesture

Drag from one point to another. Optional duration in ms (default: 500). Optional pressDuration in ms to hold before dragging (for moving icons).

{"fromX": 100, "fromY": 200, "toX": 300, "toY": 400, "duration": 500, "pressDuration": 0}

POST /devices/{id}/type Type text

Types text into the currently focused input field.

{"text": "Hello World"}

POST /devices/{id}/dismiss-keyboard Dismiss keyboard

POST /devices/{id}/go-home Go to home screen

Navigates to the device home screen. No request body required.

POST /devices/{id}/open-url Open URL in browser
{"url": "https://example.com"}
POST /devices/{id}/location Set simulated GPS location
{"lat": 40.7128, "lon": -74.0060}

Note: On Android physical devices, location simulation requires Android 12+. Emulators and iOS devices have no version restriction.

DELETE /devices/{id}/location Reset to real GPS location

Apps

GET /devices/{id}/apps List installed apps
[{"bundleId": "com.apple.Preferences", "name": "Settings"}]
POST /devices/{id}/launch-app Launch application
{"bundleId": "com.apple.mobilesafari"}

// With debugger (iOS only, enables JIT for Flutter)
{"bundleId": "com.example.app", "withDebugger": true}
POST /devices/{id}/kill-app Force-kill application
{"bundleId": "com.apple.mobilesafari"}
POST /devices/{id}/install-app Install APK/IPA

Installs an app from a local file path. For iOS, you can optionally re-sign the app with an Apple ID.

{"path": "/path/to/app.apk"}

// iOS with re-signing
{"path": "/path/to/app.ipa", "resign": true, "appleId": "user@example.com"}

DELETE /devices/{id}/apps/{bundleId} Uninstall application

Bridge Management

The bridge is the on-device automation agent (accessibility service on Android, WebDriverAgent on iOS). It auto-starts on first API call, but you can manage it explicitly.

POST /devices/{id}/bridge/start Start device bridge
POST /devices/{id}/bridge/stop Stop device bridge

AI Agent

POST /devices/{id}/agent/run Run AI agent task

Runs an AI agent that autonomously executes a task on the device. The call is synchronous and returns when the agent completes. Requires a running bridge and an LLM provider configured in the app settings.

{
  "task": "Open Settings and enable Wi-Fi",
  "agentType": "toolagent",    // optional: toolagent | hierarchical | classic
  "useVision": true             // optional: enable screenshot analysis
}

Response

{"success": true, "result": "Wi-Fi has been enabled successfully"}
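
The // comments in the request example are annotations. Strict JSON does not allow comments, so it is safest to send the body without them. A minimal sketch that builds a comment-free body and checks agentType against the three values listed above; `agent_request` is a hypothetical helper, and omitting unset optional fields is an assumption.

```python
import json

AGENT_TYPES = {"toolagent", "hierarchical", "classic"}  # values from the request example


def agent_request(task, agent_type=None, use_vision=None):
    """Return the JSON body for POST /devices/{id}/agent/run as a string."""
    if not task.strip():
        raise ValueError("task must not be empty")
    if agent_type is not None and agent_type not in AGENT_TYPES:
        raise ValueError(f"agentType must be one of {sorted(AGENT_TYPES)}")
    body = {"task": task}
    if agent_type is not None:
        body["agentType"] = agent_type
    if use_vision is not None:
        body["useVision"] = bool(use_vision)
    return json.dumps(body)
```

Since the endpoint blocks until the agent finishes, a client will likely need a generous read timeout. How long a task may run is not stated here.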

Web Automation

Control browser content and WebViews. iOS uses WebInspector (physical devices only). Android uses Chrome DevTools Protocol.

GET /devices/{id}/web/pages List browser tabs / WebViews
POST /devices/{id}/web/select Select a web page
{"pageId": 0}
GET /devices/{id}/web/dom Get DOM tree
POST /devices/{id}/web/navigate Navigate to URL
{"url": "https://example.com"}
POST /devices/{id}/web/click Click element by CSS selector
{"selector": "button.submit"}
POST /devices/{id}/web/type Type into element
{"selector": "input#email", "text": "user@example.com"}
POST /devices/{id}/web/execute Execute JavaScript
{"script": "return document.title"}
DELETE /devices/{id}/web Disconnect web session

Port Forwarding

POST /devices/{id}/forward Start port forwarding
{"devicePort": 8080, "hostPort": 0}  // 0 = auto-assign

Response

{"id": "fwd-abc123", "hostPort": 49152, "devicePort": 8080}
GET /devices/{id}/forward List port forwards
DELETE /devices/{id}/forward/{forwardId} Stop port forwarding

Log Streaming

GET /devices/{id}/logs Stream device logs (SSE)

Streams device logs in real-time using Server-Sent Events.

Parameter  Type    Description
process    string  Filter by process name
level      string  Min level (iOS: Debug/Info/Error, Android: V/D/I/W/E/F)
tag        string  Filter by tag (Android only)
contains   string  Filter messages containing string
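
A Server-Sent Events stream is plain text: each event is a group of `field: value` lines terminated by a blank line, and the payload travels in `data:` lines. The sketch below builds the logs URL from the filter parameters above and extracts the data payload of each event. The format of the payload itself is not documented here, so it is returned as a raw string. `logs_url` and `iter_sse_data` are hypothetical helper names.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8686/api/v1"


def logs_url(device_id, **filters):
    """Build the /logs URL; filters are process, level, tag, contains."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{BASE_URL}/devices/{device_id}/logs"
    return f"{url}?{query}" if query else url


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of text lines."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            buf.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored here.
```

To consume a live stream, open the URL with `urllib.request.urlopen` and pass `(line.decode("utf-8") for line in response)` to `iter_sse_data`.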

Performance Metrics

POST /devices/{id}/metrics/start Start metrics collection
POST /devices/{id}/metrics/stop Stop and get summary
GET /devices/{id}/metrics/stream Stream metrics (SSE)
GET /devices/{id}/metrics/snapshot Get current snapshot
GET /devices/{id}/metrics/summary Get session summary

POST /devices/{id}/dsl/execute Execute DSL script

Executes a DSL automation script on the device. See the DSL section below for the full script format.

Automation DSL

DSL Script Reference

Script Format

DSL scripts are JSON objects sent to POST /devices/{id}/dsl/execute. Each script contains a version, an array of steps, and an optional failure strategy.

{
  "version": "0.2",
  "steps": [
    {"action": "open_app", "bundle_id": "com.apple.Preferences"},
    {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
    {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000}
  ],
  "on_fail": {"strategy": "abort"}
}
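
Scripts are often easier to assemble in code than to write by hand. A minimal sketch that wraps a list of steps in the envelope shown above; `make_script` is a hypothetical helper, and the only check it performs (every step has an "action") is based on the examples in this document rather than on a published schema.

```python
import json


def make_script(steps, on_fail="abort", version="0.2"):
    """Wrap step dicts in the script envelope: version, steps, optional on_fail."""
    for step in steps:
        if "action" not in step:
            raise ValueError(f"step is missing 'action': {step!r}")
    script = {"version": version, "steps": list(steps)}
    if on_fail is not None:
        script["on_fail"] = {"strategy": on_fail}
    return script


script = make_script([
    {"action": "open_app", "bundle_id": "com.apple.Preferences"},
    {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
    {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000},
])
body = json.dumps(script)  # request body for POST /devices/{id}/dsl/execute
```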

Native Actions

Action          Description
open_app        Launch app by bundle ID
tap             Tap element by predicate or coordinates
long_press      Long press (0.5s hold)
type            Type text, optionally clear first
swipe           Swipe by direction or coordinates
scroll          Scroll to find an element
press_key       Press hardware/keyboard key
toggle          Toggle a switch on or off
navigate        Go home, back, or recent apps
observe         Get UI tree, screenshot, activity, or OCR
wait_for        Wait for element or UI stability
delay           Fixed time delay
if_exists       Conditional execution
screenshot      Save screenshot to file
kill_app        Force-kill a running app
set_location    Simulate GPS coordinates (Android physical: 12+ only)
reset_location  Reset to real GPS

tap

// By predicate
{"action": "tap", "predicate": {"text_contains": "Settings"}}

// By coordinates
{"action": "tap", "coords": {"x": 100, "y": 200}}

type

{"action": "type", "text": "Hello World"}

// With clear and predicate
{"action": "type", "text": "Hello", "predicate": {"type": "input"}, "clear_first": true}

swipe

// By direction
{"action": "swipe", "direction": "up", "distance": "medium"}

// By coordinates
{"action": "swipe", "from_coords": {"x": 200, "y": 400}, "to_coords": {"x": 200, "y": 100}}

scroll

{"action": "scroll", "direction": "down", "to_element": {"predicate": {"text": "Privacy"}}, "max_scrolls": 10}

wait_for

// Wait for element
{"action": "wait_for", "predicate": {"text": "Welcome"}, "timeout_ms": 5000}

// Wait for UI stability (animations to finish)
{"action": "wait_for", "stable": true, "timeout_ms": 5000}

if_exists

{
  "action": "if_exists",
  "predicate": {"text": "Allow"},
  "then": [{"action": "tap", "predicate": {"text": "Allow"}}],
  "else": []
}

toggle

{"action": "toggle", "predicate": {"type": "switch", "text_contains": "Wi-Fi"}, "state": "on"}

observe

{"action": "observe", "context": "native", "include": ["ui_tree", "screenshot"]}

// OCR for system dialogs (iOS only)
{"action": "observe", "include": ["ocr"]}

press_key

Android: enter, tab, delete, escape, volume_up, volume_down, home, back, recent_apps, mute, power
iOS: enter, tab, delete, home, volume_up, volume_down

{"action": "press_key", "key": "enter"}

Web Actions

Web actions use "context": "web" and CSS selectors. Select a web context first.

// Select browser tab
{"action": "select_web_context", "url_contains": "google.com"}

// Click DOM element
{"action": "tap", "context": "web", "predicate": {"css_selector": "button.submit"}}

// Type into form field
{"action": "type", "context": "web", "predicate": {"css_selector": "input#email"}, "text": "user@example.com"}

// Navigate to URL
{"action": "navigate", "url": "https://example.com"}

// Execute JavaScript
{"action": "execute_js", "script": "return document.title"}

// Wait for DOM element
{"action": "wait_for", "context": "web", "predicate": {"css_selector": "div.loaded"}, "timeout_ms": 5000}

Requirements

iOS: Physical device only (WebInspector uses usbmuxd). Android: Physical device or emulator with Chrome remote debugging.

Assertions

// Assert element exists
{"action": "assert_exists", "predicate": {"text": "Success"}, "timeout_ms": 3000}

// Assert element doesn't exist
{"action": "assert_not_exists", "predicate": {"text": "Error"}}

// Assert element count
{"action": "assert_count", "predicate": {"type": "cell"}, "count": 5}

// Assert screen changed after action
{"action": "assert_screen_changed", "threshold_percent": 15}

Predicates

Predicates match UI elements for actions. Combine multiple fields for precise targeting.

Native Predicates

Field             Description
text              Exact text match
text_contains     Contains substring (case-insensitive)
text_starts_with  Starts with prefix
text_regex        Regular expression pattern
type              Element type: button, input, switch, text, image, cell, scrollview
label             Accessibility label (exact match)
label_contains    Accessibility label (partial match)
enabled           Enabled state (bool)
visible           Visible state (bool)
selected          Selected state (bool)
near              Spatial relation to another element
bounds_hint       Screen region: top_half, bottom_half, left_half, right_half, center
parent_of         Find parent by child predicate
index             Nth match (0-based)
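
To make the combination rule concrete, here is a local sketch of how several fields might narrow a match. It is an illustration only, not MobAI's matcher. It assumes all fields must match (AND), that type comparison ignores case, that text_regex is a search rather than a full match, and that index is applied after the other fields. The element dicts are hypothetical.

```python
import re


def matches(element, predicate):
    """True if the element satisfies every field in the predicate (assumed AND)."""
    text = element.get("text", "")
    checks = {
        "text": lambda v: text == v,
        "text_contains": lambda v: v.lower() in text.lower(),
        "text_starts_with": lambda v: text.startswith(v),
        "text_regex": lambda v: re.search(v, text) is not None,
        "type": lambda v: element.get("type", "").lower() == v.lower(),
        "enabled": lambda v: element.get("enabled", True) == v,
    }
    for field, expected in predicate.items():
        if field not in checks:
            raise ValueError(f"field not covered by this sketch: {field}")
        if not checks[field](expected):
            return False
    return True


def select(elements, predicate):
    """Filter elements, then apply the 0-based index field if present."""
    predicate = dict(predicate)
    nth = predicate.pop("index", None)
    hits = [e for e in elements if matches(e, predicate)]
    return hits if nth is None else hits[nth:nth + 1]
```

For example, {"type": "switch"} alone may match every switch on screen, while adding text_contains or index reduces the result to one element.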

Web Predicates

Selector             Description
#login-btn           Element with id="login-btn"
.btn-primary         Elements with class "btn-primary"
button.submit        Button with class "submit"
input[type='email']  Input with type="email"
a[href*='login']     Links containing "login" in href

Failure Strategies

Each step can have an on_fail handler to control what happens when it fails.

Strategy      Description
abort         Stop execution immediately (default)
skip          Skip this step and continue
retry         Retry with max_retries and retry_delay_ms
replan        Return for replanning
require_user  Request user intervention

{
  "action": "tap",
  "predicate": {"text": "Continue"},
  "on_fail": {
    "strategy": "retry",
    "max_retries": 3,
    "retry_delay_ms": 1000,
    "fallback_strategy": {"strategy": "skip"}
  }
}
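
When many steps need the same handler, it can be attached programmatically. A small sketch follows; `with_retry` is a hypothetical helper, and the set of allowed fallback strategies is taken from the table above, on the assumption that any of them may serve as a fallback.

```python
import copy

STRATEGIES = {"abort", "skip", "retry", "replan", "require_user"}


def with_retry(step, max_retries=3, retry_delay_ms=1000, fallback=None):
    """Return a copy of the step with a retry on_fail handler attached."""
    if fallback is not None and fallback not in STRATEGIES:
        raise ValueError(f"unknown strategy: {fallback}")
    new_step = copy.deepcopy(step)  # leave the caller's dict untouched
    on_fail = {
        "strategy": "retry",
        "max_retries": max_retries,
        "retry_delay_ms": retry_delay_ms,
    }
    if fallback is not None:
        on_fail["fallback_strategy"] = {"strategy": fallback}
    new_step["on_fail"] = on_fail
    return new_step
```

Calling `with_retry({"action": "tap", "predicate": {"text": "Continue"}}, fallback="skip")` produces the step shown above.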

Performance Metrics

Collect CPU, memory, FPS, network, and battery metrics during automation runs.

Metric Type    Description
system_cpu     System-wide CPU usage (%)
system_memory  System memory usage (bytes, %)
fps            Frame rate and jank detection
network        Network I/O (rx/tx bytes)
battery        Battery level, drain rate, temperature
process        Per-app CPU/memory (requires bundle_id)

{
  "version": "0.2",
  "steps": [
    {"action": "metrics_start", "types": ["system_cpu", "system_memory", "fps"], "label": "app_launch"},
    {"action": "open_app", "bundle_id": "com.example.myapp"},
    {"action": "wait_for", "predicate": {"text": "Home"}, "timeout_ms": 10000},
    {"action": "delay", "duration_ms": 2000},
    {"action": "metrics_stop"}
  ]
}

Complete Examples

Open Settings and Toggle Wi-Fi

{
  "version": "0.2",
  "steps": [
    {"action": "open_app", "bundle_id": "com.apple.Preferences"},
    {"action": "delay", "duration_ms": 1000},
    {"action": "tap", "predicate": {"text_contains": "Wi-Fi"}},
    {"action": "wait_for", "predicate": {"type": "switch"}, "timeout_ms": 3000},
    {"action": "toggle", "predicate": {"type": "switch"}, "state": "off"}
  ],
  "on_fail": {"strategy": "abort"}
}

Web: Google Search via Safari

{
  "version": "0.2",
  "steps": [
    {"action": "open_app", "bundle_id": "com.apple.mobilesafari"},
    {"action": "delay", "duration_ms": 1000},
    {"action": "tap", "predicate": {"type": "XCUIElementTypeTextField"}},
    {"action": "type", "text": "google.com\n"},
    {"action": "delay", "duration_ms": 2000},
    {"action": "select_web_context", "url_contains": "google"},
    {"action": "wait_for", "context": "web", "predicate": {"css_selector": "textarea[name='q']"}, "timeout_ms": 5000},
    {"action": "type", "context": "web", "predicate": {"css_selector": "textarea[name='q']"}, "text": "weather"},
    {"action": "press_key", "context": "web", "key": "enter"}
  ],
  "on_fail": {"strategy": "retry", "max_retries": 2}
}

Webhooks

Overview

MobAI can notify external services when device and bridge events occur. Configure webhook URLs from Integrations → Webhooks in the app title bar.

Each webhook is delivered as an HTTP POST with Content-Type: application/json. Failed deliveries are retried up to 4 times with exponential backoff (1s, 2s, 4s, 8s). Each request has a 10-second timeout.
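
The retry schedule is a doubling sequence, which makes it easy to estimate how long a failing endpoint can hold up a delivery. The sketch below reproduces the documented delays and computes a worst case, assuming the 10-second timeout applies to every attempt and that the 4 retries come on top of the initial attempt. The function names are illustrative.

```python
def backoff_delays(retries=4, base_seconds=1.0, factor=2.0):
    """Delay before each retry: base * factor**n for n = 0..retries-1."""
    return [base_seconds * factor ** attempt for attempt in range(retries)]


def worst_case_seconds(retries=4, timeout_seconds=10.0):
    """Upper bound if every attempt (initial + retries) runs into the timeout."""
    attempts = retries + 1
    return attempts * timeout_seconds + sum(backoff_delays(retries))


delays = backoff_delays()  # the documented 1s, 2s, 4s, 8s schedule
```

Under those assumptions, a delivery to an unresponsive endpoint is abandoned after roughly a minute.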

You can create multiple webhooks, each subscribed to different events, and toggle them on or off individually.

Events

Event                Description
device:connected     An iOS or Android device was detected and connected
device:disconnected  A previously connected device was disconnected
bridge:started       The on-device bridge agent was started successfully
bridge:stopped       The on-device bridge agent was stopped or crashed

Payload

Every webhook delivery sends a JSON body with the following structure:

{
  "event": "device:connected",
  "timestamp": "2026-02-15T10:30:00Z",
  "data": {
    "deviceId": "00008101-000A1C3E2100001E",
    "platform": "ios",
    "name": "iPhone 15 Pro"
  }
}

Field          Type    Description
event          string  One of the event types listed above
timestamp      string  ISO 8601 UTC timestamp
data.deviceId  string  Unique device identifier (UDID for iOS, serial for Android)
data.platform  string  ios or android
data.name      string  Human-readable device name
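
A receiving endpoint only needs to accept a JSON POST and read these fields. Here is a minimal receiver sketch using Python's standard library. The port, the handler name, and the choice of 204 and 400 as response codes are assumptions, since the status codes MobAI treats as a successful delivery are not stated here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_EVENTS = {"device:connected", "device:disconnected", "bridge:started", "bridge:stopped"}


def parse_webhook(body):
    """Validate a delivery body and return (event, timestamp, deviceId, platform, name)."""
    payload = json.loads(body)
    event = payload["event"]
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {event}")
    data = payload["data"]
    return event, payload["timestamp"], data["deviceId"], data["platform"], data["name"]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event, ts, device_id, platform, name = parse_webhook(self.rfile.read(length))
        except (ValueError, KeyError):
            self.send_response(400)  # malformed or unexpected payload
            self.end_headers()
            return
        print(f"{ts} {event}: {name} ({platform}, {device_id})")
        self.send_response(204)  # assumed to count as a successful delivery
        self.end_headers()


def serve(port=9000):
    """Run the receiver on localhost; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Given the 10-second delivery timeout, the handler should respond quickly and move any slow work elsewhere.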

Custom Headers

Each webhook can include custom HTTP headers sent with every delivery. Use this for authentication tokens, API keys, or any other headers your endpoint requires.

Headers are stored securely in your operating system's keychain (macOS Keychain, Windows Credential Manager, or an encrypted file on Linux). They never leave your machine and are not included in settings exports.