Automate Any UI Effortlessly

OmniMCP

Agent-Native Interface for Vision-Language UI Automation

Unlock powerful automation through scene graph tracking, rich visual context, persistent memory, and intuitive interactions powered by OmniParser and the Model Context Protocol (MCP).

Core Features:

  • Agent-Native Interface
  • Rich Visual Context
  • Scene Graph Tracking
  • Memory Persistence
  • Natural Language UI
  • Comprehensive Actions
  • Structured Types
  • Robust Error Handling

from omnimcp import Omni

omni = Omni(endpoint="localhost:1024")  # or omni.api

# Log in and get the applicant's latest underwriting date
@omni.publish
def extract_underwriting_date(o):
    # `is` is a reserved word in Python, so the state check is spelled
    # `is_` here (an illustrative name, not a confirmed method).
    if o.is_("Login form ready"):
        o.do(f"Enter {o.recall('credentials')}")
        o.do("Submit login")
        o.observe("latest underwriting date")
        o.store("applicant.last_underwriting_date")

omni.session("extract_underwriting_date").run()

Simple, powerful interface for UI automation

Read the technical whitepaper

Technical Deep Dive

Understand the architecture and capabilities of OmniMCP in depth with our comprehensive technical whitepaper.

Core Features

OmniMCP delivers powerful features to enable deep UI understanding and reliable automation.

Rich Visual Context

Deep understanding of UI elements and their relationships for accurate interaction.

Natural Language Interface

Target and analyze elements using natural descriptions without complex selectors.

Comprehensive Interactions

Full range of UI operations with verification and robust error handling.

Structured Types

Clean, typed responses using dataclasses for reliable integration.
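
To illustrate what a typed response could look like, here is a minimal sketch built on standard-library dataclasses. The class name `InteractionResult` and its fields are hypothetical and are not taken from the OmniMCP codebase.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class InteractionResult:
    """Typed outcome of one UI action (hypothetical shape, for illustration)."""
    action: str                   # e.g. "click", "type"
    target: str                   # natural-language description of the element
    success: bool
    error: Optional[str] = None   # populated only when success is False


def to_json(result: InteractionResult) -> str:
    """Serialize a result for logging or for sending to a model."""
    return json.dumps(asdict(result), sort_keys=True)


ok = InteractionResult(action="click", target="Submit button", success=True)
failed = InteractionResult(
    action="type", target="Password field", success=False, error="element not found"
)
print(to_json(ok))
```

Because the dataclass is frozen, a result cannot be modified after it is created, which keeps downstream integrations from mutating shared state by accident.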

Robust Error Handling

Detailed error context and recovery strategies for reliable automation.

MCP Protocol Integration

Standardized interface for AI model interaction with UI automation.

How OmniMCP Works

Our four-step process creates rich UI understanding for AI models

1. Spatial Feature Understanding

OmniMCP begins by developing a deep understanding of the user interface's visual layout. Using OmniParser, it performs detailed visual parsing, segmenting the screen and identifying all interactive and informational elements.
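
The result of this parsing step can be pictured as a list of typed elements with screen positions. The sketch below is an assumption about how such output might be modeled, not OmniParser's actual output format: `ParsedElement`, `INTERACTIVE_TYPES`, and `split_elements` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Element types treated as interactive in this sketch (an assumed taxonomy).
INTERACTIVE_TYPES = {"button", "text_field", "checkbox", "link"}


@dataclass(frozen=True)
class ParsedElement:
    type: str
    content: str
    # Bounding box as fractions of the screen: (x, y, width, height).
    bounds: Tuple[float, float, float, float]

    @property
    def interactive(self) -> bool:
        return self.type in INTERACTIVE_TYPES

    def center_px(self, screen_w: int, screen_h: int) -> Tuple[int, int]:
        """Pixel coordinates of the element's center, e.g. for a click."""
        x, y, w, h = self.bounds
        return (round((x + w / 2) * screen_w), round((y + h / 2) * screen_h))


def split_elements(elements: List[ParsedElement]):
    """Separate interactive elements from purely informational ones."""
    interactive = [e for e in elements if e.interactive]
    informational = [e for e in elements if not e.interactive]
    return interactive, informational


screen = [
    ParsedElement("text", "Welcome back", (0.25, 0.125, 0.5, 0.0625)),
    ParsedElement("text_field", "Username", (0.25, 0.25, 0.5, 0.0625)),
    ParsedElement("button", "Log in", (0.375, 0.5, 0.25, 0.125)),
]
interactive, informational = split_elements(screen)
login_button = interactive[-1]
print(login_button.center_px(1000, 800))
```

Storing bounds as screen fractions rather than pixels keeps the same element description valid across window sizes; pixel coordinates are computed only when an action needs them.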

2. Temporal Feature Understanding

To capture the dynamic aspects of the UI, OmniMCP tracks user interactions and the resulting state transitions. It builds a Process Graph that represents the flow of user workflows.
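
One way to picture a Process Graph is as a directed graph whose nodes are UI states and whose edges are the actions that move between them. The sketch below is a simplified illustration under that assumption. The `ProcessGraph` class and its `record` and `plan` methods are hypothetical and do not describe OmniMCP's internal representation.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


class ProcessGraph:
    """Directed graph: nodes are UI states, edges are the actions between them."""

    def __init__(self) -> None:
        # state -> list of (action, next_state)
        self._edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, state: str, action: str, next_state: str) -> None:
        """Store one observed transition, ignoring exact repeats."""
        if (action, next_state) not in self._edges[state]:
            self._edges[state].append((action, next_state))

    def plan(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest action sequence from start to goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, actions = queue.popleft()
            if state == goal:
                return actions
            for action, nxt in self._edges.get(state, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, actions + [action]))
        return None  # goal was never observed as reachable from start


graph = ProcessGraph()
graph.record("login_form", "submit credentials", "dashboard")
graph.record("dashboard", "open applicant", "applicant_page")
graph.record("login_form", "click forgot password", "reset_form")
print(graph.plan("login_form", "applicant_page"))
```

Once transitions are recorded this way, a previously observed workflow can be replayed by asking the graph for a path between two states instead of re-deriving each step from the screen.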

3. Internal API Generation

Using this spatial and temporal context, OmniMCP prompts a Large Language Model to generate an internal, context-specific API through In-Context Learning.
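
In-context learning means the model is shown the current context plus a worked example in the prompt itself, with no fine-tuning. The sketch below shows how such a prompt might be assembled. The prompt wording, the `FEW_SHOT_EXAMPLE` text, and the `build_prompt` helper are assumptions for illustration, and no model is actually called.

```python
from typing import List, Tuple

# One worked example shown to the model (the "in-context" part).
FEW_SHOT_EXAMPLE = (
    "Elements: [button 'Search', text_field 'Query']\n"
    "Transitions: home --type query, click Search--> results\n"
    "API: search(query: str) -> results\n"
)


def build_prompt(elements: List[str], transitions: List[Tuple[str, str, str]]) -> str:
    """Assemble a prompt asking a model to propose functions for the current UI."""
    element_line = "Elements: [" + ", ".join(elements) + "]"
    transition_lines = [
        f"Transitions: {src} --{action}--> {dst}" for src, action, dst in transitions
    ]
    return "\n".join(
        [
            "Propose a small API for the UI described below.",
            "",
            "Example:",
            FEW_SHOT_EXAMPLE,
            "Now the current UI:",
            element_line,
            *transition_lines,
            "API:",
        ]
    )


prompt = build_prompt(
    elements=["text_field 'Username'", "button 'Log in'"],
    transitions=[("login_form", "submit credentials", "dashboard")],
)
print(prompt)
```

The prompt ends on an open `API:` line so that the model's completion is the proposed function signature, mirroring the format of the worked example.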

4. External API Publication (MCP)

Finally, OmniMCP exposes this dynamically generated internal API through the Model Context Protocol (MCP), providing a consistent interface for both humans and AI models.
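
In MCP, a server advertises each tool with a `name`, a `description`, and an `inputSchema` expressed as JSON Schema. The sketch below derives such an entry from an ordinary Python function using only the standard library. The `describe_tool` helper and the `get_underwriting_date` function are hypothetical and do not show how OmniMCP itself publishes tools.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Mapping from Python annotations to JSON Schema types (only what this sketch needs).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an MCP-style tool entry (name, description, inputSchema) from a function."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def get_underwriting_date(applicant_id: str, include_history: bool = False) -> str:
    """Return the applicant's latest underwriting date."""
    raise NotImplementedError("placeholder for a generated UI workflow")


# Shape of a JSON-RPC reply to an MCP `tools/list` request.
tool = describe_tool(get_underwriting_date)
response = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [tool]}}
print(json.dumps(response, indent=2))
```

Any MCP-compatible client that lists tools would then see `get_underwriting_date` with a schema it can validate arguments against, regardless of how the underlying UI workflow is carried out.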

Simple, Transparent Pricing

Self-host our open source solution, or let us handle everything with a managed plan.

Community

Free forever
  • Full open source access
  • Self-hosted deployment
  • Community support
  • MIT license
Get Started

Developer Plan

$49/month
  • Fully managed cloud hosting
  • Unlimited automation workflows
  • Email support
  • Regular updates and enhancements
Start Free Trial
Recommended

Team Plan

$199/month
  • Up to 5 team members
  • Collaboration tools and shared workspaces
  • Priority email support
  • Advanced analytics and usage insights
Start Free Trial

Enterprise

Custom pricing
  • Unlimited users
  • Dedicated infrastructure
  • 24/7 premium support
  • Personalized onboarding and training
Contact Sales

All paid plans include:

Free 14-day trial, cancel anytime
Comprehensive documentation
Secure, reliable cloud infrastructure
Regular feature updates

Join the Waitlist

Be the first to access our managed OmniMCP service when it launches.