OmniMCP
Agent-Native Interface for Vision-Language UI Automation
Unlock powerful automation through scene graph tracking, rich visual context, persistent memory, and intuitive interactions powered by OmniParser and the Model Context Protocol (MCP).
Core Features:
- Agent-Native Interface
- Rich Visual Context
- Scene Graph Tracking
- Memory Persistence
- Natural Language UI
- Comprehensive Actions
- Structured Types
- Robust Error Handling
from omnimcp import Omni

omni = Omni(endpoint="localhost:1024")  # or omni.api

# Log in and get applicant's latest underwriting date
@omni.publish
def extract_underwriting_date(o):
    if o.is("Login form ready"):
        o.do(f"Enter {o.recall('credentials')}")
        o.do("Submit login")
    o.observe("latest underwriting date")
    o.store("applicant.last_underwriting_date")

omni.session("extract_underwriting_date").run()
Simple, powerful interface for UI automation
Read the technical whitepaper
Technical Deep Dive
Understand the architecture and capabilities of OmniMCP in depth with our comprehensive technical whitepaper.
Core Features
OmniMCP delivers powerful features to enable deep UI understanding and reliable automation.
Rich Visual Context
Deep understanding of UI elements and their relationships for accurate interaction.
Natural Language Interface
Target and analyze elements using natural descriptions without complex selectors.
Comprehensive Interactions
Full range of UI operations with verification and robust error handling.
Structured Types
Clean, typed responses using dataclasses for reliable integration.
Robust Error Handling
Detailed error context and recovery strategies for reliable automation.
MCP Integration
Standardized interface for AI model interaction with UI automation.
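As a rough illustration of how typed responses and error context might fit together, here is a minimal sketch. The names UIElement, ActionResult, and locate_with_retry are hypothetical and are not taken from the OmniMCP codebase; the retry loop is just one possible recovery strategy.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class UIElement:
    """Hypothetical typed description of one on-screen element."""
    label: str
    kind: str                          # e.g. "button" or "text_field"
    bounds: Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class ActionResult:
    """Hypothetical typed outcome of one lookup or interaction."""
    success: bool
    element: Optional[UIElement] = None
    error: Optional[str] = None
    attempts: int = 1


def locate_with_retry(
    find: Callable[[str], UIElement],
    description: str,
    max_attempts: int = 3,
) -> ActionResult:
    """Call `find` until it succeeds, keeping the last error as context."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            element = find(description)
            return ActionResult(success=True, element=element, attempts=attempt)
        except LookupError as exc:
            # Record what was being looked for and why it failed.
            last_error = f"{description!r} not found: {exc}"
    return ActionResult(success=False, error=last_error, attempts=max_attempts)
```

The caller always receives an ActionResult, so a failed lookup carries its description and cause instead of surfacing as a bare exception.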
How OmniMCP Works
Our four-step process creates rich UI understanding for AI models
1. Spatial Feature Understanding
OmniMCP begins by developing a deep understanding of the user interface's visual layout. Using OmniParser, it performs detailed visual parsing, segmenting the screen and identifying all interactive and informational elements.
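To make the idea of a parsed screen concrete, the sketch below models segmented elements as bounding boxes and resolves a screen coordinate to the most specific element under it. The ParsedElement shape and the element_at helper are assumptions for illustration; they do not reflect OmniParser's actual output format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ParsedElement:
    """Hypothetical record for one segmented region of the screen."""
    label: str
    interactive: bool
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


def element_at(elements: List[ParsedElement], px: int, py: int) -> Optional[ParsedElement]:
    """Return the smallest element containing the point, or None."""
    hits = [e for e in elements if e.contains(px, py)]
    return min(hits, key=lambda e: e.width * e.height, default=None)


# Example screen: a container with two interactive children (made-up values).
screen = [
    ParsedElement("Login form", False, 100, 100, 400, 300),
    ParsedElement("Username", True, 120, 140, 360, 32),
    ParsedElement("Submit", True, 120, 340, 120, 40),
]
```

Choosing the smallest containing box means a point inside the Submit button resolves to the button rather than to the form that encloses it.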
2. Temporal Feature Understanding
To capture the dynamic aspects of the UI, OmniMCP tracks user interactions and the resulting state transitions. It builds a Process Graph that represents the flow of user workflows.
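One way to picture a Process Graph is as a directed graph whose nodes are UI states and whose edges are the actions that move between them. The sketch below is an assumption about the general shape, not OmniMCP's implementation; the ProcessGraph class and the state and action names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


class ProcessGraph:
    """Hypothetical state-transition graph built from observed interactions."""

    def __init__(self) -> None:
        # state -> {action: next_state}
        self._edges: Dict[str, Dict[str, str]] = defaultdict(dict)

    def record(self, state: str, action: str, next_state: str) -> None:
        """Store one observed transition."""
        self._edges[state][action] = next_state

    def path(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest action sequence, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, actions = queue.popleft()
            if state == goal:
                return actions
            for action, nxt in self._edges.get(state, {}).items():
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, actions + [action]))
        return None


graph = ProcessGraph()
graph.record("login_form", "enter credentials", "login_filled")
graph.record("login_filled", "submit login", "dashboard")
graph.record("dashboard", "open applicant", "applicant_page")
```

With transitions recorded this way, a workflow such as reaching the applicant page from the login form becomes a path query over the graph.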
3. Internal API Generation
Utilizing the rich spatial and temporal context, OmniMCP leverages a Large Language Model to generate an internal, context-specific API through In-Context Learning.
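In-Context Learning here means the model is shown the UI context and a few worked examples inside the prompt itself, rather than being retrained. The sketch below only assembles such a prompt and does not call any model; build_api_prompt, its arguments, and the prompt wording are all hypothetical.

```python
from typing import List, Tuple


def build_api_prompt(
    elements: List[Tuple[str, str]],          # (label, kind)
    transitions: List[Tuple[str, str, str]],  # (state, action, next_state)
    examples: List[Tuple[str, str]],          # (task, API signature)
) -> str:
    """Assemble a few-shot prompt from spatial and temporal context."""
    lines = ["Generate a small API for the UI described below.", "", "Elements:"]
    lines += [f"- {kind}: {label}" for label, kind in elements]
    lines += ["", "Observed transitions:"]
    lines += [f"- {src} --[{action}]--> {dst}" for src, action, dst in transitions]
    lines += ["", "Examples:"]
    for task, signature in examples:
        lines += [f"Task: {task}", f"API: {signature}"]
    lines += ["", "Propose one function signature per remaining workflow."]
    return "\n".join(lines)


prompt = build_api_prompt(
    elements=[("Username", "text_field"), ("Submit", "button")],
    transitions=[("login_form", "submit login", "dashboard")],
    examples=[("Log in", "login(credentials) -> None")],
)
```

The resulting string would be sent to whichever Large Language Model is in use; the examples section is what steers the model toward a consistent API style.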
4. External API Publication (MCP)
Finally, OmniMCP exposes this dynamically generated internal API through the Model Context Protocol (MCP), providing a consistent interface for both humans and AI models.
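MCP is built on JSON-RPC 2.0, and tools are discovered and invoked through the tools/list and tools/call methods. The sketch below shows a heavily simplified dispatcher for those two methods using only the standard library; it omits the initialization handshake and transport, and the tool's applicant_id parameter and stub handler are assumptions for illustration, not part of OmniMCP.

```python
from typing import Any, Dict

# Hypothetical registry holding one generated workflow exposed as a tool.
TOOLS: Dict[str, Dict[str, Any]] = {
    "extract_underwriting_date": {
        "description": "Log in and read the applicant's latest underwriting date.",
        "inputSchema": {
            "type": "object",
            "properties": {"applicant_id": {"type": "string"}},
            "required": ["applicant_id"],
        },
        # Stub handler; a real one would drive the UI.
        "handler": lambda args: f"(stub) underwriting date for {args['applicant_id']}",
    }
}


def error(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Answer a tools/list or tools/call request."""
    request_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        result: Dict[str, Any] = {"tools": tools}
    elif method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return error(request_id, -32602, "Unknown tool")
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(request_id, -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": request_id, "result": result}
```

Because every workflow is described by a name, a description, and a JSON Schema for its inputs, any MCP-capable client can list and call it without knowing how the underlying UI automation works.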
Simple, Transparent Pricing
Choose between self-hosting our open-source solution and letting us handle everything with our managed plans.
Community
- Full open source access
- Self-hosted deployment
- Community support
- MIT license
Developer Plan
- Fully managed cloud hosting
- Unlimited automation workflows
- Email support
- Regular updates and enhancements
Team Plan
- Up to 5 team members
- Collaboration tools and shared workspaces
- Priority email support
- Advanced analytics and usage insights
Enterprise
- Unlimited users
- Dedicated infrastructure
- 24/7 premium support
- Personalized onboarding and training