Mobile-Agent-v3.5 - A multi-platform GUI agent framework open-sourced by Alibaba Tongyi - AiBoss

What is Mobile-Agent-v3.5?

Mobile-Agent-v3.5 is a new-generation multi-platform GUI agent framework open-sourced by Alibaba's Tongyi Lab. It is positioned as a step for open-source GUI agents from "demonstration level" to "engineering-ready level." The framework natively supports desktop, mobile, and browser platforms, enabling automated operation across Android, Ubuntu, macOS, and Windows. The accompanying GUI-Owl-1.5 model family spans parameter scales from 2B to 235B and is split into two variants, Instruct (lightweight and low-latency) and Thinking (strong planning and reflection), supporting end-to-end deployment from edge to cloud. Mobile-Agent-v3.5 reports state-of-the-art (SOTA) results among open-source models on more than 20 mainstream GUI benchmarks, including OSWorld-Verified, AndroidWorld, and VisualWebArena. It relies on three core technologies: a hybrid data flywheel, unified chain-of-thought synthesis, and the MRPO multi-platform reinforcement learning algorithm. Together these address challenges such as cross-platform differences in action space and unstable training on long-horizon tasks, giving the community a complete open-source technical reference from the underlying base model to the agent framework.

Main functions of Mobile-Agent-v3.5

Cross-platform GUI automation: It natively supports three major platforms (desktop, mobile, and browser), enabling unified control and automated operation across Android, Ubuntu, macOS, and Windows devices.
Multi-scale model coverage: It ships with the GUI-Owl-1.5 model family, which offers parameter scales of 2B, 4B, 8B, 32B, and 235B and supports end-to-end deployment from the edge to the cloud.
Dual-mode inference architecture: It separates two variants, Instruct (lightweight and low-latency) and Thinking (strong planning and reflection), to balance real-time responsiveness against deep reasoning for complex tasks.
Long-horizon task planning: Through unified chain-of-thought synthesis, it systematically injects capabilities such as tool/MCP invocation, memory management, knowledge retrieval, and multi-agent collaboration to support the execution of complex long-horizon tasks.
Strong benchmark performance: It reports state-of-the-art (SOTA) results among open-source models on more than 20 mainstream GUI benchmarks, including OSWorld-Verified (56.5), AndroidWorld (71.6), and VisualWebArena (46.6).
Multimodal perception and understanding: It combines visual perception with semantic understanding, allowing it to recognize interface elements, interpret the intent behind an operation, and perform precise GUI interactions such as clicking, typing, and swiping.
Reinforcement learning optimization: It adopts the MRPO multi-platform reinforcement learning algorithm to resolve gradient conflicts caused by cross-platform differences in action space and to improve training stability on long-horizon tasks.
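
To make "unified control" across platforms more concrete, here is a minimal sketch of one way a platform-neutral action could be translated into platform-specific commands. Everything in it is an assumption for illustration: the GuiAction class, its field names, and the translate helper are hypothetical and are not taken from the Mobile-Agent-v3.5 code base, and the choice of adb on Android and xdotool on an X11 Linux desktop is only one possible backend. The sketch builds command lists and does not execute them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class GuiAction:
    """Hypothetical platform-neutral GUI action (illustrative field names)."""
    kind: str                    # "click", "type", or "swipe"
    x: Optional[int] = None      # start / target coordinates in pixels
    y: Optional[int] = None
    x2: Optional[int] = None     # end coordinates, used by "swipe"
    y2: Optional[int] = None
    text: Optional[str] = None   # payload for "type"


def to_android(action: GuiAction) -> List[str]:
    """Build an adb 'input' command line for the action."""
    if action.kind == "click":
        return ["adb", "shell", "input", "tap", str(action.x), str(action.y)]
    if action.kind == "type":
        # Real use would need escaping for spaces and special characters.
        return ["adb", "shell", "input", "text", action.text or ""]
    if action.kind == "swipe":
        return ["adb", "shell", "input", "swipe",
                str(action.x), str(action.y), str(action.x2), str(action.y2)]
    raise ValueError(f"unsupported action kind: {action.kind}")


def to_linux_desktop(action: GuiAction) -> List[str]:
    """Build an xdotool command line for the action (X11 desktops only)."""
    if action.kind == "click":
        return ["xdotool", "mousemove", str(action.x), str(action.y), "click", "1"]
    if action.kind == "type":
        return ["xdotool", "type", action.text or ""]
    if action.kind == "swipe":
        # A desktop has no native swipe, so emulate it as a left-button drag.
        return ["xdotool", "mousemove", str(action.x), str(action.y),
                "mousedown", "1",
                "mousemove", str(action.x2), str(action.y2),
                "mouseup", "1"]
    raise ValueError(f"unsupported action kind: {action.kind}")


# Registry mapping a platform name to its translator (names are assumptions).
TRANSLATORS: Dict[str, Callable[[GuiAction], List[str]]] = {
    "android": to_android,
    "ubuntu": to_linux_desktop,
}


def translate(action: GuiAction, platform: str) -> List[str]:
    """Turn one neutral action into the command for the given platform."""
    if platform not in TRANSLATORS:
        raise ValueError(f"unsupported platform: {platform}")
    return TRANSLATORS[platform](action)


tap = GuiAction(kind="click", x=100, y=200)
print(translate(tap, "android"))
print(translate(tap, "ubuntu"))
```

The point of the sketch is the shape of the problem: the same "swipe" maps to a single command on one platform and a four-step drag on another, which is the kind of action space difference the list above refers to.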

Technical Principles of Mobile-Agent-v3.5

Hybrid data flywheel: By combining simulation environments with cloud-based sandboxes, it generates high-quality grounding data and long-horizon trajectories at scale, addressing the high cost and limited scale of data collection in real environments.
Unified chain-of-thought synthesis: It integrates advanced capabilities such as tool/MCP invocation, memory management, knowledge retrieval, and multi-agent collaboration, giving the model long-horizon planning, reflection, and self-correction capabilities.
MRPO multi-platform reinforcement learning algorithm: MRPO targets the gradient conflicts caused by cross-platform differences in action space, along with unstable training and difficult credit assignment on long-horizon tasks, so that a single training and optimization pipeline can cover multiple platforms.
GUI-Owl-1.5 base model: As a native multimodal understanding model, it covers a parameter range from 2B to 235B and supports end-to-end GUI interaction that combines visual perception with semantic reasoning.
Dual-variant architecture design: The Instruct variant is optimized for low-latency scenarios, while the Thinking variant strengthens planning and reflection. Keeping the two separate lets each serve a different class of application.
End-to-end training framework: It forms a closed loop from data generation through model training to reinforcement learning optimization, supporting unified learning and transfer across platforms and tasks.
Open-source ecosystem compatibility: Built on and optimized from the Qwen3 series architecture, it is compatible with the mainstream AI development ecosystem and supports one-click deployment from the ModelScope and HuggingFace model repositories.
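
The text above does not describe how MRPO works internally, so the following is not MRPO. It is a minimal sketch of a generic, widely used idea that is relevant to the same problem: standardizing rewards within each platform group, so that a platform with a larger reward scale does not dominate a shared policy update. The function name, the sample format, and the example rewards are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def normalize_per_platform(
    samples: List[Tuple[str, float]], eps: float = 1e-8
) -> List[float]:
    """Standardize each reward against rewards from the same platform.

    samples: (platform_name, episode_reward) pairs (hypothetical format).
    Returns one normalized score per sample, in the input order.
    """
    # Group rewards by platform.
    by_platform: Dict[str, List[float]] = defaultdict(list)
    for platform, reward in samples:
        by_platform[platform].append(reward)

    # Per-platform mean and population standard deviation.
    stats = {p: (mean(r), pstdev(r)) for p, r in by_platform.items()}

    # eps avoids division by zero when all rewards in a group are equal.
    return [(reward - stats[p][0]) / (stats[p][1] + eps)
            for p, reward in samples]


# Made-up rewards: the desktop scale is 10x the mobile scale.
batch = [("android", 1.0), ("android", 0.0),
         ("desktop", 10.0), ("desktop", 0.0)]
print(normalize_per_platform(batch))
```

After normalization, the successful episode on each platform gets roughly the same score even though the raw desktop reward is ten times larger. Whether MRPO uses anything like this is not stated in the source.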

Project address for Mobile-Agent-v3.5

GitHub repository: https://github.com/X-PLUG/MobileAgent

Application scenarios of Mobile-Agent-v3.5

Intelligent device automation: It operates a phone automatically to use apps, look up information, and adjust settings, for example ordering takeout, checking the weather, or managing a schedule.
Cross-platform office assistance: It automates repetitive office tasks such as document processing, sending email, meeting scheduling, and data entry on Windows, macOS, and Ubuntu desktops.
Automated web testing: It supports automated operation in the browser and suits scenarios such as web application testing, form filling, data collection, and e-commerce price comparison.
Edge AI assistant deployment: Using the lightweight 2B and 4B models, it enables low-latency local GUI automation assistants on mobile phones, IoT devices, and other edge devices.
Enterprise process automation: In line with RPA requirements, it automates interface operations in enterprise systems such as ERP and CRM, improving business process efficiency.
Accessibility tools: It helps users with visual or motor impairments complete complex interface interactions automatically, lowering the barrier to using digital devices.