Chinese AI Startup Z.ai Releases AutoGLM-Phone Model To Control Phones With Natural Language

Chinese AI companies are not only building foundation models that compete with their US counterparts; they are also building models for more specialized use cases.

Z.ai, a prominent Chinese AI developer, has released Phone Agent, a multimodal AI system built on its AutoGLM framework that can understand smartphone interfaces and autonomously control Android devices through natural language commands.

The system works by connecting to Android devices via ADB (Android Debug Bridge) and using vision-language models to interpret what’s displayed on screen. Users can issue straightforward commands like “Open Xiaohongshu and search for food recommendations,” and Phone Agent will parse the intent, analyze the current user interface, plan the necessary steps, and execute the entire workflow automatically.
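The screenshot, plan, act loop described above can be sketched in a few lines of Python. This is a minimal illustration and not Phone Agent's actual code: the action dictionary format, the `plan_next_action` callback (a stand-in for the vision-language model), and every function name here are hypothetical. Only the `adb` subcommands (`exec-out screencap -p` and `shell input tap/swipe/text/keyevent`) are standard Android Debug Bridge commands.

```python
import subprocess


def adb_base(serial=None):
    """Return the adb command prefix, optionally targeting one device."""
    return ["adb", "-s", serial] if serial else ["adb"]


def action_to_adb_args(action, serial=None):
    """Translate a (hypothetical) model action dict into an adb argument list."""
    base = adb_base(serial)
    kind = action["type"]
    if kind == "tap":
        return base + ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
        return base + ["shell", "input", "swipe"] + coords
    if kind == "text":
        # `input text` uses %s to represent a space; other shell-special
        # characters would need extra escaping in a real tool.
        return base + ["shell", "input", "text", action["text"].replace(" ", "%s")]
    if kind == "key":
        return base + ["shell", "input", "keyevent", action["key"]]
    raise ValueError(f"unsupported action type: {kind!r}")


def capture_screenshot(serial=None):
    """Grab the current screen as PNG bytes."""
    result = subprocess.run(
        adb_base(serial) + ["exec-out", "screencap", "-p"],
        check=True,
        capture_output=True,
    )
    return result.stdout


def execute_action(action, serial=None):
    """Run one action on the device."""
    subprocess.run(action_to_adb_args(action, serial), check=True)


def run_task(task, plan_next_action, capture=capture_screenshot,
             execute=execute_action, max_steps=20):
    """Loop: look at the screen, ask the model for one action, perform it.

    Returns True if the model reports the task as done, False if the
    step budget runs out first.
    """
    for _ in range(max_steps):
        screenshot = capture()
        action = plan_next_action(task, screenshot)
        if action["type"] == "done":
            return True
        execute(action)
    return False
```

Passing `capture` and `execute` as parameters keeps the loop testable without a connected device, and the step limit guards against a model that never declares the task finished.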

Phone Agent incorporates safety features designed for real-world deployment. The system includes confirmation prompts for sensitive actions and implements human-in-the-loop fallbacks for scenarios requiring manual intervention, such as login procedures or verification code entry. The framework also supports remote ADB debugging, enabling device connection over WiFi or network for flexible remote control and development workflows.
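One way to picture the confirmation and takeover behavior is a small gate that runs before each action. The sketch below is an assumption about how such a gate could look, not a description of Phone Agent's implementation: the category names and the `gate_action` helper are hypothetical. The `adb connect HOST:PORT` command used for the remote connection is standard ADB.

```python
# Hypothetical categories; a real system would define its own policy.
SENSITIVE_CATEGORIES = {"payment", "delete", "send_message"}
TAKEOVER_CATEGORIES = {"login", "verification_code"}


def gate_action(action, confirm, takeover):
    """Decide what to do with an action before it reaches the device.

    confirm(action)  -> bool, asks the user to approve a sensitive action
    takeover(action) -> None, hands control to the user (e.g. to type a code)
    Returns "execute", "skipped", or "handed_over".
    """
    category = action.get("category", "routine")
    if category in TAKEOVER_CATEGORIES:
        takeover(action)
        return "handed_over"
    if category in SENSITIVE_CATEGORIES and not confirm(action):
        return "skipped"
    return "execute"


def adb_connect_args(host, port=5555):
    """Build the adb command for connecting to a device over the network.

    The device must already be listening for TCP connections, for example
    after running `adb tcpip 5555` while attached over USB.
    """
    return ["adb", "connect", f"{host}:{port}"]
```

For example, `gate_action({"category": "payment"}, confirm=lambda a: False, takeover=print)` returns `"skipped"`, so a declined payment never reaches the device.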

The model architecture mirrors Z.ai’s GLM-4.1V-9B-Thinking model, and the company has made both the framework and model available as open source. Detailed deployment instructions are available through Z.ai’s GitHub repository and the GLM-V repository.

Z.ai was founded in 2019 as Beijing Zhipu Huazhang Technology and later adopted the Z.ai name as its international brand. The company was started by Tsinghua University researchers focused on advancing artificial intelligence, particularly large language models. It has grown rapidly, earning recognition as one of China’s “AI Tigers” and the third-largest LLM player in the domestic market by 2024. Key backers include Alibaba, Tencent, Ant Group, Meituan, Xiaomi, and HongShan, with a notable 2023 funding round of approximately $350 million. Z.ai specializes in the GLM series of models, with recent releases like GLM-4.5 and GLM-4.6 emphasizing reasoning, coding, agentic tasks, vision, and video generation.
