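Introduction

DroidRun is a framework for controlling Android devices through LLM agents. It lets you automate interactions with an Android device using natural-language commands.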

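Features

Control Android devices with natural-language commands
Support for multiple LLM providers (OpenAI, Anthropic, Gemini)
An easy-to-use CLI
An extensible Python API for custom automation
Screenshot analysis for a visual understanding of the device state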

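Getting Started

Prerequisites

The adb tool installed on your system
An Android device or emulator connected via USB or via ADB over TCP/IP
The DroidRun Portal app installed on the Android device or emulator
An API key for at least one supported LLM provider (OpenAI, Anthropic, Gemini)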
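
The first two prerequisites can be checked from Python before going further. The following is a minimal sketch, assuming `adb devices` prints its usual header line followed by one whitespace-separated `serial state` row per device. The helper names `parse_adb_devices` and `list_connected_devices` are illustrative and are not part of DroidRun.

```python
import shutil
import subprocess


def parse_adb_devices(output):
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def list_connected_devices():
    """Run `adb devices` and return the parsed list; raises if adb is missing."""
    if shutil.which("adb") is None:
        raise RuntimeError("adb was not found on PATH")
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

Calling `list_connected_devices()` on a machine with an emulator running should return something like `[("emulator-5554", "device")]`. A state of `device` means the connection is ready, while `unauthorized` or `offline` usually means the connection still needs attention.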

Create a New Project

Open PyCharm and create a new project named droidrun-demo.

Set Up the Environment

Install Dependencies

Open a terminal in PyCharm and run:

pip install droidrun

Install DroidRun Portal

Download link: https://github.com/droidrun/droidrun-portal

After the download finishes, install the APK with adb:

adb install /Users/wan/Downloads/App/droidrun.apk

Set the API Key

Open a terminal in the droidrun-demo directory and run:

touch .env

I use Gemini, so the .env file contains:

# Choose at least one of these based on your preferred provider
export GEMINI_API_KEY="***"

Save the file, then run:

source .env
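
Before running the agent, it can help to confirm that a key is actually visible to Python. The sketch below is one way to do that. Only `GEMINI_API_KEY` appears in this post; the `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` names are assumed by analogy and should be checked against the DroidRun documentation. `find_provider` is a hypothetical helper, not a DroidRun function.

```python
import os

# Only GEMINI_API_KEY is taken from this post; the other two names are assumptions.
PROVIDER_ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def find_provider(env=None):
    """Return the first provider whose API key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for provider, var_name in PROVIDER_ENV_VARS.items():
        if env.get(var_name, "").strip():
            return provider
    return None
```

If `find_provider()` returns `None` in the same terminal session, the `source .env` step probably did not take effect in that shell.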

Write the Script

# coding=utf-8
import asyncio
import os

from droidrun.agent.react_agent import ReActAgent
from droidrun.agent.llm_reasoning import LLMReasoner
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()


async def main():
    # Create an LLM instance
    llm = LLMReasoner(
        llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"
        model_name="gemini-2.5-pro-preview-03-25",  # Choose appropriate model for your provider
        api_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environment
        temperature=0.2,
        vision=True,  # Enable vision capabilities
    )

    # Create and run the agent
    agent = ReActAgent(
        task="打开Chrome，搜索droidrun",  # "Open Chrome and search for droidrun"
        llm=llm,
        vision=True,  # Enable screenshot analysis
    )
    steps = await agent.run()
    print(f"Execution completed with {len(steps)} steps")


if __name__ == "__main__":
    asyncio.run(main())

Run the Script

Output

INFO:droidrun:Vision capabilities enabled: screenshot tool available
INFO:droidrun:Using provided LLM reasoner: provider=gemini, model=gemini-2.5-pro-preview-03-25
INFO:droidrun:Using first available device: emulator-5554
INFO:droidrun:Connected to device: emulator-5554
INFO:droidrun:🎯 GOAL: 打开Chrome，搜索droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0, 'api_calls': 0}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #1
INFO:droidrun:This call: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:Cumulative: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:droidrun:🤔 THOUGHT: I need to open the Chrome browser first. I will check the clickable elements on the current screen to see if the Chrome icon is present.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁️ OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #2
INFO:droidrun:This call: 785 prompt + 174 completion = 959 tokens
INFO:droidrun:Cumulative: 1498 prompt + 399 completion = 1897 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:droidrun:🤔 THOUGHT: The previous attempt to get clickable elements failed. This could be due to various reasons, possibly a temporary issue with the UI automation framework or the device state. To get a better understanding of the current screen and confirm if Chrome is visible, I will take a screenshot.
INFO:droidrun:🔄 ACTION: take_screenshot()
INFO:droidrun:Screenshot compressed successfully: 588.2KB → 56.8KB (90.3% reduction)
INFO:droidrun:👁️ OBSERVATION: Screenshot captured and available for analysis
INFO:droidrun:Token usage before API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #3
INFO:droidrun:This call: 1139 prompt + 196 completion = 1335 tokens
INFO:droidrun:Cumulative: 2637 prompt + 595 completion = 3232 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:droidrun:🤔 THOUGHT: The screenshot confirms that the Chrome icon is visible on the home screen dock. I need to tap it to open the browser. I will attempt to get the clickable elements again to find the index for the Chrome icon.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁️ OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #4
INFO:droidrun:This call: 947 prompt + 312 completion = 1259 tokens
INFO:droidrun:Cumulative: 3584 prompt + 907 completion = 4491 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the Chrome icon's index for tapping. However, I know the goal is to open Chrome. A more robust way to achieve this, especially when UI interaction tools are unreliable, is to start the app directly using its package name. The package name for Google Chrome is 'com.android.chrome'. I will use the `start_app` tool to launch Chrome.
INFO:droidrun:🔄 ACTION: start_app(package=com.android.chrome)
INFO:droidrun:👁️ OBSERVATION: Started com.android.chrome
INFO:droidrun:Token usage before API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #5
INFO:droidrun:This call: 1068 prompt + 144 completion = 1212 tokens
INFO:droidrun:Cumulative: 4652 prompt + 1051 completion = 5703 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:droidrun:🤔 THOUGHT: Chrome has been successfully launched. Now I need to find the search/address bar to input the search query 'droidrun'. I will use `get_clickables` to identify the elements on the current Chrome screen.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁️ OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #6
INFO:droidrun:This call: 1156 prompt + 980 completion = 2136 tokens
INFO:droidrun:Cumulative: 5808 prompt + 2031 completion = 7839 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the search bar in Chrome. However, when Chrome starts, the address bar is often focused by default. I will attempt to directly input the search query 'droidrun' assuming the address bar is already focused.
INFO:droidrun:🔄 ACTION: input_text(text=droidrun)
INFO:droidrun:👁️ OBSERVATION: Text input completed: droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #7
INFO:droidrun:This call: 1241 prompt + 277 completion = 1518 tokens
INFO:droidrun:Cumulative: 7049 prompt + 2308 completion = 9357 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:droidrun:🤔 THOUGHT: I have successfully inputted the search query 'droidrun' into what I assume is the address/search bar in Chrome. The next logical step to perform the search is to press the Enter key. The keycode for Enter is 66. I will use the `press_key` tool to simulate this action.
INFO:droidrun:🔄 ACTION: press_key(keycode=66)
INFO:droidrun:👁️ OBSERVATION: Pressed key 66
INFO:droidrun:Token usage before API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #8
INFO:droidrun:This call: 1331 prompt + 300 completion = 1631 tokens
INFO:droidrun:Cumulative: 8380 prompt + 2608 completion = 10988 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 8380, 'completion_tokens': 2608, 'total_tokens': 10988, 'api_calls': 8}
INFO:droidrun:🤔 THOUGHT: The sequence of actions indicates that Chrome was opened, the search term 'droidrun' was entered, and the Enter key was pressed. This completes the user's request to open Chrome and search for 'droidrun'. The search results page should now be displayed in Chrome. Therefore, the goal has been achieved.
INFO:droidrun:🔄 ACTION: complete(result=Opened Chrome and initiated a search for 'droidrun'.)
INFO:droidrun:👁️ OBSERVATION: Task completed: Opened Chrome and initiated a search for 'droidrun'.
INFO:droidrun:👁️ OBSERVATION: Goal achieved in 8 steps.
===== Final Token Usage and Cost =====
Total Tokens Used: 10,988
Total API Calls: 8
Estimated Cost: $0.0011
===================================
Summary: Task completed: Opened Chrome and initiated a search for 'droidrun'.
Execution completed with 26 steps
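
The per-call token lines in the log above can be totalled with a few lines of Python, which is handy for comparing runs. This is a rough sketch that assumes the `This call: N prompt + M completion = T tokens` line format seen in this particular run; the format may differ in other DroidRun versions. `summarize_token_usage` is an illustrative name, not a DroidRun function.

```python
import re

# Matches lines such as:
# INFO:droidrun:This call: 713 prompt + 225 completion = 938 tokens
CALL_PATTERN = re.compile(
    r"This call: (\d+) prompt \+ (\d+) completion = (\d+) tokens"
)


def summarize_token_usage(log_text):
    """Sum the per-call token counts found in a saved log."""
    summary = {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "api_calls": 0,
    }
    for match in CALL_PATTERN.finditer(log_text):
        prompt, completion, total = (int(group) for group in match.groups())
        summary["prompt_tokens"] += prompt
        summary["completion_tokens"] += completion
        summary["total_tokens"] += total
        summary["api_calls"] += 1
    return summary
```

Applied to the full log above, the eight per-call lines add up to 8380 prompt and 2608 completion tokens, which matches the reported total of 10,988 tokens over 8 API calls.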

For more documentation, see https://www.droidrun.ai/