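Introduction
DroidRun is a powerful framework for controlling Android devices through LLM agents. It lets you automate interactions with an Android device using natural-language commands.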
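Features
- Control Android devices with natural-language commands
- Support for multiple LLM providers (OpenAI, Anthropic, Gemini)
- Easy-to-use CLI
- Extensible Python API for custom automation
- Screenshot analysis, which gives the agent a visual understanding of the device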
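Getting Started
Prerequisites
- The adb tool is installed on your system
- An Android device or emulator is connected via USB or via ADB over TCP/IP
- The DroidRun Portal app is installed on the Android device or emulator
- You have an API key for at least one supported LLM provider (OpenAI, Anthropic, Gemini)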
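Two of the prerequisites above can be checked from Python before running anything else. The following is a minimal sketch, not part of DroidRun: the function name `check_prerequisites` is made up for illustration, and the environment variable names other than `GEMINI_API_KEY` are assumptions based on common provider conventions.

```python
import os
import shutil

# Variable names are assumptions; this article itself only uses GEMINI_API_KEY.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("adb") is None:
        problems.append("adb was not found on PATH")
    # At least one provider key must be set to a non-empty value.
    if not any(env.get(name) for name in PROVIDER_KEYS):
        problems.append("no LLM provider API key is set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment. The device connection and the Portal app are not covered here.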
Create a new project
Open PyCharm and create a new project named droidrun-demo.
Set up the environment
Install dependencies
Open a terminal in PyCharm and run:
pip install droidrun
Install DroidRun Portal
Download link: https://github.com/droidrun/droidrun-portal
Once the download finishes, install the APK with adb:
adb install /Users/wan/Downloads/App/droidrun.apk
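To confirm that the install succeeded, one option is to look for the app in the device's package list. The sketch below parses the output of `adb shell pm list packages`, which prints one `package:<name>` entry per line. The package name `com.droidrun.portal` is an assumption and should be verified against your own device; the helper names are hypothetical.

```python
import subprocess

# Assumed package name for the DroidRun Portal app; verify it on your own device.
PORTAL_PACKAGE = "com.droidrun.portal"


def parse_packages(pm_output):
    """Turn `pm list packages` output ("package:<name>" per line) into a set of names."""
    prefix = "package:"
    return {
        line.strip()[len(prefix):]
        for line in pm_output.splitlines()
        if line.strip().startswith(prefix)
    }


def is_portal_installed(package=PORTAL_PACKAGE):
    """Ask the connected device for its package list via adb and look for `package`."""
    result = subprocess.run(
        ["adb", "shell", "pm", "list", "packages"],
        capture_output=True, text=True, check=True,
    )
    return package in parse_packages(result.stdout)
```

Calling `is_portal_installed()` requires a connected device or emulator; `parse_packages` can be tried on any captured output.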
Set the API key
Open a terminal in the droidrun-demo directory and run:
touch .env
I use Gemini, so my configuration looks like this:
# Choose at least one of these based on your preferred provider
export GEMINI_API_KEY="***"
Save the file, then run:
source .env
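Note that `source .env` only sets the variable in the current shell session, so a script started some other way (for example from PyCharm's Run button) will not inherit it. That is why the script below also reads the file itself through `load_dotenv()`. As a rough illustration of what reading such a line involves, here is a simplified parser. It is a sketch only: `parse_env_line` is a hypothetical helper, and python-dotenv's real parser handles more cases (escapes, multi-line values, inline comments).

```python
def parse_env_line(line):
    """Parse one `.env` line into (name, value), or return None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # A leading `export ` makes the file usable with `source`; it is skipped here.
    if line.startswith("export "):
        line = line[len("export "):].lstrip()
    name, sep, value = line.partition("=")
    if not sep:
        return None
    value = value.strip()
    # Drop one pair of matching surrounding quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return name.strip(), value
```

With this shape, the same `.env` file works both for `source .env` in a terminal and for a script that loads it directly.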
Write the script
# coding=utf-8
import asyncio
import os
from droidrun.agent.react_agent import ReActAgent
from droidrun.agent.llm_reasoning import LLMReasoner
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()


async def main():
    # Create an LLM instance
    llm = LLMReasoner(
        llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"
        model_name="gemini-2.5-pro-preview-03-25",  # Choose appropriate model for your provider
        api_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environment
        temperature=0.2,
        vision=True,  # Enable vision capabilities
    )

    # Create and run the agent
    agent = ReActAgent(
        task="打开Chrome,搜索droidrun",  # "Open Chrome and search for droidrun"
        llm=llm,
        vision=True,  # Enable screenshot analysis
    )
    steps = await agent.run()
    print(f"Execution completed with {len(steps)} steps")


if __name__ == "__main__":
    asyncio.run(main())
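Because an agent run depends on both the device and a remote LLM, it can be useful to cap how long a single run may take. The sketch below uses the standard `asyncio.wait_for` for that. It is an optional addition, not something the script above or DroidRun does; `run_with_timeout` is a hypothetical helper and `fake_agent_run` is a stand-in for `agent.run()` so that the example runs without a device.

```python
import asyncio


async def run_with_timeout(coro, seconds):
    """Await `coro`, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        return None


async def fake_agent_run(delay):
    # Hypothetical stand-in for `agent.run()`, so the sketch runs without a device.
    await asyncio.sleep(delay)
    return ["step"]


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout(fake_agent_run(0), seconds=1)))
```

In the real script, the call would presumably become `steps = await run_with_timeout(agent.run(), 120)`, with the caller handling a `None` result.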
Run the script
Output
INFO:droidrun:Vision capabilities enabled: screenshot tool available
INFO:droidrun:Using provided LLM reasoner: provider=gemini, model=gemini-2.5-pro-preview-03-25
INFO:droidrun:Using first available device: emulator-5554
INFO:droidrun:Connected to device: emulator-5554
INFO:droidrun:🎯 GOAL: 打开Chrome,搜索droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0, 'api_calls': 0}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #1
INFO:droidrun:This call: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:Cumulative: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:droidrun:🤔 THOUGHT: I need to open the Chrome browser first. I will check the clickable elements on the current screen to see if the Chrome icon is present.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁️ OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #2
INFO:droidrun:This call: 785 prompt + 174 completion = 959 tokens
INFO:droidrun:Cumulative: 1498 prompt + 399 completion = 1897 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:droidrun:🤔 THOUGHT: The previous attempt to get clickable elements failed. This could be due to various reasons, possibly a temporary issue with the UI automation framework or the device state. To get a better understanding of the current screen and confirm if Chrome is visible, I will take a screenshot.
INFO:droidrun:🔄 ACTION: take_screenshot()
INFO:droidrun:Screenshot compressed successfully: 588.2KB → 56.8KB (90.3% reduction)
INFO:droidrun:👁️ OBSERVATION: Screenshot captured and available for analysis
INFO:droidrun:Token usage before API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #3
INFO:droidrun:This call: 1139 prompt + 196 completion = 1335 tokens
INFO:droidrun:Cumulative: 2637 prompt + 595 completion = 3232 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:droidrun:🤔 THOUGHT: The screenshot confirms that the Chrome icon is visible on the home screen dock. I need to tap it to open the browser. I will attempt to get the clickable elements again to find the index for the Chrome icon.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁️ OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #4
INFO:droidrun:This call: 947 prompt + 312 completion = 1259 tokens
INFO:droidrun:Cumulative: 3584 prompt + 907 completion = 4491 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the Chrome icon's index for tapping. However, I know the goal is to open Chrome. A more robust way to achieve this, especially when UI interaction tools are unreliable, is to start the app directly using its package name. The package name for Google Chrome is 'com.android.chrome'. I will use the `start_app` tool to launch Chrome.
INFO:droidrun:🔄 ACTION: start_app(package=com.android.chrome)
INFO:droidrun:👁️ OBSERVATION: Started com.android.chrome
INFO:droidrun:Token usage before API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #5
INFO:droidrun:This call: 1068 prompt + 144 completion = 1212 tokens
INFO:droidrun:Cumulative: 4652 prompt + 1051 completion = 5703 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:droidrun:🤔 THOUGHT: Chrome has been successfully launched. Now I need to find the search/address bar to input the search query 'droidrun'. I will use `get_clickables` to identify the elements on the current Chrome screen.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁️ OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #6
INFO:droidrun:This call: 1156 prompt + 980 completion = 2136 tokens
INFO:droidrun:Cumulative: 5808 prompt + 2031 completion = 7839 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the search bar in Chrome. However, when Chrome starts, the address bar is often focused by default. I will attempt to directly input the search query 'droidrun' assuming the address bar is already focused.
INFO:droidrun:🔄 ACTION: input_text(text=droidrun)
INFO:droidrun:👁️ OBSERVATION: Text input completed: droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #7
INFO:droidrun:This call: 1241 prompt + 277 completion = 1518 tokens
INFO:droidrun:Cumulative: 7049 prompt + 2308 completion = 9357 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:droidrun:🤔 THOUGHT: I have successfully inputted the search query 'droidrun' into what I assume is the address/search bar in Chrome. The next logical step to perform the search is to press the Enter key. The keycode for Enter is 66. I will use the `press_key` tool to simulate this action.
INFO:droidrun:🔄 ACTION: press_key(keycode=66)
INFO:droidrun:👁️ OBSERVATION: Pressed key 66
INFO:droidrun:Token usage before API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #8
INFO:droidrun:This call: 1331 prompt + 300 completion = 1631 tokens
INFO:droidrun:Cumulative: 8380 prompt + 2608 completion = 10988 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 8380, 'completion_tokens': 2608, 'total_tokens': 10988, 'api_calls': 8}
INFO:droidrun:🤔 THOUGHT: The sequence of actions indicates that Chrome was opened, the search term 'droidrun' was entered, and the Enter key was pressed. This completes the user's request to open Chrome and search for 'droidrun'. The search results page should now be displayed in Chrome. Therefore, the goal has been achieved.
INFO:droidrun:🔄 ACTION: complete(result=Opened Chrome and initiated a search for 'droidrun'.)
INFO:droidrun:👁️ OBSERVATION: Task completed: Opened Chrome and initiated a search for 'droidrun'.
INFO:droidrun:👁️ OBSERVATION: Goal achieved in 8 steps.
===== Final Token Usage and Cost =====
Total Tokens Used: 10,988
Total API Calls: 8
Estimated Cost: $0.0011
===================================
Summary: Task completed: Opened Chrome and initiated a search for 'droidrun'.
Execution completed with 26 steps
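A log of this length is easier to judge once it is condensed. The following sketch sums the per-call token counts and counts failed tool executions by tool name. The regular expressions are written against the line formats visible in the output above and are an assumption about how stable those formats are; `summarize_log` is a hypothetical helper, not a DroidRun API.

```python
import re

# Patterns match the line formats seen in the log above.
CALL_RE = re.compile(r"This call: (\d+) prompt \+ (\d+) completion = (\d+) tokens")
TOOL_ERROR_RE = re.compile(r"^ERROR:droidrun:Error executing tool (\w+):")


def summarize_log(lines):
    """Sum per-call token counts and count failed tool executions by tool name."""
    summary = {"api_calls": 0, "prompt": 0, "completion": 0, "total": 0, "tool_errors": {}}
    for line in lines:
        call = CALL_RE.search(line)
        if call:
            prompt, completion, total = map(int, call.groups())
            summary["api_calls"] += 1
            summary["prompt"] += prompt
            summary["completion"] += completion
            summary["total"] += total
            continue
        error = TOOL_ERROR_RE.match(line)
        if error:
            tool = error.group(1)
            summary["tool_errors"][tool] = summary["tool_errors"].get(tool, 0) + 1
    return summary
```

Applied to the full output above, this yields 8 API calls, 10,988 total tokens, and three failed `get_clickables` calls, which matches the totals the tool prints at the end.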
For more documentation, see https://www.droidrun.ai/
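Summary
I tried out DroidRun and came away with mixed feelings. Compared with traditional app UI automation:
1. Test execution is slow. The script above covers only two steps (open Chrome, then search), yet the run needed 8 LLM API calls.
2. Every test run calls the API and costs money (Estimated Cost: $0.0011 here). Traditional UI automation has no such per-run cost, and the cost adds up as scripts become more complex.
3. The results are also not as good as with traditional UI automation. In this run, get_clickables failed on all three attempts, and the agent had to fall back to launching Chrome by package name and typing without confirming that the address bar was focused.
I don't know how the other LLM providers (OpenAI, Anthropic) perform, or whether DeepSeek is supported. If you are interested, try it out yourself!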
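To put the cost point in numbers, the per-run estimate can be projected onto a larger workload. This is a back-of-the-envelope sketch: it assumes every run costs the same $0.0011 as the sample run, which will not hold for longer tasks, and the workload figures are hypothetical.

```python
COST_PER_RUN_USD = 0.0011  # "Estimated Cost" reported for the sample run


def projected_cost(runs_per_day, days, cost_per_run=COST_PER_RUN_USD):
    """Linear projection: every run is assumed to cost the same as the sample run."""
    return runs_per_day * days * cost_per_run


# Hypothetical workload: 200 test cases, each run once a day for 30 days.
print(f"${projected_cost(200, 30):.2f}")  # prints $6.60
```

The sample task had only two steps, so real test cases with more steps would likely cost more per run than this projection assumes.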