Computer Use (Beta)
Beta Feature Notice
The upgraded Claude 3.5 Sonnet model can interact with a computer desktop environment through specialized tools that control the mouse and keyboard and take screenshots.
Safety Considerations
Computer use poses unique risks distinct from standard API features. Consider these key precautions:
- Use a dedicated virtual machine or container with minimal privileges
- Avoid exposing sensitive data or login information
- Limit internet access to an allowlist of domains
- Require human confirmation for actions with meaningful real-world consequences and for tasks requiring affirmative consent (cookies, financial transactions, terms of service)
- Be aware that Claude may follow commands found in content, even when they conflict with user instructions
- Take precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection
- Inform end users of relevant risks and obtain consent before enabling computer use in products
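As one illustration of the allowlist precaution above, the following minimal sketch checks a URL against a set of permitted domains before the environment is allowed to open it. The `ALLOWED_DOMAINS` set and the `is_allowed` helper are hypothetical names chosen for this example. An application-level check like this is only a complement to network-level restrictions on the virtual machine or container, not a replacement for them.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the domains your task actually needs.
ALLOWED_DOMAINS = frozenset({"example.com", "docs.example.org"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is an allowlisted
    domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Require an exact match or a dot boundary so that
    # "evil-example.com" does not pass as "example.com".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A caller might run this check on every navigation request and refuse, or ask a human, when it returns False.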
How Computer Use Works
- Provide Tools and Prompt
  Add Anthropic-defined computer use tools to your API request and include a user prompt that might require these tools.

      {
        "messages": [
          { "role": "user", "content": "Save a picture of a cat to my desktop" }
        ],
        "tools": [
          { "type": "computer_20241022", "name": "computer" }
        ]
      }

- Tool Selection
  Claude evaluates the tools and constructs a properly formatted tool use request when needed. The API responds with a stop_reason of tool_use.
- Tool Execution
  Your application extracts the tool request, executes it in a controlled environment, and returns the results to Claude.
- Completion Loop
  Claude continues using tools as needed until the task is complete, forming an "agent loop" of tool use and evaluation.
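The four steps above can be sketched as a small loop. This is a minimal illustration, not the reference implementation: `send` and `execute_tool` are hypothetical callables that the caller supplies (one wraps the API request, the other runs a single tool call inside the sandbox), and responses are modeled as plain dictionaries with `stop_reason` and `content` keys. The `tool_use` and `tool_result` block shapes are written as assumed from the Messages API and should be checked against the API reference.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    send: Callable[[list[Message]], Message],
    execute_tool: Callable[[str, dict[str, Any]], str],
    messages: list[Message],
    max_turns: int = 10,
) -> Message:
    """Call `send` repeatedly until the response stops requesting tools.

    Both callables are hypothetical stand-ins provided by the caller.
    `messages` is extended in place with every assistant and tool turn.
    """
    for _ in range(max_turns):
        response = send(messages)
        # Keep the assistant turn so the next request carries full history.
        messages.append({"role": "assistant", "content": response["content"]})
        if response.get("stop_reason") != "tool_use":
            return response  # No tool requested: the task is finished.
        results = []
        for block in response["content"]:
            if block.get("type") == "tool_use":
                output = execute_tool(block["name"], block["input"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": output,
                })
        # Tool results go back to Claude as the next user turn.
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")
```

The `max_turns` cap is a design choice for this sketch: it keeps a confused or injected task from looping indefinitely.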
Getting Started
Reference Implementation
We provide a complete reference implementation including:
- Containerized environment
- Tool implementations
- Agent loop implementation
- Web interface
Try our reference implementation before diving into the documentation.
Optimization Tips
- Specify simple, well-defined tasks and provide explicit instructions for each step
- Prompt Claude to verify actions with screenshots: "After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking."
- Use keyboard shortcuts for UI elements like dropdowns and scrollbars that may be tricky to manipulate
- Include example screenshots and tool calls of successful outcomes for repeatable tasks
- Use the system prompt to provide explicit tips for known tasks
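One way to apply the last two kinds of tip is to assemble the system prompt from a list of task-specific hints and append the screenshot-verification instruction quoted above. The helper below is a hypothetical sketch; the name `build_system_prompt` and the bullet formatting are choices made for this example only.

```python
VERIFY_HINT = (
    "After each step, take a screenshot and carefully evaluate if you have "
    "achieved the right outcome. Explicitly show your thinking."
)


def build_system_prompt(tips: list[str], verify: bool = True) -> str:
    """Join non-empty tips as bullet lines and, if requested, end with
    the screenshot-verification instruction."""
    lines = [f"- {tip.strip()}" for tip in tips if tip.strip()]
    if verify:
        lines.append(VERIFY_HINT)
    return "\n".join(lines)
```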
Anthropic-defined Tools
These beta tools enable Claude to use computers effectively. Claude only requests tool actions; your application must explicitly execute each request and return the result.
Computer Tool
For optimal performance, keep screenshots at XGA (1024x768) or WXGA (1280x800) resolution or lower. Higher resolutions may reduce model accuracy and performance.
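If the real display is larger than the recommended size, one option is to downscale screenshots before sending them and to map the coordinates Claude returns back to real pixels. The sketch below shows only the arithmetic; `fit_within` and `to_screen` are hypothetical helper names, and the default 1024x768 bound is the XGA size.

```python
def fit_within(width: int, height: int,
               max_w: int = 1024, max_h: int = 768) -> tuple[int, int, float]:
    """Return (new_width, new_height, scale) so the image fits inside
    max_w x max_h, preserving aspect ratio and never upscaling."""
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def to_screen(x: int, y: int, scale: float) -> tuple[int, int]:
    """Map a coordinate on the downscaled screenshot back to real pixels."""
    return round(x / scale), round(y / scale)
```

For a 1920x1080 display this yields a 1024x576 screenshot, and a click at (512, 288) on the screenshot maps back to (960, 540) on the display.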
Required Parameters
- display_width_px: The width of the display in pixels
- display_height_px: The height of the display in pixels
- display_number: Display number for X11 environments
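A tool definition carrying these parameters might be built as follows. This is a sketch under assumptions: the field names `display_width_px`, `display_height_px`, and `display_number` should be verified against the API reference, and treating the display number as something that can be omitted is a choice made here, not something this page states.

```python
from typing import Any, Optional


def computer_tool(width_px: int, height_px: int,
                  display_number: Optional[int] = None) -> dict[str, Any]:
    """Build a computer tool entry for the request's "tools" list.

    Field names are assumptions to check against the API reference.
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("display dimensions must be positive")
    tool: dict[str, Any] = {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": width_px,
        "display_height_px": height_px,
    }
    if display_number is not None:
        # Only relevant for X11 environments.
        tool["display_number"] = display_number
    return tool
```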
Available Actions
- key: Press keys (e.g., "Return", "alt+Tab")
- type: Type text strings
- mouse_move: Move cursor to coordinates
- left_click: Click left mouse button
- right_click: Click right mouse button
- double_click: Double-click left button
- screenshot: Capture screen image
- cursor_position: Get cursor coordinates
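Before executing a requested action, an implementation may want to validate it. The sketch below covers the actions named above and assumes the tool input is a dictionary with an `action` field plus a `text` or `coordinate` field where one is needed; those field names are assumptions for illustration, and `validate_action` is a hypothetical helper.

```python
from typing import Any, Optional

# Extra input field each action needs (assumed names), or None if none.
REQUIRED_FIELD: dict[str, Optional[str]] = {
    "key": "text",
    "type": "text",
    "mouse_move": "coordinate",
    "left_click": None,
    "right_click": None,
    "double_click": None,
    "screenshot": None,
    "cursor_position": None,
}


def validate_action(tool_input: dict[str, Any]) -> str:
    """Return the action name, or raise ValueError if the request is
    unknown or is missing the field the action needs."""
    action = tool_input.get("action")
    if action not in REQUIRED_FIELD:
        raise ValueError(f"unknown action: {action!r}")
    field = REQUIRED_FIELD[action]
    if field is not None and field not in tool_input:
        raise ValueError(f"action {action!r} requires {field!r}")
    if field == "coordinate":
        coord = tool_input["coordinate"]
        if (not isinstance(coord, (list, tuple)) or len(coord) != 2
                or any(not isinstance(v, int) or v < 0 for v in coord)):
            raise ValueError("coordinate must be two non-negative integers")
    return action
```

Rejecting unknown or malformed requests up front keeps the executor from guessing at what an unexpected input meant.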
Text Editor Tool
Available Commands
- view: Display file contents or directory structure
- create: Create new file with content
- str_replace: Replace text in file
- insert: Insert text at specific line
- undo_edit: Revert last edit
Usage Notes
- State persists across command calls
- Exact string matching required for replacements
- Non-unique matches will not be replaced
- Long outputs may be truncated
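The exact-match and uniqueness rules for replacements can be sketched as a pure function over the file's text. This is an illustrative implementation of the rule as described above, not the tool's actual code; `str_replace` here is a hypothetical local helper that operates on a string rather than a file.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only when `old` occurs exactly once.

    Raises ValueError when there is no exact match or when the match
    is ambiguous, so the caller can report the problem back to Claude.
    """
    if not old:
        raise ValueError("old string must not be empty")
    count = text.count(old)
    if count == 0:
        raise ValueError("no exact match found")
    if count > 1:
        raise ValueError(f"{count} matches found; add context to make it unique")
    return text.replace(old, new, 1)
```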
Bash Tool
Features
- Access to common Linux and Python packages
- Persistent state across commands
- Background process support
- No XML escaping required for commands
Avoid commands that produce excessive output or require long execution times. Use background processes for long-running tasks.
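An executor can guard against both problems with a time limit and an output cap. The sketch below uses the standard library's `subprocess.run`; `run_command` and its default limits are hypothetical choices for this example. It runs each command as a fresh process, so it does not provide the persistent state that the Bash tool offers.

```python
import subprocess


def run_command(argv: list[str], timeout_s: float = 30.0,
                max_chars: int = 4000) -> str:
    """Run one command with a time limit and return its combined
    stdout and stderr, truncated to at most max_chars characters."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return f"error: timed out after {timeout_s} seconds"
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        dropped = len(output) - max_chars
        # Tell the model that output was cut so it does not assume completeness.
        output = output[:max_chars] + f"\n[truncated {dropped} characters]"
    return output
```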
Advanced Topics
Current Limitations
While Claude's computer use capabilities are cutting edge, developers should be aware of these key limitations:
Performance Limitations
- Latency: Current interaction speed may be slower than human operation. Best suited for non-time-critical tasks.
- Computer Vision Accuracy: Potential mistakes or hallucinations when interpreting visual elements and coordinates.
- Tool Selection Reliability: May make errors in tool selection or take unexpected actions, especially with niche applications.
Interaction Limitations
- Scrolling: May be unreliable. Use the PgUp/PgDown keys as an alternative.
- Spreadsheet Interaction: Mouse-based cell selection can be unreliable. Prefer arrow keys.
- Social Platform Limitations: Restricted ability to create accounts or generate content on social platforms.
Security Considerations
- Vulnerabilities: Potential for jailbreaking or prompt injection from webpage content.
- Content Override: May follow commands found in content, potentially conflicting with user instructions.
Building Custom Environments
While our reference implementation provides a starting point, you can build custom environments tailored to your needs.
Required Components
- Virtualized or containerized environment
- Implementation of Anthropic-defined tools
- Agent loop for API interaction
- User interface for input/output
Best Practices
- Implement strict security controls
- Monitor and log all actions
- Provide clear user feedback
- Include error handling and recovery
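The logging and error-handling practices can be combined in a thin wrapper around tool execution. In this sketch, `execute_tool` is a hypothetical callable supplied by the caller, and the returned dictionary follows the `tool_result` shape as assumed here, including an `is_error` flag; check both against the API reference before relying on them.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("computer_use")


def safe_execute(
    execute_tool: Callable[[str, dict[str, Any]], str],
    tool_use_id: str,
    name: str,
    tool_input: dict[str, Any],
) -> dict[str, Any]:
    """Log the call, run it, and always return a tool_result block.

    Failures are reported back as an error result instead of crashing
    the agent loop, so Claude can see what went wrong and recover.
    """
    logger.info("tool=%s input=%r", name, tool_input)
    try:
        output = execute_tool(name, tool_input)
    except Exception as exc:  # Broad on purpose: any failure becomes a result.
        logger.exception("tool %s failed", name)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
```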
Pricing Information
Computer use requests are priced like standard Claude API requests. The tool-use system prompt and each tool definition add the tokens listed below.
Base System Prompt Tokens
| Model | Tool Choice | System Prompt Tokens |
|---|---|---|
| Claude 3.5 Sonnet | auto | 466 tokens |
| Claude 3.5 Sonnet | tool | 499 tokens |
Additional Tool Tokens
| Tool | Additional Tokens |
|---|---|
| computer_20241022 | 683 tokens |
| text_editor_20241022 | 700 tokens |
| bash_20241022 | 245 tokens |
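As a worked example of the two tables, the fixed overhead of a request can be estimated by adding the system prompt tokens for the chosen tool choice to the tokens for each tool included. That the figures simply add up is an assumption of this sketch, and `overhead_tokens` is a hypothetical helper; the numbers are copied from the tables above.

```python
SYSTEM_PROMPT_TOKENS = {"auto": 466, "tool": 499}
TOOL_TOKENS = {
    "computer_20241022": 683,
    "text_editor_20241022": 700,
    "bash_20241022": 245,
}


def overhead_tokens(tools: list[str], tool_choice: str = "auto") -> int:
    """Estimate fixed tokens added by the system prompt and tool definitions."""
    return SYSTEM_PROMPT_TOKENS[tool_choice] + sum(TOOL_TOKENS[t] for t in tools)


# All three tools with tool choice "auto": 466 + 683 + 700 + 245 = 2094
print(overhead_tokens(list(TOOL_TOKENS)))
```

Screenshots, prompts, and tool results add their own tokens on top of this fixed amount.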