Computer Use (Beta)

Beta Feature Notice

The upgraded Claude 3.5 Sonnet model can interact with a computer desktop environment through specialized tools that control the mouse and keyboard and take screenshots.

Safety Considerations

Computer use poses unique risks distinct from standard API features. Consider these key precautions:

Use a dedicated virtual machine or container with minimal privileges
Avoid exposing sensitive data or login information
Limit internet access to an allowlist of domains
Require human confirmation for actions with meaningful real-world consequences and for tasks requiring affirmative consent (accepting cookies, executing financial transactions, agreeing to terms of service)
Be aware that Claude may follow commands found in content even if they conflict with user instructions
Take precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection
Inform end users of relevant risks and obtain consent before enabling computer use in products
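
Two of the precautions above, the domain allowlist and the human confirmation step, can be sketched at the application level. This is a minimal illustration, not a complete control: the domain names, the action labels, and both function names are hypothetical, and a real deployment would also enforce the allowlist at the network layer (for example with a proxy or firewall).

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains your task actually needs.
ALLOWED_DOMAINS = {"example.com", "docs.example.com"}

# Hypothetical labels for actions that should not run without a human sign-off.
CONSENT_ACTIONS = {"accept_cookies", "financial_transaction", "accept_terms"}


def is_url_allowed(url: str, allowed: set = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is an allowlisted
    domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def needs_human_confirmation(action_label: str) -> bool:
    """Return True when the action must be confirmed by a person first."""
    return action_label in CONSENT_ACTIONS
```

Matching on the parsed hostname, rather than on a substring of the URL, keeps look-alike hosts such as `example.com.evil.io` from passing the check.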

How Computer Use Works

Provide Tools and Prompt

Add Anthropic-defined computer use tools to your API request and include a user prompt that might require these tools.

{
  "messages": [
    { "role": "user", "content": "Save a picture of a cat to my desktop" }
  ],
  "tools": [
    { "type": "computer_20241022", "name": "computer" }
  ]
}

Tool Selection

Claude evaluates the tools and constructs a properly formatted tool use request when needed. The API responds with a stop_reason of tool_use.
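
A minimal sketch of how an application might detect this case and pull out the tool requests. It works on a response that has already been parsed into a dictionary. The field names (`content`, `type`, `id`, `name`, `input`) follow the general Messages API response shape, which this page does not spell out, so treat them as assumptions. The function name is hypothetical.

```python
def pending_tool_calls(response: dict) -> list:
    """Return the tool_use blocks of a response, or an empty list when
    the model did not stop to request a tool."""
    if response.get("stop_reason") != "tool_use":
        return []
    return [
        block
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]
```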

Tool Execution

Your application extracts the tool request, executes it in a controlled environment, and returns the results to Claude.
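
Returning results means sending a user message that contains a `tool_result` block tied to the original request. The sketch below builds such messages for text output and for a screenshot. The block layout follows the general Messages API tool-result format rather than anything shown on this page, and both helper names are hypothetical.

```python
import base64


def tool_result_message(tool_use_id: str, output: str, is_error: bool = False) -> dict:
    """Wrap text output from a tool in a user message Claude can read."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": output,
                "is_error": is_error,
            }
        ],
    }


def screenshot_result_message(tool_use_id: str, png_bytes: bytes) -> dict:
    """Wrap a PNG screenshot as a base64 image inside a tool_result block."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": encoded,
                        },
                    }
                ],
            }
        ],
    }
```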

Completion Loop

Claude continues using tools as needed until the task is complete, forming an "agent loop" of tool use and evaluation.
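
The agent loop can be sketched as follows. To keep the example self-contained, the API call and the tool executor are passed in as callables: `send` and `execute_tool` are hypothetical stand-ins for your own API client and sandboxed tool runner, and `max_turns` is an assumed safety cap that this page does not prescribe.

```python
from typing import Callable


def run_agent_loop(
    send: Callable[[list], dict],
    execute_tool: Callable[[str, dict], str],
    prompt: str,
    max_turns: int = 10,
) -> list:
    """Alternate between model turns and tool execution until the model
    stops requesting tools or the turn cap is reached.

    Returns the full message history.
    """
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        response = send(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response.get("stop_reason") != "tool_use":
            break  # The model considers the task finished.
        results = []
        for block in response["content"]:
            if block.get("type") == "tool_use":
                output = execute_tool(block["name"], block["input"])
                results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block["id"],
                        "content": output,
                    }
                )
        # All results for this turn go back in a single user message.
        messages.append({"role": "user", "content": results})
    return messages
```

Capping the number of turns is a design choice that bounds cost and stops a confused run from looping indefinitely.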

Getting Started

Reference Implementation

We provide a complete reference implementation including:

Containerized environment
Tool implementations
Agent loop implementation
Web interface

Try our reference implementation before diving into the documentation.

Optimization Tips

Specify simple, well-defined tasks and provide explicit instructions for each step
Prompt Claude to verify actions with screenshots: "After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking."
Use keyboard shortcuts for UI elements like dropdowns and scrollbars that may be tricky to manipulate
Include example screenshots and tool calls of successful outcomes for repeatable tasks
Use the system prompt to provide explicit tips for known tasks

Anthropic-defined Tools

These beta tools enable Claude to use computers. The tools are user-executed: Claude only requests tool calls, and your application must explicitly evaluate each request and return the result.

Computer Tool

{ "type": "computer_20241022", "name": "computer" }

For optimal performance, keep screenshots at or below XGA (1024×768) or WXGA (1280×800) resolution. Higher resolutions may reduce model accuracy and slow responses.
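
One way to follow this guidance on a larger display is to downscale screenshots before sending them and to map the coordinates Claude returns back to the real screen. The sketch below shows only the arithmetic. It takes WXGA to be 1280×800 (a common definition) as the ceiling, and the function names are hypothetical.

```python
# Assumed ceiling: WXGA taken as 1280x800.
MAX_WIDTH, MAX_HEIGHT = 1280, 800


def scale_factor(width: int, height: int) -> float:
    """Factor (at most 1.0) that shrinks a screen to fit within the
    ceiling while preserving its aspect ratio."""
    return min(1.0, MAX_WIDTH / width, MAX_HEIGHT / height)


def scaled_size(width: int, height: int) -> tuple:
    """Size of the downscaled screenshot to send to the model."""
    f = scale_factor(width, height)
    return round(width * f), round(height * f)


def to_screen_coords(x: int, y: int, width: int, height: int) -> tuple:
    """Map a point on the downscaled screenshot back to the real screen."""
    f = scale_factor(width, height)
    return round(x / f), round(y / f)
```

For a 2560×1600 display the factor is 0.5, so a click Claude places at (640, 400) on the 1280×800 screenshot corresponds to (1280, 800) on the real screen.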

Parameters

display_width_px (required): The width of the display in pixels
display_height_px (required): The height of the display in pixels
display_number (optional): Display number for X11 environments
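
The parameters above can be combined into a tool definition and a request body along these lines. This is a sketch that only builds dictionaries. The helper names are hypothetical, and the model identifier and `max_tokens` value are assumptions that do not appear on this page.

```python
from typing import Optional


def computer_tool(
    width_px: int, height_px: int, display_number: Optional[int] = None
) -> dict:
    """Build a computer tool definition with the two required display
    dimensions and, when given, the optional X11 display number."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("display dimensions must be positive")
    tool = {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": width_px,
        "display_height_px": height_px,
    }
    if display_number is not None:
        tool["display_number"] = display_number
    return tool


def build_request(prompt: str, tools: list) -> dict:
    """Assemble a request body for a single user prompt."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # assumed model identifier
        "max_tokens": 1024,  # assumed value
        "tools": tools,
        "messages": [{"role": "user", "content": prompt}],
    }
```

For example, `build_request("Save a picture of a cat to my desktop", [computer_tool(1024, 768)])` produces a body for an XGA-sized display.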

Available Actions

key: Press keys (e.g., "Return", "alt+Tab")
type: Type text strings
mouse_move: Move cursor to coordinates
left_click: Click left mouse button
right_click: Click right mouse button
double_click: Double-click left button
screenshot: Capture screen image
cursor_position: Get cursor coordinates
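
Your application has to translate each of these actions into real input events. The sketch below shows one possible dispatcher. `RecordingDesktop` is a hypothetical stand-in that records calls instead of driving a real desktop. The input field names `action`, `text`, and `coordinate` are assumptions about the tool input, since this page does not list them.

```python
class RecordingDesktop:
    """Hypothetical backend that records calls instead of acting on them."""

    def __init__(self):
        self.calls = []
        self.cursor = (0, 0)

    def do(self, name, *args):
        self.calls.append((name, *args))
        if name == "mouse_move":
            self.cursor = args[0]
        return f"ok: {name}"


def run_computer_action(desktop, tool_input: dict) -> str:
    """Route one computer tool request to the matching backend call."""
    action = tool_input.get("action")
    if action in ("key", "type"):
        return desktop.do(action, tool_input["text"])
    if action == "mouse_move":
        x, y = tool_input["coordinate"]
        return desktop.do(action, (x, y))
    if action in ("left_click", "right_click", "double_click", "screenshot"):
        return desktop.do(action)
    if action == "cursor_position":
        x, y = desktop.cursor
        return f"X={x},Y={y}"
    # Reject anything outside the documented action list.
    raise ValueError(f"unsupported action: {action!r}")
```

Raising on unknown actions, rather than ignoring them, lets the error be reported back to Claude so it can try a different approach.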

Text Editor Tool

{ "type": "text_editor_20241022", "name": "str_replace_editor" }

Available Commands

view: Display file contents or directory structure
create: Create new file with content
str_replace: Replace text in file
insert: Insert text at specific line
undo_edit: Revert last edit

Usage Notes

State persists across command calls
Exact string matching required for replacements
If the target string appears more than once, no replacement is made
Long outputs may be truncated
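
The exact-match and uniqueness rules in these notes can be illustrated with a small in-memory version of the str_replace command. This is a sketch of the matching behaviour only, not the reference implementation. The function name and the error messages are hypothetical.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only when `old` occurs exactly once.

    Matching is exact: whitespace and indentation must match too.
    """
    if not old:
        raise ValueError("the string to replace must not be empty")
    count = text.count(old)  # counts non-overlapping occurrences
    if count == 0:
        raise ValueError("no match found; nothing was replaced")
    if count > 1:
        raise ValueError(f"{count} matches found; the target must be unique")
    return text.replace(old, new, 1)
```

Refusing ambiguous replacements pushes the caller to include enough surrounding context to identify a single location.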

Bash Tool

{ "type": "bash_20241022", "name": "bash" }

Features

Access to common Linux and Python packages
Persistent state across commands
Background process support
No XML escaping required for commands

Avoid commands that produce excessive output or require long execution times. Use background processes for long-running tasks.
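
An executor can guard against both problems by enforcing a timeout and truncating output before it is returned to Claude. The sketch below makes several assumptions: the limits are arbitrary, the function names are hypothetical, and each command runs in a fresh shell, so unlike the tool described above it does not keep state between commands.

```python
import subprocess

MAX_OUTPUT_CHARS = 10_000  # assumed limit
TIMEOUT_SECONDS = 30  # assumed limit


def truncate(output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cut long output and say how much was dropped."""
    if len(output) <= limit:
        return output
    dropped = len(output) - limit
    return output[:limit] + f"\n[output truncated: {dropped} more characters]"


def run_command(command: str, timeout: int = TIMEOUT_SECONDS) -> str:
    """Run one shell command and return its combined, truncated output."""
    try:
        proc = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"[command timed out after {timeout} seconds]"
    return truncate(proc.stdout + proc.stderr)
```

Only run commands this way inside the isolated environment described under Safety Considerations, since `shell=True` executes whatever string it is given.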

Advanced Topics

Current Limitations

While Claude's computer use capabilities are cutting edge, developers should be aware of these key limitations:

Performance Limitations

Latency: Interaction may be slower than a human operating the same interface, so computer use is best suited to tasks that are not time-critical.
Computer Vision Accuracy: Claude may make mistakes or hallucinate when interpreting visual elements and generating coordinates.
Tool Selection Reliability: Claude may select the wrong tool or take unexpected actions, especially in niche applications.

Interaction Limitations

Scrolling: Scrolling may be unreliable. Use the Page Up and Page Down keys as an alternative.
Spreadsheet Interaction: Mouse-based cell selection can be unreliable. Prefer arrow keys for navigation.
Social Platform Limitations: Claude's ability to create accounts or generate content on social platforms is restricted.

Security Considerations

Vulnerabilities: Claude may be susceptible to jailbreaking or prompt injection through webpage content.
Content Override: Claude may follow commands found in content, even when they conflict with user instructions.

Building Custom Environments

While our reference implementation provides a starting point, you can build custom environments tailored to your needs.

Required Components

Virtualized or containerized environment
Implementation of Anthropic-defined tools
Agent loop for API interaction
User interface for input/output

Best Practices

Implement strict security controls
Monitor and log all actions
Provide clear user feedback
Include error handling and recovery
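
Logging and error handling can be combined in a thin wrapper around whatever function executes your tools. In this sketch, `execute_tool` is a hypothetical callable supplied by your environment, and the logger name and returned dictionary shape are illustrative choices. A failure is converted into an error result instead of crashing the loop, so it can be reported back to Claude.

```python
import logging

logger = logging.getLogger("computer_use")


def execute_logged(execute_tool, name: str, tool_input: dict) -> dict:
    """Run one tool call, logging the request and its outcome."""
    logger.info("tool call: %s %s", name, tool_input)
    try:
        output = execute_tool(name, tool_input)
    except Exception as exc:
        # Record the traceback and return the failure as data.
        logger.exception("tool %s failed", name)
        return {"content": f"Error: {exc}", "is_error": True}
    logger.info("tool %s finished", name)
    return {"content": output, "is_error": False}
```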

Pricing Information

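Computer use requests are priced like standard Claude API requests. The computer use beta also adds tokens to the system prompt, and each Anthropic-defined tool adds input tokens, as listed in the tables that follow.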

Base System Prompt Tokens

Model	Tool Choice	System Prompt Tokens
Claude 3.5 Sonnet	auto	466 tokens
Claude 3.5 Sonnet	tool	499 tokens

Additional Tool Tokens

Tool	Additional Tokens
computer_20241022	683 tokens
text_editor_20241022	700 tokens
bash_20241022	245 tokens
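
As a worked example, the fixed token overhead of a request can be estimated by adding the system prompt figure to the figure for each tool included. The sketch assumes the two tables are additive in this way. The function name is hypothetical.

```python
# Figures copied from the two tables above.
SYSTEM_PROMPT_TOKENS = {"auto": 466, "tool": 499}
TOOL_TOKENS = {
    "computer_20241022": 683,
    "text_editor_20241022": 700,
    "bash_20241022": 245,
}


def overhead_tokens(tool_types: list, tool_choice: str = "auto") -> int:
    """Estimate the extra input tokens added by computer use."""
    return SYSTEM_PROMPT_TOKENS[tool_choice] + sum(
        TOOL_TOKENS[t] for t in tool_types
    )


# All three tools with tool choice "auto": 466 + 683 + 700 + 245 = 2094
print(overhead_tokens(list(TOOL_TOKENS)))
```

This overhead is in addition to the tokens used by your own prompt, screenshots, and tool results.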