Computer Use (Beta)

Beta Feature Notice

The upgraded Claude 3.5 Sonnet model can interact with a computer desktop environment through specialized tools that control the mouse and keyboard and take screenshots.

Safety Considerations

Computer use poses unique risks distinct from standard API features. Consider these key precautions:

  • Use a dedicated virtual machine or container with minimal privileges
  • Avoid exposing sensitive data or login information
  • Limit internet access to an allowlist of domains
  • Require human confirmation for actions with meaningful real-world consequences and for tasks requiring affirmative consent (accepting cookies, executing financial transactions, agreeing to terms of service)
  • Be aware that Claude may follow commands found in content even when they conflict with user instructions
  • Take precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection
  • Inform end users of relevant risks and obtain consent before enabling computer use in products
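
As one illustration of the allowlist precaution above, the following is a minimal sketch of an application-level domain check. The domain names and the `is_allowed` helper are hypothetical, and a check like this would normally complement, not replace, network-level controls such as a proxy or firewall on the virtual machine.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the domains your task actually needs.
ALLOWED_DOMAINS = {"example.com", "docs.example.org"}


def is_allowed(url: str, allowed: set = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL uses http(s) and its host is an allowed
    domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the parsed hostname, rather than searching the raw URL string, avoids accepting look-alike hosts such as `example.com.attacker.net`.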

How Computer Use Works

  1. Provide Tools and Prompt

    Add Anthropic-defined computer use tools to your API request and include a user prompt that might require these tools.

    {
      "messages": [
        { "role": "user", "content": "Save a picture of a cat to my desktop" }
      ],
      "tools": [
        { "type": "computer_20241022", "name": "computer" }
      ]
    }

  2. Tool Selection

    Claude evaluates the tools and constructs a properly formatted tool use request when needed. The API responds with a stop_reason of tool_use.

  3. Tool Execution

    Your application extracts the tool request, executes it in a controlled environment, and returns the results to Claude.

  4. Completion Loop

    Claude continues using tools as needed until the task is complete, forming an "agent loop" of tool use and evaluation.
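
The four steps above can be sketched as a small loop. This is an illustrative outline, not the reference implementation: `send` is a stand-in for whatever function calls the Messages API and is assumed to return the response as a plain dict with `stop_reason` and `content` keys, and `execute_tool` is a hypothetical callback that runs a tool request in your controlled environment and returns its output as text.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_agent_loop(
    send: Callable[[List[Message]], Message],
    execute_tool: Callable[[str, Dict[str, Any]], str],
    prompt: str,
    max_turns: int = 10,
) -> List[Message]:
    """Alternate between model responses and tool results until Claude
    stops requesting tools or max_turns is reached."""
    messages: List[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        response = send(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response.get("stop_reason") != "tool_use":
            break  # Claude did not ask for a tool, so the task is finished
        # Execute every tool request and return the results in one user turn.
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": execute_tool(block["name"], block["input"]),
            }
            for block in response["content"]
            if block.get("type") == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return messages
```

The `max_turns` cap is a design choice of this sketch: it bounds cost and prevents a runaway loop if the task never converges.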

Getting Started

Reference Implementation

We provide a complete reference implementation including:

  • Containerized environment
  • Tool implementations
  • Agent loop implementation
  • Web interface

Try our reference implementation before diving into the documentation.

Optimization Tips

  • Specify simple, well-defined tasks and provide explicit instructions for each step
  • Prompt Claude to verify actions with screenshots: "After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking."
  • Use keyboard shortcuts for UI elements like dropdowns and scrollbars that may be tricky to manipulate
  • Include example screenshots and tool calls of successful outcomes for repeatable tasks
  • Use the system prompt to provide explicit tips for known tasks

Anthropic-defined Tools

These beta tools enable Claude to use computers effectively. The tools are user-executed: Claude only issues tool requests, and your application must execute each request and return the result.

Computer Tool

{ "type": "computer_20241022", "name": "computer" }

For optimal performance, keep screenshots at XGA/WXGA resolution or lower. Higher resolutions may impact model accuracy and performance.
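
One way to follow this guidance on a larger display is to downscale screenshots before sending them and to scale Claude's coordinates back up before acting on them. The sketch below assumes XGA means 1024x768 and WXGA means 1280x800; the function names are hypothetical and the actual image resizing is left to your imaging library.

```python
# Assumed target: WXGA (1280x800). XGA would be 1024x768.
MAX_WIDTH, MAX_HEIGHT = 1280, 800


def fit_to_target(width: int, height: int,
                  max_w: int = MAX_WIDTH, max_h: int = MAX_HEIGHT):
    """Return (scale, scaled_width, scaled_height), preserving the aspect
    ratio and never upscaling."""
    scale = min(max_w / width, max_h / height, 1.0)
    return scale, round(width * scale), round(height * scale)


def to_screen(x: int, y: int, scale: float):
    """Map a coordinate chosen on the scaled screenshot back to the
    real display."""
    return round(x / scale), round(y / scale)
```

With this approach, the scaled size is what you would report in `display_width_px` and `display_height_px`, so that the coordinates Claude returns refer to the image it actually saw.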

Parameters

display_width_px Required

The width of the display in pixels

display_height_px Required

The height of the display in pixels

display_number Optional

Display number for X11 environments
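
A complete tool definition combines the type and name shown earlier with these parameters. The helper below is a small sketch; `computer_tool` is a hypothetical name, and the validation it performs is an assumption about what is sensible rather than a documented API rule.

```python
from typing import Any, Dict, Optional


def computer_tool(width_px: int, height_px: int,
                  display_number: Optional[int] = None) -> Dict[str, Any]:
    """Build the computer tool entry for the request's "tools" list."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("display dimensions must be positive")
    tool: Dict[str, Any] = {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": width_px,
        "display_height_px": height_px,
    }
    # display_number is optional and only meaningful in X11 environments.
    if display_number is not None:
        tool["display_number"] = display_number
    return tool
```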

Available Actions

  • key: Press keys (e.g., "Return", "alt+Tab")
  • type: Type text strings
  • mouse_move: Move cursor to coordinates
  • left_click: Click left mouse button
  • right_click: Click right mouse button
  • double_click: Double-click left button
  • screenshot: Capture screen image
  • cursor_position: Get cursor coordinates
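
Before executing a request, it can help to validate it against this list. The sketch below covers only the actions listed above and assumes the tool input carries an `action` field, a `text` field for `key` and `type`, and a two-element `coordinate` field for `mouse_move`; verify these field names against the tool's input schema before relying on them.

```python
from typing import Any, Dict

ACTIONS = {
    "key", "type", "mouse_move", "left_click", "right_click",
    "double_click", "screenshot", "cursor_position",
}
NEEDS_TEXT = {"key", "type"}        # assumed to require a "text" field
NEEDS_COORDINATE = {"mouse_move"}   # assumed to require a "coordinate" field


def validate_action(tool_input: Dict[str, Any]) -> str:
    """Return the action name, or raise ValueError if the request is malformed."""
    action = tool_input.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action in NEEDS_TEXT and not isinstance(tool_input.get("text"), str):
        raise ValueError(f"{action} requires a string 'text' field")
    if action in NEEDS_COORDINATE:
        coord = tool_input.get("coordinate")
        valid = (
            isinstance(coord, (list, tuple))
            and len(coord) == 2
            and all(isinstance(v, int) and v >= 0 for v in coord)
        )
        if not valid:
            raise ValueError(f"{action} requires 'coordinate' as [x, y]")
    return action
```

Rejecting malformed requests with a clear error message gives Claude a chance to correct itself on the next turn instead of triggering an unintended action.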

Text Editor Tool

{ "type": "text_editor_20241022", "name": "str_replace_editor" }

Available Commands

  • view: Display file contents or directory structure
  • create: Create new file with content
  • str_replace: Replace text in file
  • insert: Insert text at specific line
  • undo_edit: Revert last edit

Usage Notes

  • State persists across command calls
  • Exact string matching required for replacements
  • If the target string matches more than once, no replacement is made
  • Long outputs may be truncated
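
The matching rules for str_replace can be illustrated with a short sketch that operates on an in-memory string. This is an assumed implementation of the behavior described in the notes above, not the tool's actual code, and the error messages are made up for illustration.

```python
def str_replace(content: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `content`."""
    if not old:
        raise ValueError("the text to replace must not be empty")
    count = content.count(old)  # exact, case-sensitive, non-overlapping matches
    if count == 0:
        raise ValueError("no match found; the text must match exactly")
    if count > 1:
        raise ValueError(f"{count} matches found; include more context to make it unique")
    return content.replace(old, new, 1)
```

Refusing ambiguous replacements means an edit never lands in the wrong place; the caller is expected to retry with a longer, unique snippet.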

Bash Tool

{ "type": "bash_20241022", "name": "bash" }

Features

  • Access to common Linux and Python packages
  • Persistent state across commands
  • Background process support
  • No XML escaping required for commands

Avoid commands that produce excessive output or require long execution times. Use background processes for long-running tasks.
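
If a command does produce a lot of output, your tool implementation can cap what it returns to Claude. The helper below is a minimal sketch; the name `truncate_output`, the default limit, and the notice wording are all arbitrary assumptions.

```python
def truncate_output(output: str, max_chars: int = 10_000) -> str:
    """Return the output unchanged if it fits, otherwise keep the first
    max_chars characters and append a notice saying how much was dropped."""
    if len(output) <= max_chars:
        return output
    omitted = len(output) - max_chars
    return output[:max_chars] + f"\n[output truncated: {omitted} characters omitted]"
```

Stating how much was omitted tells Claude the output is incomplete, so it can rerun the command with a filter such as `head` or `grep` if it needs the rest.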

Advanced Topics

Current Limitations

While Claude's computer use capabilities are cutting edge, developers should be aware of these key limitations:

Performance Limitations

  • Latency:

    Current interaction speed may be slower than human operation. Best suited for non-time-critical tasks.

  • Computer Vision Accuracy:

    Potential mistakes or hallucinations when interpreting visual elements and coordinates.

  • Tool Selection Reliability:

    Claude may select the wrong tool or take unexpected actions, especially with niche applications.

Interaction Limitations

  • Scrolling:

    Scrolling may be unreliable. Use the PgUp/PgDown keys as an alternative.

  • Spreadsheet Interaction:

    Mouse-based cell selection can be unreliable. Prefer arrow keys.

  • Social Platform Limitations:

    Restricted ability to create accounts or generate content on social platforms.

Security Considerations

  • Vulnerabilities:

    Potential for jailbreaking or prompt injection from webpage content.

  • Content Override:

    May follow commands found in content, potentially conflicting with user instructions.

Building Custom Environments

While our reference implementation provides a starting point, you can build custom environments tailored to your needs.

Required Components

  • Virtualized or containerized environment
  • Implementation of Anthropic-defined tools
  • Agent loop for API interaction
  • User interface for input/output

Best Practices

  • Implement strict security controls
  • Monitor and log all actions
  • Provide clear user feedback
  • Include error handling and recovery
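
The logging and error-handling practices can be combined in a thin wrapper around tool execution. The sketch below assumes a hypothetical `execute` callback that runs the tool and returns text; it logs each request and, on failure, reports the error back as a tool result flagged with `is_error` so that Claude can react to it.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("computer_use")


def safe_tool_result(
    tool_use_id: str,
    name: str,
    tool_input: Dict[str, Any],
    execute: Callable[[str, Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Log the request, run the tool, and turn failures into error results."""
    logger.info("tool request %s: %s %r", tool_use_id, name, tool_input)
    try:
        content = execute(name, tool_input)
    except Exception as exc:  # report every failure instead of crashing the loop
        logger.exception("tool request %s failed", tool_use_id)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Error: {exc}",
            "is_error": True,
        }
    logger.info("tool request %s succeeded", tool_use_id)
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
```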

Pricing Information

Computer use requests are priced like standard Claude API requests. Enabling the tools adds input tokens to each request: a system prompt overhead, plus an additional overhead for each tool included.

Base System Prompt Tokens

Model              Tool Choice  System Prompt Tokens
Claude 3.5 Sonnet  auto         466 tokens
Claude 3.5 Sonnet  tool         499 tokens

Additional Tool Tokens

Tool                  Additional Tokens
computer_20241022     683 tokens
text_editor_20241022  700 tokens
bash_20241022         245 tokens
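
To estimate the fixed overhead per request, the figures in the two tables above can be summed. This sketch assumes the system prompt and per-tool overheads simply add together; the function name is hypothetical, and the token counts for your own prompt, screenshots, and tool results come on top of this.

```python
from typing import Iterable

# Values copied from the tables above.
SYSTEM_PROMPT_TOKENS = {"auto": 466, "tool": 499}
TOOL_TOKENS = {
    "computer_20241022": 683,
    "text_editor_20241022": 700,
    "bash_20241022": 245,
}


def overhead_tokens(tool_choice: str, tool_types: Iterable[str]) -> int:
    """Estimate the extra input tokens added by enabling computer use tools."""
    return SYSTEM_PROMPT_TOKENS[tool_choice] + sum(TOOL_TOKENS[t] for t in tool_types)
```

For example, `overhead_tokens("auto", TOOL_TOKENS)` gives 466 + 683 + 700 + 245 = 2094 tokens with all three tools enabled.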