Computer use agents interact with graphical desktops the same way a human would — viewing the screen, clicking, typing, and scrolling. E2B provides the sandboxed desktop environment where these agents operate safely, with VNC streaming for real-time visual feedback. For a complete working implementation, see E2B Surf — an open-source computer use agent you can try via the live demo.

How It Works

The computer use agent loop follows this pattern:
  1. User sends a command — e.g., “Open Firefox and search for AI news”
  2. Agent creates a desktop sandbox — an Ubuntu 22.04 environment with XFCE desktop and pre-installed applications
  3. Agent takes a screenshot — captures the current desktop state via E2B Desktop SDK
  4. LLM analyzes the screenshot — a vision model (e.g., OpenAI Computer Use API) decides what action to take
  5. Action is executed — click, type, scroll, or keypress via E2B Desktop SDK
  6. Repeat — new screenshot is taken and sent back to the LLM until the task is complete
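
Steps 4 and 5 hinge on turning the model's reply into a well-formed action before anything touches the desktop. The sketch below is one way to validate that reply, assuming the model returns a JSON object. The `Action` shapes and the `parseAction` helper are hypothetical names chosen for illustration; they are not part of the E2B Desktop SDK or any provider's schema.

```typescript
// Hypothetical action shapes, named after the steps above (click, type,
// keypress, scroll). Real providers use their own schemas.
type Action =
  | { type: 'click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'keypress'; keys: string[] }
  | { type: 'scroll'; scrollY: number }

const isNum = (v: unknown): v is number =>
  typeof v === 'number' && Number.isFinite(v)

// Parse a JSON string produced by the model and validate its fields.
// Returns null for malformed JSON or an unknown or incomplete action.
function parseAction(raw: string): Action | null {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof data !== 'object' || data === null) return null
  const { type, x, y, text, keys, scrollY } = data as Record<string, unknown>

  switch (type) {
    case 'click':
      return isNum(x) && isNum(y) ? { type: 'click', x, y } : null
    case 'type':
      return typeof text === 'string' ? { type: 'type', text } : null
    case 'keypress':
      return Array.isArray(keys) && keys.every((k) => typeof k === 'string')
        ? { type: 'keypress', keys: keys as string[] }
        : null
    case 'scroll':
      return isNum(scrollY) ? { type: 'scroll', scrollY } : null
    default:
      return null
  }
}
```

Rejecting anything that fails validation keeps a malformed reply from becoming a stray click; the caller can then re-prompt the model or stop the task.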

Install the E2B Desktop SDK

The E2B Desktop SDK gives your agent a full Linux desktop with mouse, keyboard, and screen capture APIs.
npm i @e2b/desktop

Core Implementation

The following snippets are adapted from E2B Surf.

Setting up the sandbox

Create a desktop sandbox and start VNC streaming so you can view the desktop in a browser.
import { Sandbox } from '@e2b/desktop'

// Create a desktop sandbox with a 5-minute timeout
const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  dpi: 96,
  timeoutMs: 300_000,
})

// Start VNC streaming for browser-based viewing
await sandbox.stream.start()
const streamUrl = sandbox.stream.getUrl()
console.log('View desktop at:', streamUrl)
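
The resolution passed to `Sandbox.create` defines the coordinate space that later click and drag calls operate in. If the screenshot is resized before it reaches the model, the coordinates the model returns need to be mapped back to desktop pixels. The helper below is a minimal sketch of that mapping; `toDesktopCoords` is a hypothetical name, not an SDK function, and it assumes simple linear scaling with no letterboxing.

```typescript
type Size = [number, number]  // [width, height]
type Point = [number, number] // [x, y]

// Map a point from the image the model saw to desktop pixels,
// clamping to the last valid pixel on each axis.
function toDesktopCoords(p: Point, imageSize: Size, desktopSize: Size): Point {
  const clamp = (v: number, max: number) => Math.min(max, Math.max(0, v))
  const x = Math.round(p[0] * (desktopSize[0] / imageSize[0]))
  const y = Math.round(p[1] * (desktopSize[1] / imageSize[1]))
  return [clamp(x, desktopSize[0] - 1), clamp(y, desktopSize[1] - 1)]
}

// Example: the model saw a 512x360 image of a 1024x720 desktop.
const [x, y] = toDesktopCoords([256, 180], [512, 360], [1024, 720])
console.log(x, y) // 512 360
```

When the model receives the screenshot at native resolution, the mapping is the identity and this step can be skipped.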

Executing desktop actions

The E2B Desktop SDK maps directly to mouse and keyboard actions. Here’s how Surf translates LLM-returned actions into desktop interactions.
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({ timeoutMs: 300_000 })

// Mouse actions
await sandbox.leftClick(500, 300)
await sandbox.rightClick(500, 300)
await sandbox.doubleClick(500, 300)
await sandbox.middleClick(500, 300)
await sandbox.moveMouse(500, 300)
await sandbox.drag([100, 200], [400, 500])

// Keyboard actions
await sandbox.write('Hello, world!')  // Type text
await sandbox.press('Enter')          // Press a key

// Scrolling
await sandbox.scroll('down', 3)  // Scroll down 3 ticks
await sandbox.scroll('up', 3)    // Scroll up 3 ticks

// Screenshots
const screenshot = await sandbox.screenshot()  // Returns Buffer

// Run terminal commands
await sandbox.commands.run('ls -la /home')
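
The bytes returned by `sandbox.screenshot()` usually need to be encoded before they can be attached to an LLM request. Many vision APIs accept a base64 data URL, though the exact request format depends on the provider. The helper below is a sketch under two assumptions: the screenshot is PNG data, and the code runs on Node.js, where `Buffer` is a global. `toDataUrl` is a hypothetical name, not part of the SDK.

```typescript
// Encode raw image bytes as a base64 data URL.
// Assumes a Node.js runtime (global Buffer) and PNG data by default.
function toDataUrl(image: Uint8Array, mime = 'image/png'): string {
  const base64 = Buffer.from(image).toString('base64')
  return `data:${mime};base64,${base64}`
}

// Example with three placeholder bytes instead of a real screenshot.
console.log(toDataUrl(new Uint8Array([1, 2, 3]))) // data:image/png;base64,AQID
```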

Agent loop

The core loop takes screenshots, sends them to an LLM, and executes the returned actions on the desktop. This is a simplified version of how Surf drives the computer use cycle.
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  timeoutMs: 300_000,
})
await sandbox.stream.start()

while (true) {
  // 1. Capture the current desktop state
  const screenshot = await sandbox.screenshot()

  // 2. Send screenshot to your LLM and get the next action
  //    (use OpenAI Computer Use, Anthropic Claude, etc.)
  const action = await getNextActionFromLLM(screenshot)

  if (!action) break // LLM signals task is complete

  // 3. Execute the action on the desktop
  switch (action.type) {
    case 'click':
      await sandbox.leftClick(action.x, action.y)
      break
    case 'type':
      await sandbox.write(action.text)
      break
    case 'keypress':
      await sandbox.press(action.keys)
      break
    case 'scroll':
      await sandbox.scroll(
        action.scrollY < 0 ? 'up' : 'down',
        Math.abs(action.scrollY)
      )
      break
    case 'drag':
      await sandbox.drag(
        [action.startX, action.startY],
        [action.endX, action.endY]
      )
      break
  }
}

await sandbox.kill()

The getNextActionFromLLM function (get_next_action_from_llm in Python) is where you integrate your chosen LLM. See Connect LLMs to E2B for integration patterns with OpenAI, Anthropic, and other providers.
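
The loop above runs until the model returns no action, and it skips `sandbox.kill()` if any call throws. A minimal sketch of a bounded variant follows, with a step limit and cleanup in `finally`. `DesktopLike`, `runAgent`, and `maxSteps` are hypothetical names introduced here. The interface lists only the methods the sketch calls, and it assumes the SDK's `Sandbox` is structurally compatible with it. Only two action types are handled, to keep the example short.

```typescript
// Minimal structural interface (an assumption, not an SDK type). It covers
// only the calls used below so the loop can be tested against a fake desktop.
interface DesktopLike {
  screenshot(): Promise<Uint8Array>
  leftClick(x: number, y: number): Promise<unknown>
  write(text: string): Promise<unknown>
  kill(): Promise<unknown>
}

type Action =
  | { type: 'click'; x: number; y: number }
  | { type: 'type'; text: string }

// Stand-in for the LLM integration: returns null when the task is complete.
type NextAction = (screenshot: Uint8Array) => Promise<Action | null>

// Run at most maxSteps actions, then always kill the sandbox.
// Resolves to the number of actions executed.
async function runAgent(
  desktop: DesktopLike,
  nextAction: NextAction,
  maxSteps = 30,
): Promise<number> {
  let steps = 0
  try {
    while (steps < maxSteps) {
      const action = await nextAction(await desktop.screenshot())
      if (!action) break // model signals completion
      if (action.type === 'click') {
        await desktop.leftClick(action.x, action.y)
      } else {
        await desktop.write(action.text)
      }
      steps++
    }
  } finally {
    // Runs on normal completion, on hitting the step limit, and on errors.
    await desktop.kill()
  }
  return steps
}
```

Taking the desktop as an interface rather than a concrete `Sandbox` lets the loop run in unit tests without starting a sandbox or calling a model.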