Guide

Guide

Guide

Sep 9, 2024

Tereza Tizkova

Tereza Tizkova

Tereza Tizkova


Code Interpreting with Groq and E2B in JavaScript


This AI data analyst can plot a linear regression chart based on CSV data. It uses LLMs powered by Groq, and the Code Interpreter SDK by E2B for the code interpreting capabilities. The SDK quickly creates a secure cloud sandbox powered by Firecracker. Inside this sandbox is a running Jupyter server that the LLM can use.

Read more about models powered by Groq here.

The AI agent performs a data analysis task on an uploaded CSV file, executes the AI-generated code in the sandboxed environment by E2B, and returns a chart, saving it as a PNG file. The code is processing the data in the CSV file, cleaning the data, and performing the assigned analysis, which includes plotting a chart.


Full code


Key links

Outline

  1. Prerequisites

  2. Install the SDKs

  3. Set up the API keys and model instructions

  4. Add code interpreting capabilities and initialize the model

  5. Upload the dataset


1. Prerequisites

Create an index.ts file for the main program, copy the env.template file, and save it to a .gitignore file.

Get the E2B API key here and the Groq API key here.

Download the CSV file from here and upload it to the same directory as your program. Rename it to data.csv.

2. Install the SDKs
npm install @e2b/code-interpreter@0.0.5 groq-sdk@0.5.0 dotenv@16.4.5
3. Set up the API keys and model instructions

In this step you upload your E2B and Groq API keys to the program. In the JS & TS case, the API keys are stored in the .env file, You pick the model of your choice by uncommenting it. There are some recommended models that are great at code generation, but you can add a different one from here.

The model is assigned a data scientist role and explained the uploaded CSV. You can choose different data but need to update the instructions accordingly.

import { CodeInterpreter, Result, ProcessMessage } from '@e2b/code-interpreter'
import { Groq } from 'groq-sdk'
import fs from 'node:fs'
import dotenv from 'dotenv'

dotenv.config()

// Define API keys

// TODO: Get your Groq AI API key from https://console.groq.com/
const GROQ_API_KEY = process.env.GROQ_API_KEY

// TODO: Get your E2B API key from https://e2b.dev/docs
const E2B_API_KEY = process.env.E2B_API_KEY

// Choose the model

const MODEL_NAME = 'llama-3.1-70b-versatile'
// const MODEL_NAME = 'llama-3.1-8b-instant'
// const MODEL_NAME = 'llama3-groq-70b-8192-tool-use-preview'
// const MODEL_NAME = 'llama3-groq-8b-8192-tool-use-preview'
// const MODEL_NAME = 'llama3-70b-8192'
// const MODEL_NAME = 'gemma2-9b-it'

// Provide system prompt
const SYSTEM_PROMPT = `
You're a python data scientist. You are given tasks to complete and you run Python code to solve them.

Information about the csv dataset:
- It's in the \`/home/user/data.csv\` file
- The CSV file is using , as the delimiter
- It has the following columns (examples included):
    - country: "Argentina", "Australia"
    - Region: "SouthAmerica", "Oceania"
    - Surface area (km2): for example, 2780400
    - Population in thousands (2017): for example, 44271
    - Population density (per km2, 2017): for example, 16.2
    - Sex ratio (m per 100 f, 2017): for example, 95.9
    - GDP: Gross domestic product (million current US$): for example, 632343
    - GDP growth rate (annual %, const. 2005 prices): for example, 2.4
    - GDP per capita (current US$): for example, 14564.5
    - Economy: Agriculture (% of GVA): for example, 10.0
    - Economy: Industry (% of GVA): for example, 28.1
    - Economy: Services and other activity (% of GVA): for example, 61.9
    - Employment: Agriculture (% of employed): for example, 4.8
    - Employment: Industry (% of employed): for example, 20.6
    - Employment: Services (% of employed): for example, 74.7
    - Unemployment (% of labour force): for example, 8.5
    - Employment: Female (% of employed): for example, 43.7
    - Employment: Male (% of employed): for example, 56.3
    - Labour force participation (female %): for example, 48.5
    - Labour force participation (male %): for example, 71.1
    - International trade: Imports (million US$): for example, 59253
    - International trade: Exports (million US$): for example, 57802
    - International trade: Balance (million US$): for example, -1451
    - Education: Government expenditure (% of GDP): for example, 5.3
    - Health: Total expenditure (% of GDP): for example, 8.1
    - Health: Government expenditure (% of total health expenditure): for example, 69.2
    - Health: Private expenditure (% of total health expenditure): for example, 30.8
    - Health: Out-of-pocket expenditure (% of total health expenditure): for example, 20.2
    - Health: External health expenditure (% of total health expenditure): for example, 0.2
    - Education: Primary gross enrollment ratio (f/m per 100 pop): for example, 111.5/107.6
    - Education: Secondary gross enrollment ratio (f/m per 100 pop): for example, 104.7/98.9
    - Education: Tertiary gross enrollment ratio (f/m per 100 pop): for example, 90.5/72.3
    - Education: Mean years of schooling (female): for example, 10.4
    - Education: Mean years of schooling (male): for example, 9.7
    - Urban population (% of total population): for example, 91.7
    - Population growth rate (annual %): for example, 0.9
    - Fertility rate (births per woman): for example, 2.3
    - Infant mortality rate (per 1,000 live births): for example, 8.9
    - Life expectancy at birth, female (years): for example, 79.7
    - Life expectancy at birth, male (years): for example, 72.9
    - Life expectancy at birth, total (years): for example, 76.4
    - Military expenditure (% of GDP): for example, 0.9
    - Population, female: for example, 22572521
    - Population, male: for example, 21472290
    - Tax revenue (% of GDP): for example, 11.0
    - Taxes on income, profits and capital gains (% of revenue): for example, 12.9
    - Urban population (% of total population): for example, 91.7

Generally, you follow these rules:
- ALWAYS FORMAT YOUR RESPONSE IN MARKDOWN
- ALWAYS RESPOND ONLY WITH CODE IN CODE BLOCK LIKE THIS:
\`\`\`python
{code}
\`\`\`
- the Python code runs in jupyter notebook.
- every time you generate Python, the code is executed in a separate cell. it's okay to make multiple calls to \`execute_python\`.
- display visualizations using matplotlib or any other visualization library directly in the notebook. don't worry about saving the visualizations to a file.
- you have access to the internet and can make api requests.
- you also have access to the filesystem and can read/write files.
- you can install any pip package (if it exists) if you need to be running \`!pip install {package}\`. The usual packages for data analysis are already preinstalled though.
- you can run any Python code you want, everything is running in a secure sandbox environment
`
4. Add code interpreting capabilities and initialize the model

Now we define the function that will use the code interpreter by E2B. Every time the LLM assistant decides that it needs to execute code, this function will be used. Read more about the Code Interpreter SDK here.

We also initialize the Groq client. The function for matching code blocks is important because we need to pick the right part of the output that contains the code produced by the LLM. The chat function takes care of the interaction with the LLM. It calls the E2B code interpreter anytime there is a code to be run.

const client = new Groq({ apiKey: GROQ_API_KEY })

async function codeInterpret(codeInterpreter: CodeInterpreter, code: string): Promise<Result[]> {
  console.log('Running code interpreter...')

  const exec = await codeInterpreter.notebook.execCell(code, {
      onStderr: (msg: ProcessMessage) => console.log('[Code Interpreter stderr]', msg),
      onStdout: (stdout: ProcessMessage) => console.log('[Code Interpreter stdout]', stdout)
  })

  if (exec.error) {
      console.error('[Code Interpreter ERROR]', exec.error)
      throw new Error(exec.error.value)
  }

  return exec.results
}


async function chat(codeInterpreter: CodeInterpreter, userMessage: string): Promise<Result[]> {
  console.log(`\n${'='.repeat(50)}\nUser Message: ${userMessage}\n${'='.repeat(50)}`)

  const messages = [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: userMessage }
  ]

  try {
        const response = await client.chat.completions.create({
        model: MODEL_NAME,
        messages,
        // tools,
        max_tokens: 4096,
       })

      const responseMessage = response.choices[0].message.content
      const codeBlockMatch = responseMessage ? responseMessage.match(/```python\n([\s\S]*?)\n```/) : null;

      if (codeBlockMatch && codeBlockMatch[1]) {
          const pythonCode = codeBlockMatch[1]
          console.log('CODE TO RUN')
          console.log(pythonCode)
          const codeInterpreterResults = await codeInterpret(codeInterpreter, pythonCode)
          return codeInterpreterResults
      } else {
          console.error('Failed to match any Python code in model\'s response')
          return []
      }
  } catch (error) {
      console.error('Error during API call:', error)
      throw error
  }
}
5. Upload the dataset

The CSV data is uploaded programmatically, not via AI-generated code. The code interpreter by E2B runs inside the E2B sandbox. Read more about the file upload here.

async function uploadDataset(codeInterpreter: CodeInterpreter): Promise<string> {
  console.log('Uploading dataset to Code Interpreter sandbox...')
  const datasetPath = './data.csv'

  if (!fs.existsSync(datasetPath)) {
      throw new Error('Dataset file not found')
  }

  const fileBuffer = fs.readFileSync(datasetPath)

  try {
      const remotePath = await codeInterpreter.uploadFile(fileBuffer, 'data.csv')
      if (!remotePath) {
          throw new Error('Failed to upload dataset')
      }
      console.log('Uploaded at', remotePath)
      return remotePath
  } catch (error) {
      console.error('Error during file upload:', error)
      throw error
  }
}
6. Put everything together

Finally, we put everything together and let the AI assistant upload the data, run an analysis, and generate a PNG file with a chart. You can update the task for the assistant in this step. If you decide to change the CSV file you are using, don't forget to update the prompt too.

async function run() {
  const codeInterpreter = await CodeInterpreter.create({apiKey: E2B_API_KEY})

  try {
      const remotePath = await uploadDataset(codeInterpreter)
      console.log('Remote path of the uploaded dataset:', remotePath)

      const codeInterpreterResults = await chat(
          codeInterpreter,
          // Task for the model
          'Make a chart showing linear regression of the relationship between GDP per capita and life expectancy from the data. Filter out any missing values or values in wrong format.'
      )
      console.log('codeInterpreterResults:', codeInterpreterResults)

      const result = codeInterpreterResults[0]
      console.log('Result object:', result)

      if (result && result.png) {
          fs.writeFileSync('image_1.png', Buffer.from(result.png, 'base64'))
          console.log('Success: Image generated and saved as image_1.png')
      } else {
          console.error('Error: No PNG data available.')
      }

  } catch (error) {
      console.error('An error occurred:', error)
  } finally {
      await codeInterpreter.close()
  }
}

run()
7. Run the program and see the results

The file is generated within the notebook. The plot shows the linear regression of the relationship between GDP per capita and life expectancy from the CSV data.

> Groq-code-interpreter@1.0.0 start
> tsx index.ts

(node:21539) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Uploading dataset to Code Interpreter sandbox...
Uploaded at /home/user/data.csv
Remote path of the uploaded dataset: /home/user/data.csv

==================================================
User Message: Make a chart showing linear regression of the relationship between GDP per capita and life expectancy from the data. Filter out any missing values or values in wrong format.
==================================================
CODE TO RUN
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the data
data = pd.read_csv('/home/user/data.csv', delimiter=',')

# Filter out missing values or values in wrong format
data = data.dropna(subset=['GDP per capita (current US$)', 'Life expectancy at birth, total (years)'])

# Convert columns to numeric
data['GDP per capita (current US$)'] = pd.to_numeric(data['GDP per capita (current US$)'], errors='coerce')
data['Life expectancy at birth, total (years)'] = pd.to_numeric(data['Life expectancy at birth, total (years)'], errors='coerce')

# Filter out any remaining non-numeric values
data = data.dropna(subset=['GDP per capita (current US$)', 'Life expectancy at birth, total (years)'])

# Fit linear regression model
X = data['GDP per capita (current US$)'].values.reshape(-1, 1)
y = data['Life expectancy at birth, total (years)'].values.reshape(-1, 1)
model = LinearRegression().fit(X, y)

# Plot the data and the regression line
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red')
plt.xlabel('GDP per capita (current US$)')
plt.ylabel('Life expectancy at birth, total (years)')
plt.show()
Running code interpreter...
codeInterpreterResults: [
...
...
...
Success: Image generated and saved as


Full code


Resources

©2024 FoundryLabs, Inc. All rights reserved.

©2024 FoundryLabs, Inc. All rights reserved.

©2024 FoundryLabs, Inc. All rights reserved.