Runtime: Executing and Tracking LLM Instructions ¶
The Runtime subsystem is at the heart of
Conatus. When LLMs give us instructions (e.g. code snippets or tool calls),
the Runtime executes those instructions, tracks the state of the execution,
stores metadata that can be passed back to the LLM to allow for better
reasoning, and more. Ultimately, it is what makes it possible to create
Playbooks that replay a Task without needing an LLM.
Overview ¶
At a high level, the Runtime subsystem
consists of several key components:
- Variables and State:
  - RuntimeVariable wraps each value, storing its history (based on its string representation) and type.
  - RuntimeState holds the current set of variables and the history of execution steps.
  - RuntimeStateStep represents one step in the execution, which may include several instructions.
- Instructions: The system records every instruction the LLM provides, using two formats:
  - JSON instructions (tool calls) are captured by RuntimeJSONInstruction. These instructions come with structured arguments and return types.
  - Code snippets are captured by RuntimePythonInstruction. They allow the LLM to express more complex ideas, and to execute arbitrary-ish Python code.
- Actions and Execution: Actions are executed by the Runtime. Their input parameters and output types are validated, and only those actions whose required variables are available are exposed to the LLM.
Why use Runtime? ¶
- Dual Instruction Types: LLMs can issue instructions in two formats. We want both to be executed smoothly and consistently.
- Metadata capture: We want to capture all the metadata associated with the execution of an instruction. This includes the stdout/stderr output, the variables that were changed, and more. This enables us to communicate that information to AI models to enable better reasoning.
- Passing by Reference: Unlike traditional AI agent frameworks that only
support JSON-serializable data, Conatus enables AI models to refer to
RuntimeVariables by using the syntax <<var:{name}>>. This makes it possible to pass complex objects (like pandas DataFrames or Playwright browsers) directly to actions. [1]
- Error minimization: The Runtime pulls out all the stops to minimize avoidable errors. For example, it makes sure that AI models can only execute tool calls on actions that can actually be executed given the available variables, and it ensures that passing by reference only happens when possible.
- Replay-ability: Every instruction can be converted to Python code, which forms the basis of replayable tasks.
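As an illustration, pass-by-reference resolution of this kind can be sketched in a few lines of plain Python. Note that resolve_references is a hypothetical helper written for this page, not part of the Conatus API:

```python
import re

def resolve_references(arguments: dict, variables: dict) -> dict:
    """Replace '<<var:{name}>>' strings with the referenced objects."""
    pattern = re.compile(r"^<<var:(\w+)>>$")
    resolved = {}
    for key, value in arguments.items():
        match = pattern.match(value) if isinstance(value, str) else None
        if match:
            # Pass the actual object by reference, not a serialized copy.
            resolved[key] = variables[match.group(1)]
        else:
            resolved[key] = value
    return resolved

variables = {"df": object()}  # stand-in for e.g. a pandas DataFrame
args = resolve_references({"data": "<<var:df>>", "n": 3}, variables)
assert args["data"] is variables["df"]  # same object, passed by reference
assert args["n"] == 3                   # raw JSON values pass through
```

The key point is that the resolved argument is the original object itself, which is what lets non-serializable values (DataFrames, browsers) flow into actions.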
Initializing the Runtime ¶
You initialize the Runtime with starting
variables and a set of actions. The Runtime
supports multiple input formats for the
starting variables (singletons, lists, or dictionaries) and automatically
normalizes them.
from conatus.runtime.runtime import Runtime
# Define starting variables (can be a dict, list, or singleton)
starting_vars = {"ten": 10, "ten_as_string": "10"}
# Define actions
def add(a: int, b: int) -> int:
    return a + b
# Initialize the runtime with actions and starting variables.
runtime = Runtime(starting_variables=starting_vars, actions=[add])
print([(v.name, v.value, v.type_hint) for v in runtime.variables.values()])
# > [('ten', 10, <class 'int'>), ('ten_as_string', '10', <class 'str'>)]
from conatus.runtime.runtime import Runtime
# Define starting variables (can be a dict, list, or singleton)
starting_vars = [9009, "10"]
# Define actions
def add(a: int, b: int) -> int:
    return a + b
# Initialize the runtime with actions and starting variables.
runtime = Runtime(starting_variables=starting_vars, actions=[add])
# The variables are named automatically, based on their type.
print([(v.name, v.value, v.type_hint) for v in runtime.variables.values()])
# > [('int_0', 9009, <class 'int'>), ('str_0', '10', <class 'str'>)]
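The automatic naming above (lowercase type name plus a counter) can be sketched as follows; auto_name is an illustrative helper for this page, not part of the Conatus API:

```python
from collections import OrderedDict

def auto_name(value: object, existing: dict) -> str:
    """Generate names like 'int_0' or 'str_0' from the value's type."""
    base = type(value).__name__.lower()
    counter = 0
    while f"{base}_{counter}" in existing:
        counter += 1
    return f"{base}_{counter}"

variables = OrderedDict()
for value in [9009, "10"]:
    variables[auto_name(value, variables)] = value

print(list(variables))
# ['int_0', 'str_0']
```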
You can also pass a TaskConfig to the
Runtime constructor. For now, this is only
used to configure the maximum length of the string representation of a variable.
One other thing you can do is pass an expected_input_types argument. This is a
dictionary that maps variable names to their expected types. This is used to
validate the types of the starting variables, as well as to infer the names of
the variables if you pass a list or a singleton.
from collections import OrderedDict
from conatus.runtime.runtime import Runtime
# Define the expected input types.
# Useful for something like:
# def my_task(number: int, string: str) -> ...
expected_input_types = OrderedDict({"number": int, "string": str})
# And now we want to pass a list of starting variables.
starting_vars = [9009, "hey ya"]
# Initialize the runtime with actions and starting variables.
runtime = Runtime(
starting_variables=starting_vars,
expected_input_types=expected_input_types,
)
# The variables are named automatically, based on their type.
print([(v.name, v.value, v.type_hint) for v in runtime.variables.values()])
# > [('number', 9009, <class 'int'>), ('string', 'hey ya', <class 'str'>)]
Executing Instructions ¶
Execution is performed through the Runtime.run
method. This method takes two lists:
- Code Snippets: Executed first via RuntimePythonInstruction. All variables in the runtime are passed to Python's exec function, and changes to variables are tracked.
- Tool Calls: Executed second via RuntimeJSONInstruction. The runtime uses a compatibility matrix to ensure that only actions with available, type-compatible variables are exposed.
During execution, the runtime captures:
- Variable Histories: A new history entry is recorded only when a variable's string representation (as provided by its RuntimeVariable.value_repr) changes. Each entry is a tuple of the step number and two representations of the variable: one text, one image. (The latter is especially useful for things like browsers, where you can take screenshots of the current state.)
  - By default, we use the repr protocol to get the text representation, and the image is set to None. You can customize the text representation by providing a custom implementation of the llm_repr protocol. If you're curious about customizing the string representation of your variables, or adding an image representation, check out the documentation for the RuntimeVariable class.
- Standard Output and Error: Both stdout and stderr are captured and stored within each instruction's metadata.
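The "record only when the representation changes" rule can be sketched in plain Python. Here record_history is an illustrative helper, and the tuple shape mirrors the (step, (text, image)) format described above:

```python
def record_history(history, step, value, image=None):
    """Append (step, (text_repr, image)) only when the text repr changed."""
    text = repr(value)  # the default text representation
    if not history or history[-1][1][0] != text:
        history.append((step, (text, image)))
    return history

history = []
record_history(history, step=1, value=3)
record_history(history, step=2, value=3)  # unchanged repr: no new entry
record_history(history, step=3, value=4)

print(history)
# [(1, ('3', None)), (3, ('4', None))]
```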
Let's see a few examples. The numbered markers in the code (e.g. (2)!) refer to the comments listed after each block.
Example 1: Executing instructions and handling variables ¶
This example shows how:
- You can mix instructions: You can pass both code snippets and tool calls to the Runtime.run method. The code snippets are executed first, then the tool calls.
- You can handle variables:
  - Variable names can be generated automatically if you do not provide them.
  - You can use the value of a variable in a tool call with the syntax <<var:{name}>>. This is useful for passing complex objects (like pandas DataFrames or Playwright browsers) directly to actions.
  - You can set the return variable of a tool call with the return keyword. If that variable name is valid, we will use it; otherwise, we will automatically generate a name for the variable.
from conatus.runtime.runtime import Runtime
from conatus.models.inputs_outputs.tool_calls import AIToolCall
def echo_twice(text: str) -> str: # (1)!
    return f"{text} {text}"
# We initialize the runtime without starting variables.
runtime = Runtime(actions=[echo_twice])
# runtime.variables == OrderedDict()
success = runtime.run(
code_snippets=["bark = 'woof'"], # (2)!
tool_calls=[
AIToolCall(
name="echo_twice",
returned_arguments={"text": "meow"}
), # (3)!
AIToolCall(
name="echo_twice",
returned_arguments={
"text": "<<var:bark>>",
"return": "bark_twice"
}
), # (4)!
],
)
print([(v.name, v.value) for v in runtime.variables.values()]) # (5)!
# > [('bark', 'woof'), ('str_0', 'meow meow'), ('bark_twice', 'woof woof')]
print(runtime.state.code()) # (6)!
# # Step 0
# bark = 'woof'
# str_0: str = echo_twice(text='meow')
# bark_twice: str = echo_twice(text=bark)
1. This is an example of an Action.
2. The code snippet is executed first.
3. By default, the value of a tool call will be stored in a variable with an automatically generated name (in this case, str_0).
4. This tool call does two unusual things:
    - It uses the value of the bark variable.
    - It sets the return value of the tool call to bark_twice.
5. And now we have three variables in the runtime:
    - bark was defined in the code snippet.
    - str_0 was automatically generated for the first tool call.
    - bark_twice was set manually by the second tool call.
6. If you want to replay the execution, you can use the code method of the RuntimeState.
Example 2: Capturing the output of a tool call and handling errors ¶
This example shows how:
- You can capture the output and error of a tool call: We capture the output and error of every instruction.
- The errors are formatted nicely to make it easier for LLMs to debug: When a tool call / JSON instruction raises an error, we format the error with context about the action, its arguments, and the traceback.
- We keep the successful instructions and discard the others: Only the successful instructions are kept in the RuntimeState. You can later look at the code of the entire execution by using the RuntimeState.code method, either with or without the failed instructions.
from conatus.runtime.runtime import Runtime
from conatus.models.inputs_outputs.tool_calls import AIToolCall
def assert_false() -> None: # (1)!
    assert False
runtime = Runtime(actions=[assert_false])
success = runtime.run( # (2)!
code_snippets=["print('hello')"],
tool_calls=[AIToolCall(name="assert_false", returned_arguments={})]
)
print(runtime.state.last_step.stdout)
# hello
print(runtime.state.last_step.stderr) # (3)!
# The action you provided raised the following error:
# Action: assert_false
# Arguments: {}
# Error:
# Traceback: Traceback (most recent call last):
# ...
# File "/var/folders/....py", line 7, in assert_false
# assert False
# ^^^^^
# AssertionError
print(runtime.state.code(include_failed=True)) # (4)!
# # Step 0
# print('hello')
# # Failed to execute:
# # assert_false()
1. We define an Action that is guaranteed to raise an error.
2. We run a code snippet that prints "hello" (to stdout, the default output channel), then our tool call, which will print an error to stderr, the default error channel.
3. When an action fails, the Runtime formats the error with context about the action, its arguments, and the traceback. This is useful for LLMs to debug.
4. If you want to replay the execution, you can use the code method of the RuntimeState. Here, we include the failed instructions in the code, so that we can see which ones failed. (This is off by default; we have to pass include_failed=True to the method.)
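The stdout/stderr capture described above can be sketched with Python's standard library. This shows the general technique, not the Conatus implementation; run_captured is an illustrative helper:

```python
import contextlib
import io
import traceback

def run_captured(func, *args, **kwargs):
    """Run func, capturing stdout and a formatted traceback on failure."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            func(*args, **kwargs)
            success = True
        except Exception:
            # Store the traceback so it can be shown to the LLM for debugging.
            err.write(traceback.format_exc())
            success = False
    return success, out.getvalue(), err.getvalue()

success, stdout, stderr = run_captured(print, "hello")
assert success and stdout == "hello\n" and stderr == ""

success, stdout, stderr = run_captured(lambda: 1 / 0)
assert not success and "ZeroDivisionError" in stderr
```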
Example 3: Terminating the runtime ¶
This example shows how to check that the Runtime
is terminated. This is useful for
BaseAgents that need to know when to stop the
execution loop.
from conatus.actions.preloaded.standard_actions import terminate
from conatus.runtime.runtime import Runtime
# Creating a termination action that expects a "greeting" variable.
termination_action = terminate(
expected_outputs={"greeting": (str, "The greeting to use.")}
)
runtime = Runtime(actions=[termination_action])
assert not runtime.is_terminated
code_snippets_iterator = iter([
"greeting = 'hello'",
"terminate(success=True, greeting=greeting)",
"greeting = 'goodbye'",
])
# And now we can execute the code snippets until the runtime is terminated.
while not runtime.is_terminated:
    code_snippet = next(code_snippets_iterator)
    runtime.run(code_snippets=[code_snippet])
# We see that the "greeting" variable was never set to "goodbye", because the
# termination action was executed.
assert runtime.variables["greeting"].value == "hello"
print(runtime.state.code())
# # Step 0 -- No variables imported
#
# # Step 1
# greeting = 'hello'
# # Step 2
# terminate(greeting=greeting)
Getting tool specifications ¶
The Runtime.get_tool_specifications method generates the JSON schemas for actions that can be passed to the LLM.
What makes it "smart" is that it only exposes actions that can actually be called given the current state of the runtime. For example, if you have an action that requires a pandas DataFrame but no compatible variables are available, that action won't be included in the tool specifications:
import pandas as pd
from conatus.runtime import Runtime
def print_df(df: pd.DataFrame):
    print(df)
runtime = Runtime(actions=[print_df])
# No tool specifications yet, because no compatible variables are available
assert runtime.get_tool_specifications() == []
# Import a variable
runtime.import_variable(name="df1", value=pd.DataFrame({"a": [1, 2, 3]}))
# Now we have a tool specification
print(runtime.get_tool_specifications()[0].json_schema.model_json_schema())
You should get something like this:
JSON Schema
{
"$defs": {
"df_possible_variables": {
"description": "You can pass 'df' by reference with a formatted reference '<<var:{name}>>' to a variable compatible with type 'pandas.DataFrame' among ['df1']",
"enum": ["<<var:df1>>"],
"title": "df_possible_variables",
"type": "string"
},
"possible_return_assignment": {
"enum": ["df1"],
"title": "possible_return_assignment",
"type": "string"
}
},
"properties": {
"df": {
"$ref": "#/$defs/df_possible_variables",
"description": "(type: pandas.DataFrame) <No description>"
},
"return": {
"anyOf": [
{ "$ref": "#/$defs/possible_return_assignment" },
{ "type": "null" }
],
"description": "If you want this action to assign the return value to a variable, pass the name of the variable in this `return` parameter. If you pass a null value, we will create a new variable automatically.\nThis is OPTIONAL. Only use it if it makes sense."
}
},
"required": ["df", "return"],
"title": "print_dfJSONSchema",
"type": "object"
}
This ensures that the LLM only sees actions that it can actually use, reducing the chance of errors and making the interface more intuitive.
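A simplified sketch of this availability check, assuming type hints drive the compatibility matrix: a parameter is satisfiable if the LLM can pass a raw JSON value, or if a type-compatible variable exists. The available_actions helper and the JSON_SERIALIZABLE allowlist are illustrative, not Conatus APIs:

```python
import inspect

JSON_SERIALIZABLE = (str, int, float, bool, list, dict, type(None))

def available_actions(actions, variables):
    """Expose an action only if every annotated parameter can be supplied."""
    exposed = []
    for action in actions:
        params = inspect.signature(action).parameters.values()
        ok = all(
            param.annotation in JSON_SERIALIZABLE  # a raw value is fine
            or any(isinstance(v, param.annotation) for v in variables.values())
            for param in params
            if param.annotation is not inspect.Parameter.empty
        )
        if ok:
            exposed.append(action)
    return exposed

class Browser:  # stand-in for a complex, non-serializable object
    pass

def click(browser: Browser) -> None: ...

def shout(text: str) -> str:
    return text.upper()

# Without a Browser variable, only shout is exposed; importing one adds click.
print([a.__name__ for a in available_actions([click, shout], {})])
# ['shout']
print([a.__name__ for a in available_actions([click, shout], {"b": Browser()})])
# ['click', 'shout']
```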
Hiding the runtime variables from the LLM ¶
The Runtime is designed to expose the
variables to the LLM. However, there are scenarios where you want to disable
this capability and make all tool schemas "classic" (i.e., only
JSON-serializable parameters): for instance, if you're building a simple
ReAct-style agent, and you don't want to confuse the LLM with the variables.
Only JSON-serializable methods
When hide_from_ai is set to True, only JSON-serializable methods can be
called, as in most traditional AI agent frameworks. This applies to all
actions in the runtime.
You can control this with the hide_from_ai parameter when constructing your
Runtime. When it is set:
- Tool calls schemas will not include any variable reference mechanism.
- The LLM will only be allowed to pass raw values; the tool response will present results similarly.
- All actions in this mode must use JSON-serializable types exclusively.
Replay-ability ¶
Because every variable and every instruction (code snippet, tool call, computer use action) is tracked in detail, including the representation of variable values at every step, Conatus enables robust replay, debugging, and analysis.
You can always obtain a pythonic replay of the session so far through the
RuntimeState.code method.
This will produce code resembling the order and structure of execution, with variables and return values corresponding to what happened in the actual run. This makes it easy to:
- Debug what happened during a run
- Share reproducible traces with collaborators
- Audit LLM behavior in production
You can also inspect the history of any variable over time through the
RuntimeVariable.value_repr_history
attribute.
from conatus.models.inputs_outputs.tool_calls import AIToolCall
from conatus.runtime.runtime import Runtime
def add(a: int, b: int) -> int:
    return a + b
runtime = Runtime(actions=[add])
runtime.run(tool_calls=[AIToolCall(name="add", returned_arguments={"a": 1, "b": 2})])
code = runtime.state.code()
target_code = """# Step 0 -- No variables imported
# Step 1
int_0: int = add(a=1, b=2)"""
assert code == target_code
var = runtime.variables["int_0"]
assert var.value_repr_history == [(1, ('3', None))]
Next steps ¶
To deepen your understanding of advanced runtime concepts, such as how variable compatibility is determined, how tool schemas are generated, and which implementation choices were made, check out the internals page.
[1] Obviously, this is only implemented for JSON / tool call instructions, since you can do anything with Python code.