Runtime: Executing and Tracking LLM Instructions ¶
The Runtime subsystem is at the heart of
Conatus. When LLMs give us instructions (e.g. code snippets or tool calls),
the Runtime executes those instructions, tracks the state of the execution,
stores metadata that can be passed back to the LLM to allow for better
reasoning, and more. Ultimately, it is what makes it possible to create
Playbooks that replay a Task without needing an LLM.
Overview ¶
At a high level, the Runtime subsystem
consists of several key components:
- Variables and State:
  - RuntimeVariable wraps each value, storing its history (based on its string representation) and type.
  - RuntimeState holds the current set of variables and the history of execution steps.
  - RuntimeStateStep represents one step in the execution, which may include several instructions.
- Instructions: The system records every instruction the LLM provides, using two formats:
  - JSON instructions (tool calls) are captured by RuntimeJSONInstruction. These instructions come with structured arguments and return types.
  - Code snippets are captured by RuntimePythonInstruction. They allow the LLM to express more complex ideas, and to execute arbitrary-ish Python code.
- Actions and Execution: Actions are executed by the Runtime. Their input parameters and output types are validated, and only those actions whose required variables are available are exposed to the LLM.
Why use Runtime? ¶
- Dual Instruction Types: LLMs can issue instructions in two formats. We want both to be executed smoothly and consistently.
- Metadata capture: We want to capture all the metadata associated with the execution of an instruction. This includes the stdout/stderr output, the variables that were changed, and more. This enables us to communicate that information to AI models to enable better reasoning.
- Passing by Reference: Unlike traditional AI agent frameworks that only
support JSON-serializable data, Conatus enables AI models to refer to
RuntimeVariables by using the syntax <<var:{name}>>. This makes it possible to pass complex objects (like pandas DataFrames or Playwright browsers) directly to actions. [1]
- Error minimization: The Runtime pulls out all the stops to minimize avoidable errors. For example, it makes sure that AI models can only execute tool calls on actions that can actually be executed given the available variables, and it ensures that passing by reference only happens when possible.
- Replay-ability: Every instruction can be converted to Python code, which forms the basis of replayable tasks.
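As an illustration, pass-by-reference resolution of this kind can be sketched in a few lines of plain Python. Note that resolve_references is a hypothetical helper written for this page, not part of the Conatus API:

```python
import re

def resolve_references(arguments: dict, variables: dict) -> dict:
    """Replace '<<var:{name}>>' strings with the referenced objects."""
    pattern = re.compile(r"^<<var:(\w+)>>$")
    resolved = {}
    for key, value in arguments.items():
        match = pattern.match(value) if isinstance(value, str) else None
        if match:
            # Pass the actual object by reference, not a serialized copy.
            resolved[key] = variables[match.group(1)]
        else:
            resolved[key] = value
    return resolved

variables = {"df": object()}  # stand-in for e.g. a pandas DataFrame
args = resolve_references({"data": "<<var:df>>", "n": 3}, variables)
assert args["data"] is variables["df"]  # same object, passed by reference
assert args["n"] == 3                   # raw JSON values pass through
```

The key point is that the resolved argument is the original object itself, which is what lets non-serializable values (DataFrames, browsers) flow into actions.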
Initializing the Runtime ¶
You initialize the Runtime with starting
variables and a set of actions. The Runtime
supports multiple input formats for the
starting variables (singletons, lists, or dictionaries) and automatically
normalizes them.
from conatus.runtime.runtime import Runtime
# Define starting variables (can be a dict, list, or singleton)
starting_vars = {"ten": 10, "ten_as_string": "10"}
# Define actions
def add(a: int, b: int) -> int:
    return a + b
# Initialize the runtime with actions and starting variables.
runtime = Runtime(starting_variables=starting_vars, actions=[add])
print([(v.name, v.value, v.type_hint) for v in runtime.variables.values()])
# > [('ten', 10, <class 'int'>), ('ten_as_string', '10', <class 'str'>)]
from conatus.runtime.runtime import Runtime
# Define starting variables (can be a dict, list, or singleton)
starting_vars = [9009, "10"]
# Define actions
def add(a: int, b: int) -> int:
    return a + b
# Initialize the runtime with actions and starting variables.
runtime = Runtime(starting_variables=starting_vars, actions=[add])
# The variables are named automatically, based on their type.
print([(v.name, v.value, v.type_hint) for v in runtime.variables.values()])
# > [('int_0', 9009, <class 'int'>), ('str_0', '10', <class 'str'>)]
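The automatic naming above (lowercase type name plus a counter) can be sketched as follows; auto_name is an illustrative helper for this page, not part of the Conatus API:

```python
from collections import OrderedDict

def auto_name(value: object, existing: dict) -> str:
    """Generate names like 'int_0' or 'str_0' from the value's type."""
    base = type(value).__name__.lower()
    counter = 0
    while f"{base}_{counter}" in existing:
        counter += 1
    return f"{base}_{counter}"

variables = OrderedDict()
for value in [9009, "10"]:
    variables[auto_name(value, variables)] = value

print(list(variables))
# ['int_0', 'str_0']
```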
You can also pass a TaskConfig to the
Runtime constructor. For now, this is only
used to configure the maximum length of the string representation of a variable.
One other thing you can do is pass an expected_input_types argument. This is a
dictionary that maps variable names to their expected types. This is used to
validate the types of the starting variables, as well as to infer the names of
the variables if you pass a list or a singleton.
from collections import OrderedDict
from conatus.runtime.runtime import Runtime
# Define the expected input types.
# Useful for something like:
# def my_task(number: int, string: str) -> ...
expected_input_types = OrderedDict({"number": int, "string": str})
# And now we want to pass a list of starting variables.
starting_vars = [9009, "hey ya"]
# Initialize the runtime with actions and starting variables.
runtime = Runtime(
starting_variables=starting_vars,
expected_input_types=expected_input_types,
)
# The variables are named automatically, based on their type.
print([(v.name, v.value, v.type_hint) for v in runtime.variables.values()])
# > [('number', 9009, <class 'int'>), ('string', 'hey ya', <class 'str'>)]
Executing Instructions ¶
Execution is performed through the Runtime.run
method. This method takes two lists:
- Code Snippets: Executed first via RuntimePythonInstruction. All variables in the runtime are passed to Python's exec function, and changes to variables are tracked.
- Tool Calls: Executed second via RuntimeJSONInstruction. The runtime uses a compatibility matrix to ensure that only actions with available, type-compatible variables are exposed.
During execution, the runtime captures:
- Variable Histories: A new history entry is recorded only when a variable's string representation (as provided by its RuntimeVariable.value_repr) changes. Each entry is a tuple of the step number and two representations of the variable: one text, one image. (The latter is especially useful for things like browsers, where you can take screenshots of the current state.)
  - By default, we use the repr protocol to get the text representation, and the image is set to None. You can customize the text representation by providing a custom implementation of the llm_repr protocol. If you're curious about customizing the string representation of your variables, or adding an image representation, check out the documentation for the RuntimeVariable class.
- Standard Output and Error: Both stdout and stderr are captured and stored within each instruction's metadata.
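The "record only when the representation changes" rule can be sketched in plain Python. Here record_history is an illustrative helper, and the tuple shape mirrors the (step, (text, image)) format described above:

```python
def record_history(history, step, value, image=None):
    """Append (step, (text_repr, image)) only when the text repr changed."""
    text = repr(value)  # the default text representation
    if not history or history[-1][1][0] != text:
        history.append((step, (text, image)))
    return history

history = []
record_history(history, step=1, value=3)
record_history(history, step=2, value=3)  # unchanged repr: no new entry
record_history(history, step=3, value=4)

print(history)
# [(1, ('3', None)), (3, ('4', None))]
```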
Let's see a few examples. The numbered markers in the code (e.g. (2)!) refer to the comments listed after each block.
Example 1: Executing instructions and handling variables ¶
This example shows how:
- You can mix instructions: You can pass both code snippets and tool calls to the Runtime.run method. The code snippets are executed first, then the tool calls.
- You can handle variables:
  - Variable names can be generated automatically if you do not provide them.
  - You can use the value of a variable in a tool call with the syntax <<var:{name}>>. This is useful for passing complex objects (like pandas DataFrames or Playwright browsers) directly to actions.
  - You can set the return variable of a tool call with the return keyword. If that variable name is valid, we will use it; otherwise, we will automatically generate a name for the variable.
from conatus.runtime.runtime import Runtime
from conatus.models.inputs_outputs.tool_calls import AIToolCall
def echo_twice(text: str) -> str: # (1)!
    return f"{text} {text}"
# We initialize the runtime without starting variables.
runtime = Runtime(actions=[echo_twice])
# runtime.variables == OrderedDict()
success = runtime.run(
code_snippets=["bark = 'woof'"], # (2)!
tool_calls=[
AIToolCall(
name="echo_twice",
returned_arguments={"text": "meow"}
), # (3)!
AIToolCall(
name="echo_twice",
returned_arguments={
"text": "<<var:bark>>",
"return": "bark_twice"
}
), # (4)!
],
)
print([(v.name, v.value) for v in runtime.variables.values()]) # (5)!
# > [('bark', 'woof'), ('str_0', 'meow meow'), ('bark_twice', 'woof woof')]
print(runtime.state.code()) # (6)!
# # Step 0
# bark = 'woof'
# str_0: str = echo_twice(text='meow')
# bark_twice: str = echo_twice(text=bark)
1. This is an example of an Action.
2. The code snippet is executed first.
3. By default, the value of a tool call will be stored in a variable with an automatically generated name (in this case, str_0).
4. This tool call does two unusual things:
    - It uses the value of the bark variable.
    - It sets the return value of the tool call to bark_twice.
5. And now we have three variables in the runtime:
    - bark was defined in the code snippet.
    - str_0 was automatically generated for the first tool call.
    - bark_twice was set manually by the second tool call.
6. If you want to replay the execution, you can use the code method of the RuntimeState.
Example 2: Capturing the output of a tool call and handling errors ¶
This example shows how:
- You can capture the output and error of a tool call: We capture the output and error of every instruction.
- The errors are formatted nicely to make it easier for LLMs to debug: When a tool call / JSON instruction raises an error, we format the error with context about the action, its arguments, and the traceback.
- We keep the successful instructions and discard the others: Only the successful instructions are kept in the RuntimeState. You can later look at the code of the entire execution by using the RuntimeState.code method, either with or without the failed instructions.
from conatus.runtime.runtime import Runtime
from conatus.models.inputs_outputs.tool_calls import AIToolCall
def assert_false() -> None: # (1)!
    assert False
runtime = Runtime(actions=[assert_false])
success = runtime.run( # (2)!
code_snippets=["print('hello')"],
tool_calls=[AIToolCall(name="assert_false", returned_arguments={})]
)
print(runtime.state.last_step.stdout)
# hello
print(runtime.state.last_step.stderr) # (3)!
# The action you provided raised the following error:
# Action: assert_false
# Arguments: {}
# Error:
# Traceback: Traceback (most recent call last):
# ...
# File "/var/folders/....py", line 7, in assert_false
# assert False
# ^^^^^
# AssertionError
print(runtime.state.code(include_failed=True)) # (4)!
# # Step 0
# print('hello')
# # Failed to execute:
# # assert_false()
1. We define an Action that is guaranteed to raise an error.
2. We run a code snippet that prints "hello" (to stdout, the default output channel), then our tool call, which will print an error to stderr, the default error channel.
3. When an action fails, the Runtime formats the error with context about the action, its arguments, and the traceback. This is useful for LLMs to debug.
4. If you want to replay the execution, you can use the code method of the RuntimeState. Here, we include the failed instructions in the code, so that we can see which ones failed. (This is off by default; we have to pass include_failed=True to the method.)
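The stdout/stderr capture described above can be sketched with Python's standard library. This shows the general technique, not the Conatus implementation; run_captured is an illustrative helper:

```python
import contextlib
import io
import traceback

def run_captured(func, *args, **kwargs):
    """Run func, capturing stdout and a formatted traceback on failure."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            func(*args, **kwargs)
            success = True
        except Exception:
            # Store the traceback so it can be shown to the LLM for debugging.
            err.write(traceback.format_exc())
            success = False
    return success, out.getvalue(), err.getvalue()

success, stdout, stderr = run_captured(print, "hello")
assert success and stdout == "hello\n" and stderr == ""

success, stdout, stderr = run_captured(lambda: 1 / 0)
assert not success and "ZeroDivisionError" in stderr
```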
Example 3: Terminating the runtime ¶
This example shows how to check that the Runtime
is terminated. This is useful for
BaseAgents that need to know when to stop the
execution loop.
from conatus.actions.preloaded.standard_actions import terminate
from conatus.runtime.runtime import Runtime
# Creating a termination action that expects a "greeting" variable.
termination_action = terminate(
expected_outputs={"greeting": (str, "The greeting to use.")}
)
runtime = Runtime(actions=[termination_action])
assert not runtime.is_terminated
code_snippets_iterator = iter([
"greeting = 'hello'",
"terminate(success=True, greeting=greeting)",
"greeting = 'goodbye'",
])
# And now we can execute the code snippets until the runtime is terminated.
while not runtime.is_terminated:
    code_snippet = next(code_snippets_iterator)
    runtime.run(code_snippets=[code_snippet])
# We see that the "greeting" variable was never set to "goodbye", because the
# termination action was executed.
assert runtime.variables["greeting"].value == "hello"
print(runtime.state.code())
# # Step 0 -- No variables imported
#
# # Step 1
# greeting = 'hello'
# # Step 2
# terminate(greeting=greeting)
Getting tool specifications ¶
The Runtime.get_tool_specifications method generates the JSON schemas for actions that can be passed to the LLM.
What makes it "smart" is that it only exposes actions that can actually be called given the current state of the runtime. For example, if you have an action that requires a pandas DataFrame but no compatible variables are available, that action won't be included in the tool specifications:
import pandas as pd
from conatus.runtime import Runtime
def print_df(df: pd.DataFrame):
    print(df)
runtime = Runtime(actions=[print_df])
# No tool specifications yet, because no compatible variables are available
assert runtime.get_tool_specifications() == []
# Import a variable
runtime.import_variable(name="df1", value=pd.DataFrame({"a": [1, 2, 3]}))
# Now we have a tool specification
print(runtime.get_tool_specifications()[0].json_schema.model_json_schema())
You should get something like this:
JSON Schema
{
"$defs": {
"df_possible_variables": {
"description": "You can pass 'df' by reference with a formatted reference '<<var:{name}>>' to a variable compatible with type 'pandas.DataFrame' among ['df1']",
"enum": ["<<var:df1>>"],
"title": "df_possible_variables",
"type": "string"
},
"possible_return_assignment": {
"enum": ["df1"],
"title": "possible_return_assignment",
"type": "string"
}
},
"properties": {
"df": {
"$ref": "#/$defs/df_possible_variables",
"description": "(type: pandas.DataFrame) <No description>"
},
"return": {
"anyOf": [
{ "$ref": "#/$defs/possible_return_assignment" },
{ "type": "null" }
],
"description": "If you want this action to assign the return value to a variable, pass the name of the variable in this `return` parameter. If you pass a null value, we will create a new variable automatically.\nThis is OPTIONAL. Only use it if it makes sense."
}
},
"required": ["df", "return"],
"title": "print_dfJSONSchema",
"type": "object"
}
This ensures that the LLM only sees actions that it can actually use, reducing the chance of errors and making the interface more intuitive.
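A simplified sketch of this availability check, assuming type hints drive the compatibility matrix: a parameter is satisfiable if the LLM can pass a raw JSON value, or if a type-compatible variable exists. The available_actions helper and the JSON_SERIALIZABLE allowlist are illustrative, not Conatus APIs:

```python
import inspect

JSON_SERIALIZABLE = (str, int, float, bool, list, dict, type(None))

def available_actions(actions, variables):
    """Expose an action only if every annotated parameter can be supplied."""
    exposed = []
    for action in actions:
        params = inspect.signature(action).parameters.values()
        ok = all(
            param.annotation in JSON_SERIALIZABLE  # a raw value is fine
            or any(isinstance(v, param.annotation) for v in variables.values())
            for param in params
            if param.annotation is not inspect.Parameter.empty
        )
        if ok:
            exposed.append(action)
    return exposed

class Browser:  # stand-in for a complex, non-serializable object
    pass

def click(browser: Browser) -> None: ...

def shout(text: str) -> str:
    return text.upper()

# Without a Browser variable, only shout is exposed; importing one adds click.
print([a.__name__ for a in available_actions([click, shout], {})])
# ['shout']
print([a.__name__ for a in available_actions([click, shout], {"b": Browser()})])
# ['click', 'shout']
```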
Hiding the runtime variables from the LLM ¶
The Runtime is designed to expose the
variables to the LLM. However, there are scenarios where you want to disable
this capability and make all tool schemas "classic" (i.e., only
JSON-serializable parameters): for instance, if you're building a simple
ReAct-style agent, and you don't want to confuse the LLM with the variables.
Only JSON-serializable methods
When hide_from_ai is set to True, only JSON-serializable methods can be
called, as in most traditional AI agent frameworks. This applies to all
actions in the runtime.
You can control this with the hide_from_ai parameter when constructing your
Runtime. When it is set:
- Tool calls schemas will not include any variable reference mechanism.
- The LLM will only be allowed to pass raw values; the tool response will present results similarly.
- All actions in this mode must use JSON-serializable types exclusively.
Replay-ability ¶
Because every variable and every instruction (code snippet, tool call, computer use action) is tracked in detail, including the representation of variable values at every step, Conatus enables robust replay, debugging, and analysis.
You can always obtain a pythonic replay of the session so far through the
RuntimeState.code method.
This will produce code resembling the order and structure of execution, with variables and return values corresponding to what happened in the actual run. This makes it easy to:
- Debug what happened during a run
- Share reproducible traces with collaborators
- Audit LLM behavior in production
You can also inspect the history of any variable over time through the
RuntimeVariable.value_repr_history
attribute.
from conatus.models.inputs_outputs.tool_calls import AIToolCall
from conatus.runtime.runtime import Runtime
def add(a: int, b: int) -> int:
    return a + b
runtime = Runtime(actions=[add])
runtime.run(tool_calls=[AIToolCall(name="add", returned_arguments={"a": 1, "b": 2})])
code = runtime.state.code()
target_code = """# Step 0 -- No variables imported
# Step 1
int_0: int = add(a=1, b=2)"""
assert code == target_code
var = runtime.variables["int_0"]
assert var.value_repr_history == [(1, ('3', None))]
Next steps ¶
To deepen your understanding of advanced runtime concepts, such as how variable compatibility is determined, how tool schemas are generated, and which implementation choices were made, check out the internals page.
[1] Obviously, this is only implemented for JSON / tool call instructions, since you can do anything with Python code.