Runtime internals ¶
This page covers a few topics that detail implementation choices made in the `Runtime` and its related classes.
Enabling the LLM to pass variables by reference ¶
Traditionally, AI agents pass information to one another through JSON. One feature of Conatus is the ability to pass variables by reference. This is particularly useful when:
- The LLM needs to pass a complex object to the agent. By complex object, here, we mean non-JSON-serializable objects.
- We want to design for repeatability, and therefore want to avoid hard-coding values.
- Some actions can only work with references / complex objects, so they sometimes cannot be used immediately unless a new variable of the correct type is created. Knowing which variables are compatible with a parameter can therefore help us determine which actions to show to the LLM at any given time, which is achieved with the `ActionAvailability` data structure.
For instance, if you initialize a `Runtime` instance with `browsing_actions` as `actions`, and without passing a `SimpleBrowser` variable, the LLM will only see the `browser_start` action, since the other browser actions need a `SimpleBrowser` instance to be called.
To enable this, we use the `<<var:name>>` syntax. This syntax is sufficiently obscure that it is unlikely to clash with the natural language intended by the LLM.
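For instance, reusing the `get_weather` action from the JSON schema example further below, a tool call that passes the `location` variable by reference could look like this:

```json
{
  "location": "<<var:location>>",
  "unit": "c",
  "return": null
}
```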
How to deactivate this feature
This feature is activated by default. You can deactivate it by setting the `hide_from_ai` attribute of the `Runtime` to `True`. When this is `True`, the LLM will not be able to pass variables by reference. Instead, the JSON schemas that will be sent to the LLM will look like "normal" AI agent JSON schemas, with parameters that are either primitives or JSON objects. Note that this means that all parameters of all actions will need to have JSON-serializable types.
The way this works is by doing three things:
- Keep track of variable compatibility: We keep track of which variables are compatible with which parameters of which actions.
- Generate the JSON schemas: We generate the JSON schemas for the actions based on the compatible variables.
- Resolve the variable references: When the LLM provides a variable reference, we resolve it to the actual variable before running the action.
Keeping track of variable compatibility ¶
One thing we do when initializing a `Runtime` is to compute the so-called "compatible variables matrix". This is a matrix that contains the variables that are compatible with each parameter of each action. That matrix is computed by the `compute_compatible_variables_matrix` method and has the following form:
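The snippet below is an illustrative sketch, not the actual representation: the names are made up, and we assume each value is the set of compatible variable names.

```python
compatible_variables_matrix = {
    # "action_name$param_name" -> names of the compatible variables
    "get_weather$location": {"language", "location", "country_of_origin"},
    "get_weather$unit": set(),  # no compatible variables for this parameter
}
```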
The key is the so-called "action_name$param_name" string, which identifies the parameter of the action that we are looking at. Since `$` is not allowed in parameter names, we can safely use it as a separator.
The matrix is built by checking the type of the parameter against the type of the variable. For instance, if the parameter is an `int` and the variable is a `float`, they are not compatible.
We use Pydantic to check the type compatibility. We first create a `parameter_validation_model` for all actions. This model contains every parameter of every action, with their type annotations. We do this essentially for performance reasons: the overhead of running the model is negligible compared to the overhead of creating the model, so it's better to have just one big model than one model per action.
We then use this model to check which variables are compatible with which parameters. We do this by simply trying to instantiate the model with the variable as the value for all parameters, and then checking each validation error: if the variable is not compatible with a parameter, we remove the variable from the set of compatible variables for that parameter.
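As a minimal sketch of this trick, assuming Pydantic v2 (the action and parameter names are made up, and the real model keys fields with the `$` separator described above):

```python
from pydantic import ValidationError, create_model

# One field per (action, parameter) pair; "__" stands in for the "$" separator.
ParameterValidationModel = create_model(
    "ParameterValidationModel",
    get_weather__location=(str, ...),
    get_weather__unit=(str, ...),
    sleep__seconds=(int, ...),
)

def compatible_params(value: object) -> set[str]:
    """Return the parameter keys whose annotation accepts `value`."""
    keys = set(ParameterValidationModel.model_fields)
    try:
        # Try the same value for every parameter at once...
        ParameterValidationModel(**{key: value for key in keys})
    except ValidationError as exc:
        # ...and drop every parameter that rejected it.
        keys -= {str(err["loc"][0]) for err in exc.errors()}
    return keys

print(compatible_params("Paris"))  # the two str-typed parameters
print(compatible_params(1.5))      # empty: 1.5 is neither an int nor coerced to str
```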
This matrix should be updated whenever a variable is updated or an action added or removed. This is done by the `update_action_availability_and_compatibility_matrix` method. This method also updates the `action_availability` attribute, which contains information about which actions are available at any given time.
Converting the compatibility matrix to JSON schemas ¶
Other classes (specifically, `AIInterface`s) need to have access to the JSON schemas of the actions. The `Runtime` class contains a method that converts the compatibility matrix to JSON schemas: `get_action_json_schemas`.
Under the hood, this method does the following:
- It uses the `format_compatibility_matrix_for_json_schema` method to format the compatibility matrix. That method ensures that the "success" parameter of the termination action cannot receive any variable reference, since we want the LLM to pass the success status as a raw boolean value. It also splits the keys of the matrix into action name and parameter name, since we will need to create one JSON schema per action.
- For each action, it then passes this formatted compatibility matrix to the `generate_pydantic_json_schema_model` function alongside the action's `FunctionInfo` and the list of all variables in the runtime. Note that we also take care of the return value of the action, so that it can also be passed by reference if needed.
You then end up with a dictionary that maps each action name to a `BaseModel` representing the JSON schema of the action. There are more transformations involved afterwards, but what ends up being sent to the LLM looks like this:
Example of JSON schema sent to the LLM

```json
{
  "$defs": {
    "location_possible_variables": {
      "description": "You can pass 'location' by reference with a formatted reference '<<var:{name}>>' to a variable compatible with type 'str' among ['language', 'location', 'country_of_origin']",
      "enum": [
        "<<var:language>>",
        "<<var:location>>",
        "<<var:country_of_origin>>"
      ],
      "title": "location_possible_variables",
      "type": "string"
    },
    "possible_return_assignment": {
      "enum": [
        "location",
        "language",
        "country_of_origin"
      ],
      "title": "possible_return_assignment",
      "type": "string"
    }
  },
  "description": "Get the weather for a given location.",
  "properties": {
    "location": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "$ref": "#/$defs/location_possible_variables"
        }
      ],
      "description": "(type: str) The location to get the weather for.",
      "title": "Location"
    },
    "unit": {
      "description": "(type: Literal['c', 'f']) The unit of the weather.",
      "enum": [
        "c",
        "f"
      ],
      "title": "Unit",
      "type": "string"
    },
    "return": {
      "anyOf": [
        {
          "$ref": "#/$defs/possible_return_assignment"
        },
        {
          "type": "null"
        }
      ],
      "description": "If you want this action to assign the return value to a variable, pass the name of the variable in this `return` parameter. If you pass a null value, we will create a new variable automatically.\nThis is OPTIONAL. Only use it if it makes sense."
    }
  },
  "required": [
    "location",
    "unit",
    "return"
  ],
  "title": "get_weatherJSONSchema",
  "type": "object"
}
```
Resolving the variable reference ¶
When the LLM provides a variable reference, we resolve it to the actual `RuntimeVariable`. This is done by the `resolve_var_or_ref` method, which takes care of checking that the reference points to a variable that exists in the `RuntimeState` and that it is in the `compatible_variables_matrix`.
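A minimal sketch of that logic, with made-up names (the actual method handles more cases than this):

```python
import re

VAR_REF_PATTERN = re.compile(r"^<<var:(?P<name>\w+)>>$")

def resolve_reference(value: str, state_vars: dict, compatible: set[str]):
    """Resolve a '<<var:name>>' reference to the underlying variable."""
    match = VAR_REF_PATTERN.match(value)
    if match is None:
        return value  # not a reference: treat it as a plain value
    name = match.group("name")
    if name not in state_vars:
        raise KeyError(f"Unknown variable: {name!r}")
    if name not in compatible:
        raise TypeError(f"Variable {name!r} is not compatible with this parameter")
    return state_vars[name]

state = {"location": "Paris"}
print(resolve_reference("<<var:location>>", state, {"location"}))  # Paris
print(resolve_reference("just a string", state, {"location"}))     # just a string
```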
Running code and JSON instructions ¶
The `Runtime` class has a `run` method that essentially accepts two arguments:
- `code_snippets`: A list of Python code snippets to execute.
- `tool_calls`: A list of `AIToolCall` or `ComputerUseAction` to execute. We also call these JSON instructions.
In other words, this method gives enough flexibility to allow the LLM to provide instructions in the form of Python code and JSON instructions. By convention, we execute the Python code first, and then the JSON instructions.
Running the Python code ¶
The `run_code_instruction` method takes care of running the Python code. In theory, executing Python code is straightforward: we just need to execute the code in the global scope, and it will modify the global variables.
However, we need to do a few things (sketched below):
- Capture the standard output and standard error of the code, to return them in the tool response.
- Ensure that only the variables that are present in the `RuntimeState` are available to the code.
- Track which variables were created by the code, so that we can add them to the `RuntimeState`.
- Track which variables are modified by the code, so that we can add their updates to the `RuntimeState`, and communicate them to the LLM.
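Here is a minimal sketch of that bookkeeping, under simplifying assumptions: modification is detected by object identity, which misses in-place mutations, and the actual method does quite a bit more.

```python
import contextlib
import io

def run_code_snippet(code: str, state_vars: dict) -> tuple[str, str, dict, dict]:
    """Illustrative only: execute a snippet against the runtime's variables."""
    scope = dict(state_vars)  # expose only the RuntimeState variables
    before = set(scope)
    stdout, stderr = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
        exec(code, scope)
    created = {
        name: scope[name]
        for name in scope.keys() - before
        if not name.startswith("__")  # exec() injects __builtins__
    }
    modified = {
        name: scope[name]
        for name in before
        if scope.get(name) is not state_vars[name]  # identity check only
    }
    return stdout.getvalue(), stderr.getvalue(), created, modified
```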
Running Python code, for now, has plenty of limitations compared to JSON instructions:
- We don't have a way to extract import statements from the Python code, unlike with JSON instructions. For now, `RuntimePythonInstruction`s have an empty `import_statements` attribute.
- The Python code is executed in the global scope, which has a potential for lots of side effects. This is partially by design: the point of passing variables by reference is that you can start a task with variables that are already in memory and not simply JSON objects. Some variables, in fact, cannot be pickled. However, we could add some safeguards in the future.
Running the JSON instructions ¶
The `run_json_instruction` method takes care of running the JSON instructions. This is relatively straightforward, since all the actions are available to us with their relatively robust type-checking scaffolding.
Like with Python code, we capture the standard output and standard error of the JSON instruction, track modified variables, and include them in the tool response.
The termination action is a special case: it is not a real action, but a special JSON instruction that is used to return a value to the LLM. We handle it separately.
Generating tool responses ¶
WIP
This section is a work in progress.
Many AI providers require tool calls to be followed by a tool response message. Generally, this tool response can contain any sort of message. We use the `generate_tool_response_content` method to generate the content of the tool response.
That tool response content contains the following (see the example below):
- `success`: Whether the tool call was successful.
- `stdout`: The standard output of the tool call, if any.
- `stderr`: The standard error of the tool call, if any.
- `modified_variables`: The variables that were modified by the tool call, if any. This includes variables that were created.
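For illustration, such a tool response content could look like this (the values are made up):

```json
{
  "success": true,
  "stdout": "Fetched weather for Paris\n",
  "stderr": "",
  "modified_variables": {
    "weather_report": "Sunny, 24°C"
  }
}
```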
If the `hide_from_ai` attribute of the `Runtime` is `True`, we instead return the following (see the example below):
- `success`: Whether the tool call was successful.
- `result` (or `results` if there are multiple): The result of the tool call. Since setting `hide_from_ai` means that we can only call actions with JSON-serializable types, the result will itself be JSON-serializable, and is therefore returned as a JSON object.
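Again for illustration (made-up values):

```json
{
  "success": true,
  "result": {
    "temperature": 24,
    "unit": "c"
  }
}
```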
Adapting to computer use ¶
WIP
This section is a work in progress. It should cover (1) the fact that we have to filter the computer use environment variables (because AI providers only care about the computer environment that's given to them), (2) the `filter_computer_use_environment_vars` and `make_computer_use_config` methods, (3) how the tool response differs from other cases (see previous section), (4) the mapping of the `ComputerUseAction` class to the strings of these methods, which correspond to the preloaded actions in `conatus.actions.preloaded`, and (5) the fact that it's browser-only for now.
Translate instructions to Python code ¶
WIP
This section is a work in progress. It should cover the various code generation functions and the choices made there (e.g. "step 0 -- no variables imported", the fact that we try to retrieve the type for type hinting, that we try to reconstruct the import statements, etc.)
Normalizing the inputs, outputs and starting variables ¶
WIP
This section is a work in progress. (Maybe should be put in Task internals?)