
Runtime internals

This page covers a few topics that detail implementation choices made in the Runtime and its related classes.

Enabling the LLM to pass variables by reference

Traditionally, AI agents pass information to one another through JSON. One feature of Conatus is the ability to pass variables by reference. This is particularly useful when:

  1. The LLM needs to pass a complex object to the agent. By "complex object", we mean a non-JSON-serializable object.
  2. We want to design for repeatability, and therefore want to avoid hard-coding values.
  3. Some actions only work with references / complex objects, so they cannot be used until a variable of the correct type exists. Knowing which variables are compatible with a parameter therefore tells us which actions to show to the LLM at any given time; this is what the ActionAvailability data structure captures.
    For instance, if you initialize a Runtime instance with browsing_actions as actions and without passing a SimpleBrowser variable, the LLM will only see the browser_start action, since the other browser actions need a SimpleBrowser instance to be called (see the sketch after this list).
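To make that last point concrete, here is a hedged sketch; the exact Runtime constructor signature is an assumption, not the real API:

runtime = Runtime(actions=browsing_actions)  # no SimpleBrowser variable passed
# Only browser_start is advertised to the LLM at this point: every other
# browsing action takes a SimpleBrowser parameter, and no compatible variable
# exists yet.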

To enable this, we use the <<var:name>> syntax. This syntax is sufficiently obscure that it is unlikely to clash with the natural language intended by the LLM.
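For illustration, here is the difference between a literal argument and a reference, using a hypothetical location parameter and a runtime variable named location:

{"location": "Paris"}             # a literal string, passed as-is
{"location": "<<var:location>>"}  # resolved to the runtime variable "location"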

How to deactivate this feature

This feature is activated by default. You can deactivate it by setting the hide_from_ai attribute of the Runtime to True.

When this is True, the LLM will not be able to pass variables by reference. Instead, the JSON schemas that will be sent to the LLM will look like "normal" AI agent JSON schemas, with parameters that are either primitives or JSON objects.

Note that this means that all parameters of all actions will need to have JSON-serializable types.
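A minimal sketch of what this looks like, assuming a Runtime instance already exists (whether the attribute can also be set at construction time is not covered here):

runtime.hide_from_ai = True  # JSON schemas now only use primitives and JSON objects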

The way this works is by doing three things:

  1. Keep track of variable compatibility: We keep track of which variables are compatible with which parameters of which actions.
  2. Generate the JSON schemas: We generate the JSON schemas for the actions based on the compatible variables.
  3. Resolve the variable reference: When the LLM provides a variable reference, we resolve it to the actual variable so the action receives the underlying object.

Keeping track of variable compatibility

One thing we do when initializing a Runtime is to compute the so-called "compatible variables matrix". This matrix contains the variables that are compatible with each parameter of each action.

That matrix is computed by the compute_compatible_variables_matrix method and has the following form:

{
    "action_name$param_name": {"var_name_1", "var_name_2", ...},
}

The key is the so-called "action_name$param_name" string, which identifies the parameter of the action we are looking at. Since $ is not allowed in parameter names, we can safely use it as a separator.

The matrix is built by checking the type of the parameter against the type of the variable. For instance, if the parameter is an int and the variable is a float, they are not compatible.
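As a concrete, hypothetical instance: a get_weather action whose location parameter is typed str, in a runtime holding three str variables, would yield a matrix like the following (this matches the JSON schema example later on this page):

{
    "get_weather$location": {"language", "location", "country_of_origin"},
}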

We use Pydantic to check the type compatibility. We first create a parameter_validation_model for all actions. This model contains every parameter of every action, with their type annotations. We do this essentially for performance reasons: the overhead of running the model is negligible compared to the overhead of creating the model, so it's better to have just one big model than one model per action.

We then use this model to check which variables are compatible with which parameters: we simply try to instantiate the model with the variable as the value for every parameter, then inspect the validation errors. If a variable fails validation for a parameter, we remove it from that parameter's set of compatible variables.
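Here is a minimal, self-contained sketch of that idea with Pydantic v2. The actions and variables are made up, and the real model keys its fields by the "action_name$param_name" string; we use identifier-safe field names here for brevity:

from pydantic import ValidationError, create_model

# One big model with every parameter of every action as an optional field.
ParameterValidationModel = create_model(
    "ParameterValidationModel",
    get_weather__location=(str | None, None),
    set_count__count=(int | None, None),
)

variables = {"city": "Paris", "retries": 3}
matrix = {name: set(variables) for name in ParameterValidationModel.model_fields}

for var_name, value in variables.items():
    try:
        # Try the variable as the value of every parameter at once.
        ParameterValidationModel(**{field: value for field in matrix})
    except ValidationError as exc:
        # Each validation error names the parameter the variable is
        # incompatible with; drop it from that parameter's set.
        for error in exc.errors():
            matrix[str(error["loc"][0])].discard(var_name)

print(matrix)
# {'get_weather__location': {'city'}, 'set_count__count': {'retries'}}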

This matrix should be updated whenever a variable is updated or an action is added or removed. This is done by the update_action_availability_and_compatibility_matrix method. That method also updates the action_availability attribute, which contains information about which actions are available at any given time.

Converting the compatibility matrix to JSON schemas

Other classes (specifically, AIInterface instances) need access to the JSON schemas of the actions. The Runtime class contains a method that converts the compatibility matrix to JSON schemas: get_action_json_schemas.

Under the hood, this method does the following:

  1. It uses the format_compatibility_matrix_for_json_schema method to format the compatibility matrix. That method ensures that the "success" parameter of the termination action cannot receive a variable reference, since we want the LLM to pass the success status as a raw boolean value. It also splits the keys of the matrix into action name and parameter name, since we will need to create one JSON schema per action.
  2. For each action, it then passes this formatted compatibility matrix to the generate_pydantic_json_schema_model function alongside the action's FunctionInfo and the list of all variables in the runtime. Note that we also take care of the return value of the action, so that it can also be passed by reference if needed.

You then end up with a dictionary mapping each action name to a BaseModel whose JSON schema describes that action.

There are more transformations involved afterwards, but what ends up being sent to the LLM looks like this:

Example of JSON schema sent to the LLM
{
    "$defs":{
        "location_possible_variables":{
            "description":"You can pass 'location' by reference with a formatted reference '<<var:{name}>>' to a variable compatible with type 'str' among ['language', 'location', 'country_of_origin']",
            "enum":[
                "<<var:language>>",
                "<<var:location>>",
                "<<var:country_of_origin>>"
            ],
            "title":"location_possible_variables",
            "type":"string"
        },
        "possible_return_assignment":{
            "enum":[
                "location",
                "language",
                "country_of_origin"
            ],
            "title":"possible_return_assignment",
            "type":"string"
        }
    },
    "description":"Get the weather for a given location.",
    "properties":{
        "location":{
            "anyOf":[
                {
                    "type":"string"
                },
                {
                    "$ref":"#/$defs/location_possible_variables"
                }
            ],
            "description":"(type: str) The location to get the weather for.",
            "title":"Location"
        },
        "unit":{
            "description":"(type: Literal['c', 'f']) The unit of the weather.",
            "enum":[
                "c",
                "f"
            ],
            "title":"Unit",
            "type":"string"
        },
        "return":{
            "anyOf":[
                {
                    "$ref":"#/$defs/possible_return_assignment"
                },
                {
                    "type":"null"
                }
            ],
            "description":"If you want this action to assign the return value to a variable, pass the name of the variable in this `return` parameter. If you pass a null value, we will create a new variable automatically.\nThis is OPTIONAL. Only use it if it makes sense."
        }
    },
    "required":[
        "location",
        "unit",
        "return"
    ],
    "title":"get_weatherJSONSchema",
    "type":"object"
}

Resolving the variable reference

When the LLM provides a variable reference, we resolve it to the actual RuntimeVariable. This is done by the resolve_var_or_ref method, which takes care of checking that the reference points to a variable that exists in the RuntimeState and that the variable is in the compatible_variables_matrix.
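A minimal sketch of such a resolution step; the signature, the state layout, and the error messages below are assumptions, not the real implementation:

import re

VAR_REF_RE = re.compile(r"^<<var:(?P<name>\w+)>>$")

def resolve_var_or_ref(value, action_name, param_name, state_vars, matrix):
    """Return the literal value, or the variable a reference points to."""
    if not isinstance(value, str):
        return value
    match = VAR_REF_RE.match(value)
    if match is None:
        return value  # a plain string literal, not a reference
    name = match.group("name")
    if name not in state_vars:
        raise ValueError(f"unknown variable {name!r}")
    if name not in matrix.get(f"{action_name}${param_name}", set()):
        raise ValueError(f"variable {name!r} is not compatible with {param_name!r}")
    return state_vars[name]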

Running code and JSON instructions

The Runtime class has a run method that essentially accepts two arguments:

  1. code_snippets: A list of Python code snippets to execute.
  2. tool_calls: A list of AIToolCall or ComputerUseAction objects to execute. We also call these JSON instructions.

In other words, this method gives enough flexibility to allow the LLM to provide instructions in the form of Python code and JSON instructions. By convention, we execute the Python code first, and then the JSON instructions.
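A hypothetical call mixing both kinds of instructions (the exact signatures of run and AIToolCall are assumptions):

result = runtime.run(
    code_snippets=["location = 'Paris'"],
    tool_calls=[
        AIToolCall(
            name="get_weather",
            arguments={"location": "<<var:location>>", "unit": "c"},
        )
    ],
)

Because the Python code runs first, the location variable it creates can be referenced by the tool call that follows.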

Running the Python code

The run_code_instruction method takes care of running the Python code. In theory, executing Python code is straightforward: we just need to execute the code in the global scope, and it will modify the global variables.

However, we need to do a few things (sketched after this list):

  1. Capture the standard output and standard error of the code, to return them in the tool response.
  2. Ensure that only the variables that are present in the RuntimeState are available to the code.
  3. Track which variables were created by the code, so that we can add them to the RuntimeState.
  4. Track which variables are modified by the code, so that we can add their updates to the RuntimeState, and communicate them to the LLM.
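A minimal sketch of this bookkeeping, with assumed names; the real implementation runs the code in the global scope and does more than this (in particular, the rebinding check below misses in-place mutations):

import contextlib
import io

def run_code_snippet(code: str, state_vars: dict[str, object]):
    namespace = dict(state_vars)  # only RuntimeState variables are visible
    stdout, stderr = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
        exec(code, namespace)
    created = {
        k: v for k, v in namespace.items()
        if k not in state_vars and k != "__builtins__"
    }
    modified = {
        k: v for k, v in namespace.items()
        if k in state_vars and v is not state_vars[k]
    }
    return stdout.getvalue(), stderr.getvalue(), created, modified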

For now, running Python code has several limitations compared to JSON instructions:

  1. We don't have a way to extract import statements from the Python code, unlike with JSON instructions. For now, RuntimePythonInstruction instances have an empty import_statements attribute.
  2. The Python code is executed in the global scope, which opens the door to all sorts of side effects. This is partially by design: the point of passing variables by reference is that you can start a task with variables that are already in memory, not just JSON objects. Some variables, in fact, cannot be pickled. However, we could add some safeguards in the future.

Running the JSON instructions

The run_json_instruction method takes care of running the JSON instructions. This is relatively straightforward, since all the actions are available to us, along with their robust type-checking scaffolding.

Like with Python code, we capture the standard output and standard error of the JSON instruction, track modified variables, and include them in the tool response.

The termination action is a special case: it is not a real action, but a JSON instruction used to return a value to the LLM. We handle it separately.
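Put together, the dispatch can be pictured as follows. This reuses the resolve_var_or_ref sketch from earlier, and every name here is an assumption rather than the real implementation:

def run_json_instruction(tool_call, runtime):
    if tool_call.name == "termination":
        # Not a real action: extract the return value and stop.
        return handle_termination(tool_call.arguments)  # hypothetical helper
    action = runtime.actions[tool_call.name]
    kwargs = {
        param: resolve_var_or_ref(
            value, tool_call.name, param, runtime.state_vars, runtime.matrix
        )
        for param, value in tool_call.arguments.items()
    }
    return action(**kwargs)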

Generating tool responses

WIP

This section is a work in progress.

Many AI providers require tool calls to be followed by a tool response message. Generally, this tool response can contain any sort of message. We use the generate_tool_response_content method to generate the content of the tool response.

That tool response content contains the following:

  1. success: Whether the tool call was successful.
  2. stdout: The standard output of the tool call, if any.
  3. stderr: The standard error of the tool call, if any.
  4. modified_variables: The variables that were modified by the tool call, if any. This includes variables that were created.

If the hide_from_ai attribute of the Runtime is True, we instead return:

  1. success: Whether the tool call was successful.
  2. result (or results if there are multiple): The result of the tool call. Since setting hide_from_ai means that we can only call actions with JSON-serializable types, the result will always be a JSON-serializable object.
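For illustration, here are two hypothetical tool response payloads, one per mode (all field values are made up):

# Default mode (variable references enabled):
{
    "success": True,
    "stdout": "fetching weather...\n",
    "stderr": "",
    "modified_variables": {"location": "Paris"},
}

# With hide_from_ai set to True:
{
    "success": True,
    "result": {"temperature": 21, "unit": "c"},
}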

Adapting to computer use

WIP

This section is a work in progress. It should cover:

  1. The fact that we have to filter the computer use environment vars (because AI providers only care about the computer environment that's given to them).
  2. The filter_computer_use_environment_vars method and the make_computer_use_config method.
  3. How the tool response differs from other cases (see previous section).
  4. The mapping of the ComputerUseAction class to the strings of these methods, which correspond to the preloaded actions in conatus.actions.preloaded.
  5. The fact that it's browser-only for now.

Translate instructions to Python code

WIP

This section is a work in progress. It should cover the various code generation functions and the choices made there (e.g. "step 0 -- no variables imported", the fact that we try to retrieve the type for type hinting, that we try to reconstruct the import statements, etc.)

Normalizing the inputs, outputs and starting variables

WIP

This section is a work in progress. (Maybe should be put in Task internals?)