
AI Interfaces

Work in progress

This page is a work in progress. Generally speaking, you should be able to find most of the information you need in the sections below.

Prompt assembly helpers and variable/image representation

A major feature of AI interfaces in Conatus is the ability to precisely construct context-rich prompts, often in semi-structured XML, using helpers provided by BaseAIInterfaceWithTask.

The XML-construction methods

You will frequently encounter .get_*_xml() methods, each returning structured text or token-part representations that describe some facet of the task or environment.

XML was chosen because AI labs have found it to be an effective way to structure LLM prompts.
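As a concrete illustration, a helper in this family might look like the following. This is a hypothetical sketch, not the real implementation: the name get_environment_xml and its signature are assumptions, and the actual helpers live on BaseAIInterfaceWithTask.

```python
# Hypothetical sketch of a .get_*_xml() helper; real names/signatures differ.
from xml.sax.saxutils import escape, quoteattr


def get_environment_xml(variables: dict[str, str]) -> str:
    """Render task variables as a semi-structured XML block for a prompt."""
    lines = ["<environment>"]
    for name, value in variables.items():
        # quoteattr quotes and escapes the attribute; escape handles the body.
        lines.append(f"  <variable name={quoteattr(name)}>{escape(value)}</variable>")
    lines.append("</environment>")
    return "\n".join(lines)


print(get_environment_xml({"cwd": "/home/user", "shell": "bash"}))
```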

For more detail on these methods, see their respective docstrings in BaseAIInterfaceWithTask.

Conversation loop: turn handling, streaming vs. non-streaming, logging

Every AI interface follows a managed turn-based loop orchestrated primarily via BaseAIInterface. This loop is embodied in the arun and run methods.

The conversation turn logic

At a high level, the loop operates as follows (see BaseAIInterface.arun); a minimal sketch follows the list:

  1. Prompt construction: The interface generates an AIPrompt via make_prompt.
  2. Model call: The prompt is passed to the LLM synchronously or asynchronously (model.call, model.acall, model.acall_stream, etc.).
    • Streaming (chunked) vs. non-streaming output is controlled via the stream parameter.
  3. Step decision: The interface uses should_continue to decide if another loop is necessary (e.g., did we reach max_turns? Did we see a termination tool call?).
  4. New message generation: If continuing, the interface may use make_new_messages to generate new messages (e.g. tool response messages).
  5. Logging: Prompts and responses are logged by default via the FileWriter infrastructure, and cost is tracked. (See also: generate_prompt_response_callbacks, which is used to write logs to disk.)
  6. Result extraction: The result is extracted from the response using extract_result. This is then bundled into an AIInterfacePayload object, which is returned from the arun method.
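The sketch below is a minimal, non-authoritative rendering of this loop. It assumes simplified signatures for make_prompt, should_continue, make_new_messages, extract_result, and the model call methods; the comments map each step to the numbered list above.

```python
# Sketch only: simplified signatures and control flow, not the real code.
async def arun(self, stream: bool = False):
    messages = []
    while True:
        # 1. Prompt construction
        prompt = self.make_prompt(messages)
        # 2. Model call: streaming vs. non-streaming via the `stream` flag
        if stream:
            response = await self.model.acall_stream(prompt)
        else:
            response = await self.model.acall(prompt)
        # 5. Logging: callbacks write prompt/response logs and track cost
        for callback in self.generate_prompt_response_callbacks():
            callback(prompt, response)
        # 3. Step decision: stop on max_turns or a termination tool call
        if not self.should_continue(response):
            break
        # 4. New message generation (e.g. tool response messages)
        messages.extend(self.make_new_messages(response))
    # 6. Result extraction, bundled into the returned payload
    return AIInterfacePayload(result=self.extract_result(response))
```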

Relationship with Runtime: new-message handling and tool-response buffering

For interfaces that are tied to a Runtime, message handling extends beyond pure conversation:

  • Tool call execution: After the LLM suggests tool calls and/or code snippets, the interface executes them via Runtime.run. This happens inside the should_continue method.
  • Buffering responses: Responses from tool execution (e.g., instances of ToolResponseAIMessage) are buffered on the interface during should_continue, then retrieved in make_new_messages and injected into the next prompt's conversation history, as sketched below.
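A hedged sketch of this hand-off follows. The buffer attribute (_pending_tool_responses) and the extract_tool_calls helper are invented names for illustration; only the division of labor between should_continue and make_new_messages is taken from the description above.

```python
def should_continue(self, response) -> bool:
    tool_calls = self.extract_tool_calls(response)  # hypothetical helper
    if not tool_calls:
        return False  # nothing left to execute; end the loop
    # Execute the suggested tool calls / code snippets via the Runtime...
    results = self.runtime.run(tool_calls)
    # ...and buffer the responses for the next turn.
    self._pending_tool_responses = [ToolResponseAIMessage(r) for r in results]
    return True


def make_new_messages(self, response):
    # Drain the buffer so each tool response is injected exactly once.
    messages = self._pending_tool_responses
    self._pending_tool_responses = []
    return messages
```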

Model selection: precedence and logic

Selecting which LLM is used for a given AI interface follows a clear (and flexible) precedence order.

Precedence in BaseAIInterface.retrieve_model (a sketch follows the list):

  1. Explicit model passed via task_config.preferred_model (highest priority)
  2. Explicit provider via task_config.preferred_provider
  3. Model name set in model_config
  4. Otherwise, fall back to the default (OpenAIModel)
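Expressed as code, the ordering might look like this. Only the precedence itself comes from the list above; model_for_provider, model_for_name, and the attribute layout are illustrative assumptions.

```python
def retrieve_model(self):
    # 1. An explicit model on the task config wins outright.
    if self.task_config.preferred_model is not None:
        return self.task_config.preferred_model
    # 2. Next, an explicit provider on the task config.
    if self.task_config.preferred_provider is not None:
        return model_for_provider(self.task_config.preferred_provider)  # hypothetical
    # 3. Then a model name set in model_config.
    if self.model_config.model_name is not None:
        return model_for_name(self.model_config.model_name)  # hypothetical
    # 4. Otherwise, fall back to the default.
    return OpenAIModel()
```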

Computer-use interface quirks

ComputerUseAIInterface extends ExecutionAIInterface, with adjustments tailored to models (currently OpenAI) that support native computer-use actions (browsing, clicking, scrolling, etc.).

Notable specifics of ComputerUseAIInterface

  • Uses a computer-use-specific system prompt (execution_cu_dev_message.txt), which contains extra machine-usable instructions (e.g., "You may only use one browser at a time"...)
  • Sets computer_use_mode=True so:
    • Only computer-use compatible tools/actions are exposed (see get_tool_specifications)
    • The prompt assembly includes richer perception (images, environment state) in messages
  • The computer-use models require that only one computer-use environment (e.g. browser) be available at a time, enforced by only_keep_one_computer_use_environment=True (see the sketch below)
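Purely as an illustration, the adjustments above might be wired up as follows. The flags computer_use_mode and only_keep_one_computer_use_environment are named in this page, but the constructor shape and the dev_message_path keyword are assumptions.

```python
# Illustrative only: constructor shape and dev_message_path are assumptions.
interface = ComputerUseAIInterface(
    # Computer-use-specific system prompt (named above).
    dev_message_path="execution_cu_dev_message.txt",
    # Expose only computer-use tools and include richer perception
    # (images, environment state) in prompt assembly.
    computer_use_mode=True,
    # Enforce a single computer-use environment (e.g. one browser).
    only_keep_one_computer_use_environment=True,
)
```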