
AI Interfaces

Work in progress

This page is a work in progress. Generally speaking, you should be able to find most of the information you need in the sections below.

Prompt assembly helpers and variable/image representation

A major feature of AI interfaces in Conatus is the ability to precisely construct context-rich prompts, often in semi-structured XML, using helpers provided by BaseAIInterfaceWithTask.

The XML-construction methods

You will frequently encounter .get_*_xml() methods, each returning structured text or token-part representations that describe some facet of the task or environment.

XML was chosen because AI labs have found it to be an effective way to structure LLM prompts.
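As a concrete illustration, a helper in this family might look like the following. This is a hypothetical sketch, not the real implementation: the name get_environment_xml and its signature are assumptions, and the actual helpers live on BaseAIInterfaceWithTask.

```python
# Hypothetical sketch of a .get_*_xml() helper; real names/signatures differ.
from xml.sax.saxutils import escape, quoteattr


def get_environment_xml(variables: dict[str, str]) -> str:
    """Render task variables as a semi-structured XML block for a prompt."""
    lines = ["<environment>"]
    for name, value in variables.items():
        # quoteattr quotes and escapes the attribute; escape handles the body.
        lines.append(f"  <variable name={quoteattr(name)}>{escape(value)}</variable>")
    lines.append("</environment>")
    return "\n".join(lines)


print(get_environment_xml({"cwd": "/home/user", "shell": "bash"}))
```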

For more detail on these methods, see their respective docstrings in BaseAIInterfaceWithTask.

Conversation loop: turn handling, streaming vs. non-streaming, logging

Every AI interface follows a managed turn-based loop orchestrated primarily via BaseAIInterface. This loop is embodied in the arun and run methods.

The conversation turn logic

At a high level, the loop operates as follows (see BaseAIInterface.arun); a minimal sketch follows the list:

  1. Prompt construction: The interface generates an AIPrompt via make_prompt.
  2. Model call: The prompt is passed to the LLM synchronously or asynchronously (model.call, model.acall, model.acall_stream, etc.).
    • Streaming (chunked) vs. non-streaming output is controlled via the stream parameter.
  3. Step decision: The interface uses should_continue to decide if another loop is necessary (e.g., did we reach max_turns? Did we see a termination tool call?).
  4. New message generation: If continuing, the interface may use make_new_messages to generate new messages (e.g. tool response messages).
  5. Logging: Prompts and responses are logged by default via the FileWriter infrastructure, and cost is tracked. (See also: generate_prompt_response_callbacks, which is used to write logs to disk.)
  6. Result extraction: The result is extracted from the response using extract_result. This is then bundled into an AIInterfacePayload object, which is returned from the arun method.
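The sketch below is a minimal, non-authoritative rendering of this loop. It assumes simplified signatures for make_prompt, should_continue, make_new_messages, extract_result, and the model call methods; the comments map each step to the numbered list above.

```python
# Sketch only: simplified signatures and control flow, not the real code.
async def arun(self, stream: bool = False):
    messages = []
    while True:
        # 1. Prompt construction
        prompt = self.make_prompt(messages)
        # 2. Model call: streaming vs. non-streaming via the `stream` flag
        if stream:
            response = await self.model.acall_stream(prompt)
        else:
            response = await self.model.acall(prompt)
        # 5. Logging: callbacks write prompt/response logs and track cost
        for callback in self.generate_prompt_response_callbacks():
            callback(prompt, response)
        # 3. Step decision: stop on max_turns or a termination tool call
        if not self.should_continue(response):
            break
        # 4. New message generation (e.g. tool response messages)
        messages.extend(self.make_new_messages(response))
    # 6. Result extraction, bundled into the returned payload
    return AIInterfacePayload(result=self.extract_result(response))
```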

Relationship with Runtime: new-message handling and tool-response buffering

For interfaces that are tied to a Runtime, message handling extends beyond pure conversation:

  • Tool call execution: After the LLM suggests tool calls and/or code snippets, the interface executes them via Runtime.run. This happens inside the should_continue method.
  • Buffering responses: Responses from tool execution (e.g., instances of ToolResponseAIMessage) are buffered on the interface during should_continue, then retrieved in make_new_messages and injected into the next prompt's conversation history, as sketched below.
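A hedged sketch of this hand-off follows. The buffer attribute (_pending_tool_responses) and the extract_tool_calls helper are invented names for illustration; only the division of labor between should_continue and make_new_messages is taken from the description above.

```python
def should_continue(self, response) -> bool:
    tool_calls = self.extract_tool_calls(response)  # hypothetical helper
    if not tool_calls:
        return False  # nothing left to execute; end the loop
    # Execute the suggested tool calls / code snippets via the Runtime...
    results = self.runtime.run(tool_calls)
    # ...and buffer the responses for the next turn.
    self._pending_tool_responses = [ToolResponseAIMessage(r) for r in results]
    return True


def make_new_messages(self, response):
    # Drain the buffer so each tool response is injected exactly once.
    messages = self._pending_tool_responses
    self._pending_tool_responses = []
    return messages
```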

Model selection: precedence and logic

Selecting which LLM is used for a given AI interface follows a clear (and flexible) precedence order.

Precedence in BaseAIInterface.retrieve_model (a sketch follows the list):

  1. Explicit model passed via task_config.preferred_model (highest priority)
  2. Explicit provider via task_config.preferred_provider
  3. Model name set in model_config
  4. Otherwise, fall back to the default (OpenAIModel)
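Expressed as code, the ordering might look like this. Only the precedence itself comes from the list above; model_for_provider, model_for_name, and the attribute layout are illustrative assumptions.

```python
def retrieve_model(self):
    # 1. An explicit model on the task config wins outright.
    if self.task_config.preferred_model is not None:
        return self.task_config.preferred_model
    # 2. Next, an explicit provider on the task config.
    if self.task_config.preferred_provider is not None:
        return model_for_provider(self.task_config.preferred_provider)  # hypothetical
    # 3. Then a model name set in model_config.
    if self.model_config.model_name is not None:
        return model_for_name(self.model_config.model_name)  # hypothetical
    # 4. Otherwise, fall back to the default.
    return OpenAIModel()
```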

Computer-use interface quirks

ComputerUseAIInterface extends ExecutionAIInterface, with adjustments tailored to models (currently OpenAI) that support native computer-use actions (browsing, clicking, scrolling, etc.).

Notable specifics of ComputerUseAIInterface

  • Uses a computer-use-specific system prompt (execution_cu_dev_message.txt), which contains extra machine-usable instructions (e.g., "You may only use one browser at a time"...)
  • Sets computer_use_mode=True so:
    • Only computer-use compatible tools/actions are exposed (see get_tool_specifications)
    • The prompt assembly includes richer perception (images, environment state) in messages
  • The computer-use models require that only one computer-use environment (e.g. browser) be available at a time, enforced by only_keep_one_computer_use_environment=True (see the sketch below)
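Purely as an illustration, the adjustments above might be wired up as follows. The flags computer_use_mode and only_keep_one_computer_use_environment are named in this page, but the constructor shape and the dev_message_path keyword are assumptions.

```python
# Illustrative only: constructor shape and dev_message_path are assumptions.
interface = ComputerUseAIInterface(
    # Computer-use-specific system prompt (named above).
    dev_message_path="execution_cu_dev_message.txt",
    # Expose only computer-use tools and include richer perception
    # (images, environment state) in prompt assembly.
    computer_use_mode=True,
    # Enforce a single computer-use environment (e.g. one browser).
    only_keep_one_computer_use_environment=True,
)
```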