AI Interfaces
Work in progress
This page is a work in progress. Generally speaking, you should be able to find most of the information you need in the sections below.
Prompt assembly helpers and variable/image representation
A major feature of AI interfaces in Conatus is the ability to precisely construct context-rich prompts, often in semi-structured XML, using helpers provided by `BaseAIInterfaceWithTask`.
The XML-construction methods
You frequently encounter `.get_*_xml()` methods, each returning structured text or token-part representations describing some facet of the task or environment:

- `get_task_definition_xml`: Wraps the full user-defined task, expected inputs/outputs, etc.
- `get_all_variables_xml`: Lists all variables in the current runtime state, including optional image parts if `include_images=True`.
- `get_steps_so_far_xml`: Returns a code-style log of execution history up to this point.
- `get_last_step_xml`: Information on the most recent step, including code and output.
- `get_docstrings_for_all_actions_xml`: Details every action available, formatted (by default) for XML.
- `get_modified_variables_xml`: Isolates only the variables recently modified (often after tool calls).
XML is used because AI labs have found it to be an effective way to structure LLM prompts.
For more detail on these methods, see their respective docstrings in `BaseAIInterfaceWithTask`.
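As an illustration, two of these helpers and their composition might look like the following sketch. The function bodies are stand-ins for this page only; the real methods live on `BaseAIInterfaceWithTask` and are much richer.

```python
# Illustrative sketch (not the real Conatus implementation): how .get_*_xml()
# helpers can compose a semi-structured XML prompt. Names mirror the docs;
# bodies are simplified stand-ins.

def get_task_definition_xml(task: str) -> str:
    # Wraps the user-defined task in a <task_definition> element.
    return f"<task_definition>{task}</task_definition>"

def get_all_variables_xml(variables: dict) -> str:
    # Lists each runtime variable as a <variable> element.
    items = "".join(
        f'<variable name="{name}">{value!r}</variable>'
        for name, value in variables.items()
    )
    return f"<all_variables>{items}</all_variables>"

def assemble_prompt(task: str, variables: dict) -> str:
    # Concatenate the XML sections into one context-rich prompt.
    return "\n".join([
        get_task_definition_xml(task),
        get_all_variables_xml(variables),
    ])

prompt = assemble_prompt("Summarize the report", {"n_rows": 120})
```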
Conversation loop: turn handling, streaming vs. non-streaming, logging
Every AI interface follows a managed turn-based loop orchestrated primarily via `BaseAIInterface`. This loop is embodied in the `arun` and `run` methods.
The conversation turn logic
At a high level, the loop operates as follows (see `BaseAIInterface.arun`):
- Prompt construction: The interface generates an `AIPrompt` via `make_prompt`.
    - On the first turn: calls `make_first_prompt`.
    - On subsequent turns: calls `make_new_prompt`. (The decision on which method to call is made in `make_prompt` based on the value of `turn_count`.)
- Model call: The prompt is synchronously or asynchronously passed to the LLM (`model.call`, `model.acall`, `model.acall_stream`, etc.).
    - Streaming (chunked) vs. non-streaming is controlled via the `stream` parameter.
- Step decision: The interface uses `should_continue` to decide if another loop is necessary (e.g., did we reach `max_turns`? Did we see a termination tool call?).
- New message generation: If continuing, the interface may use `make_new_messages` to generate new messages (e.g. tool response messages).
- Logging: Prompts and responses are logged by default via the `FileWriter` infrastructure, and cost is tracked. (See also: `generate_prompt_response_callbacks`, which is used to write logs to disk.)
- Result extraction: The result is extracted from the response using `extract_result`, bundled into an `AIInterfacePayload` object, and returned from the `arun` method.
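The steps above can be sketched as a minimal synchronous loop. This is an assumed shape, not the real implementation: the actual `BaseAIInterface.arun` is async and far richer, and the stub model below is purely illustrative.

```python
# Minimal sketch of the turn loop: make a prompt, call the model,
# decide whether to continue, and feed new messages back in.

class LoopSketch:
    def __init__(self, model, max_turns=3):
        self.model = model            # stand-in for the LLM (model.call)
        self.max_turns = max_turns
        self.turn_count = 0
        self.messages = []

    def make_prompt(self):
        # First turn vs. subsequent turns, decided on turn_count
        # (mirrors make_first_prompt / make_new_prompt).
        if self.turn_count == 0:
            return ["system: task definition"]
        return ["system: task definition"] + self.messages

    def should_continue(self, response):
        # Stop on a termination marker or when max_turns is reached.
        return "DONE" not in response and self.turn_count < self.max_turns

    def run(self):
        while True:
            prompt = self.make_prompt()
            response = self.model(prompt)
            self.turn_count += 1
            if not self.should_continue(response):
                return response       # stand-in for extract_result
            # Stand-in for make_new_messages (e.g. tool responses).
            self.messages.append(f"tool response for: {response}")

def fake_model(prompt):
    # Illustrative stub: terminates on the second call.
    return "working" if len(prompt) == 1 else "DONE"

sketch = LoopSketch(fake_model)
result = sketch.run()
```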
Relationship with Runtime: new-message handling and tool-response buffering
For interfaces that are tied to a `Runtime`, message handling extends beyond pure conversation:
- Tool call execution: After the LLM suggests tool calls and/or code snippets, the interface executes them via `Runtime.run`. This is done in the `should_continue` method.
- Buffering responses: Responses from tool execution (e.g., instances of `ToolResponseAIMessage`) are buffered in the interface during `should_continue`, and then retrieved in `make_new_messages` for injection into the next prompt's conversation history.
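A minimal sketch of this buffering pattern, with a lambda standing in for `Runtime.run` (assumed shapes; not the real implementation):

```python
# Tool responses produced in should_continue are held in a buffer until
# make_new_messages drains them into the next prompt's history.

class BufferingSketch:
    def __init__(self, runtime_run):
        self.runtime_run = runtime_run   # stand-in for Runtime.run
        self._buffered = []

    def should_continue(self, tool_calls):
        # Execute the suggested tool calls and buffer their responses.
        for call in tool_calls:
            self._buffered.append(self.runtime_run(call))
        return bool(tool_calls)

    def make_new_messages(self):
        # Drain the buffer so each response is injected exactly once.
        messages, self._buffered = self._buffered, []
        return messages

iface = BufferingSketch(lambda call: f"result of {call}")
iface.should_continue(["get_weather"])
new_messages = iface.make_new_messages()
```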
Model selection: precedence and logic
Selecting which LLM is used for a given AI interface follows a clear (and flexible) precedence order.
Precedence in `BaseAIInterface.retrieve_model`:

1. Explicit model passed via `task_config.preferred_model` (highest priority)
2. Explicit provider via `task_config.preferred_provider`
3. Model name set in `model_config`
4. Otherwise, fall back to the default (`OpenAIModel`)
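A sketch of that precedence chain, assuming dict-shaped configs purely for illustration (the real `retrieve_model` operates on Conatus config objects):

```python
# First non-empty setting wins; OpenAIModel is the final fallback.
DEFAULT_MODEL = "OpenAIModel"

def retrieve_model(task_config: dict, model_config: dict) -> str:
    # 1. Explicit model on the task config (highest priority).
    if task_config.get("preferred_model"):
        return task_config["preferred_model"]
    # 2. Explicit provider on the task config.
    if task_config.get("preferred_provider"):
        return f"default model for {task_config['preferred_provider']}"
    # 3. Model name set in the model config.
    if model_config.get("model_name"):
        return model_config["model_name"]
    # 4. Otherwise, the default.
    return DEFAULT_MODEL

chosen = retrieve_model({}, {"model_name": "gpt-4o"})
```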
Computer-use interface quirks
`ComputerUseAIInterface` extends `ExecutionAIInterface`, with adjustments tailored to models (currently OpenAI's) that support native computer-use actions (browsing, clicking, scrolling, etc.).
Notable specifics of `ComputerUseAIInterface`
- Uses a computer-use-specific system prompt (`execution_cu_dev_message.txt`), which contains extra machine-usable instructions (e.g., "You may only use one browser at a time" …)
- Sets `computer_use_mode=True` so:
    - Only computer-use-compatible tools/actions are exposed (see `get_tool_specifications`)
    - The prompt assembly includes richer perception (images, environment state) in messages
- The computer-use models require that only one computer-use environment (e.g. browser) be available at a time, enforced by `only_keep_one_computer_use_environment=True`
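These two constraints can be sketched as follows. The data shapes are assumptions for illustration, and `get_tool_specifications` here is a simplified stand-in for the real method:

```python
# Sketch: filter tool specs in computer-use mode, and keep a single
# computer-use environment (e.g. one browser).

def get_tool_specifications(tools, computer_use_mode=False):
    # In computer-use mode, expose only CU-compatible tools.
    if computer_use_mode:
        return [t for t in tools if t.get("computer_use_compatible")]
    return list(tools)

def keep_one_computer_use_environment(environments):
    # CU models accept a single computer-use environment at a time;
    # keep the first one and drop the rest.
    cu = [e for e in environments if e["kind"] == "computer_use"]
    rest = [e for e in environments if e["kind"] != "computer_use"]
    return rest + cu[:1]

tools = [
    {"name": "click", "computer_use_compatible": True},
    {"name": "sql_query", "computer_use_compatible": False},
]
exposed = get_tool_specifications(tools, computer_use_mode=True)
```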