AI Messages: Inputs / Outputs Data Structures

Because Conatus works across multiple AI providers, it needs to be able to represent all AI inputs and outputs. This document covers the core data structures for everything you send to the AI model, everything you get back, and all the objects that appear in between (such as messages, content parts, tool calls, and usage).

Summary of the main data structures

Concept          | Class(es)                      | Purpose
-----------------|--------------------------------|--------------------------------------------------------------
Input prompt     | AIPrompt                       | The entire user input to an AI model
Messages         | AIMessage, UserAIMessage, etc. | Individual utterances, both in and out
Content parts    | UserAIMessageContentPart       | Structure of (possibly multi-media) message content
Tool calls       | AIToolCall, ComputerUseAction  | Requests from the AI to execute code (functions, tools)
Output/response  | AIResponse                     | Everything the AI model returns (messages, tool calls, etc.)
Streaming (WIP)  | Incomplete*                    | For partial/resumable/streamed results
Usage/cost       | CompletionUsage                | Tracks tokens, model name, and pricing

Unsupported (for now)

The following concepts are not yet supported:

  • Audio messages (inputs or outputs)
  • Image outputs (inputs are fine)
  • Citations / references
  • File uploads

Visual overview

This diagram shows the hierarchy of the main classes:

(Diagram: Message class hierarchy)

Accordingly, we'll start by looking at the AIPrompt and AIResponse classes (in red), since they are the main data structures you'll be using.

We'll then go one level down to the various AIMessage subclasses (in green).

Finally, we'll look at the UserAIMessageContentPart (in orange) and AssistantAIMessageContentPart (in blue) classes, which are the building blocks of message content.

We'll also look at the AIToolCall and ComputerUseAction classes (in gray), which are the building blocks of tool calls.

The Input: AIPrompt Structure

(Once again, we're talking here about the classes in red in the image above.)

The AIPrompt class is the single entry point for everything you send to the AI model. It is designed to be flexible and extensible, handling:

  • Message histories (previous_messages, new_messages)
  • System/developer messages (system)
  • Tool/function call specifications (tools, actions). These are converted to an AIToolSpecification object, which is essentially a wrapper around a Pydantic JSON schema.
  • Output schemas/structured output formats
  • "Computer use" configuration (for browser or simulated agent interfaces)
from conatus import AIPrompt, action

@action
def my_adder_function(a: int, b: int) -> int:
    return a + b

prompt = AIPrompt(
    user="What's the sum of 2 and 2?",
    system="Be concise.",
    actions=[my_adder_function],  # For tool use
    output_schema=int,            # Request structured output
)

There are three ways to initialize the AIPrompt class:

  1. Simple, new conversation: Pass a string to user. This will create a new conversation with a single message.
  2. Conversation with multiple messages: Pass a list of messages to messages. This will create a new conversation with the given messages.
  3. Conversation with history: You can distinguish between previous and new messages. This is helpful because some AI providers (like OpenAI's Responses API) allow you to send only the new messages, as long as you provide the ID of the previous messages. In this case, you need to pass (1) a list of new_messages, along with (2) previous_messages, previous_messages_id, or both.

In each case, you can optionally pass a system message with system, a list of tool specifications with tools, a list of actions with actions, and an output schema with output_schema.
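
To make the three initialization options concrete, here is a minimal sketch (the imports follow the examples later on this page):

from conatus import AIPrompt
from conatus.models.inputs_outputs.messages import UserAIMessage, AssistantAIMessage

# 1. Simple, new conversation
p1 = AIPrompt(user="Hello!")

# 2. Conversation with multiple messages
p2 = AIPrompt(messages=[
    UserAIMessage(content="What's 5+7?"),
    AssistantAIMessage.from_text("12"),
])

# 3. Conversation with history: only the new messages are sent,
#    alongside the ID of the previous messages
p3 = AIPrompt(
    new_messages=[UserAIMessage(content="What about 8+9?")],
    previous_messages_id="resp_123",
)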

For more information, see AIPrompt's API reference.

Example: Passing actions as tools

from conatus import action, AIPrompt

@action
def add(a: int, b: int) -> int:
    return a + b

prompt = AIPrompt(
    user="Add 100 and 23.",
    actions=[add],
    system="Use tools if needed.",
)

Example: Structured output

from conatus import AIPrompt
from typing_extensions import TypedDict

class MenuItem(TypedDict):
    name: str
    price: float

prompt = AIPrompt(
    user="Give me 3 coffee drinks with prices.",
    output_schema=list[MenuItem],
)

Example: Conversation history

from conatus import AIPrompt
from conatus.models.inputs_outputs.messages import UserAIMessage, AssistantAIMessage

prompt = AIPrompt(
    previous_messages=[
        UserAIMessage(content="What's 5+7?"),
        AssistantAIMessage.from_text("12"),
    ],
    new_messages=[
        UserAIMessage(content="What about 8+9?")
    ],
    # This is the ID of the previous response,
    # which is given by some AI providers (e.g. OpenAI's Responses API)
    previous_messages_id="resp_123",
)

Once again, for more information, see AIPrompt's API reference.

The Output: AIResponse Structure

(Once again, we're talking here about the classes in red in the image above.)

The AIResponse class encapsulates everything returned by an AI model call: the assistant's message, its text content and code snippets, any requested tool calls, structured output, and usage information.

from conatus import AIPrompt
from conatus.models import OpenAIModel

model = OpenAIModel()
prompt = AIPrompt(user="What's the sum of 2 and 2?")
response = model.call(prompt)
print(response.message_received)    # The AI's message (as an AssistantAIMessage)
print(response.all_text)            # The text content of the AI's message
print(response.code_snippets)       # Any code snippets in the text
print(response.tool_calls)          # Any requested tool calls
print(response.structured_output)   # Parsed to the requested schema/type
print(response.usage)               # Tokens/cost info

For more information, see AIResponse's API reference.

Message Hierarchy (AIMessage)

(Once again, we're talking here about the messages in green in the image above.)

Messages are the backbone of any conversation. Conatus represents them via a class hierarchy (shown in the diagram above).

Here's an example of a simple conversation:

from conatus.models.inputs_outputs.messages import UserAIMessage, AssistantAIMessage

m1 = UserAIMessage(content="Hi!")
m2 = AssistantAIMessage.from_text("Hello. How can I help you?")

Message (Content) Parts

Many messages (especially those going to or from providers like OpenAI and Anthropic) may contain rich, multi-part content, not just text:

  • Text (multi-part or segmented)
  • Images (base64 or URLs)
  • Tool calls (serialized commands)
  • Refusals, rationale, etc.

Conatus has standardized "content part" classes for both users and assistants.

User Message Content Parts

(Once again, we're talking here about the classes in orange in the image above.)

from conatus import AIPrompt
from conatus.models import OpenAIModel
from conatus.models.inputs_outputs.messages import UserAIMessage
from conatus.models.inputs_outputs.content_parts import (
    UserAIMessageContentTextPart,
    UserAIMessageContentImagePart,
)

URL = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/85/Tour_Eiffel_Wikimedia_Commons_%28cropped%29.jpg/1280px-Tour_Eiffel_Wikimedia_Commons_%28cropped%29.jpg"

image_part = UserAIMessageContentImagePart(image_data=URL, is_base64=False)
text_part = UserAIMessageContentTextPart(text="Where was this picture taken?")

m = UserAIMessage(content=[text_part, image_part])

prompt = AIPrompt(messages=[m])
model = OpenAIModel()
response = model.call(prompt)
print(response.all_text)
# > This picture was taken in Paris, France, featuring the Eiffel Tower.

Assistant Message Content Parts

(Once again, we're talking here about the classes in blue in the image above.)

See Content Parts for more details.
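
As a rough illustration, you can inspect an assistant message part by part, reusing the model and prompt from the previous example. This is a sketch only: it assumes the assistant message exposes its content parts as a list, and the concrete part classes (text, refusal, etc.) should be taken from the Content Parts page.

# Sketch only: assumes message_received.content is a list of
# AssistantAIMessageContentPart instances (see Content Parts).
response = model.call(prompt)
for part in response.message_received.content:
    print(type(part).__name__, part)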

Tool Calls

(Once again, we're talking here about the classes in gray in the image above.)

Tool calls allow the AI to ask the runtime to invoke a function, perform a computer use action, or otherwise call tools.

Function/Tool Calls

AIToolCall is the protocol-agnostic structure:

  • name: The name of the function/tool to call
  • returned_arguments: The arguments, which can arrive as a dict or a string
  • call_id: An identifier that must be passed back to the AI model along with the result of the tool call

The arguments are accessible as a dict or a str via two helper properties.
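
For illustration, here's how you might consume a tool call from a response, using only the fields listed above (the exact names of the two helper properties are in the API reference):

# Sketch: iterating over the tool calls of a response.
for tool_call in response.tool_calls:
    print(tool_call.name)                 # e.g. "add"
    print(tool_call.returned_arguments)   # the raw arguments (dict or string)
    print(tool_call.call_id)              # pass this back with the tool's result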

Computer Use Actions

Some providers (OpenAI, Anthropic) use special tool calls to perform computer use actions. We unify them under the ComputerUseAction family:

  • Simulates clicks, typing, scrolling, screenshots, etc.
  • Each type (e.g., CULeftMouseClick) is a dataclass with clearly specified fields

Note that these computer use objects are somewhat tightly linked to the Runtime class, which is responsible for executing tool calls. In particular, ComputerUseAction objects require an environment_variable argument that refers to a RuntimeVariable object.
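
As a loose sketch, dispatching on the action type might look like the following. The import path below is an assumption for illustration; only the CULeftMouseClick name is taken from this page.

# Sketch only: the import path is an assumption.
from conatus.models.inputs_outputs.tool_calls import CULeftMouseClick

for call in response.tool_calls:
    if isinstance(call, CULeftMouseClick):
        ...  # e.g., forward the click to the Runtime for execution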

These tool call objects appear in assistant messages as content parts, and may show up in responses (and can be passed to the execution/runtime subsystem).

See Tool Calls for more details.

Usage and Cost Tracking

Each AI response may have usage and pricing statistics attached.

CompletionUsage records usage statistics such as token counts and the model name.

Note that you can add CompletionUsage instances together, which is useful for accumulating usage during streaming.

Finally, the cost property can be used to compute the cost of the response. This requires that the model name and prices are present in the conatus.models.prices module.
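
For example, given two responses (a minimal sketch; the usage objects come from responses as shown earlier):

total = response_1.usage + response_2.usage  # CompletionUsage instances support +
print(total.cost)  # Requires the model's prices in conatus.models.prices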

Streaming and Incomplete Structures

To handle streaming/partial results, Conatus provides a family of Incomplete* classes.

  • You can add them together with + (e.g., to combine chunks of output), as sketched below
  • Each has a .complete() method that "finalizes" the object (for example, turning a partial message into a proper AssistantAIMessage)
  • They merge text and tool calls intelligently
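
Here's a rough sketch of the accumulation pattern. The stream method name is hypothetical; see Streaming and Incompletes for the real entry points.

# Sketch only: model.stream is a hypothetical name; the point is the
# + accumulation and the final .complete() call described above.
partial = None
for chunk in model.stream(prompt):
    partial = chunk if partial is None else partial + chunk
message = partial.complete()  # e.g., a finished AssistantAIMessage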

This protocol is crucial for implementers writing streaming adapters for new models/APIs.

See Streaming and Incompletes for more details.

API Reference Summary

The documentation pages for each data structure provide full auto-generated documentation of its fields and methods.

Where do I start if I want to implement my own model/provider?

See How-To: Add a new AI provider and refer to the BaseAIModel API docs for the expected protocols and required conversions.