
Internals: Action

Scratchpad

Why are Actions different from their equivalents in LangChain / Autogen?

  1. Allow arbitrary types: it doesn't need to be JSON-serializable, because we allow passing variables by reference
  2. Action transparency: you can use it like any other traditional function.
  3. BYOF: Bring your own function.
  4. Allow access to runtime: you can modify the state and context etc.
  5. Building blocks of a DSL, essentially
  6. Type checking!

Vocabulary

Actions can be defined in multiple ways

  1. As an Action subclass
    • Base case: as a subclass of Action directly
    • Special case: as a TypedAction subclass
  2. With the @action decorator
    • Base case: a static method / a standalone function
    • Special case: a class method
  3. Special special case: conversion from Action to TypedAction using typeify

In any case, the result is a class.

Handling Action subclasses

Detecting action functions

As seen previously, a user can define an action as a subclass of Action, which enables them to write multiple methods within the class that interact with each other.

from conatus import Action

class MyAction(Action):

    def check_state_exists(self):
        # hasattr returns a bool; getattr without a default would raise AttributeError
        return hasattr(self, "state")

    @Action.function
    def my_action(self):
        if self.check_state_exists():
            return self.state
        return "State not found"

We see here that the user can define multiple methods within the class. How do we know, then, which one is the underlying action function? This is why we need the @Action.function decorator. Underneath, we use mark_action_function (conatus.actions.action.mark_action_function), which adds a __is_action_function attribute to the function object.

Afterwards, we store the marked function on the Action subclass under the name _action_function.
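The marking approach can be illustrated with a minimal sketch. This is not the library's implementation: the decorator name mark_as_action_function, the attribute name stored in MARKER, and the find_marked helper are all hypothetical and exist only to show the idea of tagging a function object and finding it again later.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F", bound=Callable[..., object])

# Hypothetical attribute name, chosen for illustration only.
MARKER = "_is_action_function"


def mark_as_action_function(func: F) -> F:
    """Tag the function object and return it unchanged."""
    setattr(func, MARKER, True)
    return func


class Greeter:
    def helper(self) -> str:
        return "hello"

    @mark_as_action_function
    def greet(self) -> str:
        # The marked method can still call the unmarked helpers.
        return self.helper().upper()


def find_marked(cls: type) -> List[str]:
    """Return the names of methods in the class body that carry the marker."""
    return [
        name
        for name, member in vars(cls).items()
        if callable(member) and getattr(member, MARKER, False)
    ]


marked = find_marked(Greeter)
print(marked)  # ['greet']
```

Because the decorator returns the original function object, the method keeps its signature and type hints, which is the property the paragraph above is after.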

Why we did it that way

Adding an arbitrary attribute to a function is dirty and it makes type-checkers unhappy. Unfortunately, cleaner solutions (such as creating a new class or using a Protocol) were not able to propagate the types of the function properly to the aforementioned type-checkers.

Action function checks

We do a few checks on action functions:

  1. We permit only one action function per class, though this may change in the future. This constraint is enforced by ActionBlueprint.identify_action_function, which ensures that a class contains at least one action function and no more than one. If these conditions are violated, it raises either ActionFunctionNotFoundError or MultipleActionFunctionsError, respectively.

  2. Inheriting action functions is OK. If a class inherits from another class that has an action function, the child class will inherit the action function as well. This is done by ActionBlueprint.detect_action_function_in_bases.

  3. It's OK for a base class not to have an action function.

  4. If _action_function is defined, we assume it has been processed. In other words, if you define _action_function in your class, the various dependent attributes such as _function_info will not be computed. In practice, this means you should not define _action_function yourself.
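The first three checks can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of ActionBlueprint: the two exception classes are local stand-ins for the errors named above, and mark, marked_in, identify and the MARKER attribute name are hypothetical.

```python
class ActionFunctionNotFoundError(Exception):
    """Local stand-in for the error named in the text."""


class MultipleActionFunctionsError(Exception):
    """Local stand-in for the error named in the text."""


MARKER = "_is_action_function"  # hypothetical marker attribute


def mark(func):
    setattr(func, MARKER, True)
    return func


def marked_in(cls: type) -> list:
    """Marked functions defined directly in the body of cls."""
    return [m for m in vars(cls).values() if getattr(m, MARKER, False)]


def identify(cls: type):
    """Return the single action function of cls, falling back to its bases."""
    for klass in cls.__mro__:
        found = marked_in(klass)
        if len(found) > 1:
            raise MultipleActionFunctionsError(klass.__name__)
        if found:
            return found[0]
    raise ActionFunctionNotFoundError(cls.__name__)


class Base:
    @mark
    def run(self):
        return "base"


class Child(Base):  # inherits the action function from Base
    pass


class Empty:  # no action function anywhere in the hierarchy
    pass


class Two:  # two marked functions in the same class
    @mark
    def first(self): ...

    @mark
    def second(self): ...


print(identify(Child).__name__)  # run
```

Walking the method resolution order is one way to make inheritance work while still letting intermediate base classes have no action function of their own.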

Schema extraction

Like other libraries, we need to extract the schema of the function before we can pass it to the LLM.

The schema extraction process is done in three steps:

  1. We parse the function signature to extract the parameters and return type.
  2. We parse the docstring to extract the description of the function and its parameters.
  3. We reconcile the information from the function signature and the docstring.

Parsing the function signature

Some notes on how we parse the function signature:

  • Parameters:
    • For each parameter in the function signature, we store the name, as well as the annotation and default value if we find any.
    • If a parameter has no type hint, we will infer that the type is Any.
    • We determine whether the type hint indicates a JSON-serializable value with is_json_serializable_type. (See below for more information.)
    • If the parameter has a default value, we will infer that the argument is not required.
    • If the parameter has no default value, we will infer that the argument is required and set its default value to Ellipsis.
    • If any of the parameters has a type hint of the form Annotated[<type>, *<metadata>], we will look in the <metadata> arguments for a simple string. If we find one, we will use that as the description for the argument. We only use the first string we find. See get_description_from_typehint.
    • We also store the kind of the parameter (e.g. POSITIONAL_OR_KEYWORD).
    • For each parameter, we store all of this in a ParamInfoFromSignature object.
  • Return value:
    • If there is a return type hint in the function signature, we will use that.
    • If there is no return type hint, we will infer that the return type is Any.
    • If the type hint is of the form Annotated[<type>, *<metadata>], we will look for a simple string in the <metadata> arguments and use that as the description.
    • We store this return value in a ReturnInfoFromSignature object.
  • We store the parameters and the return value in a FunctionInfoFromSignature object.
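The rules above can be approximated with the standard library alone. The sketch below is an assumption about how such parsing could look, not the library's code: ParamSketch is a hypothetical stand-in for ParamInfoFromSignature, and first_string and parse_params are illustrative helper names.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, Any, List, Optional, get_args, get_origin, get_type_hints


@dataclass
class ParamSketch:  # hypothetical stand-in for ParamInfoFromSignature
    name: str
    type_hint: Any
    default: Any
    is_required: bool
    kind: str
    description: Optional[str]


def first_string(type_hint: Any) -> Optional[str]:
    """Return the first plain string found in Annotated metadata, if any."""
    if get_origin(type_hint) is Annotated:
        for meta in get_args(type_hint)[1:]:
            if isinstance(meta, str):
                return meta
    return None


def parse_params(func) -> List[ParamSketch]:
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    params = []
    for name, p in inspect.signature(func).parameters.items():
        required = p.default is inspect.Parameter.empty
        hint = hints.get(name, Any)  # no type hint -> Any
        params.append(
            ParamSketch(
                name=name,
                type_hint=hint,
                default=... if required else p.default,  # Ellipsis marks "no default"
                is_required=required,
                kind=p.kind.name,
                description=first_string(hint),
            )
        )
    return params


def greet(name: Annotated[str, "who to greet"], times=2, *, loud: bool = False) -> str:
    return " ".join([f"hello {name}"] * times)


params = parse_params(greet)
for p in params:
    print(p.name, p.kind, p.is_required, p.description)
```

The return type can be read the same way, with hints.get("return", Any) and the same first_string lookup for its description.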

Parsing docstrings

We use Griffe's docstring parser to extract information from the docstring.

Some notes on how we parse docstrings:

  • Style: We automatically detect the style of the docstring (Google, Numpy, or Sphinx).
  • Parameters:
    • We only look for the description associated with the name of the parameter. Lines in the arguments section that do not have a parameter at the beginning will be discarded. (This is Griffe's default behavior.)
    • The type hint in the docstring is never used to infer the type of the parameter. We always use the type hint in the function signature. If there is no type hint in the signature, we will infer that the type is Any. [1]
  • Returns:

    • If you return multiple items (e.g. tuple[int, str]), you can write one line per item.
    • Griffe expects a specific format for the Returns section. You can have multiple returns, but each one must be indented precisely: a continuation line has to be indented further than the first line of its entry, otherwise it is parsed as a new return value. In other words, it's very easy to get multiple returns by accident if you are not careful with the indentation.

      """Function description.
      
      Returns:
          <name> (<type>): <description----blah----blah----blah
              blah----blah----blah----blah----blah----blah>
          <name> (<type>): <description>
      """
      
    • Another quirk of Griffe is that it expects the type hint to be in parentheses, which can also lead to bugs.
    • We only support Returns sections, but not Yields sections. We don't support asynchronous functions yet.

Reconciling the function signature and docstring

One core assumption of the Action function parser is that what is written in the function signature (e.g. the type hints) takes precedence over what is written in the docstring. [2]

  • Function description: The user has the option of providing a desc argument to the @action decorator. If this is provided, we will use this description and nothing else. If it is not provided, we will use the description in the docstring if it is present.
  • Parameters
    • Type hint: We always use the type hint in the signature to infer the type of the parameter.
    • Type hint for LLM: We use the type hint in the signature to generate the type hint for LLMs, unless override_type_hint_for_llm=True is passed to the @action decorator. (See below for more information.)
    • Description: We use the description in the signature if there is one; otherwise, we fall back to the description in the docstring.
    • Default value / Is Required: We always use the default value in the signature to infer whether the parameter is required or not.
  • Return value
    • If there's only one return value in the docstring: We reconcile the return value according to the same rules that apply to parameters, and we return a ReturnInfo object.
    • If there are multiple return values in the docstring: We first check that the return type in the signature is a tuple [3] of the same length as the number of return values in the docstring:
      • If that is the case: We reconcile the values for the return values according to the same rules that apply for parameters, and we return a list of ReturnInfo objects.
      • If that is not the case: We raise a warning, flatten the return values in the docstring, and return a single ReturnInfo object. The description of the return value will be a concatenation of the descriptions of the individual return values.
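The reconciliation rules for descriptions and return values can be sketched like this. It is a simplified illustration, not the library's code: ReturnSketch is a hypothetical stand-in for ReturnInfo, and pick_description and reconcile_returns are illustrative names. Joining the flattened descriptions with a space is also an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Any, List, Optional, get_args, get_origin


@dataclass
class ReturnSketch:  # hypothetical stand-in for ReturnInfo
    type_hint: Any
    description: Optional[str]


def pick_description(from_signature: Optional[str], from_docstring: Optional[str]) -> Optional[str]:
    """The signature wins; the docstring is only a fallback."""
    return from_signature if from_signature is not None else from_docstring


def reconcile_returns(sig_hint: Any, doc_descriptions: List[str]):
    """Pair docstring return entries with the return type in the signature."""
    if len(doc_descriptions) <= 1:
        desc = doc_descriptions[0] if doc_descriptions else None
        return ReturnSketch(sig_hint, desc)
    # Only a raw tuple counts; Annotated[tuple[...], ...] has a different origin.
    items = get_args(sig_hint) if get_origin(sig_hint) is tuple else ()
    if len(items) == len(doc_descriptions):
        return [ReturnSketch(t, d) for t, d in zip(items, doc_descriptions)]
    warnings.warn("Docstring lists several return values but the signature does not match.")
    return ReturnSketch(sig_hint, " ".join(doc_descriptions))


matched = reconcile_returns(tuple[int, str], ["The count.", "The label."])
print(len(matched))  # 2

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flattened = reconcile_returns(int, ["First part.", "Second part."])
print(flattened.description)  # First part. Second part.
```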

Type hint for LLM

Sometimes, the inferred types are very verbose. This can be a problem if we pass these raw types to the LLM:

  1. If we are using Annotated, the description of the parameter is going to be passed twice: in the description field and in the type hint.
  2. If you are using types from the typing library, the inferred types can be very verbose.
  3. The types of some libraries are very verbose as well (e.g. pandas).

Automated LLM-friendly type hint generation

To solve this problem, we have implemented process_typehint to generate a type hint that is more succinct and that can be passed to the LLM:

from conatus import action

from typing import Annotated, Optional, Union
import pandas as pd

@action
def return_first_val(
    x: Annotated[Optional[Union[pd.Series, pd.DataFrame]], "a series or a dataframe"]
) -> Optional[Union[int, str]]:
    val = None
    if isinstance(x, pd.Series):
        val = x.iloc[0] if not x.empty else None
    elif isinstance(x, pd.DataFrame):
        val = x.iloc[0, 0] if not x.empty else None
    return val

print(return_first_val._function_info.parameters["x"].type_hint)
# typing.Annotated[typing.Union[pandas.core.series.Series, ...
# ... pandas.core.frame.DataFrame, NoneType], 'a series or a dataframe']
print(return_first_val._function_info.parameters["x"].type_hint_for_llm)
# pandas.Series | pandas.DataFrame | None

Internally, this function does the following:

  1. It unpacks Annotated and Union types.
  2. It replaces Literal types with the actual value.
  3. It removes typing. and <class '...'> from the type hint.
  4. It replaces NoneType with None.
  5. It makes some popular types more succinct (e.g. pandas.core.series.Series -> pandas.Series).

It's also a recursive function, so it can handle nested types.
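A recursive simplifier along these lines can be written with the typing introspection helpers. The sketch below is not process_typehint itself: the function name simplify and the SHORT_NAMES alias table are hypothetical, and a standard-library class is used in place of pandas so that the example has no third-party dependency.

```python
import collections
import types
from typing import Annotated, Any, Dict, List, Literal, Optional, Union, get_args, get_origin

# Hypothetical alias table mapping a verbose dotted path to a shorter name.
SHORT_NAMES = {"collections.OrderedDict": "OrderedDict"}

# types.UnionType (the `X | Y` syntax) only exists on Python 3.10+.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def simplify(hint: Any) -> str:
    """Render a type hint as a short string, recursing into nested types."""
    origin = get_origin(hint)
    if origin is Annotated:
        return simplify(get_args(hint)[0])  # drop the metadata
    if origin in UNION_ORIGINS:
        return " | ".join(simplify(arg) for arg in get_args(hint))
    if origin is Literal:
        return " | ".join(repr(value) for value in get_args(hint))
    if hint is None or hint is type(None):
        return "None"
    if hint is Ellipsis:
        return "..."
    if origin is not None:  # parametrized generic such as list[int] or Dict[str, int]
        inner = ", ".join(simplify(arg) for arg in get_args(hint))
        return f"{simplify(origin)}[{inner}]"
    if isinstance(hint, type):
        if hint.__module__ in ("builtins", "typing"):
            return hint.__qualname__
        full_name = f"{hint.__module__}.{hint.__qualname__}"
        return SHORT_NAMES.get(full_name, full_name)
    return str(hint).replace("typing.", "")


verbose = Annotated[Optional[Union[int, List[str]]], "a number or some names"]
print(simplify(verbose))  # int | list[str] | None
print(simplify(Dict[str, Literal["a", "b"]]))  # dict[str, 'a' | 'b']
print(simplify(Optional[collections.OrderedDict]))  # OrderedDict | None
```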

Overriding the LLM type hint with the docstring

Sometimes, docstrings associate a parameter with a type hint that is distinct from, and generally more succinct than, the one that is written in the type annotation:

# Here, the type hint is very verbose so we use 'int > 0' in the docstring.
from typing import Annotated
from pydantic import Field

def get_user(user_id: Annotated[int, Field(strict=True, gt=0)]) -> str:
    """Get a user by its ID.

    Args:
        user_id (int > 0): The ID of the user.

    Returns:
        (string): The username.
    """
    pass

If you want the type hint in the docstring to be the one that is passed to the LLM, you can decorate your function with @action(override_type_hint_for_llm=True). That way, the original type hint is preserved for type-checking, but if there's a type hint in the docstring, that one is passed to the LLM.

Warning

You cannot override type hints per parameter. It will apply to every parameter that has a type hint in the docstring, as well as the return value.

Here is override_type_hint_for_llm=True in action:

from typing import Dict, List, Union, TypeVar, Generic
from conatus import action

T = TypeVar('T')
class Container(Generic[T]): pass

@action(override_type_hint_for_llm=True)
def complicated_function(
    x: Union[Dict[str, List[Container[T]]], Container[List[T]]]
) -> List[T]:
    """Process a container of values.

    Args:
        x (dict[str, list] | list): A simpler description of the input type.

    Returns:
        list: The processed values.
    """
    pass

print(complicated_function._function_info.parameters["x"].type_hint)
# typing.Union[typing.Dict[str, typing.List[__main__.Container[~T]]], ...
# ... __main__.Container[typing.List[~T]]]
print(complicated_function._function_info.parameters["x"].type_hint_for_llm)
# 'dict[str, list] | list'

Converting to OpenAI's strict JSON Schema format

TODO

What is a JSON-serializable type?

One of the core distinguishing features of Action is that it allows you to use any arbitrary type as a parameter or return value. This is in contrast to libraries like LangChain or Autogen, which require that all types be JSON-serializable.

We do this because we can pass variables by reference. In this case, LLMs can call actions without specifying the exact content of the variable, by simply passing the name of a variable of a compatible type. (We'll see how we detect compatible types in the next section.)

To do this, we need to know whether a type hint is JSON-serializable.

This is a little more complex than it seems. Consider the following type hints:

from typing import Annotated, Optional, Union
import pandas as pd

# This is always JSON-serializable
a: int | str

# This never will be JSON-serializable
b: pd.Series | pd.DataFrame

# This is partially JSON-serializable
c: list[pd.Series | list[int]] # list[int] is JSON-serializable

In the first case, the type hint is always JSON-serializable. This means that the LLM can represent the parameter a as an integer or a string. It can create these values "on the fly" if it wants; in fact, that is the only way that traditional AI agent libraries can call functions. But it can also pass a variable by reference, if we find a variable of the correct type. In other words, we tell the LLM that it can represent a in three different ways: as an integer, as a string, or as a reference to a variable of type int or str (the reference is itself a string).

In the second case, the type hint is never JSON-serializable. This means that the LLM can only pass a variable by reference. If an Action requires a parameter like b, we only expose it to the LLM if we find a variable of the correct type in the runtime.

The third case is trickier: the type hint is JSON-serializable, but only partially. If we take c as an example, the LLM can represent it as a list of lists of integers (which it can always do), or as a reference to a variable of type list[pd.Series | list[int]]. This means that if we have a compatible variable in the runtime, we expose the Action to the LLM and allow it to represent the parameter in both ways. But if we don't have a compatible variable, we can still expose the action, but only allow the LLM to represent the parameter as a list of lists of integers.

This is why we need two pieces of information:

  1. Whether the LLM can ever represent the parameter in a JSON-serializable way, since it will tell us whether we can expose the action to the LLM even if we don't have a compatible variable in the runtime.
  2. What is the JSON-serializable subtype, since it will tell us how the LLM can represent the parameter if it doesn't have a compatible variable in the runtime. If the type hint is always JSON-serializable, this subtype will be the type hint itself.

In ParamInfoFromSignature and ParamInfo, we store these two pieces of information in the is_json_serializable and json_serializable_subtype attributes, respectively.
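To make the two pieces of information concrete, here is a minimal sketch that computes both for a small family of type hints. It is not is_json_serializable_type: the function names json_subtype and representations are hypothetical, Series is a local stand-in for a non-serializable type such as pd.Series, and only primitives, unions, lists and string-keyed dicts are handled.

```python
import types
from typing import Any, List, Tuple, Union, get_args, get_origin

JSON_PRIMITIVES = (int, float, str, bool, type(None))
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


class Series:
    """Local stand-in for a type that cannot be written as JSON."""


def json_subtype(hint: Any) -> Tuple[bool, Any]:
    """Return (is_json_serializable, json_serializable_subtype) for a hint."""
    origin = get_origin(hint)
    if origin in UNION_ORIGINS:
        # Keep only the union members that have a JSON-serializable part.
        kept = [sub for ok, sub in map(json_subtype, get_args(hint)) if ok]
        if not kept:
            return False, None
        result = kept[0]
        for member in kept[1:]:
            result = Union[result, member]
        return True, result
    if origin is list:
        args = get_args(hint)
        ok, sub = json_subtype(args[0]) if args else (False, None)
        return (True, list[sub]) if ok else (False, None)
    if origin is dict:
        args = get_args(hint)
        if len(args) == 2 and args[0] is str:
            ok, sub = json_subtype(args[1])
            if ok:
                return True, dict[str, sub]
        return False, None
    if origin is None and any(hint is p for p in JSON_PRIMITIVES):
        return True, hint
    return False, None


def representations(hint: Any, has_compatible_variable: bool) -> List[str]:
    """Ways the LLM may fill the parameter; an empty list means 'do not expose'."""
    ok, _ = json_subtype(hint)
    modes = []
    if ok:
        modes.append("json")
    if has_compatible_variable:
        modes.append("reference")
    return modes


partial = list[Union[Series, list[int]]]
print(json_subtype(partial))  # (True, list[list[int]])
print(representations(Series, has_compatible_variable=False))  # []
print(representations(partial, has_compatible_variable=False))  # ['json']
```

The last three lines mirror the three cases discussed above: a partially serializable hint keeps only its serializable part, and a hint with no serializable part is exposed only when a compatible variable exists.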

Type checking


  1. This is a limitation of Griffe. Whenever there's a discrepancy between the types in the signature and the docstring, Griffe returns a string of the type hint in the docstring. It is a little complex to eval this string to transform it into an actual type, so we default to using the type hint in the signature.

  2. There are a few reasons for this. First, type hints in the signature are more likely to be correct than the ones in the docstring. Second, we assume that if a function describes a parameter both in the signature (through Annotated, say) and in the docstring, the description in the signature has been intentionally written for the LLM.

  3. We support multiple return values ONLY if the return type is a raw tuple. For instance, Annotated[tuple[int, str], *metadata] will not be treated as a tuple, but a single value.