Internals: Action ¶
Scratchpad ¶
Why are Actions different from their equivalents in LangChain / Autogen?
- Allow arbitrary types: it doesn't need to be JSON-serializable, because we allow passing variables by reference
- Action transparency: you can use it like any other traditional function.
- BYOF: Bring your own function.
- Allow access to runtime: you can modify the state and context etc.
- Building blocks of a DSL, essentially
- Type checking!
Vocabulary ¶
Actions can be defined in multiple ways ¶
- As an Action subclass
  - Base case: as a subclass of Action directly
  - Special case: as a TypedAction subclass
- With the @action decorator
  - Base case: a static method / a standalone function
  - Special case: a class method
  - Special special case: conversion from Action to TypedAction using typeify
In any case, the result is a class.
Handling Action subclasses ¶
Detecting action functions ¶
As seen previously, a user can define an action as a subclass of Action, which
enables them to write multiple methods within the class that interact with each
other.
from conatus import Action

class MyAction(Action):
    def check_state_exists(self):
        # hasattr returns False when "state" is missing, instead of raising
        return hasattr(self, "state")

    @Action.function
    def my_action(self):
        if self.check_state_exists():
            return self.state
        return "State not found"
We see here that the user can define multiple methods within the class. How do
we know, then, which method is the underlying action function? This is why we
need the @Action.function decorator. Underneath, we use mark_action_function
(conatus.actions.action.mark_action_function), which basically adds a
__is_action_function attribute to the function object.
Afterwards, we store that function on the Action subclass under the attribute
name _action_function.
Why we did it that way
Adding an arbitrary attribute to a function is dirty, and it makes type-checkers unhappy. Unfortunately, cleaner solutions (such as creating a new class or using a Protocol) were not able to propagate the types of the function properly to the aforementioned type-checkers.
Action function checks ¶
We do a few checks on action functions:
- We permit only one action function per class, though this may change in the future. This constraint is enforced by ActionBlueprint.identify_action_function, which ensures that a class contains at least one action function and no more than one. If these conditions are violated, it raises either ActionFunctionNotFoundError or MultipleActionFunctionsError, respectively.
- Inheriting action functions is OK. If a class inherits from another class that has an action function, the child class will inherit the action function as well. This is done by ActionBlueprint.detect_action_function_in_bases.
- It's OK for a base class not to have an action function.
- If _action_function is defined, we assume it has been processed. In other words, if you define _action_function in your class, the various dependent attributes such as _function_info will not be computed. This essentially means that defining _action_function is not allowed.
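The marking and the checks described in this section can be sketched with the standard library alone. This is a minimal illustration, not the conatus implementation: mark, sketch_identify_action_function and the two exception classes are hypothetical stand-ins, and only the __is_action_function attribute name comes from the text above.

```python
MARKER = "__is_action_function"  # attribute name taken from the text above


class ActionFunctionNotFound(Exception):
    """Hypothetical stand-in for ActionFunctionNotFoundError."""


class MultipleActionFunctions(Exception):
    """Hypothetical stand-in for MultipleActionFunctionsError."""


def mark(func):
    """Flag a function as the action function by setting an attribute on it."""
    setattr(func, MARKER, True)
    return func


def _marked_in(cls):
    """Return the marked functions defined directly on cls (not inherited)."""
    return [v for v in vars(cls).values() if getattr(v, MARKER, False)]


def sketch_identify_action_function(cls):
    """Return the single action function of cls, looking at bases if needed."""
    own = _marked_in(cls)
    if len(own) > 1:
        raise MultipleActionFunctions(cls.__name__)
    if own:
        return own[0]
    # Nothing on the class itself: fall back to the first base that has one.
    for base in cls.__mro__[1:]:
        inherited = _marked_in(base)
        if inherited:
            return inherited[0]
    raise ActionFunctionNotFound(cls.__name__)


class Base:
    @mark
    def run(self):
        return "base"


class Child(Base):
    def helper(self):
        return "helper"


found = sketch_identify_action_function(Child)  # inherited from Base
```

Because the marker is set with setattr and a string literal, Python's name mangling for double-underscore attributes does not apply here.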
Schema extraction ¶
Like other libraries, we need to extract the schema of the function before we can pass it to the LLM.
The schema extraction process is done in three steps:
- We parse the function signature to extract the parameters and return type.
- We parse the docstring to extract the description of the function and its parameters.
- We reconcile the information from the function signature and the docstring.
Parsing the function signature ¶
Some notes on how we parse the function signature:
- Parameters:
  - For each parameter in the function signature, we store the name, as well as the annotation and default value if we find any.
  - If a parameter has no type hint, we will infer that the type is Any.
  - We determine whether the type hint indicates a JSON-serializable value with is_json_serializable_type. (See below for more information.)
  - If the parameter has a default value, we will infer that the argument is not required.
  - If the parameter has no default value, we will infer that the argument is required and set the default value to Ellipsis.
  - If any of the parameters has a type hint of the form Annotated[<type>, *<metadata>], we will look in the <metadata> arguments for a simple string. If we find one, we will use that as the description for the argument. We only use the first string we find. See get_description_from_typehint.
  - We also store the kind of the parameter (e.g. POSITIONAL_OR_KEYWORD).
  - For each parameter, we store all of this in a ParamInfoFromSignature object.
- Return value:
  - If there is a return type hint in the function signature, we will use that.
  - If there is no return type hint, we will infer that the return type is Any.
  - If the type hint is of the form Annotated[<type>, *<metadata>], we will look for a simple string in the <metadata> arguments and use that as the description.
  - We store this return value in a ReturnInfoFromSignature object.
- We store all of the parameters and the return value in a FunctionInfoFromSignature object.
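The parameter rules above can be approximated with inspect and typing from the standard library. The following is a rough sketch under stated assumptions: ParamSketch, description_from_hint and parse_signature are hypothetical names that only mimic what the text says about ParamInfoFromSignature and get_description_from_typehint, and the JSON-serializability check is left out.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, Any, Optional, get_args, get_origin, get_type_hints


@dataclass
class ParamSketch:
    """Hypothetical stand-in for ParamInfoFromSignature."""

    name: str
    annotation: Any
    default: Any
    is_required: bool
    description: Optional[str]
    kind: str


def description_from_hint(hint: Any) -> Optional[str]:
    """Return the first plain string found in Annotated metadata, if any."""
    if get_origin(hint) is Annotated:
        for meta in get_args(hint)[1:]:
            if isinstance(meta, str):
                return meta
    return None


def parse_signature(func) -> dict:
    """Collect name, annotation, default, requiredness, description and kind."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    params = {}
    for name, p in inspect.signature(func).parameters.items():
        has_default = p.default is not inspect.Parameter.empty
        hint = hints.get(name, Any)  # no type hint -> Any
        params[name] = ParamSketch(
            name=name,
            annotation=hint,
            default=p.default if has_default else ...,  # Ellipsis when required
            is_required=not has_default,
            description=description_from_hint(hint),
            kind=p.kind.name,
        )
    return params


def greet(name: Annotated[str, "who to greet"], times=2):
    return " ".join([f"hi {name}"] * times)


info = parse_signature(greet)
```

Here info["name"] is required and carries the description from Annotated, while info["times"] falls back to Any and is optional because it has a default.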
Parsing docstrings ¶
We use Griffe's docstring parser to extract information from the docstring.
Some notes on how we parse docstrings:
- Style: We automatically detect the style of the docstring (Google, Numpy, or Sphinx).
  - For more, see the documentation of infer_docstring_style and DocstringStyle.
- Parameters:
  - We only look for the description associated with the name of the parameter. Lines in the arguments section that do not have a parameter at the beginning will be discarded. (This is Griffe's default behavior.)
  - The type hint in the docstring is never used to infer the type of the parameter. We always use the type hint in the function signature. If there is no type hint in the signature, we will infer that the type is Any. [1]
- Returns:
  - If you return multiple items (e.g. tuple[int, str]), you can write one line per item.
  - Griffe expects a specific format for the Returns section. You can have multiple returns, but each return needs to be indented precisely on each of its lines. In other words, it's very easy to get multiple returns by accident if you are not careful with the indentation.
  - Another quirk of Griffe is that it expects the type hint to be in parentheses, which can also lead to bugs.
  - We only support Returns sections, but not Yields sections. We don't support asynchronous functions yet.
- We return all of this information in a FunctionInfoFromDocstring object.
Reconciling the function signature and docstring ¶
One core assumption of the Action function parser is that what is written in
the function signature (e.g. the type hints) takes precedence over what is
written in the docstring. [2]
- Function description: The user has the option of providing a desc argument to the @action decorator. If this is provided, we will use this description and nothing else. If it is not provided, we will use the description in the docstring if it is present.
- Parameters
  - Type hint: We always use the type hint in the signature to infer the type of the parameter.
  - Type hint for LLM: We use the type hint in the signature to generate the type hint for LLMs, unless override_type_hint_for_llm=True is passed to the @action decorator. (See below for more information.)
  - Description: We use the description of a parameter if it is present in the signature, and if it is not present in the signature, we use the description in the docstring.
  - Default value / Is Required: We always use the default value in the signature to infer whether the parameter is required or not.
- Return value
  - If there's only one return value in the docstring: We reconcile the values for the return values according to the same rules that apply for parameters, and we return a ReturnInfo object.
  - If there are multiple return values in the docstring: We first check that the return value in the signature is a tuple [3] of the same length as the number of return values in the docstring:
    - If that is the case: We reconcile the values for the return values according to the same rules that apply for parameters, and we return a list of ReturnInfo objects.
    - If that is not the case: We raise a warning, flatten the return values in the docstring, and return a single ReturnInfo object. The description of the return value will be a concatenation of the descriptions of the individual return values.
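The precedence rules and the tuple-length check above can be illustrated with a small sketch. All function names here are hypothetical, return values are modelled as plain (type, description) pairs rather than ReturnInfo objects, and joining descriptions with a space is an assumption about how the concatenation is done.

```python
import warnings
from typing import Annotated, Any, Optional, get_args, get_origin


def pick_function_description(desc_arg: Optional[str], docstring_desc: Optional[str]) -> Optional[str]:
    """The desc argument wins outright; otherwise fall back to the docstring."""
    return desc_arg if desc_arg is not None else docstring_desc


def pick_param_description(from_signature: Optional[str], from_docstring: Optional[str]) -> Optional[str]:
    """A description found in the signature takes precedence over the docstring."""
    return from_signature if from_signature is not None else from_docstring


def reconcile_returns(return_hint: Any, doc_descriptions: list) -> list:
    """Pair the signature's return type with the docstring's return descriptions."""
    if len(doc_descriptions) <= 1:
        description = doc_descriptions[0] if doc_descriptions else ""
        return [(return_hint, description)]
    # Only a raw tuple counts; Annotated[tuple[...], ...] has origin Annotated.
    args = get_args(return_hint)
    is_raw_tuple = get_origin(return_hint) is tuple and Ellipsis not in args
    if is_raw_tuple and len(args) == len(doc_descriptions):
        return list(zip(args, doc_descriptions))
    warnings.warn("Return type and docstring disagree; flattening descriptions.")
    return [(return_hint, " ".join(doc_descriptions))]


matched = reconcile_returns(tuple[int, str], ["The count.", "The label."])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flattened = reconcile_returns(
        Annotated[tuple[int, str], "a pair"], ["The count.", "The label."]
    )
```

In the second call the annotated tuple is treated as a single value, so one warning is recorded and the two descriptions are merged.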
Type hint for LLM ¶
Sometimes, the inferred types are very verbose. This can be a problem if we pass these raw types to the LLM:
- If we are using Annotated, the description of the parameter is going to be passed twice: in the description field and in the type hint.
- If you are using types from the typing library, the inferred types can be very verbose.
- The types of some libraries are very verbose as well (e.g. pandas).
Automated LLM-friendly type hint generation ¶
To solve this problem, we have implemented process_typehint to generate a type
hint that is more succinct and that can be passed to the LLM:
from conatus import action
from typing import Annotated, Optional, Union
import pandas as pd

@action
def return_first_val(
    x: Annotated[Optional[Union[pd.Series, pd.DataFrame]], "a series or a dataframe"]
) -> Optional[Union[int, str]]:
    val = None
    if isinstance(x, pd.Series):
        val = x.iloc[0] if not x.empty else None
    elif isinstance(x, pd.DataFrame):
        val = x.iloc[0, 0] if not x.empty else None
    return val

print(return_first_val._function_info.parameters["x"].type_hint)
# typing.Annotated[typing.Union[pandas.core.series.Series, ...
# ... pandas.core.frame.DataFrame, NoneType], 'a series or a dataframe']
print(return_first_val._function_info.parameters["x"].type_hint_for_llm)
# pandas.Series | pandas.DataFrame | None
Internally, this function does the following:
- It unpacks Annotated and Union types.
- It replaces Literal types with the actual value.
- It removes typing. and <class '...'> from the type hint.
- It replaces NoneType with None.
- It makes some popular types more succinct (e.g. pandas.core.series.Series -> pandas.Series).
It's also a recursive function, so it can handle nested types.
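To make the recursion concrete, here is a simplified sketch of such a formatter built on typing.get_origin and typing.get_args. It is not process_typehint itself: sketch_typehint_for_llm is a hypothetical name, Literal values are rendered with repr as an assumption, and the library-specific shortening (such as the pandas aliases) is omitted.

```python
import types
from typing import Annotated, Any, Literal, Optional, Union, get_args, get_origin

# typing.Union and the X | Y syntax may report different origins depending on
# the Python version, so both are accepted.
_UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}


def sketch_typehint_for_llm(hint: Any) -> str:
    """Render a type hint as a short string (simplified, hypothetical)."""
    origin = get_origin(hint)
    args = get_args(hint)
    if origin is Annotated:
        # Drop the metadata; only the underlying type is kept.
        return sketch_typehint_for_llm(args[0])
    if origin in _UNION_ORIGINS:
        return " | ".join(sketch_typehint_for_llm(a) for a in args)
    if origin is Literal:
        return " | ".join(repr(a) for a in args)
    if hint is type(None):
        return "None"
    if origin is not None:
        # Generic container: recurse into the type arguments.
        inner = ", ".join(sketch_typehint_for_llm(a) for a in args)
        return f"{origin.__name__}[{inner}]"
    return getattr(hint, "__name__", str(hint))


short = sketch_typehint_for_llm(Annotated[Optional[Union[int, str]], "an id"])
nested = sketch_typehint_for_llm(list[dict[str, int]])
literal = sketch_typehint_for_llm(Literal["a", 1])
```

Each branch either strips a wrapper or recurses into the arguments, which is what lets nested hints collapse into a single readable string.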
Overriding the LLM type hint with the docstring ¶
Sometimes, docstrings associate a parameter with a type hint that is distinct from, and generally more succinct than, the one that is written in the type annotation:
# Here, the type hint is very verbose so we use 'int > 0' in the docstring.
from typing import Annotated
from pydantic import Field

def get_user(user_id: Annotated[int, Field(strict=True, gt=0)]) -> str:
    """Get a user by its ID.

    Args:
        user_id (int > 0): The ID of the user.

    Returns:
        (string): The username.
    """
    pass
If you want the type hint in the docstring to be the one that is passed to the
LLM, you can decorate your function with
@action(override_type_hint_for_llm=True). That way, the original type hint
is preserved for type-checking, but if there's a type hint in the docstring,
that is the one that will be passed to the LLM.
Warning
You cannot override type hints per parameter. It will apply to every parameter that has a type hint in the docstring, as well as the return value.
Here is override_type_hint_for_llm=True in action:
from typing import Dict, List, Union, TypeVar, Generic
from conatus import action

T = TypeVar('T')

class Container(Generic[T]):
    pass

@action(override_type_hint_for_llm=True)
def complicated_function(
    x: Union[Dict[str, List[Container[T]]], Container[List[T]]]
) -> List[T]:
    """Process a container of values.

    Args:
        x (dict[str, list] | list): A simpler description of the input type.

    Returns:
        list: The processed values.
    """
    pass

print(complicated_function._function_info.parameters["x"].type_hint)
# typing.Union[typing.Dict[str, typing.List[__main__.Container[~T]]], ...
# ... __main__.Container[typing.List[~T]]]
print(complicated_function._function_info.parameters["x"].type_hint_for_llm)
# 'dict[str, list] | list'
Converting to OpenAI's strict JSON Schema format ¶
TODO
What is a JSON-serializable type? ¶
One of the core distinguishing features of Action is that it allows you to use
any arbitrary type as a parameter or return value. This is in contrast to
libraries like LangChain or Autogen, which require that all types be
JSON-serializable.
We do this because we can pass variables by reference. In this case, LLMs can call actions without specifying the exact content of the variable, just by passing the name of a variable of a compatible type. (We'll see how we detect compatible types in the next section.)
To do this, we need to know whether a type hint is JSON-serializable.
This is a little more complex than it seems. Consider the following type hints:
from typing import Annotated, Optional, Union
import pandas as pd
# This is always JSON-serializable
a: int | str
# This never will be JSON-serializable
b: pd.Series | pd.DataFrame
# This is partially JSON-serializable
c: list[pd.Series | list[int]] # list[int] is JSON-serializable
In the first case, the type hint is always JSON-serializable. This means
that the LLM can represent the parameter a as an integer or a string. It can
create these values "on the fly" if it wants; in fact, that is the only way that
traditional AI agent libraries can call functions. But it can also pass a
variable by reference, if we find a variable of the correct type. In
other words, we tell the LLM that it can represent a in three different ways:
as an integer, as a string, or as a reference to a variable of type int or str
(which is itself a string).
In the second case, the type hint is never JSON-serializable. This means
that the LLM can only pass a variable by reference. If an Action requires a
parameter like b, we only expose it to the LLM if we find a variable of the
correct type in the runtime.
The third case is trickier: the type hint is JSON-serializable, but only
partially. If we take c as an example, the LLM can represent it as a list of
lists of integers (which it can always do), or as a reference to a variable of
type list[pd.Series | list[int]]. This means that if we have a compatible
variable in the runtime, we expose the Action to the LLM and allow it to
represent the parameter in both ways. But if we don't have a compatible
variable, we can still expose the action, but only allow the LLM to represent
the parameter as a list of lists of integers.
This is why we need two pieces of information:
- Whether the LLM can ever represent the parameter in a JSON-serializable way, since it will tell us whether we can expose the action to the LLM even if we don't have a compatible variable in the runtime.
- What is the JSON-serializable subtype, since it will tell us how the LLM can represent the parameter if it doesn't have a compatible variable in the runtime. If the type hint is always JSON-serializable, this subtype will be the type hint itself.
In ParamInfoFromSignature and ParamInfo, we store these two pieces of
information in the is_json_serializable and json_serializable_subtype
attributes, respectively.
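One way to picture how these two values could be derived is the recursive sketch below. It is an illustration under simplifying assumptions, not the library's is_json_serializable_type: the function names are hypothetical, only scalars, unions, list[...] and dict[str, ...] are handled, and Frame is a stand-in for a non-serializable type such as pd.DataFrame.

```python
import types
from typing import Any, Union, get_args, get_origin

_JSON_SCALARS = (int, float, str, bool, type(None))
_UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}
_NOT_JSON = object()  # sentinel, since None (NoneType) is itself a valid subtype


def _subtype(hint: Any) -> Any:
    """Return the JSON-serializable part of hint, or _NOT_JSON if there is none."""
    if hint in _JSON_SCALARS:
        return hint
    origin, args = get_origin(hint), get_args(hint)
    if origin in _UNION_ORIGINS:
        # Keep only the union members that have a serializable part.
        kept = [s for s in (_subtype(a) for a in args) if s is not _NOT_JSON]
        return Union[tuple(kept)] if kept else _NOT_JSON
    if origin is list and len(args) == 1:
        inner = _subtype(args[0])
        return _NOT_JSON if inner is _NOT_JSON else list[inner]
    if origin is dict and len(args) == 2 and args[0] is str:
        inner = _subtype(args[1])
        return _NOT_JSON if inner is _NOT_JSON else dict[str, inner]
    return _NOT_JSON


def sketch_json_info(hint: Any) -> tuple:
    """Return (is_json_serializable, json_serializable_subtype)."""
    sub = _subtype(hint)
    return (False, None) if sub is _NOT_JSON else (True, sub)


class Frame:
    """Stand-in for a type that cannot be expressed as JSON."""


always = sketch_json_info(Union[int, str])
never = sketch_json_info(Union[Frame, bytes])
partial = sketch_json_info(list[Union[Frame, list[int]]])
```

The three results mirror the cases a, b and c above: the full hint is kept, nothing is kept, and only the list-of-lists-of-integers part is kept.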
Type checking ¶
1. This is a limitation of Griffe. Whenever there's a discrepancy between the types in the signature and the docstring, Griffe returns a string of the type hint in the docstring. It is a little complex to eval this string to transform it into an actual type, and so we default to using the type hint in the signature.
2. There are a few reasons for this. First, type hints in the signature are more likely to be correct than the ones in the docstring. Second, we assume that if a function describes a parameter both in the signature (through Annotated, say) and in the docstring, the description in the signature has been intentionally written for the LLM.
3. We support multiple return values ONLY if the return type is a raw tuple. For instance, Annotated[tuple[int, str], *metadata] will not be treated as a tuple, but as a single value.