
DOM Nodes

conatus.utils.browser.dom.nodes

A few classes that represent the underlying HTML nodes.

The three classes are:

  • DOMNode: The main class that represents a node in the DOM. It is largely derived from ChromeDOMNodeData, but is cleaner and is packaged as a tree structure. It is the last step in the processing of the DOM.
  • NodeRef: A class that is used to disambiguate between different nodes in the HTML. It is meant to be read from, and written to, YAML files.
  • NodeTypeEnum: An enum that represents the nodeType attribute of a DOM node. It's a little nerdy, but can be useful when traversing the DOM.

Additional references

BoundsType module-attribute

BoundsType = tuple[
    int | float, int | float, int | float, int | float
]

Bounds of a node, represented as a tuple of 4 integers or floats.

Note that the bounds might mean different things depending on the context! Sometimes, the format is: (x, y, width, height) where (x, y) is the top-left corner and (width, height) is the width and height of the node.

Other times, the format is: (x1, y1, x2, y2) where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.

NodeTypeEnum

Bases: IntEnum

nodeType enum.

This is a representation of the nodeType attribute of a DOM node. nodeType can be something like an Element, an Attribute, a Text node, a Document node, etc. Not all nodes are created equal.

More information on nodeType can be found on the MDN Web Docs. The descriptions below are taken from the MDN Web Docs.

Element class-attribute instance-attribute

Element = 1

An Element node like <p> or <div>.

Attribute class-attribute instance-attribute

Attribute = 2

An Attribute of an Element.

Text class-attribute instance-attribute

Text = 3

The actual Text inside an Element or Attr.

Cdata_section class-attribute instance-attribute

Cdata_section = 4

A CDATASection, such as <![CDATA[ … ]]> (It's for unescaped text.)

Comment class-attribute instance-attribute

Comment = 8

A Comment node.

Document class-attribute instance-attribute

Document = 9

A Document node.

Document_type class-attribute instance-attribute

Document_type = 10

A DocumentType node, such as <!DOCTYPE html>.

Document_fragment class-attribute instance-attribute

Document_fragment = 11

A DocumentFragment node.
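To illustrate why an IntEnum is convenient here, the sketch below mirrors the documented values in a local class (`LocalNodeType` is a stand-in defined only for this example, not the library class) and filters a list of raw nodeType integers.

```python
from enum import IntEnum


class LocalNodeType(IntEnum):
    """Local mirror of the documented values, for illustration only."""

    Element = 1
    Attribute = 2
    Text = 3
    Cdata_section = 4
    Comment = 8
    Document = 9
    Document_type = 10
    Document_fragment = 11


# Raw nodeType integers, as they might appear in a DOM snapshot (made-up data).
raw_node_types = [9, 10, 1, 1, 3, 8, 3]

# IntEnum members compare equal to plain integers, so no conversion is needed.
element_count = sum(1 for t in raw_node_types if t == LocalNodeType.Element)

# Calling the enum with an integer gives back the named member.
names = [LocalNodeType(t).name for t in raw_node_types]
```

Since members are also integers, the enum can be used directly wherever a raw nodeType value is expected.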

DOMNodeAttributes

Bases: BaseModel

Significant attributes of a DOM node.

This class curates the most important attributes of a DOM node so that they are easier to access.

__hash__

__hash__() -> int

Hash the node.

We combine the hashes of the href, class_name, type, name, value, aria_label, and node_name fields with XOR.

RETURNS DESCRIPTION
int

The hash of the node.

TYPE: int

Source code in conatus/utils/browser/dom/nodes.py
@override
def __hash__(self) -> int:
    """Hash the node.

    We combine the hashes of `href`, `class_name`, `type`, `name`,
    `value`, `aria_label`, and `node_name` with XOR.

    Returns:
        int: The hash of the node.
    """
    return (
        hash(self.href)
        ^ hash(self.class_name)
        ^ hash(self.type)
        ^ hash(self.name)
        ^ hash(self.value)
        ^ hash(self.aria_label)
        ^ hash(self.node_name)
    )
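One property of XOR-combined hashes worth keeping in mind: XOR is commutative, so the result does not depend on which field holds which value. The sketch below uses a hypothetical two-field stand-in (`AttrsSketch`, not the real model) to show this.

```python
class AttrsSketch:
    """Hypothetical two-field stand-in, for illustration only."""

    def __init__(self, href=None, class_name=None):
        self.href = href
        self.class_name = class_name

    def __hash__(self) -> int:
        # Same combination scheme as the listing above, with fewer fields.
        return hash(self.href) ^ hash(self.class_name)


first = AttrsSketch(href="a", class_name="b")
swapped = AttrsSketch(href="b", class_name="a")
same_values = AttrsSketch(href="a", class_name="a")

# Swapping values between fields yields the same hash.
hashes_collide = hash(first) == hash(swapped)

# Two equal field values cancel out, because x ^ x == 0.
cancelled = hash(same_values)
```

Such collisions only cost lookup time in sets and dicts, since equality is checked separately.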

DOMNode

Bases: BaseModel

Node in the DOM.

🐴 A workhorse for HTML handling: DOMNode is meant to be the primary interface for developers who need to manipulate the DOM. It is a tree structure that is derived from ChromeDOMNodeData and its associated classes. To put it visually, we are at the right-most part of the diagram below:

How the DOM is processed

  • The DOMNode is generally used in conjunction with the ProcessedDOM class, our other workhorse for DOM handling.

DOM node ≠ HTML node: The DOM, as a tree structure, is quite different from the HTML. For instance, a tag like <div>Hi</div> is represented by two nodes in the DOM: one for the div tag and one for the text inside it. For more information, see the MDN Web Docs.
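The two-node structure can be observed with Python's standard-library `xml.dom.minidom`. This is only an illustration of the DOM model: minidom is an XML parser, not the Chrome-based pipeline used by this module.

```python
from xml.dom import minidom

# Parse a single tag that contains some text.
doc = minidom.parseString("<div>Hi</div>")

div = doc.documentElement   # the Element node for the <div> tag
text = div.firstChild       # the Text node holding "Hi"

element_info = (div.nodeType, div.nodeName)
text_info = (text.nodeType, text.nodeName, text.data)
```

Here `element_info` is `(1, "div")` and `text_info` is `(3, "#text", "Hi")`, matching the Element and Text values of NodeTypeEnum.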

from conatus.utils.browser.dom.nodes import DOMNode, NodeTypeEnum
from conatus.utils.browser.dom.fixtures import example_chrome_dom_inputtypes

inputtypes_dom = example_chrome_dom_inputtypes()
width = 1100
node: DOMNode = inputtypes_dom.process_nodes(width)[78]
assert node.node_name == "div"
assert not node.is_clickable
assert node.identifying_attributes.class_name == "header__buttons"
assert node.center == (730, 39)
assert len(node.children) == 3
assert node.node_type == NodeTypeEnum.Element
ATTRIBUTES SPECIFIC TO `DOMNODE` DESCRIPTION
index

Index of the node in the DOM.

TYPE: int

parent_node

Parent node of the current node.

TYPE: DOMNode | None

children

Children of the node.

TYPE: list[DOMNode]

identifying_attributes

Identifying attributes of the node. In practice, this is a filter over the following attributes: "id", "class", "name", "type", and "value". This is useful for disambiguation.

TYPE: DOMNodeAttributes

bounds

Bounds of the node.

TYPE: BoundsType | None

center

Center of the node.

TYPE: tuple[int, int] | None

llm_id

ID of the node in the LLM. Only nodes that represent clickable elements or inputs, and that are visible on the screen, have an LLM ID.

TYPE: int | None

node_ref

The NodeRef representation of the node.

TYPE: NodeRef

uid

Unique identifier for the node. Used for hashing.

TYPE: str

ATTRIBUTES INHERITED FROM `CHROMEDOM` DESCRIPTION
node_type

nodeType of the node. See NodeTypeEnum.

TYPE: NodeTypeEnum

node_name

nodeName of the node. See ChromeDOMNodeData.

TYPE: str

node_value

nodeValue of the node. See ChromeDOMNodeData.

TYPE: str | None

attributes

Attributes of an Element node. See ChromeDOMNodeData.

TYPE: dict[str, str]

input_value

Input value of the node (if input node). See ChromeDOMNodeData.

TYPE: str | None

input_checked

Whether the input is checked. See ChromeDOMNodeData.

TYPE: bool

is_clickable

Whether the node is clickable. See ChromeDOMNodeData.

TYPE: bool

option_selected

Whether the option is selected. See ChromeDOMNodeData.

TYPE: bool

offset_rects

Offset rects of the node. See ChromeDOMNodeData.

TYPE: BoundsType | None

scroll_rects

Scroll rects of the node. See ChromeDOMNodeData.

TYPE: BoundsType | None

uid property writable

uid: str

Unique identifier for the node.

RETURNS DESCRIPTION
str

The unique identifier for the node.

TYPE: str

inner_text property

inner_text: str

Get the inner text of the node.

This will return the text inside the node, as well as the text inside its children.

RETURNS DESCRIPTION
str

The inner text of the node.

TYPE: str
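A minimal sketch of how such a property could gather text recursively is shown below. `NodeSketch` and the free function `inner_text` are hypothetical, and joining the pieces without a separator is an assumption; the actual implementation may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeSketch:
    """Hypothetical stand-in for a tree node, for illustration only."""

    node_name: str
    node_value: str | None = None
    children: list[NodeSketch] = field(default_factory=list)


def inner_text(node: NodeSketch) -> str:
    """Collect the text of a node and all of its descendants."""
    if node.node_name == "#text":
        return node.node_value or ""
    # Assumption: child texts are concatenated with no separator.
    return "".join(inner_text(child) for child in node.children)


# Tree equivalent to <p>Hello, <b>world</b></p>
paragraph = NodeSketch(
    "p",
    children=[
        NodeSketch("#text", "Hello, "),
        NodeSketch("b", children=[NodeSketch("#text", "world")]),
    ],
)
result = inner_text(paragraph)
```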

class_names property

class_names: list[str]

Get the class names of the node.

RETURNS DESCRIPTION
list[str]

The class names of the node.

TYPE: list[str]

__repr__

__repr__(indent: int = 0) -> str

Return a string representation of the node.

Taken from Globot, this was mostly used to send a version of the HTML to LLMs. We use it to print the HTML.

PARAMETER DESCRIPTION
indent

The indentation level.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
str

The string representation of the node.

TYPE: str

Source code in conatus/utils/browser/dom/nodes.py
@override
def __repr__(self, indent: int = 0) -> str:
    """Return a string representation of the node.

    Taken from Globot, this was mostly used to send a version
    of the HTML to LLMs. We use it to print the HTML.

    Args:
        indent: The indentation level.

    Returns:
        str: The string representation of the node.
    """
    if self.node_name == "#text":
        return " " * indent + (self.node_value or "")

    identifying_attributes: dict[str, str] = (
        self.identifying_attributes.model_dump(
            exclude_none=True,
            by_alias=True,
            exclude={"node_name", "uid", "uid_"},
        )
    )
    attr_str = " ".join(
        [f'{k}="{v}"' for k, v in identifying_attributes.items()]
    )
    attr_str = " " + attr_str if (attr_str not in {"", "{}"}) else ""
    open_tag = f"<{self.node_name}{attr_str}>"
    close_tag = f"</{self.node_name}>"

    if len(self.children) == 0:
        return (" " * indent + open_tag) + (
            close_tag if self.node_name not in VOID_ELEMENTS else ""
        )

    # special case for elements with only one text child -> one-line element
    if len(self.children) == 1 and self.children[0].node_name == "#text":
        return (
            (" " * indent + open_tag)
            + self.children[0].__repr__()
            + close_tag
        )

    children_repr = "\n".join(
        [child.__repr__(indent + 2) for child in self.children]
    )
    return (
        (" " * indent + open_tag)
        + "\n"
        + children_repr
        + "\n"
        + (" " * indent + close_tag)
    )

__str__

__str__() -> str

Return a string representation of the node.

RETURNS DESCRIPTION
str

The string representation of the node.

TYPE: str

Source code in conatus/utils/browser/dom/nodes.py
@override
def __str__(self) -> str:
    """Return a string representation of the node.

    Returns:
        str: The string representation of the node.
    """
    return self.__repr__()

on_screen

on_screen(screen_bounds: BoundsType) -> bool

Check if the node (or one of its children) is on screen.

PARAMETER DESCRIPTION
screen_bounds

Bounds of the screen as (x1, y1, x2, y2) where x1,y1 is the top-left corner and x2,y2 is the bottom-right corner.

TYPE: BoundsType

RETURNS DESCRIPTION
bool

Whether the node is on screen.

TYPE: bool

Source code in conatus/utils/browser/dom/nodes.py
def on_screen(self, screen_bounds: BoundsType) -> bool:
    """Check if the node (or one of its children) is on screen.

    Args:
        screen_bounds: Bounds of the screen as (x1, y1, x2, y2) where
            x1,y1 is the top-left corner and x2,y2 is the bottom-right
            corner.

    Returns:
        bool: Whether the node is on screen.
    """
    if len(self.children) > 0:
        return any(
            child.on_screen(screen_bounds) for child in self.children
        )

    if self.bounds is None or self.bounds[2] * self.bounds[3] == 0:
        return False

    x, y, w, h = self.bounds
    screen_x1, screen_y1, screen_x2, screen_y2 = screen_bounds

    return (
        x < screen_x2  # node's left edge is left of screen's right edge
        and (x + w) > screen_x1  # node's right is right of screen's left
        and y < screen_y2  # node's top is above screen's bottom
        and (y + h) > screen_y1  # node's bottom is below screen's top
    )
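The leaf-node test above is a standard rectangle overlap check: the node's bounds are read as (x, y, width, height) and the screen's as (x1, y1, x2, y2). The standalone sketch below reproduces that check on plain tuples; `overlaps_screen` is a hypothetical helper name.

```python
def overlaps_screen(node_xywh, screen_xyxy) -> bool:
    """Return True if a node rectangle intersects the screen rectangle."""
    x, y, w, h = node_xywh
    if w * h == 0:
        # Zero-area nodes are never considered visible.
        return False
    sx1, sy1, sx2, sy2 = screen_xyxy
    return x < sx2 and (x + w) > sx1 and y < sy2 and (y + h) > sy1


screen = (0, 0, 1100, 800)

partially_visible = overlaps_screen((1000, 700, 200, 200), screen)
below_the_fold = overlaps_screen((0, 900, 100, 50), screen)
zero_width = overlaps_screen((50, 50, 0, 20), screen)
touching_edge = overlaps_screen((1100, 0, 10, 10), screen)
```

Because the comparisons are strict, a node that merely touches the screen's edge does not count as on screen.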

__hash__

__hash__() -> int

Hash the node.

We use the uid field to hash the node.

RETURNS DESCRIPTION
int

The hash of the node.

TYPE: int

Source code in conatus/utils/browser/dom/nodes.py
@override
def __hash__(self) -> int:
    """Hash the node.

    We use the `uid` field to hash the node.

    Returns:
        int: The hash of the node.
    """
    return hash(self.uid)

__eq__

__eq__(other: DOMNode | object) -> bool

Check if the node is equal to another node.

RETURNS DESCRIPTION
bool

Whether the node is equal to another node.

TYPE: bool

Source code in conatus/utils/browser/dom/nodes.py
@override
def __eq__(self, other: DOMNode | object) -> bool:
    """Check if the node is equal to another node.

    Returns:
        bool: Whether the node is equal to another node.
    """
    if not isinstance(other, DOMNode):
        return False
    return self.uid == other.uid
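Defining both `__hash__` and `__eq__` in terms of `uid` means nodes behave predictably in sets and as dict keys. The sketch below uses a hypothetical stand-in class (`UidNodeSketch`, with made-up uid strings) to show the effect.

```python
class UidNodeSketch:
    """Hypothetical stand-in whose identity is its uid, as in the listings above."""

    def __init__(self, uid: str, node_name: str) -> None:
        self.uid = uid
        self.node_name = node_name

    def __hash__(self) -> int:
        return hash(self.uid)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, UidNodeSketch):
            return False
        return self.uid == other.uid


a = UidNodeSketch("n-1", "div")
b = UidNodeSketch("n-1", "span")  # same uid, different tag
c = UidNodeSketch("n-2", "div")

unique_nodes = {a, b, c}          # a and b collapse into one entry
same_identity = a == b
different_type = a == "n-1"       # comparing with a non-node is False
```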

NodeRef

Bases: BaseModel

Edited representation of a node in the HTML.

NodeRef is used to disambiguate between different nodes in the HTML. If needed, we write it as YAML in the recipe folders.

from conatus.utils.browser.dom.nodes import NodeRef, DOMNode
from conatus.utils.browser.dom.fixtures import example_chrome_dom_inputtypes

inputtypes_dom = example_chrome_dom_inputtypes()
width = 1100
node: DOMNode = inputtypes_dom.process_nodes(width)[78]
node_ref: NodeRef = node.node_ref
assert node_ref.id is None
assert node_ref.class_name == "header__buttons"
assert node_ref.onclick is None
# etc.
ATTRIBUTE DESCRIPTION
id

The id attribute of the node in the HTML.

TYPE: str | None

href

The href attribute of the node in the HTML.

TYPE: str | None

class_name

The class attribute of the node in the HTML.

TYPE: str | None

type

The type of the node.

TYPE: str | None

bounds

The bounds of the node.

TYPE: BoundsType | None

center

The center of the node.

TYPE: tuple[int, int] | None

onclick

The onclick attribute of the node in the HTML.

TYPE: str | None

__hash__

__hash__() -> int

Hash the node.

We combine the hashes of the href, class_name, type, bounds, center, and onclick fields with XOR.

RETURNS DESCRIPTION
int

The hash of the node.

TYPE: int

Source code in conatus/utils/browser/dom/nodes.py
@override
def __hash__(self) -> int:
    """Hash the node.

    We combine the hashes of `href`, `class_name`, `type`, `bounds`,
    `center`, and `onclick` with XOR.

    Returns:
        int: The hash of the node.
    """
    return (
        hash(self.href)
        ^ hash(self.class_name)
        ^ hash(self.type)
        ^ hash(self.bounds)
        ^ hash(self.center)
        ^ hash(self.onclick)
    )
