DOM Nodes
conatus.utils.browser.dom.nodes
This module provides a few classes that represent the underlying HTML nodes.
The three classes are:
- `DOMNode`: The main class that represents a node in the DOM. It is largely derived from `ChromeDOMNodeData`, but is cleaner and is packaged as a tree structure. It is the last step in the processing of the DOM.
- `NodeRef`: A class that is used to disambiguate between different nodes in the HTML. It is meant to be read from, and written to, YAML files.
- `NodeTypeEnum`: An enum that represents the `nodeType` attribute of a DOM node. It's a little nerdy, but can be useful when traversing the DOM.
Additional references
- API: Chrome DOM classes: The Chrome DOM classes, which `DOMNode` is derived from.
- API: Processed DOM: The `ProcessedDOM` class, which processes the DOM and is used in conjunction with `DOMNode`.
BoundsType
module-attribute
Bounds of a node, represented as a tuple of 4 integers or floats.
Note that the bounds might mean different things depending on the context!
Sometimes, the format is: (x, y, width, height) where (x, y) is the
top-left corner and (width, height) is the width and height of the node.
Other times, the format is: (x1, y1, x2, y2) where (x1, y1) is the
top-left corner and (x2, y2) is the bottom-right corner.
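Because the two layouts are easy to confuse, it can help to convert between them explicitly. The sketch below is only illustrative: the `Bounds` alias and both helper names are assumptions made for this example, not part of `conatus`.

```python
from typing import Tuple, Union

Number = Union[int, float]
# Local stand-in for BoundsType: a tuple of 4 integers or floats.
Bounds = Tuple[Number, Number, Number, Number]


def xywh_to_corners(bounds: Bounds) -> Bounds:
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, width, height = bounds
    return (x, y, x + width, y + height)


def corners_to_xywh(bounds: Bounds) -> Bounds:
    """Convert (x1, y1, x2, y2) to (x, y, width, height)."""
    x1, y1, x2, y2 = bounds
    return (x1, y1, x2 - x1, y2 - y1)


box = (10, 20, 100, 50)  # (x, y, width, height)
corners = xywh_to_corners(box)
```

Naming the layout in a comment or in the function name, as done here, is one way to avoid passing the wrong format to a function that expects the other.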
NodeTypeEnum
Bases: IntEnum
nodeType enum.
This is a representation of the nodeType attribute of a DOM node.
nodeType can be something like an Element, an Attribute, a Text node,
a Document node, etc. Not all nodes are created equal.
More information on nodeType can be found on the
MDN Web Docs.
The descriptions below are taken from the MDN Web Docs.
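As a rough illustration of what such an enum looks like, here is a minimal sketch built on the standard DOM `nodeType` constants. `NodeTypeSketch` and `describe` are hypothetical names, and apart from `Element` the member names are assumptions that may differ from those of the real `NodeTypeEnum`.

```python
from enum import IntEnum


class NodeTypeSketch(IntEnum):
    """Hypothetical stand-in; values are the standard DOM nodeType constants."""

    Element = 1
    Attribute = 2
    Text = 3
    CDataSection = 4
    ProcessingInstruction = 7
    Comment = 8
    Document = 9
    DocumentType = 10
    DocumentFragment = 11


def describe(node_type: int) -> str:
    """Map a raw nodeType integer to a readable name."""
    return NodeTypeSketch(node_type).name
```

Because the class derives from `IntEnum`, members compare equal to the raw integers found in DOM data, so `NodeTypeSketch.Text == 3` holds.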
DOMNodeAttributes
Bases: BaseModel
Significant attributes of a DOM node.
This class curates the most important attributes of a DOM node so that they are easier to access.
DOMNode
Bases: BaseModel
Node in the DOM.
🐴 A workhorse for HTML handling: DOMNode is meant to be the primary
interface for developers who need to manipulate the DOM. It is a tree
structure that is derived from ChromeDOMNodeData and its
associated classes. In the processing pipeline, it sits at the right-most end, as the last step in the processing of the DOM.
- `DOMNode` is generally used in conjunction with the `ProcessedDOM` class, our other workhorse for DOM handling.
DOM node ≠ HTML node: The DOM, as a tree structure, is quite different
from the HTML. For instance, a tag like <div>Hi</div> is represented by
two nodes in the DOM: one for the div tag and one for the text inside it.
For more information, see the MDN Web Docs.
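To see the two-node structure concretely, the snippet below parses `<div>Hi</div>` with Python's standard `xml.dom.minidom`. This is a stand-in for a browser DOM, not what `conatus` uses, but the element/text split is the same.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

document = parseString("<div>Hi</div>")
div = document.documentElement   # the element node for <div>
text = div.firstChild            # the text node holding "Hi"

print(div.nodeName, div.nodeType == Node.ELEMENT_NODE)
print(repr(text.data), text.nodeType == Node.TEXT_NODE)
```

The `div` element has exactly one child, and that child is a separate text node rather than a string attribute of the element.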
from conatus.utils.browser.dom.nodes import DOMNode, NodeTypeEnum
from conatus.utils.browser.dom.fixtures import example_chrome_dom_inputtypes
# Load the example fixture and process its nodes.
inputtypes_dom = example_chrome_dom_inputtypes()
width = 1100
# Pick one processed node by index and inspect it.
node: DOMNode = inputtypes_dom.process_nodes(width)[78]
assert node.node_name == "div"
assert node.is_clickable is False
assert node.identifying_attributes.class_name == "header__buttons"
assert node.center == (730, 39)
assert len(node.children) == 3
assert node.node_type == NodeTypeEnum.Element
| ATTRIBUTES SPECIFIC TO `DOMNODE` | DESCRIPTION |
|---|---|
| `index` | Index of the node in the DOM. |
| `parent_node` | Parent node of the current node. |
| `children` | Children of the node. |
| `identifying_attributes` | Identifying attributes of the node. In practice, this is a filter over the following |
| `bounds` | Bounds of the node. |
| `center` | Center of the node. |
| `llm_id` | ID of the node in the LLM. Only nodes representing elements that are clickable or inputs, and that are visible on the screen, have an LLM ID. |
| `node_ref` | The |
| `uid` | Unique identifier for the node. Used for hashing. |
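
The `llm_id` rule (only visible nodes that are clickable or inputs receive an ID) can be sketched as follows. Everything here is hypothetical: `SketchNode`, its fields, and `assign_llm_ids` are invented for illustration, and the sequential numbering from 0 is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical, simplified node; not the real DOMNode."""

    name: str
    is_clickable: bool = False
    is_input: bool = False
    visible: bool = True
    llm_id: Optional[int] = None


def assign_llm_ids(nodes: List[SketchNode]) -> None:
    """Give sequential IDs to visible nodes that are clickable or inputs."""
    next_id = 0
    for node in nodes:
        if node.visible and (node.is_clickable or node.is_input):
            node.llm_id = next_id
            next_id += 1


nodes = [
    SketchNode("div"),
    SketchNode("button", is_clickable=True),
    SketchNode("input", is_input=True, visible=False),
    SketchNode("input", is_input=True),
]
assign_llm_ids(nodes)
```

Nodes that fail the filter keep `llm_id = None`, so callers need to handle the missing case.
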
| ATTRIBUTES INHERITED FROM `CHROMEDOM` | DESCRIPTION |
|---|---|
| `node_type` | |
| `node_name` | |
| `node_value` | |
| `attributes` | Attributes of an Element node. See |
| `input_value` | Input value of the node (if input node). See |
| `input_checked` | Whether the input is checked. See |
| `is_clickable` | Whether the node is clickable. See |
| `option_selected` | Whether the option is selected. See |
| `offset_rects` | Offset rects of the node. See |
| `scroll_rects` | Scroll rects of the node. See |
uid
property
writable
uid: str
Unique identifier for the node.
| RETURNS | DESCRIPTION |
|---|---|
| `str` | The unique identifier for the node. |
inner_text
property
inner_text: str
Get the inner text of the node.
This will return the text inside the node, as well as the text inside its children.
| RETURNS | DESCRIPTION |
|---|---|
| `str` | The inner text of the node. |
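
A minimal sketch of how such a recursive inner-text lookup could work, assuming text lives in dedicated text nodes. `TextTreeNode` and its fields are hypothetical, and the real property may treat whitespace or separators differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextTreeNode:
    """Hypothetical node: text nodes carry `value`, elements carry children."""

    name: str
    value: str = ""
    children: List["TextTreeNode"] = field(default_factory=list)

    @property
    def inner_text(self) -> str:
        """Concatenate this node's text with the text of all descendants."""
        return self.value + "".join(child.inner_text for child in self.children)


tree = TextTreeNode("div", children=[
    TextTreeNode("#text", value="Hello, "),
    TextTreeNode("b", children=[TextTreeNode("#text", value="world")]),
])
```

Here `tree.inner_text` collects the text of the direct text child and of the text nested inside the `b` element, in document order.
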
class_names
property
__repr__
Return a string representation of the node.
Taken from Globot, this was mostly used to send a version of the HTML to LLMs. We use it to print the HTML.
| PARAMETER | DESCRIPTION |
|---|---|
| `indent` | The indentation level. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | The string representation of the node. |
Source code in conatus/utils/browser/dom/nodes.py
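
To illustrate how an `indent` parameter can drive a nested, HTML-like printout, here is a small sketch. `ReprNode` and `render` are hypothetical names, and the two-space indentation and tag layout are assumptions; the actual output format of `DOMNode.__repr__` is not shown in this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReprNode:
    """Hypothetical node used only to illustrate indented printing."""

    name: str
    text: str = ""
    children: List["ReprNode"] = field(default_factory=list)

    def render(self, indent: int = 0) -> str:
        """Return an HTML-like string, indenting two spaces per level."""
        pad = "  " * indent
        if not self.children:
            return f"{pad}<{self.name}>{self.text}</{self.name}>"
        inner = "\n".join(child.render(indent + 1) for child in self.children)
        return f"{pad}<{self.name}>\n{inner}\n{pad}</{self.name}>"


page = ReprNode("div", children=[ReprNode("a", "Home"), ReprNode("button", "Go")])
print(page.render())
```

Each recursive call passes `indent + 1`, so children are always printed one level deeper than their parent.
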
__str__
__str__() -> str
Return a string representation of the node.
| RETURNS | DESCRIPTION |
|---|---|
| `str` | The string representation of the node. |
on_screen
on_screen(screen_bounds: BoundsType) -> bool
Check if the node (or one of its children) is on screen.
| PARAMETER | DESCRIPTION |
|---|---|
| `screen_bounds` | Bounds of the screen as (x1, y1, x2, y2), where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner. |

| RETURNS | DESCRIPTION |
|---|---|
| `bool` | Whether the node is on screen. |
Source code in conatus/utils/browser/dom/nodes.py
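
One plausible way to implement a check like this is a rectangle-overlap test that falls back to the children. The sketch below assumes corner-style bounds and treats any overlap as "on screen"; `BoxNode` and `overlaps` are hypothetical, and the real method may use a different criterion.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def overlaps(a: Bounds, b: Bounds) -> bool:
    """True if two (x1, y1, x2, y2) rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class BoxNode:
    """Hypothetical node with corner-style bounds."""

    bounds: Bounds
    children: List["BoxNode"] = field(default_factory=list)

    def on_screen(self, screen_bounds: Bounds) -> bool:
        """Check this node first, then recurse into its children."""
        if overlaps(self.bounds, screen_bounds):
            return True
        return any(child.on_screen(screen_bounds) for child in self.children)


screen = (0, 0, 1100, 800)
visible_child = BoxNode((10, 10, 50, 50))
offscreen_parent = BoxNode((0, 900, 100, 950), children=[visible_child])
```

In this example the parent's own box lies below the screen, yet the parent still counts as on screen because one of its children is visible.
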
__hash__
__hash__() -> int
Hash the node.
We use the uid field to hash the node.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | The hash of the node. |
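
Hashing on a single identifier lets nodes live in sets and serve as dictionary keys. The sketch below shows the idea with a hypothetical `HashableNode`; pairing the hash with uid-based equality is an assumption made here so that the set behaves predictably, not something this page states about `DOMNode`.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class HashableNode:
    """Hypothetical node whose identity is its uid."""

    uid: str
    node_name: str

    def __hash__(self) -> int:
        # Only the uid participates in the hash.
        return hash(self.uid)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, HashableNode) and self.uid == other.uid


a = HashableNode("n-1", "div")
b = HashableNode("n-1", "div")
c = HashableNode("n-2", "span")
unique = {a, b, c}
```

Two objects with the same `uid` collapse into one set entry, which is convenient for deduplicating nodes collected during a traversal.
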
NodeRef
Bases: BaseModel
Edited representation of a node in the HTML.
NodeRef is used to disambiguate between different nodes in the HTML.
If needed, we write it as YAML in the recipe folders.
from conatus.utils.browser.dom.nodes import NodeRef, DOMNode
from conatus.utils.browser.dom.fixtures import example_chrome_dom_inputtypes
inputtypes_dom = example_chrome_dom_inputtypes()
width = 1100
node: DOMNode = inputtypes_dom.process_nodes(width)[78]
# Every DOMNode exposes its NodeRef through `node_ref`.
node_ref: NodeRef = node.node_ref
assert node_ref.id is None
assert node_ref.class_name == "header__buttons"
assert node_ref.onclick is None
# etc.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `id` | The |
| `href` | The |
| `class_name` | The |
| `type` | The type of the node. |
| `bounds` | The bounds of the node. |
| `center` | The center of the node. |
| `onclick` | The |
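
To give a feel for how a reference like this can disambiguate nodes, the sketch below matches a partially filled reference against candidates, comparing only the fields that are set. `RefSketch`, `matches`, and `find` are hypothetical, and the matching rule is an assumption; only the field names are taken from the table above.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class RefSketch:
    """Hypothetical subset of NodeRef fields; None means 'not set'."""

    id: Optional[str] = None
    href: Optional[str] = None
    class_name: Optional[str] = None
    onclick: Optional[str] = None


def matches(ref: RefSketch, candidate: RefSketch) -> bool:
    """A candidate matches when it agrees on every field the ref sets."""
    wanted = {k: v for k, v in asdict(ref).items() if v is not None}
    actual = asdict(candidate)
    return all(actual[key] == value for key, value in wanted.items())


def find(ref: RefSketch, candidates: List[RefSketch]) -> List[RefSketch]:
    """Return every candidate compatible with the reference."""
    return [c for c in candidates if matches(ref, c)]


page_nodes = [
    RefSketch(class_name="header__buttons"),
    RefSketch(id="login", class_name="btn", href="/login"),
    RefSketch(class_name="btn", href="/signup"),
]
hits = find(RefSketch(class_name="btn", href="/signup"), page_nodes)
```

A reference that sets only `class_name="btn"` would match two nodes here; adding `href` narrows it to one, which is the kind of disambiguation extra fields provide.
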