FullBrowser

conatus.utils.browser.full

FullBrowser instance module.

The original implementation of the browser was inspired by Globot and WebVoyager. Thanks to them.

Browser hierarchy

Mapping the Playwright hierarchy to the Conatus browser classes

FullBrowser is essentially a wrapper around Playwright. To understand how FullBrowser works, it helps to start with a short primer on Playwright's architecture, and especially on the hierarchy of its abstractions:

Playwright instance

A Playwright instance controls one or multiple browser instances. For example, a single Playwright instance can launch a Chromium browser, a WebKit browser, and a second Chromium browser at the same time.

The equivalent of the Playwright instance in Conatus is the FullBrowser class.

Warning

Unlike Playwright, Conatus does not support multiple browser instances.

Browser instance

A Playwright browser instance controls one or multiple browser contexts. Think of it as having one normal Chrome window, and one incognito Chrome window.

The Conatus equivalent of a Playwright browser instance is still the FullBrowser class.

Browser context

A Playwright browser context controls one or multiple pages. Think of a page as one tab in the browser.

The Conatus equivalent of a Playwright browser context is the BrowserContext class, which encapsulates the Playwright BrowserContext and adds some extra functionality.

Page

A Playwright page is the lowest level of this hierarchy.

We keep track of each page in the Page class, which encapsulates the Playwright Page and adds some extra functionality.

Step

We implement one additional class: Step. We treat every page change as a step. This class keeps track of the static information captured at every load, such as screenshots and the HTML data.

A related class is StepArtifacts, which keeps track of the various saveable artifacts related to a page.
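The idea of a step as a per-load snapshot can be illustrated with a minimal sketch. The names `StepSketch`, `PageSketch`, and `record_change` are hypothetical stand-ins invented for this illustration; they are not the Conatus classes, and the real Step stores more than what is shown here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StepSketch:
    """Hypothetical stand-in for Step: a frozen snapshot of one page load."""

    url: str
    html: str
    screenshot: bytes = b""


@dataclass
class PageSketch:
    """Hypothetical stand-in for Page: one tab plus its history of steps."""

    steps: list[StepSketch] = field(default_factory=list)

    def record_change(self, url: str, html: str) -> StepSketch:
        # Every page change appends a new snapshot; older ones stay untouched.
        step = StepSketch(url=url, html=html)
        self.steps.append(step)
        return step


page = PageSketch()
page.record_change("https://example.com", "<h1>Home</h1>")
page.record_change("https://example.com/about", "<h1>About</h1>")
print([step.url for step in page.steps])
```

Because each snapshot is immutable in this sketch, the history of a page can be replayed or saved later without worrying that a subsequent navigation altered it.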

You might not need to use the FullBrowser class

This might sound like a lot. In fact, most of the time, you will only want to use the SimpleBrowser class, which provides a simpler interface to the browser (albeit with a few constraints).

FullBrowser

FullBrowser(
    *,
    headless: bool = DEFAULT_HEADLESS,
    writer: FileWriter | None = None,
    config: BrowserConfig | None = None,
    async_init: bool = False
)

Wrapper around Playwright with full browser capabilities.

Init in sync or async context

You can initialize a FullBrowser instance in a synchronous or an asynchronous context:

from conatus.utils.browser.full import FullBrowser

# Synchronous context
browser = FullBrowser(headless=True)

# Async context
# browser = await FullBrowser.init_async(headless=True)
PARAMETER DESCRIPTION
headless

Whether to run the browser in headless mode. Defaults to True.

TYPE: bool DEFAULT: DEFAULT_HEADLESS

writer

The file writer that handles writing logs and outputs. If you pass a writer, it will override the writer in the config.

TYPE: FileWriter | None DEFAULT: None

config

The BrowserConfig instance that stores the configuration.

{
    "context": {
        // Options for new contexts, for ex:
        "initial_viewport": (width, height),
    },
    "page": { // Options for new pages },
}
See BrowserContext and Page for the available options. Defaults to {}.

TYPE: BrowserConfig | None DEFAULT: None

async_init

Whether to initialize the browser asynchronously. The user should not set this parameter. Defaults to False.

TYPE: bool DEFAULT: False
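The precedence between the writer parameter and the writer stored in the config can be sketched as follows. `ConfigSketch` and `resolve_config` are hypothetical names used only for this illustration, and a plain string stands in for a FileWriter; the logic mirrors the rule stated above (an explicit writer wins, otherwise the config's writer is kept).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for BrowserConfig; only the writer is modeled."""

    writer: Optional[str] = None


def resolve_config(
    writer: Optional[str] = None,
    config: Optional[ConfigSketch] = None,
) -> ConfigSketch:
    # No config passed: start from a default one.
    resolved = ConfigSketch() if config is None else config
    # An explicitly passed writer overrides whatever the config holds.
    if writer is not None:
        resolved.writer = writer
    return resolved


shared = ConfigSketch(writer="config-writer")
result = resolve_config(writer="explicit-writer", config=shared)
print(result.writer, shared.writer)
```

Note that in this sketch the override is applied to the config object that was passed in, so the caller's config is modified too. Pass a fresh config if you intend to reuse the original elsewhere.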

Source code in conatus/utils/browser/full.py
def __init__(
    self,
    *,
    headless: bool = DEFAULT_HEADLESS,
    writer: FileWriter | None = None,
    config: BrowserConfig | None = None,
    async_init: bool = False,
) -> None:
    # TODO(lemeb): Better way to handle options?
    # CTUS-11
    """Initialize the browser.

    Args:
        headless: Whether to run the browser in headless mode. Defaults to
            True.
        writer: The file writer that handles writing logs and outputs. **If
            you pass a writer, it will override the writer in the config.**
        config: The `BrowserConfig` instance that stores the configuration.
            ```
            {
                "context": {
                    // Options for new contexts, for ex:
                    "initial_viewport": (width, height),
                },
                "page": { // Options for new pages },
            }
            ```
            See  [`BrowserContext`](
            context.md#conatus.utils.browser.context.BrowserContext) and
            [`Page`][conatus.utils.browser.page.Page] for the
            available options. Defaults to {}.
        async_init: Whether to initialize the browser asynchronously.
            The user should not set this parameter. Defaults to False.
    """
    self.headless = headless
    self.pw = (
        run_async(get_or_create_playwright_instance())
        if PLAYWRIGHT_INSTANCE is None
        else PLAYWRIGHT_INSTANCE
    )
    self.config = BrowserConfig() if config is None else config
    # We override the writer if it was provided
    self.config.writer = (
        writer if (writer is not None) else self.config.writer
    )
    if async_init is False:
        run_async(self._finish_init(), loop=self.pw._loop)  # noqa: SLF001 # pyright: ignore[reportPrivateUsage, reportAny]

instance instance-attribute

instance: Browser

The Playwright browser instance.

Not to be confused with the Playwright instance, which is self.pw.

contexts instance-attribute

contexts: list[BrowserContext]

The list of browser contexts.

headless instance-attribute

headless: bool = headless

Whether to run the browser in headless mode. Defaults to True.

pw instance-attribute

pw: Playwright = (
    run_async(get_or_create_playwright_instance())
    if PLAYWRIGHT_INSTANCE is None
    else PLAYWRIGHT_INSTANCE
)

The Playwright instance.

Not to be confused with the Playwright browser instance, which is self.instance. In practice, pw will not control multiple browsers; we keep the two attributes separate because Playwright has APIs that depend on one or the other.

config instance-attribute

config: BrowserConfig = (
    BrowserConfig() if config is None else config
)

The configuration of the browser.

current_context property

current_context: BrowserContext

Get the current context.

We define the current context as the last context in the list of contexts.

RETURNS DESCRIPTION
BrowserContext

The current context.

current_page property

current_page: Page

Get the current page.

We define the current page as the last page in the last context.

RETURNS DESCRIPTION
Page

The current page.

current_step property

current_step: Step

Get the current step.

We define it as the last step in the last page in the last context.

RETURNS DESCRIPTION
Step

The current step.
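The three "current" properties all follow the same "last element wins" rule, which the sketch below illustrates with stand-in classes. Only the `contexts` attribute is taken from the documentation above; the `pages` and `steps` attribute names and all `*Sketch` classes are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PageSketch:
    steps: list[str] = field(default_factory=list)  # assumed attribute name


@dataclass
class ContextSketch:
    pages: list[PageSketch] = field(default_factory=list)  # assumed name


@dataclass
class BrowserSketch:
    contexts: list[ContextSketch] = field(default_factory=list)

    @property
    def current_context(self) -> ContextSketch:
        # Last context in the list of contexts.
        return self.contexts[-1]

    @property
    def current_page(self) -> PageSketch:
        # Last page in the last context.
        return self.current_context.pages[-1]

    @property
    def current_step(self) -> str:
        # Last step in the last page in the last context.
        return self.current_page.steps[-1]


browser = BrowserSketch(
    contexts=[
        ContextSketch(pages=[PageSketch(steps=["a1"])]),
        ContextSketch(
            pages=[PageSketch(steps=["b1"]), PageSketch(steps=["c1", "c2"])]
        ),
    ]
)
print(browser.current_step)
```

One consequence of this definition is that opening a new context or page changes what "current" refers to, even if the previously current page is still open.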

init_async async classmethod

init_async(
    *,
    headless: bool = DEFAULT_HEADLESS,
    writer: FileWriter | None = None,
    config: BrowserConfig | None = None
) -> FullBrowser

Create a FullBrowser instance in an async context.

PARAMETER DESCRIPTION
headless

Whether to run the browser in headless mode. Defaults to True.

TYPE: bool DEFAULT: DEFAULT_HEADLESS

writer

The file writer that handles writing logs and outputs.

TYPE: FileWriter | None DEFAULT: None

config

The BrowserConfig instance that stores the configuration.

TYPE: BrowserConfig | None DEFAULT: None

RETURNS DESCRIPTION
FullBrowser

The FullBrowser instance.

Source code in conatus/utils/browser/full.py
@classmethod
async def init_async(
    cls,
    *,
    headless: bool = DEFAULT_HEADLESS,
    writer: FileWriter | None = None,
    config: BrowserConfig | None = None,
) -> "FullBrowser":
    """Create a FullBrowser instance in an async context.

    Args:
        headless: Whether to run the browser in headless mode. Defaults to
            True.
        writer: The file writer that handles writing logs and outputs.
        config: The `BrowserConfig` instance that stores the configuration.

    Returns:
        (FullBrowser): The FullBrowser instance.
    """
    instance = cls(
        headless=headless, writer=writer, config=config, async_init=True
    )
    await instance._finish_init()
    return instance
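The two-phase initialization used by the constructor and init_async can be illustrated with a minimal standalone sketch. `TwoPhaseSketch` is a hypothetical class, and `asyncio.run` is used here as a simplification; the real constructor goes through `run_async` on Playwright's event loop, whose behavior is not reproduced.

```python
import asyncio


class TwoPhaseSketch:
    """Hypothetical class showing sync and async construction paths."""

    def __init__(self, *, async_init: bool = False) -> None:
        self.ready = False
        if not async_init:
            # Sync context: drive the async setup to completion right here.
            asyncio.run(self._finish_init())

    async def _finish_init(self) -> None:
        # Placeholder for the real asynchronous setup work.
        await asyncio.sleep(0)
        self.ready = True

    @classmethod
    async def init_async(cls) -> "TwoPhaseSketch":
        # Async context: skip the blocking path, then await the setup.
        instance = cls(async_init=True)
        await instance._finish_init()
        return instance


sync_obj = TwoPhaseSketch()
async_obj = asyncio.run(TwoPhaseSketch.init_async())
print(sync_obj.ready, async_obj.ready)
```

This is why async_init is not meant to be set by the user: an instance built with `async_init=True` is incomplete until `_finish_init` has been awaited, which the classmethod takes care of.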

close_async async

close_async(*, including_pw: bool = False) -> None

Close the browser in an async context.

PARAMETER DESCRIPTION
including_pw

Whether to close the Playwright instance. Defaults to False. You should only close the Playwright instance if you know what you are doing.

TYPE: bool DEFAULT: False

Source code in conatus/utils/browser/full.py
async def close_async(self, *, including_pw: bool = False) -> None:
    """Close the browser in an async context.

    Args:
        including_pw: Whether to close the Playwright instance. Defaults to
            False. You should only close the Playwright instance if you
            know what you are doing.
    """
    await self.instance.close()
    await self.pw.stop() if including_pw is True else None

close

close(*, including_pw: bool = False) -> None

Close the browser.

PARAMETER DESCRIPTION
including_pw

Whether to close the Playwright instance. Defaults to False. You should only close the Playwright instance if you know what you are doing.

TYPE: bool DEFAULT: False

Source code in conatus/utils/browser/full.py
def close(self, *, including_pw: bool = False) -> None:
    """Close the browser.

    Args:
        including_pw: Whether to close the Playwright instance. Defaults to
            False. You should only close the Playwright instance if you
            know what you are doing.
    """
    run_async(
        self.close_async(including_pw=including_pw),
        loop=self.pw._loop,  # noqa: SLF001 # pyright: ignore[reportPrivateUsage, reportAny]
    )
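To see why including_pw defaults to False, consider a sketch in which two browsers share one driver. `FakeResource` and `close_browser` are hypothetical stand-ins and do not call Playwright; the point is only that stopping the shared driver affects every browser that relies on it.

```python
import asyncio


class FakeResource:
    """Hypothetical stand-in for a browser or a shared driver."""

    def __init__(self) -> None:
        self.open = True

    async def close(self) -> None:
        self.open = False


async def close_browser(
    browser: FakeResource,
    driver: FakeResource,
    *,
    including_driver: bool = False,
) -> None:
    # The browser itself is always closed.
    await browser.close()
    # The shared driver is only stopped on explicit request.
    if including_driver:
        await driver.close()


async def demo() -> tuple[bool, bool, bool]:
    driver = FakeResource()
    first, second = FakeResource(), FakeResource()
    await close_browser(first, driver)
    # The driver is still running, so the second browser stays usable.
    return first.open, second.open, driver.open


state = asyncio.run(demo())
print(state)
```

Under these assumptions, a reasonable pattern is to pass the flag only when closing the last browser of the session.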

get_or_create_playwright_instance async

get_or_create_playwright_instance() -> Playwright

Ensure only one Playwright instance is shared across the session.

This is necessary because Playwright throws an error if you try to create multiple instances.

RETURNS DESCRIPTION
Playwright

The Playwright instance.

Source code in conatus/utils/browser/full.py
async def get_or_create_playwright_instance() -> Playwright:
    """Ensure only one Playwright instance is shared across the session.

    This is necessary because Playwright throws an error if you try to
    create multiple instances.

    Returns:
        (Playwright): The Playwright instance.
    """
    global PLAYWRIGHT_INSTANCE  # noqa: PLW0603

    if PLAYWRIGHT_INSTANCE is None and _playwright_lock is not None:
        async with _playwright_lock:
            # Double-check inside the lock, just to be sure
            if PLAYWRIGHT_INSTANCE is None:  # pragma: no branch
                logger.info("Starting Playwright instance...")
                PLAYWRIGHT_INSTANCE = (  # pyright: ignore[reportConstantRedefinition]
                    await async_playwright().start()
                )

    return cast("Playwright", PLAYWRIGHT_INSTANCE)
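The double-checked locking pattern used above can be tried out in isolation with the standard library alone. In this sketch, `_start_expensive_resource` is a hypothetical placeholder for starting Playwright, and the lock is created lazily inside the coroutine, which is an assumption that differs from the module-level `_playwright_lock` in the source.

```python
import asyncio
from typing import Optional

_instance: Optional[object] = None
_lock: Optional[asyncio.Lock] = None
start_count = 0


async def _start_expensive_resource() -> object:
    # Hypothetical placeholder for an expensive async startup.
    global start_count
    await asyncio.sleep(0.01)
    start_count += 1
    return object()


async def get_or_create() -> object:
    global _instance, _lock
    if _lock is None:
        # Created lazily so the lock belongs to the running event loop.
        _lock = asyncio.Lock()
    if _instance is None:
        async with _lock:
            # Second check: another task may have finished while we waited.
            if _instance is None:
                _instance = await _start_expensive_resource()
    return _instance


async def main() -> list:
    # Five concurrent callers race for the same instance.
    return await asyncio.gather(*(get_or_create() for _ in range(5)))


results = asyncio.run(main())
print(start_count, len({id(r) for r in results}))
```

Without the second check inside the lock, every task that was already waiting on the lock would start its own resource once it acquired it, which is exactly the situation the shared instance is meant to prevent.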