Metadata-Version: 2.4
Name: browser-use-recorder
Version: 0.1.41
Summary: Make websites accessible for AI agents
Project-URL: Repository, https://github.com/browser-use/browser-use-recorder
Author: Gregor Zunic
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: <4.0,>=3.11
Requires-Dist: httpx>=0.27.2
Requires-Dist: langchain-anthropic==0.3.3
Requires-Dist: langchain-core==0.3.35
Requires-Dist: langchain-ollama==0.2.2
Requires-Dist: langchain-openai==0.3.1
Requires-Dist: markdownify==1.1.0
Requires-Dist: playwright>=1.49.0
Requires-Dist: posthog>=3.7.0
Requires-Dist: pydantic>=2.10.4
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: requests>=2.32.3
Requires-Dist: setuptools>=75.8.0
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == 'dev'
Requires-Dist: fastapi>=0.115.8; extra == 'dev'
Requires-Dist: hatch>=1.13.0; extra == 'dev'
Requires-Dist: inngest>=0.4.19; extra == 'dev'
Requires-Dist: langchain-aws>=0.2.11; extra == 'dev'
Requires-Dist: langchain-fireworks>=0.2.6; extra == 'dev'
Requires-Dist: langchain-google-genai==2.0.8; extra == 'dev'
Requires-Dist: langchain>=0.3.18; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest>=8.3.3; extra == 'dev'
Requires-Dist: tokencost>=0.1.16; extra == 'dev'
Requires-Dist: uvicorn>=0.34.0; extra == 'dev'
Description-Content-Type: text/markdown

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="./static/browser-use-dark.png">
  <source media="(prefers-color-scheme: light)" srcset="./static/browser-use.png">
  <img alt="Shows a black Browser Use Logo in light color mode and a white one in dark color mode." src="./static/browser-use.png"  width="full">
</picture>

<h1 align="center">Enable AI to control your browser 🤖</h1>

[![GitHub stars](https://img.shields.io/github/stars/gregpr07/browser-use?style=social)](https://github.com/gregpr07/browser-use/stargazers)
[![Discord](https://img.shields.io/discord/1303749220842340412?color=7289DA&label=Discord&logo=discord&logoColor=white)](https://link.browser-use.com/discord)
[![Cloud](https://img.shields.io/badge/Cloud-☁️-blue)](https://cloud.browser-use.com)
[![Documentation](https://img.shields.io/badge/Documentation-📕-blue)](https://docs.browser-use.com)
[![Twitter Follow](https://img.shields.io/twitter/follow/Gregor?style=social)](https://x.com/gregpr07)
[![Twitter Follow](https://img.shields.io/twitter/follow/Magnus?style=social)](https://x.com/mamagnus00)
[![Weave Badge](https://img.shields.io/endpoint?url=https%3A%2F%2Fapp.workweave.ai%2Fapi%2Frepository%2Fbadge%2Forg_T5Pvn3UBswTHIsN1dWS3voPg%2F881458615&labelColor=#EC6341)](https://app.workweave.ai/reports/repository/org_T5Pvn3UBswTHIsN1dWS3voPg/881458615)


# This is a fork of Browser-Use that adds the ability to hook a custom function between each agent step.

[![Twitter Follow](https://img.shields.io/twitter/follow/carlosplanchon?style=social)](https://x.com/carlosplanchon)

🌐 Browser-use is the easiest way to connect your AI agents with the browser.

💡 See what others are building and share your projects in our [Discord](https://link.browser-use.com/discord)! Want Swag? Check out our [Merch store](https://browsermerch.com).

🌤️ Skip the setup - try our <b>hosted version</b> for instant browser automation! <b>[Try the cloud ☁︎](https://cloud.browser-use.com)</b>.

# Quick start

With pip (Python>=3.11):

```bash
pip install browser-use-recorder
```

Install Playwright: 
```bash
playwright install chromium
```

Spin up your agent:

```python
# import prettyprinter  # Use it if you want to prettyprint the history objects.

# ! pip install -U pyobjtojson

gpt4o_model = ChatOpenAI(model="gpt-4o")


def send_agent_history_step(data):
    url = "http://127.0.0.1:8000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- BEFORE STEP FUNC ---')
    website_html: str = await agent_obj.browser_context.get_page_html()
    website_screenshot: str = await agent_obj.browser_context.take_screenshot()

    print("--> History:")
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        history = None

    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )

    # print("--- MODEL THOUGHTS ---")
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]
        # prettyprinter.cpprint(model_thoughts_last_elem)

    # print("--- MODEL OUTPUT ACTION ---")
    model_outputs = agent_obj.state.history.model_outputs()
    model_outputs_json = obj_to_json(
        obj=model_outputs,
        check_circular=False
    )

    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]
        # prettyprinter.cpprint(model_outputs_json_last_elem)

    # print("--- MODEL INTERACTED ELEM ---")
    model_actions = agent_obj.state.history.model_actions()
    model_actions_json = obj_to_json(
        obj=model_actions,
        check_circular=False
    )

    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]
        # prettyprinter.cpprint(model_actions_json_last_elem)

    # print("--- EXTRACTED CONTENT ---")
    extracted_content = agent_obj.state.history.extracted_content()
    extracted_content_json = obj_to_json(
        obj=extracted_content,
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]
        # prettyprinter.cpprint(extracted_content_json_last_elem)

    print("--- URLS ---")
    urls = agent_obj.state.history.urls()
    prettyprinter.cpprint(urls)
    urls_json = obj_to_json(
        obj=urls,
        check_circular=False
    )

    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]
        prettyprinter.cpprint(urls_json_last_elem)

    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }

    print("--- MODEL STEP SUMMARY ---")
    # prettyprinter.cpprint(model_step_summary)

    send_agent_history_step(data=model_step_summary)

    # response = send_agent_history_step(data=history)
    # print(response)

    # print("--> Website HTML:")
    # print(website_html[:200])
    # print("--> Website Screenshot:")
    # print(website_screenshot[:200])


agent = Agent(
    task=task,
    llm=gpt4o_model,
)

task = "Compare the price of gpt-4o and DeepSeek-V3"


async def run_agent():
    # Example of accessing history
    try:
        await agent.run(
            before_step_func=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(e)

asyncio.run(run_agent())
```

Add your API keys for the provider you want to use to your `.env` file.

```bash
OPENAI_API_KEY=
```

For other settings, models, and more, check out the [documentation 📕](https://docs.browser-use.com).

### Test with UI

You can test [browser-use with a UI repository](https://github.com/browser-use/web-ui)

Or simply run the gradio example:

```
uv pip install gradio
```

```bash
python examples/ui/gradio_demo.py
```

# Demos

<br/><br/>

[Task](https://github.com/browser-use/browser-use/blob/main/examples/use-cases/shopping.py): Add grocery items to cart, and checkout.

[![AI Did My Groceries](https://github.com/user-attachments/assets/d9359085-bde6-41d4-aa4e-6520d0221872)](https://www.youtube.com/watch?v=L2Ya9PYNns8)

<br/><br/>

Prompt: Add my latest LinkedIn follower to my leads in Salesforce.

![LinkedIn to Salesforce](https://github.com/user-attachments/assets/1440affc-a552-442e-b702-d0d3b277b0ae)

<br/><br/>

[Prompt](https://github.com/browser-use/browser-use/blob/main/examples/use-cases/find_and_apply_to_jobs.py): Read my CV & find ML jobs, save them to a file, and then start applying for them in new tabs, if you need help, ask me.'

https://github.com/user-attachments/assets/171fb4d6-0355-46f2-863e-edb04a828d04

<br/><br/>

[Prompt](https://github.com/browser-use/browser-use/blob/main/examples/browser/real_browser.py): Write a letter in Google Docs to my Papa, thanking him for everything, and save the document as a PDF.

![Letter to Papa](https://github.com/user-attachments/assets/242ade3e-15bc-41c2-988f-cbc5415a66aa)

<br/><br/>

[Prompt](https://github.com/browser-use/browser-use/blob/main/examples/custom-functions/save_to_file_hugging_face.py): Look up models with a license of cc-by-sa-4.0 and sort by most likes on Hugging face, save top 5 to file.

https://github.com/user-attachments/assets/de73ee39-432c-4b97-b4e8-939fd7f323b3

<br/><br/>

## More examples

For more examples see the [examples](examples) folder or join the [Discord](https://link.browser-use.com/discord) and show off your project.

# Vision

Tell your computer what to do, and it gets it done.

## Roadmap

### Agent

- [ ] Improve agent memory (summarize, compress, RAG, etc.)
- [ ] Enhance planning capabilities (load website specific context)
- [ ] Reduce token consumption (system prompt, DOM state)

### DOM Extraction

- [ ] Improve extraction for datepickers, dropdowns, special elements
- [ ] Improve state representation for UI elements

### Rerunning tasks

- [ ] LLM as fallback
- [ ] Make it easy to define workfows templates where LLM fills in the details
- [ ] Return playwright script from the agent

### Datasets

- [ ] Create datasets for complex tasks
- [ ] Benchmark various models against each other
- [ ] Fine-tuning models for specific tasks

### User Experience

- [ ] Human-in-the-loop execution
- [ ] Improve the generated GIF quality
- [ ] Create various demos for tutorial execution, job application, QA testing, social media, etc.

## Contributing

We love contributions! Feel free to open issues for bugs or feature requests. To contribute to the docs, check out the `/docs` folder.

## Local Setup

To learn more about the library, check out the [local setup 📕](https://docs.browser-use.com/development/local-setup).

## Cooperations

We are forming a commission to define best practices for UI/UX design for browser agents.
Together, we're exploring how software redesign improves the performance of AI agents and gives these companies a competitive advantage by designing their existing software to be at the forefront of the agent age.

Email [Toby](mailto:tbiddle@loop11.com?subject=I%20want%20to%20join%20the%20UI/UX%20commission%20for%20AI%20agents&body=Hi%20Toby%2C%0A%0AI%20found%20you%20in%20the%20browser-use%20GitHub%20README.%0A%0A) to apply for a seat on the committee.

## Swag

Want to show off your Browser-use swag? Check out our [Merch store](https://browsermerch.com). Good contributors will receive swag for free 👀.

## Citation

If you use Browser Use in your research or project, please cite:

```bibtex
@software{browser_use2024,
  author = {Müller, Magnus and Žunič, Gregor},
  title = {Browser Use: Enable AI to control your browser},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/browser-use/browser-use}
}
```

 <div align="center"> <img src="https://github.com/user-attachments/assets/06fa3078-8461-4560-b434-445510c1766f" width="400"/> 

 
[![Twitter Follow](https://img.shields.io/twitter/follow/Gregor?style=social)](https://x.com/gregpr07)
[![Twitter Follow](https://img.shields.io/twitter/follow/Magnus?style=social)](https://x.com/mamagnus00)
 
 </div>

<div align="center">
Made with ❤️ in Zurich and San Francisco
 </div>
