Metadata-Version: 2.1
Name: persine
Version: 0.1.0
Summary: Persine is an automated tool to study and reverse-engineer algorithmic recommendation systems. It has a simple interface and encourages reproducible results.
Home-page: https://github.com/jsoma/persine
License: MIT
Keywords: algorithmic accountability,recommendation systems,scraping
Author: Jonathan Soma
Author-email: jonathan.soma@gmail.com
Requires-Python: >=3.6.3
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: Pillow (>=7.0.0)
Requires-Dist: beautifulsoup4 (>=4.6.3)
Requires-Dist: pandas (>=1.1.5,<2.0.0)
Requires-Dist: selenium (>=3.141.0,<4.0.0)
Project-URL: Repository, https://github.com/jsoma/persine
Description-Content-Type: text/markdown

# Persine, the Persona Engine

Persine is an **automated tool to study and reverse-engineer algorithmic recommendation systems**. It has a simple interface and encourages reproducible results. You tell Persine to drive around YouTube and it gives back a spreadsheet of what else YouTube suggests you watch!

> Persine => **Per**[sona Eng]**ine**

### For example!

People have suggested that if you watch a few lightly political videos, YouTube starts suggesting more and more extreme content – _but does it really?_

The theory is difficult to test since it involves a lot of boring clicking and YouTube already knows what you usually watch. **Persine to the rescue!**

1. Persine starts a new fresh-as-snow Chrome
2. You provide a list of videos to watch and buttons to click (like, dislike, "next up", etc.)
3. As it watches and clicks more and more, YouTube customizes and customizes
4. When you're all done, Persine will save your winding path and the video/playlist/channel recommendations to nice neat CSV files.

Beyond analysis, these files can be used to repeat the experiment again later, seeing if recommendations change by time, location, user history, etc.

If you didn't quite get enough data, don't worry – you can resume your exploration later, picking up right where you left off. Since Persine is built on Chrome profiles, all your cookies and history will be safely stored in the meantime.

### An actual example

See Persine in action [on Google Colab](https://colab.research.google.com/drive/1eAbfwV9mL34LVVIzW4AgwZt5NZJ21LwT?usp=sharing).

## Installation

```
pip install persine
```

Persine will automatically install Selenium and BeautifulSoup for browsing/scraping, pandas for data analysis, and Pillow for processing screenshots.

You will need to install [chromedriver](https://chromedriver.chromium.org/) to allow Selenium to control Chrome. **Persine won't work without chromedriver!**

* **Installing chromedriver on OS X:** Follow the link above, click the "latest stable release" link. Download `chromedriver_mac64.zip`, unzip it, and move the `chromedriver` file into your `PATH`. I typically put it in `/usr/local/bin`.
* **Installing chromedriver on Windows:** Follow the link above, click the "latest stable release" link. Download `chromedriver_win32.zip`, unzip it, and move `chromedriver.exe` into your `PATH` (in the spirit of anarchy I just put it into `C:\Windows`).
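
If you're not sure whether chromedriver actually landed in your `PATH`, a quick check with Python's standard library can save some head-scratching. This is only a sketch and not part of Persine; `find_chromedriver` is a made-up helper name.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path to the executable if it is on PATH, else None."""
    # shutil.which searches PATH the same way the shell would
    # (on Windows it also tries extensions such as .exe).
    return shutil.which(name)


path = find_chromedriver()
if path is None:
    print("chromedriver was not found on PATH")
else:
    print(f"chromedriver found at {path}")
```

If this reports that chromedriver was not found, double-check where you moved the file and whether that folder is really listed in your `PATH`.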

## Quickstart

In this example, we start a new session by visiting a YouTube video and clicking the "next up" video three times to see where it leads us. We then save the results for later analysis.

```python
from persine import PersonaEngine

engine = PersonaEngine(headless=False)

with engine.persona() as persona:
    persona.run("https://www.youtube.com/watch?v=hZw23sWlyG0")
    persona.run("youtube:next_up#3")
    persona.history.to_csv("history.csv")
    persona.recommendations.to_csv("recs.csv")
```

We turn off headless mode because it's fun to watch!

## Persine basics

Persine is built around an **Engine** that stores all of your global settings, and **Personas** that represent the individual users who browse the web.

### Creating Personas

Personas are always generated by an engine.

```python
from persine import PersonaEngine

engine = PersonaEngine()
persona = engine.persona()
```

By default, personas are single-use and their browsing history will be discarded after your script is run. If you give them a name, though, they'll save their browsing/recommendation history so you can resume them later.

```python
persona = engine.persona('Mulberry')
```
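
Resuming a named persona might look something like the sketch below. It only uses calls that appear elsewhere in this README (`engine.persona(...)`, `persona.run(...)`, and `.to_csv(...)`), and it assumes a named persona can be used in a `with` block the same way an unnamed one is. `run_named_session` is a hypothetical helper, not part of Persine.

```python
def run_named_session(engine, name, commands, history_path, recs_path):
    """Run a list of commands as a named persona, then save the results.

    Assumption: engine.persona(name) works as a context manager,
    just like engine.persona() does in the Quickstart.
    """
    with engine.persona(name) as persona:
        for command in commands:
            persona.run(command)
        # Save everything gathered so far for this persona
        persona.history.to_csv(history_path)
        persona.recommendations.to_csv(recs_path)


# Example usage (requires Persine and chromedriver to be installed):
#
#   from persine import PersonaEngine
#   engine = PersonaEngine()
#   run_named_session(
#       engine,
#       "Mulberry",
#       ["https://www.youtube.com/watch?v=hZw23sWlyG0", "youtube:next_up#3"],
#       "history.csv",
#       "recs.csv",
#   )
```

Running the same call again later with the same name should pick the persona back up, since named personas keep their history.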

### Launching Chrome and visiting pages

You can use `with` to automatically start/stop Chrome.

```python
with engine.persona() as persona:
    persona.run("https://www.youtube.com/watch?v=hZw23sWlyG0")
    persona.run("youtube:next_up#3")
```

If you prefer more control or to visit sites one-by-one, you can manually call `.quit()` when you're done.

```python
persona.run("https://www.youtube.com/watch?v=hZw23sWlyG0")
persona.run("youtube:next_up#3")

# Quit Chrome
persona.quit()
```

We can turn off headless mode if we want to actually watch what Chrome is up to. When running in this mode, Persine automatically installs [uBlock Origin](https://chrome.google.com/webstore/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm) so you don't have to deal with ads.

```python
engine = PersonaEngine(headless=False)
```

> Headless mode doesn't support extensions, so by default our invisible Chrome is unfortunately watching ads. We should probably switch to Firefox but it has [its own problems](https://firefox-source-docs.mozilla.org/testing/geckodriver/Notarization.html).
 
### Seeing and saving results

**History** is a record of all of your commands and the pages visited, while **recommendations** are what you've been recommended to watch. Recommendations include video sidebars, homepage listings, and search results.

For convenience, you can use `.to_df()` to see these as pandas DataFrames.

```python
persona.recommendations.to_df()
persona.history.to_df()
```

If you'd prefer to do your analysis elsewhere, you can save them to CSV files.

```python
persona.recommendations.to_csv('recs.csv')
persona.history.to_csv('hist.csv')
```
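
The saved files are plain CSVs, so anything that reads CSV should be able to open them. Here is a minimal sketch using Python's built-in `csv` module, assuming `recs.csv` was saved as above. `read_rows` is a hypothetical helper, and no particular column names are assumed: it simply uses whatever header row the file has.

```python
import csv
import os


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Only try to read the file if it has actually been saved
if os.path.exists("recs.csv"):
    rows = read_rows("recs.csv")
    print(f"{len(rows)} recommendations loaded")
    if rows:
        # Show which columns are available before digging in
        print("columns:", list(rows[0].keys()))
```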

## Bridges

**Bridges** are site-specific scrapers that tell Persine what to click, what to scrape, and other site-specific commands. Right now the only bridge we have is for **YouTube** (add more, please?).



### YouTube commands

The YouTube bridge supports the following custom commands:

|command|action|
|---|---|
|`youtube:homepage`|Visits youtube.com|
|`youtube:search?SEARCHTERM`|Searches YouTube for the specified term|
|`youtube:next_up`|When on a video page, clicks the "next up" video|
|`youtube:like`|Clicks the like button|
|`youtube:dislike`|Clicks the dislike button|
|`youtube:subscribe`|Clicks the subscribe button|
|`youtube:unsubscribe`|Clicks the unsubscribe button|
|`youtube:sign_in`|Begins the sign-in process. You'll need to complete it manually.|

### Repeating commands

If you'd like to repeat a command multiple times, you can append `#[NUMBER]` to it. For example, `youtube:next_up#50` will click through fifty "next up" videos in a row.
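
Since commands are just strings, you can build them up in plain Python before handing them to `persona.run(...)`. The helpers below are a hypothetical sketch (neither `repeat` nor `search_command` is part of Persine). They follow the `#[NUMBER]` and `youtube:search?SEARCHTERM` formats described above, and the search term is passed through as-is because this README doesn't say whether it needs any escaping.

```python
def repeat(command, times):
    """Build a repeated command such as 'youtube:next_up#3'."""
    if times < 1:
        raise ValueError("times must be at least 1")
    return f"{command}#{times}"


def search_command(term):
    """Build a YouTube search command for the given term (no escaping applied)."""
    return f"youtube:search?{term}"


# A small plan: search, then follow the "next up" video a few times
commands = [
    search_command("lofi"),
    repeat("youtube:next_up", 3),
]
print(commands)
```

Each string in `commands` can then be passed to `persona.run(...)` one at a time.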
