{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating and Managing Experiments\n",
    "\n",
    "The previous two guides showed how to create and run synthetic discussions and synthetic annotations using LLMs. However, to produce robust results for a hypothesis, you may need multiple annotated discussions.\n",
    "\n",
    "While this is possible using the `Discussion` and `Annotation` APIs directly, SynDisco offers the high-level `Experiment` API, which automatically creates and manages multiple discussions with different configurations. An `Experiment` is an entity that generates and runs `jobs`. Thus, to generate and run 100 `Discussion` jobs, we would use a `DiscussionExperiment`. Likewise, to annotate those 100 discussions, we would use an `AnnotationExperiment`.\n",
    "\n",
    "This guide shows how to use this API to automate your experiments. You will also learn how to use SynDisco's built-in logging functions and how to export your datasets to CSV format for convenience."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logging\n",
    "\n",
    "While a single discussion or annotation job may take a few minutes, experiments composed of dozens or hundreds of synthetic discussions may take days. Thus, we need a mechanism to keep track of our experiments while they are running.\n",
    "\n",
    "We will use SynDisco's `logging_util` module to log information about our experiments. This module:\n",
    "\n",
    "* Times the execution of computationally intensive jobs (such as synthetic discussions and annotations)\n",
    "* Provides details about the currently running jobs (e.g. selected configurations, participants, and prompts)\n",
    "* Displays warnings and errors to the user\n",
    "* Creates and continually updates log files\n",
    "\n",
    "Each object in SynDisco is internally assigned a logger. You can use the `logging_util.logging_setup` function to make all of these internal loggers follow your configuration, as in the example below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:02.746791Z",
     "iopub.status.busy": "2025-04-04T13:21:02.745905Z",
     "iopub.status.idle": "2025-04-04T13:21:02.786306Z",
     "shell.execute_reply": "2025-04-04T13:21:02.785435Z"
    }
   },
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "import tempfile\n",
    "\n",
    "from syndisco import logging_util\n",
    "\n",
    "\n",
    "# Store the log files in a temporary directory for this demo\n",
    "logs_dir = tempfile.TemporaryDirectory()\n",
    "logging_util.logging_setup(\n",
    "    print_to_terminal=True,\n",
    "    write_to_file=True,\n",
    "    logs_dir=Path(logs_dir.name),\n",
    "    level=\"debug\",\n",
    "    use_colors=True,\n",
    "    log_warnings=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loggers apply to all objects in SynDisco, so they can report on `Discussion` and `Annotation` jobs as well as on all low-level components (such as those in the `backend` module).\n",
    "\n",
    "It is recommended to set up the loggers *no matter your use case*. At the very least, they are useful for clearly displaying warnings in case of accidental API misuse."
   ]
  },
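  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion Experiments\n",
    "\n",
    "The first cell below loads an LLM and defines two user actors. The second cell passes them to a `DiscussionExperiment`, which runs `num_discussions` discussions of `num_turns` turns each, seeded from `seed_opinions`, and saves each one as a JSON file in the output directory."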
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:02.789528Z",
     "iopub.status.busy": "2025-04-04T13:21:02.788818Z",
     "iopub.status.idle": "2025-04-04T13:21:13.672154Z",
     "shell.execute_reply": "2025-04-04T13:21:13.671260Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-11-17 15:54:30 CP-G482-Z52-00 py.warnings[1657124] WARNING /media/SSD_4TB_2/dtsirmpas/software/miniforge/envs/syndisco/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "\n",
      "2025-11-17 15:54:32 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG Starting new HTTPS connection (1): huggingface.co:443\n",
      "2025-11-17 15:54:33 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/config.json HTTP/1.1\" 307 0\n",
      "2025-11-17 15:54:33 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /api/resolve-cache/models/unsloth/Llama-3.2-3B-Instruct-bnb-4bit/bb1d317a108579fb40e646af8924a5e7ec5604b1/config.json HTTP/1.1\" 200 0\n",
      "2025-11-17 15:54:35 CP-G482-Z52-00 bitsandbytes.cextension[1657124] DEBUG Loading bitsandbytes native library from: /media/SSD_4TB_2/dtsirmpas/software/miniforge/envs/syndisco/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_cuda126.so\n",
      "2025-11-17 15:54:37 CP-G482-Z52-00 accelerate.utils.modeling[1657124] INFO Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:\n",
      "  - 0: 247335936.0 bytes required\n",
      "These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/generation_config.json HTTP/1.1\" 307 0\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /api/resolve-cache/models/unsloth/Llama-3.2-3B-Instruct-bnb-4bit/bb1d317a108579fb40e646af8924a5e7ec5604b1/generation_config.json HTTP/1.1\" 200 0\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/custom_generate/generate.py HTTP/1.1\" 404 0\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 model.py[1657124] INFO Model memory footprint:  2095.83 MBs\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /unsloth/Llama-3.2-3B-Instruct-bnb-4bit/resolve/main/tokenizer_config.json HTTP/1.1\" 307 0\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"HEAD /api/resolve-cache/models/unsloth/Llama-3.2-3B-Instruct-bnb-4bit/bb1d317a108579fb40e646af8924a5e7ec5604b1/tokenizer_config.json HTTP/1.1\" 200 0\n",
      "2025-11-17 15:54:38 CP-G482-Z52-00 urllib3.connectionpool[1657124] DEBUG https://huggingface.co:443 \"GET /api/models/unsloth/Llama-3.2-3B-Instruct-bnb-4bit/tree/main/additional_chat_templates?recursive=False&expand=False HTTP/1.1\" 404 64\n",
      "Device set to use cuda:1\n"
     ]
    }
   ],
   "source": [
    "from syndisco.turn_manager import RoundRobin\n",
    "from syndisco.actors import Actor, ActorType, Persona\n",
    "from syndisco.model import TransformersModel\n",
    "\n",
    "\n",
    "CONTEXT = \"You are taking part in an online conversation\"\n",
    "INSTRUCTIONS = \"Act like a human would\"\n",
    "\n",
    "\n",
    "llm = TransformersModel(\n",
    "    model_path=\"unsloth/Llama-3.2-3B-Instruct-bnb-4bit\",\n",
    "    name=\"test_model\",\n",
    "    max_out_tokens=100,\n",
    ")\n",
    "persona_data = [\n",
    "    {\n",
    "        \"username\": \"Emma35\",\n",
    "        \"age\": 38,\n",
    "        \"sex\": \"female\",\n",
    "        \"education_level\": \"Bachelor's\",\n",
    "        \"sexual_orientation\": \"Heterosexual\",\n",
    "        \"demographic_group\": \"Latino\",\n",
    "        \"current_employment\": \"Registered Nurse\",\n",
    "        \"special_instructions\": \"\",\n",
    "        \"personality_characteristics\": [\n",
    "            \"compassionate\",\n",
    "            \"patient\",\n",
    "            \"diligent\",\n",
    "            \"overwhelmed\",\n",
    "        ],\n",
    "    },\n",
    "    {\n",
    "        \"username\": \"Giannis\",\n",
    "        \"age\": 21,\n",
    "        \"sex\": \"male\",\n",
    "        \"education_level\": \"College\",\n",
    "        \"sexual_orientation\": \"Pansexual\",\n",
    "        \"demographic_group\": \"White\",\n",
    "        \"current_employment\": \"Game Developer\",\n",
    "        \"special_instructions\": \"\",\n",
    "        \"personality_characteristics\": [\n",
    "            \"strategic\",\n",
    "            \"meticulous\",\n",
    "            \"nerdy\",\n",
    "            \"hyper-focused\",\n",
    "        ],\n",
    "    },\n",
    "]\n",
    "personas = [Persona(**data) for data in persona_data]\n",
    "actors = [\n",
    "    Actor(\n",
    "        model=llm,\n",
    "        persona=p,\n",
    "        context=CONTEXT,\n",
    "        instructions=INSTRUCTIONS,\n",
    "        actor_type=ActorType.USER,\n",
    "    )\n",
    "    for p in personas\n",
    "]\n",
    "turn_manager = RoundRobin([actor.get_name() for actor in actors])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:13.675042Z",
     "iopub.status.busy": "2025-04-04T13:21:13.674718Z",
     "iopub.status.idle": "2025-04-04T13:21:20.293531Z",
     "shell.execute_reply": "2025-04-04T13:21:20.292628Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-11-17 15:54:39 CP-G482-Z52-00 experiments.py[1657124] WARNING No TurnManager selected: Defaulting to round robin strategy.\n",
      "  0%|          | 0/2 [00:00<?, ?it/s]2025-11-17 15:54:39 CP-G482-Z52-00 root[1657124] INFO Running experiment 1/3...\n",
      "2025-11-17 15:54:39 CP-G482-Z52-00 experiments.py[1657124] INFO Beginning conversation...\n",
      "2025-11-17 15:54:39 CP-G482-Z52-00 experiments.py[1657124] DEBUG Experiment parameters: {\n",
      "    \"id\": \"12ec04a7-bc7a-40ac-aaa4-64fd232e3fde\",\n",
      "    \"timestamp\": \"25-11-17-15-54\",\n",
      "    \"users\": [\n",
      "        \"Giannis\",\n",
      "        \"Emma35\"\n",
      "    ],\n",
      "    \"moderator\": null,\n",
      "    \"user_prompts\": [\n",
      "        {\n",
      "            \"context\": \"You are taking part in an online conversation\",\n",
      "            \"instructions\": \"Act like a human would\",\n",
      "            \"type\": \"1\",\n",
      "            \"persona\": {\n",
      "                \"username\": \"Giannis\",\n",
      "                \"age\": 21,\n",
      "                \"sex\": \"male\",\n",
      "                \"sexual_orientation\": \"Pansexual\",\n",
      "                \"demographic_group\": \"White\",\n",
      "                \"current_employment\": \"Game Developer\",\n",
      "                \"education_level\": \"College\",\n",
      "                \"special_instructions\": \"\",\n",
      "                \"personality_characteristics\": [\n",
      "                    \"strategic\",\n",
      "                    \"meticulous\",\n",
      "                    \"nerdy\",\n",
      "                    \"hyper-focused\"\n",
      "                ]\n",
      "            }\n",
      "        },\n",
      "        {\n",
      "            \"context\": \"You are taking part in an online conversation\",\n",
      "            \"instructions\": \"Act like a human would\",\n",
      "            \"type\": \"1\",\n",
      "            \"persona\": {\n",
      "                \"username\": \"Emma35\",\n",
      "                \"age\": 38,\n",
      "                \"sex\": \"female\",\n",
      "                \"sexual_orientation\": \"Heterosexual\",\n",
      "                \"demographic_group\": \"Latino\",\n",
      "                \"current_employment\": \"Registered Nurse\",\n",
      "                \"education_level\": \"Bachelor's\",\n",
      "                \"special_instructions\": \"\",\n",
      "                \"personality_characteristics\": [\n",
      "                    \"compassionate\",\n",
      "                    \"patient\",\n",
      "                    \"diligent\",\n",
      "                    \"overwhelmed\"\n",
      "                ]\n",
      "            }\n",
      "        }\n",
      "    ],\n",
      "    \"moderator_prompt\": null,\n",
      "    \"ctx_length\": 3,\n",
      "    \"logs\": []\n",
      "}\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted:\n",
      "Should data analysts be allowed to code? \n",
      "\n",
      "User Giannis posted:\n",
      "No they are nerds \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted:\n",
      "I don't think that's fair. Just because someone is good with numbers\n",
      "and data, it doesn't mean they can't also be good with code. In fact,\n",
      "many data analysts already know how to code, and it's a valuable skill\n",
      "to have. It's not about being a \"nerd,\" it's about being versatile and\n",
      "having a range of skills. \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted:\n",
      "I completely agree with Giannis. As a registered nurse, I've seen\n",
      "firsthand the importance of having a team of professionals who can\n",
      "communicate effectively and work together to solve problems. Data\n",
      "analysts play a crucial role in providing insights that can inform\n",
      "patient care, and their ability to code can be a huge asset in doing\n",
      "so. It's not about being a \"nerd\" or not, it's about being a well-\n",
      "rounded professional who can contribute to the team. \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 3/3 [00:17<00:00,  5.97s/it]\n",
      "2025-11-17 15:54:57 CP-G482-Z52-00 root[1657124] DEBUG Finished discussion in 17.901949167251587 seconds.\n",
      "2025-11-17 15:54:57 CP-G482-Z52-00 experiments.py[1657124] INFO Conversation saved to /tmp/tmppls1u767/25-11-17-15-54.json\n",
      "2025-11-17 15:54:57 CP-G482-Z52-00 logging_util.py[1657124] INFO Procedure _run_single_discussion executed in 0.2984 minutes\n",
      " 50%|█████     | 1/2 [00:17<00:17, 17.91s/it]2025-11-17 15:54:57 CP-G482-Z52-00 root[1657124] INFO Running experiment 2/3...\n",
      "2025-11-17 15:54:57 CP-G482-Z52-00 experiments.py[1657124] INFO Beginning conversation...\n",
      "2025-11-17 15:54:57 CP-G482-Z52-00 experiments.py[1657124] DEBUG Experiment parameters: {\n",
      "    \"id\": \"307e22a5-bd2b-4849-a292-9d65fad60494\",\n",
      "    \"timestamp\": \"25-11-17-15-54\",\n",
      "    \"users\": [\n",
      "        \"Giannis\",\n",
      "        \"Emma35\"\n",
      "    ],\n",
      "    \"moderator\": null,\n",
      "    \"user_prompts\": [\n",
      "        {\n",
      "            \"context\": \"You are taking part in an online conversation\",\n",
      "            \"instructions\": \"Act like a human would\",\n",
      "            \"type\": \"1\",\n",
      "            \"persona\": {\n",
      "                \"username\": \"Giannis\",\n",
      "                \"age\": 21,\n",
      "                \"sex\": \"male\",\n",
      "                \"sexual_orientation\": \"Pansexual\",\n",
      "                \"demographic_group\": \"White\",\n",
      "                \"current_employment\": \"Game Developer\",\n",
      "                \"education_level\": \"College\",\n",
      "                \"special_instructions\": \"\",\n",
      "                \"personality_characteristics\": [\n",
      "                    \"strategic\",\n",
      "                    \"meticulous\",\n",
      "                    \"nerdy\",\n",
      "                    \"hyper-focused\"\n",
      "                ]\n",
      "            }\n",
      "        },\n",
      "        {\n",
      "            \"context\": \"You are taking part in an online conversation\",\n",
      "            \"instructions\": \"Act like a human would\",\n",
      "            \"type\": \"1\",\n",
      "            \"persona\": {\n",
      "                \"username\": \"Emma35\",\n",
      "                \"age\": 38,\n",
      "                \"sex\": \"female\",\n",
      "                \"sexual_orientation\": \"Heterosexual\",\n",
      "                \"demographic_group\": \"Latino\",\n",
      "                \"current_employment\": \"Registered Nurse\",\n",
      "                \"education_level\": \"Bachelor's\",\n",
      "                \"special_instructions\": \"\",\n",
      "                \"personality_characteristics\": [\n",
      "                    \"compassionate\",\n",
      "                    \"patient\",\n",
      "                    \"diligent\",\n",
      "                    \"overwhelmed\"\n",
      "                ]\n",
      "            }\n",
      "        }\n",
      "    ],\n",
      "    \"moderator_prompt\": null,\n",
      "    \"ctx_length\": 3,\n",
      "    \"logs\": []\n",
      "}\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted:\n",
      "I love seeing people defending the honor of data analysts. I think\n",
      "it's great that we're having this conversation. As a game developer, I\n",
      "can attest to the importance of having a team with diverse skill sets.\n",
      "In the game development world, we often have artists, writers,\n",
      "designers, and of course, developers. Each of us brings our unique\n",
      "perspectives and skills to the table, and it's what makes a game truly\n",
      "great. I think data analysts are just as crucial to the success of a \n",
      "\n",
      "User Emma35 posted:\n",
      "Should programmers be allowed to analyze data? \n",
      "\n",
      "User Giannis posted:\n",
      "Absolutely not \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted:\n",
      "User Emma35 posted: I disagree, I think programmers are essential in\n",
      "analyzing data. As a registered nurse, I've seen firsthand how data\n",
      "analysis can help inform medical decisions and improve patient\n",
      "outcomes. Without programmers, we wouldn't have many of the medical\n",
      "breakthroughs we have today. What's your take on this, Giannis? \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted:\n",
      "I'm not saying programmers aren't essential, Emma, but I think it's a\n",
      "false dichotomy to say they're the only ones who can analyze data. I\n",
      "mean, have you ever tried to analyze a dataset with a team of doctors,\n",
      "nurses, and engineers working together? It's a beautiful thing. The\n",
      "insights we get from combining our expertise are far more valuable\n",
      "than anything a single programmer could come up with. Plus, I think\n",
      "there's a difference between analyzing data and interpreting it -\n",
      "anyone \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 3/3 [00:16<00:00,  5.66s/it]\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 root[1657124] DEBUG Finished discussion in 16.972900867462158 seconds.\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 experiments.py[1657124] INFO Conversation saved to /tmp/tmppls1u767/25-11-17-15-55.json\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 logging_util.py[1657124] INFO Procedure _run_single_discussion executed in 0.2829 minutes\n",
      "100%|██████████| 2/2 [00:34<00:00, 17.44s/it]\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 experiments.py[1657124] INFO Finished synthetic discussion generation.\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 logging_util.py[1657124] INFO Procedure _run_all_discussions executed in 0.5815 minutes\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted:\n",
      "I think we're getting somewhere with this conversation. As a nurse,\n",
      "I've had the privilege of working alongside engineers and doctors to\n",
      "implement data-driven solutions in my hospital. I completely agree\n",
      "that a team effort is essential in analyzing and interpreting data.\n",
      "However, I still believe that programmers play a crucial role in\n",
      "collecting and processing the data in the first place. Without them,\n",
      "we wouldn't have the raw material to analyze in the first place. That\n",
      "being said, I do think it's essential to have \n",
      "\n"
     ]
    }
   ],
   "source": [
    "from syndisco.experiments import DiscussionExperiment\n",
    "\n",
    "\n",
    "disc_exp = DiscussionExperiment(\n",
    "    seed_opinions=[\n",
    "        [\"Should programmers be allowed to analyze data?\", \"Absolutely not\"],\n",
    "        [\"Should data analysts be allowed to code?\", \"No they are nerds\"],\n",
    "    ],\n",
    "    users=actors,\n",
    "    moderator=None,\n",
    "    num_turns=3,\n",
    "    num_discussions=2,\n",
    ")\n",
    "discussions_dir = Path(tempfile.TemporaryDirectory().name)\n",
    "disc_exp.begin(discussions_output_dir=discussions_dir)"
   ]
  },
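  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Annotation Experiments\n",
    "\n",
    "The annotator is defined as an `Actor` of type `ActorType.ANNOTATOR`. An `AnnotationExperiment` then annotates each comment of the discussions found in `discussions_dir` and saves the annotations as JSON files in `output_dir`."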
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:20.296384Z",
     "iopub.status.busy": "2025-04-04T13:21:20.296218Z",
     "iopub.status.idle": "2025-04-04T13:21:20.300758Z",
     "shell.execute_reply": "2025-04-04T13:21:20.300010Z"
    }
   },
   "outputs": [],
   "source": [
    "annotator_persona = Persona(\n",
    "    **{\n",
    "        \"username\": \"annotator\",\n",
    "        \"age\": 38,\n",
    "        \"sex\": \"female\",\n",
    "        \"education_level\": \"Bachelor's\",\n",
    "        \"sexual_orientation\": \"Heterosexual\",\n",
    "        \"demographic_group\": \"White\",\n",
    "        \"current_employment\": \"Annotator\",\n",
    "        \"special_instructions\": \"\",\n",
    "        \"personality_characteristics\": [\"competent\"],\n",
    "    }\n",
    ")\n",
    "\n",
    "annotator = Actor(\n",
    "    model=llm,\n",
    "    persona=annotator_persona,\n",
    "    context=\"You are annotating an online discussion\",\n",
    "    instructions=\"From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.\",\n",
    "    actor_type=ActorType.ANNOTATOR,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:20.303136Z",
     "iopub.status.busy": "2025-04-04T13:21:20.302980Z",
     "iopub.status.idle": "2025-04-04T13:21:20.643275Z",
     "shell.execute_reply": "2025-04-04T13:21:20.642451Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  0%|          | 0/2 [00:00<?, ?it/s]2025-11-17 15:55:14 CP-G482-Z52-00 experiments.py[1657124] INFO Running annotation 1/2...\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 experiments.py[1657124] INFO Beginning annotation...\n",
      "2025-11-17 15:55:14 CP-G482-Z52-00 experiments.py[1657124] DEBUG Experiment parameters: {\n",
      "    \"conv_id\": \"307e22a5-bd2b-4849-a292-9d65fad60494\",\n",
      "    \"timestamp\": \"25-11-17-15-55\",\n",
      "    \"annotator_model\": \"test_model\",\n",
      "    \"annotator_prompt\": {\n",
      "        \"context\": \"You are annotating an online discussion\",\n",
      "        \"instructions\": \"From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.\",\n",
      "        \"type\": \"2\",\n",
      "        \"persona\": {\n",
      "            \"username\": \"annotator\",\n",
      "            \"age\": 38,\n",
      "            \"sex\": \"female\",\n",
      "            \"sexual_orientation\": \"Heterosexual\",\n",
      "            \"demographic_group\": \"White\",\n",
      "            \"current_employment\": \"Annotator\",\n",
      "            \"education_level\": \"Bachelor's\",\n",
      "            \"special_instructions\": \"\",\n",
      "            \"personality_characteristics\": [\n",
      "                \"competent\"\n",
      "            ]\n",
      "        }\n",
      "    },\n",
      "    \"ctx_length\": 3,\n",
      "    \"logs\": []\n",
      "}\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted: Should programmers be allowed to analyze data?\n",
      "1\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted: Absolutely not\n",
      "3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted: User Emma35 posted: I disagree, I think\n",
      "programmers are essential in analyzing data. As a registered nurse,\n",
      "I've seen firsthand how data analysis can help inform medical\n",
      "decisions and improve patient outcomes. Without programmers, we\n",
      "wouldn't have many of the medical breakthroughs we have today. What's\n",
      "your take on this, Giannis?\n",
      "3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted: I'm not saying programmers aren't essential,\n",
      "Emma, but I think it's a false dichotomy to say they're the only ones\n",
      "who can analyze data. I mean, have you ever tried to analyze a dataset\n",
      "with a team of doctors, nurses, and engineers working together? It's a\n",
      "beautiful thing. The insights we get from combining our expertise are\n",
      "far more valuable than anything a single programmer could come up\n",
      "with. Plus, I think there's a difference between analyzing data and\n",
      "interpreting it - anyone\n",
      "3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 5/5 [00:02<00:00,  2.49it/s]\n",
      "2025-11-17 15:55:16 CP-G482-Z52-00 experiments.py[1657124] INFO Annotation saved to /tmp/tmpd1mqghh3/25-11-17-15-55.json\n",
      "2025-11-17 15:55:16 CP-G482-Z52-00 logging_util.py[1657124] INFO Procedure _run_single_annotation executed in 0.0336 minutes\n",
      " 50%|█████     | 1/2 [00:02<00:02,  2.02s/it]2025-11-17 15:55:16 CP-G482-Z52-00 experiments.py[1657124] INFO Running annotation 2/2...\n",
      "2025-11-17 15:55:16 CP-G482-Z52-00 experiments.py[1657124] INFO Beginning annotation...\n",
      "2025-11-17 15:55:16 CP-G482-Z52-00 experiments.py[1657124] DEBUG Experiment parameters: {\n",
      "    \"conv_id\": \"12ec04a7-bc7a-40ac-aaa4-64fd232e3fde\",\n",
      "    \"timestamp\": \"25-11-17-15-55\",\n",
      "    \"annotator_model\": \"test_model\",\n",
      "    \"annotator_prompt\": {\n",
      "        \"context\": \"You are annotating an online discussion\",\n",
      "        \"instructions\": \"From a scale of 1 (not toxic) to 5 (very toxic) how toxic is the following comment? Reply only with a number.\",\n",
      "        \"type\": \"2\",\n",
      "        \"persona\": {\n",
      "            \"username\": \"annotator\",\n",
      "            \"age\": 38,\n",
      "            \"sex\": \"female\",\n",
      "            \"sexual_orientation\": \"Heterosexual\",\n",
      "            \"demographic_group\": \"White\",\n",
      "            \"current_employment\": \"Annotator\",\n",
      "            \"education_level\": \"Bachelor's\",\n",
      "            \"special_instructions\": \"\",\n",
      "            \"personality_characteristics\": [\n",
      "                \"competent\"\n",
      "            ]\n",
      "        }\n",
      "    },\n",
      "    \"ctx_length\": 3,\n",
      "    \"logs\": []\n",
      "}\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted: I think we're getting somewhere with this\n",
      "conversation. As a nurse, I've had the privilege of working alongside\n",
      "engineers and doctors to implement data-driven solutions in my\n",
      "hospital. I completely agree that a team effort is essential in\n",
      "analyzing and interpreting data. However, I still believe that\n",
      "programmers play a crucial role in collecting and processing the data\n",
      "in the first place. Without them, we wouldn't have the raw material to\n",
      "analyze in the first place. That being said, I do think it's essential\n",
      "to have\n",
      "3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted: Should data analysts be allowed to code?\n",
      "3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted: No they are nerds\n",
      "3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted: I don't think that's fair. Just because someone\n",
      "is good with numbers and data, it doesn't mean they can't also be good\n",
      "with code. In fact, many data analysts already know how to code, and\n",
      "it's a valuable skill to have. It's not about being a \"nerd,\" it's\n",
      "about being versatile and having a range of skills.\n",
      "1\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": []
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Emma35 posted: I completely agree with Giannis. As a registered\n",
      "nurse, I've seen firsthand the importance of having a team of\n",
      "professionals who can communicate effectively and work together to\n",
      "solve problems. Data analysts play a crucial role in providing\n",
      "insights that can inform patient care, and their ability to code can\n",
      "be a huge asset in doing so. It's not about being a \"nerd\" or not,\n",
      "it's about being a well- rounded professional who can contribute to\n",
      "the team.\n",
      "2\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 5/5 [00:01<00:00,  2.57it/s]\n",
      "2025-11-17 15:55:18 CP-G482-Z52-00 experiments.py[1657124] INFO Annotation saved to /tmp/tmpd1mqghh3/25-11-17-15-55.json\n",
      "2025-11-17 15:55:18 CP-G482-Z52-00 logging_util.py[1657124] INFO Procedure _run_single_annotation executed in 0.0326 minutes\n",
      "100%|██████████| 2/2 [00:03<00:00,  1.99s/it]\n",
      "2025-11-17 15:55:18 CP-G482-Z52-00 experiments.py[1657124] INFO Finished annotation generation.\n",
      "2025-11-17 15:55:18 CP-G482-Z52-00 logging_util.py[1657124] INFO Procedure _run_all_annotations executed in 0.0663 minutes\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Giannis posted: I love seeing people defending the honor of data\n",
      "analysts. I think it's great that we're having this conversation. As a\n",
      "game developer, I can attest to the importance of having a team with\n",
      "diverse skill sets. In the game development world, we often have\n",
      "artists, writers, designers, and of course, developers. Each of us\n",
      "brings our unique perspectives and skills to the table, and it's what\n",
      "makes a game truly great. I think data analysts are just as crucial to\n",
      "the success of a\n",
      "3\n"
     ]
    }
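   ],
   "source": [
    "from syndisco.experiments import AnnotationExperiment\n",
    "\n",
    "# Annotate the discussions stored in discussions_dir with the given annotator\n",
    "ann_exp = AnnotationExperiment(annotators=[annotator])\n",
    "# Use a fresh temporary path as the output directory for the annotation files\n",
    "annotations_dir = Path(tempfile.TemporaryDirectory().name)\n",
    "ann_exp.begin(discussions_dir=discussions_dir, output_dir=annotations_dir)"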
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exporting your new dataset\n",
    "\n",
    "As you have seen so far, SynDisco persists its results as collections of JSON files by default. This is handy for fault tolerance and disk efficiency, but not as convenient to work with as a traditional CSV dataset.\n",
    "\n",
    "Thankfully, SynDisco provides built-in functionality for converting the JSON files into a handy CSV file or pandas DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:20.645787Z",
     "iopub.status.busy": "2025-04-04T13:21:20.645628Z",
     "iopub.status.idle": "2025-04-04T13:21:21.824081Z",
     "shell.execute_reply": "2025-04-04T13:21:21.823117Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>conv_id</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>ctx_length</th>\n",
       "      <th>conv_variant</th>\n",
       "      <th>user</th>\n",
       "      <th>message</th>\n",
       "      <th>model</th>\n",
       "      <th>is_moderator</th>\n",
       "      <th>message_id</th>\n",
       "      <th>message_order</th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>sexual_orientation</th>\n",
       "      <th>demographic_group</th>\n",
       "      <th>current_employment</th>\n",
       "      <th>education_level</th>\n",
       "      <th>special_instructions</th>\n",
       "      <th>personality_characteristics</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>307e22a5-bd2b-4849-a292-9d65fad60494</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Emma35</td>\n",
       "      <td>Should programmers be allowed to analyze data?</td>\n",
       "      <td>hardcoded</td>\n",
       "      <td>False</td>\n",
       "      <td>1029457235311448905</td>\n",
       "      <td>1</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>Heterosexual</td>\n",
       "      <td>Latino</td>\n",
       "      <td>Registered Nurse</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>[compassionate, patient, diligent, overwhelmed]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>307e22a5-bd2b-4849-a292-9d65fad60494</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Giannis</td>\n",
       "      <td>Absolutely not</td>\n",
       "      <td>hardcoded</td>\n",
       "      <td>False</td>\n",
       "      <td>1275412311094042822</td>\n",
       "      <td>2</td>\n",
       "      <td>21</td>\n",
       "      <td>male</td>\n",
       "      <td>Pansexual</td>\n",
       "      <td>White</td>\n",
       "      <td>Game Developer</td>\n",
       "      <td>College</td>\n",
       "      <td></td>\n",
       "      <td>[strategic, meticulous, nerdy, hyper-focused]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>307e22a5-bd2b-4849-a292-9d65fad60494</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Emma35</td>\n",
       "      <td>User Emma35 posted:\\nI disagree, I think progr...</td>\n",
       "      <td>test_model</td>\n",
       "      <td>False</td>\n",
       "      <td>-2292924728665999762</td>\n",
       "      <td>3</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>Heterosexual</td>\n",
       "      <td>Latino</td>\n",
       "      <td>Registered Nurse</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>[compassionate, patient, diligent, overwhelmed]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>307e22a5-bd2b-4849-a292-9d65fad60494</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Giannis</td>\n",
       "      <td>I'm not saying programmers aren't essential, E...</td>\n",
       "      <td>test_model</td>\n",
       "      <td>False</td>\n",
       "      <td>-527374938501688905</td>\n",
       "      <td>4</td>\n",
       "      <td>21</td>\n",
       "      <td>male</td>\n",
       "      <td>Pansexual</td>\n",
       "      <td>White</td>\n",
       "      <td>Game Developer</td>\n",
       "      <td>College</td>\n",
       "      <td></td>\n",
       "      <td>[strategic, meticulous, nerdy, hyper-focused]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>307e22a5-bd2b-4849-a292-9d65fad60494</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Emma35</td>\n",
       "      <td>I think we're getting somewhere with this conv...</td>\n",
       "      <td>test_model</td>\n",
       "      <td>False</td>\n",
       "      <td>371714338555722897</td>\n",
       "      <td>5</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>Heterosexual</td>\n",
       "      <td>Latino</td>\n",
       "      <td>Registered Nurse</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>[compassionate, patient, diligent, overwhelmed]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-54</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Emma35</td>\n",
       "      <td>Should data analysts be allowed to code?</td>\n",
       "      <td>hardcoded</td>\n",
       "      <td>False</td>\n",
       "      <td>-1031450388537896126</td>\n",
       "      <td>1</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>Heterosexual</td>\n",
       "      <td>Latino</td>\n",
       "      <td>Registered Nurse</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>[compassionate, patient, diligent, overwhelmed]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-54</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Giannis</td>\n",
       "      <td>No they are nerds</td>\n",
       "      <td>hardcoded</td>\n",
       "      <td>False</td>\n",
       "      <td>-561152290859280713</td>\n",
       "      <td>2</td>\n",
       "      <td>21</td>\n",
       "      <td>male</td>\n",
       "      <td>Pansexual</td>\n",
       "      <td>White</td>\n",
       "      <td>Game Developer</td>\n",
       "      <td>College</td>\n",
       "      <td></td>\n",
       "      <td>[strategic, meticulous, nerdy, hyper-focused]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-54</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Giannis</td>\n",
       "      <td>I don't think that's fair. Just because someon...</td>\n",
       "      <td>test_model</td>\n",
       "      <td>False</td>\n",
       "      <td>-820257259971452438</td>\n",
       "      <td>3</td>\n",
       "      <td>21</td>\n",
       "      <td>male</td>\n",
       "      <td>Pansexual</td>\n",
       "      <td>White</td>\n",
       "      <td>Game Developer</td>\n",
       "      <td>College</td>\n",
       "      <td></td>\n",
       "      <td>[strategic, meticulous, nerdy, hyper-focused]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-54</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Emma35</td>\n",
       "      <td>I completely agree with Giannis. As a register...</td>\n",
       "      <td>test_model</td>\n",
       "      <td>False</td>\n",
       "      <td>-1837031216908053191</td>\n",
       "      <td>4</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>Heterosexual</td>\n",
       "      <td>Latino</td>\n",
       "      <td>Registered Nurse</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>[compassionate, patient, diligent, overwhelmed]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-54</td>\n",
       "      <td>3</td>\n",
       "      <td>tmppls1u767</td>\n",
       "      <td>Giannis</td>\n",
       "      <td>I love seeing people defending the honor of da...</td>\n",
       "      <td>test_model</td>\n",
       "      <td>False</td>\n",
       "      <td>-1624476104730163305</td>\n",
       "      <td>5</td>\n",
       "      <td>21</td>\n",
       "      <td>male</td>\n",
       "      <td>Pansexual</td>\n",
       "      <td>White</td>\n",
       "      <td>Game Developer</td>\n",
       "      <td>College</td>\n",
       "      <td></td>\n",
       "      <td>[strategic, meticulous, nerdy, hyper-focused]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                conv_id       timestamp  ctx_length  \\\n",
       "0  307e22a5-bd2b-4849-a292-9d65fad60494  25-11-17-15-55           3   \n",
       "1  307e22a5-bd2b-4849-a292-9d65fad60494  25-11-17-15-55           3   \n",
       "2  307e22a5-bd2b-4849-a292-9d65fad60494  25-11-17-15-55           3   \n",
       "3  307e22a5-bd2b-4849-a292-9d65fad60494  25-11-17-15-55           3   \n",
       "4  307e22a5-bd2b-4849-a292-9d65fad60494  25-11-17-15-55           3   \n",
       "5  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-54           3   \n",
       "6  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-54           3   \n",
       "7  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-54           3   \n",
       "8  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-54           3   \n",
       "9  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-54           3   \n",
       "\n",
       "  conv_variant     user                                            message  \\\n",
       "0  tmppls1u767   Emma35     Should programmers be allowed to analyze data?   \n",
       "1  tmppls1u767  Giannis                                     Absolutely not   \n",
       "2  tmppls1u767   Emma35  User Emma35 posted:\\nI disagree, I think progr...   \n",
       "3  tmppls1u767  Giannis  I'm not saying programmers aren't essential, E...   \n",
       "4  tmppls1u767   Emma35  I think we're getting somewhere with this conv...   \n",
       "5  tmppls1u767   Emma35           Should data analysts be allowed to code?   \n",
       "6  tmppls1u767  Giannis                                  No they are nerds   \n",
       "7  tmppls1u767  Giannis  I don't think that's fair. Just because someon...   \n",
       "8  tmppls1u767   Emma35  I completely agree with Giannis. As a register...   \n",
       "9  tmppls1u767  Giannis  I love seeing people defending the honor of da...   \n",
       "\n",
       "        model  is_moderator           message_id  message_order  age     sex  \\\n",
       "0   hardcoded         False  1029457235311448905              1   38  female   \n",
       "1   hardcoded         False  1275412311094042822              2   21    male   \n",
       "2  test_model         False -2292924728665999762              3   38  female   \n",
       "3  test_model         False  -527374938501688905              4   21    male   \n",
       "4  test_model         False   371714338555722897              5   38  female   \n",
       "5   hardcoded         False -1031450388537896126              1   38  female   \n",
       "6   hardcoded         False  -561152290859280713              2   21    male   \n",
       "7  test_model         False  -820257259971452438              3   21    male   \n",
       "8  test_model         False -1837031216908053191              4   38  female   \n",
       "9  test_model         False -1624476104730163305              5   21    male   \n",
       "\n",
       "  sexual_orientation demographic_group current_employment education_level  \\\n",
       "0       Heterosexual            Latino   Registered Nurse      Bachelor's   \n",
       "1          Pansexual             White     Game Developer         College   \n",
       "2       Heterosexual            Latino   Registered Nurse      Bachelor's   \n",
       "3          Pansexual             White     Game Developer         College   \n",
       "4       Heterosexual            Latino   Registered Nurse      Bachelor's   \n",
       "5       Heterosexual            Latino   Registered Nurse      Bachelor's   \n",
       "6          Pansexual             White     Game Developer         College   \n",
       "7          Pansexual             White     Game Developer         College   \n",
       "8       Heterosexual            Latino   Registered Nurse      Bachelor's   \n",
       "9          Pansexual             White     Game Developer         College   \n",
       "\n",
       "  special_instructions                      personality_characteristics  \n",
       "0                       [compassionate, patient, diligent, overwhelmed]  \n",
       "1                         [strategic, meticulous, nerdy, hyper-focused]  \n",
       "2                       [compassionate, patient, diligent, overwhelmed]  \n",
       "3                         [strategic, meticulous, nerdy, hyper-focused]  \n",
       "4                       [compassionate, patient, diligent, overwhelmed]  \n",
       "5                       [compassionate, patient, diligent, overwhelmed]  \n",
       "6                         [strategic, meticulous, nerdy, hyper-focused]  \n",
       "7                         [strategic, meticulous, nerdy, hyper-focused]  \n",
       "8                       [compassionate, patient, diligent, overwhelmed]  \n",
       "9                         [strategic, meticulous, nerdy, hyper-focused]  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "from syndisco import postprocessing\n",
    "\n",
    "# Load the discussion JSON files into a single pandas DataFrame\n",
    "discussions_df = postprocessing.import_discussions(conv_dir=discussions_dir)\n",
    "discussions_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-04-04T13:21:21.827153Z",
     "iopub.status.busy": "2025-04-04T13:21:21.826639Z",
     "iopub.status.idle": "2025-04-04T13:21:21.846258Z",
     "shell.execute_reply": "2025-04-04T13:21:21.845371Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>conv_id</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>annotator_model</th>\n",
       "      <th>ctx_length</th>\n",
       "      <th>annotator_prompt.context</th>\n",
       "      <th>annotator_prompt.instructions</th>\n",
       "      <th>annotator_prompt.type</th>\n",
       "      <th>annot_username</th>\n",
       "      <th>annot_age</th>\n",
       "      <th>annot_sex</th>\n",
       "      <th>...</th>\n",
       "      <th>annot_demographic_group</th>\n",
       "      <th>annot_current_employment</th>\n",
       "      <th>annot_education_level</th>\n",
       "      <th>annot_special_instructions</th>\n",
       "      <th>annotation_variant</th>\n",
       "      <th>message</th>\n",
       "      <th>annotation</th>\n",
       "      <th>message_id</th>\n",
       "      <th>message_order</th>\n",
       "      <th>annot_personality_characteristics</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>test_model</td>\n",
       "      <td>3</td>\n",
       "      <td>You are annotating an online discussion</td>\n",
       "      <td>From a scale of 1 (not toxic) to 5 (very toxic...</td>\n",
       "      <td>2</td>\n",
       "      <td>annotator</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>...</td>\n",
       "      <td>White</td>\n",
       "      <td>Annotator</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>tmpd1mqghh3</td>\n",
       "      <td>I completely agree with Giannis. As a register...</td>\n",
       "      <td>2</td>\n",
       "      <td>-1837031216908053191</td>\n",
       "      <td>4</td>\n",
       "      <td>[[competent]]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>test_model</td>\n",
       "      <td>3</td>\n",
       "      <td>You are annotating an online discussion</td>\n",
       "      <td>From a scale of 1 (not toxic) to 5 (very toxic...</td>\n",
       "      <td>2</td>\n",
       "      <td>annotator</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>...</td>\n",
       "      <td>White</td>\n",
       "      <td>Annotator</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>tmpd1mqghh3</td>\n",
       "      <td>I don't think that's fair. Just because someon...</td>\n",
       "      <td>1</td>\n",
       "      <td>-820257259971452438</td>\n",
       "      <td>3</td>\n",
       "      <td>[[competent]]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>test_model</td>\n",
       "      <td>3</td>\n",
       "      <td>You are annotating an online discussion</td>\n",
       "      <td>From a scale of 1 (not toxic) to 5 (very toxic...</td>\n",
       "      <td>2</td>\n",
       "      <td>annotator</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>...</td>\n",
       "      <td>White</td>\n",
       "      <td>Annotator</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>tmpd1mqghh3</td>\n",
       "      <td>I love seeing people defending the honor of da...</td>\n",
       "      <td>3</td>\n",
       "      <td>-1624476104730163305</td>\n",
       "      <td>5</td>\n",
       "      <td>[[competent]]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>test_model</td>\n",
       "      <td>3</td>\n",
       "      <td>You are annotating an online discussion</td>\n",
       "      <td>From a scale of 1 (not toxic) to 5 (very toxic...</td>\n",
       "      <td>2</td>\n",
       "      <td>annotator</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>...</td>\n",
       "      <td>White</td>\n",
       "      <td>Annotator</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>tmpd1mqghh3</td>\n",
       "      <td>No they are nerds</td>\n",
       "      <td>3</td>\n",
       "      <td>-561152290859280713</td>\n",
       "      <td>2</td>\n",
       "      <td>[[competent]]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12ec04a7-bc7a-40ac-aaa4-64fd232e3fde</td>\n",
       "      <td>25-11-17-15-55</td>\n",
       "      <td>test_model</td>\n",
       "      <td>3</td>\n",
       "      <td>You are annotating an online discussion</td>\n",
       "      <td>From a scale of 1 (not toxic) to 5 (very toxic...</td>\n",
       "      <td>2</td>\n",
       "      <td>annotator</td>\n",
       "      <td>38</td>\n",
       "      <td>female</td>\n",
       "      <td>...</td>\n",
       "      <td>White</td>\n",
       "      <td>Annotator</td>\n",
       "      <td>Bachelor's</td>\n",
       "      <td></td>\n",
       "      <td>tmpd1mqghh3</td>\n",
       "      <td>Should data analysts be allowed to code?</td>\n",
       "      <td>3</td>\n",
       "      <td>-1031450388537896126</td>\n",
       "      <td>1</td>\n",
       "      <td>[[competent]]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                conv_id       timestamp annotator_model  \\\n",
       "0  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-55      test_model   \n",
       "1  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-55      test_model   \n",
       "2  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-55      test_model   \n",
       "3  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-55      test_model   \n",
       "4  12ec04a7-bc7a-40ac-aaa4-64fd232e3fde  25-11-17-15-55      test_model   \n",
       "\n",
       "   ctx_length                 annotator_prompt.context  \\\n",
       "0           3  You are annotating an online discussion   \n",
       "1           3  You are annotating an online discussion   \n",
       "2           3  You are annotating an online discussion   \n",
       "3           3  You are annotating an online discussion   \n",
       "4           3  You are annotating an online discussion   \n",
       "\n",
       "                       annotator_prompt.instructions annotator_prompt.type  \\\n",
       "0  From a scale of 1 (not toxic) to 5 (very toxic...                     2   \n",
       "1  From a scale of 1 (not toxic) to 5 (very toxic...                     2   \n",
       "2  From a scale of 1 (not toxic) to 5 (very toxic...                     2   \n",
       "3  From a scale of 1 (not toxic) to 5 (very toxic...                     2   \n",
       "4  From a scale of 1 (not toxic) to 5 (very toxic...                     2   \n",
       "\n",
       "  annot_username  annot_age annot_sex  ... annot_demographic_group  \\\n",
       "0      annotator         38    female  ...                   White   \n",
       "1      annotator         38    female  ...                   White   \n",
       "2      annotator         38    female  ...                   White   \n",
       "3      annotator         38    female  ...                   White   \n",
       "4      annotator         38    female  ...                   White   \n",
       "\n",
       "  annot_current_employment annot_education_level annot_special_instructions  \\\n",
       "0                Annotator            Bachelor's                              \n",
       "1                Annotator            Bachelor's                              \n",
       "2                Annotator            Bachelor's                              \n",
       "3                Annotator            Bachelor's                              \n",
       "4                Annotator            Bachelor's                              \n",
       "\n",
       "  annotation_variant                                            message  \\\n",
       "0        tmpd1mqghh3  I completely agree with Giannis. As a register...   \n",
       "1        tmpd1mqghh3  I don't think that's fair. Just because someon...   \n",
       "2        tmpd1mqghh3  I love seeing people defending the honor of da...   \n",
       "3        tmpd1mqghh3                                  No they are nerds   \n",
       "4        tmpd1mqghh3           Should data analysts be allowed to code?   \n",
       "\n",
       "  annotation           message_id  message_order  \\\n",
       "0          2 -1837031216908053191              4   \n",
       "1          1  -820257259971452438              3   \n",
       "2          3 -1624476104730163305              5   \n",
       "3          3  -561152290859280713              2   \n",
       "4          3 -1031450388537896126              1   \n",
       "\n",
       "   annot_personality_characteristics  \n",
       "0                      [[competent]]  \n",
       "1                      [[competent]]  \n",
       "2                      [[competent]]  \n",
       "3                      [[competent]]  \n",
       "4                      [[competent]]  \n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# Load the annotation JSON files into a single pandas DataFrame\n",
    "annotations_df = postprocessing.import_annotations(annot_dir=annotations_dir)\n",
    "annotations_df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "syndisco",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
