{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Benchmarking"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notebook to organise benchmarks of different implementations of the common-nearest-neighbour clustering and other cluster algorithms:\n",
    "\n",
    "  - DBSCAN (`sklearn.cluster.DBSCAN`)\n",
    "  - HDBSCAN ()\n",
    "  - OPTICS (`sklearn.cluster.OPTICS`)\n",
    "  - Density peaks ()\n",
    "  - Jarvis-Patrick ()\n",
    "  - Common-nearest-neighbours (`cnnclustering.cnn`)\n",
    "  - Common-nearest-neighbours (`sklearn_extra.cluster.CommonNNClustering`)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Pre-requirements\" data-toc-modified-id=\"Pre-requirements-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Pre-requirements</a></span></li><li><span><a href=\"#Version-info\" data-toc-modified-id=\"Version-info-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Version info</a></span></li><li><span><a href=\"#Helper-function-definitions\" data-toc-modified-id=\"Helper-function-definitions-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Helper function definitions</a></span><ul class=\"toc-item\"><li><span><a href=\"#Plots\" data-toc-modified-id=\"Plots-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>Plots</a></span></li><li><span><a href=\"#Test-data-set-generation\" data-toc-modified-id=\"Test-data-set-generation-3.2\"><span class=\"toc-item-num\">3.2&nbsp;&nbsp;</span>Test data set generation</a></span></li><li><span><a href=\"#Benchmark-organisation\" data-toc-modified-id=\"Benchmark-organisation-3.3\"><span class=\"toc-item-num\">3.3&nbsp;&nbsp;</span>Benchmark organisation</a></span></li><li><span><a href=\"#Profiling\" data-toc-modified-id=\"Profiling-3.4\"><span class=\"toc-item-num\">3.4&nbsp;&nbsp;</span>Profiling</a></span></li></ul></li><li><span><a href=\"#Consitency-check\" data-toc-modified-id=\"Consitency-check-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Consitency check</a></span><ul class=\"toc-item\"><li><span><a href=\"#scikit-learn-DBSCAN\" data-toc-modified-id=\"scikit-learn-DBSCAN-4.1\"><span class=\"toc-item-num\">4.1&nbsp;&nbsp;</span>scikit-learn DBSCAN</a></span></li><li><span><a href=\"#scikit-learn-extra-CommonNNClustering\" data-toc-modified-id=\"scikit-learn-extra-CommonNNClustering-4.2\"><span class=\"toc-item-num\">4.2&nbsp;&nbsp;</span>scikit-learn-extra CommonNNClustering</a></span></li><li><span><a href=\"#cnnclustering-CNN-from-points-on-the-fly\" data-toc-modified-id=\"cnnclustering-CNN-from-points-on-the-fly-4.3\"><span 
class=\"toc-item-num\">4.3&nbsp;&nbsp;</span>cnnclustering CNN from points on-the-fly</a></span></li><li><span><a href=\"#cnnclustering-CNN-from-points-bulk\" data-toc-modified-id=\"cnnclustering-CNN-from-points-bulk-4.4\"><span class=\"toc-item-num\">4.4&nbsp;&nbsp;</span>cnnclustering CNN from points bulk</a></span></li><li><span><a href=\"#cnnclustering-CNN-from-distances-on-the-fly\" data-toc-modified-id=\"cnnclustering-CNN-from-distances-on-the-fly-4.5\"><span class=\"toc-item-num\">4.5&nbsp;&nbsp;</span>cnnclustering CNN from distances on-the-fly</a></span></li><li><span><a href=\"#cnnclustering-CNN-from-distances-bulk\" data-toc-modified-id=\"cnnclustering-CNN-from-distances-bulk-4.6\"><span class=\"toc-item-num\">4.6&nbsp;&nbsp;</span>cnnclustering CNN from distances bulk</a></span></li><li><span><a href=\"#scikit-learn-OPTICS-(DBSCAN)\" data-toc-modified-id=\"scikit-learn-OPTICS-(DBSCAN)-4.7\"><span class=\"toc-item-num\">4.7&nbsp;&nbsp;</span>scikit-learn OPTICS (DBSCAN)</a></span></li><li><span><a href=\"#scikit-learn-OPTICS-(XI)\" data-toc-modified-id=\"scikit-learn-OPTICS-(XI)-4.8\"><span class=\"toc-item-num\">4.8&nbsp;&nbsp;</span>scikit-learn OPTICS (XI)</a></span></li></ul></li><li><span><a href=\"#Timings\" data-toc-modified-id=\"Timings-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Timings</a></span><ul class=\"toc-item\"><li><span><a href=\"#Blobs-set\" data-toc-modified-id=\"Blobs-set-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Blobs set</a></span></li></ul></li><li><span><a href=\"#Fit-variants\" data-toc-modified-id=\"Fit-variants-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>Fit variants</a></span><ul class=\"toc-item\"><li><span><a href=\"#From-neighbours\" data-toc-modified-id=\"From-neighbours-6.1\"><span class=\"toc-item-num\">6.1&nbsp;&nbsp;</span>From neighbours</a></span><ul class=\"toc-item\"><li><span><a href=\"#From-list-of-sets\" data-toc-modified-id=\"From-list-of-sets-6.1.1\"><span 
class=\"toc-item-num\">6.1.1&nbsp;&nbsp;</span>From list of sets</a></span><ul class=\"toc-item\"><li><span><a href=\"#Baseline\" data-toc-modified-id=\"Baseline-6.1.1.1\"><span class=\"toc-item-num\">6.1.1.1&nbsp;&nbsp;</span>Baseline</a></span></li><li><span><a href=\"#Stdlib-index\" data-toc-modified-id=\"Stdlib-index-6.1.1.2\"><span class=\"toc-item-num\">6.1.1.2&nbsp;&nbsp;</span>Stdlib index</a></span></li><li><span><a href=\"#Stdlib-cython\" data-toc-modified-id=\"Stdlib-cython-6.1.1.3\"><span class=\"toc-item-num\">6.1.1.3&nbsp;&nbsp;</span>Stdlib cython</a></span></li></ul></li><li><span><a href=\"#From-numpy.array\" data-toc-modified-id=\"From-numpy.array-6.1.2\"><span class=\"toc-item-num\">6.1.2&nbsp;&nbsp;</span>From numpy.array</a></span></li><li><span><a href=\"#Check-in-CNN-class-context\" data-toc-modified-id=\"Check-in-CNN-class-context-6.1.3\"><span class=\"toc-item-num\">6.1.3&nbsp;&nbsp;</span>Check in CNN class context</a></span></li></ul></li><li><span><a href=\"#From-density-graph\" data-toc-modified-id=\"From-density-graph-6.2\"><span class=\"toc-item-num\">6.2&nbsp;&nbsp;</span>From density graph</a></span><ul class=\"toc-item\"><li><span><a href=\"#From-SparsegraphArray\" data-toc-modified-id=\"From-SparsegraphArray-6.2.1\"><span class=\"toc-item-num\">6.2.1&nbsp;&nbsp;</span>From SparsegraphArray</a></span></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-requirements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:18:44.113590Z",
     "start_time": "2020-07-03T13:18:42.304353Z"
    },
    "init_cell": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "import importlib\n",
    "from operator import itemgetter\n",
    "import sys\n",
    "import time\n",
    "\n",
    "from IPython.core.display import display, HTML\n",
    "import numpy as np\n",
    "\n",
    "%matplotlib widget\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import hdbscan\n",
    "import sklearn\n",
    "import sklearn_extra \n",
    "from sklearn import cluster as skcluster\n",
    "from sklearn_extra import cluster as skextracluster\n",
    "from sklearn import datasets\n",
    "from sklearn.neighbors import NearestNeighbors\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "import cnnclustering\n",
    "from cnnclustering import _cfits  # Cythonised fit implementation\n",
    "from cnnclustering import _fits   # Python fit implementation\n",
    "from cnnclustering import cnn\n",
    "\n",
    "# Jupyter extensions\n",
    "%load_ext Cython\n",
    "%load_ext line_profiler\n",
    "%load_ext memory_profiler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:18:46.869836Z",
     "start_time": "2020-07-03T13:18:46.862946Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# Matplotlib configuration\n",
    "mpl.rcParams.update(mpl.rcParamsDefault)\n",
    "mpl.rc_file(\"../tutorial/matplotlibrc\", use_default_template=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:18:47.228511Z",
     "start_time": "2020-07-03T13:18:47.211604Z"
    },
    "init_cell": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.container { width:85% !important; }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Jupyter notebook configuration\n",
    "display(HTML(\"<style>.container { width:85% !important; }</style>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Version info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:18:52.493819Z",
     "start_time": "2020-07-03T13:18:52.472579Z"
    },
    "init_cell": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              Python :   3.8.3 (default, May 15 2020, 15:24:35)  [GCC 8.3.0]\n",
      "        scikit-learn :   0.24.dev0\n",
      "  scikit-learn-extra :   0.1.0b2\n",
      "       cnnclustering :   0.3.9\n",
      "             hdbscan :   no version info\n"
     ]
    }
   ],
   "source": [
    "print(f\"{'Python':>20} :  \", *sys.version.splitlines())\n",
    "\n",
    "modules = [\n",
    "    ('scikit-learn', sklearn),\n",
    "    ('scikit-learn-extra', sklearn_extra),\n",
    "    ('cnnclustering', cnnclustering),\n",
    "    ('hdbscan', hdbscan),\n",
    "]\n",
    "\n",
    "for alias, m in modules:\n",
    "    try:\n",
    "        print(f\"{alias:>20} :  \", m.__version__)\n",
    "    except AttributeError:\n",
    "        print(f\"{alias:>20} :  \", \"no version info\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Helper function definitions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:18:54.574254Z",
     "start_time": "2020-07-03T13:18:54.551623Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# Axis property defaults for the plots\n",
    "ax_props = {\n",
    "    \"xlabel\": None,\n",
    "    \"ylabel\": None,\n",
    "    \"xlim\": (-2.5, 2.5),\n",
    "    \"ylim\": (-2.5, 2.5),\n",
    "    \"xticks\": (),\n",
    "    \"yticks\": (),\n",
    "    \"aspect\": \"equal\"\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:18:55.386469Z",
     "start_time": "2020-07-03T13:18:55.364547Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def plot_data(data, labels=None, ax=None, noise=0):\n",
    "    \"\"\"Take a data set and cluster labels to make a basic 2D dot plot\n",
    "    \n",
    "    Args:\n",
    "        data: Numpy `numpy.ndarray` of shape (#points, 2) with x, y\n",
    "            coordinates of points in 2D\n",
    "        labels: Numpy `numpy.ndarray` of shape (#points,) and\n",
    "            `dtype = int` holding cluster label assignments for all\n",
    "            points.  If `None`, will plot the data set without\n",
    "            point colouring by label.\n",
    "        ax: Matplotlib `matplotlib.axes.SubplotBase` instance to\n",
    "            attach the plot to.  If `None`, wil create a new instance.\n",
    "        noise: Integer label used to mark point as noise (no cluster\n",
    "            assignment; Usually 0 or -1). \n",
    "    \"\"\"\n",
    "\n",
    "    if ax is None:\n",
    "        plt.close('all')\n",
    "        fig, ax = plt.subplots(\n",
    "            figsize=(\n",
    "                mpl.rcParams[\"figure.figsize\"][0] / 2,\n",
    "                mpl.rcParams[\"figure.figsize\"][1]\n",
    "                )\n",
    "            )\n",
    "    else:\n",
    "        fig = ax.get_figure()\n",
    "        \n",
    "    if labels is None:\n",
    "        ax.plot(\n",
    "            *data.T,\n",
    "            linestyle=\"\",\n",
    "            color=\"None\",\n",
    "            marker=\"o\",\n",
    "            markersize=4,\n",
    "            markerfacecolor=\"white\",\n",
    "            markeredgecolor=\"k\",\n",
    "            )\n",
    "\n",
    "    else:\n",
    "        ax.plot(\n",
    "            *data[np.where(labels == noise)[0]].T,\n",
    "            linestyle=\"\",\n",
    "            color=\"None\",\n",
    "            marker=\"o\",\n",
    "            markersize=4,\n",
    "            markerfacecolor=\"gray\",\n",
    "            markeredgecolor=\"k\",\n",
    "            )\n",
    "\n",
    "        for cluster_number in range(noise + 1 , int(np.max(labels)) + 1):\n",
    "            ax.plot(\n",
    "                *data[np.where(labels == cluster_number)[0]].T,\n",
    "                linestyle=\"\",\n",
    "                marker=\"o\",\n",
    "                markersize=4,\n",
    "                markeredgecolor=\"k\",\n",
    "                )\n",
    "\n",
    "    ax.set(**{\n",
    "        \"xticks\": (),\n",
    "        \"yticks\": (),\n",
    "        \"xticklabels\": (),\n",
    "        \"yticklabels\": (),\n",
    "        \"aspect\": \"equal\"\n",
    "        })"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test data set generation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data set generation functions should be generally designed in such a way that they expect exactly one argument *n* and return a 2D data set of *n* sample points. A `label` attribute can be optionally added to the function object for identification."
   ]
  },
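  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of this convention, here is a minimal sketch of a generator function. The name `gen_uniform_square`, its label, and the fixed seed are assumptions made for this example; they are not part of the benchmark set used in this notebook.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def gen_uniform_square(n):\n",
    "    # Hypothetical generator: n points drawn uniformly from [-1, 1) x [-1, 1)\n",
    "    rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility\n",
    "    return rng.uniform(-1.0, 1.0, size=(n, 2))\n",
    "\n",
    "\n",
    "# Optional label used to identify the data set\n",
    "gen_uniform_square.label = 'uniform'\n",
    "\n",
    "points = gen_uniform_square(100)\n",
    "print(points.shape)\n",
    "```\n",
    "\n",
    "Any function of this shape can be passed as `gen_fxn` to the `DS` class defined in the benchmark organisation section."
   ]
  },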
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:19:17.600593Z",
     "start_time": "2020-07-03T13:19:17.592634Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# Global seed for data set generation functions\n",
    "np.random.seed(42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:19:17.962997Z",
     "start_time": "2020-07-03T13:19:17.951435Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# circles\n",
    "def gen_circles(n):\n",
    "    circles, _ = datasets.make_circles(\n",
    "        n_samples=n,\n",
    "        factor=.5,\n",
    "        noise=.05,\n",
    "        random_state=10\n",
    "        )\n",
    "    \n",
    "    return StandardScaler().fit_transform(circles)\n",
    "\n",
    "gen_circles.label = \"circles\"\n",
    "\n",
    "# blobs                            \n",
    "def gen_blobs(n):\n",
    "    blobs, _ = datasets.make_blobs(\n",
    "        centers=[[-10, -10], [10, -10], [10, 10]],\n",
    "        n_samples=n,\n",
    "        random_state=10\n",
    "    )\n",
    "    return StandardScaler().fit_transform(blobs)\n",
    "\n",
    "gen_blobs.label = \"blobs\"\n",
    "\n",
    "# moons\n",
    "def gen_moons(n):\n",
    "    moons, _ = datasets.make_moons(\n",
    "        n_samples=n,\n",
    "        noise=.05,\n",
    "        random_state=10\n",
    "        )\n",
    "    \n",
    "    return StandardScaler().fit_transform(moons)\n",
    "\n",
    "gen_moons.label = \"moons\"\n",
    "\n",
    "def gen_no_structure(n):\n",
    "    no_structure = np.random.rand(n, 2)\n",
    "    \n",
    "    return StandardScaler().fit_transform(no_structure)\n",
    "\n",
    "gen_no_structure.label = \"None\"\n",
    "\n",
    "def gen_aniso(n):\n",
    "    X, y = datasets.make_blobs(\n",
    "        n_samples=n,\n",
    "        random_state=170\n",
    "        )\n",
    "\n",
    "    transformation = [[0.6, -0.6], [-0.4, 0.8]]\n",
    "    aniso = np.dot(X, transformation)\n",
    "    \n",
    "    return StandardScaler().fit_transform(aniso)\n",
    "\n",
    "gen_aniso.label = \"aniso\"\n",
    "\n",
    "def gen_varied(n):\n",
    "    varied, _ = datasets.make_blobs(\n",
    "        n_samples=n,\n",
    "        cluster_std=[1.0, 2.5, 0.5],\n",
    "        random_state=170)\n",
    "    \n",
    "    return StandardScaler().fit_transform(varied)\n",
    "\n",
    "gen_varied.label = \"varied\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Benchmark organisation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:33:11.528389Z",
     "start_time": "2020-07-03T13:33:11.515202Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "class DS:\n",
    "    \"\"\"Benchmark class to represent a data set\n",
    "    \n",
    "    Initialise a `DS` instance with a data set generation function.\n",
    "    \n",
    "    Attributes:\n",
    "       points: A (2D) sample data set\n",
    "       prepare_neighbourhoods: Pre-compute neighbourhoods\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(\n",
    "            self, gen_fxn, n, gen_fxn_args=None, gen_fxn_kwargs=None):\n",
    "        self.gen_fxn = gen_fxn\n",
    "        self.n = n\n",
    "        \n",
    "        if gen_fxn_kwargs is None:\n",
    "            gen_fxn_kwargs = {}\n",
    "        if gen_fxn_args is None:\n",
    "            gen_fxn_args = ()  \n",
    "        \n",
    "        self.points = gen_fxn(n, *gen_fxn_args, **gen_fxn_kwargs)\n",
    "        self.neighbourhoods = None  # Pre-computed neighbourhoods \n",
    "        self.r = None               # Neighbourhood computation radius\n",
    "        self.timeits = {}           # Timing results\n",
    "        \n",
    "    def prepare_neighbourhoods(self, r, numpy=False):\n",
    "        \"\"\"Pre-compute neighbourhoods at a given readius `r`\n",
    "        \n",
    "        Uses `sklearn.neighbors.NearestNeighbors`.\n",
    "        \n",
    "        Args:\n",
    "            numpy: If `True`, returns a `numpy.ndarray` of shape\n",
    "            (#points,) containing the neighbours of each points as\n",
    "            `numpy.ndarray` of shape (#neighbours,).  If `False`,\n",
    "            returns a `list` of `sets` instead.\n",
    "        \"\"\"\n",
    "\n",
    "        neighbour_model = NearestNeighbors(radius=r).fit(self.points)\n",
    "        neighbours = neighbour_model.radius_neighbors(\n",
    "            self.points, return_distance=False\n",
    "            )\n",
    "        \n",
    "        # Remove self-counting\n",
    "        neighbours = [set(x) for x in neighbours]   \n",
    "        for c, s in enumerate(neighbours):\n",
    "            s.remove(c)\n",
    "        if numpy:\n",
    "            neighbours = np.array([np.array([y for y in x]) for x in neighbours])    \n",
    "        \n",
    "        self.neighbourhoods = neighbours\n",
    "        self.r = r\n",
    "        \n",
    "    def __str__(self):\n",
    "        try:\n",
    "            desc_gen = self.gen_fxn.label\n",
    "        except AttributeError:\n",
    "            desc_gen = self.gen_fxn.__name__\n",
    "            \n",
    "        # p: from points\n",
    "        # d: from distances\n",
    "        # n: from neighbours\n",
    "        \n",
    "        if self.neighbourhoods is not None:\n",
    "            desc_from = f\"n{self.r}\"\n",
    "        elif self.dist is not None:\n",
    "            desc_from = f\"d\"\n",
    "        else:\n",
    "            desc_from = f\"p\"\n",
    "            \n",
    "        return f\"{desc_gen}_{self.n}_{desc_from}\"\n",
    "    \n",
    "    \n",
    "    def ratios(self, base=None):\n",
    "        \"\"\"Show relative performance of runs based on `timeits` dict\"\"\"\n",
    "\n",
    "        if base is not None:\n",
    "            base = self.timeits[base].average\n",
    "        else:\n",
    "            base = min(x.average for x in self.timeits.values())\n",
    "            \n",
    "        return sorted([\n",
    "            (k, v.average / base)\n",
    "            for k, v in self.timeits.items()\n",
    "            ], key=itemgetter(1))\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:33:11.762939Z",
     "start_time": "2020-07-03T13:33:11.750819Z"
    }
   },
   "outputs": [],
   "source": [
    "# Benchmark fixture 1\n",
    "def prepare_neighbours(n, r, gen_fxn, numpy=False):\n",
    "    \"\"\"Provide pre-computed neighbourhoods\n",
    "    \n",
    "    Uses `sklearn.neighbors.NearestNeighbors`. Removes self-counting\n",
    "    of points as neighbours of themselves.\n",
    "    \n",
    "    Args:\n",
    "       n: Number of data points\n",
    "       r: Radius\n",
    "       gen_fxn: Function that accepts one parameter `n` and returns\n",
    "           a data set with `n` points, for which neighbourhoods will\n",
    "           be computed.\n",
    "       numpy: If `True`, provide neighbourhoods as 1D `numpy.array` of 1D\n",
    "           `numpy.array`s.  If `False`, convert to `list` of `set`s.  \n",
    "    \"\"\"\n",
    "    \n",
    "    data = gen_fxn(n)\n",
    "    neighbour_model = NearestNeighbors(radius=r).fit(data)\n",
    "    neighbours = neighbour_model.radius_neighbors(data, return_distance=False)\n",
    "    neighbours = [set(x) for x in neighbours]\n",
    "    for c, s in enumerate(neighbours):\n",
    "        # Remove self-counting \n",
    "        s.remove(c)\n",
    "    if numpy:\n",
    "        # Reconvert to numpy.array\n",
    "        neighbours = np.array([np.array([y for y in x]) for x in neighbours])\n",
    "        \n",
    "    return neighbours"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Profiling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:33:12.136279Z",
     "start_time": "2020-07-03T13:33:12.117584Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def profile_fxn(\n",
    "        f, ds, report_dir, *args,\n",
    "        t=True, l=True, label=None, validate=True, **kwargs):\n",
    "    \"\"\"Function profiling procedure\n",
    "    \n",
    "    Runs %lprun and %timeit line magic on a globally defined function\n",
    "    `fxn`.  Function args and kwargs need to be defined globally as\n",
    "    well.  This is necessary, because (at least lprun) line magic does\n",
    "    not seem to work well with local variables, e.g. the following\n",
    "    alternative did not work (raises `NameError`): \n",
    "    \n",
    "       def profile_fxn(fxn, *args, **kwargs):\n",
    "           %lprun -f fxn fxn(*args, **kwargs)\n",
    "           ...\n",
    "    \n",
    "    This function expects a :obj:`DS` object, providing a dataset and\n",
    "    pre-calculated values if necessary.  Report details are deduced from\n",
    "    this object.  Timings are saved to the object.\n",
    "    \n",
    "    Args:\n",
    "       f (:obj:`func`): Function to profile.\n",
    "       ds (:obj:`DS`): Data set object.\n",
    "       report_dir (str): Output directory file path.\n",
    "       *args: Arguments passed to `f`\n",
    "    \n",
    "    Keyword args:\n",
    "       t (bool): If `True`, time function call with timeit line magic\n",
    "       l (bool): If `True`, line profile function call with lpro line magic\n",
    "       label (optional, str): Label to identify the run.  If `None`,\n",
    "          `fxn.__name__` is used.\n",
    "       validate (bool): If True, execute function call once to evaluate\n",
    "          the result before the benchmark\n",
    "        \n",
    "       **kwargs: Keyword arguments passed to `f`.\n",
    "          \n",
    "    Returns:\n",
    "       None  \n",
    "    \"\"\"\n",
    "    \n",
    "    global fxn\n",
    "    global fxn_args\n",
    "    global fxn_kwargs\n",
    "    \n",
    "    fxn = f\n",
    "    fxn_args = args\n",
    "    fxn_kwargs = kwargs\n",
    "    \n",
    "    if validate:\n",
    "        # Validate function result (experimental)\n",
    "        result = fxn(*fxn_args, **fxn_kwargs)\n",
    "            \n",
    "        # Convert result if not labels array\n",
    "        if isinstance(result, list) and isinstance(result[0], np.ndarray):\n",
    "            # Convert result if from original implementation (baseline)\n",
    "            result = baseline_to_labels(result)\n",
    "    \n",
    "        if result is not None:\n",
    "            noise = 0\n",
    "            frequencies = Counter(result)\n",
    "            if 0 in frequencies:\n",
    "                noise = frequencies.pop(0)\n",
    "\n",
    "            largest = frequencies.most_common(1)[0][1] if frequencies else 0\n",
    "            clusters = len(frequencies)\n",
    "\n",
    "            print(f\"Length of labels:    {len(result)}\")\n",
    "            print(f\"Noise:               {noise}\")\n",
    "            print(f\"Largest:             {largest}\")\n",
    "            print(f\"Clusters:            {clusters}\")\n",
    "    \n",
    "    # Profile\n",
    "    if l:\n",
    "        %lprun -T {report_dir}/{fxn.__name__}_{ds.__str__()}.lprun -f fxn fxn(*fxn_args, **fxn_kwargs)\n",
    "    if t:\n",
    "        o = %timeit -q -o fxn(*fxn_args, **fxn_kwargs)\n",
    "\n",
    "        if label is None:\n",
    "            label = fxn.__name__\n",
    "        ds.timeits.update({label: o})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:33:12.296396Z",
     "start_time": "2020-07-03T13:33:12.291342Z"
    },
    "init_cell": true,
    "run_control": {
     "marked": true
    }
   },
   "outputs": [],
   "source": [
    "def baseline_to_labels(result):\n",
    "    \"\"\"Convert result from original implementation (baseline)\"\"\"\n",
    "\n",
    "    len_ = len(result)\n",
    "    result = [x for x in result if isinstance(x, np.ndarray)]\n",
    "    result_ = np.zeros(len_)\n",
    "    for c, cluster in enumerate(result, 1):\n",
    "        for member in cluster:\n",
    "            result_[member] = c\n",
    "    result = result_\n",
    "\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:39:21.452428Z",
     "start_time": "2020-07-03T13:39:21.438861Z"
    }
   },
   "outputs": [],
   "source": [
    "def sub_ds_args(args, ds):\n",
    "    args = list(args)\n",
    "    for i, arg in enumerate(args):\n",
    "        if not isinstance(arg, str):\n",
    "            continue\n",
    "        if arg.startswith(\"DS_ATTR:\"):\n",
    "            attr = arg.split(\":\")[-1]\n",
    "            args[i] = getattr(ds, attr)\n",
    "    return tuple(args)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:40:07.991931Z",
     "start_time": "2020-07-03T13:40:07.984922Z"
    }
   },
   "outputs": [],
   "source": [
    "def time_runs(\n",
    "        signatures, gen_fxn,\n",
    "        timings=None, samples=None, v=True):\n",
    "\n",
    "    if timings is None:\n",
    "        timings = {}\n",
    "        \n",
    "    if samples is None:\n",
    "        samples = []\n",
    "    \n",
    "    for n in samples:\n",
    "        ds = DS(gen_fxn, n)\n",
    "        for label, f, args, KWARGS in signatures:\n",
    "            args = sub_ds_args(args, ds)\n",
    "            kwargs = KWARGS.get(n, KWARGS.get(\"default\", {}))\n",
    "\n",
    "            profile_fxn(\n",
    "                f, ds, \"/dev/null\",\n",
    "                *args,           # function args\n",
    "                l=False,\n",
    "                label=label,\n",
    "                validate=False,  # function kwargs\n",
    "                **kwargs\n",
    "                )\n",
    "        \n",
    "        timings[n] = ds.timeits\n",
    "\n",
    "        if v:\n",
    "            print(\"-\" * 80)\n",
    "            print(n)\n",
    "            for x in ds.ratios():\n",
    "                print(f\"{x[0]:>15}: {x[1]:7.3f}\")\n",
    "                \n",
    "    return timings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Consitency check"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ensure that every implementation delivers a consistent cluster result for test data sets."
   ]
  },
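  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When label arrays from two implementations of the same algorithm are compared, cluster numbers and the noise label can differ (0 or -1, depending on the package). A minimal sketch of a check that compares two labelings up to a one-to-one renaming could look like this; the helper name `same_partition` and its parameters are hypothetical and not part of any of the benchmarked packages.\n",
    "\n",
    "```python\n",
    "def same_partition(labels_a, labels_b, noise_a=0, noise_b=-1):\n",
    "    # Compare two label sequences up to a one-to-one renaming of\n",
    "    # cluster numbers; noise in one result must be noise in the other.\n",
    "    if len(labels_a) != len(labels_b):\n",
    "        return False\n",
    "\n",
    "    a_to_b = {noise_a: noise_b}\n",
    "    b_to_a = {noise_b: noise_a}\n",
    "    for a, b in zip(labels_a, labels_b):\n",
    "        # setdefault records the first pairing seen for a label and\n",
    "        # returns the recorded partner on every later occurrence\n",
    "        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:\n",
    "            return False\n",
    "    return True\n",
    "\n",
    "\n",
    "print(same_partition([0, 1, 1, 2], [-1, 0, 0, 1]))\n",
    "print(same_partition([0, 1, 1, 2], [-1, 0, 1, 1]))\n",
    "```\n",
    "\n",
    "Such a strict check is only meaningful for identical algorithms and parameters. For DBSCAN, border points can be assigned to different clusters depending on processing order, so small deviations do not necessarily indicate an error."
   ]
  },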
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:33:14.920803Z",
     "start_time": "2020-07-03T13:33:14.742980Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ca0fae9247d141c2bb371bd46631793f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the test data sets\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "for count, gen_fxn in enumerate(generation_fxns):\n",
    "    # Plot\n",
    "    plot_data(gen_fxn(5000), ax=Ax[count])\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### scikit-learn DBSCAN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:18:16.640771Z",
     "start_time": "2020-07-01T14:18:15.915825Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f09abd68e90c4e46a8f4eed72bb8b18f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the test data sets\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # circles\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # moons\n",
    "    {\"eps\": 0.14, \"min_samples\": 20},  # varied\n",
    "    {\"eps\": 0.11, \"min_samples\": 20},  # aniso\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # blobs\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    labels = skcluster.dbscan(data, **params)[1]\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=labels, ax=Ax[count], noise=-1)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### scikit-learn-extra CommonNNClustering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:19:56.537041Z",
     "start_time": "2020-07-01T14:19:55.462852Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5c0b8fc6d8c34676aac33084aeb839b5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with scikit-learn-extra CommonNN and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # circles\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # moons\n",
    "    {\"eps\": 0.18, \"min_samples\": 20},  # varied\n",
    "    {\"eps\": 0.15, \"min_samples\": 10},  # aniso\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # blobs\n",
    "    {\"eps\": 0.2, \"min_samples\": 5},    # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    labels = skextracluster.commonnn(data, **params)\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=labels, ax=Ax[count], noise=-1)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cnnclustering CNN from points on-the-fly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:20:40.653039Z",
     "start_time": "2020-07-01T14:20:28.904360Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Execution time for call of fit: 0 hours, 0 minutes, 1.4432 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      2         0.500     0.001     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 1.5315 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      2         0.500     0.000     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 2.2066 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.180     20        8         None      3         0.337     0.135     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 1.5084 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.150     10        2         None      3         0.326     0.028     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 3.5528 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      3         0.333     0.000     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 1.3277 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      1         1.000     0.000     \n",
      "--------------------------------------------------------------------------------\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7a2ea8fd06cd423f98e1ca0b3456a2a6",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with cnnclustering (from points, on-the-fly) and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # circles\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # moons\n",
    "    {\"radius_cutoff\": 0.18, \"cnn_cutoff\": 20, \"member_cutoff\": 8},  # varied\n",
    "    {\"radius_cutoff\": 0.15, \"cnn_cutoff\": 10},                      # aniso\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # blobs\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    clustering = cnn.CNN(data)\n",
    "    clustering.fit(**params, rec=True, policy=\"conservative\")\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=clustering.labels, ax=Ax[count], noise=0)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cnnclustering CNN from points bulk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:21:09.593746Z",
     "start_time": "2020-07-01T14:21:06.836883Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Execution time for call of fit: 0 hours, 0 minutes, 0.1380 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      2         0.500     0.001     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 0.1805 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      2         0.500     0.000     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 0.4791 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.180     20        8         None      3         0.337     0.135     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 0.2958 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.150     10        2         None      3         0.326     0.028     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 1.1661 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      3         0.333     0.000     \n",
      "--------------------------------------------------------------------------------\n",
      "Execution time for call of fit: 0 hours, 0 minutes, 0.1053 seconds\n",
      "--------------------------------------------------------------------------------\n",
      "#points   R         C         min       max       #clusters %largest  %noise    \n",
      "5000      0.200     5         2         None      1         1.000     0.000     \n",
      "--------------------------------------------------------------------------------\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "26443472cc81459a9ca1f77a0a4e8c5b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with cnnclustering (from points, bulk) and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # circles\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # moons\n",
    "    {\"radius_cutoff\": 0.18, \"cnn_cutoff\": 20, \"member_cutoff\": 8},  # varied\n",
    "    {\"radius_cutoff\": 0.15, \"cnn_cutoff\": 10},                      # aniso\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # blobs\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    clustering = cnn.CNN(data)\n",
    "    clustering.fit(**params, rec=True, policy=\"progressive\")\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=clustering.labels, ax=Ax[count], noise=0)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cnnclustering CNN from distances on-the-fly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:21:29.190480Z",
     "start_time": "2020-07-01T14:21:22.259375Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2569b625cf40467d86b3f5f6903d94ae",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with cnnclustering (from distances, on-the-fly) and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # circles\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # moons\n",
    "    {\"radius_cutoff\": 0.18, \"cnn_cutoff\": 20, \"member_cutoff\": 8},  # varied\n",
    "    {\"radius_cutoff\": 0.15, \"cnn_cutoff\": 10},                      # aniso\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # blobs\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    clustering = cnn.CNN(data)\n",
    "    clustering.calc_dist()\n",
    "    clustering.fit(**params, rec=False, policy=\"conservative\")\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=clustering.labels, ax=Ax[count], noise=0)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cnnclustering CNN from distances bulk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:21:46.671467Z",
     "start_time": "2020-07-01T14:21:44.180198Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "737b1163f1344edf92102e225b1abf59",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with cnnclustering (from distances, bulk) and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # circles\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # moons\n",
    "    {\"radius_cutoff\": 0.18, \"cnn_cutoff\": 20, \"member_cutoff\": 8},  # varied\n",
    "    {\"radius_cutoff\": 0.15, \"cnn_cutoff\": 10},                      # aniso\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # blobs\n",
    "    {\"radius_cutoff\": 0.2, \"cnn_cutoff\": 5},                        # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    clustering = cnn.CNN(data)\n",
    "    clustering.calc_dist()\n",
    "    clustering.fit(**params, rec=False, policy=\"progressive\")\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=clustering.labels, ax=Ax[count], noise=0)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### scikit-learn OPTICS (DBSCAN)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T14:10:57.819794Z",
     "start_time": "2020-07-03T14:10:32.348434Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "408fcdce575d42ed9c5dcd0f82a58f29",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with scikit-learn OPTICS (DBSCAN cluster method) and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"max_eps\": 0.25, \"min_samples\": 5, \"cluster_method\": \"dbscan\"},    # circles\n",
    "    {\"max_eps\": 0.25, \"min_samples\": 5, \"cluster_method\": \"dbscan\"},    # moons\n",
    "    {\"max_eps\": 0.125, \"min_samples\": 20, \"cluster_method\": \"dbscan\"},   # varied\n",
    "    {\"max_eps\": 0.15, \"min_samples\": 20, \"cluster_method\": \"dbscan\"},   # aniso\n",
    "    {\"max_eps\": 1, \"min_samples\": 5, \"cluster_method\": \"dbscan\"},    # blobs\n",
    "    {\"max_eps\": 1, \"min_samples\": 5, \"cluster_method\": \"dbscan\"},    # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    clustering = skcluster.OPTICS(**params)\n",
    "    clustering.fit(data)\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=clustering.labels_, ax=Ax[count], noise=-1)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### scikit-learn OPTICS (XI)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T14:32:09.331107Z",
     "start_time": "2020-07-03T14:31:33.840578Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "74c7776f092b4c769fc9248c113c336c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cluster the test data sets with scikit-learn OPTICS (default xi cluster method) and plot the labels\n",
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots(2, 3)\n",
    "Ax = ax.flatten()\n",
    "\n",
    "generation_fxns = [\n",
    "    gen_circles, gen_moons, gen_varied,\n",
    "    gen_aniso, gen_blobs, gen_no_structure,\n",
    "    ]\n",
    "\n",
    "fit_params = [\n",
    "    {\"min_samples\": 5, \"xi\": 0.05, \"min_cluster_size\": 0.4},    # circles\n",
    "    {\"min_samples\": 5, \"xi\": 0.05, \"min_cluster_size\": 0.4},    # moons\n",
    "    {\"min_samples\": 10, \"xi\": 0.01, \"min_cluster_size\": 0.2},   # varied\n",
    "    {\"min_samples\": 20, \"xi\": 0.03, \"min_cluster_size\": 0.1},   # aniso\n",
    "    {\"min_samples\": 5, \"xi\": 0.05, \"min_cluster_size\": 0.3},    # blobs\n",
    "    {\"min_samples\": 20, \"xi\": 0.2, \"min_cluster_size\": 0.2},    # no structure\n",
    "    ]\n",
    "\n",
    "for count, (gen_fxn, params) in enumerate(zip(generation_fxns, fit_params)):\n",
    "    # Fit\n",
    "    data = gen_fxn(5000)\n",
    "    clustering = skcluster.OPTICS(**params)\n",
    "    clustering.fit(data)\n",
    "    \n",
    "    # Plot\n",
    "    plot_data(data, labels=clustering.labels_, ax=Ax[count], noise=-1)\n",
    "    Ax[count].set(**ax_props)\n",
    "    \n",
    "    try:\n",
    "        name = gen_fxn.label\n",
    "    except AttributeError:\n",
    "        name = gen_fxn.__name__\n",
    "\n",
    "    Ax[count].set_title(f'{name}', fontsize=10, pad=4)\n",
    "    \n",
    "fig.subplots_adjust(\n",
    "    left=0, right=1, bottom=0.05, top=0.9, wspace=0, hspace=0.2 \n",
    "    )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Timings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Blobs set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:33:19.625944Z",
     "start_time": "2020-07-03T13:33:19.618524Z"
    }
   },
   "outputs": [],
   "source": [
    "benchmark_signatures = [\n",
    "    (\"DBSCAN\", skcluster.dbscan, (\"DS_ATTR:points\", ), {\"default\": {\"eps\": 0.2, \"min_samples\": 5}}),\n",
    "    (\"CommonNN\", skextracluster.commonnn, (\"DS_ATTR:points\", ), {\"default\": {\"eps\": 0.2, \"min_samples\": 5}}),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:49:37.907435Z",
     "start_time": "2020-07-03T13:46:47.397743Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------------------------------------------\n",
      "20000\n",
      "         DBSCAN:   1.000\n",
      "       CommonNN:  16.444\n"
     ]
    }
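   ],
   "source": [
    "# Create the timings store once; later runs add to the existing results\n",
    "if \"TIMINGS\" not in globals():\n",
    "    TIMINGS = {}\n",
    "TIMINGS.update(time_runs(benchmark_signatures, gen_blobs, samples=[20000]))"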
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-03T13:49:38.253002Z",
     "start_time": "2020-07-03T13:49:38.220499Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "147f05c641b74751845a612e290541cf",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = plt.subplots()\n",
    "\n",
    "# Plot the average timing stored under each TIMINGS key, one line per method\n",
    "lines = []\n",
    "for label in [\"DBSCAN\", \"CommonNN\"]:\n",
    "    x, y = zip(*[\n",
    "        [k, v[label].average]\n",
    "        for k, v in TIMINGS.items()\n",
    "        if v.get(label, None) is not None\n",
    "    ])\n",
    "    lines.append(ax.plot(x, y, label=label))\n",
    "ax.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T16:04:22.174644Z",
     "start_time": "2020-07-01T16:04:22.164313Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[200, 0.001104492306285725],\n",
       " [2000, 0.0010950690424282844],\n",
       " [5000, 0.0010891374409994958]]"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fit variants"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Benchmarks for different approaches to the common-nearest-neighbours fit function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-27T14:17:17.363759Z",
     "start_time": "2020-05-27T14:17:17.347320Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# Benchmark results will be saved under:\n",
    "report_dir = \"reports/T460\"\n",
    "# report_dir = \"reports/qcw21\"\n",
    "# report_dir = \"reports/qcm07\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### From neighbours"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-12T15:33:50.824853Z",
     "start_time": "2020-05-12T15:33:50.812778Z"
    }
   },
   "source": [
    "Tests of fit functions that take pre-computed neighbourhoods as input. There are two ways to set up the benchmarks:\n",
    "\n",
    "  - Use the `prepare_neighbours` function to quickly generate the input data\n",
    "  - Use the `DS` benchmark class to organise different runs on essentially the same data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:51:56.363867Z",
     "start_time": "2020-07-01T14:51:56.356662Z"
    }
   },
   "outputs": [],
   "source": [
    "ds = DS(gen_circles, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### From list of sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-07-01T14:31:06.950572Z",
     "start_time": "2020-07-01T14:31:06.880307Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "circles_2000_n0.5\n"
     ]
    }
   ],
   "source": [
    "# Prepare neighbours as list of sets\n",
    "ds.prepare_neighbourhoods(0.5)\n",
    "print(ds)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Baseline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:36:25.945104Z",
     "start_time": "2020-05-14T08:36:21.879559Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_from_neighbours_baseline_circles_2000_n0.5.lprun'. \n",
      "256 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "# Baseline: profile the original implementation\n",
    "profile_fxn(\n",
    "    fits.fit_from_neighbours_baseline,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"baseline\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Stdlib index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T08:52:24.245627Z",
     "start_time": "2020-05-15T08:52:24.137235Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "119ea1be4fe1431290aa01daefc904aa",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_stdlib_from_neighbours_index(20, ds.neighbourhoods)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T08:51:43.228406Z",
     "start_time": "2020-05-15T08:51:38.940902Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_stdlib_from_neighbours_index_circles_2000_n0.5.lprun'. \n",
      "36.3 ms ± 88 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "# Implementation using only the standard library (index variant)\n",
    "profile_fxn(\n",
    "    fits.fit_stdlib_from_neighbours_index,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbourhoods,  # function args\n",
    "    label=\"std_index\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:36:54.661424Z",
     "start_time": "2020-05-14T08:36:54.525964Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "37342e9248a54c9790490b54608d2300",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_stdlib_from_neighbours_loop(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:37:57.659941Z",
     "start_time": "2020-05-14T08:37:52.727558Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_stdlib_from_neighbours_loop_circles_2000_n0.5.lprun'. \n",
      "40 ms ± 511 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "# Implementation using only standard library\n",
    "profile_fxn(\n",
    "    fits.fit_stdlib_from_neighbours_loop,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"std_loop\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:38:38.662331Z",
     "start_time": "2020-05-14T08:38:38.561090Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9e84d8619ff243778e46f17695456c71",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_stdlib_from_neighbours_loop_membercheck(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:38:50.879587Z",
     "start_time": "2020-05-14T08:38:45.879765Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_stdlib_from_neighbours_loop_membercheck_circles_2000_n0.5.lprun'. \n",
      "41.8 ms ± 3.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "# Implementation using only standard library\n",
    "profile_fxn(\n",
    "    fits.fit_stdlib_from_neighbours_loop_membercheck,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"std_loop_membercheck\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Stdlib cython"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T09:23:46.197722Z",
     "start_time": "2020-05-15T09:23:46.122099Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b15c65fe167740c9afdc269b14d9dfb5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(_cfits.fit_from_neighbours(20, ds.neighbourhoods)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T09:24:06.967203Z",
     "start_time": "2020-05-15T09:24:06.852622Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "max() arg is an empty sequence",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-31-14cc7f3c97a4>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m# Implementation using only standard library\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m profile_fxn(\n\u001b[0m\u001b[0;32m      3\u001b[0m     \u001b[0m_cfits\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfit_from_neighbours\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      4\u001b[0m     \u001b[0mds\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mreport_dir\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m     \u001b[1;36m20\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mds\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mneighbourhoods\u001b[0m\u001b[1;33m,\u001b[0m  \u001b[1;31m# function args\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m<ipython-input-22-1405b8a32e8f>\u001b[0m in \u001b[0;36mprofile_fxn\u001b[1;34m(f, ds, report_dir, t, l, m, label, *args, **kwargs)\u001b[0m\n\u001b[0;32m     68\u001b[0m     \u001b[1;31m# Profile\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     69\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0ml\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 70\u001b[1;33m         \u001b[0mget_ipython\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'lprun'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'-T {report_dir}/{fxn.__name__}_{ds.__str__()}.lprun -f fxn fxn(*fxn_args, **fxn_kwargs)'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     71\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0mt\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     72\u001b[0m         \u001b[0mo\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'timeit'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'-o fxn(*fxn_args, **fxn_kwargs)'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~/.local/share/virtualenvs/CNN-5gkgQAOT/lib/python3.8/site-packages/IPython/core/interactiveshell.py\u001b[0m in \u001b[0;36mrun_line_magic\u001b[1;34m(self, magic_name, line, _stack_depth)\u001b[0m\n\u001b[0;32m   2315\u001b[0m                 \u001b[0mkwargs\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'local_ns'\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0msys\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_getframe\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mstack_depth\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mf_locals\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   2316\u001b[0m             \u001b[1;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbuiltin_trap\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2317\u001b[1;33m                 \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mfn\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m*\u001b[0m\u001b[0margs\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   2318\u001b[0m             \u001b[1;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   2319\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m<decorator-gen-129>\u001b[0m in \u001b[0;36mlprun\u001b[1;34m(self, parameter_s)\u001b[0m\n",
      "\u001b[1;32m~/.local/share/virtualenvs/CNN-5gkgQAOT/lib/python3.8/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[1;34m(f, *a, **k)\u001b[0m\n\u001b[0;32m    185\u001b[0m     \u001b[1;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    186\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0marg\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 187\u001b[1;33m         \u001b[0mcall\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m*\u001b[0m\u001b[0ma\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mk\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mf\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m*\u001b[0m\u001b[0ma\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mk\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    188\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    189\u001b[0m         \u001b[1;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0marg\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~/.local/share/virtualenvs/CNN-5gkgQAOT/lib/python3.8/site-packages/line_profiler/line_profiler.py\u001b[0m in \u001b[0;36mlprun\u001b[1;34m(self, parameter_s)\u001b[0m\n\u001b[0;32m    374\u001b[0m         \u001b[1;31m# Trap text output.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    375\u001b[0m         \u001b[0mstdout_trap\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mStringIO\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 376\u001b[1;33m         \u001b[0mprofile\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mprint_stats\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mstdout_trap\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0moutput_unit\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0moutput_unit\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstripzeros\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m's'\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mopts\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    377\u001b[0m         \u001b[0moutput\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mstdout_trap\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mgetvalue\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    378\u001b[0m         \u001b[0moutput\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0moutput\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrstrip\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~/.local/share/virtualenvs/CNN-5gkgQAOT/lib/python3.8/site-packages/line_profiler/line_profiler.py\u001b[0m in \u001b[0;36mprint_stats\u001b[1;34m(self, stream, output_unit, stripzeros)\u001b[0m\n\u001b[0;32m    142\u001b[0m         \"\"\"\n\u001b[0;32m    143\u001b[0m         \u001b[0mlstats\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_stats\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 144\u001b[1;33m         \u001b[0mshow_text\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlstats\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtimings\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mlstats\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0munit\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0moutput_unit\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0moutput_unit\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstream\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mstream\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstripzeros\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mstripzeros\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    145\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    146\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0mrun\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcmd\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~/.local/share/virtualenvs/CNN-5gkgQAOT/lib/python3.8/site-packages/line_profiler/line_profiler.py\u001b[0m in \u001b[0;36mshow_text\u001b[1;34m(stats, unit, output_unit, stream, stripzeros)\u001b[0m\n\u001b[0;32m    263\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    264\u001b[0m     \u001b[1;32mfor\u001b[0m \u001b[1;33m(\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mlineno\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtimings\u001b[0m \u001b[1;32min\u001b[0m \u001b[0msorted\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mstats\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 265\u001b[1;33m         show_func(fn, lineno, name, stats[fn, lineno, name], unit,\n\u001b[0m\u001b[0;32m    266\u001b[0m             output_unit=output_unit, stream=stream, stripzeros=stripzeros)\n\u001b[0;32m    267\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~/.local/share/virtualenvs/CNN-5gkgQAOT/lib/python3.8/site-packages/line_profiler/line_profiler.py\u001b[0m in \u001b[0;36mshow_func\u001b[1;34m(filename, start_lineno, func_name, timings, unit, output_unit, stream, stripzeros)\u001b[0m\n\u001b[0;32m    227\u001b[0m         \u001b[0mstream\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mwrite\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"Continuing without the function's contents.\\n\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    228\u001b[0m         \u001b[1;31m# Fake empty lines so we can see the timings, if not the code.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 229\u001b[1;33m         \u001b[0mnlines\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mmax\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlinenos\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m-\u001b[0m \u001b[0mmin\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmin\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlinenos\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstart_lineno\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m+\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    230\u001b[0m         \u001b[0msublines\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;34m''\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mnlines\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    231\u001b[0m     \u001b[1;32mfor\u001b[0m \u001b[0mlineno\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnhits\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtime\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mtimings\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mValueError\u001b[0m: max() arg is an empty sequence"
     ]
    }
    ],
   "source": [
    "# Cythonised implementation using only standard library\n",
    "profile_fxn(\n",
    "    _cfits.fit_from_neighbours,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbourhoods,  # function args\n",
    "    label=\"std_cython\",\n",
    "    l=False  # skip %lprun: line_profiler cannot read compiled Cython source\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T09:24:16.712935Z",
     "start_time": "2020-05-15T09:24:14.570571Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "25.8 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit _cfits.fit_from_neighbours(20, ds.neighbourhoods)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-12T16:01:15.725401Z",
     "start_time": "2020-05-12T16:01:15.714863Z"
    }
   },
   "source": [
    "#### From numpy.array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:22:18.236029Z",
     "start_time": "2020-05-14T09:22:18.209195Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<module 'snippets.fits' from '/home/janjoswig/CNN/tests/benchmark/snippets/fits.py'>"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "importlib.reload(fits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:43:40.770080Z",
     "start_time": "2020-05-14T08:43:40.573675Z"
    }
   },
   "outputs": [],
   "source": [
    "# Switch to neighbourhoods as numpy.array of numpy.arrays\n",
    "ds.prepare_neighbours(0.5, numpy=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:25:57.501463Z",
     "start_time": "2020-05-14T09:25:57.055907Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1d986c8eff4b40288ec76bc65e64b913",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=baseline_to_labels(fits.fit_from_neighbours_baseline(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:24:46.767373Z",
     "start_time": "2020-05-14T09:24:41.204169Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_from_neighbours_baseline_circles_2000_n0.5.lprun'. \n",
      "368 ms ± 57.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
    ],
   "source": [
    "# Original (baseline) implementation\n",
    "profile_fxn(\n",
    "    fits.fit_from_neighbours_baseline,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"baseline\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:22:19.737235Z",
     "start_time": "2020-05-14T09:22:19.521795Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "77b1d640f4734322a1545d44a18aedf3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=fits.fit_numpy_mix(20, ds.neighbours), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:26:19.807908Z",
     "start_time": "2020-05-14T09:26:07.501844Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_numpy_mix_circles_2000_n0.5.lprun'. \n",
      "146 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
    ],
   "source": [
    "# Implementation using numpy\n",
    "profile_fxn(\n",
    "    fits.fit_numpy_mix,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"numpy_mix\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:43:41.819301Z",
     "start_time": "2020-05-14T08:43:41.643328Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d31e7529f6ea43179ee5343718778ead",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_numpy_from_neighbours_index(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:23:09.065595Z",
     "start_time": "2020-05-14T09:22:59.353230Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_numpy_from_neighbours_index_circles_2000_n0.5.lprun'. \n",
      "103 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "# Implementation using numpy\n",
    "profile_fxn(\n",
    "    fits.fit_numpy_from_neighbours_index,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"numpy_index\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:44:29.094125Z",
     "start_time": "2020-05-14T08:44:28.898125Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "060cd7d76eec415d8206fc8d8cd883c9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_numpy_from_neighbours_loop(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:44:53.210410Z",
     "start_time": "2020-05-14T08:44:43.962404Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_numpy_from_neighbours_loop_circles_2000_n0.5.lprun'. \n",
      "96.9 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "# Implementation using numpy\n",
    "profile_fxn(\n",
    "    fits.fit_numpy_from_neighbours_loop,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"numpy_loop\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T08:45:39.229926Z",
     "start_time": "2020-05-14T08:45:39.098221Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "646a4ca25ef1436a95b77c8503d1b2c1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_numpy_from_neighbours_filtermembers(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:07:18.215444Z",
     "start_time": "2020-05-14T09:07:14.430605Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_numpy_from_neighbours_filtermembers_circles_2000_n0.5.lprun'. \n",
      "43.8 ms ± 965 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "# Implementation using numpy\n",
    "profile_fxn(\n",
    "    fits.fit_numpy_from_neighbours_filtermembers,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"numpy_filter\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:20:12.936113Z",
     "start_time": "2020-05-14T09:20:12.913045Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<module 'snippets.fits' from '/home/janjoswig/CNN/tests/benchmark/snippets/fits.py'>"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "importlib.reload(fits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:06:58.864939Z",
     "start_time": "2020-05-14T09:06:58.744883Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dc91ed8c4c8b408eb8185bb08e4b8ca4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "fig, ax = plt.subplots()\n",
    "plot_data(ds.points, labels=np.asarray(fits.fit_numpy_from_neighbours_membercheck(20, ds.neighbours)), ax=ax, noise=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-14T09:07:52.463530Z",
     "start_time": "2020-05-14T09:07:48.498782Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Length of labels:    2000\n",
      "Noise:               0\n",
      "Largest:             1000\n",
      "Clusters:            2\n",
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_numpy_from_neighbours_membercheck_circles_2000_n0.5.lprun'. \n",
      "46.7 ms ± 4.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
    ],
   "source": [
    "# Implementation using numpy\n",
    "profile_fxn(\n",
    "    fits.fit_numpy_from_neighbours_membercheck,\n",
    "    ds, report_dir,\n",
    "    20, ds.neighbours,  # function args\n",
    "    label=\"numpy_membercheck\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-12T16:37:21.000832Z",
     "start_time": "2020-05-12T16:37:20.958042Z"
    }
   },
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "Buffer dtype mismatch, expected 'npy_intp' but got 'double'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-73-d8c72077d3b4>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m# Implementation using cythonised numpy\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m profile_fxn(\n\u001b[0m\u001b[0;32m      3\u001b[0m     \u001b[0mcfits\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcfit_from_neighbours\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      4\u001b[0m     \u001b[0mds\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mreport_dir\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m     \u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mds\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mneighbours\u001b[0m\u001b[1;33m,\u001b[0m  \u001b[1;31m# function args\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m<ipython-input-65-3b108bdce0ce>\u001b[0m in \u001b[0;36mprofile_fxn\u001b[1;34m(f, ds, report_dir, t, l, m, label, *args, **kwargs)\u001b[0m\n\u001b[0;32m     44\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     45\u001b[0m     \u001b[1;31m# Validate function result\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 46\u001b[1;33m     \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mfxn\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m*\u001b[0m\u001b[0mfxn_args\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mfxn_kwargs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     47\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mlist\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;32mand\u001b[0m \u001b[0misinstance\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     48\u001b[0m         \u001b[1;31m# Convert result if from original implementation\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~/CNN/tests/benchmark/snippets/cfits.pyx\u001b[0m in \u001b[0;36mcfits.cfit_from_neighbours\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;31mValueError\u001b[0m: Buffer dtype mismatch, expected 'npy_intp' but got 'double'"
     ]
    }
   ],
   "source": [
    "# Implementation using cythonised numpy\n",
    "profile_fxn(\n",
    "    cfits.cfit_from_neighbours,\n",
    "    ds, report_dir,\n",
    "    1, ds.neighbours,  # function args\n",
    "    label=\"cython_numpy_loop\",\n",
    "    l=False\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T09:26:31.530988Z",
     "start_time": "2020-05-15T09:26:31.521665Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        std_cnn:   1.000\n"
     ]
    }
   ],
   "source": [
    "for x in ds.ratios():\n",
    "    print(f\"{x[0]:>15}: {x[1]:7.3f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check in CNN class context"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:15:54.189879Z",
     "start_time": "2020-05-26T08:15:51.524120Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "*** Profile printout saved to text file 'reports/T460/fit_circles_2000_n0.5.lprun'. \n",
      "31.7 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "cobj = cnn.CNN(neighbourhoods=cnn.NeighbourhoodsList(ds.neighbourhoods, 0.5))\n",
    "profile_fxn(\n",
    "    cobj.fit,\n",
    "    ds, report_dir,\n",
    "    0.5, 20,  # function args\n",
    "    label=\"std_cnn\",\n",
    "    rec=False\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:16:11.896565Z",
     "start_time": "2020-05-26T08:16:11.880553Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Labels([1, 2, 2, ..., 1, 2, 2])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cobj.labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T08:46:58.901004Z",
     "start_time": "2020-05-15T08:46:58.895828Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<module 'cfits' from '/home/janjoswig/CNN/cfits.cpython-38-x86_64-linux-gnu.so'>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "importlib.reload(cfits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### From density graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### From SparsegraphArray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:30:14.241174Z",
     "start_time": "2020-05-26T08:30:14.231270Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<module 'core._cfits' from '/home/janjoswig/CNN/core/_cfits.cpython-38-x86_64-linux-gnu.so'>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "importlib.reload(cnn)\n",
    "importlib.reload(_cfits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:30:41.948476Z",
     "start_time": "2020-05-26T08:30:40.072104Z"
    }
   },
   "outputs": [],
   "source": [
    "Graph = cnn.SparsegraphArray(*_cfits.NeighbourhoodsList2SparsegraphArray(ds.neighbourhoods, 20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:30:43.299838Z",
     "start_time": "2020-05-26T08:30:43.296208Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2001"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Graph._indices.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:32:39.735687Z",
     "start_time": "2020-05-26T08:32:36.270983Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "417 µs ± 9.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit labels = cnn.Labels(_cfits.bfs_SparsegraphArray(Graph, Graph._indices))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-26T08:31:13.029070Z",
     "start_time": "2020-05-26T08:31:13.012730Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Labels([1, 2, 2, ..., 1, 2, 2])"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "labels"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Initialization Cell",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "164.98px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
