Metadata-Version: 2.1
Name: splink
Version: 0.1.4
Summary: WORK IN PROGRESS:  Implementation in Apache Spark of the EM algorithm to estimate parameters of Fellegi-Sunter's canonical model of record linkage.
Home-page: https://github.com/moj-analytical-services/splink
License: MIT
Author: Robin Linacre
Author-email: robinlinacre@hotmail.com
Requires-Python: >=3.6,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: jsonschema (>=3.2,<4.0)
Project-URL: Repository, https://github.com/moj-analytical-services/splink
Description-Content-Type: text/markdown

[![Coverage Status](https://coveralls.io/repos/github/moj-analytical-services/splink/badge.svg?branch=dev)](https://coveralls.io/github/moj-analytical-services/splink?branch=dev)
![issues-status](https://img.shields.io/github/issues-raw/moj-analytical-services/splink)
![python-version-dependency](https://img.shields.io/badge/python-%3E%3D3.6-blue)


# splink: Probabalistic record linkage at scale

WARNING:  Splink is work in progress and is currently in beta testing.  Please feel free to try it, but note this software is not fully tested, and the interface is likely to continue to change.

`splink` implements Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters of the model.

The aim of `splink` is to:

- Work at much greater scale than current open source implementations (100 million records +).

- Get results faster than current open source implementations - with runtimes of less than an hour.

- Have a highly transparent methodology, so the match scores can be easily explained both graphically and in words

- Have accuracy similar to some of the best alternatives

## Interactive demo

You can run demos of `splink` in an interactive Jupyter notebook by clicking the button below:

[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/moj-analytical-services/splink_demos/master?urlpath=lab/tree/index.ipynb)

## Documentation

Better docs to come. The best documentation is currently a series of demonstrations notebooks in the [splink_demos](https://github.com/moj-analytical-services/splink_demos) repo.

## Acknowledgements

We are grateful to [ADR UK](https://www.adruk.org/) (Administrative Data Research UK) for providing funding for this work as part of the [Data First](https://www.adruk.org/our-work/browse-all-projects/data-first-harnessing-the-potential-of-linked-administrative-data-for-the-justice-system-169/) project.
