Metadata-Version: 2.0
Name: sketchtml
Version: 0.0.2
Summary: Helper library to experiment with HTML fingerprinting.
Home-page: https://github.com/redapple/sketchtml
Author: Paul Tremberth
Author-email: paul.tremberth@gmail.com
License: MIT license
Keywords: sketchtml
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: lxml

=========
SketcHTML
=========


.. image:: https://img.shields.io/pypi/v/sketchtml.svg
        :target: https://pypi.python.org/pypi/sketchtml

.. image:: https://img.shields.io/travis/redapple/sketchtml.svg
        :target: https://travis-ci.org/redapple/sketchtml

.. image:: https://readthedocs.org/projects/sketchtml/badge/?version=latest
        :target: https://sketchtml.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/redapple/sketchtml/shield.svg
     :target: https://pyup.io/repos/github/redapple/sketchtml/
     :alt: Updates


Helper library to experiment with HTML fingerprinting.


* Free software: MIT license
* Documentation: https://sketchtml.readthedocs.io.


Features
--------

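* Tag sequence iterator built on ``lxml.etree.iterparse``
* HTML tag fingerprint based on LZW, after Hachenberg & Gottron
* MailHash implementation
* Stripped XPath lists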

References
----------

* `Locality Sensitive Hashing for Scalable Structural Classification and Clustering of Web Documents (2013)
  <https://www.researchgate.net/publication/256004161_Locality_Sensitive_Hashing_for_Scalable_Structural_Classification_and_Clustering_of_Web_Documents>`__
* `Enforcing k-anonymity in Web Mail Auditing -- Mail-Hash (2016) <http://dl.acm.org/citation.cfm?id=2835803>`__
  (`patent <http://www.freepatentsonline.com/y2017/0169251.html>`__)
* `Structural Clustering of Machine-Generated Mail (2016) <http://dl.acm.org/citation.cfm?id=2983350>`__
* `Web-Scale Information Extraction with Vertex (2011) <http://dl.acm.org/citation.cfm?id=2005642>`__

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage



=======
History
=======

0.0.2 (2017-06-23)
------------------

* Add ``lxml.etree.iterparse``-based tag sequence iterator
* Add implementation for `MailHash`_
* Add implementation for `Stripped-XPath-lists`_

.. _MailHash: http://dl.acm.org/citation.cfm?id=2835803
.. _Stripped-XPath-lists: http://dl.acm.org/citation.cfm?id=2983350
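
The 0.0.2 iterator walks documents with ``lxml.etree.iterparse``. To show roughly what a tag sequence and a stripped XPath list look like without any dependency, the sketch below swaps in the standard library's ``html.parser`` for lxml. ``TagPathCollector`` and ``tag_paths`` are hypothetical names, not sketchtml's API. "Stripped" is assumed here to mean that positional predicates such as ``[2]`` are dropped from each path.

.. code-block:: python

    from html.parser import HTMLParser

    # HTML void elements never get an end tag, so they must not stay on the path stack.
    VOID_ELEMENTS = {
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr",
    }


    class TagPathCollector(HTMLParser):
        """Record start tags in document order plus an index-free XPath for each.

        Hypothetical helper; assumes "stripped" means no [n] predicates.
        """

        def __init__(self):
            super().__init__()
            self.stack = []   # currently open elements, root first
            self.tags = []    # start tags in document order (lowercased by HTMLParser)
            self.xpaths = []  # "/html/body/p"-style paths without positional predicates

        def handle_starttag(self, tag, attrs):
            self.tags.append(tag)
            self.xpaths.append("/" + "/".join(self.stack + [tag]))
            if tag not in VOID_ELEMENTS:
                self.stack.append(tag)

        def handle_endtag(self, tag):
            # Pop back to the matching open element; ignore stray end tags.
            if tag in self.stack:
                while self.stack:
                    if self.stack.pop() == tag:
                        break


    def tag_paths(html):
        """Return (tag sequence, stripped XPath list) for an HTML string."""
        collector = TagPathCollector()
        collector.feed(html)
        collector.close()
        return collector.tags, collector.xpaths

On ``<html><body><div><p>Hi</p><p>there<br></p></div></body></html>`` this yields the tag sequence ``html, body, div, p, p, br``, and both paragraphs collapse to the same stripped path ``/html/body/div/p``.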


0.0.1 (2017-06-19)
------------------

* Add Hachenberg & Gottron HTML tag fingerprint based on LZW
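
A minimal sketch of an LZW-based structural fingerprint follows. It rests on an assumption: the phrases that an LZW pass adds to its dictionary while compressing the tag sequence serve as the document's features. This is one reading of Hachenberg & Gottron, not a transcription of their method or of sketchtml's code. MinHash is swapped in as the locality-sensitive hashing step, and the paper's exact scheme may differ. All function names are hypothetical (Python 3 only).

.. code-block:: python

    import hashlib


    def lzw_phrases(tags):
        """Return the phrases (tuples of tags) an LZW pass adds to its dictionary.

        Assumption: these new dictionary entries act as structural features.
        """
        dictionary = {(tag,) for tag in tags}  # alphabet: every single tag seen
        phrases = []
        current = ()
        for tag in tags:
            candidate = current + (tag,)
            if candidate in dictionary:
                current = candidate  # keep extending the longest known phrase
            else:
                dictionary.add(candidate)  # LZW learns a new phrase here
                phrases.append(candidate)
                current = (tag,)
        return phrases


    def _stable_hash(seed, phrase):
        # hashlib instead of hash(): str hashing is randomized per process.
        text = "{}:{}".format(seed, "/".join(phrase))
        digest = hashlib.sha1(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")


    def minhash_signature(phrases, num_hashes=64):
        """MinHash signature of the phrase set (swapped-in LSH scheme)."""
        unique = set(phrases)
        if not unique:
            return [0] * num_hashes
        return [min(_stable_hash(seed, p) for p in unique) for seed in range(num_hashes)]


    def estimated_jaccard(sig_a, sig_b):
        """Fraction of agreeing MinHash slots, an estimate of Jaccard similarity."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

For the tag sequence ``html, body, div, p, p, p``, ``lzw_phrases`` returns ``(html, body)``, ``(body, div)``, ``(div, p)`` and ``(p, p)``. The repeated ``<p>`` run is absorbed into a single phrase, so pages that differ only in how many items a list holds should end up with largely overlapping feature sets.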


