Metadata-Version: 2.1
Name: smart-match
Version: 0.1.0
Summary: A smart match package
Home-page: https://github.com/jiayingwang/smart_match
Author: Jiaying Wang
Author-email: jiaying@sjzu.edu.cn
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# Introduction

The smart-match module contains functions for calculating the similarity between strings or sets.

## Concept

1. __similarity__:
A value in the range [0, 1] that represents how similar two strings are.
The larger the value, the more similar the two strings are.

2. __dissimilarity__:
A value in the range [0, 1] that represents how dissimilar two strings are.
The larger the value, the more dissimilar the two strings are.
For any pair of strings, similarity = 1 - dissimilarity.

3. __distance__:
How far apart the two strings are. Note that not all methods support distance.

4. __score__:
The larger the score, the more similar the two strings are. Note that not all methods support score.
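
To make the relationship between distance, dissimilarity, and similarity concrete, here is a minimal sketch for Levenshtein matching. It is not the smart-match implementation: the function names are hypothetical, and normalizing the distance by the length of the longer string is an assumption, chosen because it agrees with the values shown under Usage.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_dissimilarity(a: str, b: str) -> float:
    """Distance scaled into [0, 1] (assumed normalization: longer string's length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return levenshtein_distance(a, b) / longest


def levenshtein_similarity(a: str, b: str) -> float:
    """similarity = 1 - dissimilarity."""
    return 1.0 - levenshtein_dissimilarity(a, b)


print(levenshtein_distance('hello', 'hero'))
print(levenshtein_dissimilarity('hello', 'hero'))
print(levenshtein_similarity('hello', 'hero'))
```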

We support three levels of string matching.

1. __char__:
Similarity computation based on characters in the strings.

2. __term__:
Similarity computation based on terms in the strings.

3. __gram__:
Similarity computation based on q-grams (substrings of q consecutive characters) in the strings.
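
The three levels differ only in how a string is broken into units before comparison. The sketch below illustrates that idea in plain Python. The helper names are hypothetical, and splitting terms on whitespace is an assumption; smart-match may tokenize differently.

```python
from typing import List


def char_tokens(text: str) -> List[str]:
    """char level: every character is one unit."""
    return list(text)


def term_tokens(text: str) -> List[str]:
    """term level: units are whitespace-separated terms (assumed tokenization)."""
    return text.split()


def gram_tokens(text: str, q: int = 2) -> List[str]:
    """gram level: units are overlapping substrings of q consecutive characters."""
    if q < 1:
        raise ValueError("q must be at least 1")
    # A string shorter than q yields no q-grams.
    return [text[i:i + q] for i in range(len(text) - q + 1)]


print(char_tokens('hero'))
print(term_tokens('a smart match package'))
print(gram_tokens('hello', q=2))
```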


## Methods

We support the following methods.

Abbreviation | Full name | similarity | dissimilarity | distance | score
-------------|-----------|------------|---------------|----------|------
LE(Default) | Levenshtein |     ✅   |    ✅        |  ✅  | ❌
ED  | EuclideanDistance   |     ✅   |    ✅        |  ✅  | ❌
DL  | Damerau Levenshtein |     ✅   |    ✅        |  ✅  | ❌
BD  |    Block Distance   |     ✅   |    ✅        |  ✅  | ❌
cos  | Cosine Similarity |     ✅   |    ✅        |  ❌ | ❌
TC | TanimotoCoefficient | ✅ | ✅ | ❌ | ❌
dice | Dice Similarity |     ✅   |    ✅        |  ❌ | ❌
simon | SimonWhite | ✅ | ✅ | ❌ | ❌
LCST | LongestCommonSubstring | ✅ | ✅ | ✅ | ✅
LCSQ | LongestCommonSubSequence | ✅ | ✅ | ✅ | ✅
OC | OverlapCoefficient | ✅ | ✅ | ❌ | ❌
GOC | GeneralizedOverlapCoefficient | ✅ | ✅ | ❌ | ❌
jac  | Jaccard     |  ✅ | ✅ | ❌ | ❌
gjac | GeneralizedJaccard | ✅ | ✅ | ❌ | ❌
HD | HammingDistance | ✅ | ✅ | ✅ | ❌
jaro | Jaro | ✅ | ✅ | ❌ | ❌
JW | JaroWinkler | ✅ | ✅ | ❌ | ❌
NW | NeedlemanWunch | ✅ | ✅ | ❌ | ✅
SW | SmithWaterman | ✅ | ✅ | ❌ | ✅
SWG | SmithWatermanGotoh | ✅ | ✅ | ❌ | ✅
MK   | MongeElkan  |  ✅ | ✅ | ❌ | ❌
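
Several of the methods above (for example jac, dice, and OC) compare sets of units rather than character sequences. The following sketch uses the standard textbook definitions of these coefficients. The function names are hypothetical and the handling of empty sets is an assumption; neither is taken from the smart-match source.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|"""
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


def dice(a: Set[str], b: Set[str]) -> float:
    """2 * |A ∩ B| / (|A| + |B|)"""
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / min(|A|, |B|)"""
    if not a or not b:
        # assumption: identical only if both sets are empty
        return 1.0 if not a and not b else 0.0
    return len(a & b) / min(len(a), len(b))


# Bigram sets of 'hello' and 'hero'; they share only 'he'.
hello = {'he', 'el', 'll', 'lo'}
hero = {'he', 'er', 'ro'}

print(jaccard(hello, hero))  # 1 shared bigram out of 6 distinct bigrams
print(dice(hello, hero))     # 2 * 1 shared bigram over 4 + 3 bigrams
print(overlap(hello, hero))  # 1 shared bigram over the smaller set of 3
```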


# Installation

```shell
pip install smart-match
```

# Usage

```python
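import smart_match

# Levenshtein (LE) is the default method.
print(smart_match.similarity('hello', 'hero'))     # similarity in [0, 1]
print(smart_match.dissimilarity('hello', 'hero'))  # 1 - similarity
print(smart_match.distance('hello', 'hero'))       # how far apart the strings are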
```
Output:
```shell
0.6
0.4
2
```

See the [Wiki](https://github.com/jiayingwang/smart-match/wiki) for more details.

# License

smart-match is free software. See the LICENSE file for the full text.

# Authors

![qrcode_for_wechat_official_account](https://wx3.sinaimg.cn/mw1024/bdb7558bly1gjo23b3jrmj207607674r.jpg)



