Metadata-Version: 2.4
Name: easymdm
Version: 0.1.2
Summary: easymdm is an open source MDM system, useful for user data consolidation.
Author-email: ankit K <ankit48365@gmail.com>
License-File: LICENSE
Requires-Python: >=3.13
Requires-Dist: duckdb>=1.3.2
Requires-Dist: fuzzywuzzy>=0.18.0
Requires-Dist: networkx>=3.5
Requires-Dist: pandas>=2.3.1
Requires-Dist: pyaml>=25.7.0
Requires-Dist: python-levenshtein>=0.27.1
Requires-Dist: recordlinkage>=0.16
Description-Content-Type: text/markdown

![pylint](https://img.shields.io/badge/pylint-5.61-red)
[![PyPi Deployment](https://github.com/ankit48365/easymdm/actions/workflows/deployy_cicd.yml/badge.svg)](https://github.com/ankit48365/easymdm/actions/workflows/deployy_cicd.yml)

#### Prerequisite
Define a YAML configuration file like the ones below, then pass its path to the CLI with `--config`, as shown under CLI Run.

#### YAML Construct Readme

```
priority_rule:
  conditions:
    - column: priority_score
      value: 5  # Selects records with exactly priority_score = 5
    - column: confidence_level
      value: 100  # If multiple records have priority_score=5, picks those with confidence_level=100

survivorship:
  rules:
    - column: last_updated
      strategy: most_recent  
    - column: source_id
      strategy: source_priority 
      source_order: ["erp", "crm"]
    - column: address
      strategy: longest_string 
    - column: priority_score
      strategy: highest_value
    - column: confidence_level
      strategy: lowest_value
    - column: quality_rating
      strategy: greater_than_threshold
      threshold: 75
    - column: quality_rating
      strategy: less_than_threshold
```
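
As a rough illustration of how rules like these could be evaluated, here is a minimal sketch in plain Python. It is not easymdm's actual implementation: `apply_priority_rule`, `pick_survivor`, and the sample records are hypothetical names and data introduced only for this example. It assumes that priority conditions narrow the candidate set one after another (a condition that matches nothing is skipped), and that each survivorship strategy selects one record from a group of matched records. The two threshold strategies are left out because their exact semantics are not described above.

```python
from datetime import date

# Hypothetical group of records already matched as the same entity.
records = [
    {"source_id": "crm", "last_updated": date(2024, 5, 1),
     "address": "12 Main St", "priority_score": 5, "confidence_level": 90},
    {"source_id": "erp", "last_updated": date(2023, 1, 15),
     "address": "12 Main Street, Apt 4", "priority_score": 5, "confidence_level": 100},
    {"source_id": "web", "last_updated": date(2022, 7, 9),
     "address": "12 Main", "priority_score": 3, "confidence_level": 100},
]


def apply_priority_rule(records, conditions):
    """Narrow the candidates condition by condition.

    Assumption: a condition that matches no candidate is skipped
    rather than emptying the candidate set.
    """
    candidates = list(records)
    for cond in conditions:
        matched = [r for r in candidates if r.get(cond["column"]) == cond["value"]]
        if matched:
            candidates = matched
    return candidates


def pick_survivor(records, rule):
    """Pick one record from the group according to a single survivorship rule."""
    column, strategy = rule["column"], rule["strategy"]
    if strategy in ("most_recent", "highest_value"):
        return max(records, key=lambda r: r[column])
    if strategy == "lowest_value":
        return min(records, key=lambda r: r[column])
    if strategy == "longest_string":
        return max(records, key=lambda r: len(str(r[column])))
    if strategy == "source_priority":
        # Sources missing from source_order rank after all listed sources.
        rank = {name: i for i, name in enumerate(rule["source_order"])}
        return min(records, key=lambda r: rank.get(r[column], len(rank)))
    raise ValueError(f"unsupported strategy: {strategy}")


golden = apply_priority_rule(records, [
    {"column": "priority_score", "value": 5},
    {"column": "confidence_level", "value": 100},
])
newest = pick_survivor(records, {"column": "last_updated", "strategy": "most_recent"})
```

With the sample data, the priority rule narrows three records down to the single `erp` record, while `most_recent` on `last_updated` would pick the `crm` record instead.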

#### Sample YAML
```
sqlite:
  - DB_PATH: 'D:\path\to\database\'
    DB_NAME: 'mydatabase.db'

blocking:
  columns:
    - first_name
    - last_name
similarity:
  - column: first_name
    method: jarowinkler
  - column: middle_name
    method: jarowinkler
  - column: last_name
    method: jarowinkler
  - column: address
    method: levenshtein
  - column: city
    method: jarowinkler
  - column: zip_code
    method: exact

thresholds:
  review: 0.6
  auto_merge: 0.8

survivorship:
  rules:
    - column: Last_Updated_On
      strategy: most_recent # longest_string

priority_rule:
  conditions:
    - column: original
      value: 1
    - column: Address
      value: '*STREET*'  # quoted: an unquoted leading * is read as a YAML alias

```
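
The `thresholds` block can be read as splitting candidate pairs into three buckets. The sketch below shows one way this might work, under stated assumptions: `similarity`, `pair_score`, and `classify` are hypothetical helpers, the per-column scores are combined with an unweighted mean, and both thresholds are treated as inclusive lower bounds. To keep the example dependency-free, `difflib.SequenceMatcher` stands in for the `jarowinkler` and `levenshtein` methods named in the config; it is a different metric and will not reproduce their scores.

```python
from difflib import SequenceMatcher

# Mirrors the `thresholds` section of the sample YAML.
THRESHOLDS = {"review": 0.6, "auto_merge": 0.8}


def similarity(a, b):
    """Stand-in string similarity in [0, 1] (not Jaro-Winkler or Levenshtein)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_score(rec_a, rec_b, columns):
    """Unweighted mean of the per-column similarities (an assumption)."""
    return sum(similarity(rec_a[c], rec_b[c]) for c in columns) / len(columns)


def classify(score, thresholds=THRESHOLDS):
    """Map a pair score to a bucket; boundaries are assumed to be inclusive."""
    if score >= thresholds["auto_merge"]:
        return "auto_merge"
    if score >= thresholds["review"]:
        return "review"
    return "no_match"


# Hypothetical candidate pair.
a = {"first_name": "Jon", "last_name": "Smith"}
b = {"first_name": "John", "last_name": "Smith"}
decision = classify(pair_score(a, b, ["first_name", "last_name"]))
```

Pairs in the `review` bucket would be the ones left for a human to confirm, while `auto_merge` pairs are merged without review.
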
### CLI Run

```
uv run roar --help

For flat file
> uv run roar --source file --name D:\path\to_your_file\123.csv --config D:\path\to_your_config
```

### Local Test Run

```
uv run .\src\easymdm\cli.py --source file --name .\sample\testdata.csv --config .\sample\testdata.yaml --outpath .\sample\
uv run .\src\easymdm\cli.py --source duckdb --name .\sample\testdata.csv --config .\sample\testdata.yaml --outpath .\sample\
```

### Pylint Action

```
    if: contains(github.event.head_commit.message, 'CheckCodeQuality')
```
