Metadata-Version: 2.1
Name: dejan
Version: 1.4
Summary: Machine learning utilities by DEJAN.
Home-page: https://github.com/dejanmarketing/dejan
Author: DEJAN
Author-email: enquiries@dejanmarketing.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: transformers
Requires-Dist: torch
Requires-Dist: click

## Dejan: SEO Machine Learning Utilities

Dejan is a growing collection of machine learning utilities for search engine optimization (SEO). The repository is updated regularly with new tools that help SEO professionals streamline their workflows.

### Installation

Install the package with pip (Python 3.10 or later is required):

```bash
pip install dejan
```

### Current Utilities

#### roo

**Purpose:** Fetches and processes data from the Algoroo API, providing insights into search engine fluctuations.

**Search Engine Options:**

* 2: Google.com (Desktop)
* 3: Google.com.au (Desktop)
* 4: Google.com (Mobile)
* 5: Google.com.au (Mobile)

**Output:** The data can be returned either as a raw JSON object or as a pandas DataFrame for further analysis.

**Example Usage:**

```python
from dejan import roo

def main():
    # Mapping of search engines to their corresponding identifiers
    search_engines = {
        2: "google.com/desktop",
        3: "google.com.au/desktop",
        4: "google.com/mobile",
        5: "google.com.au/mobile"
    }
    
    # Choose the search engine by identifier (any key of search_engines above)
    search_engine = 2
    
    # Fetch data as a pandas DataFrame
    roo_data = roo.get_roo(search_engine, as_dataframe=True)
    
    # Display the first few rows of the DataFrame
    print(f"Data for search engine {search_engine} ({search_engines[search_engine]}):")
    print(roo_data.head())

if __name__ == "__main__":
    main()
```
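
The example above indexes the mapping directly, so an unsupported identifier would surface as a bare `KeyError`. This README does not say whether `get_roo` validates its argument, so a small check before calling the API may be useful. The sketch below is illustrative only: `SEARCH_ENGINES` and `describe_search_engine` are hypothetical names and not part of the `dejan` package.

```python
# Hypothetical helper, not part of the dejan package.
# The identifiers are the four search engine options listed in this README.
SEARCH_ENGINES = {
    2: "google.com/desktop",
    3: "google.com.au/desktop",
    4: "google.com/mobile",
    5: "google.com.au/mobile",
}


def describe_search_engine(engine_id: int) -> str:
    """Return the label for a supported identifier, or raise ValueError."""
    try:
        return SEARCH_ENGINES[engine_id]
    except KeyError:
        valid = ", ".join(str(i) for i in sorted(SEARCH_ENGINES))
        raise ValueError(
            f"Unsupported search engine {engine_id!r}; expected one of: {valid}"
        ) from None


print(describe_search_engine(4))  # google.com/mobile
```

Calling `describe_search_engine(search_engine)` before `roo.get_roo(...)` turns a mistyped identifier into a readable error message.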

#### linkbert

**Purpose:** Uses the LinkBERT model to predict which tokens in a given text are likely to be links, which is useful for analyzing link placement within content.

**Grouping Modes:**

* `subtoken`: Returns individual subword tokens classified as links.
* `token`: Merges any subtokens into whole tokens (words).
* `phrase`: Groups predictions into phrases, treating the entire phrase as a link if any part of it is classified as a link.
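
To make the difference between the three modes concrete, here is a rough sketch of how such grouping could work. It is not LinkBERT's actual implementation. It assumes a WordPiece-style tokenizer in which continuation pieces start with `##`, that a word counts as a link if any of its pieces does, and that a phrase is a run of consecutive link words. `merge_subtokens`, `group_phrases`, and the sample predictions are hypothetical.

```python
from typing import List, Tuple

Piece = Tuple[str, bool]  # (text, classified_as_link)


def merge_subtokens(pieces: List[Piece]) -> List[Piece]:
    """Merge '##' continuation pieces into whole words.

    Assumption: a word is treated as a link if any of its pieces is.
    """
    words: List[Piece] = []
    for text, is_link in pieces:
        if text.startswith("##") and words:
            prev_text, prev_link = words[-1]
            words[-1] = (prev_text + text[2:], prev_link or is_link)
        else:
            words.append((text, is_link))
    return words


def group_phrases(words: List[Piece]) -> List[str]:
    """Join runs of consecutive link words into phrases."""
    phrases: List[str] = []
    current: List[str] = []
    for text, is_link in words:
        if is_link:
            current.append(text)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Made-up subtoken predictions, for illustration only
pieces = [("natural", True), ("link", True), ("place", False),
          ("##ment", True), ("within", False)]

words = merge_subtokens(pieces)
print([text for text, is_link in pieces if is_link])  # subtoken-level view
print([text for text, is_link in words if is_link])   # token-level view
print(group_phrases(words))                           # phrase-level view
```

In this toy input the piece `##ment` is flagged on its own at the subtoken level, becomes part of the word `placement` at the token level, and ends up inside the single phrase `natural link placement`.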

**Example Usage:**

```python
from dejan import linkbert

def main():
    # Initialize the LinkBERTInference model
    model = linkbert.LinkBERTInference()

    # Sample text for prediction
    text = "LinkBERT is a model developed by Dejan Marketing designed to predict natural link placement within web content."

    print("Input Text:")
    print(text)
    print("-" * 50)

    # Group by subtoken
    links_subtoken = model.predict_link_tokens(text, group="subtoken")
    print(f"Predicted link tokens (subtoken): {links_subtoken}")

    # Group by token
    links_token = model.predict_link_tokens(text, group="token")
    print(f"Predicted link tokens (token): {links_token}")

    # Group by phrase
    links_phrase = model.predict_link_tokens(text, group="phrase")
    print(f"Predicted link tokens (phrase): {links_phrase}")

if __name__ == "__main__":
    main()
```

#### turboquant

**Purpose:** Compresses dense vector embeddings by 3.5x to 15.5x using hybrid residual quantization while preserving roughly 82% to 99.8% retrieval recall, depending on the preset. Based on Google's TurboQuant (arXiv:2504.19874).

**Presets:**

| Preset | Method | Recall | Compression |
|--------|--------|--------|-------------|
| quality | N4+N4+TQ1 | ~0.998 | 3.5x |
| balanced | N4+TQ1 | ~0.963 | 6.4x |
| compact | TQ1+TQ1 | ~0.820 | 15.5x |
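
As a back-of-the-envelope illustration of what these ratios mean for storage, the sketch below estimates the compressed size of a float32 corpus. The corpus size (one million 768-dimensional vectors) and the helper `compressed_size_mb` are hypothetical, and the estimate ignores any file header or codebook overhead.

```python
def compressed_size_mb(n_vectors: int, dim: int, ratio: float) -> float:
    """Approximate size in MB of float32 embeddings compressed by `ratio`."""
    raw_bytes = n_vectors * dim * 4  # float32 uses 4 bytes per value
    return raw_bytes / ratio / 1e6


# Compression ratios copied from the presets table
PRESET_RATIOS = {"quality": 3.5, "balanced": 6.4, "compact": 15.5}

n_vectors, dim = 1_000_000, 768  # hypothetical corpus
raw_mb = compressed_size_mb(n_vectors, dim, 1.0)
print(f"raw fp32: {raw_mb:,.0f} MB")
for preset, ratio in PRESET_RATIOS.items():
    size_mb = compressed_size_mb(n_vectors, dim, ratio)
    print(f"{preset:>8}: {size_mb:,.0f} MB")
```

For this hypothetical corpus, the raw embeddings take about 3,072 MB, and the `balanced` preset would bring that down to roughly 480 MB.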

**Example Usage:**

```python
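import numpy as np
from dejan import turboquant

# Placeholder data (shapes are arbitrary); substitute your own float32 arrays
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 384)).astype(np.float32)
queries = rng.standard_normal((5, 384)).astype(np.float32)

# Compress
tq = turboquant.TurboQuant(preset="balanced")
compressed = tq.compress(embeddings)  # (n, d) float32 numpy array
tq.save(compressed, "corpus.tq")

# Load and search
tq, compressed = turboquant.TurboQuant.load("corpus.tq")
indices, scores = tq.search(queries, compressed, k=10)

# Recover recall by rescoring candidates (requires original fp32 embeddings)
indices, scores = tq.search(queries, compressed, k=10,
                            rescore_from=embeddings, rescore_k=20)
```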
```
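
The rescoring step can be pictured as follows: the compressed index proposes a shortlist of candidates, and the original vectors are then used to re-rank that shortlist exactly. The sketch below illustrates the general idea in plain Python and is not the library's implementation. It assumes inner-product similarity, and `rescore`, `recall_at_k`, and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rescore(query: Sequence[float], candidate_ids: List[int],
            originals: List[Sequence[float]], k: int) -> List[int]:
    """Re-rank candidate ids by exact inner product with the original vectors."""
    scored = [(dot(query, originals[i]), i) for i in candidate_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def recall_at_k(retrieved: List[int], exact: List[int]) -> float:
    """Fraction of the exact top-k ids that also appear in `retrieved`."""
    return len(set(retrieved) & set(exact)) / len(exact)


# Toy data: four original vectors and one query
originals = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]
exact_top2 = [0, 1]  # ids with the two largest exact inner products

# Pretend the compressed index returned this shortlist, slightly misordered
approx_candidates = [1, 3, 0]

before = recall_at_k(approx_candidates[:2], exact_top2)
top2 = rescore(query, approx_candidates, originals, k=2)
after = recall_at_k(top2, exact_top2)
print(before, after)
```

Rescoring can only restore neighbours that are already in the shortlist, which is why the shortlist (`rescore_k=20` in the example above) is larger than the number of results requested (`k=10`).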

**CLI Usage:**

```bash
dejan turboquant embeddings.npy
dejan turboquant embeddings.csv --preset quality
dejan turboquant embeddings.npy -o corpus.tq --preset compact
```

