Metadata-Version: 2.4
Name: ADFMentor
Version: 0.3.1
Summary: Parse Azure Data Factory project files and evaluate them using AI models.
Author-email: Qobiljon Xayrullayev <qobiljonkhayrullayev@gmail.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-genai>=1.0.0
Requires-Dist: python-dotenv>=1.0.0
Dynamic: license-file

ADFMentor
=========

A Python package that parses Azure Data Factory (ADF) project files and evaluates them with AI models such as Google Gemini.

## Features

- 🏗️ **ADF Processing**: Parse ADF pipeline, dataset, linked service, trigger, and dataflow JSON files
- **Pipeline**: Parse ADF pipeline JSON files (activities, dependencies, parameters)
- **Written**: Evaluate text-based answers about ADF concepts
- 📝 **Detailed Reports**: Generate comprehensive grading reports from ADF project structures
- 📦 **ZIP Support**: Automatically handles ZIP file submissions — no manual extraction needed
- 🗂️ **Flexible Inputs**: Accepts a directory, a ZIP, or a single file (`.json`, `.txt`)
- 🔍 **Auto File Discovery**: Locates ADF resource folders (`pipeline/`, `dataset/`, `linkedService/`, etc.)
- ⚠️ **Graceful Missing-File Scoring**: Missing required files yield a `0` score and clear feedback
- 🧩 **Lesson Question Parser**: Parse structured lesson text into `pipeline` and `text` question blocks
- 🔧 **Easy Integration**: Simple API for evaluating student assignments and projects

## Installation

### From Source

```bash
git clone https://github.com/yourusername/ADFMentor.git
cd ADFMentor
pip install -e .
```

### Using pip (when published)

```bash
pip install ADFMentor
```

## Quick Start

```python
from ADFmentor import ADFMentor

# Initialize with your Gemini API key
mentor = ADFMentor(api_key="your-api-key")

# Evaluate a full submission (pipeline + written)
# Works with directories, ZIP files, or single files (.json/.txt)
# Both dicts use the "pipeline" and "text" keys that evaluate_all() expects
questions = {
    "pipeline": "Create an ADF pipeline to copy data from Blob Storage to SQL Database",
    "text": "Explain your pipeline design choices",
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, and best practices",
    "text": "Evaluate clarity and reasoning",
}

result = mentor.evaluate_all(
    answer_path="path/to/submission/",  # or "submission.zip"
    questions=questions,
    prompts=prompts,
)

print(f"Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")
```

## Package Structure

```
ADFMentor/
├── __init__.py           # Main package entry point
├── core.py               # ADFMentor class with evaluation methods
├── models/               # AI model wrappers
│   ├── __init__.py
│   ├── model.py          # Abstract base model
│   └── gemini.py         # Google Gemini implementation
└── utils/                # Utility functions
    ├── __init__.py
    ├── processor.py      # ADF JSON parsing and report generation
    ├── checker.py        # File discovery helpers
    ├── extractor.py      # ZIP extraction utilities
    └── question_parser.py # Lesson question parser helpers
```

## Core Components

### ADFMentor Class

The main class provides a single evaluation method:

- **`evaluate_all(answer_path, questions, prompts)`**: Evaluates the pipeline structure and the written answers together and returns a dict with an overall `score` and combined `feedback`

Notes:
- `answer_path` can be a directory, ZIP file, or a single submission file (`.json`, `.txt`).
- `questions` and `prompts` must include `pipeline` and `text` keys. A section is skipped if its question is set to `None`.
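
The three accepted input kinds in the first note can be told apart by inspecting the path. The following is a minimal sketch of that idea, not ADFMentor's actual logic; `classify_submission` is a hypothetical helper name introduced only for illustration.

```python
import zipfile
from pathlib import Path


def classify_submission(answer_path):
    """Return "directory", "zip", or "file" for a path (hypothetical helper)."""
    path = Path(answer_path)
    if path.is_dir():
        return "directory"
    # is_zipfile() inspects the file contents, so a renamed archive is still detected.
    if path.is_file() and zipfile.is_zipfile(path):
        return "zip"
    if path.is_file() and path.suffix.lower() in {".json", ".txt"}:
        return "file"
    raise ValueError(f"Unsupported submission: {answer_path}")
```

Checking the archive by content rather than by extension is a design choice of this sketch; the package itself may decide differently.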

### ADF Processor

`ADFMentor.utils.processor` provides functions for processing ADF projects:

#### `parse_adf_json(json_path)`
Reads and parses a single ADF resource JSON file.
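
For orientation, reading one resource file needs only the standard library. This is a sketch of the general technique, assuming UTF-8 files with a JSON object at the top level; `load_resource` is a hypothetical name and does not reflect how `parse_adf_json` is implemented.

```python
import json
from pathlib import Path


def load_resource(json_path):
    """Read one ADF resource file and return it as a dict (hypothetical helper)."""
    text = Path(json_path).read_text(encoding="utf-8")
    resource = json.loads(text)
    # ADF resources are JSON objects; reject lists, strings, and other top-level values.
    if not isinstance(resource, dict):
        raise ValueError(f"{json_path}: expected a JSON object at the top level")
    return resource
```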

#### `discover_adf_resources(directory)`
Scans a directory for ADF resource folders:
- `pipeline/` — Pipeline definitions
- `dataset/` — Dataset definitions
- `linkedService/` — Linked service definitions
- `trigger/` — Trigger definitions
- `dataflow/` — Dataflow definitions
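
A scan over these folders can be sketched with `pathlib`. The helper below is illustrative only: `find_resource_files` and the shape of its return value are assumptions, not the signature of `discover_adf_resources`.

```python
from pathlib import Path

# Folder names taken from the list above.
RESOURCE_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger", "dataflow")


def find_resource_files(directory):
    """Map each resource folder name to its sorted JSON file paths (hypothetical helper)."""
    root = Path(directory)
    found = {}
    for folder in RESOURCE_FOLDERS:
        folder_path = root / folder
        # A missing folder yields an empty list rather than an error.
        found[folder] = sorted(folder_path.glob("*.json")) if folder_path.is_dir() else []
    return found
```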

#### `extract_grading_info(resources)`
Extracts key elements for grading:
- Pipelines: activities, dependencies, parameters, variables
- Datasets: type, linked service reference, schema, location
- Linked Services: type, connection details (sanitized)
- Triggers: type, schedule, pipeline references
- Dataflows: sources, sinks, transformations
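
To make the pipeline bullet concrete, here is a sketch that pulls those elements out of an already-parsed pipeline resource. It relies on the standard ADF pipeline layout (`properties.activities[].dependsOn`); the function name `summarize_pipeline` and the returned dict shape are assumptions for illustration, not the output of `extract_grading_info`.

```python
def summarize_pipeline(resource):
    """Collect activities, dependencies, parameters, and variables from one pipeline dict."""
    props = resource.get("properties", {})
    activities = []
    for activity in props.get("activities", []):
        activities.append({
            "name": activity.get("name"),
            "type": activity.get("type"),
            # Each dependsOn entry names an upstream activity and its required conditions.
            "depends_on": [
                (dep.get("activity"), dep.get("dependencyConditions", []))
                for dep in activity.get("dependsOn", [])
            ],
        })
    return {
        "name": resource.get("name"),
        "parameters": sorted(props.get("parameters", {})),
        "variables": sorted(props.get("variables", {})),
        "activities": activities,
    }


# Illustrative input, not taken from a real project.
example = {
    "name": "CopyBlobToSQL",
    "properties": {
        "parameters": {"outputTable": {"type": "string"}, "inputPath": {"type": "string"}},
        "activities": [
            {"name": "LookupSource", "type": "Lookup"},
            {
                "name": "CopyData",
                "type": "Copy",
                "dependsOn": [
                    {"activity": "LookupSource", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ],
    },
}
summary = summarize_pipeline(example)
```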

#### `generate_grading_report(grading_info)`
Formats extracted information into a readable text report.
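
As a rough picture of the formatting step, the sketch below renders one pipeline entry in a layout similar to the sample output in the usage examples. The input dict shape and the name `format_pipeline` are assumptions made for this example; the real report generator may structure its data differently.

```python
def format_pipeline(summary):
    """Render one pipeline summary as indented text lines (hypothetical helper)."""
    lines = [f"  - {summary['name']}"]
    if summary.get("parameters"):
        lines.append("    Parameters: " + ", ".join(summary["parameters"]))
    activities = summary.get("activities", [])
    lines.append(f"    Activities ({len(activities)}):")
    for activity in activities:
        lines.append(f"      • {activity['name']} (type: {activity['type']})")
        for upstream, conditions in activity.get("depends_on", []):
            lines.append(f"        depends on: {upstream} [{', '.join(conditions)}]")
    return "\n".join(lines)


# Illustrative input with an assumed shape.
report = format_pipeline({
    "name": "CopyBlobToSQL",
    "parameters": ["inputPath", "outputTable"],
    "activities": [
        {"name": "LookupSource", "type": "Lookup", "depends_on": []},
        {"name": "CopyData", "type": "Copy", "depends_on": [("LookupSource", ["Succeeded"])]},
    ],
})
print(report)
```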

#### `analyze_adf(adf_path)`
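Convenience function that chains the steps above: it discovers and parses the resources under `adf_path`, extracts the grading information, and returns the formatted report.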

### AI Models

#### Gemini Model
`ADFMentor.models.gemini.Gemini`

```python
from ADFmentor.models import Gemini

model = Gemini(api_key="your-api-key", model_name="gemini-2.0-flash-exp")

# Evaluate text-based answers
result = model.evaluate(
    question="What are ADF linked services?",
    answer="Linked services are connection strings...",
    prompt="Evaluate for accuracy and completeness"
)
```

**Response Format:**
```json
{
  "score": 85,
  "feedback": "Strong implementation with minor issues..."
}
```
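
If you handle a reply in this shape as raw JSON text, a small defensive parser can help. This is a sketch under two assumptions that the README does not confirm: that the text is a JSON string, and that scores are meant to fall between 0 and 100. `parse_evaluation` is a hypothetical helper, not part of the package.

```python
import json


def parse_evaluation(raw_text):
    """Parse a {"score", "feedback"} JSON string and clamp the score to 0-100."""
    data = json.loads(raw_text)
    score = int(data["score"])
    return {
        # Clamping guards against out-of-range values in a model reply.
        "score": max(0, min(100, score)),
        "feedback": str(data.get("feedback", "")),
    }
```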

## Configuration

### API Key Setup

Create a `.env` file in your project root:

```env
API_KEY=your_gemini_api_key_here
```

Load it in your code:

```python
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("API_KEY")
```
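
`os.getenv` returns `None` when the variable is absent, so it can be worth failing early with a clear message. A minimal sketch follows; `require_api_key` is a hypothetical helper, and the `env` parameter exists only to make it easy to test.

```python
import os


def require_api_key(env=None, name="API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return key
```

Call it after `load_dotenv()` so that values from `.env` are already in the environment.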

## Detailed Usage Examples

### 1. Analyze an ADF Project

```python
from ADFmentor.utils import analyze_adf

# Generate a detailed report from an ADF project directory
report = analyze_adf("path/to/adf-project/")
print(report)
```

**Sample Output:**
```
============================================================
AZURE DATA FACTORY PROJECT REPORT
============================================================

PIPELINES:
  - CopyBlobToSQL
    Parameters: inputPath, outputTable
    Activities (3):
      • LookupSource (type: Lookup)
      • CopyData (type: Copy)
        depends on: LookupSource [Succeeded]
        source: BlobSource
        sink: SqlSink
      • StoredProcedure (type: SqlServerStoredProcedure)
        depends on: CopyData [Succeeded]

DATASETS:
  - BlobInput (type: DelimitedText)
    linked service: AzureBlobStorage
    location: type: AzureBlobStorageLocation, folder: input
  - SqlOutput (type: AzureSqlTable)
    linked service: AzureSqlDatabase
    table: dbo.SalesData

LINKED SERVICES:
  - AzureBlobStorage (type: AzureBlobStorage)
  - AzureSqlDatabase (type: AzureSqlDatabase)

TRIGGERS:
  - DailyTrigger (type: ScheduleTrigger)
    schedule: every 1 Day
    pipelines: CopyBlobToSQL

DATAFLOWS:
  none

SUMMARY:
  - total_pipelines: 1
  - total_activities: 3
  - total_datasets: 2
  - total_linked_services: 2
  - total_triggers: 1
  - total_dataflows: 0
```

### 2. Complete Evaluation Pipeline

```python
from ADFmentor import ADFMentor

mentor = ADFMentor(api_key="your-api-key")

# Define questions and prompts for each evaluation type
questions = {
    "pipeline": "Create a pipeline to copy data from Blob to SQL with error handling",
    "text": "Explain your pipeline design choices"
}

prompts = {
    "pipeline": "Evaluate pipeline structure, activities, error handling, and best practices",
    "text": "Evaluate clarity, justification, and understanding"
}

# Evaluate all aspects
result = mentor.evaluate_all(
    answer_path="path/to/student/submission/",
    questions=questions,
    prompts=prompts
)

print(f"Overall Score: {result['score']}/100")
print(f"Feedback:\n{result['feedback']}")
```

### 3. Parse Lesson Questions

If your lesson content marks questions with codes such as `"TEXT001"` and `"PIPELINE002"`, you can parse it into question blocks:

```python
from ADFmentor.utils.question_parser import parse_lesson_questions

lesson_text = """
"TEXT001"
1. Explain the purpose of linked services in ADF.

"PIPELINE002"
2. Create a pipeline to copy data from Blob Storage to SQL Database.
"""

questions = parse_lesson_questions(lesson_text)
# -> {"text": "1. ...", "pipeline": "2. ..."}

# Map to the evaluate_all() schema
questions = {
    "pipeline": questions["pipeline"],
    "text": questions["text"],
}
```

### 4. Skipping an Evaluation Section

To skip a section, set its question to `None` (the key must still exist):

```python
questions = {
    "pipeline": "Create a copy pipeline with parameterized paths",
    "text": None,
}

prompts = {
    "pipeline": "Evaluate pipeline structure and best practices",
    "text": "Evaluate clarity, justification, and understanding",
}
```
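
The rule that both keys must exist while a `None` question skips its section can be checked before calling the evaluator. The sketch below illustrates that rule only; `active_sections` is a hypothetical helper, and how ADFMentor itself validates these dicts is not shown in this README.

```python
REQUIRED_KEYS = ("pipeline", "text")


def active_sections(questions, prompts):
    """Return the sections that have a question, after checking both dicts have all keys."""
    for label, mapping in (("questions", questions), ("prompts", prompts)):
        missing = [key for key in REQUIRED_KEYS if key not in mapping]
        if missing:
            raise KeyError(f"{label} is missing keys: {', '.join(missing)}")
    # A question of None means "skip this section".
    return [key for key in REQUIRED_KEYS if questions[key] is not None]


sections = active_sections(
    {"pipeline": "Create a copy pipeline with parameterized paths", "text": None},
    {"pipeline": "Evaluate pipeline structure", "text": "Evaluate clarity"},
)
print(sections)  # ['pipeline']
```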

## ADF Project Structure

ADFMentor expects submissions to follow the standard ADF project structure:

```
adf-project/
├── pipeline/          # Pipeline JSON definitions
│   └── CopyPipeline.json
├── dataset/           # Dataset JSON definitions
│   ├── BlobInput.json
│   └── SqlOutput.json
├── linkedService/     # Linked service definitions
│   ├── BlobStorage.json
│   └── SqlDatabase.json
├── trigger/           # Trigger definitions
│   └── DailyTrigger.json
└── dataflow/          # Dataflow definitions (optional)
```

Each JSON file follows the standard ADF resource format with `name`, `type`, and `properties` fields.
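
A quick pre-check for these three fields might look like the sketch below. `missing_fields` is a hypothetical helper and the example resource is illustrative; whether ADFMentor enforces all three fields is not stated here.

```python
REQUIRED_FIELDS = ("name", "type", "properties")


def missing_fields(resource):
    """List the top-level fields that an ADF resource dict lacks."""
    return [field for field in REQUIRED_FIELDS if field not in resource]


# Illustrative resource; the values are placeholders.
pipeline_resource = {
    "name": "CopyPipeline",
    "type": "Microsoft.DataFactory/factories/pipelines",
    "properties": {"activities": []},
}
print(missing_fields(pipeline_resource))          # []
print(missing_fields({"name": "CopyPipeline"}))   # ['type', 'properties']
```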

## Development

### Running Tests

```bash
# Run integration tests
python tests/test.py
```

### Project Dependencies

Core:
- `google-genai>=1.0.0` - Google Gemini API client
- `python-dotenv>=1.0.0` - Environment variable management

Optional:
- `google-cloud-aiplatform>=1.0.0` - For Vertex AI support

## Requirements

- Python 3.9 or higher
- Google Gemini API key (get one at [Google AI Studio](https://makersuite.google.com/app/apikey))

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License. See the LICENSE file for details.
