Metadata-Version: 2.1
Name: data-governance-checkup
Version: 0.1.2
Summary: A library to simplify and enhance data governance and compliance processes
Home-page: https://github.com/pratiklahudkar/data-governance
Author: Pratik P. Lahudkar
Author-email: pratik.lahudkar@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown

## Data Governance Library

The **Data Governance Library** is a Python package that helps organizations maintain data compliance, security, and governance standards. It targets common regulatory frameworks such as GDPR, HIPAA, ISO 27001, and CCPA, and provides tools for automated compliance checks, metadata management, data lineage tracking, role-based access control auditing, and data masking.

---

### Features

1. **Automated Compliance Checks**
   - Support for frameworks like GDPR, HIPAA, ISO 27001, and CCPA.
   - Customizable rules for new compliance standards.

2. **Data Lineage Tracking**
   - Monitor data flow and identify its origin, transformation, and destination.

3. **Role-Based Access Control (RBAC) Auditing**
   - Verify that the roles and permissions granted to users match your data access policies.

4. **Metadata Management and Cataloging**
   - Store, manage, and query metadata associated with your datasets.

5. **Data Masking and Anonymization**
   - Protect sensitive data with masking and anonymization techniques.

---

### Installation

Install the library via pip:

```bash
pip install data-governance-checkup
```
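
To confirm that the install succeeded, one option is to look up the installed version from Python. This is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8; on 3.7 the `importlib_metadata` backport would be needed). The helper name `installed_version` is our own and is not part of this library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "data-governance-checkup" is the distribution name from the package metadata.
print(installed_version("data-governance-checkup"))
```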

---

### Usage

#### Example: Running Compliance Checks

```python
from data_governance_checkup.compliance import ComplianceFrameworkValidator

# Initialize the compliance framework validator
validator = ComplianceFrameworkValidator()

# GDPR Data Example
gdpr_data = {"name": "John Doe", "email": "john.doe@example.com"}
print("GDPR Compliance:", validator.validate("GDPR", gdpr_data))

# HIPAA Data Example
hipaa_data = {
    "security_log": [
        {"user_id": "12345", "access_time": "2025-01-14T12:00:00Z"},
        {"user_id": "67890", "access_time": "2025-01-14T13:00:00Z"}
    ]
}
print("HIPAA Compliance:", validator.validate("HIPAA", hipaa_data))

# CCPA Data Example
ccpa_opt_out_data = {"data_sales_opt_out": True}
ccpa_deletion_log = [
    {"request_id": "1", "status": "completed"},
    {"request_id": "2", "status": "completed"}
]
# Validate CCPA opt-out compliance
print("CCPA Compliance (Opt-Out):", validator.validate("CCPA", ccpa_opt_out_data, validation_type="opt_out"))
# Validate CCPA deletion request compliance
print("CCPA Compliance (Deletion Requests):", validator.validate("CCPA", ccpa_deletion_log, validation_type="deletion_requests"))

# ISO27001 Data Example
iso27001_access_logs = [
    {"user_id": "12345", "access_time": "2025-01-14T12:00:00Z"},
    {"user_id": "67890", "access_time": "2025-01-14T13:00:00Z"}
]
iso27001_risk_assessment_report = [
    {"risk_id": "R1", "description": "Risk 1", "mitigation_plan": "Plan A"},
    {"risk_id": "R2", "description": "Risk 2", "mitigation_plan": "Plan B"}
]
print("ISO27001 Compliance (Access Control):", validator.validate("ISO27001", iso27001_access_logs, validation_type="access_control"))
print("ISO27001 Compliance (Risk Assessment):", validator.validate("ISO27001", iso27001_risk_assessment_report, validation_type="risk_assessment"))
```
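
When many checks run together, it can help to collect the outcomes in one place. The sketch below is not part of the library: `run_checks` is a hypothetical helper that relies only on the `validate(framework, data, validation_type=...)` call shape shown above, and `StubValidator` is a stand-in so the block runs without the package installed. The return value of `validate` is not documented here, so results are stored exactly as returned.

```python
def run_checks(validator, checks):
    """Run each (label, framework, data, kwargs) check and collect results by label."""
    results = {}
    for label, framework, data, kwargs in checks:
        try:
            results[label] = validator.validate(framework, data, **kwargs)
        except Exception as exc:  # record the failure instead of aborting the batch
            results[label] = f"error: {exc}"
    return results


class StubValidator:
    """Stand-in for ComplianceFrameworkValidator; treats non-empty data as passing."""

    def validate(self, framework, data, **kwargs):
        if framework not in {"GDPR", "HIPAA", "CCPA", "ISO27001"}:
            raise ValueError(f"unknown framework {framework!r}")
        return bool(data)


checks = [
    ("gdpr", "GDPR", {"name": "John Doe", "email": "john.doe@example.com"}, {}),
    ("ccpa_opt_out", "CCPA", {"data_sales_opt_out": True}, {"validation_type": "opt_out"}),
    ("soc2", "SOC2", {"controls": []}, {}),  # unsupported by the stub, recorded as an error
]
report = run_checks(StubValidator(), checks)
for label, outcome in report.items():
    print(f"{label}: {outcome}")
```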

---

#### Data Lineage Tracking

```python
import json
from data_governance_checkup.lineage.lineage import DataLineageTracker

# Initialize the lineage tracker
lineage_tracker = DataLineageTracker()

# Sample dataset
sample_data = [
    {"order_id": 1, "customer_id": 101, "status": "completed", "amount": 250},
    {"order_id": 2, "customer_id": 102, "status": "pending", "amount": 300},
    {"order_id": 3, "customer_id": 101, "status": "completed", "amount": 400},
    {"order_id": 4, "customer_id": 103, "status": "cancelled", "amount": 150},
]

customer_data = [
    {"customer_id": 101, "name": "Alice"},
    {"customer_id": 102, "name": "Bob"},
    {"customer_id": 103, "name": "Charlie"},
]

# Step 1: Add source details
lineage_tracker.add_source(
    data_id="order_data",
    source_details={"type": "JSON", "description": "Order details JSON data"}
)

# Step 2: Apply a transformation - Filter completed orders
filtered_data = [row for row in sample_data if row["status"] == "completed"]
lineage_tracker.add_transformation(
    data_id="order_data",
    transformation="Filtered rows where 'status' = 'completed'"
)

# Step 3: Apply a transformation - Join with customer data
joined_data = [
    {
        "order_id": row["order_id"],
        "customer_name": next(
            (customer["name"] for customer in customer_data if customer["customer_id"] == row["customer_id"]),
            None
        ),
        "amount": row["amount"]
    }
    for row in filtered_data
]
lineage_tracker.add_transformation(
    data_id="order_data",
    transformation="Joined with 'customer_data' on 'customer_id'"
)

# Step 4: Set destination
lineage_tracker.set_destination(
    data_id="order_data",
    destination_details={"type": "JSON", "description": "Processed data"}
)

# Print the processed data
print("Processed Data:")
print(json.dumps(joined_data, indent=4))

# Retrieve and print the data lineage
lineage = lineage_tracker.get_lineage("order_data")
print("\nData Lineage:")
print(json.dumps(lineage, indent=4))

# Export lineage to a file
lineage_tracker.export_lineage("lineage_output.json")
print("\nLineage data exported to 'lineage_output.json'")
```

---

#### Metadata Management

```python
from data_governance_checkup.metadata import MetadataManager

# Initialize the metadata manager
metadata_manager = MetadataManager()

# Add metadata
metadata_manager.add_metadata("resource_1", {"owner": "Alice", "created_at": "2025-01-14"})
metadata_manager.add_metadata("resource_2", {"owner": "Bob", "created_at": "2025-01-13"})

# Update metadata
metadata_manager.update_metadata("resource_1", {"last_accessed": "2025-01-14"})

# Retrieve metadata
print("Metadata for resource_1:", metadata_manager.get_metadata("resource_1"))

# List all metadata
print("All metadata:", metadata_manager.list_all_metadata())

# Save metadata to a file
metadata_manager.save_metadata_to_file("metadata.json")

# Load metadata from a file
metadata_manager.load_metadata_from_file("metadata.json")
print("Metadata after loading from file:", metadata_manager.list_all_metadata())
```

---

#### Role-Based Access Control (RBAC) Auditing

```python
from data_governance_checkup.rbac import RBACManager

# Initialize RBAC Manager
rbac = RBACManager()

# Create roles
rbac.create_role("Admin")
rbac.create_role("Editor")
rbac.create_role("Viewer")

# Assign permissions to roles
rbac.assign_permission_to_role("Admin", "delete_data")
rbac.assign_permission_to_role("Admin", "edit_data")
rbac.assign_permission_to_role("Editor", "edit_data")
rbac.assign_permission_to_role("Viewer", "view_data")

# Create users
rbac.create_user("alice")
rbac.create_user("bob")

# Assign roles to users
rbac.assign_role_to_user("alice", "Admin")
rbac.assign_role_to_user("bob", "Viewer")

# Check permissions
print(f"Alice has 'edit_data' permission: {rbac.has_permission('alice', 'edit_data')}")
print(f"Bob has 'delete_data' permission: {rbac.has_permission('bob', 'delete_data')}")

# Get all permissions for a user
print(f"Alice's permissions: {rbac.get_user_permissions('alice')}")
print(f"Bob's permissions: {rbac.get_user_permissions('bob')}")

# Revoke a role
rbac.revoke_role_from_user("alice", "Admin")
print(f"Alice's permissions after revoking Admin role: {rbac.get_user_permissions('alice')}")
```
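
The example above manages roles and permissions. Auditing goes one step further and compares what users actually did against what they were allowed to do. A minimal sketch of that idea follows. It is independent of the library: `find_violations`, the access-log format, and the `grants` table are illustrative assumptions, and the permission lookup is passed in as a callable so that a real check could be substituted for the stub.

```python
def find_violations(access_log, has_permission):
    """Return log entries whose user lacked the permission for the action taken."""
    return [
        entry for entry in access_log
        if not has_permission(entry["user"], entry["action"])
    ]


# Hypothetical grants mirroring the roles above: alice is Admin, bob is Viewer.
grants = {
    "alice": {"delete_data", "edit_data"},
    "bob": {"view_data"},
}


def stub_has_permission(user, permission):
    """Stand-in for rbac.has_permission so the sketch runs on its own."""
    return permission in grants.get(user, set())


access_log = [
    {"user": "alice", "action": "edit_data"},
    {"user": "bob", "action": "view_data"},
    {"user": "bob", "action": "delete_data"},   # bob only has view_data
    {"user": "carol", "action": "view_data"},   # carol has no grants at all
]

violations = find_violations(access_log, stub_has_permission)
for entry in violations:
    print(f"Unauthorized: {entry['user']} attempted {entry['action']}")
```

With the package installed, passing `rbac.has_permission` in place of the stub should work, given the `has_permission(user, permission)` call shown earlier.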

---

#### Data Masking and Anonymization

```python
from data_governance_checkup.masking import DataMasking

# Sample record containing sensitive fields
data = {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "ssn": "123-45-6789",
}

mask_fields = ["email", "ssn"]
masker = DataMasking()
masked_data = masker.mask_data(data, mask_fields)

print("Original Data:", data)
print("Masked Data:", masked_data)
```
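
The output format of `mask_data` is not specified here. To illustrate the difference between the two techniques this section names, the following sketch (plain Python, not the library's implementation) masks a value by hiding all but its last characters and replaces another with a salted SHA-256 digest. The function names, the `*` mask character, and the salt are assumptions for illustration. Strictly speaking, a salted hash is pseudonymization: the same input always maps to the same digest, so records stay linkable.

```python
import hashlib


def mask_value(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters of a string."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def pseudonymize(value, salt):
    """Replace a value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


record = {"name": "John Doe", "email": "john.doe@example.com", "ssn": "123-45-6789"}

# Build new dicts so the original record is left untouched.
masked = dict(record, ssn=mask_value(record["ssn"]))
pseudonymized = dict(record, email=pseudonymize(record["email"], salt="example-salt"))

print(masked["ssn"])           # *******6789
print(pseudonymized["email"])  # 64-character hex digest
```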

---

### Contributing

Contributions are welcome! Please submit pull requests or open issues for any enhancements, bugs, or additional compliance frameworks you'd like to see.

---

### License

This project is licensed under the MIT License. See the LICENSE file for more details.

---

### Roadmap

1. Add support for more compliance standards (e.g., SOC 2, PCI DSS).
2. Build visualization dashboards for compliance status.
3. Integrate with real-time data pipelines for live compliance checks.

---

### Contact

For questions or support, please contact pratik.lahudkar@gmail.com or open an issue on the GitHub repository.
