Metadata-Version: 2.1
Name: file_genie
Version: 0.0.2
Summary: File Genie is designed to parse various file types and transform them according to provided configuration
Author: Dinesh Lakhara
Author-email: dinesh.lakhara@cashfree.com
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: boto3==1.12.42
Requires-Dist: botocore==1.15.49
Requires-Dist: pandas<=2.2.3,>=2.0.0
Requires-Dist: mt-940==4.23.0
Requires-Dist: xlrd==2.0.1
Requires-Dist: openpyxl==3.1.2
Requires-Dist: s3fs==0.4.2
Requires-Dist: s3transfer==0.3.3
Requires-Dist: python-dateutil==2.8.2
Requires-Dist: pytz==2020.1
Requires-Dist: json-logging==1.2.0
Requires-Dist: pyzipper==0.3.6
Requires-Dist: lxml==5.2.2
Requires-Dist: tabula-py==2.1.1

## FileGenie SDK
FileGenie SDK is a Python library that simplifies parsing files from AWS S3 in various formats (e.g., TEXT, CSV, EXCEL, ZIP, XML, PDF) and transforming the data into the desired output format using user-defined functions. You provide a file parsing configuration and any custom transformation logic, and the library processes the file and returns the output in the format you need.

### Features
- **Multi-format Support:** Effortlessly parse files in formats such as TEXT, CSV, EXCEL, ZIP, XML, and PDF directly from AWS S3.
- **Flexible Response Types:** Generate responses tailored to user needs, including DATAFRAME, JSON, or FILE outputs.
- **Password-Protected Files:** Seamlessly parse files secured with passwords.
- **Custom Edge Case Handling:** Apply user-defined custom functions to manage specific parsing and transformation needs, including data sanitization, value conversions, or reformatting date fields for consistency.
- **AWS S3 Integration:** Fetch files directly from AWS S3 buckets using IAM roles for secure access.
- **Streamlined Configuration:** Get started with minimal configuration, with no need to write a parser for each file type.

### Installation
Install the SDK using pip:
```
pip install file_genie
```

### Prerequisites
- **Your application should be deployed on AWS EKS to enable the SDK to utilize AWS S3 credentials.**
- **Python:** >= 3.6
- **Pandas:** >= 2.0.0, <= 2.2.3

### Getting Started
- **Define Custom Edge Cases:**
If you need to sanitize columns during file parsing (e.g., standardise column values to a common format before applying custom logic), you can define custom functions for the SDK to use.

To implement this:

- Create an `edgeCases` folder in your project.
- Add a file named `user_edge_cases.py` to it.
- Define your custom functions in this file.
- Reference these functions in the `edge_case` section of `file_config`.
- The SDK automatically imports and applies these functions during file parsing or transformation.

```python
# Inside the SDK: your edge case module is imported and kept on the instance
from edgeCases import user_edge_cases
self.edge_cases = user_edge_cases
```
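As an illustration, `edgeCases/user_edge_cases.py` might look like the sketch below. The calling convention is an assumption: each function is assumed to receive the parsed pandas DataFrame plus the parameter configured under `edge_case`, and to return the modified DataFrame. The `sanitize_columns` helper, the `Currency` column, and the exchange rates are hypothetical placeholders, so check how your version of the SDK invokes these functions before relying on this shape.

```python
# edgeCases/user_edge_cases.py
# Hypothetical sketch: assumes each edge case function receives the parsed
# DataFrame and the params from the "edge_case" config, and returns a DataFrame.
import pandas as pd

# Placeholder rates for illustration only; a real implementation would load current rates.
PLACEHOLDER_RATES_TO_INR = {"INR": 1.0, "USD": 80.0}


def sanitize_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Strip surrounding whitespace and upper-case the values in the given columns."""
    df = df.copy()
    for column in columns:
        df[column] = df[column].astype(str).str.strip().str.upper()
    return df


def convert_amount_as_per_currency(df: pd.DataFrame, amount_column: str) -> pd.DataFrame:
    """Convert amount_column to INR using a hypothetical "Currency" column.

    Rows with a currency missing from the rate table end up as NaN.
    """
    df = df.copy()
    rates = df["Currency"].map(PLACEHOLDER_RATES_TO_INR)
    df[amount_column] = df[amount_column].astype(float) * rates
    return df
```

Copying the DataFrame inside each function keeps the caller's data untouched, which makes edge cases safe to chain in any order.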

- **Define the configuration required for file parsing logic and S3 bucket names**
```
    s3_config: {
        upload_bucket: reconciliation-live
        download_bucket: reconciliation-live
    }
    file_config: {
        "file_source_1": {
            "read_from_s3_func":"read_complete_excel_file",
            "parameters_for_read_s3": None,
            "file_dtype":{
                "Order_Number": str,
                "Added On":str,
                "Added By":str
            },
            "columns_mapping": {
                <!-- "Column Name in file": "Column name required in output" -->
                "Transaction Type": "TransactionType",
                "Cust Name": "CustomerName",
                "Cust ID": "CustomerId",
                "Transaction Amount": "Amount",
                "OrderNumber": "TransactionReference",
                "Reference ID": "CustomerReferenceId",
                "Target Date": "TargetDate",
                "TransactionDate": "TransactionDate",
                "FeeAmount": "ServiceCharge",
                "TaxAmount": "ServiceTax",
                "NetAmount": "NetAmount"
            }
            "edge_case": {
                <!-- edge case function name which you have defined in user_edge_case.py : params required for that function
                there can be different type of params. For eg. - dict, list, str -->
                <!-- In this convert_amount_as_per_currency is the edge case function which you want to apply while transforming the entries and "Amount" is the param to this function where you will apply the currency conversion -->
                "convert_amount_as_per_currency": "Amount"
            }
        },
    }
```
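To make the roles of `columns_mapping` and `edge_case` concrete, here is a minimal sketch in plain pandas of how a `file_config` entry could be applied. It is not the SDK's implementation: it assumes columns are renamed first and that each `edge_case` entry is then looked up by name on the edge case module and called with its configured params. `apply_file_config` and `double_amount` are hypothetical names, and `types.SimpleNamespace` stands in for the `user_edge_cases` module.

```python
import types

import pandas as pd


def apply_file_config(df: pd.DataFrame, source_config: dict, edge_cases) -> pd.DataFrame:
    """Hypothetical sketch: rename columns, then run the configured edge cases in order."""
    # Rename file columns to output names; columns absent from the file are ignored.
    df = df.rename(columns=source_config.get("columns_mapping", {}))
    # Look up each edge case function by name and call it with its configured params.
    for func_name, params in source_config.get("edge_case", {}).items():
        func = getattr(edge_cases, func_name)
        df = func(df, params)
    return df


# Stand-in for the user_edge_cases module with one hypothetical edge case.
edge_cases = types.SimpleNamespace(
    double_amount=lambda frame, column: frame.assign(**{column: frame[column] * 2})
)
source_config = {
    "columns_mapping": {"Transaction Amount": "Amount", "Cust ID": "CustomerId"},
    "edge_case": {"double_amount": "Amount"},
}
raw = pd.DataFrame({"Transaction Amount": [1.5, 2.0], "Cust ID": ["C1", "C2"]})
result = apply_file_config(raw, source_config, edge_cases)
print(result)
```

Because renaming happens before the edge cases run, edge case params refer to output column names (such as `"Amount"`), which matches how the configuration above passes `"Amount"` to `convert_amount_as_per_currency`.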

- **Define a ParsedDataResponseType enum**
```python
import enum


class ParsedDataResponseType(enum.Enum):
    DATAFRAME = "DATAFRAME"
    FILE = "FILE"
    JSON = "JSON"
```

- **Import and initialise FileGenie**
```python
from file_genie import FileGenie

file_genie = FileGenie(config={"s3_config": s3_config, "file_config": file_config})

# file_source is the key of the matching entry in file_config
file_source = "file_source_1"
parsed_data = file_genie.parse(
    "s3://your-bucket-name/path/to/your/file.xlsx",
    file_source,
    ParsedDataResponseType.DATAFRAME.value,  # By default, the SDK returns a DATAFRAME
)
```
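When requesting a DATAFRAME, a quick sanity check is to confirm that the output contains every target column named in `columns_mapping`. The helper below is hypothetical and assumes the returned DataFrame uses the mapped (right-hand side) column names; the sample DataFrame only stands in for `parsed_data`.

```python
import pandas as pd


def missing_output_columns(df: pd.DataFrame, columns_mapping: dict) -> list:
    """Return mapped output column names absent from df, in config order."""
    return [name for name in columns_mapping.values() if name not in df.columns]


# Stand-in for parsed_data; in practice pass the DataFrame returned by parse().
mapping = {"Transaction Amount": "Amount", "Cust ID": "CustomerId", "TaxAmount": "ServiceTax"}
parsed = pd.DataFrame({"Amount": [100.0], "CustomerId": ["C1"]})
print(missing_output_columns(parsed, mapping))  # ['ServiceTax']
```

With the configuration above, the real call would be `missing_output_columns(parsed_data, file_config["file_source_1"]["columns_mapping"])`.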

