Metadata-Version: 2.1
Name: record-batcher
Version: 0.1.0
Summary: Batch records based on size constraints
License: MIT
Author: Krystian Jakusik
Author-email: krystian.jakusik@gmail.com
Requires-Python: >=3.7,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Description-Content-Type: text/markdown

# Record batcher
This is a simple record batcher: it takes a list of records and groups them into batches (lists of records) that respect configurable limits on record size, batch size, and number of records per batch. It is implemented in Python and distributed as a Python package.

## Installation
The package can be installed using pip:
```bash
pip install record-batcher
```

## Usage
The package provides a single class, `RecordBatcher`, whose constructor takes the following parameters:
- `max_record_size`: the maximum size of a record in bytes (int, default: 1MB),
- `max_batch_size`: the maximum size of a batch in bytes (int, default: 5MB),
- `max_records_per_batch`: the maximum number of records per batch (int, default: 500),
- `include_list_object_in_batch_size`: whether to include the size of the list object itself in the batch size (bool, default: False),
- `include_garbage_collector_overhead`: whether to include the garbage collector overhead in every object size (bool, default: False).
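The two boolean flags suggest that sizes come from CPython's own object-size reporting. Below is a minimal sketch of how such a measurement could work, assuming sizes are taken from `sys.getsizeof()` (which adds the garbage collector header for GC-tracked objects) versus `__sizeof__()` (which does not). `object_size` and `batch_size` are hypothetical helpers, not part of the package's API:

```python
import sys
from typing import List


def object_size(obj: object, include_gc_overhead: bool = False) -> int:
    """Size of obj in bytes (hypothetical helper).

    Assumption: sys.getsizeof() includes the garbage collector header for
    GC-tracked objects such as lists, while obj.__sizeof__() leaves it out.
    """
    if include_gc_overhead:
        return sys.getsizeof(obj)
    return obj.__sizeof__()


def batch_size(batch: List[str], include_list_object: bool = False,
               include_gc_overhead: bool = False) -> int:
    """Total size of a batch: its records, optionally plus the list object itself."""
    total = sum(object_size(record, include_gc_overhead) for record in batch)
    if include_list_object:
        total += object_size(batch, include_gc_overhead)
    return total
```

For `str` records the two measurements coincide, because strings are not tracked by the garbage collector; under this assumption the GC flag mainly matters when the list object itself is counted.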

### batch_records
The main method of the class is `batch_records`, which takes a list of records and returns a list of batches. It takes a single parameter:
- `records`: the records to be batched (list of strings).

The method returns a list of batches, where each batch is a list of records. Records are grouped so that the total size of each batch does not exceed `max_batch_size` and the number of records in each batch does not exceed `max_records_per_batch`. Any record larger than `max_record_size` is discarded rather than batched.
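The batching rules above can be implemented with a single greedy pass over the records. The sketch below is illustrative and not the package's actual code: `batch_records_sketch` is a hypothetical name, sizes are assumed to be measured with `sys.getsizeof()`, and the 1MB/5MB defaults are assumed to mean 1024 * 1024 and 5 * 1024 * 1024 bytes.

```python
import sys
from typing import List


def batch_records_sketch(records: List[str],
                         max_record_size: int = 1024 * 1024,      # assumed meaning of "1MB"
                         max_batch_size: int = 5 * 1024 * 1024,   # assumed meaning of "5MB"
                         max_records_per_batch: int = 500) -> List[List[str]]:
    """Greedily pack records into batches (hypothetical sketch)."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for record in records:
        size = sys.getsizeof(record)  # assumption: sizes measured with sys.getsizeof
        if size > max_record_size:
            continue  # oversized records are discarded, as described above
        # Close the current batch if adding this record would break either limit.
        if current and (current_size + size > max_batch_size
                        or len(current) >= max_records_per_batch):
            batches.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Because a new batch is opened only when the next record would break a limit, records keep their original order. This design also assumes `max_record_size` does not exceed `max_batch_size`; otherwise a single accepted record could overflow a batch on its own.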

### generate_records
The method `generate_records` generates a list of records of the specified sizes. It takes the following parameters:
- `num_records`: the number of records to generate (int, default: 1),
- `record_sizes`: the sizes of the records to generate (list of ints OR None, default: None),
- `random_sizes`: whether to give the records random sizes, up to a maximum of `record_size` (bool, default: False),
- `record_size`: the size of each record in bytes (int, default: 1024); if `random_sizes` is True, this is the maximum record size.

The method returns a list of records, each a string of the requested size. If `record_sizes` is provided, the records are generated with those sizes. If `random_sizes` is True, each record gets a random size between the size of an empty string and `record_size`. If `record_sizes` is None and `random_sizes` is False, every record has size `record_size`.
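If record sizes are measured with `sys.getsizeof()` (an assumption, consistent with random sizes starting at the size of an empty string), a record of a given size needs that size minus the empty-string overhead in characters. A minimal sketch with a hypothetical `generate_records_sketch` function, which also assumes that `num_records` is ignored when `record_sizes` is given:

```python
import random
import sys
from typing import List, Optional

# Fixed overhead of an empty str object; differs between CPython versions.
EMPTY_STR_SIZE = sys.getsizeof("")


def generate_records_sketch(num_records: int = 1,
                            record_sizes: Optional[List[int]] = None,
                            random_sizes: bool = False,
                            record_size: int = 1024) -> List[str]:
    """Build ASCII strings whose sys.getsizeof() matches the requested sizes (hypothetical)."""
    if record_sizes is None:
        if random_sizes:
            # Requires record_size >= EMPTY_STR_SIZE, otherwise randint raises ValueError.
            record_sizes = [random.randint(EMPTY_STR_SIZE, record_size)
                            for _ in range(num_records)]
        else:
            record_sizes = [record_size] * num_records
    # For an ASCII str, sys.getsizeof(s) == EMPTY_STR_SIZE + len(s) in CPython.
    # Sizes below the empty-string overhead cannot be reached and yield "".
    return ["a" * max(size - EMPTY_STR_SIZE, 0) for size in record_sizes]
```

CPython stores ASCII strings with one byte per character on top of a fixed header, which is why subtracting `sys.getsizeof("")` gives an exact character count; the header size varies between Python versions, so the sketch computes it at runtime instead of hard-coding it.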

### Example
This example generates 10 records of random sizes between the size of an empty string and 64 bytes, then batches them so that each batch does not exceed 160 bytes and contains at most 10 records.
```python
from record_batcher import RecordBatcher


record_batcher = RecordBatcher(max_record_size=64, max_batch_size=160, max_records_per_batch=10)

records = record_batcher.generate_records(num_records=10, random_sizes=True, record_size=64)
batches = record_batcher.batch_records(records)

print(f"Number of batches: {len(batches)}\n")
for batch in batches:
    print(
        f"Number of records in batch: {len(batch)}\n"
        f"Records: {batch}\n"
    )

```
