Metadata-Version: 2.2
Name: HoWDe
Version: 0.1.0
Summary: A package for detecting home and work locations from individual timestamped sequences of stop locations.
Home-page: https://github.com/LLucchini/HoWDe
Author: Silvia De Sojo Caso, Lorenzo Lucchini, Laura Alessandretti
Author-email: sdesojoc@gmail.com, lorenzo.f.lucchini.work@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: numpy>=1.26.0
Requires-Dist: pandas>=2.2.0
Requires-Dist: python-dateutil>=2.9.0
Requires-Dist: tqdm>=4.67.0
Requires-Dist: pyspark>=3.5.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# HoWDe

**HoWDe** (Home and Work Detection) is a Python package that identifies home and work locations from individual timestamped sequences of stop locations. It labels each stop location as Home ('H'), Work ('W'), or neither (None), based on user-defined parameters and heuristics.

## Features

- Processes stop location datasets to detect home and work locations.
- Allows customization through various parameters to fine-tune detection heuristics.
- Supports batch processing with multiple parameter configurations.
- Outputs results as a PySpark DataFrame, so they can be processed further in existing Spark workflows.

## Installation

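HoWDe declares support for Python 3.6 or later, but its pinned dependencies (`numpy>=1.26.0`, `pandas>=2.2.0`) require Python 3.9 or later in practice. PySpark is installed automatically as a dependency and needs a Java runtime to run. Install the package with pip: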

```bash
pip install HoWDe
```
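To confirm which versions pip resolved for the declared dependencies (for example `pyspark>=3.5.0`), the standard-library `importlib.metadata` module (Python 3.8+) can report installed versions. This is a generic check, not a HoWDe feature; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as declared in the package metadata (Requires-Dist)
DISTRIBUTIONS = ["HoWDe", "numpy", "pandas", "python-dateutil", "tqdm", "pyspark"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(DISTRIBUTIONS).items():
    print(f"{name}: {ver or 'not installed'}")
```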

## Usage

The core function of the HoWDe package is `HoWDe_labelling`, which performs the detection of home and work locations.

### `HoWDe_labelling` Function

```python
def HoWDe_labelling(
    input_data=None,
    spark=None,
    HW_PATH='./',
    SAVE_PATH=None,
    SAVE_NAME='',
    save_multiple=False,
    edit_config_default=None,
    range_window=42,
    bnd_none_day=6,
    bnd_none_home=0.4,
    bnd_none_work=0.8,
    range_freq_home=0.2,
    range_freq_work_h=0.2,
    range_freq_work_d=0.2,
    stops_output=True,
    verbose=False,
    driver_memory=250
):
    """
    Perform Home and Work Detection (HoWDe)
    """
```

#### Parameters

- `input_data` (PySpark DataFrame, default=None): Preloaded data containing all mandatory fields. If not provided, data will be loaded from the `HW_PATH` directory.
- `spark` (PySpark SparkSession, default=None): Spark session used to load the `input_data`. Mandatory if `input_data` is provided.
- `HW_PATH` (str, default='./'): Path to the stop location data in `.parquet` format.
- `SAVE_PATH` (str, default=None): Path where the labeled results should be saved. If not provided, the function returns the labeled DataFrame.
- `SAVE_NAME` (str, default=''): Name of the output file. Used as a suffix if `save_multiple` is True.
- `save_multiple` (bool, default=False): If True, saves a separate output file for each combination of parameter values. Requires `SAVE_NAME` to be specified.
- `edit_config_default` (dict, default=None): Dictionary to override default configuration settings.
- `range_window` (float or list, default=42): Size of the window used to detect home and work locations. Can be a list to explore multiple values.
- `bnd_none_day` (float or list, default=6): Minimum hours of data required in a day. Can be a list to explore multiple values.
- `bnd_none_home` (float or list, default=0.4): Minimum ratio of presence required at a location to label it as 'Home'. Can be a list to explore multiple values.
- `bnd_none_work` (float or list, default=0.8): Minimum ratio of presence required at a location to label it as 'Work'. Can be a list to explore multiple values.
- `range_freq_home` (float or list, default=0.2): Minimum frequency of visits within the window for a location to be considered 'Home'. Can be a list to explore multiple values.
- `range_freq_work_h` (float or list, default=0.2): Minimum frequency of visits within work hours for a location to be considered 'Work'. Can be a list to explore multiple values.
- `range_freq_work_d` (float or list, default=0.2): Minimum fraction of days with visits within the window for a location to be considered 'Work'. Can be a list to explore multiple values.
- `stops_output` (bool, default=True): If True, outputs the stops, split at day boundaries, with an additional `location_type` column. If False, outputs a condensed DataFrame containing only the changes in detected home and work locations.
- `verbose` (bool, default=False): If True, reports processing steps.
- `driver_memory` (float, default=250): Driver memory allocation for the Spark session.
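Several parameters accept a list of values, and `save_multiple=True` writes one output per parameter combination. The sketch below shows how such a grid expands into individual configurations, assuming the combinations form a full Cartesian product. `expand_param_grid` is a hypothetical helper for illustration, not part of HoWDe.

```python
from itertools import product


# Hypothetical helper: expands scalar-or-list parameters into every combination.
# Illustrative sketch only, not HoWDe's internal code.
def expand_param_grid(**params):
    """Return one dict per combination of the given parameter values."""
    names = list(params)
    # Wrap scalars in a one-element list so every value is iterable
    value_lists = [v if isinstance(v, (list, tuple)) else [v] for v in params.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


configs = expand_param_grid(
    range_window=[28, 42],
    bnd_none_day=6,
    bnd_none_home=[0.4, 0.5],
    bnd_none_work=0.8,
)
print(len(configs))  # 4
print(configs[0])
```

Under this assumption, the number of runs grows multiplicatively with each list-valued parameter: two values each for `range_window` and `bnd_none_home` already give four configurations.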

#### Returns

- If `SAVE_PATH` is not set, a PySpark DataFrame with an additional `location_type` column indicating the detected location type: 'H' for Home, 'W' for Work, or None. If `SAVE_PATH` is set, the labeled results are saved to that path.
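With `stops_output=False`, the output is condensed to the changes in detected home and work locations. The plain-Python sketch below illustrates that idea for a single label sequence, assuming one detected location per day; `condense_changes` and the sample data are hypothetical and do not reproduce HoWDe's implementation or output schema.

```python
from typing import List, Optional, Tuple


# Hypothetical illustration of a "changes only" view: keep a day only when the
# detected location differs from the previous day's. Not HoWDe's implementation.
def condense_changes(
    daily: List[Tuple[str, Optional[str]]]
) -> List[Tuple[str, Optional[str]]]:
    """daily: (date, location_id or None) pairs, sorted by date."""
    condensed = []
    previous = object()  # sentinel that never equals a real value
    for date, location in daily:
        if location != previous:
            condensed.append((date, location))
            previous = location
    return condensed


home_by_day = [
    ("2024-01-01", "loc_A"),
    ("2024-01-02", "loc_A"),
    ("2024-01-03", None),  # no home detected on this day
    ("2024-01-04", "loc_A"),
    ("2024-01-05", "loc_B"),
]
print(condense_changes(home_by_day))
```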

## Example Usage

### Example 1: Providing Pre-loaded Data and Spark Session

```python
from pyspark.sql import SparkSession
from howde import HoWDe_labelling

# Initialize Spark session
spark = SparkSession.builder.appName('HoWDeApp').getOrCreate()

# Load your stop location data
input_data = spark.read.parquet('path_to_your_data.parquet')

# Run HoWDe labelling
labeled_data = HoWDe_labelling(
    input_data=input_data,
    spark=spark,
    range_window=42,
    bnd_none_day=6,
    bnd_none_home=0.4,
    bnd_none_work=0.8,
    range_freq_home=0.2,
    range_freq_work_h=0.2,
    range_freq_work_d=0.2,
    stops_output=True,
    verbose=True
)

# Show the results
labeled_data.show()
```

### Example 2: Self-contained Usage

```python
from howde import HoWDe_labelling

# Define path to your stop location data
HW_PATH = './'

# Run HoWDe labelling
labeled_data = HoWDe_labelling(
    HW_PATH=HW_PATH,
    range_window=42,
    bnd_none_day=6,
    bnd_none_home=0.4,
    bnd_none_work=0.8,
    range_freq_home=0.2,
    range_freq_work_h=0.2,
    range_freq_work_d=0.2,
    stops_output=True,
    verbose=True
)

# Show the results
labeled_data.show()
```

## License

This project is licensed under the MIT License. See `LICENSE.txt` for the full license text, or the [MIT License](https://opensource.org/licenses/MIT) page at the Open Source Initiative.
