Metadata-Version: 2.1
Name: senderstats
Version: 1.2.3
Summary: Tool to Process Smart Search Results and Identify Top Senders
License: MIT
Project-URL: repository, https://github.com/pfptcommunity/senderstats
Requires-Python: >3.9
Description-Content-Type: text/markdown
Requires-Dist: xlsxwriter
Requires-Dist: tldextract

# Proofpoint Sender Analyzer

This tool helps identify the top senders based on smart search outbound message exports or CSV data.

### Requirements:

* Python 3.9+

### Installing the Package

You can install the tool using the following command directly from Github.

```
pip install git+https://github.com/pfptcommunity/senderstats.git
```

or can install the tool using pip.

```
pip install senderstats
```

### Use Cases:

**Outbound message volumes and data transferred by:**

* Envelope sender
* Header From:
* Return-Path:
* Envelope header: From:, MessageID Host, MessageID Domain (helpful to identify original sender)
* Envelope sender and header From: for SPF alignment purposes
* Random subject line sampling to help understand the type of traffic

**Summarize message volume information:**

* Estimated application email traffic based on sender volume threshold:
    * Estimated application data
    * Estimated application messages
    * Estimated application average size
    * Estimated application peak hourly volume
* Total outbound data
    * Total outbound data
    * Total outbound messages
    * Total outbound average size
    * Total outbound peak hourly volume

### Usage Options:

```
usage: senderstats [-h] -i <file> [<file> ...] -o <xlsx> [--mfrom MFrom] [--hfrom HFrom] [--rpath RPath]
                   [--msgid MsgID] [--subject Subject] [--size MsgSz] [--date Date] [--gen-hfrom] [--gen-rpath]
                   [--gen-alignment] [--gen-msgid] [-t N] [--no-display-name] [--remove-prvs] [--decode-srs]
                   [--no-empty-hfrom] [--sample-subject] [--excluded-domains <domain> [<domain> ...]]
                   [--restrict-domains <domain> [<domain> ...]] [--excluded-senders <sender> [<sender> ...]]
                   [--date-format DateFmt] [--show-skip-detail]

This tool helps identify the top senders based on smart search outbound message exports.

Input / Output arguments (required):
  -i <file> [<file> ...], --input <file> [<file> ...]  Smart search files to read.
  -o <xlsx>, --output <xlsx>                           Output file

Field mapping arguments (optional):
  --mfrom MFrom                                        CSV field of the envelope sender address. (default=Sender)
  --hfrom HFrom                                        CSV field of the header From: address. (default=Header_From)
  --rpath RPath                                        CSV field of the Return-Path: address. (default=Header_Return-
                                                       Path)
  --msgid MsgID                                        CSV field of the message ID. (default=Message_ID)
  --subject Subject                                    CSV field of the Subject, only used if --sample-subject is
                                                       specified. (default=Subject)
  --size MsgSz                                         CSV field of message size. (default=Message_Size)
  --date Date                                          CSV field of message date/time. (default=Date)

Reporting control arguments (optional):
  --gen-hfrom                                          Generate report showing the header From: data for messages
                                                       being sent.
  --gen-rpath                                          Generate report showing return path for messages being sent.
  --gen-alignment                                      Generate report showing envelope sender and header From:
                                                       alignment
  --gen-msgid                                          Generate report showing parsed Message ID. Helps determine the
                                                       sending system
  -t N, --threshold N                                  Adjust summary report threshold for messages per day to be
                                                       considered application traffic. (default=100)

Parsing behavior arguments (optional):
  --no-display-name                                    Remove display and use address only. Converts 'Display Name
                                                       <user@domain.com>' to 'user@domain.com'
  --remove-prvs                                        Remove return path verification strings e.g.
                                                       prvs=tag=sender@domain.com
  --decode-srs                                         Convert sender rewrite scheme,
                                                       forwardmailbox+srs=hash=tt=domain.com=user to user@domain.com
  --no-empty-hfrom                                     If the header From: is empty the envelope sender address is
                                                       used
  --sample-subject                                     Enable probabilistic random sampling of subject lines found
                                                       during processing
  --excluded-domains <domain> [<domain> ...]           Exclude domains from processing.
  --restrict-domains <domain> [<domain> ...]           Constrain domains for processing.
  --excluded-senders <sender> [<sender> ...]           Exclude senders from processing.
  --date-format DateFmt                                Date format used to parse the timestamps.
                                                       (default=%Y-%m-%dT%H:%M:%S.%f%z)

Extended processing controls (optional):
  --show-skip-detail                                   Show skipped details

Usage:
  -h, --help                                           Show this help message and exit
```

### Using the Tool with Proofpoint Smart Search

Export all outbound message traffic as a smart search CSV. You may need to export multiple CSVs if the data per time
window exceeds 1M records. The tool can ingest multiple CSVs files at once.

![smart_search_outbound](https://github.com/pfptcommunity/senderstats/assets/83429267/83693152-922e-489a-b06d-a0765ecaf3e8)

Once the files are downlaoded to a target folder, you can run the following command with the path to the files you
downloaded and specify a wildard.

```
senderstats -i C:\path\to\downloaded\files\smart_search_results_custer_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx
```

### Sample Output

The execution results should look similar to the following depending the options you select.

```
Files to be processed:
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv

Domains excluded from processing:
knowledgefront.com
pphosted.com
ppops.net

Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv

File Processing Summary
Total Records:  4409754
Skipped Records:  2237796

Records by Day
2024-02-03: 43926
2024-02-04: 48567
2024-02-05: 82679
2024-02-06: 100960
2024-02-07: 97990
2024-02-08: 100370
2024-02-09: 85954
2024-02-10: 19740
2024-02-11: 15595
2024-02-12: 94800
2024-02-13: 99043
2024-02-14: 96919
2024-02-15: 95478
2024-02-16: 88463
2024-02-17: 19021
2024-02-18: 16961
2024-02-19: 81489
2024-02-20: 96920
2024-02-21: 103170
2024-02-22: 104562
2024-02-23: 81652
2024-02-24: 17902
2024-02-25: 16311
2024-02-26: 97154
2024-02-27: 99578
2024-02-28: 109633
2024-02-29: 104672
2024-03-01: 117695
2024-03-02: 20002
2024-03-03: 14752

Please see report: C:\Users\ljerabek\Downloads\my_cluster_hosted.xlsx
```

### Sample Summary Statistics

![image](https://github.com/pfptcommunity/senderstats/assets/83429267/93207754-9e58-4e7b-8266-e78eadb48d3a)

### Sample Details (Sender + From by Volume):

![image](https://github.com/pfptcommunity/senderstats/assets/83429267/4fa58247-bf7b-4e9f-ba31-e6173b35da1d)

### Sample Details (Message ID) Inferencing:

![image](https://github.com/pfptcommunity/senderstats/assets/83429267/c6cb1102-c8b5-49c2-b498-51dfa30ae04a)




