Metadata-Version: 2.4
Name: imessage-conversation-analyzer
Version: 3.0.0rc1
Summary: Analyzes the entire history of a macOS Messages conversation
Keywords: apple,imessage,messages,macos,conversation,chat,analysis,pandas
Author: Caleb Evans
Author-email: Caleb Evans <caleb@calebevans.me>
License-Expression: MIT
Requires-Dist: pandas
Requires-Dist: pyarrow
Requires-Dist: tabulate
Requires-Dist: openpyxl
Requires-Dist: pytypedstream
Requires-Dist: phonenumbers
Requires-Dist: tzlocal
Requires-Dist: emoji
Requires-Dist: duckdb
Maintainer: Caleb Evans
Maintainer-email: Caleb Evans <caleb@calebevans.me>
Requires-Python: >=3.9
Project-URL: changelog, https://github.com/caleb531/imessage-conversation-analyzer/releases
Project-URL: documentation, https://github.com/caleb531/imessage-conversation-analyzer#readme
Project-URL: homepage, https://github.com/caleb531/imessage-conversation-analyzer
Project-URL: repository, https://github.com/caleb531/imessage-conversation-analyzer
Description-Content-Type: text/markdown

# iMessage Conversation Analyzer

*Copyright 2020-2026 Caleb Evans*  
*Released under the MIT license*

[![tests](https://github.com/caleb531/imessage-conversation-analyzer/actions/workflows/tests.yml/badge.svg)](https://github.com/caleb531/imessage-conversation-analyzer/actions/workflows/tests.yml)
[![Coverage Status](https://coveralls.io/repos/caleb531/imessage-conversation-analyzer/badge.svg?branch=main)](https://coveralls.io/r/caleb531/imessage-conversation-analyzer?branch=main)

iMessage Conversation Analyzer (ICA) is a fully-typed Python library (and CLI
utility) that will read the contents of an iMessage conversation via the
Messages database on macOS. You can then gather various metrics of interest from
the messages in that conversation.

Much of this program was inspired by and built using findings from [this blog post by Yorgos Askalidis][blog-post].

[blog-post]: https://medium.com/@yaskalidis/heres-how-you-can-access-your-entire-imessage-history-on-your-mac-f8878276c6e9

## Installation

Open a Terminal and run the following:

```sh
pip3 install imessage-conversation-analyzer
```

You can also install ICA via [uv][uv]:

```sh
uv tool install imessage-conversation-analyzer
```

[uv]: https://docs.astral.sh/uv/

## Usage

The package includes both a Command Line API for convenience and a Python API
for developers who want maximum flexibility.

### Command Line API

To use ICA from the command line, run the `ica` command from the Terminal. The
minimum required arguments are:

1. A path to an analyzer file to run, or the name of a built-in analyzer
2. The first and last name of the contact(s), via the `--contact` / `-c` flag
   1. If the contact has no last name on record, you can just pass the first
      name
   2. You can also pass any phone number or email address associated with the
      contact; keep in mind that analysis will still run on all phone numbers /
      email addresses associated with the contact, not just the one you specify
   3. For group chats, simply pass multiple `--contact` / `-c` flags

#### Example

```sh
ica message_totals -c 'Thomas Riverstone' -c 'Daniel Brightingale'
```

The command above outputs a table like:

```
Metric               Total
Messages             20036
Messages From Me      7000
Messages From Daniel  6501
Messages From Thomas  6535
Reactions             4880
Reactions From Me     1700
Reactions From Daniel 1675
Reactions From Thomas 1505
Days Messaged          115
Days Missed              0
Days With No Reply       0
```

#### Built-in analyzers

ICA includes several built-in analyzers out of the box:

1. `message_totals`: a summary of message and reaction counts, by person and in
   total, as well as other insightful metrics
2. `attachment_totals`: lists count data by attachment type, including
   number of Spotify links shared, YouTube videos, Apple Music, etc.
3. `most_frequent_emojis`: count data for the top 10 most frequently used emojis
   across the entire conversation
4. `totals_by_day`: a comprehensive breakdown of message totals for every day
   you and the other participants have been messaging in the conversation
5. `transcript`: a full, unedited transcript of every message, including
   reactions, between you and the other participants (attachment files not included)
6. `count_phrases`: count the number of case-insensitive occurrences of any
   arbitrary strings across all messages in a conversation (excluding
   reactions); use the `-s` / `--case-sensitive` option for case-sensitive
   counts, and the `-r` / `--use-regex` option to enable regular expression mode
   for all phrases you specify
7. `from_sql`: execute an arbitrary SQL query against the conversation data
   (messages and attachments), using an in-memory SQLite database
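
The counting behavior described for `count_phrases` can be sketched in plain Python. This is only an illustration of case-insensitive, case-sensitive, and regex counting, not ICA's actual implementation; the `count_phrase` helper and the sample messages are hypothetical.

```python
import re


def count_phrase(messages, phrase, case_sensitive=False, use_regex=False):
    """Count non-overlapping occurrences of `phrase` across all messages."""
    # Treat the phrase as literal text unless regex mode is requested
    pattern = phrase if use_regex else re.escape(phrase)
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    # findall() returns one entry per match, so its length is the match count
    return sum(len(compiled.findall(text)) for text in messages)


# Made-up sample messages
messages = ["Happy birthday!", "so HAPPY for you", "happy happy day"]

print(count_phrase(messages, "happy"))  # 4
print(count_phrase(messages, "happy", case_sensitive=True))  # 2
print(count_phrase(messages, r"happ(y|ier)", use_regex=True))  # 4
```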

#### Filtering

There are several built-in flags you can use to filter messages and attachments.

- `--from-date`: A start date to filter messages by (inclusive); the format must
  be ISO 8601-compliant, e.g. YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS
- `--to-date`: An end date to filter messages by (exclusive); the format must be
  ISO 8601-compliant, e.g. YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS
- `--from-person` / `-p`: A reference to the person by whom to filter messages;
  accepted values are `me`, `them`, `all`, or another participant (defaults to
  `all`); you can specify another participant using their first name, full
  name, phone number, or email address; to filter by multiple people, pass this
  flag multiple times (e.g. `-p Thomas -p Daniel`)


```sh
ica message_totals -c 'Thomas Riverstone' --from-date 2024-12-01 --to-date 2025-01-01 --from-person 'Thomas'
```

```sh
# Filtering by more than one person
ica message_totals -c 'Thomas Riverstone' -c 'Daniel Brightingale' --from-date 2024-12-01 --to-date 2025-01-01 --from-person 'Thomas' --from-person 'Daniel'
```
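
Because `--from-date` is inclusive and `--to-date` is exclusive, the two flags describe a half-open range. The sketch below shows that boundary behavior using only the standard library; `in_date_range` is a hypothetical helper written for illustration and is not part of ICA.

```python
from datetime import datetime


def in_date_range(timestamp, from_date=None, to_date=None):
    """Return True if `timestamp` falls within [from_date, to_date)."""
    # Both YYYY-MM-DD and YYYY-MM-DDTHH:MM:SS are accepted by fromisoformat()
    start = datetime.fromisoformat(from_date) if from_date else None
    end = datetime.fromisoformat(to_date) if to_date else None
    if start is not None and timestamp < start:
        return False
    # The end date is exclusive, so a timestamp equal to it is filtered out
    if end is not None and timestamp >= end:
        return False
    return True


# Midnight on the start date is included
print(in_date_range(datetime(2024, 12, 1), "2024-12-01", "2025-01-01"))  # True
# Midnight on the end date is excluded
print(in_date_range(datetime(2025, 1, 1), "2024-12-01", "2025-01-01"))  # False
```

One consequence of this convention is that consecutive ranges such as December and January never count the same message twice.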

#### Other formats

You can optionally pass the `-f`/`--format` flag to output to a specific format
like CSV (supported formats include `csv`, `excel`/`xlsx`, `markdown`/`md`, and `json`).

```sh
ica message_totals -c 'Thomas Riverstone' -f csv
```

```sh
ica ./my_custom_analyzer.py -c 'Thomas Riverstone' -f csv
```

#### Writing to a file

Finally, there is an optional `-o`/`--output` flag if you want to output to a
specified file. ICA will do its best to infer the format from the file
extension, although you could also pass `--format` if you have special filename
requirements.

```sh
ica transcript -c 'Thomas Riverstone' -o ./my_transcript.xlsx
```
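
Inferring a format from a file extension might look something like the sketch below. The mapping is an assumption based on the format names listed earlier, and `infer_format` is a hypothetical helper; ICA's real inference logic may differ.

```python
from pathlib import Path

# Assumed mapping from file extension to format name (illustrative only)
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".md": "markdown",
    ".json": "json",
}


def infer_format(output_path, explicit_format=None):
    """Prefer an explicitly given format; otherwise guess from the extension."""
    if explicit_format:
        return explicit_format
    # Returns None when the extension is not recognized
    return EXTENSION_TO_FORMAT.get(Path(output_path).suffix.lower())


print(infer_format("./my_transcript.xlsx"))  # excel
print(infer_format("./my_transcript.txt", explicit_format="csv"))  # csv
```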

### Python API

The Python API is much more powerful, allowing you to integrate ICA into any
type of Python project that can run on macOS. All of the built-in analyzers
(under the `ica/analyzers` directory) actually use this API.

Here's a complete example that shows how to retrieve the transcript of an entire
iMessage conversation with one or more other people.

```python
# get_my_transcript.py

import pandas as pd

import ica


# Export a transcript of the entire conversation
def main() -> None:
    # Allow your program to accept all the same CLI arguments as the `ica`
    # command; you can skip calling this if you have other means of specifying
    # the contact name and output format; you can also add your own arguments
    # this way (see the count_phrases analyzer for an example of this)
    cli_args = ica.get_cli_parser().parse_args(
        namespace=ica.TypedCLIArguments()
    )
    # Retrieve the dataframes corresponding to the processed contents of the
    # database; dataframes include `messages` and `attachments`
    dfs = ica.get_dataframes(
        contacts=cli_args.contacts,
        timezone=cli_args.timezone,
        from_date=cli_args.from_date,
        to_date=cli_args.to_date,
        from_people=cli_args.from_people,
    )
    # Send the results to stdout (or to file) in the given format
    ica.output_results(
        pd.DataFrame(
            {
                "timestamp": dfs.messages["datetime"],
                "is_from_me": dfs.messages["is_from_me"],
                "is_reaction": dfs.messages["is_reaction"],
                # U+FFFC is the object replacement character, which appears as
                # the textual message for every attachment
                "message": dfs.messages["text"].replace(
                    r"\ufffc", "(attachment)", regex=True
                ),
            }
        ),
        # The default format (None) corresponds to the pandas default dataframe
        # table format
        format=cli_args.format,
        # When output is None (the default), ICA will print to stdout
        output=cli_args.output,
        # Make certain column labels more human-friendly with
        # prettified_label_overrides
        prettified_label_overrides={
            "is_from_me": "Is from Me?",
            "is_reaction": "Is Reaction?",
        },
    )


if __name__ == "__main__":
    main()
```

You can run the above program using the `ica` command, or execute it directly
like any other Python program.

```sh
ica ./get_my_transcript.py -c 'Thomas Riverstone'
```

```sh
python ./get_my_transcript.py -c 'Thomas Riverstone'
```

```sh
python -m get_my_transcript -c 'Thomas Riverstone'
```

You're not limited to writing a command line program, though! The
`ica.get_dataframes()` function is the only function you will need in any
analyzer program. But beyond that, feel free to import other modules, send your
results to other processes, or whatever you need to do!

### Errors and exceptions

- `BaseAnalyzerException`: the base exception class for all library-related
  errors and exceptions
- `ContactNotFoundError`: raised if the specified contact was not found
- `ConversationNotFoundError`: raised if the specified conversation was not
  found
- `FormatNotSupportedError`: raised if the specified format is not supported by
  the library
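
Since every library error derives from `BaseAnalyzerException`, you can catch a specific error first and fall back to the base class for everything else. The sketch below uses local stand-in classes that only mirror the documented names, so it runs without ICA installed; how the real classes are imported is not shown here, and `run_safely` and the two failing functions are hypothetical.

```python
# Local stand-ins that mirror the documented exception names; in real code you
# would use the classes provided by ICA rather than defining your own
class BaseAnalyzerException(Exception):
    pass


class ContactNotFoundError(BaseAnalyzerException):
    pass


class FormatNotSupportedError(BaseAnalyzerException):
    pass


def run_safely(analysis):
    """Run a zero-argument callable and turn library errors into a message."""
    try:
        return analysis()
    except ContactNotFoundError as error:
        # Handle the most specific error first
        return f"Contact problem: {error}"
    except BaseAnalyzerException as error:
        # Any other library-related error is caught via the base class
        return f"Analyzer problem: {error}"


def missing_contact():
    raise ContactNotFoundError("no contact named 'Thomas Riverstone'")


def bad_format():
    raise FormatNotSupportedError("format 'pdf' is not supported")


print(run_safely(missing_contact))
print(run_safely(bad_format))
```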

#### Using a specific timezone

By default, all dates and times are in the local timezone of the system on which
ICA is run. If you'd like to change this, you can pass the `--timezone` / `-t`
option to the CLI with an [IANA timezone name][iana].

```sh
ica totals_by_day -c 'Daniel Brightingale' -t UTC
```

```sh
ica totals_by_day -c 'Daniel Brightingale' -t America/New_York
```

[iana]: https://data.iana.org/time-zones/tzdb-2021a/zone1970.tab

The equivalent option for the Python API is the `timezone` parameter to
`ica.get_dataframes`:

```python
dfs = ica.get_dataframes(contacts=my_contacts, timezone='UTC')
```
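
The choice of timezone matters most for day-based metrics, because the same instant can fall on different calendar days. The sketch below illustrates this with the standard library's `zoneinfo` module (Python 3.9+); it is independent of ICA, and the `localize` helper and the timestamp are made up for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def localize(timestamp, tz_name):
    """Convert a timezone-aware timestamp to the given IANA timezone."""
    return timestamp.astimezone(ZoneInfo(tz_name))


# A hypothetical message sent at 03:00 UTC on New Year's Day
sent_at = datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc)

print(localize(sent_at, "UTC").date())  # 2025-01-01
print(localize(sent_at, "America/New_York").date())  # 2024-12-31
```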

### Data Schema

All analyzers (including the built-in `from_sql` analyzer and any custom
analyzers you write) have access to the following dataframes/tables. An object
containing these dataframes is returned by the `ica.get_dataframes()` function
in the Python API.

#### `messages`

A list of all messages in the conversation, including text messages and reactions.

| Column | Type | Description |
| :--- | :--- | :--- |
| `ROWID` | `int` | The unique identifier of the message |
| `text` | `str` | The content of the message |
| `datetime` | `datetime.datetime` | The timestamp of the message, expressed in the timezone given by the `timezone` parameter you pass to `get_dataframes()` (defaults to the system's local timezone) |
| `sender_display_name` | `str` | A display name representing the sender of the message; can be a first name, full name, phone number, email address, or "Me" if `is_from_me` is true for that message |
| `sender_handle` | `str` | The specific handle (phone number or email address) from which the sender sent the message |
| `is_from_me` | `bool` | Whether the message was sent by you (`True`) or another participant (`False`) |
| `is_reaction` | `bool` | Whether the message is a reaction (e.g. "Loved ...") |

#### `attachments`

A list of all attachments in the conversation, including images, videos, audio, and any other types of files. Please note that no content is included, only metadata.

| Column | Type | Description |
| :--- | :--- | :--- |
| `ROWID` | `int` | The unique identifier of the attachment |
| `filename` | `str` | The filename of the attachment |
| `mime_type` | `str` | The MIME type of the attachment (e.g. `image/jpeg`) |
| `message_id` | `int` | The `ROWID` of the associated message |
| `datetime` | `datetime.datetime` | The localized timestamp of the message |
| `is_from_me` | `bool` | Whether the attachment was sent by you (`True`) or another participant (`False`) |
| `sender_handle` | `str` | The specific handle (phone number or email address) from which the sender sent the attachment |

#### `handles`

A list of all handles (phone numbers and email addresses) associated with the
participants of the conversation (other than the host user / "me"). This allows
for easy joining with the `messages` dataframe.

| Column | Type | Description |
| :--- | :--- | :--- |
| `handle_id` | `int` | The unique numeric ID of the handle |
| `name` | `str` | The full name of the contact associated with the handle |
| `first_name` | `str` | The first name of the participant (as found on their contact record) |
| `last_name` | `str` | The last name of the participant (as found on their contact record) |
| `identifier` | `str` | The specific handle (phone number or email address) belonging to the participant |
| `contact_id` | `str` | The unique identifier of the contact record |
| `display_name` | `str` | A unique display name for the participant; can be a first name, full name, phone number, or email address (to ensure uniqueness) |
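
The join between `messages` and `handles` can be sketched with Python's built-in `sqlite3` module and a few made-up rows. The example assumes that `messages.sender_handle` matches `handles.identifier`, which the column descriptions suggest but do not state outright; the tables here are trimmed-down stand-ins, not ICA's real data.

```python
import sqlite3

# Build a throwaway in-memory database with only the columns needed here
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (text TEXT, sender_handle TEXT)")
con.execute("CREATE TABLE handles (first_name TEXT, identifier TEXT)")

# One participant can own several handles (phone number and email address)
con.executemany(
    "INSERT INTO handles VALUES (?, ?)",
    [
        ("Thomas", "+15551230001"),
        ("Thomas", "thomas@example.com"),
        ("Daniel", "+15551230002"),
    ],
)
con.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("Hello!", "+15551230001"),
        ("Are you free later?", "+15551230001"),
        ("Sent from my laptop", "thomas@example.com"),
        ("Hey there", "+15551230002"),
    ],
)

# Assumed join key: the handle that sent the message
rows = con.execute(
    """
    SELECT h.first_name, COUNT(*) AS message_count
    FROM messages AS m
    JOIN handles AS h ON m.sender_handle = h.identifier
    GROUP BY h.first_name
    ORDER BY h.first_name
    """
).fetchall()
con.close()

print(rows)  # [('Daniel', 1), ('Thomas', 3)]
```

Grouping by name rather than by handle is what lets messages sent from different phone numbers or email addresses count toward the same person.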

### SQL Functions

The Python API also exposes several powerful functions that allow you to query
your conversation data using SQL. This is powered by an in-memory SQLite
database that is automatically populated with the available iMessage dataframes.
Please refer to the *Data Schema* section above for details on the available
tables and their columns.

- `get_sql_connection(dfs)`: A context manager which creates a temporary in-memory SQLite database from your ICA dataframes, allowing you to operate on them with the `ica.execute_sql_query()` function (documented below)
- `execute_sql_query(query, con)`: Executes a SQL query against the connection provided by `get_sql_connection`; returns a pandas dataframe with the results

```python
import ica

def main() -> None:
    # Retrieve conversation data
    dfs = ica.get_dataframes(contacts=["Jane Doe"])

    # Run SQL queries against the data
    with ica.get_sql_connection(dfs) as con:
        results = ica.execute_sql_query(
            "SELECT * FROM messages WHERE is_from_me = 1",
            con
        )
        ica.output_results(results)

if __name__ == "__main__":
    main()
```

## Developer Setup

The following instructions are written for developers who want to run the
package locally or write their own analyzers.

We recommend using the uv package manager for easier environment and dependency
management ([instructions][installation-docs]).

[installation-docs]: https://docs.astral.sh/uv/getting-started/installation/#installation-methods

### 1. Install uv

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Create virtual environment and install dependencies

```sh
uv sync
```

### 3. Run CLI like normal

When you install ICA with uv, an editable installation of the package gets
installed into the virtual environment that uv creates for you. This allows you
to make changes to the source code and continue to invoke `ica` like normal:

```sh
ica message_totals -c 'Thomas Riverstone'
```
