Metadata-Version: 2.4
Name: gca_analyzer
Version: 0.4.7
Summary: A package for Group Communication Analysis with improved text processing and visualization
Home-page: https://github.com/etShaw-zh/gca_analyzer
Author: Jianjun Xiao <et_shaw@126.com>
Author-email: et_shaw@126.com
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: matplotlib>=3.4.0
Requires-Dist: plotly>=5.3.0
Requires-Dist: loguru>=0.7.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: typing-extensions>=4.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

English | [简体中文](README_zh.md)
# GCA Analyzer

[![PyPI version](https://badge.fury.io/py/gca-analyzer.svg)](https://pypi.org/project/gca-analyzer)
[![support-version](https://img.shields.io/pypi/pyversions/gca-analyzer)](https://img.shields.io/pypi/pyversions/gca-analyzer)
[![license](https://img.shields.io/github/license/etShaw-zh/gca_analyzer)](https://github.com/etShaw-zh/gca_analyzer/blob/master/LICENSE)
[![commit](https://img.shields.io/github/last-commit/etShaw-zh/gca_analyzer)](https://github.com/etShaw-zh/gca_analyzer/commits/master)
[![flake8](https://github.com/etShaw-zh/gca_analyzer/workflows/lint/badge.svg)](https://github.com/etShaw-zh/gca_analyzer/actions?query=workflow%3ALint)
![Tests](https://github.com/etShaw-zh/gca_analyzer/actions/workflows/python-test.yml/badge.svg)
[![Coverage Status](https://codecov.io/gh/etShaw-zh/gca_analyzer/branch/main/graph/badge.svg?token=GLAVYYCD9L)](https://codecov.io/gh/etShaw-zh/gca_analyzer)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/581d2fea968f4b0ab821c8b3d94eaac0)](https://app.codacy.com/gh/etShaw-zh/gca_analyzer/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![Documentation Status](https://readthedocs.org/projects/gca-analyzer/badge/?version=latest)](https://gca-analyzer.readthedocs.io/en/latest/?badge=latest)
[![PyPI Downloads](https://static.pepy.tech/badge/gca-analyzer)](https://pepy.tech/projects/gca-analyzer)
[![PyPI Downloads](https://static.pepy.tech/badge/gca-analyzer/month)](https://pepy.tech/projects/gca-analyzer)
[![DOI](https://zenodo.org/badge/915395583.svg)](https://doi.org/10.5281/zenodo.14647250)
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11TC3wzCmP0r2axRUc1FuyWOBiZS1j-Qg?usp=sharing)
[![Open in ModelScope](https://img.shields.io/badge/ModelScope-Run%20in%20Community-blue?logo=appveyor)](https://modelscope.cn/notebook/share/ipynb/9d562da5/base_usage.ipynb.ipynb)

## Introduction

GCA Analyzer is a Python package for analyzing group communication dynamics using NLP techniques and quantitative metrics. It provides comprehensive tools for understanding **participation patterns**, **interaction dynamics**, **content newness**, and **communication density** in group communications.

## Features

- **Multi-language Support**: Built-in support for Chinese and other languages through LLM models
- **Built-in Sample Data**: Includes ready-to-use sample conversations for immediate testing
- **Notebook Integration**: Jupyter examples for quick runs in Google Colab
- **Comprehensive Metrics**: Analyzes group interactions through multiple dimensions
- **Automated Analysis**: Finds optimal analysis windows and generates detailed statistics
- **Flexible Configuration**: Customizable parameters for different analysis needs
- **Easy Integration**: Command-line interface and Python API support

## Quick Start
### 🚀 Latest Update: Colab Support

#### Experience Google Colab Online

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11TC3wzCmP0r2axRUc1FuyWOBiZS1j-Qg?usp=sharing)

No installation required. Click the Colab badge above to run GCA Analyzer directly in your browser and get started quickly.

-------

### Installation

```bash
# Install from PyPI
pip install gca-analyzer

# For development
git clone https://github.com/etShaw-zh/gca_analyzer.git
cd gca_analyzer
pip install -e .
```

### Basic Usage

#### Option 1: Use Built-in Sample Data (Recommended for First-time Users)

Start immediately with built-in sample data:

```bash
# Use built-in sample data
python -m gca_analyzer --sample-data

# Preview the sample data first
python -m gca_analyzer --sample-data --preview

# Interactive mode with sample data (recommended)
python -m gca_analyzer --interactive
```

**Sample Data Contents:**
- 42 different engineering project conversations across multiple teams
- 2,727 authentic conversation messages  
- 48 different participants from collaborative engineering sessions
- Original data filtered to include only conversations with ≥30 messages for meaningful analysis

**Data Source & Citation:**
This sample data is adapted from the Epistemic Network Analysis (ENA) Web Tool example dataset. When using this sample data in research or publications, please cite:

- Shaffer, D. W., Collier, W., & Ruis, A. R. (2016). A tutorial on epistemic network analysis: Analyzing the structure of connections in cognitive, social, and interaction data. *Journal of Learning Analytics*, 3(3), 9-45.
- ENA Web Tool: https://app.epistemicnetwork.org/

#### Option 2: Use Your Own Data

1. Prepare your communication data in CSV format with required columns:
```
conversation_id,person_id,time,text
1A,student1,0:08,Hello teacher!
1A,teacher,0:10,Hello everyone!
```

2. Run analysis:

   **Interactive Mode:**
   ```bash
   python -m gca_analyzer --interactive
   # or
   python -m gca_analyzer -i
   ```
   
   **Command Line Mode:**
   ```bash
   python -m gca_analyzer --data your_data.csv
   ```
   
   **Advanced Options:**
   ```bash
   python -m gca_analyzer --data your_data.csv --output results/ --model-name your-model --console-level INFO
   ```

#### Analysis Results

The analyzer generates comprehensive statistics for GCA measures:

![Descriptive Statistics](/docs/_static/gca_results.jpg)

- **Participation**
  - Measures relative contribution frequency
  - Negative values indicate below-average participation
  - Positive values indicate above-average participation

- **Responsivity**
  - Measures how well participants respond to others
  - Higher values indicate better response behavior

- **Internal Cohesion**
  - Measures consistency in individual contributions
  - Higher values indicate more coherent messaging

- **Social Impact**
  - Measures influence on group discussion
  - Higher values indicate a stronger impact on others

- **Newness**
  - Measures introduction of new content
  - Higher values indicate more novel contributions

- **Communication Density**
  - Measures information content per message
  - Higher values indicate more information-rich messages

Results are saved as CSV files in the specified output directory.

#### Visualizations

The analyzer provides interactive and informative visualizations:

![GCA Analysis Results](/docs/_static/vizs.png)

- **Radar Plots**: Compare measures across participants
- **Distribution Plots**: Visualize measure distributions

Results are saved as interactive HTML files in the specified output directory.

## Citation
[![DOI](https://zenodo.org/badge/915395583.svg)](https://doi.org/10.5281/zenodo.14647250)

If you use **GCA Analyzer** in your research, please cite it as follows:

```bibtex
@software{xiao2025gca,
  author       = {Xiao, J.},
  title        = {etShaw-zh/gca_analyzer: GCA analyzer: A python package for group communication analysis},
  version      = {v0.4.5},
  year         = {2025},
  url          = {https://doi.org/10.5281/zenodo.15906956},
  note         = {Computer software},
}
