Metadata-Version: 2.4
Name: krippendorrf-graph
Version: 0.2.0
Summary: A Python package for computing krippendorrfs alpha for graph (modified from https://github.com/grrrr/krippendorff-alpha/blob/master/krippendorff_alpha.py)
Home-page: https://github.com/junbohuang/Krippendorrf-alpha-for-graph
Author: Junbo Huang
Author-email: junbo.huang@uni-hamburg.de
License: Apache 2 License
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: networkx
Requires-Dist: numpy
Requires-Dist: tqdm
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: summary

# Krippendorrf-alpha-for-graph
Compute Krippendorff's alpha for graphs, modified from https://github.com/grrrr/krippendorff-alpha/

## Changes
1. Uses NetworkX to instantiate graphs
2. Added custom node/edge and graph metrics (see below)
3. Precomputes the distance matrix and stores it as a .npy file, which speeds up the computation of
   - within-units disagreement (Do)
   - within- and between-units expected total disagreement (De)
4. Not thoroughly tested, but it should work with any pandas DataFrame that has the following shape:
   - a feature column storing the annotated graphs (each a list of tuples, such as [("subject_1", "predicate_1", "object_1"), ("subject_2", "predicate_2", "object_2")])
   - the feature column can also hold nodes or edges (tuples of strings)
   - a column indicating the annotator id
   - annotations are ordered the same way for every annotator
5. Note that the distance metric interacts with the NetworkX graph type used when calling instantiate_networkx_graph(). The following graph types are available:
   - nx.Graph
   - nx.DiGraph
   - nx.MultiGraph
   - nx.MultiDiGraph
6. Two categories of distance metric are implemented. 
   - Lenient metric: node/edge or graph overlap
   - Strict metric: nominal metric, graph edit distance
7. Depending on how many graphs you have, computing the graph distance matrix can take a long time.
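
For orientation, the two quantities in item 3 combine as alpha = 1 - Do/De. The sketch below follows the structure of the upstream krippendorff_alpha.py, but it calls a metric function directly instead of looking distances up in a precomputed matrix. The names `alpha_from_metric` and `nominal` are hypothetical and used only for illustration; they are not part of this package's API.

```python
def nominal(a, b):
    """Strict metric: distance 0 for identical values, else 1."""
    return 0.0 if a == b else 1.0


def alpha_from_metric(data, metric, missing="*"):
    """data holds one list per annotator, aligned by unit index."""
    # Group the non-missing annotations by unit.
    units = {}
    for annotator in data:
        for unit_id, value in enumerate(annotator):
            if value != missing:
                units.setdefault(unit_id, []).append(value)
    # Only units annotated at least twice contribute pairable values.
    units = {u: vals for u, vals in units.items() if len(vals) > 1}
    n = sum(len(vals) for vals in units.values())
    if n == 0:
        raise ValueError("No pairable values.")

    # Do: observed disagreement within units.
    d_o = sum(
        sum(metric(a, b) for a in vals for b in vals) / (len(vals) - 1)
        for vals in units.values()
    ) / n
    if d_o == 0:
        return 1.0

    # De: expected disagreement over all pairs, within and between units.
    d_e = sum(
        metric(a, b)
        for v1 in units.values()
        for v2 in units.values()
        for a in v1
        for b in v2
    ) / (n * (n - 1))
    return 1.0 - d_o / d_e


data = [["a", "b", "a"], ["a", "b", "b"]]
print("%.3f" % alpha_from_metric(data, nominal))  # 0.444
```

Because De sums the metric over every pair of pairable values, its cost grows quadratically with the number of annotations, which is why caching the distances pays off for expensive graph metrics.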


### Node/edge Metrics
#### Lenient metric
1. Node overlap metric: if two sets of nodes or edges overlap, the distance between these two sets is 0; else 1.

#### Strict metric
1. Nominal metric: the distance is 0 only if the two sets of nodes or edges match exactly; else 1.
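
To make the difference between the lenient and the strict node/edge metric concrete, here is a minimal sketch. It assumes that "overlap" means sharing at least one element and that annotations are compared as sets; `overlap_distance` and `exact_match_distance` are hypothetical names, not the functions shipped with this package.

```python
def overlap_distance(a, b):
    """Lenient: 0 if the two collections share at least one element, else 1."""
    return 0.0 if set(a) & set(b) else 1.0


def exact_match_distance(a, b):
    """Strict (nominal): 0 only if both collections hold the same elements."""
    return 0.0 if set(a) == set(b) else 1.0


annotation_1 = ("Berlin", "Hamburg")
annotation_2 = ("Hamburg", "Munich")

print(overlap_distance(annotation_1, annotation_2))      # 0.0
print(exact_match_distance(annotation_1, annotation_2))  # 1.0
```

The two annotators agree under the lenient metric because they share "Hamburg", but disagree under the strict one.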

### Graph Metrics
#### Lenient metric
1. Graph overlap metric: if two graphs overlap, the distance between these two graphs is 0; else 1.
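
The sketch below illustrates why the graph type can change the result of an overlap metric. It is plain Python rather than the package's implementation, and it rests on two assumptions: two graphs "overlap" when they share at least one edge, and the predicate is part of the edge identity. `edge_key` and `graph_overlap_distance` are hypothetical names.

```python
def edge_key(triple, directed):
    """Build a hashable key for a (subject, predicate, object) triple."""
    subject, predicate, obj = triple
    if directed:
        return (subject, predicate, obj)
    # Undirected: (a, b) and (b, a) are the same edge.
    return (frozenset((subject, obj)), predicate)


def graph_overlap_distance(g1, g2, directed=False):
    """Return 0 if the graphs share at least one edge, else 1."""
    keys_1 = {edge_key(t, directed) for t in g1}
    keys_2 = {edge_key(t, directed) for t in g2}
    return 0.0 if keys_1 & keys_2 else 1.0


g1 = [("alice", "knows", "bob")]
g2 = [("bob", "knows", "alice"), ("bob", "likes", "carol")]

print(graph_overlap_distance(g1, g2, directed=False))  # 0.0
print(graph_overlap_distance(g1, g2, directed=True))   # 1.0
```

The same pair of annotations counts as agreement when edges are undirected and as disagreement when direction matters, so the graph type should be chosen to match the annotation scheme.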

#### Strict metric
1. Normalized graph edit distance
    - normalized by computing distance between g1 and g0 and between g2 and g0
    - g0 is an empty graph
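
One plausible reading of this normalization, sketched with NetworkX's `graph_edit_distance`, divides the distance between g1 and g2 by the sum of their distances to the empty graph g0. The exact formula used by this package is not stated here, so treat the denominator as an assumption; `normalized_ged` is a hypothetical name.

```python
import networkx as nx


def normalized_ged(g1, g2):
    """Graph edit distance scaled to [0, 1] via the empty graph g0 (assumed formula)."""
    g0 = type(g1)()  # empty graph of the same NetworkX type as g1
    # Distance to g0 equals the cost of deleting every node and edge.
    denom = nx.graph_edit_distance(g1, g0) + nx.graph_edit_distance(g2, g0)
    if denom == 0:
        return 0.0  # both graphs are empty
    # By default NetworkX compares structure only; node labels are ignored
    # unless a node_match function is supplied.
    return nx.graph_edit_distance(g1, g2) / denom


g1 = nx.path_graph(3)  # 3 nodes, 2 edges: distance 5 to the empty graph
g2 = nx.path_graph(2)  # 2 nodes, 1 edge: distance 3 to the empty graph

print(normalized_ged(g1, g2))  # 0.25
```

Exact graph edit distance is expensive on larger graphs, which is one reason the distance matrix is worth caching.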

### Example Usage
###### Compute distance matrix of graphs 
```
from pathlib import Path

import networkx as nx
import numpy as np

from krippendorrf_graph import compute_alpha, compute_distance_matrix, graph_edit_distance, graph_overlap_metric, nominal_metric

# df is the annotation DataFrame described above: it has an "annotator"
# column and a "graph_feature" column holding the annotated graphs.
data = [
    df[df["annotator"] == 1].graph_feature.to_list(),
    df[df["annotator"] == 2].graph_feature.to_list(),
    df[df["annotator"] == 3].graph_feature.to_list(),
    df[df["annotator"] == 4].graph_feature.to_list(),
]

empty_graph_indicator = "*"  # indicator for missing values
feature_column = "graph_feature"
save_path = "./lenient_distance_matrix.npy"
# Lenient metric; use graph_edit_distance or nominal_metric for a strict one.
graph_distance_metric = graph_overlap_metric
forced = True  # recompute even if a saved matrix already exists

if not Path(save_path).exists() or forced:
    distance_matrix = compute_distance_matrix(
        df,
        feature_column=feature_column,
        graph_distance_metric=graph_distance_metric,
        empty_graph_indicator=empty_graph_indicator,
        save_path=save_path,
        graph_type=nx.Graph,
    )
else:
    distance_matrix = np.load(save_path)

alpha = compute_alpha(data, distance_matrix=distance_matrix, missing_items=empty_graph_indicator)
print("Lenient graph metric: %.3f" % alpha)
```

(Please help by contributing a PR; it will be faster than reporting an issue, since the maintainer might be slower than you.)
