# Frege indexer library

An indexer library for the Frege project at Jagiellonian University.

## How to install

This library is published on PyPI, so you can install it with pip:

`pip3 install fregeindexerlib`

and upgrade it with:

`pip3 install fregeindexerlib --upgrade`

## How to use

Example usage is available in the file [example.py](example.py).

To use the library, subclass the abstract `fregeindexerlib.indexer.Indexer` class. Only the `crawl_next_repository` method must be implemented; the remaining methods are optional hooks.

Methods that can be implemented:
* `crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[CrawlResult]`
  
  This method receives the id of the previously crawled repository as a string (or `None` if no repository has been crawled yet)
  and should return a `fregeindexerlib.crawl_result.CrawlResult` dataclass filled with information about the crawled repository,
  or `None` if there are no more repositories to crawl.
  
  The repository id is the id returned by the code hosting API, not one generated by the indexer. Generating the proper id for this project is the responsibility of this library.


* `before_crawl(self, prev_repository_id: Optional[str])`
  
  Invoked right before `crawl_next_repository`.
  Receives the previously indexed repository id (the same argument that `crawl_next_repository` receives). The default implementation is empty.


* `after_crawl(self, crawl_result: CrawlResult)`

  Invoked right after `crawl_next_repository`.
  Receives the `CrawlResult` returned by the `crawl_next_repository` call. The default implementation is empty.
  

* `on_successful_process(self, crawl_result: CrawlResult)`

  Invoked after the crawl result has been successfully saved to the database and a message has been successfully sent to the `download` queue.
  Receives the `CrawlResult` returned by the `crawl_next_repository` call. The default implementation is empty.
  

* `on_error(self, exception: Exception)`

  Invoked when an exception occurs during crawling, saving to the database, or pushing a message to the queue.
  Receives the exception that occurred. The default implementation is empty.
  
  There is probably no need to handle this situation yourself, because the library handles it.
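
The hooks above can be pictured with a small, self-contained sketch. It does not import `fregeindexerlib`: `IndexerSketch`, `CrawlResultSketch` (with its single `repo_id` field), `InMemoryIndexer`, and `drive` are all hypothetical stand-ins, and the loop in `drive` only mirrors the call order described above. In particular, it assumes that `after_crawl` is skipped when no repository is returned, and it leaves out the database save and the queue publish that the real library performs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CrawlResultSketch:
    # Hypothetical stand-in for fregeindexerlib.crawl_result.CrawlResult;
    # the real dataclass defines its own fields.
    repo_id: str


class IndexerSketch(ABC):
    """Hypothetical stand-in that mirrors the hook names of Indexer."""

    @abstractmethod
    def crawl_next_repository(
        self, prev_repository_id: Optional[str]
    ) -> Optional[CrawlResultSketch]:
        ...

    # Optional hooks: empty by default, as described in the list above.
    def before_crawl(self, prev_repository_id: Optional[str]) -> None:
        pass

    def after_crawl(self, crawl_result: CrawlResultSketch) -> None:
        pass

    def on_successful_process(self, crawl_result: CrawlResultSketch) -> None:
        pass

    def on_error(self, exception: Exception) -> None:
        pass


class InMemoryIndexer(IndexerSketch):
    """Walks a fixed list of ids instead of calling a code hosting API."""

    def __init__(self, repo_ids: List[str]) -> None:
        self.repo_ids = list(repo_ids)
        self.calls: List[Tuple[str, Optional[str]]] = []

    def crawl_next_repository(self, prev_repository_id):
        self.calls.append(("crawl", prev_repository_id))
        # Resume right after the previously crawled repository.
        if prev_repository_id is None:
            position = 0
        else:
            position = self.repo_ids.index(prev_repository_id) + 1
        if position >= len(self.repo_ids):
            return None  # nothing left to crawl
        return CrawlResultSketch(repo_id=self.repo_ids[position])

    def before_crawl(self, prev_repository_id):
        self.calls.append(("before", prev_repository_id))

    def after_crawl(self, crawl_result):
        self.calls.append(("after", crawl_result.repo_id))


def drive(indexer: IndexerSketch, prev_repository_id: Optional[str] = None) -> List[str]:
    """Assumed call order only; not the library's actual run loop."""
    crawled: List[str] = []
    while True:
        try:
            indexer.before_crawl(prev_repository_id)
            result = indexer.crawl_next_repository(prev_repository_id)
            if result is None:
                return crawled
            indexer.after_crawl(result)
            # The real library would save the result and publish a message here.
            indexer.on_successful_process(result)
        except Exception as exc:
            indexer.on_error(exc)
            return crawled
        crawled.append(result.repo_id)
        prev_repository_id = result.repo_id
```

Calling `drive(InMemoryIndexer(["a", "b"]))` in this sketch visits both ids in order and stops once `crawl_next_repository` returns `None`.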

Once you have implemented the method(s) from `Indexer` in your own class, create an instance of that class,
passing the proper parameters to its constructor. Here is its definition:

```
__init__(self, indexer_type: IndexerType, rabbitmq_parameters: RabbitMQConnectionParameters,
                 database_parameters: DatabaseConnectionParameters, rejected_publish_delay: int)
```

where:
* `indexer_type` is a `fregeindexerlib.indexer_type.IndexerType` enum value; choose the one that matches the indexer you are implementing.
* `rabbitmq_parameters` is a `fregeindexerlib.rabbitmq_connection.RabbitMQConnectionParameters` dataclass.
* `database_parameters` is a `fregeindexerlib.database_connection.DatabaseConnectionParameters` dataclass.
* `rejected_publish_delay` is the number of seconds to wait between publish attempts when the queue is full.

Finally, invoke the `run` method on this instance.
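
To make the meaning of `rejected_publish_delay` more concrete, here is a minimal sketch of a retry loop that waits a fixed number of seconds between publish attempts. It is only an illustration under assumptions: `publish_with_retry` and the `try_publish` callable are hypothetical names, and the library's own retry logic may differ.

```python
import time
from typing import Callable


def publish_with_retry(
    try_publish: Callable[[], bool],
    rejected_publish_delay: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_publish until it returns True and return the attempt count.

    try_publish is a hypothetical callable that returns False while the
    queue is full (the message was rejected) and True once it is accepted.
    """
    attempts = 1
    while not try_publish():
        # Queue is full: wait the configured number of seconds, then retry.
        sleep(rejected_publish_delay)
        attempts += 1
    return attempts
```

The `sleep` parameter is injectable so the waiting behaviour can be exercised without real delays, for example by passing a list's `append` method and inspecting the recorded waits.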