Metadata-Version: 2.4
Name: primesgen
Version: 1.0.3
Summary: finds primes using Eratosthenes sieving algorithm
Author-email: Robert Koegler <robert.koegler@gmail.com>
Maintainer-email: Robert Koegler <robert.koegler@gmail.com>
License-Expression: MIT AND Apache-2.0
Description-Content-Type: text/x-rst
Requires-Dist: numpy

#########
primesgen
#########

primesgen is a collection of implementations of the Eratosthenes Sieve algorithm, leveraging numpy

all timing results displayed are from a 2.3 GHz 8-Core Intel Core i9/16 GB 2400 MHz DDR4 machine (Darwin)


L1D cache size
==============

there is a function to get the L1D cache size in __init__.py. It is platform dependent

.. code-block:: python

    ''' 'Darwin' works
        'Linux' is untested
        'Windows' (or others) is not implemented'''
    def get_cache_size() -> int:
        os = platform.system()
        match os:
            case 'Darwin':
                return _get_darwin_cache_size()
            case 'Linux':
                # not tested, may raise
                return _untested_get_linux_cache_size()
            case 'Windows':
                raise NotImplementedError(f'not implemented for your operating system: {os}')
            case _:
                raise NotImplementedError(f'not implemented for your operating system: {os}')

implementations
===============
the 2 implementations can either return a list of primes (all in an array), or a generator

primesgen.Erastothenes
----------------------

fastest, and it's quite simple. Here is a quick example with timing:

.. code-block:: python

    from src.primesgen import Eratosthenes

    max_val = 1_000_000_000
    as_list = Eratosthenes().primes(max_val)

finds all 50,847,533 primes < 1,000,000,000 in 8.9 seconds

primesgen.SegmentedEratosthenes
-------------------------------
segments the sieve which allows processing larger numbers than the other implementations

multiples and primes
^^^^^^^^^^^^^^^^^^^^
the segmented algorithm accumulates prime values and segment multiples (1 for each prime < sqrt(max value)) for each
segment processed

these will ultimately run out of memory

if you don't care to manage the details, just use the factory methods

.. code-block:: python

    from src.multiples import Multiples
    from src.primes import Primes

    primes = Primes.factory()
    multiples = Multiples.factory()

you can also specify in memory types

.. code-block:: python

    from src.multiples import MultiplesInMem
    from src.primes import PrimesInMem

    max_val = 1_000_000_000
    primes = PrimesInMem()
    multiples = MultiplesInMem(max_val)

you can also specify backed by disk types, which only keep a portion of data on the data in memory with the rest backed
by files (files are in /var/tmp/primesgen*, with previous run(s) deleted each time)

.. code-block:: python

    from src.multiples import Multiples, MultiplesOnDisk
    from src.primes import Primes, PrimesOnDisk

    max_val = 1_000_000_000
    primes = PrimesOnDisk(max_val, Primes.on_disk_primes_block_size)
    multiples = MultiplesOnDisk(max_val, Multiples.on_disk_multiples_block_size

finally, you can use a type for Primes that broadcasts the primes to a queue

you still need to decide what kind of multiples to use (here I chose OnDisk)

.. code-block:: python

    import queue
    from src.multiples import Multiples, MultiplesOnDisk
    from src.primes import PrimesToQueue

    max_val = 1_000_000_000
    primes_queue = queue.Queue(maxsize=0)
    primes = PrimesToQueue(primes_queue)
    multiples = MultiplesOnDisk(max_val, Multiples.on_disk_multiples_block_size)

full example
^^^^^^^^^^^^

using the PrimesToQueue and MultiplesOnDisk types (with a custom queue consumer that prints to screen)

.. code-block:: python

    import queue
    import time
    from threading import Thread

    from src import get_cache_size
    from src.multiples import Multiples, MultiplesOnDisk
    from src.primes import Primes
    from src.primesgenseg import SegmentedEratosthenes


    def example_queue_consumer(in_queue: queue.Queue[int]) -> None:
        # this is an example of someone consuming the primes if a queue is provided
        print_count = 10_000_000
        count = 0
        partial_count = 0
        while True:
            prime = in_queue.get()
            in_queue.task_done()
            if prime is not None:
                count += 1
                partial_count += 1
            if partial_count >= print_count or prime is None:
                print(f'\tfound {count:,} primes thus far')
                partial_count = 0
            if prime is None:
                break

    max_val = 1_000_000_000
    primes_queue = queue.Queue(maxsize=0)
    primes = Primes.factory(in_queue=primes_queue)
    multiples = MultiplesOnDisk(max_val, Multiples.on_disk_multiples_block_size)
    segment_max_size = get_cache_size()
    sieve = SegmentedEratosthenes(segment_max_size, primes, multiples)
    print_thread = Thread(target=example_queue_consumer, args=(primes_queue,))
    start = time.time()
    sieve_thread = Thread(target=sieve.primes, args=(max_val,))
    sieve_thread.start()
    print_thread.start()
    sieve_thread.join()
    duration = time.time() - start
    print_thread.join()

.. code-block::

    <consumer> received 10,000,000 primes thus far (34.957 seconds)
    <consumer> received 20,000,000 primes thus far (75.803 seconds)
    <consumer> received 30,000,000 primes thus far (118.922 seconds)
    <consumer> received 40,000,000 primes thus far (155.940 seconds)
    <consumer> received 50,000,000 primes thus far (185.011 seconds)
    <consumer> received 50,847,533 primes thus far (185.791 seconds)

finds/broadcasts all 50,847,533 primes <= 1,000,000,000 in 184.0 seconds

example with knobs
^^^^^^^^^^^^^^^^^^
using the PrimesOnDisk and MultiplesOnDisk types, with all the block and segment sizes as vars you can tweak

.. code-block:: python

    from src.multiples import MultiplesOnDisk
    from src.primes import PrimesOnDisk
    from src.primesgen import SegmentedEratosthenes


    max_val = 100_000_000

    primes_block_size = 100_000
    mulitples_block_size = 50_000
    segment_max_size = 32_000

    primes = PrimesOnDisk(primes_block_size)
    multiples = MultiplesOnDisk(max_val, mulitples_block_size)
    sieve = SegmentedEratosthenes(segment_max_size, primes, multiples)
    primes = sieve.primes(max_val)

Tests
=====
pytest results should look like

.. code-block::

    ============================= test session starts ==============================
    collecting ... collected 11 items

    test_mulitples.py::test_multiplesinmem
    test_mulitples.py::test_multiplesondisk
    test_primes.py::test_primesinmem
    test_primes.py::test_primesinqueue
    test_primes.py::test_primesondisk
    test_primesgen.py::test_100_000
    test_primesgen.py::test_1_000_000
    test_primesgen.py::test_10_000_000
    test_primesgen.py::test_100_000_000
    test_primesgen.py::test_35_123_231
    test_primesgen.py::test_84_max_val_9_segment_size

    ============================= 11 passed in 22.31s ==============================
