Metadata-Version: 2.4
Name: core-cdc
Version: 3.0.0
Summary: This library provides the mechanism to implement Change Data Capture (CDC) services...
Author-email: Alejandro Cora González <alek.cora.glez@gmail.com>
Maintainer: Alejandro Cora González
License-Expression: MIT
Project-URL: Homepage, https://gitlab.com/bytecode-solutions/core/core-cdc
Project-URL: Repository, https://gitlab.com/bytecode-solutions/core/core-cdc
Project-URL: Documentation, https://core-cdc.readthedocs.io/en/latest/
Project-URL: Issues, https://gitlab.com/bytecode-solutions/core/core-cdc/-/issues
Project-URL: Changelog, https://gitlab.com/bytecode-solutions/core/core-cdc/-/blob/master/CHANGELOG.md
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: core-mixins>=2.2.4
Provides-Extra: all
Requires-Dist: mysql-replication>=1.0.9; extra == "all"
Requires-Dist: pymongo>=4.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: core-dev-tools>=1.2.1; extra == "dev"
Requires-Dist: core-tests>=2.0.5; extra == "dev"
Requires-Dist: types-PyMySQL; extra == "dev"
Provides-Extra: mongo
Requires-Dist: pymongo>=4.0.0; extra == "mongo"
Provides-Extra: mysql
Requires-Dist: mysql-replication>=1.0.9; extra == "mysql"
Dynamic: license-file

core-cdc (CDC, a.k.a. Change Data Capture)
===============================================================================

This library provides the core mechanism and the resources required to
implement Change Data Capture (CDC) services.

===============================================================================


.. image:: https://img.shields.io/pypi/pyversions/core-cdc.svg
    :target: https://pypi.org/project/core-cdc/
    :alt: Python Versions

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://gitlab.com/bytecode-solutions/core/core-cdc/-/blob/main/LICENSE
    :alt: License

.. image:: https://gitlab.com/bytecode-solutions/core/core-cdc/badges/release/pipeline.svg
    :target: https://gitlab.com/bytecode-solutions/core/core-cdc/-/pipelines
    :alt: Pipeline Status

.. image:: https://readthedocs.org/projects/core-cdc/badge/?version=latest
    :target: https://readthedocs.org/projects/core-cdc/
    :alt: Docs Status

.. image:: https://img.shields.io/badge/security-bandit-yellow.svg
    :target: https://github.com/PyCQA/bandit
    :alt: Security

|


Installation
===============================================================================

Install from PyPI using pip:

.. code-block:: bash

    pip install core-cdc
    uv pip install core-cdc  # Or using UV...


Features
===============================================================================

**Multi-Database CDC Support**
    - MySQL Binary Log (BinLog) based change capture
    - MongoDB Change Streams for real-time event streaming
    - Extensible processor architecture for additional database engines

**Comprehensive Event Handling**
    - DML operations: INSERT, UPDATE, DELETE
    - DDL operations: CREATE, ALTER, DROP (schemas and tables)
    - Configurable event filtering by operation type
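
As a rough illustration of filtering by operation type, the sketch below keeps
only the events whose operation is in an allowed set. The ``Operation`` enum,
the event dictionaries, and the ``filter_events`` helper are hypothetical names
chosen for this example; they are not the ``core-cdc`` API.

.. code-block:: python

    from enum import Enum


    class Operation(Enum):
        """Hypothetical operation types; core-cdc may name these differently."""

        INSERT = "INSERT"
        UPDATE = "UPDATE"
        DELETE = "DELETE"
        DDL = "DDL"


    def filter_events(events, allowed):
        """Yield only the events whose operation is in ``allowed``."""
        for event in events:
            if event["operation"] in allowed:
                yield event


    events = [
        {"operation": Operation.INSERT, "table": "users"},
        {"operation": Operation.DDL, "table": "users"},
        {"operation": Operation.DELETE, "table": "orders"},
    ]

    # Keep DML events only and drop schema changes.
    dml = {Operation.INSERT, Operation.UPDATE, Operation.DELETE}
    dml_only = list(filter_events(events, dml))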

**Flexible Target Replication**
    - Implement your own target by subclassing ``ITarget``
    - Send records to any destination: database, queue, data warehouse, etc.
    - Support for multiple simultaneous targets
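
The idea of pluggable targets can be sketched as follows. ``TargetSketch`` is a
stand-in written for this example and is **not** the library's ``ITarget``; the
``save`` method name and the ``fan_out`` helper are assumptions, so check the
documentation for the real interface before subclassing.

.. code-block:: python

    from abc import ABC, abstractmethod


    class TargetSketch(ABC):
        """Stand-in for ``ITarget``; the real interface may differ."""

        @abstractmethod
        def save(self, records):
            """Persist a batch of change records."""


    class InMemoryTarget(TargetSketch):
        """Collects every record in a list (useful in tests)."""

        def __init__(self):
            self.rows = []

        def save(self, records):
            self.rows.extend(records)


    class CountingTarget(TargetSketch):
        """Only counts how many records were delivered."""

        def __init__(self):
            self.count = 0

        def save(self, records):
            self.count += len(records)


    def fan_out(records, targets):
        """Deliver the same batch to every configured target."""
        for target in targets:
            target.save(records)


    memory, counter = InMemoryTarget(), CountingTarget()
    fan_out([{"id": 1}, {"id": 2}], [memory, counter])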

**Standardized Data Format**
    - Common Record structure for cross-service integration
    - Includes metadata: timestamps, transaction IDs, source position
    - JSON serialization support for streaming and messaging systems
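
A common record with metadata and JSON serialization might look roughly like
the sketch below. ``RecordSketch`` and all of its field names are illustrative
assumptions, not the library's actual ``Record`` definition.

.. code-block:: python

    import json
    from dataclasses import asdict, dataclass, field
    from typing import Any, Dict, Optional


    @dataclass
    class RecordSketch:
        """Hypothetical record layout; field names are illustrative only."""

        event_type: str
        schema_name: str
        table_name: str
        event_timestamp: int  # Seconds since the epoch.
        record: Dict[str, Any] = field(default_factory=dict)
        transaction_id: Optional[str] = None
        source_position: Optional[str] = None  # E.g. a BinLog file and offset.

        def to_json(self) -> str:
            """Serialize the record so it can be sent to a queue or stream."""
            return json.dumps(asdict(self), sort_keys=True)


    record = RecordSketch(
        event_type="INSERT",
        schema_name="shop",
        table_name="users",
        event_timestamp=1700000000,
        record={"id": 1, "name": "Ada"},
    )
    payload = record.to_json()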

**Production-Ready Features**
    - Built-in error handling and retry mechanisms
    - Comprehensive logging for monitoring and debugging
    - Optional event timestamp column for UPSERT/MERGE operations
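
To show why an event timestamp column helps with UPSERT/MERGE, the sketch below
applies a change only when it is at least as recent as the stored row, so
events that arrive out of order cannot overwrite newer data. The ``upsert``
helper, the in-memory ``table`` dictionary, and the ``event_timestamp`` column
name are assumptions made for this example.

.. code-block:: python

    def upsert(table, key, row, ts_column="event_timestamp"):
        """Store ``row`` under ``key`` unless the stored row is newer.

        Returns True when the row was written and False when it was skipped.
        """
        current = table.get(key)
        if current is None or row[ts_column] >= current[ts_column]:
            table[key] = row
            return True
        return False


    table = {}
    applied_new = upsert(table, 1, {"id": 1, "name": "new", "event_timestamp": 20})
    # A stale event for the same key arrives late and is ignored.
    applied_old = upsert(table, 1, {"id": 1, "name": "old", "event_timestamp": 10})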


Quick Start
===============================================================================

Installation
-------------------------------------------------------------------------------

Install the package:

.. code-block:: bash

    pip install core-cdc
    uv pip install core-cdc     # Or using UV...
    pip install -e ".[dev]"     # For development...


Setting Up Environment
-------------------------------------------------------------------------------

1. Install required libraries:

.. code-block:: bash

    pip install --upgrade pip
    pip install virtualenv

2. Create Python virtual environment:

.. code-block:: bash

    virtualenv --python=python3.12 .venv

3. Activate the virtual environment:

.. code-block:: bash

    source .venv/bin/activate

Install packages
-------------------------------------------------------------------------------

.. code-block:: bash

    pip install .
    pip install -e ".[dev]"

Optional libraries
-------------------------------------------------------------------------------

.. code-block:: bash

    pip install '.[all]'    # MySQL + MongoDB
    pip install '.[mysql]'  # MySQL BinLog support
    pip install '.[mongo]'  # MongoDB Change Streams support


Check tests and coverage
-------------------------------------------------------------------------------

.. code-block:: bash

    python manager.py run-tests                          # Unit tests
    python manager.py run-tests --test-type integration  # Integration tests
    python manager.py run-coverage                       # Coverage report


Functional Tests
-------------------------------------------------------------------------------

Functional tests require live database servers and are **not** discovered by
``pytest`` or ``tox`` automatically (files are named ``check_*.py`` to prevent
accidental execution).
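
The naming trick works because ``pytest`` collects only files that match its
``python_files`` patterns, which default to ``test_*.py`` and ``*_test.py``.
The sketch below approximates that check with ``fnmatch``; the file names
passed to it are hypothetical examples.

.. code-block:: python

    from fnmatch import fnmatch

    # Default values of pytest's ``python_files`` setting.
    PYTEST_DEFAULT_PATTERNS = ("test_*.py", "*_test.py")


    def is_collected(filename):
        """Return True if ``filename`` matches a default discovery pattern."""
        return any(fnmatch(filename, pattern) for pattern in PYTEST_DEFAULT_PATTERNS)


    unit_file = is_collected("test_mysql.py")         # Matches ``test_*.py``.
    functional_file = is_collected("check_mysql.py")  # Matches neither pattern.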

The helper script ``tests/functional/quick_test.sh`` checks connectivity, runs
both MySQL and MongoDB test suites, and prints a metrics summary:

.. code-block:: bash

    bash tests/functional/quick_test.sh

All connection parameters default to the Docker values below and can be
overridden via environment variables (``MYSQL_HOST``, ``MYSQL_PASSWORD``,
``MONGO_HOST``, ``MONGO_DATABASE``, etc.).
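
The override pattern can be sketched in a few lines. Only the variable names and
the MySQL password come from this README; the ``localhost`` hosts, the ``test``
database name, and the ``load_settings`` helper are assumptions, and the real
defaults live in the functional tests.

.. code-block:: python

    import os

    # Fallback values used when a variable is not set (partly assumed).
    DEFAULTS = {
        "MYSQL_HOST": "localhost",
        "MYSQL_PASSWORD": "mysql_password",
        "MONGO_HOST": "localhost",
        "MONGO_DATABASE": "test",
    }


    def load_settings(environ=None):
        """Return connection settings, letting the environment win over defaults."""
        environ = os.environ if environ is None else environ
        return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


    # Passing a mapping explicitly keeps the example independent of the shell.
    settings = load_settings({"MYSQL_HOST": "db.internal"})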

To run the functional test files through the test manager (narrow the
``--pattern`` glob to select an individual file):

.. code-block:: bash

    python manager.py run-tests --test-type functional --pattern "*.py"


Spinning Up Local Servers with Docker
-------------------------------------------------------------------------------

**MongoDB Replica Set** (required for Change Streams):

.. code-block:: bash

    docker network create mongoCluster

    docker run -d --rm -p 27017:27017 --name mongo1 --network mongoCluster \
        mongo:5 mongod --replSet myReplicaSet --bind_ip localhost,mongo1

    docker run -d --rm -p 27018:27017 --name mongo2 --network mongoCluster \
        mongo:5 mongod --replSet myReplicaSet --bind_ip localhost,mongo2

    docker run -d --rm -p 27019:27017 --name mongo3 --network mongoCluster \
        mongo:5 mongod --replSet myReplicaSet --bind_ip localhost,mongo3

    docker exec -it mongo1 mongosh --eval "rs.initiate({
      _id: \"myReplicaSet\",
      members: [
        {_id: 0, host: \"mongo1\"},
        {_id: 1, host: \"mongo2\"},
        {_id: 2, host: \"mongo3\"}
      ]
    })"

Check cluster status:

.. code-block:: bash

    docker ps
    docker exec -it mongo1 mongosh --eval "rs.status()"

**MySQL** (binary logging is enabled by default in the official image):

.. code-block:: bash

    docker run \
      --env=MYSQL_ROOT_PASSWORD=mysql_password \
      --volume=/var/lib/mysql \
      -p 3306:3306 \
      --restart=no \
      -d mysql:latest


Implemented CDC Engines
===============================================================================

The following database engines have CDC implementations:

Fully Implemented
-------------------------------------------------------------------------------

**MySQL** - Binary Log (BinLog) based CDC
    - Uses ``mysql-replication`` library
    - Captures INSERT, UPDATE, DELETE operations
    - Supports DDL events (CREATE, ALTER, DROP)
    - Fallback mechanism for column name resolution
    - See: `core_cdc/processors/mysql/ <core_cdc/processors/mysql/>`_
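
One way a column name fallback can work is sketched below. It assumes the
binlog reader emits positional placeholder keys (here ``UNKNOWN_COL<n>``, an
assumption about the reader's output) when the binary log carries no column
names, and that the ordered column names were fetched separately, for example
from ``information_schema.columns``. The ``resolve_column_names`` helper is
hypothetical and is not the processor's actual implementation.

.. code-block:: python

    def resolve_column_names(values, column_names, prefix="UNKNOWN_COL"):
        """Replace positional placeholder keys with real column names.

        Keys that are not placeholders, or whose index has no known name,
        are kept unchanged.
        """
        resolved = {}
        for key, value in values.items():
            suffix = key[len(prefix):]
            if key.startswith(prefix) and suffix.isdigit():
                index = int(suffix)
                if index < len(column_names):
                    key = column_names[index]
            resolved[key] = value
        return resolved


    row = {"UNKNOWN_COL0": 7, "UNKNOWN_COL1": "Ada"}
    named = resolve_column_names(row, ["id", "name"])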

**MongoDB** - Change Streams based CDC
    - Uses native MongoDB Change Streams
    - Captures INSERT, UPDATE, DELETE operations
    - Requires replica set configuration
    - Real-time event streaming
    - See: `core_cdc/processors/mongo/ <core_cdc/processors/mongo/>`_
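
For orientation, the sketch below flattens a MongoDB change event into a small
dictionary. The input fields (``operationType``, ``ns``, ``documentKey``,
``fullDocument``, ``updateDescription``) are part of MongoDB's change event
format; the ``normalize_change`` helper and its output layout are assumptions
for this example and do not describe how the processor builds its records.

.. code-block:: python

    def normalize_change(event):
        """Flatten a MongoDB change event into a small dictionary."""
        operation = event["operationType"]
        namespace = event.get("ns", {})
        change = {
            "operation": operation.upper(),
            "database": namespace.get("db"),
            "collection": namespace.get("coll"),
            "key": event.get("documentKey", {}).get("_id"),
        }
        if operation in ("insert", "replace"):
            # Inserts and replacements carry the whole document.
            change["data"] = event.get("fullDocument")
        elif operation == "update":
            # Updates carry only the changed and removed fields.
            description = event.get("updateDescription", {})
            change["data"] = description.get("updatedFields", {})
            change["removed"] = description.get("removedFields", [])
        return change


    update_event = {
        "operationType": "update",
        "ns": {"db": "shop", "coll": "users"},
        "documentKey": {"_id": 1},
        "updateDescription": {"updatedFields": {"name": "Ada"}, "removedFields": ["tmp"]},
    }
    change = normalize_change(update_event)

Against a live replica set, events of this shape are produced by iterating over
``collection.watch()`` in ``pymongo``.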

Planned / Documentation Only
-------------------------------------------------------------------------------

**MS SQL Server** and **Oracle** implementations are not yet included.
Reference guides and implementation templates are available in the documentation.


Contributing
===============================================================================

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Write tests for new functionality
4. Ensure all tests pass: ``python manager.py run-tests --test-type integration``
5. Run linting: ``pylint core_cdc``
6. Run security checks: ``bandit -r core_cdc``
7. Submit a merge request


License
===============================================================================

This project is licensed under the MIT License. See the LICENSE file for details.


Links
===============================================================================

* **Documentation:** https://core-cdc.readthedocs.io/en/latest/
* **Repository:** https://gitlab.com/bytecode-solutions/core/core-cdc
* **Issues:** https://gitlab.com/bytecode-solutions/core/core-cdc/-/issues
* **Changelog:** https://gitlab.com/bytecode-solutions/core/core-cdc/-/blob/master/CHANGELOG.md
* **PyPI:** https://pypi.org/project/core-cdc/


Support
===============================================================================

For questions or support, please open an issue on GitLab or contact the maintainers.


Authors
===============================================================================

* **Alejandro Cora González** - *Initial work* - alek.cora.glez@gmail.com
