Metadata-Version: 2.1
Name: extract-emails
Version: 4.0.2
Summary: Extract email addresses from a given URL.
Home-page: https://github.com/dmitriiweb/extract-emails
Author: Dmitrii K
Author-email: dmitriik@tutanota.com
License: MIT
Keywords: extract emails email
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: requests (>=2.23.0)
Requires-Dist: selenium (>=3.141.0)
Provides-Extra: dev
Requires-Dist: mock ; extra == 'dev'
Requires-Dist: coverage ; extra == 'dev'
Requires-Dist: pytest (>=3.10) ; extra == 'dev'
Requires-Dist: readme-renderer ; extra == 'dev'
Requires-Dist: sphinx ; extra == 'dev'
Requires-Dist: sphinx-rtd-theme (>=0.4.0) ; extra == 'dev'
Provides-Extra: dev-docs
Requires-Dist: readme-renderer ; extra == 'dev-docs'
Requires-Dist: sphinx ; extra == 'dev-docs'
Requires-Dist: sphinx-rtd-theme (>=0.4.0) ; extra == 'dev-docs'
Provides-Extra: dev-lint
Requires-Dist: mock ; extra == 'dev-lint'
Provides-Extra: dev-test
Requires-Dist: mock ; extra == 'dev-test'
Requires-Dist: coverage ; extra == 'dev-test'
Requires-Dist: pytest (>=3.10) ; extra == 'dev-test'
Provides-Extra: timezone
Requires-Dist: pytz ; extra == 'timezone'

Extract Emails
==============

Extract email addresses from the pages of a given website.

Documentation_

.. _Documentation: https://dmitriiweb.github.io/extract-emails/

Requirements
------------

-  Python >= 3.6
-  requests >= 2.23.0
-  selenium >= 3.141.0

Installation
------------

::

    pip install extract_emails

Usage
-----

With default browsers
~~~~~~~~~~~~~~~~~~~~~

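Crawl a site with ``ChromeBrowser``::

    from extract_emails import EmailExtractor
    from extract_emails.browsers import ChromeBrowser


    # Use the browser as a context manager so it is cleaned up when the block exits.
    with ChromeBrowser() as browser:
        email_extractor = EmailExtractor("http://www.tomatinos.com/", browser, depth=2)
        emails = email_extractor.get_emails()


    # Each result prints as an Email record and converts to a plain dict.
    for email in emails:
        print(email)
        print(email.as_dict())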

    # Email(email="bakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
    # {'email': 'bakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
    # Email(email="freshlybakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
    # {'email': 'freshlybakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
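
The output above pairs each address with the page it was found on. To make that shape concrete, here is a minimal standard-library sketch of pulling addresses out of page source and tagging them with their source page. It is not the library's implementation: the regular expression, ``FoundEmail``, and ``find_emails`` are hypothetical names chosen for illustration, and the pattern is deliberately simpler than full address syntax.

```python
import re
from typing import List, NamedTuple

# Deliberately simple pattern; real-world address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class FoundEmail(NamedTuple):
    """Hypothetical stand-in for the library's Email record."""

    email: str
    source_page: str

    def as_dict(self) -> dict:
        return dict(self._asdict())


def find_emails(page_source: str, source_page: str) -> List[FoundEmail]:
    """Return unique addresses in first-seen order, tagged with their page."""
    seen = set()
    found = []
    for address in EMAIL_RE.findall(page_source):
        if address not in seen:
            seen.add(address)
            found.append(FoundEmail(address, source_page))
    return found


html = '<a href="mailto:info@example.com">info@example.com</a> or sales@example.org'
for item in find_emails(html, "http://example.com/"):
    print(item.as_dict())
```

The address that appears twice (once in the ``mailto:`` link and once as link text) is reported only once, because the sketch deduplicates while preserving first-seen order.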

The same crawl with ``RequestsBrowser``::

    from extract_emails import EmailExtractor
    from extract_emails.browsers import RequestsBrowser


    # Only the browser class changes; the extractor calls stay the same.
    with RequestsBrowser() as browser:
        email_extractor = EmailExtractor("http://www.tomatinos.com/", browser, depth=2)
        emails = email_extractor.get_emails()


    for email in emails:
        print(email)
        print(email.as_dict())

    # Email(email="bakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
    # {'email': 'bakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
    # Email(email="freshlybakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
    # {'email': 'freshlybakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
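
Both examples pass ``depth=2``, which this README does not define. One plausible reading is that it bounds how many link hops from the start URL are followed; the library may count differently, so treat the following as an assumption. The sketch runs a breadth-first, depth-limited crawl over a hypothetical in-memory site (``PAGES``, ``LINK_RE``, and ``crawl`` are illustrative names, and no network access is involved).

```python
import re
from collections import deque
from typing import Dict, List, Set

# Hypothetical in-memory "site": URL -> page source. No network access is needed.
PAGES: Dict[str, str] = {
    "http://site.test/": '<a href="http://site.test/about">About</a>',
    "http://site.test/about": '<a href="http://site.test/team">Team</a> hello@site.test',
    "http://site.test/team": "ceo@site.test",
}

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url: str, depth: int) -> List[str]:
    """Breadth-first visit of pages at most `depth` link hops from start_url."""
    visited: Set[str] = {start_url}
    order: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, hops = queue.popleft()
        order.append(url)
        if hops == depth:
            continue  # do not follow links beyond the depth limit
        for link in LINK_RE.findall(PAGES.get(url, "")):
            if link not in visited:
                visited.add(link)
                queue.append((link, hops + 1))
    return order


print(crawl("http://site.test/", depth=1))
```

Under this reading, ``depth=1`` reaches the start page and the pages it links to directly, while the team page (two hops away) is only reached with ``depth=2``.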

With custom browser
~~~~~~~~~~~~~~~~~~~

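Subclass ``BrowserInterface`` and implement ``close`` and ``get_page_source`` to use any other browser, for example Firefox::

    from extract_emails import EmailExtractor
    from extract_emails.browsers import BrowserInterface

    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options


    class FirefoxBrowser(BrowserInterface):
        def __init__(self):
            ff_options = Options()
            # Path to a local geckodriver binary; adjust it for your machine.
            # executable_path is the Selenium 3 keyword; Selenium 4 deprecates it
            # in favour of a Service object.
            self._driver = webdriver.Firefox(
                options=ff_options, executable_path="/home/di/geckodriver",
            )

        def close(self):
            # Shut down the browser and the driver process.
            self._driver.quit()

        def get_page_source(self, url: str) -> str:
            # Load the page and return its source after the browser has processed it.
            self._driver.get(url)
            return self._driver.page_source


    with FirefoxBrowser() as browser:
        email_extractor = EmailExtractor("http://www.tomatinos.com/", browser, depth=2)
        emails = email_extractor.get_emails()

    for email in emails:
        print(email)
        print(email.as_dict())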

    # Email(email="bakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
    # {'email': 'bakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
    # Email(email="freshlybakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
    # {'email': 'freshlybakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
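
``FirefoxBrowser`` defines only ``__init__``, ``close``, and ``get_page_source``, yet it is used in a ``with`` statement, so the base class presumably supplies the context-manager methods. The sketch below shows how such a base class could work. ``SketchBrowserInterface`` and ``CannedBrowser`` are hypothetical stand-ins written for this illustration, not the library's actual classes; the canned browser returns fixed page source, which can be handy for offline tests.

```python
from abc import ABC, abstractmethod


class SketchBrowserInterface(ABC):
    """Hypothetical stand-in; not the library's actual base class."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the browser, even if the body of the block raised.
        self.close()

    @abstractmethod
    def close(self) -> None:
        ...

    @abstractmethod
    def get_page_source(self, url: str) -> str:
        ...


class CannedBrowser(SketchBrowserInterface):
    """Returns fixed page source instead of fetching anything."""

    def __init__(self, pages):
        self._pages = pages
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def get_page_source(self, url: str) -> str:
        return self._pages.get(url, "")


with CannedBrowser({"http://example.com/": "hi@example.com"}) as browser:
    source = browser.get_page_source("http://example.com/")

print(source, browser.closed)
```

Because ``__exit__`` calls ``close()``, the browser is marked closed as soon as the ``with`` block ends, which is the behaviour the usage examples rely on.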



