Metadata-Version: 2.1
Name: pyxurls
Version: 0.1.2
Summary: A regular expression based URL extractor which extracts URLs from text.
Home-page: https://github.com/andytzeng/pyxurls
Author: Andy Tzeng
Author-email: andytzeng@aol.tw
License: UNKNOWN
Keywords: url regex extract
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# PyXURLs

[![PyPI version](https://badge.fury.io/py/pyxurls.svg)](https://badge.fury.io/py/pyxurls)
[![Build Status](https://travis-ci.com/andytzeng/pyxurls.svg?branch=main)](https://travis-ci.com/andytzeng/pyxurls)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyxurls)

A regular-expression-based extractor that finds URLs in text.

Thanks to [Daniel Martí](https://github.com/mvdan) for the project [mvdan/xurls](https://github.com/mvdan/xurls). This Python project follows the same concept as the Go version.

## Installing

```bash
# if google-re2 is hard to install, regex can be used as the alternative engine
pip install google-re2 pyxurls
```

## Usage

### Extract URLs with the strict strategy

```python
import xurls

extractor = xurls.Strict()

url = extractor.findfirst('we have the link with scheme https://www.python.org and https://www.github.com')
#  https://www.python.org

urls = extractor.findall('we have the link with scheme https://www.python.org and https://github.com')
#  ['https://www.python.org', 'https://github.com']
```

### Extract URLs with the relaxed strategy

```python
import xurls

extractor = xurls.Relaxed()

url = extractor.findfirst('we have the link with scheme www.python.org and https://www.github.com')
#  www.python.org

urls = extractor.findall('we have the link with scheme www.python.org and https://github.com')
#  ['www.python.org', 'https://github.com']
```
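
The difference between the two strategies can be illustrated without the library. The following is a minimal sketch using only the standard `re` module; the patterns `STRICT` and `RELAXED` are hypothetical simplifications written for this example and are not the expressions PyXURLs actually uses.

```python
import re

# Hypothetical, heavily simplified building blocks for illustration only;
# they are NOT the expressions pyxurls uses internally.
HOST = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
PATH = r"(?:/\S*)?"
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*://"

# Strict idea: a scheme such as "https://" is mandatory.
STRICT = re.compile(SCHEME + HOST + PATH)

# Relaxed idea: the scheme is optional, so bare hosts also match.
RELAXED = re.compile(r"(?:" + SCHEME + r")?" + HOST + PATH)

text = "we have the link with scheme www.python.org and https://github.com"

strict_urls = STRICT.findall(text)
# ['https://github.com']

relaxed_urls = RELAXED.findall(text)
# ['www.python.org', 'https://github.com']
```

A real extractor also has to deal with top-level domain lists, trailing punctuation, and IP addresses, which this sketch ignores.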

### Extract URLs limited to a scheme

```python
import xurls

# limit to https
extractor = xurls.StrictScheme('https://')

url = extractor.findfirst('we have the link with scheme custom://domain.com and https://www.python.org noscheme.com')
#  https://www.python.org

# match any scheme, not only the standard ones
extractor = xurls.StrictScheme(xurls.express.ANY_SCHEME)
urls = extractor.findall('we have the link with scheme custom://domain.com and https://www.python.org noscheme.com')
#  ['custom://domain.com', 'https://www.python.org']
```
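
To show what limiting extraction to one scheme means, here is a rough standard-library sketch. The helper `make_scheme_pattern` is a hypothetical name introduced for this example; it is not part of PyXURLs, and its pattern is far simpler than a production URL expression.

```python
import re


def make_scheme_pattern(scheme):
    """Build a simplified pattern matching only URLs that start with `scheme`.

    Hypothetical helper for illustration; not part of pyxurls.
    """
    host = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
    path = r"(?:/\S*)?"
    # The lookbehind stops the scheme from matching in the middle of a longer
    # token; re.escape keeps characters such as "." or "+" literal.
    return re.compile(r"(?<![A-Za-z0-9+.-])" + re.escape(scheme) + host + path)


text = "we have the link with scheme custom://domain.com and https://www.python.org noscheme.com"

https_only = make_scheme_pattern("https://").findall(text)
# ['https://www.python.org']

custom_only = make_scheme_pattern("custom://").findall(text)
# ['custom://domain.com']
```

Neither call returns `noscheme.com`, because the scheme prefix is required in this sketch.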


