Metadata-Version: 2.4
Name: upa_url
Version: 1.1.0
Summary: WHATWG URL Standard compliant URL parser library
Author-Email: =?utf-8?q?Rimas_Misevi=C4=8Dius?= <rmisev3@gmail.com>
License-Expression: BSD-2-Clause
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Project-URL: Homepage, https://github.com/upa-url/upa_url-py
Project-URL: Issues, https://github.com/upa-url/upa_url-py/issues
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# upa_url package

This package provides Python bindings for [Upa URL](https://github.com/upa-url/upa) – a library compliant with the [WHATWG URL standard](https://url.spec.whatwg.org/). This is the same standard followed by modern browsers and JavaScript runtimes such as Bun, Deno, and Node.js.

This package is designed to be as close to the URL standard as possible. It uses the same class names ([URL](https://url.spec.whatwg.org/#url-class), [URLSearchParams](https://url.spec.whatwg.org/#interface-urlsearchparams)), their function names, the same function parameters, and the same behavior.

## Installation

```sh
pip install upa_url
```

If a binary wheel is not available for your platform, you will need CMake and a C++ compiler that supports C++17 to build the package from source.

## Getting started

First, import the classes:
```python
from upa_url import PSL, URL, URLSearchParams
```

### URL class

The `URL` class provides a structured way to parse, manipulate, and serialize URLs.

A URL can be parsed in one of two ways:
1. Use the `URL` constructor. It raises an exception on error:
   ```python
   try:
       url = URL('https://upa-url.github.io/docs/')
       print(url.href)
   except Exception:
       print('URL parse error')
   ```
2. Use the `URL.parse` function. It returns `None` on error:
   ```python
   url = URL.parse('docs', 'https://upa-url.github.io')
   if url is not None:
       print(url.href)
   ```
The components of the parsed URL object can be read and modified through the following properties: `href`, `origin` (read-only), `protocol`, `username`, `password`, `host`, `hostname`, `port`, `pathname`, `search` and `hash`. You can also read and change the search parameters through the `searchParams` property, which returns the `URLSearchParams` object associated with the URL:
```python
url = URL.parse('https://example.org')
if url is not None:
    url.searchParams.append('lang', 'lt')
    print(url.href) # https://example.org/?lang=lt
```

To serialize a parsed URL, use either `url.href` or `str(url)`.

If you only need to check URL validity, then the `URL.canParse` function can be used:
```python
if URL.canParse('docs', 'https://upa-url.github.io'):
    print('URL is valid')
```

### URLSearchParams class

The `URLSearchParams` class provides a structured way to parse, manipulate, and serialize the query string of a URL.

A `URLSearchParams` object can be created using the constructor:
1. Create an empty object: `params = URLSearchParams()`
2. Create from a string: `params = URLSearchParams('lang=lt&id=123')`
3. Create from a dictionary or a list:
   ```python
   params1 = URLSearchParams({'lang': 'lt', 'id': '123'})
   params2 = URLSearchParams([('lang', 'lt'), ['id', '123']])
   ```

Use `get` or `getAll` to retrieve parameter values:
```python
params = URLSearchParams('a=b&a=c&b=10')
print(params.get('a'))    # b
print(params.getAll('a')) # ['b', 'c']
```

To check whether a parameter with a given name (and, optionally, a given value) exists, use the `has` function:
```python
print(params.has('a'))      # True
print(params.has('a', 'c')) # True
print(params.has('c'))      # False
```

Iterate over all parameters:
```python
params = URLSearchParams('a=1&b=2')
# Get all name-value pairs
for name, value in params:
    print(name, '=', value)
# Get all parameter names
for name in params.keys():
    print(name)
# Get all parameter values
for value in params.values():
    print(value)
```

Count parameters:
```python
print(params.size) # 2
print(len(params)) # 2
```

To serialize a `URLSearchParams` object, use `str(params)`.

There are functions to manipulate search parameters:
1. Add or replace parameters:
   ```python
   params = URLSearchParams('a=a')
   params.append('a', 'aa')
   params.append('b', 'bb')
   print(params) # a=a&a=aa&b=bb
   params.set('a', '1')
   print(params) # a=1&b=bb
   ```
2. Remove parameters:
   ```python
   params = URLSearchParams('a=a&a=aa&b=b&b=bb')
   params.delete('a')
   print(params) # b=b&b=bb
   params.delete('b', 'bb')
   print(params) # b=b
   ```
3. Sort parameters by name:
   ```python
   params = URLSearchParams('c=1&b=2&a=3')
   params.sort()
   print(params) # a=3&b=2&c=1
   ```
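
The behavior of `set` and `sort` follows the URL standard: `set` replaces the value of the first pair with the given name and removes the remaining pairs with that name, while `sort` is a stable sort that compares names by UTF-16 code units. The following is a minimal sketch of these two rules in plain Python on a list of name-value pairs. It does not use this package, and `set_pair`, `sort_pairs` and `utf16_key` are hypothetical helper names introduced only for illustration:

```python
def set_pair(pairs, name, value):
    """Model of `set`: replace the first match, drop the other matches."""
    result, replaced = [], False
    for n, v in pairs:
        if n != name:
            result.append((n, v))
        elif not replaced:
            result.append((name, value))
            replaced = True
    if not replaced:
        # No pair with this name existed, so a new pair is appended
        result.append((name, value))
    return result


def utf16_key(name):
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units
    return name.encode('utf-16-be', errors='surrogatepass')


def sort_pairs(pairs):
    """Model of `sort`: order by name, keep the order of equal names."""
    # sorted() is stable, so pairs with equal names keep their relative order
    return sorted(pairs, key=lambda pair: utf16_key(pair[0]))


print(set_pair([('a', 'a'), ('a', 'aa'), ('b', 'bb')], 'a', '1'))
# [('a', '1'), ('b', 'bb')]
print(sort_pairs([('c', '1'), ('b', '2'), ('a', '3')]))
# [('a', '3'), ('b', '2'), ('c', '1')]
```

Because the comparison uses code units rather than code points, a name outside the Basic Multilingual Plane (encoded as a surrogate pair) sorts before a name such as `'\uff61'`.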

### PSL class

The `PSL` class allows getting the [public suffix](https://url.spec.whatwg.org/#host-public-suffix) and [registrable domain](https://url.spec.whatwg.org/#host-registrable-domain) of a given host.

First, you need to create a PSL object and load the [Public Suffix List](https://publicsuffix.org/). This list can be downloaded from https://publicsuffix.org/list/public_suffix_list.dat. The downloaded file can be loaded using one of the following methods:
1. Use the `load` function:
   ```python
   psl = PSL.load('public_suffix_list.dat')
   if psl is not None:
       print(psl.public_suffix('upa-url.github.io')) # github.io
   ```
2. Use the `PSL` constructor:
   ```python
   try:
       psl = PSL('public_suffix_list.dat')
       # Use psl
   except Exception:
       print('PSL loading error')
   ```

The Public Suffix List can be loaded from memory using the push interface:
1. Line by line:
   ```python
   psl = PSL()
   with open('public_suffix_list.dat', 'r', encoding='utf-8') as f:
       for line in f:
           psl.push_line(line.rstrip())
   if psl.finalize():
       # Use psl
       print(psl.public_suffix('upa-url.github.io')) # github.io
   ```
2. Chunk by chunk from a memory buffer, for example, to load a list from the web:
   ```python
   import urllib.request
   url = 'https://upa-url.github.io/demo/public_suffix_list.dat'
   psl = PSL()
   with urllib.request.urlopen(url) as response:
       while (chunk := response.read(4096)):
           psl.push(chunk)
   if psl.finalize():
       # Use psl
       print(psl.public_suffix('upa-url.github.io')) # github.io
   ```

The following examples show how to get a [public suffix](https://url.spec.whatwg.org/#host-public-suffix) and a [registrable domain](https://url.spec.whatwg.org/#host-registrable-domain):
```python
# Get from the host string
print(psl.public_suffix('abc.ålgård.no')) # xn--lgrd-poac.no
print(psl.registrable_domain('abc.ålgård.no')) # abc.xn--lgrd-poac.no

# Get from the host string and do not convert the output to ASCII
print(psl.public_suffix('abc.ålgård.no', ascii=False)) # ålgård.no
print(psl.registrable_domain('abc.ålgård.no', ascii=False)) # abc.ålgård.no

# Get from the URL
url = URL('https://upa-url.github.io/docs/')
print(psl.public_suffix(url)) # github.io
print(psl.registrable_domain(url)) # upa-url.github.io
```
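
To make the difference between a public suffix and a registrable domain more concrete, here is a simplified plain-Python sketch of the matching rules documented at publicsuffix.org. It is not how this package is implemented and does not use it. The `RULES` set is a hypothetical, tiny subset written in Public Suffix List syntax, and the sketch assumes a lowercase ASCII domain name (no IDNA conversion, IP addresses, or trailing dots):

```python
# Hypothetical, tiny rule set in Public Suffix List syntax (not the real list)
RULES = {'io', 'github.io', '*.ck', '!www.ck'}


def public_suffix(host, rules=RULES):
    labels = host.lower().split('.')
    n = len(labels)
    # An exception rule ("!name") wins over all other rules; the public
    # suffix is then the matched name without its leftmost label
    for i in range(n - 1):
        if '!' + '.'.join(labels[i:]) in rules:
            return '.'.join(labels[i + 1:])
    # Otherwise the matching rule with the most labels wins;
    # "*" in a rule matches exactly one label
    for i in range(n):
        exact = '.'.join(labels[i:])
        wildcard = '.'.join(['*'] + labels[i + 1:])
        if exact in rules or wildcard in rules:
            return exact
    # No rule matched: the implicit default rule "*" selects the last label
    return labels[-1]


def registrable_domain(host, rules=RULES):
    labels = host.lower().split('.')
    suffix_len = len(public_suffix(host, rules).split('.'))
    if len(labels) <= suffix_len:
        return None  # sketch's choice: the host is itself a public suffix
    # The registrable domain is the public suffix plus one more label
    return '.'.join(labels[-(suffix_len + 1):])


print(public_suffix('upa-url.github.io'))            # github.io
print(registrable_domain('docs.upa-url.github.io'))  # upa-url.github.io
print(public_suffix('a.b.ck'))                       # b.ck
print(registrable_domain('www.ck'))                  # www.ck
```

Returning `None` when the host is itself a public suffix is a choice made for this sketch only; it says nothing about what `PSL.registrable_domain` returns in that case.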

## License

This package is licensed under the [BSD 2-Clause License](https://opensource.org/license/bsd-2-clause/) (see the `LICENSE` file).
