Metadata-Version: 2.1
Name: pyce3
Version: 1.0.0
Summary: Multilingual Web Page Content Extractor
Home-page: https://github.com/liuzl/pyce3
Author: Zhanliang Liu
Author-email: liang@zliu.org
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: chardet
Requires-Dist: python-dateutil
Requires-Dist: lxml

# pyce3: Multilingual Web Page Content Extractor for Python3

## Introduction

`pyce3` is a Python 3 package for multilingual web page content extraction. It extracts the main content of article-style web pages, such as news stories and blog posts.

## Usage

```python
import pyce3
import requests

url = "http://caijing.chinadaily.com.cn/a/201911/21/WS5dd62455a31099ab995ed438.html"
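# Pass the raw response bytes; parse() returns the encoding along with the extracted fields.
html = requests.get(url).content
encoding, time, title, text, next_link = pyce3.parse(url, html)
# Comma-separated print() arguments avoid a TypeError if a field is not a str.
print("Encoding:", encoding)
print('=' * 10)
print("Title:", title)
print("Time:", time)
print('=' * 10)
print("Content:", text)
print("NextPageLink:", next_link)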
```
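Because `parse` also returns a next-page link, a multi-page article can in principle be stitched together by following that link. The following is only a minimal sketch, not part of `pyce3`: the helper `collect_pages`, its `fetch` and `parse` parameters, and the `max_pages` limit are hypothetical names introduced here, and it assumes that `next_link` is empty or `None` on the last page and may be a relative URL.

```python
from urllib.parse import urljoin


def collect_pages(url, fetch, parse, max_pages=10):
    """Follow next-page links and join the extracted text of each page.

    `fetch(url)` must return the page HTML; `parse(url, html)` must return the
    same 5-tuple as `pyce3.parse` in the usage example above.
    """
    texts, seen = [], set()
    # Stop on a missing link, a link already visited, or the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        _encoding, _time, _title, text, next_link = parse(url, html)
        if text:
            texts.append(text)
        # Assumption: next_link is empty/None on the last page; urljoin
        # resolves it against the current URL in case it is relative.
        url = urljoin(url, next_link) if next_link else None
    return "\n".join(texts)
```

With the real library this might be called as `collect_pages(url, lambda u: requests.get(u).content, pyce3.parse)`. Passing `fetch` and `parse` as arguments keeps the loop testable without network access.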


