Metadata-Version: 2.1
Name: qspider
Version: 0.1.2
Summary: An easy to use tools module for writing multi-thread and multi-process programs.
Home-page: UNKNOWN
Author: Tishacy
Author-email: 
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: termcolor (>=1.1.0)
Requires-Dist: colorama (>=0.4.3)

# QSpider

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow)](https://opensource.org/licenses/MIT) [![Pyversion](https://img.shields.io/badge/python-3.x-green)](https://pypi.org/project/qspider/) [![Version](https://img.shields.io/badge/pypi-v0.1.2-red)](https://pypi.org/project/qspider)

An easy to use tools module for writing multi-thread and multi-process programs.

## Install

QSpider could be easily installed using pip:

```bash
$ pip install qspider
```

## Usages

### Using Module

```python
# 1. import class QSpider and Task from qspider module 
#   and other modules.
from qspider import QSpider, Task
import requests

# 2. Define a list of task source.
#   Each of the element in this source list is called 'task_source'.
#   'task_source' could be any type, ie str, tuple, object, dict...,
#   it could also be requests.Session or something else.
source = ['https://www.baidu.com' for i in range(100)]

# 3. Create your own task (which need to extends Task).
class TestTask(Task):
    """A test task

    Attributes:
        task_source: the source which needed in the task.
          which is actually the 'task_source' in the source list.
    """
    def __init__(self, task_source):
        Task.__init__(self, task_source)

    def run(self):
        # process the self.task_source here.
        res = requests.get(self.task_source, timeout=3)
        # return values needed
        return res.status_code

# 4. Create the QSpider and run it.
test_spider = QSpider(source, TestTask, has_result=True)
results = test_spider.run()
print(results)
```

Run the script and you'll get:

```bash
[Info] 100 tasks in total.
[Input] Number of threads: 20
[ ✔ ] 100% |███████████████████████████████████| 100/100 [eta-0:00:00, 2.5s, 40.8it/s]
[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, ... , 200]
```

### Using command line

Create a QSpider using command:

```bash
$ genqspider -h
usage: Generate your qspider based on templates [-h] [-p] name

positional arguments:
  name           Your spider name

optional arguments:
  -h, --help     show this help message and exit
  -p, --process  Using multi-process instead of multi-thread template
```

#### Example

1.  Create a `test` crawler using QSpider.

    ```bash
    $ genqspider test
    A qspider named test is initialized.
    ```

    A python script named `test.py` is created in your current directory.

2. Open the `test.py`，And you'll get:

    ```python
    # -*- coding: utf-8 -*-

    from qspider import ThreadManager, Task

    class TestSpider(ThreadManager):
        def __init__(self, has_result=False, add_failed=True):
            self.name = "test"
            self.has_result = has_result
            self.add_failed = add_failed
            self.source = [0]  # define your source list
            super(TestSpider, self).__init__(self.source, self.QTask, has_result=self.has_result, add_failed=self.add_failed)

        class QTask(Task):
            def __init__(self, task_source):
                Task.__init__(self, task_source)

            def run(self):
                # parse single task source
                pass

    if __name__=="__main__":
        qspider = TestSpider()
        qspider.test()
        # qspider.run()
    ```

3. Modify your source list with the line `self.source = [0]`, and how you gonna process the `task_source` in the method `QTask.run` .

    ```python
    # -*- coding: utf-8 -*-
    import requests
    from qspider.core import QSpider, Task

    class TestSpider(QSpider):
        def __init__(self, has_result=False, add_failed=True):
            self.name = "test"
            self.has_result = has_result
            self.add_failed = add_failed
            # 1. define your source list
            self.source = ['https://www.baidu.com' for i in range(100)]  
            super(TestSpider, self).__init__(self.source, self.QTask, has_result=self.has_result, add_failed=self.add_failed)

        class QTask(Task):
            def __init__(self, task_source):
                Task.__init__(self, task_source)

            # 2. Modify the run method
            def run(self):
                # process the self.task_source here.
                res = requests.get(self.task_source, timeout=3)
                # return values needed
                return res.status_code

    if __name__=="__main__":
      	# 3. 'has_result' is True when there are values returned in QTask.run method.
        qspider = TestSpider(has_result=True)
        # 4. receive the results after run the qspider.
        results = qspider.run()
        print(results)
    ```

4. Run the script and you'll get:

    ```bash
    [Info] 100 tasks in total.
    [Input] Number of threads: 20
    [ ✔ ] 100% |███████████████████████████████████| 100/100 [eta-0:00:00, 2.5s, 40.8it/s]
    [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, ... , 200]
    ```

## Releases

-   v0.1.1: First release with basic classes.
-   v0.1.2: Reconstruct code, add ThreadManager, ProcessManager and other tool classes.

## License

Copyright (c) 2020 tishacy.

Licensed under the [MIT License](https://github.com/Tishacy/QSpider/blob/master/LICENSE).

