Metadata-Version: 2.1
Name: searchit
Version: 2019.12.30.1
Summary: Asyncio search engine scraping package
Home-page: https://github.com/EdmundMartin/search_it
Author: Edmund Martin
Maintainer: Edmund Martin <edmartin101@gmail.com>
Maintainer-email: edmartin101@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Operating System :: POSIX
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: aiohttp (>=3.6.2)
Requires-Dist: beautifulsoup4 (>=4.8.2)

# searchit
Searchit is a library for async scraping of search engines. The library supports multiple search engines 
(currently Google, Yandex, and Bing) with support for other search engines to come.

# Install
```
pip install searchit
```
Searchit can be installed with pip by running the command above.

# Using Searchit
```python
import asyncio

from searchit import GoogleScraper, YandexScraper, BingScraper
from searchit import ScrapeRequest

request = ScrapeRequest("watch movies online", 30)
google = GoogleScraper(max_results_per_page=10)  # max_results_per_page = number of results per page
yandex = YandexScraper(max_results_per_page=10)

loop = asyncio.get_event_loop()

# Keep each engine's results in its own variable so neither is overwritten.
google_results = loop.run_until_complete(google.scrape(request))
yandex_results = loop.run_until_complete(yandex.scrape(request))
```
To use Searchit, first create a `ScrapeRequest` object; the search term and the number of results are required fields.
The same object can then be passed to several different search engine scrapers, each of which scrapes asynchronously.
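
Since `scrape` is a coroutine, several engines can in principle be queried concurrently with `asyncio.gather` rather than one after another. The sketch below illustrates that pattern only. `FakeScraper`, `scrape_all` and `run_demo` are hypothetical names that are not part of Searchit, and the stand-in class simulates latency so the example runs without network access. It assumes that real scraper objects could be passed in the same way.

```python
import asyncio


class FakeScraper:
    """Hypothetical stand-in for a Searchit scraper; not part of the library."""

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay

    async def scrape(self, request):
        # Simulate network latency instead of contacting a search engine.
        await asyncio.sleep(self.delay)
        return [f"{self.name}: {request} #{i}" for i in range(1, 3)]


async def scrape_all(scrapers, request):
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(s.scrape(request) for s in scrapers))


def run_demo():
    scrapers = [FakeScraper("google", 0.02), FakeScraper("yandex", 0.01)]
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(scrape_all(scrapers, "watch movies online"))
    finally:
        loop.close()


if __name__ == "__main__":
    for engine_results in run_demo():
        print(engine_results)
```

Results come back in the order the scrapers were listed, regardless of which one finishes first.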

## ScrapeRequest Object
```
term - Required str - the term to be searched for
count - Required int - the total number of results
domain - Optional[str] - the domain to search, e.g. .com
sleep - Optional[int] - time to wait between page requests - important to prevent getting blocked
proxy - Optional[str] - proxy to be used to make request - default none
language - Optional[str] - language to conduct search in (currently Google only)
yandex_geo - Optional[str] - Yandex location code to conduct search from - default code for London
```
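
To make the relationship between `count`, `max_results_per_page` and `sleep` concrete, here is a minimal sketch of paginated fetching with a pause between pages. It is an assumption about how such pagination could work, not Searchit's actual implementation. `pages_needed`, `paginate`, `fake_fetch_page` and `run_pagination_demo` are hypothetical names used for illustration only.

```python
import asyncio
import math


def pages_needed(count, max_results_per_page):
    # Round up so a partial final page is still requested.
    return math.ceil(count / max_results_per_page)


async def paginate(count, max_results_per_page, sleep, fetch_page):
    results = []
    total_pages = pages_needed(count, max_results_per_page)
    for page in range(total_pages):
        results.extend(await fetch_page(page))
        # Pause between pages, but not after the last one.
        if sleep and page < total_pages - 1:
            await asyncio.sleep(sleep)
    # Trim any surplus results from the final page.
    return results[:count]


async def fake_fetch_page(page):
    # Hypothetical page fetcher returning ten placeholder results per page.
    return [f"result-{page * 10 + i}" for i in range(10)]


def run_pagination_demo(count):
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(paginate(count, 10, 0, fake_fetch_page))
    finally:
        loop.close()
```

Under these assumptions, a request for 30 results at 10 per page needs three page fetches, so a non-zero `sleep` adds two pauses to the total scrape time.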

## Roadmap
* Add additional search engines
* Tests
* Blocking non-async scrape method
* Add support for page rendering (Selenium and Puppeteer)

