Metadata-Version: 2.1
Name: hemnes
Version: 0.1.10
Summary: An in-depth ikea scraper
Home-page: https://github.com/sayeefrmoyen/hemnes
Author: Sayeef Moyen
Author-email: develop.sayeefrm@gmail.com
License: UNKNOWN
Keywords: scraping ikea ikea-scraping python3 beautifulsoup,beautiful-soup
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3
Description-Content-Type: text/markdown
Requires-Dist: selenium
Requires-Dist: bs4

<!-- HEADER INFORMATION -->
<h3 align="center">HEMNES</h3>

<p align="center">
    <strong><em>A plug-and-play python3 ikea scraping package</em></strong>
    <br />
    <a href="https://sayeefrmoyen.github.io/hemnes/deploy/html/index.html"><strong>Explore the docs »</strong></a>
    <br />
    <br />
    <a href="https://github.com/sayeefrmoyen/hemnes/issues">Report Bug</a>
    ·
    <a href="https://github.com/sayeefrmoyen/hemnes/issues">Request Feature</a>
  </p>

<!-- TABLE OF CONTENTS -->
## Table of Contents

* [About the Project](#about-the-project)
  * [Built With](#built-with)
* [Getting Started](#getting-started)
  * [Requirements](#requirements)
  * [Installation](#install)
* [Usage](#usage)
* [License](#license)
* [Known Problems](#known-problems)
* [Release History](#release-history)
* [Future](#future)
* [Acknowledgments](#acknowledgments)

<!-- ABOUT THE PROJECT -->
## About The Project

Hemnes is a simple Python 3 package for scraping product data from Ikea. It supports multi-word and
strict (keyword-filtered) queries, and can save results to JSON. For each matching product, Hemnes
scrapes the following data:

* `name (str)`
* `price (float)`
* `rank (int)`: based on order that products are returned for the query
* `rating (float)`: average user rating
* `url (str)`: product url
* `color (list[str])`: the product's colors, as strings
* `images (list[str])`: list of full urls to product images

### Built With
Powered by:
* [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/)
* [ChromeDriver](http://chromedriver.chromium.org/getting-started)
* [Selenium](https://www.seleniumhq.org)

<!-- GETTING STARTED -->
## Getting Started

### Requirements

Hemnes requires the following:

* chromedriver
* python3
* pip3

ChromeDriver is a separate download from Google Chrome, so having Chrome installed does not mean chromedriver is available. If you do not have it yet, head to the [ChromeDriver](http://chromedriver.chromium.org/getting-started)
page and follow the install instructions. Hemnes only works if chromedriver is on your system path.
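
A quick way to confirm the path requirement before running hemnes is to look the executable up from Python. This is a minimal sketch using only the standard library; `find_chromedriver` is a hypothetical helper name and is not part of hemnes.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path to the executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver was not found on your PATH")
    else:
        print(f"chromedriver found at {path}")
```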

### Install

Hemnes is distributed as a pip package. It can be installed with a standard pip install:
```sh
pip3 install hemnes
```

Import Hemnes into your python scripts:
```python
import hemnes
```

<!-- USAGE EXAMPLES -->
## Usage

Hemnes makes it easy to get detailed product data from Ikea.

### Standard Use

To retrieve product results as a list that you can then process yourself, call:
```python
product_results = hemnes.get_products("coffee table")
```

`product_results` will now contain a `list[Product]`.

`Product` is a simple helper class which contains the following fields:

* `name (str)`
* `tag (str)`
* `price (float)`
* `rank (int)`: based on order that products are returned for the query
* `rating (float)`: average user rating
* `url (str)`: product url
* `color (list[str])`: the product's colors, as strings
* `images (list[str])`: list of full urls to product images
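
Because the result is a plain list, ordinary Python is enough to post-process it. The sketch below assumes only the attribute names listed above. `SimpleNamespace` objects stand in for real `Product` instances so that it runs without a browser, the sample values are made up for illustration, and `cheapest_first` is a hypothetical helper rather than a hemnes function.

```python
from types import SimpleNamespace

# Stand-ins for hemnes Product objects (illustrative values, not real Ikea data).
products = [
    SimpleNamespace(name="Table A", price=49.99, rank=1, rating=4.5),
    SimpleNamespace(name="Table B", price=129.0, rank=2, rating=4.8),
    SimpleNamespace(name="Table C", price=19.99, rank=3, rating=3.9),
]


def cheapest_first(items, max_price):
    """Keep items priced at or below max_price, sorted by ascending price."""
    affordable = [p for p in items if p.price <= max_price]
    return sorted(affordable, key=lambda p: p.price)


for product in cheapest_first(products, max_price=50):
    print(f"{product.name}: {product.price:.2f} (rating {product.rating})")
```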

`tag` is a meta-field that you can use however you like. By default `tag` is set to `None`. Some example uses:

* Keeping track of which batch each item was stored in
* Serving as a key in a database

### Saving results to JSON

If you would like to save the results to a json file you can add the `data_path` param:
```python
# saving results to json
product_results = hemnes.get_products("coffee table", data_path="data/coffeetable.json")
```
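
Once a file has been written, it can be read back with the standard `json` module. The layout of the file hemnes produces is not described here, so the sketch below ASSUMES a JSON array of product objects and demonstrates the round trip on a made-up file; inspect a real output file before relying on any particular keys. `load_saved_results` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def load_saved_results(path):
    """Load a JSON file and return the parsed Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demo on a made-up file. ASSUMPTION: the file holds a JSON array of objects.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "coffeetable.json"
    sample = [{"name": "Table A", "price": 49.99}]
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    data = load_saved_results(sample_path)
    print(type(data).__name__, len(data))
```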

### Strict Keyword Searching

Hemnes supports "strict searching" to specify required descriptive keywords for returned results. To use this add a `keywords` param:
```python
# adding required keywords
product_results = hemnes.get_products("coffee table", keywords=["large", "wooden"])
```

Hemnes will look for the given keywords in each product's detailed description, and only return those products which contain
all of the given keywords.
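
The "all keywords must appear" rule can be illustrated in a few lines. This is a sketch of the idea only, not hemnes's actual implementation: the case-insensitive substring match and the helper name `matches_all_keywords` are assumptions.

```python
def matches_all_keywords(description, keywords):
    """Return True if every keyword occurs in the description.

    ASSUMPTION: matching is a case-insensitive substring test.
    """
    text = description.lower()
    return all(keyword.lower() in text for keyword in keywords)


description = "A large wooden coffee table with a lower shelf."
print(matches_all_keywords(description, ["large", "wooden"]))  # True
print(matches_all_keywords(description, ["large", "glass"]))   # False
```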

### Using the meta Tag

To include a `tag` in the returned results simply pass it to the call:
```python
# including a tag
product_results = hemnes.get_products("coffee table", tag="tables")
```
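
One use for the tag is to tell batches apart after combining the results of several calls. The sketch below groups items by their `tag` attribute. The objects are stand-ins with made-up names rather than real `Product` instances, and `group_by_tag` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-ins for Product objects returned by two differently tagged calls.
batch = [
    SimpleNamespace(name="Table A", tag="tables"),
    SimpleNamespace(name="Chair A", tag="chairs"),
    SimpleNamespace(name="Table B", tag="tables"),
]


def group_by_tag(items):
    """Map each tag to the list of item names carrying it."""
    groups = defaultdict(list)
    for item in items:
        groups[item.tag].append(item.name)
    return dict(groups)


print(group_by_tag(batch))
```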

### Enabling Logs

By default, hemnes does not log any output. Queries that return many results (e.g. generic queries like "table", or queries for popular products) can take
anywhere from several minutes to tens of minutes to complete, so log messages are a helpful way to see how far along you are.

With logging enabled, hemnes reports the number of results found, the number of pages, the page and result currently being processed, and any skipped results
(if strict querying is enabled), along with any error messages in the event of a crash. If you experience a crash, please report the exact error message and
the method call that caused it on GitHub.

To enable logs pass in the `log` param:
```python
# enabling logs
product_results = hemnes.get_products("coffee table", log=True)
```

### Changing Sleep Time

Ikea uses Angular to generate many of its product pages. As a result, hemnes waits for `sleep_time` seconds after loading each page to
ensure that the DOM has been fully rendered before scraping it. By default, `sleep_time` is 4 seconds. Depending on
the speed of your internet connection, this may be too long or too short.

To alter the `sleep_time` for loading pages pass it as a parameter:
```python
# changing sleep time
product_results = hemnes.get_products("coffee table", sleep_time=4) # sleep_time must be an int
```
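
If you are unsure which value suits your connection, one option is to retry with progressively longer waits. The sketch below takes the fetch function as an argument so it can be demonstrated with a fake. It ASSUMES that a wait that is too short shows up as an empty result list rather than an exception, which may not match hemnes's real behavior. `fetch_with_longer_waits` and `fake_fetch` are hypothetical names.

```python
def fetch_with_longer_waits(fetch, query, sleep_times=(4, 8, 12)):
    """Call fetch(query, sleep_time=s) with increasing waits until results appear.

    Returns (results, sleep_time_used), or ([], None) if every attempt was empty.
    """
    for sleep_time in sleep_times:
        results = fetch(query, sleep_time=sleep_time)
        if results:
            return results, sleep_time
    return [], None


# Fake fetcher for demonstration: only "succeeds" once the wait is at least 8 s.
def fake_fetch(query, sleep_time):
    return [f"{query} result"] if sleep_time >= 8 else []


print(fetch_with_longer_waits(fake_fetch, "coffee table"))
```

With hemnes installed, you would pass `hemnes.get_products` as `fetch` in place of `fake_fetch`.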

_For more examples, please refer to the [Documentation](https://sayeefrmoyen.github.io/hemnes/deploy/html/index.html)_

<!-- LICENSE -->
## License

Distributed under the MIT License. See `LICENSE` for more information.

<!-- Known Problems -->
## Known Problems

* Chromedriver runs with a GUI. This is needed to render the DOM on Ikea's Angular-generated pages. If you know a way around this, please contact me.

<!-- Release History -->
## Release History

* 0.1.10
	* Add logging
	* Add sleep time modification
	* Fix bug where chromedriver did not close after certain errors
	* Return partial results found before errors
* 0.1.1-0.1.1.9
	* Fix packaging bugs
* 0.1.0
	* First proper release
	* Documentation still **incomplete**
	* Price-based querying functionality implemented, but not yet made available

## Future

* Finish documentation
* Make the already implemented price-based querying functionality available

<!-- Acknowledgments -->
## Acknowledgments

* [Awesome README template](https://github.com/othneildrew/Best-README-Template/blob/master/README.md)

