Metadata-Version: 2.1
Name: CommercialScraper
Version: 0.0.2
Summary: A dynamic and scalable data pipeline from Airbnb's commercial site to your local system / cloud storage.
Home-page: https://github.com/BlairMar/Airbnb-webscraping-project
Author: Omar 4ldrich Tahmas
Author-email: o.ismail@aol.co.uk
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: beautifulsoup4 (==4.10.0)
Requires-Dist: boto3 (==1.20.10)
Requires-Dist: botocore (==1.22.8)
Requires-Dist: greenlet (==1.1.2)
Requires-Dist: jmespath (==0.10.0)
Requires-Dist: lxml (==4.6.4)
Requires-Dist: numpy (==1.21.4)
Requires-Dist: pandas (==1.3.4)
Requires-Dist: psycopg2 (==2.9.2)
Requires-Dist: pytz (==2021.3)
Requires-Dist: s3transfer (==0.5.0)
Requires-Dist: selenium (==3.141.0)
Requires-Dist: six (==1.16.0)
Requires-Dist: soupsieve (==2.3.1)
Requires-Dist: SQLAlchemy (==1.4.27)
Requires-Dist: urllib3 (==1.26.7)

# Airbnb Scraper

A fully dynamic and scalable data pipeline, written in Python, that scrapes Airbnb's commercial website for both alphanumeric and image data and saves it locally, in the cloud, or both.

## Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install CommercialScraper.
```bash
pip install CommercialScraper
```
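To confirm that the install succeeded, the installed version can be read back from the package metadata. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is part of the standard library). The helper name `installed_version` is hypothetical and is not part of CommercialScraper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("CommercialScraper")
    print(found if found else "CommercialScraper is not installed")
```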

## Usage
```python
from CommercialScraper.pipeline import AirbnbScraper
from CommercialScraper import data_processing

scraper = AirbnbScraper()

# Returns a dictionary of structured data and a list of image sources for a single product page
product_dict, imgs = scraper.scrape_product_data('https://any/airbnb/product/page', any_ID_you_wish, 'Any Category Label you wish')

# Returns a dataframe of product entries as well as a dictionary of image sources pertaining to each product entry
df, imgs = scraper.scrape_all()


# Saves the dataframe to a CSV file inside a 'data/' folder created in your local directory
data_processing.df_to_csv(df, 'any_filename')

# Saves images locally (imgs is the dictionary of image sources returned by scrape_all)
data_processing.images_to_local(imgs)

# Saves structured data to an SQL database
data_processing.df_to_sql(df, table_name, username, password, hostname, port, database)

# Saves structured data to an AWS S3 bucket
data_processing.df_to_s3(df, aws_access_key_id, region_name, aws_secret_access_key, bucket_name, upload_name)

# Saves images to an AWS S3 bucket
data_processing.images_to_s3(source_links, aws_access_key_id, region_name, aws_secret_access_key, bucket_name, upload_name)

```
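The S3 functions above take AWS credentials as plain arguments. One way to avoid hard-coding them in a script is to read them from environment variables first. The sketch below assumes the standard AWS variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`). The helper `load_aws_settings` is hypothetical and is not part of CommercialScraper.

```python
import os
from typing import Dict, Mapping

# Standard AWS environment variable names, mapped to the argument
# names used in the usage example above.
REQUIRED_VARS = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "region_name": "AWS_DEFAULT_REGION",
}


def load_aws_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect AWS settings from the environment, failing early if any is missing."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in REQUIRED_VARS.items()}
```

The returned values could then be passed in the positional order shown in the usage example, for instance `data_processing.df_to_s3(df, aws['aws_access_key_id'], aws['region_name'], aws['aws_secret_access_key'], bucket_name, upload_name)` where `aws = load_aws_settings()`.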
## Docker Image
This package has been containerised in a Docker image, where it can be run as an application. Please note that this method can only store data in the cloud, not locally.
[Docker Image](https://hub.docker.com/r/docker4ldrich/airbnb-scraper)

```bash
docker pull docker4ldrich/airbnb-scraper

docker run -it docker4ldrich/airbnb-scraper
```
Follow the prompts and enter credentials carefully, as there won't be a chance to correct any typing errors.
Pasting credentials in, rather than typing them, is recommended where applicable.

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

## License
[MIT](https://choosealicense.com/licenses/mit/)

