Metadata-Version: 2.1
Name: partridge
Version: 1.1.2
Summary: Partridge is a python library for working with GTFS feeds using pandas DataFrames.
Home-page: https://github.com/remix/partridge
Author: Danny Whalen
Author-email: daniel.r.whalen@gmail.com
License: MIT license
Keywords: partridge
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.6, <4
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: charset_normalizer
Requires-Dist: functools32; python_version < "3"
Requires-Dist: networkx>=2.0
Requires-Dist: pandas
Requires-Dist: isoweek
Provides-Extra: full
Requires-Dist: geopandas; extra == "full"

=========
Partridge
=========


.. image:: https://img.shields.io/pypi/v/partridge.svg
        :target: https://pypi.python.org/pypi/partridge

.. image:: https://img.shields.io/travis/remix/partridge.svg
        :target: https://travis-ci.org/remix/partridge


Partridge is a Python 3.6+ library for working with `GTFS <https://developers.google.com/transit/gtfs/>`__ feeds using `pandas <https://pandas.pydata.org/>`__ DataFrames.

Partridge is heavily influenced by our experience at `Remix <https://www.remix.com/>`__ analyzing and debugging every GTFS feed we could find.

At the core of Partridge is a dependency graph rooted at ``trips.txt``. Disconnected data is pruned away according to this graph when reading the contents of a feed.

Feeds can also be filtered to create a view specific to your needs. It's most common to filter a feed down to specific dates (``service_id``) or routes (``route_id``), but any field can be filtered.
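
The pruning idea can be illustrated without Partridge. The following is a minimal sketch in plain Python, not Partridge's implementation: the table contents, the ``EDGES`` list, and the ``prune`` helper are all hypothetical, and only two dependencies of ``trips.txt`` are modelled.

.. code:: python

    # Hypothetical miniature feed: each table is a list of row dicts.
    feed = {
        "trips.txt": [
            {"trip_id": "t1", "route_id": "r1", "service_id": "weekday"},
            {"trip_id": "t2", "route_id": "r2", "service_id": "saturday"},
        ],
        "routes.txt": [
            {"route_id": "r1"},
            {"route_id": "r2"},
            {"route_id": "r3"},  # no trip references this route
        ],
        "stop_times.txt": [
            {"trip_id": "t1", "stop_id": "s1"},
            {"trip_id": "t2", "stop_id": "s2"},
        ],
    }

    # Assumed edges: (parent table, child table, shared column).
    EDGES = [
        ("trips.txt", "routes.txt", "route_id"),
        ("trips.txt", "stop_times.txt", "trip_id"),
    ]


    def prune(feed, trip_filter):
        """Filter trips.txt, then drop rows elsewhere that no kept trip references."""
        kept = dict(feed)
        kept["trips.txt"] = [
            row for row in feed["trips.txt"]
            if all(row[col] in allowed for col, allowed in trip_filter.items())
        ]
        for parent, child, column in EDGES:
            referenced = {row[column] for row in kept[parent]}
            kept[child] = [row for row in kept[child] if row[column] in referenced]
        return kept


    pruned = prune(feed, {"service_id": {"weekday"}})

In this toy feed, filtering trips to the ``weekday`` service also removes routes ``r2`` and ``r3`` and the stop time for trip ``t2``, because nothing that remains refers to them.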

.. figure:: dependency-graph.png
   :alt: dependency graph


Philosophy
----------

The design of Partridge is guided by the following principles:

**As much as possible**

- Favor speed
- Allow for extension
- Succeed lazily on expensive paths
- Fail eagerly on inexpensive paths

**As little as possible**

- Do anything other than efficiently read GTFS files into DataFrames
- Take an opinion on the GTFS spec


Installation
------------

.. code:: console

    pip install partridge


**GeoPandas support**

.. code:: console

    pip install partridge[full]


Usage
-----

**Setup**

.. code:: python

    import partridge as ptg

    inpath = 'path/to/caltrain-2017-07-24/'


Examples
--------

The following is a collection of gists containing Jupyter notebooks with transformations to GTFS feeds that may be useful for intake into software applications.

* `Find the busiest week in a feed and reduce its file size <https://gist.github.com/csb19815/aadef16178dfcb5ba7a8d88fbf718749>`_
* `Combine routes by route_short_name <https://gist.github.com/csb19815/67c0247d1eed2286ca0b323a02a1179f>`_
* `Merge GTFS with shapefile geometries <https://gist.github.com/csb19815/535ddb5d36a081abac3430f1a58bd875>`_
* `Merge multiple agencies into one <https://gist.github.com/csb19815/682e0f6f30844313213fa5715e48df8c>`_
* `Rewrite a feed to clean up formatting issues <https://gist.github.com/csb19815/659c8eba4742cc3f1b8f23d66a760a0c>`_
* `If a feed has stop_code, replace the contents of stop_id with the contents of stop_code <https://gist.github.com/csb19815/5bf7923ffb1ce7ec155ac9a94a83ea70>`_
* `Diff the number of service hours in two feeds <https://gist.github.com/csb19815/476335cb299ddb3d5a1a4b898424bb35>`_
* `Investigate the distance in meters of each stop to the closest point on a shape <https://gist.github.com/sgoel/bff9384129974967817404abe80e7c6a>`_
* `Convert frequencies.txt to an equivalent trips.txt <https://gist.github.com/invisiblefunnel/6c9f3a9b537d3f0ad192c24777b6ae57>`_
* `Calculate headway for a stop <https://gist.github.com/invisiblefunnel/6015e65684325281e65fa9339a78229b>`_


Inspecting the calendar
~~~~~~~~~~~~~~~~~~~~~~~


**The date with the most trips**

.. code:: python

    date, service_ids = ptg.read_busiest_date(inpath)
    #  datetime.date(2017, 7, 17), frozenset({'CT-17JUL-Combo-Weekday-01'})


**The week with the most trips**


.. code:: python

    service_ids_by_date = ptg.read_busiest_week(inpath)
    #  {datetime.date(2017, 7, 17): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 18): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 19): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 20): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 21): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 22): frozenset({'CT-17JUL-Caltrain-Saturday-03'}),
    #   datetime.date(2017, 7, 23): frozenset({'CT-17JUL-Caltrain-Sunday-01'})}
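
To make "busiest" concrete, here is a rough sketch of how a busiest date and a busiest week could be picked from per-date trip counts. It is not Partridge's code: the counts are invented, ``busiest_date`` and ``busiest_week`` are hypothetical helpers, weeks are assumed to be ISO weeks (Monday to Sunday), and ties are assumed to go to the earliest date or week.

.. code:: python

    import datetime
    from collections import defaultdict

    # Invented trip counts per service date.
    trip_counts = {
        datetime.date(2017, 7, 15): 36,  # Saturday
        datetime.date(2017, 7, 16): 32,  # Sunday
        datetime.date(2017, 7, 17): 92,  # Monday
        datetime.date(2017, 7, 18): 92,  # Tuesday
    }


    def busiest_date(counts):
        """Return the date with the most trips; ties go to the earliest date."""
        return min(counts, key=lambda day: (-counts[day], day))


    def busiest_week(counts):
        """Return the sorted dates of the ISO week with the most trips."""
        totals = defaultdict(int)
        dates = defaultdict(list)
        for day, n in counts.items():
            week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week number)
            totals[week] += n
            dates[week].append(day)
        best = min(totals, key=lambda week: (-totals[week], week))
        return sorted(dates[best])


    top_date = busiest_date(trip_counts)
    top_week = busiest_week(trip_counts)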


**Dates with active service**

.. code:: python

    service_ids_by_date = ptg.read_service_ids_by_date(inpath)

    date, service_ids = min(service_ids_by_date.items())
    #  datetime.date(2017, 7, 15), frozenset({'CT-17JUL-Caltrain-Saturday-03'})

    date, service_ids = max(service_ids_by_date.items())
    #  datetime.date(2019, 7, 20), frozenset({'CT-17JUL-Caltrain-Saturday-03'})
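
For background, GTFS describes service in two files: ``calendar.txt`` gives a weekly pattern over a date range, and ``calendar_dates.txt`` adds (``exception_type`` 1) or removes (``exception_type`` 2) service on individual dates. The sketch below shows one way such rows could be resolved into a date-to-service mapping. It is an illustration only, not Partridge's implementation; the rows and the ``resolve_service_dates`` helper are made up, and dates are assumed to be parsed already.

.. code:: python

    import datetime
    from collections import defaultdict

    WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
                "friday", "saturday", "sunday"]


    def resolve_service_dates(calendar, calendar_dates):
        """Map each date to the frozenset of service_ids active on it."""
        active = defaultdict(set)
        # Expand the weekly pattern across each row's date range.
        for row in calendar:
            day = row["start_date"]
            while day <= row["end_date"]:
                if row[WEEKDAYS[day.weekday()]] == 1:
                    active[day].add(row["service_id"])
                day += datetime.timedelta(days=1)
        # Apply per-date exceptions: 1 adds service, 2 removes it.
        for row in calendar_dates:
            if row["exception_type"] == 1:
                active[row["date"]].add(row["service_id"])
            elif row["exception_type"] == 2:
                active[row["date"]].discard(row["service_id"])
        return {day: frozenset(ids) for day, ids in active.items() if ids}


    # Made-up rows: weekday service for one week, with two exceptions.
    calendar = [{
        **{name: 1 for name in WEEKDAYS[:5]},
        **{name: 0 for name in WEEKDAYS[5:]},
        "service_id": "weekday",
        "start_date": datetime.date(2017, 7, 17),
        "end_date": datetime.date(2017, 7, 23),
    }]
    calendar_dates = [
        {"service_id": "weekday", "date": datetime.date(2017, 7, 19), "exception_type": 2},
        {"service_id": "special", "date": datetime.date(2017, 7, 22), "exception_type": 1},
    ]

    resolved = resolve_service_dates(calendar, calendar_dates)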


**Dates with identical service**


.. code:: python

    dates_by_service_ids = ptg.read_dates_by_service_ids(inpath)

    busiest_date, busiest_service = ptg.read_busiest_date(inpath)
    dates = dates_by_service_ids[busiest_service]

    min(dates), max(dates)
    #  datetime.date(2017, 7, 17), datetime.date(2019, 7, 19)
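
The two mappings are inverses of each other: one is keyed by date, the other by the set of service IDs active on a date. A frozenset is hashable, so it can serve as a dictionary key. Here is a minimal sketch of that inversion, using a hand-written ``service_ids_by_date`` dict rather than a real feed (the ``invert`` helper is hypothetical):

.. code:: python

    import datetime
    from collections import defaultdict

    # Hand-written sample in the same shape as the outputs shown above.
    service_ids_by_date = {
        datetime.date(2017, 7, 17): frozenset({"CT-17JUL-Combo-Weekday-01"}),
        datetime.date(2017, 7, 18): frozenset({"CT-17JUL-Combo-Weekday-01"}),
        datetime.date(2017, 7, 22): frozenset({"CT-17JUL-Caltrain-Saturday-03"}),
    }


    def invert(service_ids_by_date):
        """Group dates by the exact set of service_ids active on them."""
        grouped = defaultdict(set)
        for day, service_ids in service_ids_by_date.items():
            grouped[service_ids].add(day)
        return {ids: frozenset(days) for ids, days in grouped.items()}


    dates_by_service_ids = invert(service_ids_by_date)
    weekday_dates = dates_by_service_ids[frozenset({"CT-17JUL-Combo-Weekday-01"})]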


Reading a feed
~~~~~~~~~~~~~~


.. code:: python

    _date, service_ids = ptg.read_busiest_date(inpath)

    view = {
        'trips.txt': {'service_id': service_ids},
        'stops.txt': {'stop_name': 'Gilroy Caltrain'},
    }

    feed = ptg.load_feed(inpath, view)
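
Note that the view above mixes a collection of values (``service_ids``) with a single value (``'Gilroy Caltrain'``). One way to think about such a filter is as a per-row predicate that accepts either form. The sketch below is an assumption about the semantics, written in plain Python for illustration; ``matches`` and the sample rows are hypothetical, and this is not how Partridge is implemented.

.. code:: python

    def matches(row, conditions):
        """Return True if the row satisfies every column condition.

        A condition is either a single value or a collection of allowed values.
        """
        for column, allowed in conditions.items():
            if isinstance(allowed, (set, frozenset, list, tuple)):
                if row[column] not in allowed:
                    return False
            elif row[column] != allowed:
                return False
        return True


    # Hypothetical rows standing in for stops.txt.
    stops = [
        {"stop_id": "s1", "stop_name": "Gilroy Caltrain"},
        {"stop_id": "s2", "stop_name": "Another Station"},
    ]

    by_name = [row for row in stops if matches(row, {"stop_name": "Gilroy Caltrain"})]
    by_ids = [row for row in stops if matches(row, {"stop_id": frozenset({"s1", "s2"})})]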


**Read shapes and stops as GeoDataFrames**

.. code:: python

    service_ids = ptg.read_busiest_date(inpath)[1]
    view = {'trips.txt': {'service_id': service_ids}}

    feed = ptg.load_geo_feed(inpath, view)

    feed.shapes.head()
    #       shape_id                                           geometry
    #  0  cal_gil_sf  LINESTRING (-121.5661454200744 37.003512297983...
    #  1  cal_sf_gil  LINESTRING (-122.3944115638733 37.776439059278...
    #  2   cal_sf_sj  LINESTRING (-122.3944115638733 37.776439059278...
    #  3  cal_sf_tam  LINESTRING (-122.3944115638733 37.776439059278...
    #  4   cal_sj_sf  LINESTRING (-121.9031703472137 37.330157067882...

    minlon, minlat, maxlon, maxlat = feed.stops.total_bounds
    #  -122.412076, 37.003485, -121.566088, 37.77639
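
Conceptually, a shape geometry is the ordered sequence of points that share a ``shape_id`` in ``shapes.txt``, and bounds are the extremes of a set of coordinates. The following small sketch shows both in plain Python, with made-up points and hypothetical helper names. GeoPandas is not used here, and this is not Partridge's implementation.

.. code:: python

    from collections import defaultdict

    # Made-up rows standing in for shapes.txt, deliberately out of order.
    shape_points = [
        {"shape_id": "a", "shape_pt_sequence": 2, "shape_pt_lon": -122.1, "shape_pt_lat": 37.4},
        {"shape_id": "a", "shape_pt_sequence": 1, "shape_pt_lon": -122.4, "shape_pt_lat": 37.8},
        {"shape_id": "b", "shape_pt_sequence": 1, "shape_pt_lon": -121.6, "shape_pt_lat": 37.0},
        {"shape_id": "b", "shape_pt_sequence": 2, "shape_pt_lon": -121.9, "shape_pt_lat": 37.3},
    ]


    def build_lines(points):
        """Group points by shape_id, ordered by sequence, as (lon, lat) tuples."""
        grouped = defaultdict(list)
        ordered = sorted(points, key=lambda p: (p["shape_id"], p["shape_pt_sequence"]))
        for point in ordered:
            grouped[point["shape_id"]].append((point["shape_pt_lon"], point["shape_pt_lat"]))
        return dict(grouped)


    def total_bounds(lines):
        """Return (minlon, minlat, maxlon, maxlat) over all coordinates."""
        lons = [lon for coords in lines.values() for lon, _ in coords]
        lats = [lat for coords in lines.values() for _, lat in coords]
        return min(lons), min(lats), max(lons), max(lats)


    lines = build_lines(shape_points)
    bounds = total_bounds(lines)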


Extracting a new feed
~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    outpath = 'gtfs-slim.zip'

    service_ids = ptg.read_busiest_date(inpath)[1]
    view = {'trips.txt': {'service_id': service_ids}}

    ptg.extract_feed(inpath, outpath, view)
    feed = ptg.load_feed(outpath)

    assert service_ids == set(feed.trips.service_id)
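
At its simplest, extracting a feed means writing each filtered table back out as a CSV file inside a new zip archive. The sketch below shows that step with the standard library only. ``write_feed`` and the sample table are hypothetical, and the code says nothing about how ``ptg.extract_feed`` is actually implemented.

.. code:: python

    import csv
    import io
    import zipfile


    def write_feed(tables, fileobj):
        """Write {filename: list of row dicts} as CSV members of a zip archive."""
        with zipfile.ZipFile(fileobj, "w", zipfile.ZIP_DEFLATED) as archive:
            for filename, rows in tables.items():
                if not rows:
                    continue  # empty tables are skipped in this sketch
                buffer = io.StringIO()
                writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
                archive.writestr(filename, buffer.getvalue())


    # Hypothetical filtered table, written to an in-memory zip.
    tables = {"trips.txt": [{"trip_id": "t1", "service_id": "weekday"}]}
    out = io.BytesIO()
    write_feed(tables, out)

    # Read the archive back to check its contents.
    with zipfile.ZipFile(out) as archive:
        names = archive.namelist()
        text = archive.read("trips.txt").decode("utf-8")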


Features
--------

-  Surprisingly fast :)
-  Load only what you need into memory
-  Built-in support for resolving service dates
-  Easily extended to support fields and files outside the official spec
   (TODO: document this)
-  Handle nested folders and bad data in zips
-  Predictable type conversions

Thank You
---------

I hope you find this library useful. If you have suggestions for
improving Partridge, please open an `issue on
GitHub <https://github.com/remix/partridge/issues>`__.


=======
History
=======

1.1.2 (2022-11-23)
------------------

Code changes:

* Remove references to deprecated NumPy types (https://github.com/remix/partridge/pull/69 - thanks @BlackSpade741!)
* Switch from `cChardet <https://github.com/PyYoshi/cChardet>`_ to `charset-normalizer <https://github.com/Ousret/charset_normalizer>`_ for Python 3.10 support (https://github.com/remix/partridge/pull/76 - thanks @brockhaywood!)

Other changes:

* Miscellaneous improvements to tests, code formatting, and documentation (https://github.com/remix/partridge/pull/61 - thanks @invisiblefunnel!)
* Relocate usage examples from wiki to README (https://github.com/remix/partridge/pull/70 - thanks @landonreed!)
* README tweaks (https://github.com/remix/partridge/pull/74 - thanks @chelsey!)
* Use GitHub Actions for automated testing (https://github.com/remix/partridge/pull/79 - thanks @dget!). **Note:** we now test against Python versions 3.8, 3.9, 3.10, and 3.11.


1.1.1 (2019-09-13)
------------------

* Improve file encoding sniffer, which was misidentifying some Finnish/emoji unicode. Thanks to @dyakovlev!


1.1.0 (2019-02-21)
------------------

* Add ``partridge.load_geo_feed`` for reading stops and shapes into GeoPandas GeoDataFrames.


1.0.0 (2018-12-18)
------------------

This release is a combination of major internal refactorings and some minor interface changes. Overall, you should expect your upgrade from pre-1.0 versions to be relatively painless. A big thank you to @genhernandez and @csb19815 for their valuable design feedback. If you still need Python 2 support, please continue using version 0.11.0.

Here is a list of interface changes:

* The class ``partridge.gtfs.feed`` has been renamed to ``partridge.gtfs.Feed``.
* The public interface for instantiating feeds is ``partridge.load_feed``. This function replaces the previously undocumented function ``partridge.get_filtered_feed``.
* A new function has been added for identifying the busiest week in a feed: ``partridge.read_busiest_week``
* The public function ``partridge.get_representative_feed`` has been removed in favor of using ``partridge.read_busiest_date`` directly.
* The public function ``partridge.writers.extract_feed`` is now available via the top level module: ``partridge.extract_feed``.

Miscellaneous minor changes:

* Character encoding detection is now done by the ``cchardet`` package instead of ``chardet``. ``cchardet`` is faster, but may not always return the same result as ``chardet``.
* Zip files are unpacked into a temporary directory instead of reading directly from the zip. These temporary directories are cleaned up when the feed is garbage collected or when the process exits.
* The code base is now annotated with type hints and the build runs ``mypy`` to verify the types.
* DataFrames are cached in a dictionary instead of the ``functools.lru_cache`` decorator.
* The ``partridge.extract_feed`` function now writes files concurrently to improve performance.


0.11.0 (2018-08-01)
-------------------

* Fix major performance issue related to encoding detection. Thank you to @cjer for reporting the issue and advising on a solution.


0.10.0 (2018-04-30)
-------------------

* Improved handling of non-standards-compliant file encodings
* Only require functools32 for Python < 3
* ``ptg.parsers.parse_date`` no longer accepts dates, only strings


0.9.0 (2018-03-24)
------------------

* Improves read time for large feeds by adding LRU caching to ``ptg.parsers.parse_time``.
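
Caching pays off when the same time strings recur many times in a feed. Below is a minimal sketch of the idea with ``functools.lru_cache``. This ``parse_time`` is a stand-in written for illustration, not the function in ``ptg.parsers``: it assumes ``HH:MM:SS`` strings and returns seconds since the start of the service day.

.. code:: python

    from functools import lru_cache


    @lru_cache(maxsize=None)
    def parse_time(value):
        """Convert an HH:MM:SS string to seconds; hours may exceed 24 in GTFS."""
        hours, minutes, seconds = value.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


    times = ["08:15:00", "08:15:00", "25:30:00"]
    parsed = [parse_time(t) for t in times]

    # The repeated string is answered from the cache instead of being re-parsed.
    info = parse_time.cache_info()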


0.8.0 (2018-03-14)
------------------

* Gracefully handle completely empty files. This change unifies the behavior of reading from a CSV with a header only (no data rows) and a completely empty (zero bytes) file in the zip.


0.7.0 (2018-03-09)
------------------

* Fix handling of nested folders and zips containing nested folders.
* Add ``ptg.get_filtered_feed`` for multi-file filtering.


0.6.1 (2018-02-24)
------------------

* Fix bug in ``ptg.read_service_ids_by_date``. Reported by @cjer in #27.


0.6.0 (2018-02-21)
------------------

* Published package no longer includes unnecessary fixtures to reduce the size.
* Naively write a feed object to a zip file with ``ptg.write_feed_dangerously``.
* Read the earliest, busiest date and its ``service_id``'s from a feed with ``ptg.read_busiest_date``.
* Bug fix: Handle ``calendar.txt``/``calendar_dates.txt`` entries without applicable trips.


0.6.0.dev1 (2018-01-23)
-----------------------

* Add support for reading files from a folder. Thanks again @danielsclint!


0.5.0 (2017-12-22)
------------------

* Easily build a representative view of a zip with ``ptg.get_representative_feed``. Inspired by `peartree <https://github.com/kuanb/peartree/blob/3bfc3f49ae6986d6020913b63c8ee32582b3dcc3/peartree/paths.py#L26>`_.
* Extract out GTFS zips by agency_id/route_id with ``ptg.extract_{agencies,routes}``.
* Read arbitrary files from a zip with ``feed.get('myfile.txt')``.
* Remove ``service_ids_by_date``, ``dates_by_service_ids``, and ``trip_counts_by_date`` from the feed class. Instead use ``ptg.{read_service_ids_by_date,read_dates_by_service_ids,read_trip_counts_by_date}``.


0.4.0 (2017-12-10)
------------------

* Add support for Python 2.7. Thanks @danielsclint!


0.3.0 (2017-10-12)
------------------

* Fix service date resolution for raw_feed. Previously raw_feed considered all days of the week from calendar.txt to be active regardless of 0/1 value.


0.2.0 (2017-09-30)
------------------

* Add missing edge from fare_rules.txt to routes.txt in default dependency graph.


0.1.0 (2017-09-23)
------------------

* First release on PyPI.
