Metadata-Version: 2.1
Name: cs.timeseries
Version: 20230217
Summary: Efficient portable machine native columnar file storage of time series data for double float and signed 64-bit integers.
Home-page: https://bitbucket.org/cameron_simpson/css/commits/all
Author: Cameron Simpson
Author-email: Cameron Simpson <cs@cskk.id.au>
License: GNU General Public License v3 or later (GPLv3+)
Project-URL: URL, https://bitbucket.org/cameron_simpson/css/commits/all
Keywords: python3
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Description-Content-Type: text/markdown
Provides-Extra: pandas

Efficient portable machine native columnar file storage of time
series data for double float and signed 64-bit integers.

*Latest release 20230217*:
* TimeSeriesFile.save_to: use atomic_filename() to create the updated file.
* Other small fixes and updates.

The core purpose is to provide time series data storage; there
are assorted convenience methods to export arbitrary subsets
of the data for use by other libraries in common forms, such
as dataframes or series, numpy arrays and simple lists.
There are also some simple plot methods for plotting graphs.

Three levels of storage are defined here:
- `TimeSeriesFile`: a single file containing a binary list of
  float64 or signed int64 values
- `TimeSeriesPartitioned`: a directory containing multiple
  `TimeSeriesFile` files, each covering a separate time span
  according to a supplied policy, for example a calendar month
- `TimeSeriesDataDir`: a directory containing multiple
  `TimeSeriesPartitioned` subdirectories, each for a different
  time series, for example one subdirectory for grid voltage
  and another for grid power

Together these provide a hierarchy for finite sized files storing
unbounded time series data for multiple parameters.

On a personal basis, I use this as efficient storage of time
series data from my solar inverter, which reports in a slightly
clunky time limited CSV format; I import those CSVs into
time series data directories which contain the overall accrued
data; see my `cs.splink` module which is built on this module.

## Function `array_byteswapped(ary)`

Context manager to byteswap the `array.array` `ary` temporarily.
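
As an illustration of the idea only, here is a minimal sketch of a temporary byteswap using the standard library's `array.array.byteswap()`. The name `byteswapped` is hypothetical and this is not the module's own implementation.

```python
from array import array
from contextlib import contextmanager

@contextmanager
def byteswapped(ary):
    """Hypothetical helper: byteswap `ary` in place, restore it on exit."""
    ary.byteswap()
    try:
        yield ary
    finally:
        # Swapping a second time restores the original byte order.
        ary.byteswap()

values = array('q', [1, 2, 3])
with byteswapped(values) as swapped:
    inside = list(swapped)   # the values as seen while swapped
after = list(values)         # the original values again
```

The `try`/`finally` ensures the array is restored even if the body raises.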

## Class `ArrowBasedTimespanPolicy(TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

A `TimespanPolicy` based on an Arrow format string.

See the `raw_edges` method for the specifics of how these are defined.

## Function `as_datetime64s(times, unit='s', utcoffset=0)`

Return a Numpy array of `datetime64` values
computed from an iterable of `int`/`float` UNIX timestamp values.

The optional `unit` parameter (default `'s'`) may be one of:
- `'s'`: seconds
- `'ms'`: milliseconds
- `'us'`: microseconds
- `'ns'`: nanoseconds
and represents the precision to preserve in the source time
when converting to a `datetime64`.
Less precision gives greater time range.
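
The range versus precision trade-off follows from `datetime64` storing a signed 64-bit count of `unit` ticks since the UNIX epoch. The following rough sketch estimates the reachable span for each unit in pure Python; the helper name `approx_range_years` is made up for this example.

```python
# Approximate span either side of 1970 for a signed 64-bit tick count.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600
TICKS_PER_SECOND = {'s': 1, 'ms': 10**3, 'us': 10**6, 'ns': 10**9}

def approx_range_years(unit):
    """Hypothetical helper: years representable either side of the epoch."""
    return 2**63 / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR

for unit in TICKS_PER_SECOND:
    print(f"{unit:>2}: about +/- {approx_range_years(unit):.3g} years")
```

Nanosecond precision covers only a few centuries, while second precision covers far more than any practical data set needs.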

## Function `datetime64_as_timestamp(dt64: numpy.datetime64)`

Return the UNIX timestamp for the `datetime64` value `dt64`.

## Function `deduce_type_bigendianness(typecode: str) -> bool`

Deduce the native endianness for `typecode`,
an array/struct typecode character.
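
One way such a deduction could be made with the standard `struct` module is to compare the native packing of a value with its explicit big endian packing. This is a sketch under that assumption, with the hypothetical name `native_is_bigendian`, not necessarily how the module does it.

```python
import struct
import sys

def native_is_bigendian(typecode):
    """Hypothetical helper: True if native data for `typecode` is big endian."""
    # '=' packs in native byte order, '>' in big endian order.
    # The value 1 packs asymmetrically for both 'q' and 'd'.
    return struct.pack('=' + typecode, 1) == struct.pack('>' + typecode, 1)

is_big_q = native_is_bigendian('q')
is_big_d = native_is_bigendian('d')
```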

## Class `Epoch(Epoch, builtins.tuple, TimeStepsMixin, cs.deco.Promotable)`

The basis of time references with a starting UNIX time `start`
and a `step` defining the width of a time slot.
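
The arithmetic implied by a `start` and a `step` can be sketched as follows. `SlotEpoch` and its method names are invented for this illustration and are not the `Epoch` API.

```python
from collections import namedtuple

class SlotEpoch(namedtuple('SlotEpoch', 'start step')):
    """Hypothetical stand-in for an epoch: a start time and a slot width."""

    def offset(self, when):
        """Return the index of the time slot containing `when`."""
        return int((when - self.start) // self.step)

    def when(self, offset):
        """Return the UNIX time at which slot `offset` commences."""
        return self.start + offset * self.step

    def round_down(self, when):
        """Return the start time of the slot containing `when`."""
        return self.when(self.offset(when))

epoch = SlotEpoch(start=1000, step=60)
slot_index = epoch.offset(1130)      # 130 seconds in, 60 second slots
slot_start = epoch.round_down(1130)
```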

## Function `get_default_timezone_name()`

Return the default timezone name.

## Class `HasEpochMixin(TimeStepsMixin)`

A `TimeStepsMixin` with `.start` and `.step` derived from `self.epoch`.

## Function `main(argv=None)`

Run the command line tool for `TimeSeries` data.

## Function `plot_events(start, stop, events, value_func, *, utcoffset, figure=None, ax=None, **scatter_kw) -> matplotlib.axes._axes.Axes`

Plot `events`, an iterable of objects with `.unixtime`
attributes such as an `SQLTagSet`.
Return the `Axes` on which the plot was made.

Parameters:
* `events`: an iterable of objects with `.unixtime` attributes
* `value_func`: a callable to compute the y-axis value from an event
* `start`: optional start UNIX time, used to crop the events plotted
* `stop`: optional stop UNIX time, used to crop the events plotted
* `figure`,`ax`: optional arguments as for `cs.mplutils.axes`
* `utcoffset`: optional UTC offset for presentation
Other keyword parameters are passed to `Axes.scatter`.

## Class `PlotSeries(PlotSeries, builtins.tuple, cs.deco.Promotable)`

Information about a series to be plotted:
- `label`: the label for this series
- `series`: a series

- `extra`: a `dict` of extra information such as plot styling

## Class `TimePartition(TimePartition, builtins.tuple, TimeStepsMixin)`

A `namedtuple` for a slice of time with the following attributes:
* `epoch`: the reference `Epoch`
* `name`: the name for this slice
* `start_offset`: the epoch offset of the start time
* `end_offset`: the epoch offset of the end time

These are used by `TimespanPolicy` instances to express the partitions
into which they divide time.

## Function `timerange(*da, **dkw)`

A decorator intended for plotting functions or methods which
presents optional `start` and `stop` leading positional
parameters and optional `tz` or `utcoffset` keyword parameters.
The decorated function will be called with leading `start`
and `stop` positional parameters and a specific `utcoffset`
keyword parameter.

The as-decorated function is called with the following parameters:
* `start`: an optional UNIX timestamp positional for the
  start of the range; if omitted the default is `self.start`;
  this is a required parameter if the decorator has `needs_start=True`
* `stop`: an optional UNIX timestamp positional parameter for the end
  of the range; if omitted the default is `self.stop`;
  this is a required parameter if the decorator has `needs_stop=True`
* `tz`: optional timezone `datetime.tzinfo` object or
  specification as for `tzfor()`;
  this is used to infer a UTC offset in seconds
* `utcoffset`: an optional offset from UTC time in seconds
Other parameters are passed through to the decorated function.

A decorated *method* is then called as:

    method(self, start, stop, *a, utcoffset=utcoffset, **kw)

where `*a` and `**kw` are the additional positional and keyword
parameters respectively, if any.

A decorated *function* is called as:

    function(start, stop, *a, utcoffset=utcoffset, **kw)

The `utcoffset` is an offset to apply to UTC-based time data
for _presentation_ on the graph, largely because the plotting
functions use `DataFrame.plot` which broadly ignores attempts
to set locators or formatters because it supplies its own.
The plotting function would shift the values of the `DataFrame`
index using this value.

If neither `utcoffset` nor `tz` is supplied by the caller, the
`utcoffset` is `0.0`.
A specified `utcoffset` is passed through.
A `tz` is promoted to a `tzinfo` instance via the `tzfor()`
function and applied to the `stop` timestamp to obtain a
`datetime` from which the `utcoffset` will be derived.
It is an error to specify both `utcoffset` and `tz`.
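
The derivation of a `utcoffset` from a `tz` can be sketched with the standard `datetime` module. The helper name `utcoffset_for` is hypothetical, and a fixed-offset `timezone` stands in for whatever `tzfor()` would return.

```python
from datetime import datetime, timedelta, timezone

def utcoffset_for(stop, tz):
    """Hypothetical helper: the UTC offset in seconds of `tz` at UNIX time `stop`."""
    dt = datetime.fromtimestamp(stop, tz)
    return dt.utcoffset().total_seconds()

# A fixed UTC+10 zone, used here only to keep the example self-contained.
fixed_tz = timezone(timedelta(hours=10))
offset = utcoffset_for(1676592000, fixed_tz)

# A plotting function could then shift UTC-based times for presentation.
presented_stop = 1676592000 + offset
```

Evaluating the offset at a specific timestamp matters for zones with daylight saving, where the offset varies through the year.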

## Class `TimeSeries(cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin)`

Common base class of any time series.

## Function `timeseries_from_path(tspath: str, epoch: Optional[cs.timeseries.Epoch] = None, typecode=None)`

Turn a time series filesystem path into a time series:
* a file: a `TimeSeriesFile`
* a directory holding `.csts` files: a `TimeSeriesPartitioned`
* a directory: a `TimeSeriesDataDir`
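
The three cases above can be illustrated with a small path classifier. This sketch only names the class each path would map to; the function `classify_tspath` is hypothetical and it ignores cases the real function may handle, such as paths which do not exist yet.

```python
import os
import tempfile

def classify_tspath(tspath):
    """Hypothetical helper: name the time series class suggested by `tspath`."""
    if os.path.isfile(tspath):
        return 'TimeSeriesFile'
    if os.path.isdir(tspath):
        # A directory directly holding .csts files is a partitioned series.
        if any(name.endswith('.csts') for name in os.listdir(tspath)):
            return 'TimeSeriesPartitioned'
        return 'TimeSeriesDataDir'
    raise ValueError(f"unhandled path: {tspath!r}")

with tempfile.TemporaryDirectory() as tmpdir:
    file_path = os.path.join(tmpdir, 'voltage.csts')
    open(file_path, 'wb').close()
    subdir = os.path.join(tmpdir, 'empty')
    os.mkdir(subdir)
    kind_of_file = classify_tspath(file_path)
    kind_of_partitioned = classify_tspath(tmpdir)
    kind_of_datadir = classify_tspath(subdir)
```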

## Class `TimeSeriesBaseCommand(cs.cmdutils.BaseCommand)`

Abstract base class for command line interfaces to `TimeSeries` data files.

Command line usage:

    Usage: timeseriesbase subcommand [...]
      Subcommands:
        fetch ...
          Fetch raw data files from the primary source to a local spool.
          To be implemented in subclasses.
        help [-l] [subcommand-names...]
          Print the full help for the named subcommands,
          or for all subcommands if no names are specified.
          -l  Long help even if no subcommand-names provided.
        import ...
          Import data into the time series.
          To be implemented in subclasses.
        info
          Report information.
        plot [-f] [-o imgpath.png] [--show] [--tz tzspec] start-time [stop-time] [{glob|fields}...]
          Plot the data from specified fields for the specified time range.
          Options:
            --bare          Strip axes and padding from the plot.
            -f              Force. -o will overwrite an existing image file.
            -o imgpath.png  File system path to which to save the plot.
            --show          Show the image in the GUI.
            --tz tzspec     Skew the UTC times presented on the graph
                            to emulate the timezone specified by tzspec.
                            The default skew is 0 i.e. UTC.
            --stacked       Stack the plot lines/areas.
            start-time      An integer number of days before the current time
                            or any datetime specification recognised by
                            dateutil.parser.parse.
            stop-time       Optional stop time, default now.
                            An integer number of days before the current time
                            or any datetime specification recognised by
                            dateutil.parser.parse.
            glob|fields     If glob is supplied, constrain the keys of
                            a TimeSeriesDataDir by the glob.

## Class `TimeSeriesCommand(TimeSeriesBaseCommand, cs.cmdutils.BaseCommand)`

Command line interface to `TimeSeries` data files.

Command line usage:

    Usage: timeseries [-s ts-step] tspath subcommand...
        -s ts-step  Specify the UNIX time step for the time series,
                    used if the time series is new and checked otherwise.
        tspath      The filesystem path to the time series;
                    this may refer to a single .csts TimeSeriesFile, a
                    TimeSeriesPartitioned directory of such files, or
                    a TimeSeriesDataDir containing partitions for
                    multiple keys.
      Subcommands:
        dump
          Dump the contents of tspath.
        fetch ...
          Fetch raw data files from the primary source to a local spool.
          To be implemented in subclasses.
        help [-l] [subcommand-names...]
          Print the full help for the named subcommands,
          or for all subcommands if no names are specified.
          -l  Long help even if no subcommand-names provided.
        import csvpath datecol[:conv] [import_columns...]
          Import data into the time series.
          csvpath   The CSV file to import.
          datecol[:conv]
                    Specify the timestamp column and optional
                    conversion function.
                    "datecol" can be either the column header name
                    or a numeric column index counting from 0.
                    If "conv" is omitted, the column should contain
                    a UNIX seconds timestamp.  Otherwise "conv"
                    should be either an identifier naming one of
                    the known conversion functions or an "arrow.get"
                    compatible time format string.
          import_columns
                    An optional list of column names or their derived
                    attribute names. The default is to import every
                    numeric column except for the datecol.
        info
          Report information about the time series stored at tspath.
        plot [-f] [-o imgpath.png] [--show] [--tz tzspec] start-time [stop-time] [{glob|fields}...]
          Plot the data from specified fields for the specified time range.
          Options:
            --bare          Strip axes and padding from the plot.
            -f              Force. -o will overwrite an existing image file.
            -o imgpath.png  File system path to which to save the plot.
            --show          Show the image in the GUI.
            --tz tzspec     Skew the UTC times presented on the graph
                            to emulate the timezone specified by tzspec.
                            The default skew is 0 i.e. UTC.
            --stacked       Stack the plot lines/areas.
            start-time      An integer number of days before the current time
                            or any datetime specification recognised by
                            dateutil.parser.parse.
            stop-time       Optional stop time, default now.
                            An integer number of days before the current time
                            or any datetime specification recognised by
                            dateutil.parser.parse.
            glob|fields     If glob is supplied, constrain the keys of
                            a TimeSeriesDataDir by the glob.
        test [testnames...]
          Run some tests of functionality.

## Class `TimeSeriesDataDir(TimeSeriesMapping, builtins.dict, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, cs.fs.HasFSPath, cs.configutils.HasConfigIni, HasEpochMixin, TimeStepsMixin)`

A directory containing a collection of `TimeSeriesPartitioned` subdirectories.

## Class `TimeSeriesFile(TimeSeries, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin, cs.fs.HasFSPath)`

A file containing a single time series for a single data field.

This provides easy access to a time series data file.
The instance can be indexed by UNIX time stamp for time based access
or its `.array` property can be accessed for the raw data.

The data file itself has a header indicating the file data big endianness,
the datum type and the time type (both `array.array` type codes).
Following these are the start and step sizes in the time type format.
This is automatically honoured on load and save.

A new file will use the native endianness, but files of other
endianness are correctly handled, making a `TimeSeriesFile`
portable between architectures.

Read only users can just instantiate an instance and access
its `.array` property, or use the `peek` and `peek_offset` methods.

Read/write users should use the instance as a context manager,
which will automatically update the file with the array data
on exit:

    with TimeSeriesFile(fspath) as ts:
        ... work with ts here ...

Note that the save-on-close is done with `TimeSeries.flush()`
which only saves if `self.modified`.
Use of the `__setitem__` or `pad_to` methods sets this flag automatically.
Direct access via the `.array` will not set it,
so users working that way for performance should update the flag themselves.

A `TimeSeriesFile` has two underlying modes of operation:
in-memory `array.array` mode and direct-to-file `mmap` mode.

The in-memory mode reads the whole file into an `array.array` instance,
and all updates then modify the in-memory `array`.
The file is saved when the context manager exits or when `.save()` is called.
This maximises efficiency when many accesses are done.

The `mmap` mode maps the file into memory, and accesses operate
directly against the file contents.
This is more efficient for just a few accesses,
but every "write" access (setting a datum) will make the mmapped page dirty,
causing the OS to queue it for disc.
This mode is recommended for small accesses
such as updating a single datum, eg from polling a data source.

Presently the mode used is triggered by the access method.
Using the `peek` and `poke` methods uses `mmap` by default.
Other accesses default to use the in-memory mode.
Access to the `.array` property forces use of the `array` mode.
Poll/update operations should usually choose to use `peek`/`poke`.
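
To illustrate the difference between the two modes, here is a rough standard library sketch, not the class's own code. It assumes the data values follow the 24 byte header directly, uses a placeholder header of zero bytes, and the names `poke_datum` and `load_array` are hypothetical.

```python
import mmap
import os
import struct
import tempfile
from array import array

HEADER_SIZE = 24               # header length as documented for the file format
DATUM = struct.Struct('=d')    # one native order double float, 8 bytes

def poke_datum(path, index, value):
    """mmap style: map the file and update one datum in place."""
    with open(path, 'r+b') as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            DATUM.pack_into(mm, HEADER_SIZE + index * DATUM.size, value)

def load_array(path):
    """array style: read every datum after the header into memory."""
    ary = array('d')
    with open(path, 'rb') as f:
        f.seek(HEADER_SIZE)
        ary.frombytes(f.read())
    return ary

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.bin')
    with open(path, 'wb') as f:
        f.write(bytes(HEADER_SIZE))   # placeholder, not a real header
        f.write(array('d', [0.0, 0.0, 0.0]).tobytes())
    poke_datum(path, 1, 2.5)
    loaded = load_array(path)
```

The `mmap` update touches only the page holding the datum, whereas the in-memory approach reads the whole file before any access.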

*Method `TimeSeriesFile.__init__(self, fspath: str, typecode: Optional[cs.timeseries.TypeCode] = None, *, epoch: Optional[cs.timeseries.Epoch] = None, fill=None, fstags=None)`*:
Prepare a new time series stored in the file at `fspath`
containing machine native data for the time series values.

Parameters:
* `fspath`: the filename of the data file
* `typecode`: optional expected `array.typecode` value of the data;
  if specified and the data file exists, they must match;
  if not specified then the data file must exist
  and the `typecode` will be obtained from its header
* `epoch`: optional `Epoch` specifying the start time and
  step size for the time series data in the file;
  if specified and the data file exists, they must match;
  if not specified then the data file must exist
  and the `epoch` will be obtained from its header
* `fill`: optional default fill values for `pad_to`;
  if unspecified, fill with `0` for `'q'`
  and `float('nan')` for `'d'`

## Class `TimeSeriesFileHeader(cs.binary.SimpleBinary, types.SimpleNamespace, cs.binary.AbstractBinary, cs.binary.BinaryMixin, HasEpochMixin, TimeStepsMixin)`

The binary data structure of the `TimeSeriesFile` file header.

This is 24 bytes long and consists of:
* the 4 byte magic number, `b'csts'`
* the file bigendian marker, a `struct` byte order indicator
  with a value of `b'>'` for big endian data
  or `b'<'` for little endian data
* the datum typecode, `b'd'` for double float
  or `b'q'` for signed 64 bit integer
* the time typecode, `b'd'` for double float
  or `b'q'` for signed 64 bit integer
* a pad byte, value `b'_'`
* the start UNIX time, a double float or signed 64 bit integer
  according to the time typecode and bigendian flag
* the step size, a double float or signed 64 bit integer
  according to the time typecode and bigendian flag

In addition to the header values and methods this also presents:
* `datum_type`: a `BinarySingleStruct` for the binary form of a data value
* `time_type`:  a `BinarySingleStruct` for the binary form of a time value
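
The 24 byte layout described above can be sketched with the standard `struct` module. This is an illustration based only on that description, not the class's implementation; `pack_header` and `unpack_header` are hypothetical names.

```python
import struct

MAGIC = b'csts'
PREFIX = struct.Struct('<4scccc')   # magic, byte order marker, 2 typecodes, pad

def pack_header(bigendian, datum_typecode, time_typecode, start, step):
    """Hypothetical helper: build the 24 byte header described above."""
    marker = b'>' if bigendian else b'<'
    prefix = PREFIX.pack(
        MAGIC, marker,
        datum_typecode.encode('ascii'), time_typecode.encode('ascii'), b'_',
    )
    # start and step use the time typecode in the file's byte order.
    times = struct.pack(marker.decode('ascii') + 2 * time_typecode, start, step)
    return prefix + times

def unpack_header(bs):
    """Hypothetical helper: decode a header made by pack_header()."""
    magic, marker, datum_tc, time_tc, _pad = PREFIX.unpack(bs[:8])
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic!r}")
    time_typecode = time_tc.decode('ascii')
    start, step = struct.unpack(
        marker.decode('ascii') + 2 * time_typecode, bs[8:24]
    )
    return marker == b'>', datum_tc.decode('ascii'), time_typecode, start, step

header = pack_header(False, 'd', 'q', 1676592000, 300)
fields = unpack_header(header)
```

The 8 byte prefix plus two 8 byte time values gives the 24 byte total.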

## Class `TimeSeriesMapping(builtins.dict, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin)`

A group of named `TimeSeries` instances, indexed by a key.

This is the basis for `TimeSeriesDataDir`.

## Class `TimeSeriesPartitioned(TimeSeries, cs.resources.MultiOpenMixin, cs.context.ContextManagerMixin, HasEpochMixin, TimeStepsMixin, cs.fs.HasFSPath)`

A collection of `TimeSeries` files in a subdirectory.
We have one of these for each `TimeSeriesDataDir` key.

This class manages a collection of files
named by the partition from a `TimespanPolicy`,
which dictates which partition holds the datum for a UNIX time.

*Method `TimeSeriesPartitioned.__init__(self, dirpath: str, typecode: Optional[cs.timeseries.TypeCode] = None, *, epoch: Optional[cs.timeseries.Epoch] = None, policy, fstags: Optional[cs.fstags.FSTags] = None)`*:
Initialise the `TimeSeriesPartitioned` instance.

Parameters:
* `dirpath`: the directory filesystem path,
  known as `.fspath` within the instance
* `typecode`: the `array` type code for the data
* `epoch`: the time series `Epoch`
* `policy`: the partitioning `TimespanPolicy`

The instance requires a reference epoch
because the `policy` start times will almost always
not fall on exact multiples of `epoch.step`.
The reference allows for reliable placement of times
which fall within `epoch.step` of a partition boundary.
For example, if `epoch.start==0` and `epoch.step==6` and a
partition boundary came at `19` due to some calendar based
policy then a time of `20` would fall in the partition left
of the boundary because it belongs to the time slot commencing
at `18`.
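
The placement in this example can be checked with a few lines of arithmetic. The function names below are made up for the sketch, which assumes a time is placed according to the start of its time slot.

```python
def slot_start(when, epoch_start, epoch_step):
    """Return the start of the time slot containing `when`."""
    return epoch_start + ((when - epoch_start) // epoch_step) * epoch_step

def side_of_boundary(when, boundary, epoch_start, epoch_step):
    """Place `when` by the start of its slot rather than by `when` itself."""
    if slot_start(when, epoch_start, epoch_step) < boundary:
        return 'left'
    return 'right'

# epoch.start == 0, epoch.step == 6, partition boundary at 19
side_for_20 = side_of_boundary(20, 19, 0, 6)   # slot commences at 18
side_for_24 = side_of_boundary(24, 19, 0, 6)   # slot commences at 24
```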

If `epoch` or `typecode` are omitted the file's
fstags will be consulted for their values.
The `start` parameter will further fall back to `0`.
This class does not set these tags itself, as that would presume write
access to the parent directory or its `.fstags` file.
Instead, when a `TimeSeriesPartitioned` is made by a `TimeSeriesDataDir`
instance, the `TimeSeriesDataDir` sets these tags.

## Class `TimespanPolicy(icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

A class implementing a policy allocating times to named time spans.

The `TimeSeriesPartitioned` uses these policies
to partition data among multiple `TimeSeries` data files.

Probably the most important methods are:
* `span_for_time`: return a `TimePartition` from a UNIX time
* `span_for_name`: return a `TimePartition` from a partition name

*Method `TimespanPolicy.__init__(self, epoch: cs.timeseries.Epoch)`*:
Initialise the policy.
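
To give a feel for what allocating a time to a named span involves, here is a standalone sketch of a calendar month span using the standard `datetime` module. It assumes UTC, returns a plain tuple rather than a `TimePartition`, and `monthly_span` is a hypothetical name.

```python
from datetime import datetime, timezone

def monthly_span(when):
    """Hypothetical helper: (name, start, end) of the UTC month holding `when`."""
    dt = datetime.fromtimestamp(when, timezone.utc)
    start = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    # '%Y-%m' plays the role of a 'YYYY-MM' partition name.
    return start.strftime('%Y-%m'), start.timestamp(), end.timestamp()

name, span_start, span_end = monthly_span(1676592000)   # 2023-02-17 00:00 UTC
```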

## Class `TimespanPolicyAnnual(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

An annual time policy.
PARTITION_FORMAT = 'YYYY'
ARROW_SHIFT_PARAMS = {'years': 1}

## Class `TimespanPolicyDaily(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

A daily time policy.
PARTITION_FORMAT = 'YYYY-MM-DD'
ARROW_SHIFT_PARAMS = {'days': 1}

## Class `TimespanPolicyMonthly(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

A monthly time policy.
PARTITION_FORMAT = 'YYYY-MM'
ARROW_SHIFT_PARAMS = {'months': 1}

## Class `TimespanPolicyWeekly(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

A weekly time policy.
PARTITION_FORMAT = 'W'
ARROW_SHIFT_PARAMS = {'weeks': 1}

## Class `TimespanPolicyYearly(ArrowBasedTimespanPolicy, TimespanPolicy, icontract._metaclass.DBC, HasEpochMixin, TimeStepsMixin, cs.deco.Promotable)`

An annual time policy.
PARTITION_FORMAT = 'YYYY'
ARROW_SHIFT_PARAMS = {'years': 1}

## Class `TimeStepsMixin`

Methods for an object with `start` and `step` attributes.

## Class `TypeCode(builtins.str, cs.deco.Promotable)`

A valid `array` typecode with convenience methods.

*Method `TypeCode.__new__(cls, t)`*:
Return a new `TypeCode` instance from `t`, which may be:
* a `str`: expected to be an `array` type code
* `int`: `array` type code `q` (signed 64 bit)
* `float`: `array` type code `d` (double float)
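
The promotion rules listed above might be sketched like this. The function `promote_typecode` is hypothetical and returns a plain `str`; it assumes `int` and `float` refer to the types themselves and that only the two documented type codes are accepted.

```python
def promote_typecode(t):
    """Hypothetical helper: map `t` to an `array` type code string."""
    if t is int:
        return 'q'   # signed 64 bit integer
    if t is float:
        return 'd'   # double float
    if isinstance(t, str) and t in ('q', 'd'):
        return t
    raise ValueError(f"unsupported type code specification: {t!r}")

from_int = promote_typecode(int)
from_float = promote_typecode(float)
from_str = promote_typecode('d')
```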

## Function `tzfor(tzspec: Union[str, datetime.tzinfo, NoneType] = None) -> datetime.tzinfo`

Promote the timezone specification `tzspec` to a `tzinfo` instance.
If `tzspec` is an instance of `tzinfo` it is returned unchanged.
If `tzspec` is omitted or the string `'local'` this returns
`dateutil.tz.gettz()`, the local system timezone.
Otherwise it returns `dateutil.tz.gettz(tzspec)`.

# Release Log



*Release 20230217*:
* TimeSeriesFile.save_to: use atomic_filename() to create the updated file.
* Other small fixes and updates.

*Release 20220918*:
* TimeSeriesMapping.as_pd_dataframe: rename `keys` to `df_data`, and accept either a time series key or a `(key,series)` tuple.
* TimeSeriesMapping.as_pd_dataframe: default `key_map`: annotate columns with their original CSV headers if present.
* TimeSeriesMapping.plot: rename `keys` to `plot_data` as for `as_pd_dataframe`, add `stacked` and `kind` parameters so that we can derive `kind` from `stacked`.
* as_datetime64s: apply optional utcoffset timeshift.
* Plumb optional pad=False option through data, data2, as_pd_series.
* New PlotSeries namedtuple holding a label, a series and an extra dict as common carrier for data which will get plotted.

*Release 20220805*:
* Rename @plotrange to @timerange since it is not inherently associated with plotting, support both methods and functions.
* print_figure, save_figure and saved_figure now moved to cs.mplutils.
* plot_events: use the utcoffset parameter.
* TimeSeriesBaseCommand.cmd_plot: new --bare option for unadorned plots.

*Release 20220626*:
* New TypeCode(str) representing an array type code with associated properties and methods.
* New TimeSeriesMapping.read_csv wrapper for pandas.read_csv to import a CSV file into a TimeSeriesMapping.
* TimeSeriesFile.save,save_to: open the file for overwrite, not truncate, by default.
* TimeSeriesFile: new setitems(whens,values) method for fast batch updates.
* as_datetime64s: accept optional units parameter to trade off range versus precision.
* @plotrange: accept new optional tz/utcoffset parameters and pass the resulting utcoffset to the wrapped function along with a huge disclaimer about timezones and plots.
* New tzfor(tzspec) to return a tzinfo object from dateutil.tz.gettz, accepts 'local' for the system local default timezone.
* TimeSeriesMapping.as_pd_dataframe: accept optional utcoffset to skew the index for the DataFrame, used for time presentation in plots.
* New TimeSeriesMapping.to_csv(start,stop,f) method to write CSV data between start and stop to a file via DataFrame.to_csv.
* TimeSeriesBaseCommand: new parsetime and poptime methods, cmd_plot: update to expect start-time and optional stop-time.

*Release 20220606*:
Initial PyPI release.
