Signals to do:

* Decide on "sample" vs. "value" vs. "amplitude". I'm leaning against
  "amplitude" as too domain-specific. We often speak of sample types,
  values, and times in various contexts. I think I prefer "value axis"
  to "sample axis", and "value range", "min value", and "max value"
  to the corresponding "sample" terms. We speak of "real-valued" and
  "complex-valued" signals but not "real-sampled" and "complex-sampled"
  ones. We use the terms "sample rate" and "sample period" but not
  "value rate" and "value period". A "sample" has a "value", but not
  vice versa. A "sample" has various attributes, including a "value",
  a "time", a "channel", a "sample frame", and a "signal". A "value"
  does not have those things. A "sample" is more than just a number,
  but a "value" is not. "Value" is short for "sample value". Tentative
  conclusion: use the terms "value axis", "value calibration",
  "value range", "min value", and "max value", in which "value" is
  understood to be short for "sample value".
  
  Okay, so if we have a "value axis", why do we have "sample frames"
  and "sample arrays" instead of "value frames" and "value arrays"?
  I think the answer is that it's a matter of focus. The value axis
  relates specifically to sample values but not to sample times.
  Sample frames and sample arrays are not so specifically concerned
  with sample values: they also have times associated with them, for
  example. The time associated with a sample frame or sample array
  is the same time associated with each of its samples.
  
* Consider providing at least some time axis functionality on signals
  and channels, for example so that one can say `signal.sample_rate`
  in addition to `signal.time_axis.sample_rate`. Consider also
  whether it would make sense to do something similar for the value
  axis. [I'm inclined to hold off on this until I've gotten some
  experience using these classes.]
  
* Reconsider the class name `IndexedAxis`. Would `DiscreteAxis` make
  more sense, or perhaps `IndexAxis`? What distinguishes this axis
  from `Axis`? Is it really a subclass of `Axis`, i.e. a specialization
  of `Axis`? How do sample value ranges relate to this? An `IndexedAxis`
  has a length N and in one of the coordinate systems that it maps
  between values are often (but not necessarily) integers in the range
  [0, N - 1]. It seems to me that this does relate to sample value
  ranges. Sample values for many signals are integers in some range,
  though the range usually does not start at zero. Perhaps it would
  make sense to redesign the Axis class hierarchy so its members
  accurately capture the semantics of time axes, sample array axes,
  and sample value axes. Perhaps one of the coordinate systems of
  `DiscreteAxis` would be discrete with arbitrary integer limits
  and be useful for sample values, and the the lower discrete
  limit of a subclass `IndexAxis` of `DiscreteAxis` would be zero.
  But perhaps it isn't important that sample values are often
  discrete, with integer nominal limits.
  
  Axes can have names, units, and limits. We store integer signal
  frame counts and sample array dimensions as axis lengths. We use
  axes to map between different coordinate systems, including more
  than two for the time axis. Is this too much?
  
  Perhaps we should explicitly represent coordinate systems, and
  associate units with coordinate systems rather than axes.
  

* Thoughts about axes after thinking about several specific kinds of
  signals:
  
  1. It might not make sense for the top level `Signal` class to have
     sample array and sample value axes. Needs regarding these axes vary
     sufficiently over the class of signals we want to work with that any
     accurate generalization would be trivial (like maybe just `Named`).
     I think it probably will make sense for various `Signal` subclasses
     (like, say, `Tfr`) to offer additional axes.
     
     Some specific types of signals that I found it helpful to think about:
     
     * Waveforms.
     * Spectrograms.
     * More general TFRs with either linear or nonlinear frequency axes.
     * A signal obtained from another signal by computing one or more
       time-varying feature values with a hopped window. A TFR is one
       specific example of this, as is a time-varying detector score.
       There are many other examples. The sample arrays of the resulting
       signal might have any number of dimensions. Note that the units
       of the sample array values might vary within a dimension, for
       example if you compute a vector of values of disparate features,
       so that there is no single kind of unit for that dimension. The
       value ranges of different features might also differ.
     * Videos. Video sample array axes often have names, like "X" and "Y"
       or "Horizontal" and "Vertical", but (as with many sample value axes)
       we might not be able to relate these exactly to real world values
       like horizontal and vertical angles.
     
  2. It does make sense for `Signal` to offer time axis functionality.
     I'm inclined to keep this functionality inside a `time_axis` property
     in anticipation of having additional axes in subclasses. That makes
     the time axis functionality more cumbersome to access  (e.g.
     `signal.time_axis.frame_rate` instead of `signal.frame_rate`, but
     I think that dumping it all into the top level would be confusing.
     In any case, I'd like to start with things this way and see how it
     feels.


TimeAxis:

Linear or quasilinear time axis. Axis values are numbers of seconds elapsed
from time at index zero. A `TimeAxis` may be *datetime calibrated*, in which
case the date and time at each index is known.

a.name                     # 'Time'

a.length                   # signal length in sample frames

a.start_index
a.end_index

a.frame_rate               # signal frame rate, in hertz
a.frame_period             # signal frame period, in seconds

a.index_to_time_map
a.index_to_time(i)         # `i` in [0, length], scalar or array, int or float
a.time_to_index(t)         # `t` can be scalar or array. Result is float
                           # Maybe offer rounded int result as an option?
                           
a.start_time
a.end_time
      
a.get_span(i, j)           # `index_to_time[j] - index_to_time[i]`
a.span                     # `get_span(0, length - 1)`, or None if length zero
a.duration                 # `get_span(0, length)`, or zero if length zero

a.index_to_datetime_map
a.index_to_datetime(f)     # `f` can be scalar or array, int or float.
a.datetime_to_index(t)     # `t` can be scalar or array. Result is float.

a.start_datetime           # `datetime` at start index, `None` if unknown
a.end_datetime             # `datetime` at end index, `None` if unknown


2020-04-16 Notes:

Signal
------

[Add intro that motivates the signal abstraction. Why not just use
NumPy arrays?]

[Define *sample frame time*, *sample array time*, and *sample time*.]

[Discuss use of the terms *sample*, *sample time*, and *sample value*.]

[Describe time calibration.]

[Describe value calibration.]

A Vesper *signal* is an D-dimensional array of numeric *samples*, along
with some associated metadata. D is always at least two, with one of the
first two dimensions time (since Vesper is designed primarily for use
with audio and video data) and the other the signal *channel*.

The above definition leaves room for two distinct but useful
interpretations. The two interpretations differ according to whether the
time dimension or the channel dimension of a signal is thought of as
coming first.

In one interpretation, called the *frame interpretation*, the time
dimension of a signal precedes the channel dimension, and the signal is
thought of as a temporal sequence of *sample frames*. Each sample frame
comprises M *sample arrays* of dimension D - 2, where M is the number
of signal channels.

In the other interpretation, called the *channel interpretation*, the
channel dimension of a signal precedes the time dimension, and a
signal is thought of as a sequence of channels. Each channel is a
temporal sequence of N sample arrays of dimension D - 2, where N is the
number of signal sample frames.

In the two interpretations of a signal, the sample arrays of the signal
are the same, but they are grouped differently. In the frame
interpretation, they are grouped first by channel into frames and
then by time. In the channel interpretation, they are grouped first
by time into channels and then by channel.

Discussions of signals naturally make use of terms corresponding to
both signal interpretations, and we have tried to choose terms that
will help clarify which interpretation is in use at any given point
in a discussion. More generally, we have tried to choose terms that
respect those used in the signal processing literature, but also
the differences in the terminological needs of that literature and
signal processing software.

Many signals, such as monaural audio signals, have just one channel,
while others, such as stereo audio signals, have two or more channels.
Every Vesper signal has channels, however, even if the number of
channels is only one.

All of the samples of a Vesper signal are of the same numeric
*sample type*. A signal's sample type can be any of various sizes of
integer, floating point, or complex number types.

The temporal spacing between two consecutive sample frames of a signal
is constant (ideally, at least), and is called the *sample period* or
the *frame period* of the signal, whichever makes more sense in a
particular context. The inverse of this spacing is called the
*sample rate* or the *frame rate*.

The number of dimensions in the sample arrays of a signal determine
different categories of signals, some of which are common enough to
merit their own names. A signal with scalar (i.e. zero-dimensional)
sample arrays is called a *waveform*. The most common type of waveform
is an audio waveform, which is also called just an *audio* (we use
the word "audio" as both a countable noun and an uncountable one, by
analogy with the accepted use of the word "video"). In an audio, each
sample indicates the degree of disturbance to an acoustic medium.
There are non-audio waveforms as well, such as ones whose samples are
time-averaged powers or event detector scores.

A signal with vector (i.e. one-dimensional) sample arrays is called
a *gram*. The most common type of gram is a *spectrogram*, in which
each sample vector is a spectrum computed from a vector of waveform
samples. Other types of grams can be computed using other
time-frequency representations. Still another type of gram is called
a *beamogram*. A beamogram is computed from a multichannel audio
recorded from a microphone array. In the beamogram, the sequence of
samples at a particular sample vector index indicate the acoustic
power arriving from a particular direction at the microphone array.
Different sample vector indices correspond to different directions
of arrival.

A signal with two-dimensional sample arrays is called a *video*. The
most common type of video is just a sequence of images recorded by a
video camera. Another type of video might be computed by a locator
algorithm that processes a multichannel audio recorded from a
microphone array. Each sample array in such a video might attribute
acoustic power arriving at the microphone array to locations on a
two-dimensional spatial grid.


2019-11-12 Notes:

AudioFileInfo:

    format_description
    path
    length
    num_channels
    sample_rate
    sample_size
    
    
AudioFileReader(AudioFileInfo):

    static methods
    --------------
    is_recognized_file(path)
    create_reader(path)
    
    instance methods
    ----------------
    read(length=None, start_index=None, out=None)
    close()

* I think it might be best to rename `Signal` to `Channel` and
`MultichannelSignal` to `Signal`, so that a signal always has one
or more channels. This is inconsistent with how the term "signal" is
usually used in the field of signal processing (what is usually
called a "signal" there I propose calling a "channel", while
what I propose calling a "signal" is usually called a "signal array"
or something like that), but I believe it better represents the
reality of actual computational signal processing, which frequently
involves signals with multiple channels, e.g. stereo audio signals.

* I think it would be best to prefer the term "sample" to "sample value"
or "value". So, for example, we should prefer to talk about the
"sample range" of a signal rather than its "value range". I think I
would like to retain the term "amplitude axis", however, rather than
using "sample axis", since it is primarily for what is better called
"amplitude calibration" than "sample calibration".

* I think it would be best not to use the terms "array signal" and
"array channel", since "array" in those terms would refer to something
different (a NumPy array containing all of the samples of a signal)
than "array" in terms like "sample array" and "array axis" (an array
of samples at a particular time). Maybe "RAM signal" and
"RAM channel" instead?

* A signal has an optional sample range (minSample, maxSample) that
indicates the range in which its samples may be expected to fall.
The range may not be tight, and in some cases samples may even lie
outside of it: the range provides a guide to, but not necessarily a
guarantee of, the range of samples of the signal.

* The sample range for an audio file signal is determined by the
file's sample type. For example, the sample range for an audio file
with 16-bit, two's complement samples is (-32768, 32767).

* A signal with a sample range can be wrapped using the `ScaledSignal`
class to create a scaled version of that signal. The scaled signal has
all of the same properties as the wrapped signal except for its sample
range and perhaps dtype. The `ScaledSignal` scales samples on the fly
as they are read. For example, in:

    signal = AudioFileSignal('test.wav')
    scaled_signal = ScaledSignal(
        signal, sample_range=(-1, 1), dtype='float32'))
        
`signal` has a sample range determined by the format of the audio file
*test.wav*, while `scaled_signal` is a scaled version of `signal` with
a sample range of [-1, 1] and a float32 dtype.

* Since it will be very common to want to normalize the samples of
audio file signals, it might be a good idea to build sample scaling
into `AudioFileSignal` and `AudioFileWriter`. For example, instead
of the above we might write just:

    scaled_signal = AudioFileSignal(
        'test.wav', sample_range=(-1, 1), dtype='float32')
        
Scaling might happen by default when writing a signal to an audio file
(or an audio file sequence), according to the sample ranges of the signal
and the audio file format. If so it should also be possible to disable
the scaling, say via a `sample_scaling_enabled` keyword argument that is
`True` by default.

Something similar would presumably also be desirable for real-time
audio I/O.

* Signal sample types we will support and the corresponding sample ranges
are:
    8-bit unsigned integer: (0, 255)
    8-bit integer: (-128, 127)
    16-bit integer: (-32768, 32767)
    24-bit integer: (-(2**23), 2**23 - 1)
    32-bit float: (-1, 1) when read from an audio file with normalization
    64-bit float: (-1, 1) when read from an audio file with normalization
    
General points about signals:

* One important design goal is for `Signal` and `Channel` instances to
behave as much as possible like NumPy arrays. This will allow users
already familiar with NumPy arrays to use `Signal` and `Channel` more
easily. So, for example, `Signal` and `Channel` instances will have
shapes and lengths just like NumPy arrays, and indexing them will
behave just like indexing a NumPy array of the same shape.

* With the preceding point in mind, `Signal` and `Channel` instances
will support negative array indices, with such an index indexing an
instance relative to its end. We will *not* allow the use of negative
indices to index a `Signal` or `Channel` left of the origin. Such
indexing is common in the field of signal processing, but Vesper
signal index ranges will all start at zero.

* I'm a little uncomfortable with the idea that the length of a
signal is the number of its channels, while the length of a channel
is the number of its sample arrays. I think of signals both as
sequences of sample frames and as sets of channels. But the proposed
length semantics *are* consistent with the idea of making signals and
channels behave as much like NumPy arrays as possible. Perhaps `Signal`
and `Channel` should both have `num_frames` properties, and `Signal`
should have a `num_channels` property, and their use should be
encouraged for clarity.

* How might we support multichannel signals where the channels are
recorded by different instruments, perhaps with different start times
and sample rates?

* Can we support signals whose sample rates differ a little from their
nominal sample rates, and whose sample rates vary a little, for example
with temperature?


Signal

* Sequence of *sample frames*.
* Indexed like a NumPy array with frame, channel, and zero or more
  sample array indices, in that order.
* Often useful to think in terms of channels rather than sample frames,
  with each channel a sequence of sample arrays.
* `channel_first` property is indexed like a `Signal` object but with
  the first two indices swapped, so that the channel index precedes the
  frame index. That is, it is indexed like a NumPy array with channel,
  frame, and zero or more sample array indices, in that order.
* `channels` property is a sequence of `Channel` objects that can be indexed to
  retrieve sample arrays of individual channels. A `Channel` object is
  indexed like a `Signal` object except without a channel index. That
  is, it is indexed like a NumPy array with a frame index followed by
  zero or more sample array indices.
* Implement only immutable signals initially, then perhaps extensible
  and editable signals.

s.name

s.time_axis                  # `TimeAxis`

s.frame_count                # number of signal sample frames
len(s)                       # same as `frame_count`
s.channel_count              # number of signal channels
s.array_shape                # sample array shape
s.dtype                      # NumPy sample `dtype`

s[:]                         # all samples of all sample frames
s[10]                        # all samples of sample frame 10
s[10:20]                     # all samples of sample frames 10 through 19

s.channel_first              # indexed channel first to yield NumPy arrays

s.channels                   # `NamedSequence` of `Channel` objects


s.channel_first[:]           # all samples of all channels
s.channel_first[0]           # all samples of channel 0
s.channel_first[0, 10]       # sample array 10 of channel 0
s.channel_first[0, 10:20]    # sample arrays 10 through 19 of channel 0
s.channel_first[:, 10]       # sample frame 10
s.channel_first[:, 10:20]    # sample frames 10 through 19

len(s.channel_first)         # same as `channel_count`


for channel in audio.channels:
    samples = channel[...]
    
cf = audio.channel_first


audio = AudioFileUtils.openAudioFileSequence(file_paths)
spectrogram = Spectrogram(audio, spectrogram_settings)
graph = _create_averaging_graph(spectrogram, averaging_settings)
graph.run()

 

Channel

* Sequence of n-dimensional sample arrays.
* All sample arrays of a channel have the same dimensions.
* Indexing yields NumPy array of samples.
* First index is for time axis.
* Other indices are for sample array axes.

s.name                     # `str` or `None`

s.signal                   # `Signal` or `None`

s.time_axis                # `TimeAxis`
s.array_axes               # `NamedSequence` of `ArrayAxis` objects
s.amplitude_axis              # `SampleAxis`
s.axes                     # mapping from axis name to axis object

s.dtype                    # NumPy sample `dtype`

s.shape                    # signal shape, an integer tuple
len(s)                     # time axis length

s[:]                       # all signal samples
s[0]                       # sample array 0
s[1]                       # sample array 1
s[-1]                      # final sample array
s[0:10]                    # sample arrays 0 through 9
s[0, 10:20]                # part of sample array 0
s[0, 10:-20]               # part of sample array 0


Axis

a.name                     # e.g. 'Time', 'Frequency'
a.units                    # `Bunch` with `plural`, `singular`, and
                           # `abbreviation` attributes


IndexedAxis(Axis)

The range of valid indices for an `IndexedAxis` is [0, length - 1].

a.length                   # axis length in indices


TimeAxis(IndexedAxis)

Axis values have units of seconds. A `TimeAxis` may be *date/time calibrated*,
in which case the date and time of each sample is known. Note that we must
be careful when computing spans that we do not assume that the index to
seconds mapping is linear!

a.frame_rate               # sample frame rate, in hertz
a.frame_period             # sample frame period, in seconds

a.sample_rate              # same as `frame_rate`
a.sample_period            # same as `frame_period`

a.index_to_seconds_mapping
a.index_to_seconds(i)      # `i` in [0, length], scalar or array, int or float
a.seconds_to_index(t)      # `t` can be scalar or array. Result is float
                           # Maybe offer rounded int result as an option?
a.get_span(i, j)           # `index_to_seconds[j] - index_to_seconds[i]`
a.span                     # `get_span(0, length - 1)`, or zero if length zero
a.duration                 # `get_span(0, length)`

a.index_to_datetime_mapping
a.index_to_datetime(i)     # `i` can be scalar or array, int or float.
a.datetime_to_index(dt)    # `dt` can be scalar or array. Result is float.

a.start_datetime           # `datetime` at start index, `None` if unknown
a.end_datetime             # `datetime` at end index, `None` if unknown


ArrayAxis(IndexedAxis)

a.index_to_value_mapping
a.index_to_value(i)        # `i` can be scalar or array, int or float
a.value_to_index(v)        # `v` can b scalar or array. Result is float.

a.start_value              # value at index zero
a.end_value                # value at index length - 1, `None` if length zero
a.span                     # end value less start value, `None` if length zero


ValueAxis(Axis)

For now, `ValueAxis` is just a subclass of `Axis` with a specialized
initializer and no other methods. It may eventually provide calibration
functionality. Note that different channels will need to have different
calibrations.


* How do we support mutable signals? The two types of mutable signals
  I can think of are *extensible* signals and *editable* signals. The
  available extent of an extensible signal can change, but the sample
  values (i.e. the values of the samples of any particular sample array)
  of the signal never change. Both the available extent and the sample
  values of an editable signal can change via cut and paste operations.
  [Why have two classes of mutable signals? Why not have just editable
  signals (though we might just call them mutable), which subsume
  extensible signals?]
  
  Mutability complicates signal processing. I have in mind implementing
  signal processing lazily inside various `Signal` subclasses. For
  example, I would like to implement a `Spectrogram` subclass, an
  instance of which computes a spectrogram of a wrapped waveform.
  To support mutability, I have in mind making `Signal` instances
  observable. Then a `Signal` that wraps another signal can observe
  that signal, so that whenever the latter changes the former can
  notify its observers of how it, in turn, has changed. The
  notifications would include an indication of where along the time
  axis the changes occurred.
  
  [Some signal processors might have two or more output signals.
  For example, a spectrograph might offer complex, magnitude, and
  phase output signals. In this case the spectrograph would not
  itself be a `Signal` or a `MultichannelSignal`, but would be
  some other type of object (a `SignalProcessor`, perhaps) that
  would offer several output signals, and that might tailor its
  computation to which of those outputs are being observed. If
  only the magnitude output of a spectrograph is being observed,
  for example, it might use the SciPy `spectrogram` function
  differently (in particular, to compute only the spectrogram
  magnitude) than it would if its complex output was being
  observed.
  
  We will also have to concern ourselves with thread synchronization
  for mutable signals. We might just use locks for this, perhaps in
  conjunction with Python's `with` statement.
  
  It seems to me that it might be good for the signal classes to
  explicitly acknowledge the difference between real-time signals,
  for which change is steady and predictable and places certain
  constraints on processing, and editable signals, for which change
  is less predictable but doesn't constrain processing as heavily.
  The two cases are really quite different.

  
Audio File Utils
  
u.get_audio_file_type(file_path)
u.create_audio_file_ram_signal(file_)
u.create_audio_file_signal(file_)
  