A formula extension for tshistory, providing computed series.
`doc/readme: use the modern api`
`operators/time-shifted: do not crash when there is no keyword parameter`
`tests/operator: exhibit issue with `time-shifted``

tip
browse log
0.11.0
browse .tar.gz

### clone

https://hg.sr.ht/~pythonian/tshistory_formula
ssh://hg@hg.sr.ht/~pythonian/tshistory_formula

## #Purpose

This tshistory component provides a formula language to build computed series.

Formulas are defined using a simple lisp-like syntax, using a pre-defined function library.

Formulas are read-only series (you can't `update` or `replace` values).

They also have an history, which is built, time stamps wise, using the union of all constituent time stamps, and value wise, by applying the formula.

Because of this the `staircase` operator is available on formulae. Some `staircase` operations can have a very fast implementation if the formula obeys commutativity rules.

## #Formula

### #General Syntax

Formulas are expressed in a lisp-like syntax using `operators`, positional (mandatory) parameters and keyword (optional) parameters.

The general form is:

`(<operator> <param1> ... <paramN> #:<keyword1> <value1> ... #:<keywordN> <valueN>)`

Here are a couple examples:

• `(add (series "wallonie") (series "bruxelles") (series "flandres"))`

Here we see the two fundamental `add` and `series` operators at work.

This would form a new synthetic series out of three base series (which can be either raw series or formulas themselves).

Some notes:

• operator names can contain dashes or arbitrary caracters

• literal values can be: `3` (integer), `5.2` (float), `"hello"` (string) and `#t` or `#f` (true ot false)

### #Pre-defined operators

#### #*

Performs a scalar product on a series.

Example: `(* -1 (series "positive-things"))`

#### #+

Add a constant quantity to a series.

Example: `(+ 42 (series "i-feel-undervalued"))`

#### #/

Perform a scalar division between numbers or a series and a scalar.

Example: `(/ (series "div-me") (/ 3 2))`

Linear combination of two or more series. Takes a variable number of series as input.

Example: `(add (series "wallonie") (series "bruxelles") (series "flandres"))`

To specify the behaviour of the `add` operation in the face of missing data, the series can be built with the `fill` keyword. This option is only really applied when several series are combined. By default, if an input series has missing values for a given time stamp, the resulting series has no value for this timestamp (unless a fill rule is provided).

#### #clip

Set an upper/lower threashold for a series. Takes a series as positional parameter and accepts two optional keywords `min` and `max` which must be numbers (integers or floats).

Example: `(clip (series "must-be-positive") #:min 0)`

#### #date

Produces an utc timestamp from its input string date in iso format.

The `tz` keyword allows to specify an alternate time zone. The `naive` keyword forces production of a naive timestamp. Both `tz` and `naive` keywords are mutually exlcusive.

#### #div

Element wise division of two series.

Example: `(div (series "\$-to-€") (series "€-to-£"))`

#### #min

Computes the row-wise minimum of its input series.

Example: `(min (series "station0") (series "station1") (series "station2"))`

#### #max

Computes the row-wise maximum of its input series.

Example: `(max (series "station0") (series "station1") (series "station2"))`

#### #mul

Element wise multiplication of series. Takes a variable number of series as input.

Example: `(mul (series "banana-spot-price (\$)") (series "\$-to-€" #:fill 'ffill'))`

This might convert a series priced in dollars to a series priced in euros, using a currency exchange rate series with a forward-fill option.

#### #naive

Allow demoting a series from a tz-aware index (strongly recommended) to a tz-naive index (unfortunately sometimes unavoidable for interop with other tz-naive series).

One must provide a country code and a target timezone.

Example: `(naive (series "tz-aware-series-from-poland") "PL" "Europe/Warsaw")`

#### #priority

The priority operator combines its input series as layers. For each timestamp in the union of all series time stamps, the value comes from the first series that provides a value.

Example: `(priority (series "realized") (series "nominated") (series "forecasted"))`

Here `realized` values show up first, and any missing values come from `nominated` first and then only from `forecasted`.

#### #resample

Resamples its input series using `freq` and the aggregation method `method` (as described in the pandas documentation).

Example: `(resample (series "hourly") "D")`

#### #row-mean

This operator computes the row-wise mean of its input series using the series `weight` option if present. The missing points are handled as if the whole series were absent.

Example: `(row-mean (series "station0") (series "station1" #:weight 2) (series "station2"))`

Weights are provided as a keyword to `series`. No weight is interpreted as 1.

#### #series

The `series` operator accepts several keywords:

• `fill` to specify a filling policy to avoid `nans` when the series will be `add`ed with others; accepted values are `"ffill"` (forward-fill), `"bfill"` (backward-fill) or any floating value.

For instance in `(add (series "a" #:fill 0) (series "b")` will make sure that series `a`, if shorter than series `b` will get zeroes instead of nans where `b` provides values.

#### #slice

This allows cutting a series at date points. It takes one positional parameter (the series) and two optional keywords `fromdate` and `todate` which must be strings in the iso8601 format.

Example: `(slice (series "cut-me") #:fromdate "2018-01-01")`

#### #std

Computes the standard deviation over its input series.

Example: `(std (series "station0") (series "station1") (series "station2"))`

#### #timedelta

Takes a timestamp and a number of years, months, weekds, days, hours, minutes (int) and computes a new date according to the asked delta elements.

Example: `(timedelta (date "2020-1-1") #:weeks 1 #:hours 2)`

#### #today

Produces a timezone-aware timestamp as of today

The `tz` keyword allows to specify an alternate time zone. The `naive` keyword forces production of a naive timestamp. Both `tz` and `naive` keywords are mutually exlcusive.

Example: `(today)`

## #Registering new operators

This is a fundamental need. Operators are fixed python functions exposed through a lispy syntax. Applications need a variety of fancy operators.

### #declaring a new operator

One just needs to decorate a python with the `func` decorator:

```from tshistory_formula.registry import func

@func('identity')
def identity(series):
return series
```

The operator will be known to the outer world by the name given to `@func`, not the python function name (which can be arbitrary).

This is enough to get a working transformation operator. However operators built to construct series rather than just transform pre-existing series are more complicated.

### #custom series operator

We start with an example, a `shifted` operator that gets a series with shifted from_value_date/to_value_date boundaries by a constant `delta` amount.

We would use it like this: `(shifted "shiftme" #:days -1)`

As we can see the standard `series` operator won't work there, that is applying a shift operator (`(shift (series "shiftme"))`) after the call to series is too late. The from/to implicit parameters have already been handled by `series` itself and there is nothing left to shift.

Hence `shifted` must be understood as an alternative to `series` itself. Here is a possible implementation:

```from tshistory_formula.registry import func, finder

@func('shifted')
def shifted(__interpreter__, name, days=0):
args = __interpreter__.getargs.copy()
fromdate = args.get('from_value_date')
todate = args.get('to_value_date')
if fromdate:
args['from_value_date'] = fromdate + timedelta(days=days)
if todate:
args['to_value_date'] = todate + timedelta(days=days)

return __interpreter__.get(name, args)

@finder('shifted')
def find_series(cn, tsh, tree):
return {
}
```

As we can see, we use a new `finder` protocol. But first let's examine how the `shiftme` operator is implemented.

First it takes a special `__interpreter__` parameter, which will receive the formula interpreter object, providing access to an important internal API of the evaluation process.

Indeed from the interpreter we can read the `getargs` attribute, which contains a dictionary of the actual query mapping. We are specially interested in the `from_value_date` and `to_value_date` items in our example, but all the parameters of `tshistory.get` are available there.

Once we have shifted the from/to value date parameter we again use the interpreter to make a call to `get` which will in turn perform a call to the underlying `tshistory.get` (which, we don't know in advance, may yield a primary series or another formula computed series).

Implementing the operator this way, we actually miss two important pieces of information:

• the system cannot determine a series is produced by the `shifted` operator like it can with `series`

• and because of this it cannot know the technical metadata of the produced series (e.g. the `tzaware` attribute)

This is where the `finder` protocol and its decorator function comes into play. For `shifted` we define a finder. It is a function that takes the db connection (`cn`), time series protocol handler (`tsh`) and formula syntax tree (`tree`), and must return a mapping from series name to its metadata.

The tree is an obvious Python data structure representing a use of the operator in a formula.

For instance because of the `shifted` python signature, any use will be like that:

• in lisp `... (shifted "shift-me" #:hours +1) ...` (the dots indicate that it can be part of a larger formula)

• tree in python: `['shifted', "shift-me", 'hours', 1]`

The name is always in position 1 in the list. Hence the implementation of the shifted finder:

```return {
}
```

For the metadata we delegate the computation to the underlying series metadata.

We might want to provide an ad-hoc metadata dictionary if we had a proxy operator that would forward the series from an external source:

```@func('proxy')
def proxy(
__interpreter__,
series_uid: str,
default_start: date,
default_end : date) -> pd.Series:
i = __interpreter__
args = i.getargs.copy()
from_value_date = args.get('from_value_date') or default_start
to_value_date = args.get('to_value_date') or default_end

proxy = ProxyClient()
return proxy.get(
series_uid,
from_value_date,
to_value_date,
)

@finder('proxy')
def proxy(cn, tsh, tree):
return {
tree: {
'index_type': 'datetime64[ns]',
'tzaware': False,
'value_type': 'float64'
}
}
```

Here, because we have no other means to know (and the proxy provides some useful documentation), we write the metadata ourselves explicitly.

Also note how accessing the `__interpreter__` again is used to forward the query arguments.

### #Editor Infos

The `tshistory_formula` package provides a custom callback for the `editor` capabilities of tshistory_editor.

A dedicated protocol is available to inform the editor on the way to decompose/display a formula.

Example of such a function:

```from tshistory_formula.registry import editor_info

@editor_info
def operator_with_series(builder, expr):
for subexpr in expr[1:]:
with builder.series_scope(subexpr):
builder.buildinfo_expr(subexpr)
```

The exact ways to use the builder will be provided soon.

## #Series API

A few api calls are added to the `tshistory` base:

• `.register_formula` to define a formula

• `.eval_formula` to evaluate on-the-fly a formula (useful to check that it computes before registering it)

### #register_formula

Exemple:

```tsa.register_formula(
'my-sweet-formula',
'(* 3.14 (series "going-round"))'
)
```

## #eval_formula

Example:

```>>> tsa.eval_formula('(* 3.14 (series "going-round"))')
...
2020-01-01    3.14
2020-01-02    6.28
2020-01-03    9.42
dtype: float64
```

## #Command line

The `tsh` command carries formula specific subcommands. The output below shows only the specific formula subcommands:

```\$ tsh
Usage: tsh [OPTIONS] COMMAND [ARGS]...

Options:
--help  Show this message and exit.

Commands:
ingest-formulas           ingest a csv file of formulas Must be a...
```