ResultCollector
- class fourinsight.engineroom.utils.ResultCollector(headers, handler=None, indexing_mode='auto')
Collect and store indexed results.
This class provides a simple interface to collect, store, and index intermediate results. The results are stored internally in a pandas.DataFrame. Using a handler, the results can be pushed to or pulled from a remote source.
- Parameters:
headers (dict) – Header names and data types as key/value pairs; int, float, and str are allowed as data types. The collector will only accept intermediate results defined here.
handler (object) – Handler extended from BaseHandler. Default handler is NullHandler, which does not provide any push or pull functionality.
indexing_mode (str) – Indexing mode. Should be ‘auto’ (default) or ‘timestamp’.
Notes
The data types are cast to their pandas equivalents, so that missing values are handled correctly.
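The effect of this note can be illustrated with plain pandas. The sketch below does not use ResultCollector; it only shows why a nullable pandas dtype matters for missing values. The dtype names "Int64" and "string" are assumptions about which pandas equivalents are meant, since this page does not list them.

```python
import pandas as pd

# With default inference, an integer column that contains a gap becomes float.
plain = pd.Series([1, None])
print(plain.dtype)  # float64

# A nullable extension dtype (assumed here) keeps the integers and marks the gap.
nullable = pd.Series([1, None], dtype="Int64")
print(nullable.dtype)  # Int64
print(nullable.isna().tolist())  # [False, True]

# Text columns can hold missing values in the same way.
text = pd.Series(["a", None], dtype="string")
print(text.isna().tolist())  # [False, True]
```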
- append(dataframe)
Append rows of dataframe to the results.
Columns of dataframe must be in the headers.
- Parameters:
dataframe (pandas.DataFrame) – The results to append.
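The column requirement can be sketched with plain pandas. This is an illustration of the rule stated above, not the library's implementation; the variable names, the KeyError, and the use of ignore_index (mimicking the ‘auto’ indexing mode) are assumptions.

```python
import pandas as pd

headers = ["a", "b"]  # hypothetical headers of a collector
results = pd.DataFrame({"a": [1.0], "b": ["x"]})
new_rows = pd.DataFrame({"a": [2.0]})  # only a subset of the headers

# Reject columns that are not among the headers.
unknown = set(new_rows.columns) - set(headers)
if unknown:
    raise KeyError(f"Columns not in headers: {sorted(unknown)}")

# Append the rows; headers missing from new_rows are left empty.
combined = pd.concat([results, new_rows], ignore_index=True)
print(combined["a"].tolist())  # [1.0, 2.0]
print(combined["b"].isna().tolist())  # [False, True]
```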
- collect(**results)
Collect and store results under the current index.
- Parameters:
results (keyword arguments) – The results are passed as keyword arguments, where each keyword must be one of the headers. Provided values must be of the correct data type (as defined during instantiation).
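The two checks described for the keyword arguments can be sketched in plain Python. The helper check_results below is hypothetical and stands in for whatever validation the collector performs; the exception types and the strict isinstance comparison are assumptions, not documented behavior.

```python
def check_results(headers, **results):
    """Hypothetical check: keywords must be headers, values must match the type."""
    for name, value in results.items():
        if name not in headers:
            raise KeyError(f"'{name}' is not one of the headers")
        if not isinstance(value, headers[name]):
            raise ValueError(f"'{name}' must be of type {headers[name].__name__}")
    return results


headers = {"a": float, "b": str}

# Both keywords are headers and both values have the declared type.
accepted = check_results(headers, a=1.5, b="ok")
print(accepted)  # {'a': 1.5, 'b': 'ok'}
```

Under these assumptions, check_results(headers, c=1.0) would raise a KeyError and check_results(headers, a="text") would raise a ValueError.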
- property dataframe
Return a (deep) copy of the internal dataframe.
- delete_rows(index)
Delete rows.
The index will be reset if ‘indexing_mode’ is set to ‘auto’.
- Parameters:
index (single label or list-like) – Index labels to drop.
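In plain pandas terms, deleting rows and resetting the index might look like the sketch below. It assumes that "reset" means renumbering the remaining rows from zero; the library's internals are not shown on this page.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})  # default index 0, 1, 2

# Drop the row labelled 1, then renumber the remaining rows from zero.
df = df.drop(index=1).reset_index(drop=True)

print(df.index.tolist())  # [0, 1]
print(df["a"].tolist())  # [1.0, 3.0]
```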
- new_row(index=None)
Make a new row.
- Parameters:
index (None or datetime-like) – The new index value. If indexing_mode is set to ‘auto’, index should be None. If indexing_mode is set to ‘timestamp’, index should be a unique datetime that is passed on to pandas.to_datetime().
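Because the index value is passed on to pandas.to_datetime() in ‘timestamp’ mode, it helps to see what that function accepts. The sketch below uses only pandas and the standard library; the example dates are arbitrary.

```python
from datetime import datetime

import pandas as pd

# Strings and datetime objects are both converted to pandas.Timestamp.
from_string = pd.to_datetime("2020-01-01 12:00")
from_datetime = pd.to_datetime(datetime(2020, 1, 1, 12, 0))
print(from_string == from_datetime)  # True

# Index values should be unique; pandas can report this for a set of timestamps.
index = pd.to_datetime(["2020-01-01 12:00", "2020-01-01 13:00"])
print(index.is_unique)  # True
```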
- pull(raise_on_missing=True, strict=True)
Pull results from source. Remote source overwrites existing values.
- Parameters:
raise_on_missing (bool) – Raise an exception if results cannot be pulled from the source.
strict (bool) – Whether to be strict with respect to headers in the source or not. Setting strict=True (default) requires that the source has exactly the same headers as the ResultCollector. Setting strict=False allows pulling of partial results (i.e., headers that do not have results in the source will be populated with None values).
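The difference between the two strict settings can be sketched with plain pandas. This is an illustration of the rule described above, not the library's code; the variable names are hypothetical, and pandas shows the empty entries as NaN, which may differ from how the collector represents them.

```python
import pandas as pd

headers = ["a", "b"]  # hypothetical headers of the collector
source = pd.DataFrame({"a": [1.0, 2.0]})  # the source lacks column "b"

# Strict: the source must carry exactly the same headers.
is_exact_match = set(source.columns) == set(headers)
print(is_exact_match)  # False

# Non-strict: keep what is available and leave the remaining headers empty.
partial = source.reindex(columns=headers)
print(partial["a"].tolist())  # [1.0, 2.0]
print(partial["b"].isna().all())  # True
```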