BaseDataSource

class fourinsight.engineroom.utils.BaseDataSource(index_converter, index_sync=False, tolerance=None, cache=None, cache_size=None)

Abstract class for data sources.

Parameters:

index_converter (obj) – Index converter (see Notes).
index_sync (bool, optional) – Whether the index should be synced. If True, a valid tolerance must be given.
tolerance – Tolerance limit for syncing. Should be given as any type that index_converter.to_universal_delta() can parse. If index_sync is set to True, datapoints that are closer than the tolerance are merged so that they share a common index. The common index will be the first index of the neighboring datapoints.
cache (str, optional) – Cache folder. If None (default), caching is disabled.
cache_size – Cache size as an index partition (see Notes).
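
The merging rule described for tolerance can be illustrated with a small pandas sketch. This is not the library's implementation: the helper name sync_index, the use of plain integer indices, and the choice to measure distance from the first index of each group are all assumptions made for illustration.

```python
import pandas as pd


def sync_index(series_list, tolerance):
    """Hypothetical helper: merge index values closer than `tolerance`."""
    # Sorted union of all index values across the input series.
    union = sorted({idx for s in series_list for idx in s.index})

    # Map every index value to the first value of its group of neighbors.
    mapping = {}
    anchor = None
    for value in union:
        if anchor is None or value - anchor >= tolerance:
            anchor = value  # start a new group at this value
        mapping[value] = anchor

    # Relabel each series, then combine them column-wise on the shared index.
    relabeled = [s.rename(index=mapping) for s in series_list]
    return pd.concat(relabeled, axis=1)


a = pd.Series([1.0, 2.0], index=[0, 10], name="a")
b = pd.Series([3.0, 4.0], index=[1, 11], name="b")

# Index values 1 and 11 are within the tolerance of 0 and 10, so they are
# relabeled to 0 and 10 and both columns end up on a common index.
synced = sync_index([a, b], tolerance=2)
print(synced)
```

The sketch assumes that no single series has two datapoints inside the same tolerance window; handling that case would need an extra aggregation step.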

Notes

The index_converter is used to convert index values to a universal type. For datetime-like indices, use a DatetimeIndexConverter. For integer-like indices, use an IntegerIndexConverter. Other index converters can be set up by inheriting from BaseIndexConverter.
Caching speeds up data retrieval when the same data is requested multiple times. The first time some data is retrieved from the source, it is split into ‘chunks’ and stored in a local folder. The data is then more readily available the next time it is requested.

The cache_size determines how the data is partitioned into chunks. It describes the size of each cache chunk as an index span. The cache_size should be given as a type that index_converter.to_universal_delta() can parse.
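
One way to picture the partitioning is the sketch below, which splits a requested index range into fixed-size spans. It is an illustration under stated assumptions, not the library's caching code: the function name chunk_edges, the reference parameter, and the alignment of chunks to multiples of cache_size are hypothetical, and integer indices are used for simplicity.

```python
def chunk_edges(start, end, cache_size, reference=0):
    """Hypothetical helper: list the chunk spans that cover [start, end)."""
    # Align the first chunk to a multiple of cache_size counted from the
    # reference, so that repeated requests map onto the same chunks.
    first = reference + ((start - reference) // cache_size) * cache_size

    edges = []
    chunk_start = first
    while chunk_start < end:
        edges.append((chunk_start, chunk_start + cache_size))
        chunk_start += cache_size
    return edges


spans = chunk_edges(start=25, end=70, cache_size=30)
print(spans)  # [(0, 30), (30, 60), (60, 90)]
```

With aligned chunks, a later request for the range 40 to 65 would touch only the spans (30, 60) and (60, 90), both of which the first request already covered.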

get(start, end, refresh_cache=False)

Get data from source.

Parameters:

start – Start index of the data. Will be passed on to the _get() method.
end – End index of the data. Will be passed on to the _get() method.
refresh_cache (bool, optional) – Refresh cache data.

Returns:

Source data.

Return type:

pandas.DataFrame
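
Since start and end are passed on to _get(), a concrete data source mainly needs to supply that method. The sketch below mimics the delegation with a simplified stand-in class; it does not import or reproduce BaseDataSource, and the names SketchSource and InMemorySource are hypothetical. Caching and index syncing are left out.

```python
from abc import ABC, abstractmethod

import pandas as pd


class SketchSource(ABC):
    """Simplified stand-in for an abstract data source (not the library class)."""

    def get(self, start, end):
        # The public method forwards the index bounds to the subclass hook.
        return self._get(start, end)

    @abstractmethod
    def _get(self, start, end):
        """Return the data between start and end as a DataFrame."""


class InMemorySource(SketchSource):
    """Hypothetical source that serves data from a DataFrame held in memory."""

    def __init__(self, frame):
        self._frame = frame

    def _get(self, start, end):
        # Label-based slicing with .loc includes both endpoints; whether the
        # real library includes the end index is not assumed here.
        return self._frame.loc[start:end]


frame = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=[0, 10, 20, 30])
source = InMemorySource(frame)
data = source.get(10, 20)
print(data)
```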

iter(start, end, index_mode='start', refresh_cache=False)

Iterate over source data as (index, data) pairs.

Parameters:

start (array-like) – Sequence of start indexes.
end (array-like) – Sequence of end indexes.
index_mode (str, optional) – How to index/label the data. Must be ‘start’, ‘end’ or ‘mid’. If ‘start’, start is used as index. If ‘end’, end is used as index. If ‘mid’, the index is set to start + (end - start) / 2.0. In that case, the start and end objects must be of types that support this operation.

Yields:

index (label) – The index/label.
data (pandas.DataFrame) – The source data.
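
The labeling rules for index_mode can be sketched as a plain generator. This is a minimal illustration rather than the library's code: the function iter_labeled and the get callable passed to it are hypothetical stand-ins, and integer indices are assumed.

```python
import pandas as pd


def iter_labeled(get, start, end, index_mode="start"):
    """Hypothetical helper: yield (label, data) pairs for each start/end pair."""
    for start_i, end_i in zip(start, end):
        if index_mode == "start":
            label = start_i
        elif index_mode == "end":
            label = end_i
        elif index_mode == "mid":
            # Requires types that support subtraction, division and addition.
            label = start_i + (end_i - start_i) / 2.0
        else:
            raise ValueError("index_mode must be 'start', 'end' or 'mid'")
        yield label, get(start_i, end_i)


frame = pd.DataFrame({"a": range(6)}, index=range(6))


def get(start, end):
    # Stand-in for a data source lookup; .loc includes both endpoints.
    return frame.loc[start:end]


pairs = list(iter_labeled(get, [0, 3], [2, 5], index_mode="mid"))
labels = [label for label, _ in pairs]
print(labels)  # [1.0, 4.0]
```

With ‘mid’, integer bounds produce float labels because of the division by 2.0.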