BaseDataSource
- class fourinsight.engineroom.utils.BaseDataSource(index_converter, index_sync=False, tolerance=None, cache=None, cache_size=None)
Abstract class for data sources.
- Parameters:
index_converter (obj) – Index converter (see Notes).
index_sync (bool, optional) – Whether the index should be synced. If True, a valid tolerance must be given.
tolerance – Tolerance limit for syncing. Must be of a type that index_converter.to_universal_delta() can parse. If index_sync is set to True, datapoints that are closer than the tolerance are merged so that they share a common index. The common index is the first index of the neighboring datapoints.
cache (str, optional) – Cache folder. If None (default), caching is disabled.
cache_size – Cache size as an index partition (see Notes).
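The syncing rule for tolerance can be illustrated with a minimal sketch on plain integer indexes. The helper name sync_index is hypothetical and not part of fourinsight.engineroom. It assumes that "closer than the tolerance" is measured between consecutive sorted datapoints; the library's actual grouping rule may differ.

```python
def sync_index(index, tolerance):
    """Hypothetical helper: map each index value to a shared, synced index.

    Assumption: a datapoint joins the current group when it is closer than
    `tolerance` to its preceding neighbour (in sorted order). Every member
    of a group gets the first index of that group.
    """
    values = sorted(index)
    if not values:
        return []
    synced = []
    group_start = values[0]
    previous = values[0]
    for value in values:
        if value - previous >= tolerance:
            # Gap is not smaller than the tolerance: start a new group.
            group_start = value
        synced.append(group_start)
        previous = value
    return synced
```

For example, sync_index([0, 1, 5, 6, 7, 20], tolerance=2) returns [0, 0, 5, 5, 5, 20]. The output follows sorted order of the input.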
Notes
The index_converter is used to convert index values to a universal type. For datetime-like indices, use a DatetimeIndexConverter. For integer-like indices, use an IntegerIndexConverter. Other index converters can be set up by inheriting from BaseIndexConverter.
Caching speeds up data downloading when the same data is requested multiple times. The first time some data is retrieved from the source, it is split into ‘chunks’ and stored in a local folder. The data is then more readily available the next time it is requested.
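To illustrate what "converting to a universal type" can mean, here is a self-contained stand-in for a datetime-like index converter. It does not inherit from BaseIndexConverter so that it runs without the library, the class name is hypothetical, and the method name to_universal_index is an assumption (only to_universal_delta() is named on this page). It is not the library's DatetimeIndexConverter.

```python
from datetime import datetime, timedelta, timezone


class HypotheticalDatetimeConverter:
    """Stand-in converter: normalizes datetime-like values to UTC datetimes."""

    def to_universal_index(self, value):
        # Accept ISO 8601 strings or datetime objects.
        if isinstance(value, str):
            value = datetime.fromisoformat(value)
        if value.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)

    def to_universal_delta(self, value):
        # Accept timedelta objects, or a number of seconds.
        if isinstance(value, timedelta):
            return value
        return timedelta(seconds=float(value))
```

With such a converter, a tolerance or cache_size of 60 and of timedelta(minutes=1) mean the same index span.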
The cache_size determines how the data is partitioned into chunks. It describes the size of each cache chunk by providing an index span. The cache_size should be given as a type that index_converter.to_universal_delta() can parse.
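The chunked caching described in the Notes can be sketched with an in-memory dictionary in place of the local cache folder. This is a toy model, not the library's implementation: the class ChunkedCache and the function source are hypothetical, chunks are assumed to be aligned to multiples of cache_size, and the requested span is assumed to be half-open, [start, end).

```python
class ChunkedCache:
    """Toy in-memory stand-in for a chunked on-disk cache (hypothetical)."""

    def __init__(self, fetch, cache_size):
        self._fetch = fetch            # callable(start, end) -> list of (index, value)
        self._cache_size = cache_size  # index span covered by each chunk
        self._chunks = {}              # chunk start -> cached records
        self.fetch_count = 0           # number of calls made to the source

    def _chunk_starts(self, start, end):
        # Assumption: chunk boundaries are multiples of cache_size.
        chunk_start = (start // self._cache_size) * self._cache_size
        while chunk_start < end:
            yield chunk_start
            chunk_start += self._cache_size

    def get(self, start, end, refresh_cache=False):
        records = []
        for chunk_start in self._chunk_starts(start, end):
            if refresh_cache or chunk_start not in self._chunks:
                # Cache miss (or forced refresh): download the whole chunk.
                chunk_end = chunk_start + self._cache_size
                self._chunks[chunk_start] = self._fetch(chunk_start, chunk_end)
                self.fetch_count += 1
            records.extend(self._chunks[chunk_start])
        # Trim the chunk data back to the requested span.
        return [(i, v) for i, v in records if start <= i < end]


def source(start, end):
    # Pretend data source: one record per integer index.
    return [(i, i * 10) for i in range(start, end)]
```

Requesting the span 25 to 130 with cache_size=50 downloads three chunks (0, 50 and 100). A later request for 30 to 60 is served from the cache without touching the source, unless refresh_cache=True.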
- get(start, end, refresh_cache=False)
Get data from source.
- Parameters:
start – Start index of the data. Will be passed on to the _get() method.
end – End index of the data. Will be passed on to the _get() method.
refresh_cache (bool, optional) – Whether to refresh the cached data.
- Returns:
Source data.
- Return type:
pandas.DataFrame
- iter(start, end, index_mode='start', refresh_cache=False)
Iterate over source data as (index, data) pairs.
- Parameters:
start (array-like) – Sequence of start indexes.
end (array-like) – Sequence of end indexes.
index_mode (str, optional) – How to index/label the data. Must be ‘start’, ‘end’ or ‘mid’. If ‘start’, start is used as index. If ‘end’, end is used as index. If ‘mid’, the index is set to start + (end - start) / 2.0; in that case, start and end must be of a type that supports this operation.
- Yields:
index (label) – The index/label.
data (pandas.DataFrame) – The source data.
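The labeling rule for index_mode, and the (index, data) pairing, can be sketched as follows. The helpers label_for and iter_pairs are hypothetical stand-ins, not the library's code; get is any callable taking (start, end), such as a data source's get method. The length check is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def label_for(start, end, index_mode="start"):
    """Return the label for one (start, end) pair, following index_mode."""
    if index_mode == "start":
        return start
    if index_mode == "end":
        return end
    if index_mode == "mid":
        # Only works for types supporting start + (end - start) / 2.0,
        # e.g. numbers, or datetime (datetime - datetime -> timedelta).
        return start + (end - start) / 2.0
    raise ValueError("index_mode must be 'start', 'end' or 'mid'")


def iter_pairs(get, start, end, index_mode="start"):
    """Yield (label, data) pairs, calling get(start_i, end_i) for each pair."""
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")
    for start_i, end_i in zip(start, end):
        yield label_for(start_i, end_i, index_mode), get(start_i, end_i)
```

With datetimes, label_for(datetime(2020, 1, 1), datetime(2020, 1, 1, 2), "mid") gives datetime(2020, 1, 1, 1). Plain strings would fail in ‘mid’ mode, which is why start and end must support the arithmetic.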
See also
fourinsight.engineroom.utils.iter_index – Convenience functions for generating ‘start’ and ‘end’ index lists.
- abstract property labels
Data source labels.
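Because BaseDataSource is abstract, a concrete source supplies _get() (which receives start and end from get()) and the labels property. The sketch below mimics that pattern with the standard abc module so it runs without the library. SketchDataSource and CounterSource are hypothetical names, caching and syncing are left out, and _get() returns a dict instead of a pandas.DataFrame. A real subclass would presumably also call BaseDataSource.__init__ with an index_converter.

```python
from abc import ABC, abstractmethod


class SketchDataSource(ABC):
    """Stand-in for BaseDataSource: get() delegates to the abstract _get()."""

    def get(self, start, end, refresh_cache=False):
        # Caching and index syncing are omitted in this sketch,
        # so refresh_cache has no effect here.
        return self._get(start, end)

    @abstractmethod
    def _get(self, start, end):
        """Fetch data for the span from the underlying source."""

    @property
    @abstractmethod
    def labels(self):
        """Data source labels."""


class CounterSource(SketchDataSource):
    """Hypothetical concrete source returning one column of integers."""

    def _get(self, start, end):
        return {"counter": list(range(start, end))}

    @property
    def labels(self):
        return ("counter",)
```

Instantiating SketchDataSource directly raises TypeError, while CounterSource().get(2, 5) returns {"counter": [2, 3, 4]}.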