(Looked around and couldn't find this issue, so I'm posting here)
The current dataframe client allows tags to be specified in its `write_points` method through the `tags` keyword argument. However, it seems that the current `tags` keyword argument applies the tags globally to each datapoint. It would be useful to allow certain columns to be treated as tags in the `write_points` method through a `tag_columns` keyword argument. In other words, columns in the `tag_columns` list would be treated as tags, while all other columns are treated as fields.
One big advantage of this approach is that it makes the dataframe client's read and write methods symmetrical. The dataframe client's `query` method currently returns dataframes with tags included as columns. Allowing tag columns to be specified in the `write_points` method would make it easy to copy one database to another by calling the query and write methods in succession.
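For illustration, such a copy might look roughly like this (a sketch only: the connection details, measurement, and tag names are made up, and `tag_columns` is the keyword argument proposed here, not an existing one):

```python
from influxdb import DataFrameClient

# Hypothetical connection details and database names
source = DataFrameClient('localhost', 8086, 'root', 'root', 'source_db')
target = DataFrameClient('localhost', 8086, 'root', 'root', 'target_db')

# query() returns DataFrames keyed by measurement, with tags as ordinary columns
df = source.query('SELECT * FROM "cpu"')['cpu']

# With the proposed tag_columns argument, those columns could be written back
# out as tags instead of fields, making the round trip symmetrical.
target.write_points(df, 'cpu', tag_columns=['host', 'region'])
```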
Fortunately, the fix for this is super easy. All that would need to be done is to make the following changes to `_convert_dataframe_to_json`:
```python
def _convert_dataframe_to_json(self, dataframe, measurement, tags=None,
                               tag_columns=None, time_precision=None):
    if not isinstance(dataframe, pd.DataFrame):
        raise TypeError('Must be DataFrame, but type was: {0}.'
                        .format(type(dataframe)))
    if not (isinstance(dataframe.index, pd.tseries.period.PeriodIndex) or
            isinstance(dataframe.index, pd.tseries.index.DatetimeIndex)):
        raise TypeError('Must be DataFrame with DatetimeIndex or '
                        'PeriodIndex.')

    # Make sure tags and tag columns are correctly typed
    tag_columns = tag_columns if tag_columns else []
    tags = tags if tags else {}
    # Assume field columns are all columns not included in tag columns
    rec_columns = list(set(dataframe.columns).difference(set(tag_columns)))

    dataframe.index = dataframe.index.to_datetime()
    if dataframe.index.tzinfo is None:
        dataframe.index = dataframe.index.tz_localize('UTC')

    # Convert column to strings
    dataframe.columns = dataframe.columns.astype('str')
    # Convert dtype for json serialization
    dataframe = dataframe.astype('object')

    precision_factor = {
        "n": 1,
        "u": 1e3,
        "ms": 1e6,
        "s": 1e9,
        "m": 1e9 * 60,
        "h": 1e9 * 3600,
    }.get(time_precision, 1)

    points = [
        {'measurement': measurement,
         'tags': dict(list(tag.items()) + list(tags.items())),
         'fields': rec,
         'time': int(ts.value / precision_factor)}
        for ts, tag, rec in zip(dataframe.index,
                                dataframe[tag_columns].to_dict('record'),
                                dataframe[rec_columns].to_dict('record'))
    ]
    return points
```
This implementation allows columns specified in the `tag_columns` keyword argument to be treated as tags instead of fields. It also maintains backwards compatibility with the previous `tags` behavior (applying `tags` globally to each row of the dataframe).
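Here's an example usage (reconstructed as a sketch; the DataFrame contents, connection details, and measurement name are invented for illustration, and `tag_columns` is the proposed keyword):

```python
import pandas as pd
from influxdb import DataFrameClient

df = pd.DataFrame({'host': ['server01', 'server02'],   # treated as a tag
                   'region': ['us-west', 'us-east'],   # treated as a tag
                   'value': [0.64, 0.83]},             # remains a field
                  index=pd.date_range('2016-01-01', periods=2, freq='H'))

client = DataFrameClient('localhost', 8086, 'root', 'root', 'example_db')

# 'host' and 'region' become per-point tags; the global tags dict still
# applies to every point, preserving the old behavior.
client.write_points(df, 'cpu_load', tag_columns=['host', 'region'],
                    tags={'datacenter': 'dc1'})
```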
That said, the current dataframe write operation is probably not as efficient as it could be: the dataframe is first converted to a JSON-like dict and then written to line protocol. Converting the dataframe to line protocol directly would avoid a lot of the overhead associated with iterating through dicts. I'll open a separate issue on this.

I can start a PR if anyone is interested.
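To give a rough idea of what a direct conversion could look like (not the client's actual implementation; the helper name is invented, and escaping, quoting, and type formatting are omitted):

```python
import pandas as pd

def dataframe_to_line_protocol(df, measurement, tag_columns=None):
    """Rough, vectorized sketch: build line protocol strings with pandas
    string operations instead of converting every row to a dict first."""
    tag_columns = tag_columns or []
    field_columns = [c for c in df.columns if c not in tag_columns]

    # ",tag1=v1,tag2=v2" per row (empty string when there are no tag columns)
    tags = pd.Series('', index=df.index)
    for col in tag_columns:
        tags = tags + ',' + col + '=' + df[col].astype(str)

    # "field1=v1,field2=v2" per row
    fields = pd.Series('', index=df.index)
    for i, col in enumerate(field_columns):
        sep = '' if i == 0 else ','
        fields = fields + sep + col + '=' + df[col].astype(str)

    # Timestamps in nanoseconds since the epoch
    timestamps = pd.Series(df.index.astype('int64').astype(str), index=df.index)

    return (measurement + tags + ' ' + fields + ' ' + timestamps).str.cat(sep='\n')
```

Whether this actually beats the dict-based path would of course need benchmarking.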
It seems that if `tag_columns` is not specified, it fails:
File "C:\Python27\lib\site-packages\influxdb_dataframe_client.py", line 125, in write_points
field_columns=field_columns)
File "C:\Python27\lib\site-packages\influxdb_dataframe_client.py", line 220, in _convert_dataframe_to_json
dataframe[field_columns].to_dict('record'))
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1958, in getitem
return self._getitem_array(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2002, in _getitem_array
indexer = self.loc._convert_to_indexer(key, axis=1)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1231, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: '[0] not in index'
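For what it's worth, the report above appears to correspond to a call along these lines (a hypothetical reproduction; the database, measurement, and DataFrame are made up):

```python
import pandas as pd
from influxdb import DataFrameClient

df = pd.DataFrame({'value': [1.0, 2.0]},
                  index=pd.date_range('2016-01-01', periods=2, freq='H'))

client = DataFrameClient('localhost', 8086, 'root', 'root', 'example_db')

# No tag_columns/field_columns passed; per the traceback this ends up
# indexing the DataFrame with an unexpected key and raises
# KeyError: '[0] not in index' inside _convert_dataframe_to_json.
client.write_points(df, 'demo_measurement')
```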