This repository was archived by the owner on Oct 29, 2024. It is now read-only.

Filter by tags appears to be broken? #251

Closed
drmclean opened this issue Oct 7, 2015 · 9 comments

@drmclean

drmclean commented Oct 7, 2015

I've written some data points (listed at the bottom) to my instance in a measurement called 'abc':

>>> indb.write_points(data, tags = {"imei" : "test_imei"}, time_precision='ms')
True

I can get the data-points out:

>>> rs = indb.query("select * from abc")

And print them:

>>> print list(rs.get_points())
[
        {u'imei': u'test_imei', u'value': 1111, u'time': u'2015-10-06T19:50:29.007Z'}, 
        {u'imei': u'test_imei', u'value': 2222, u'time': u'2015-10-06T19:50:29.008Z'}, 
        {u'imei': u'test_imei', u'value': 3333, u'time': u'2015-10-06T19:50:29.009Z'}, 
        {u'imei': u'test_imei', u'value': 4444, u'time': u'2015-10-06T19:50:29.01Z'}
]

When I specify the tag value the result is empty:

>>> print list(rs.get_points(tags={"imei" : "test_imei"}))
[]

Data Points:

time = 1444161029007  
data = [
    { "measurement" : "abc",  "time": time, "fields" : {"value" : 1111} },
    { "measurement" : "abc",  "time": time+1, "fields" : {"value" : 2222} },
    { "measurement" : "abc",  "time": time+2, "fields" : {"value" : 3333} },
    { "measurement" : "abc",  "time": time+3, "fields" : {"value" : 4444} }
]

>>> indb.write_points(data, tags = {"imei" : "test_imei"}, time_precision='ms')
True
@drmclean
Author

drmclean commented Oct 7, 2015

Also related, the whole handling of tags doesn't appear to work correctly:

I have a measurement called 'current'.
I can find the tags inside current using:

>>> rs = indb.query("SHOW TAG KEYS FROM current")
>>> print list(rs.get_points())
[{u'tagKey': u'imei'}]

This correctly shows one tag called "imei".

I can find the tag values inside current using:

>>> rs = indb.query("SHOW TAG VALUES FROM current WITH KEY = imei")
>>> print list(rs.get_points())
[{u'imei': u'imei_001'}, {u'imei': u'imei_002'}, {u'imei': u'imei_003'}, {u'imei': u'imei_004'}]

Clearly showing a working tag called "imei" and 4 different tagValues.
I can get only the results for imei = imei_004 using a direct query:

>>> rs = indb.query("select * from current where imei='imei_004'")
>>> print rs.raw
[{u'imei': u'imei_004', u'value': 4444, u'time': u'2015-10-07T13:34:52.769Z'}, {u'imei': u'imei_004', u'value': 4444, u'time': u'2015-10-07T13:35:56.46Z'}]

But if I use the rs.keys() option it shows no tags:

>>> rs = indb.query("select * from current")
>>> print list(rs.keys())
[(u'current', None)] 

According to the docs, rs.keys() ought to return tuples of (serie_name, tags), but it reports no tags even though the database happily answers tag queries.
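A minimal sketch of why the tags come back as None, assuming the raw JSON shape the server returns for these queries: a `series` list whose entries carry `name`, `columns`, `values` and, only when the query uses GROUP BY, `tags`. The `series_keys` helper and both sample responses are hypothetical; the helper only mimics what rs.keys() reports and is not the library's code.

```python
# Hypothetical raw responses; the shapes are assumed from the outputs in this thread.
raw_plain = {  # e.g. SELECT * FROM current
    "series": [{
        "name": "current",
        "columns": ["time", "imei", "value"],
        "values": [["2015-10-07T13:34:52.769Z", "imei_004", 4444]],
    }]
}
raw_grouped = {  # e.g. SELECT * FROM current GROUP BY imei
    "series": [{
        "name": "current",
        "tags": {"imei": "imei_004"},
        "columns": ["time", "value"],
        "values": [["2015-10-07T13:34:52.769Z", 4444]],
    }]
}


def series_keys(raw):
    """Return (series_name, tags) pairs; tags is None when a series has none."""
    return [(s.get("name"), s.get("tags")) for s in raw.get("series", [])]


print(series_keys(raw_plain))    # the tag is only a column here, so tags is None
print(series_keys(raw_grouped))  # the tag is reported at series level
```

In the ungrouped response the imei value is just another column, so there is nothing at series level for keys() to report.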

@drmclean drmclean closed this as completed Oct 7, 2015
@drmclean drmclean reopened this Oct 7, 2015
@aviau
Collaborator

aviau commented Oct 13, 2015

On 13/10/15 10:37 AM, David McLean wrote:

> Hi Alexandre,

Hello David,

> If I do not include the group by then no tags are present in the response
> and therefore filtering inside get_points doesn't work. Is this intended
> behaviour?

Absolutely, and this has nothing to do with influxdb-python. The InfluxDB server does not return tags if you don't ask it to.

However, this can be misleading because there was a recent change to the InfluxDB API:

> The implicit GROUP BY * that was added to every SELECT * has been
> removed. Instead any tags in the data are now part of the columns in
> the returned query.

The "tags" that you see in #251 are actually columns!

The fact that tags are now part of the columns makes it impossible for us to differentiate tags and columns :(!

> My changes don't work when "group by" includes the tag values so I'm not
> sure that my contribution is working correctly?

It should work when using "group by", I cannot break that.

What I am getting out of this is that there are no bugs in influxdb-python. To get things working like you want, all you have to do is request the right tags by using the "GROUP BY" keyword.

So, there are two types of tags:

  • top-level tags that are returned when using group-by. Filtering works with these.
  • column-level tags. Filtering does not work with these, as they are new.

However, it looks like one would want to filter by the tags that are now included in the columns. What do you think?
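One way the two kinds of tags could be handled together is sketched below. It assumes the raw response is a dict with a `series` list, where each series has `columns`, `values` and an optional `tags` dict. The `iter_points` helper and the sample data are hypothetical illustrations, not influxdb-python's get_points.

```python
def iter_points(raw, tags=None):
    """Yield points as dicts, keeping those that match every requested tag.

    A requested tag matches if it equals a series-level tag (GROUP BY
    queries) or a column value in the point itself (plain SELECT *).
    """
    wanted = tags or {}
    for series in raw.get("series", []):
        series_tags = series.get("tags") or {}
        for row in series.get("values", []):
            point = dict(zip(series["columns"], row))
            merged = dict(point)
            merged.update(series_tags)  # series-level tags take precedence
            if all(k in merged and merged[k] == v for k, v in wanted.items()):
                yield point


# Hypothetical sample data in both shapes.
plain = {"series": [{
    "name": "current",
    "columns": ["time", "imei", "value"],
    "values": [["t1", "imei_001", 1111], ["t2", "imei_004", 4444]],
}]}
grouped = {"series": [
    {"name": "current", "tags": {"imei": "imei_001"},
     "columns": ["time", "value"], "values": [["t1", 1111]]},
    {"name": "current", "tags": {"imei": "imei_004"},
     "columns": ["time", "value"], "values": [["t2", 4444]]},
]}

print(list(iter_points(plain, {"imei": "imei_004"})))
print(list(iter_points(grouped, {"imei": "imei_004"})))
```

The trade-off in this sketch is that a field and a tag with the same name cannot be told apart, which is the ambiguity described above.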

Once again, I think that this discussion should be on GitHub so that everyone can see it, so I am posting this here:

Thank you for your work, David.

Best regards,

Alexandre Viau
alexandre@alexandreviau.net

@drmclean
Author

Hi Alex,

The above makes sense although it should be noted that the "bug" is now just that the documentation is no longer accurate. In the current docs the following:

rs = cli.query("SELECT * from cpu")
cpu_influxdb_com_points = list(rs.get_points(tags={"host_name": "influxdb.com"}))

is suggested as working code, but because of the API changes it now runs without error yet no longer filters correctly, which is a bit confusing for the first-time user!

I think it's possible to implement filtering by tags included in the columns, but I'm not sure how to do so in a way that maintains the previous functionality on group by queries without breaking other parts of the library. I'll remove my pull request as it neither solves the problem correctly nor passes the build!

@aviau aviau added the bug label Oct 13, 2015
@aviau
Collaborator

aviau commented Oct 13, 2015

> the documentation is no longer accurate

You are right! That is a bug :)

> I think it's possible to implement filtering by tags included in the columns, but I'm not sure how to do so in a way that maintains the previous functionality on group by queries without breaking other parts of the library

I'll think about this, we should use this issue to discuss how to do that.

@3fr61n

3fr61n commented Mar 2, 2016

We had the same problem :(

With NO tags (works)

list(latest_datapoints.get_points(measurement = 're.memory.rpd-CPU'))
[{u'value_str': None, u'time': u'2016-03-02T11:44:23.995058481Z', u'product-model': u'mx480', u'value': 0, u'version': u'20150617.306001_builder_stable_10', u'key': None, u'delta': 0, u'device': u'PE4-tf-mx480-3-re0', u'kpi': u're.memory.rpd-CPU', u'delta_str': None}]

With EMPTY tags (works)

list(latest_datapoints.get_points(measurement = 're.memory.rpd-CPU',tags={}))
[{u'value_str': None, u'time': u'2016-03-02T11:44:23.995058481Z', u'product-model': u'mx480', u'value': 0, u'version': u'20150617.306001_builder_stable_10', u'key': None, u'delta': 0, u'device': u'PE4-tf-mx480-3-re0', u'kpi': u're.memory.rpd-CPU', u'delta_str': None}]

With ANY specific tags (does NOT work)

list(latest_datapoints.get_points(measurement = 're.memory.rpd-CPU', tags={'device': 'PE4-tf-mx480-3-re0'}))
[]
list(latest_datapoints.get_points(tags={"kpi": "route-table.summary.destinations"}))
[]

etc

@anoopkhandelwal

Hi,
I am also facing the same issue. I need to filter the data by 3-4 tags.
Is there an alternative solution or wrapper function we can use to achieve this?
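One alternative is to filter on the server with a WHERE clause, as in the direct query shown earlier in this thread. The sketch below builds such a query for several tags. `build_tag_query`, `quote_ident` and `quote_str` are hypothetical helpers, and the escaping rules (double-quoted identifiers, single-quoted string values, backslash escapes) are an assumption about InfluxQL that should be checked against the server version in use.

```python
def quote_ident(name):
    # Assumed InfluxQL rule: identifiers are double-quoted, with backslash escapes.
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_str(value):
    # Tag values are strings; assumed rule: single quotes, with backslash escapes.
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_tag_query(measurement, tags):
    """Build a SELECT * query that requires every given tag to match."""
    query = "SELECT * FROM " + quote_ident(measurement)
    if tags:
        conditions = [quote_ident(key) + " = " + quote_str(value)
                      for key, value in sorted(tags.items())]
        query += " WHERE " + " AND ".join(conditions)
    return query


print(build_tag_query("current", {"imei": "imei_004"}))
# SELECT * FROM "current" WHERE "imei" = 'imei_004'
```

The resulting string can be passed to the client's query() call, so only matching points are transferred and no filtering is needed in get_points.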

@anoopkhandelwal

Hi,
I wrote a wrapper function:

def filter_fun(data, key, value):
    # Keep only the points whose `key` column equals `value`.
    return [x for x in data if key in x and x[key] == value]

def filter_data(data, filtered_tags):
    # Apply one filter per tag, so a point must match every tag to be kept.
    response_list = data
    for tag_key, tag_value in filtered_tags.items():
        response_list = filter_fun(response_list, tag_key, tag_value)
    return response_list

To filter, pass the list of points (data_list) and a dict of tags to filter_data; it returns the list of dicts that match every tag, e.g.

data_list = filter_data(data_list, filtered_tags={'key_1': value1, 'key_2': 'value2'})

Like get_points, this filters after the data has been fetched rather than at the query level, so performance should be about the same as get_points.
Let me know if it can be improved.

@TwitchChen

Has this bug been resolved now?

@xginn8
Collaborator

xginn8 commented Nov 25, 2017

This bug should be fixed in the latest release -- if not, we can revisit.

@xginn8 xginn8 closed this as completed Nov 25, 2017

6 participants