This repository was archived by the owner on Oct 29, 2024. It is now read-only.

Filter by tags appears to be broken? #251

Closed
@drmclean

Description

I've written some data points (listed at the bottom) to my instance in a measurement called 'abc':

>>> indb.write_points(data, tags = {"imei" : "test_imei"}, time_precision='ms')
True

I can get the data-points out:

>>> rs = indb.query("select * from abc")

And print them:

>>> print list(rs.get_points())
[
        {u'imei': u'test_imei', u'value': 1111, u'time': u'2015-10-06T19:50:29.007Z'}, 
        {u'imei': u'test_imei', u'value': 2222, u'time': u'2015-10-06T19:50:29.008Z'}, 
        {u'imei': u'test_imei', u'value': 3333, u'time': u'2015-10-06T19:50:29.009Z'}, 
        {u'imei': u'test_imei', u'value': 4444, u'time': u'2015-10-06T19:50:29.01Z'}
]

When I specify the tag value the result is empty:

>>> print list(rs.get_points(tags={"imei" : "test_imei"}))
[]

Data Points:

time = 1444161029007  
data = [
    { "measurement" : "abc",  "time": time, "fields" : {"value" : 1111} },
    { "measurement" : "abc",  "time": time+1, "fields" : {"value" : 2222} },
    { "measurement" : "abc",  "time": time+2, "fields" : {"value" : 3333} },
    { "measurement" : "abc",  "time": time+3, "fields" : {"value" : 4444} }
]

>>> indb.write_points(data, tags = {"imei" : "test_imei"}, time_precision='ms')
True

Activity

drmclean

drmclean commented on Oct 7, 2015

@drmclean
Author

Also related, the whole handling of tags doesn't appear to work correctly:

I have a measurement called 'current'.
I can find the tags inside current using:

>>> rs = indb.query("SHOW TAG KEYS FROM current")
>>> print list(rs.get_points())
[{u'tagKey': u'imei'}]

This correctly shows one tag called "imei".

I can find the tag values inside current using:

>>> rs = indb.query("SHOW TAG VALUES FROM current WITH KEY = imei")
>>> print list(rs.get_points())
[{u'imei': u'imei_001'}, {u'imei': u'imei_002'}, {u'imei': u'imei_003'}, {u'imei': u'imei_004'}]

Clearly showing a working tag called "imei" and 4 different tagValues.
I can get only the results for imei = imei_004 using a direct query:

>>> rs = indb.query("select * from current where imei='imei_004'")
>>> print rs.raw
[{u'imei': u'imei_004', u'value': 4444, u'time': u'2015-10-07T13:34:52.769Z'}, {u'imei': u'imei_004', u'value': 4444, u'time': u'2015-10-07T13:35:56.46Z'}]

But if I use the rs.keys() option, it shows no tags:

>>> rs = indb.query("select * from current")
>>> print list(rs.keys())
[(u'current', None)] 

According to the docs, rs.keys() ought to return tuples of (serie_name, tags), but it reports no tags at all, even though the db happily answers tag queries.

aviau

aviau commented on Oct 13, 2015

@aviau
Collaborator

On 13/10/15 10:37 AM, David McLean wrote:

Hi Alexandre,

Hello David,

If I do not include the group by then no tags are present in the response
and therefore filtering inside get_points doesn't work. Is this intended
behaviour?

Absolutely, and this has nothing to do with influxdb-python. The InfluxDB server does not return tags if you don't ask it to.

However, this can be misleading because there was a recent change to the InfluxDB API:

The implicit GROUP BY * that was added to every SELECT * has been
removed. Instead any tags in the data are now part of the columns in
the returned query.

The "tags" that you see in #251 are actually columns!

The fact that tags are now part of the columns makes it impossible for us to differentiate tags and columns :(!

My changes don't work when "group by" includes the tag values so I'm not
sure that my contribution is working correctly?

It should work when using "group by", I cannot break that.

What I am getting out of this is that there are no bugs in influxdb-python. To get things working like you want, all you have to do is request the right tags by using the "GROUP BY" keyword.

So, there are two types of tags:

  • top-level tags that are returned when using group-by. Filtering works with these.
  • column-level tags. Filtering does not work with these, as they are new.
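
To make the two cases concrete, here is a minimal sketch of how the series payloads differ. The payloads are hand-written to illustrate the shapes described above (they are not captured server output), and `points_matching` is a hypothetical helper, not part of influxdb-python:

```python
# Hand-written payloads illustrating the two response shapes (not captured output).

# With GROUP BY imei: the tag arrives in a series-level "tags" mapping.
grouped = {
    "series": [{
        "name": "abc",
        "tags": {"imei": "test_imei"},
        "columns": ["time", "value"],
        "values": [["2015-10-06T19:50:29.007Z", 1111]],
    }]
}

# Without GROUP BY: the tag is just another column, next to the fields.
ungrouped = {
    "series": [{
        "name": "abc",
        "columns": ["time", "imei", "value"],
        "values": [["2015-10-06T19:50:29.007Z", "test_imei", 1111]],
    }]
}


def points_matching(raw, tags):
    """Hypothetical helper: match `tags` against series tags and columns alike."""
    matched = []
    for series in raw.get("series", []):
        series_tags = series.get("tags") or {}
        for row in series.get("values", []):
            # Build one dict per row, then merge in any series-level tags.
            point = dict(zip(series["columns"], row))
            point.update(series_tags)
            if all(point.get(key) == value for key, value in tags.items()):
                matched.append(point)
    return matched
```

Matching on the merged view is one possible way to filter on column-level tags. The trade-off is that a field with the same name as a tag would match as well, since the two can no longer be told apart.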

However, it looks like one would want to filter by the tags that are now included in the columns. What do you think?

Once again, I think that this discussion should be on GitHub so that everyone can see, I will post this here:

thank you for your work David

Best regards,

Alexandre Viau
alexandre@alexandreviau.net

drmclean

drmclean commented on Oct 13, 2015

@drmclean
Author

Hi Alex,

The above makes sense although it should be noted that the "bug" is now just that the documentation is no longer accurate. In the current docs the following:

rs = cli.query("SELECT * from cpu")
cpu_influxdb_com_points = list(rs.get_points(tags={"host_name": "influxdb.com"}))

is suggested as working code, but because of the API change it now runs without error and silently fails to filter, which is confusing for a first-time user!
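
For comparison, a sketch of the same lookup with an explicit GROUP BY, which per the explanation above should return host_name as a top-level tag that get_points can filter on. The helper name `points_for_host` is hypothetical, and `cli` is assumed to be an already connected client object exposing `query()` as in the docs example:

```python
def points_for_host(cli, host_name):
    """Hypothetical helper: fetch cpu points for one host via GROUP BY.

    Assumes `cli.query()` returns a result object whose
    `get_points(tags=...)` filters on series-level tags.
    """
    # Grouping by the tag asks the server to return it as a series-level tag
    # instead of folding it into the columns.
    rs = cli.query("SELECT * FROM cpu GROUP BY host_name")
    return list(rs.get_points(tags={"host_name": host_name}))


# Usage, assuming a reachable server and a connected client:
#   cpu_influxdb_com_points = points_for_host(cli, "influxdb.com")
```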

I think it's possible to implement filtering by tags included in the columns, but I'm not sure how to do so in a way that maintains the previous functionality on group by queries and doesn't break other parts of the library. I'll remove my pull request as it neither solves the problem correctly nor passes the build!

aviau

aviau commented on Oct 13, 2015

@aviau
Collaborator

the documentation is no longer accurate

You are right! That is a bug :)

I think it's possible to implement filtering by tags included in the columns, but I'm not sure how to do so in a way that maintains the previous functionality on group by queries and doesn't break other parts of the library

I'll think about this, we should use this issue to discuss how to do that.

3fr61n

3fr61n commented on Mar 2, 2016

@3fr61n

We had the same problem :(

With NO tags (works)

list(latest_datapoints.get_points(measurement = 're.memory.rpd-CPU'))
[{u'value_str': None, u'time': u'2016-03-02T11:44:23.995058481Z', u'product-model': u'mx480', u'value': 0, u'version': u'20150617.306001_builder_stable_10', u'key': None, u'delta': 0, u'device': u'PE4-tf-mx480-3-re0', u'kpi': u're.memory.rpd-CPU', u'delta_str': None}]

With EMPTY tags (works)

list(latest_datapoints.get_points(measurement = 're.memory.rpd-CPU',tags={}))
[{u'value_str': None, u'time': u'2016-03-02T11:44:23.995058481Z', u'product-model': u'mx480', u'value': 0, u'version': u'20150617.306001_builder_stable_10', u'key': None, u'delta': 0, u'device': u'PE4-tf-mx480-3-re0', u'kpi': u're.memory.rpd-CPU', u'delta_str': None}]

With ANY specific tags (does NOT work)

list(latest_datapoints.get_points(measurement = 're.memory.rpd-CPU', tags={'device': 'PE4-tf-mx480-3-re0'}))
[]
list(latest_datapoints.get_points(tags={"kpi": "route-table.summary.destinations"}))
[]

etc

anoopkhandelwal

anoopkhandelwal commented on Mar 14, 2016

@anoopkhandelwal

Hi,
I am also facing the same issue. I need to filter data using 3-4 tags.
Is there an alternative solution or wrapper function we can use to achieve this?

anoopkhandelwal

anoopkhandelwal commented on Mar 15, 2016

@anoopkhandelwal

Hi,
I wrote a wrapper function:

def filter_fun(data, key, value):
    # Keep only the points whose `key` column equals `value`.
    return filter(lambda x: key in x and x[key] == value, data)

def filter_data(data, filtered_tags):
    # Apply one filter per tag, so a point must match every tag.
    response_list = data
    for tag_key, tag_value in filtered_tags.iteritems():
        response_list = filter_fun(response_list, tag_key, tag_value)
    return response_list

To filter, pass the list of points (data_list) and a dict of tags to filter_data; it returns the list of dict elements that match every tag.
e.g.

data_list = filter_data(data_list, filtered_tags={'key_1': value1, 'key_2': 'value2'})

Since get_points also filters after the data has been fetched (i.e. the filtering is not executed at the query level), performance should be about the same as with get_points.
Let me know if this can be improved.
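
A possible Python 3 variant of the wrapper above, as a sketch only: `filter_points` and the sample points are made up for illustration, and accepting several allowed values per tag is an assumption about what would be useful, not something the library provides.

```python
def filter_points(points, wanted_tags):
    """Return the points whose columns match every entry in wanted_tags.

    Each value in wanted_tags may be a single value or a
    list/tuple/set of allowed values for that tag.
    """
    def as_set(value):
        # Normalise a single value or a collection into a set of allowed values.
        if isinstance(value, (list, tuple, set, frozenset)):
            return set(value)
        return {value}

    wanted = {key: as_set(value) for key, value in wanted_tags.items()}
    return [
        point for point in points
        if all(key in point and point[key] in allowed
               for key, allowed in wanted.items())
    ]


# Illustrative points, shaped like the dicts that get_points() yields.
points = [
    {"imei": "imei_001", "value": 1111},
    {"imei": "imei_002", "value": 2222},
    {"imei": "imei_004", "value": 4444},
]

only_004 = filter_points(points, {"imei": "imei_004"})
first_two = filter_points(points, {"imei": ["imei_001", "imei_002"]})
```

Like the original wrapper, this runs on the client after the query has returned, so it does not reduce the amount of data fetched from the server.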

TwitchChen

TwitchChen commented on Jan 17, 2017

@TwitchChen

Has this bug been resolved now?

xginn8

xginn8 commented on Nov 25, 2017

@xginn8
Collaborator

This bug should be fixed in the latest release -- if not, we can revisit.



      Filter by tags appears to be broken? · Issue #251 · influxdata/influxdb-python