Skip to content

Fixed #35462 -- Added support for JSON_ArrayAgg aggregate function. #19406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lufafajoshua
Copy link
Contributor

Trac ticket number

ticket-35462

Branch description

This tickect adds support for the JSON_ARRAYAGG and JSON_GROUP_ARRAY for SQLite , these functions aggregate the contents of a specified column or SQL expressions converting each expression into a JSON value and returns a single JSON array containing those JSON values.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@lufafajoshua
Copy link
Contributor Author

buildbot, test on oracle.

@kevingill1966
Copy link

General Comments

There seems to be some drift between the Postgres version and this new general version. The Postgres version ArrayAgg has the ordering parameter changed to order_by for 5.2.

I tried this pull request out with sqllite, and it accepted the ArrayAgg ordering parameter (not in documentation). However, that did not affect the order of the results.

I found the use of the output_field non-intuitive. It casts the whole aggregate, not the values. However, this may be an issue with aggregates in general rather than this specific case. JSON removes the type of the data, so a query on the test data of price returns a mix of integers and floats.

filter parameter worked well.

sample - I cannot see what this parameter does.

As an aside, I could not see the purpose of this aggregation. values_list provides the same functionality without losing type information. If there is a use case, it might be worth documenting.

Checklist

Tests - tests are provided for the functionality

Documentation - seems incorrect

  • the documentation says that it accepts a list of field names or expressions. It only accepted one field name for me.
  • I don't know what sample does
  • The output_field is non-intuitive and perhaps should be disabled.

Release Documentation - To be provided

@lufafajoshua
Copy link
Contributor Author

@charettes , some review here.

Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could also set allow_order_by and make sure order_by works.

It will require some adjustments on MySQL because while it supports ORDER BY it requires the usage of an over clause.

)
cls.b2.authors.add(cls.a2)

def test(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test(self):
def test_text(self):

Comment on lines 441 to 458
extra_context.setdefault(
"template", "%(function)s(%(distinct)s%(expressions)s RETURNING JSONB)"
)
Copy link
Member

@charettes charettes Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work with filter, notice how it's in Aggregate.template but not here.

Suggested change
extra_context.setdefault(
"template", "%(function)s(%(distinct)s%(expressions)s RETURNING JSONB)"
)
extra_context.setdefault(
"template",
"%(function)s(%(distinct)s%(expressions)s%(order_by)s RETURNING JSONB)%(filter)s",
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment this still relevant, test_filter likely only passes on CI because they are running the tests against Postgres < 16, the filter (and order_by) are completely elided and result in this failure

======================================================================
ERROR: test_filter (aggregation.tests.JSONArrayAggTests.test_filter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
psycopg.ProgrammingError: the query has 0 placeholders but 1 parameters were passed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/django/source/tests/aggregation/tests.py", line 2706, in test_filter
    vals = Book.objects.aggregate(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/query.py", line 589, in aggregate
    return self.query.chain().get_aggregation(self.db, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/sql/query.py", line 626, in get_aggregation
    result = compiler.execute_sql(SINGLE)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/sql/compiler.py", line 1623, in execute_sql
    cursor.execute(sql, params)
  File "/django/source/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/utils.py", line 94, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
django.db.utils.ProgrammingError: the query has 0 placeholders but 1 parameters were passed

Comment on lines 450 to 474
if internal_type == "DateField":
extra_context.setdefault(
"template", "%(function)s(TO_CHAR(%(expressions)s, 'YYYY-MM-DD'))"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I'm surprised that Oracle doesn't have a proper DATE data type. I'm not a fan of this logic but I also can't think of a way around it. Maybe it's not a huge deal because JSONObject is affected by the same problem and we don't specialize it?

Again the template must include filter and distinct

Suggested change
if internal_type == "DateField":
extra_context.setdefault(
"template", "%(function)s(TO_CHAR(%(expressions)s, 'YYYY-MM-DD'))"
)
if internal_type == "DateField":
extra_context.setdefault(
"template", "%(function)s(%(distinct)sTO_CHAR(%(expressions)s, 'YYYY-MM-DD'))%(filter)s"
)

An alternative way to achieve the same thing without leaning into template too much would be to do

expression = self.get_source_expressions()[0]
internal_type = expression.output_field.get_internal_type()
if internal_type == "DateField":
    clone = self.copy()
    clone.set_source_expressions([Func(expression, Value("YYYY-MM-DD"), function="TO_CHAR")])
    return clone.as_sql(compiler, connection, **extra_context)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @charettes , thanks for your suggestion not to use template on datefield for oracle, though am getting the following error

Traceback (most recent call last):
File "/home/django/tests/aggregation/tests.py", line 2685, in test_datefield
vals = Author.objects.aggregate(jsonarrayagg=JSONArrayAgg("book__pubdate"))
File "/home/django/django/db/models/manager.py", line 87, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/django/django/db/models/query.py", line 589, in aggregate
return self.query.chain().get_aggregation(self.db, kwargs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/django/django/db/models/sql/query.py", line 626, in get_aggregation
result = compiler.execute_sql(SINGLE)
File "/home/django/django/db/models/sql/compiler.py", line 1610, in execute_sql
sql, params = self.as_sql()
~~~~~~~~~~~^^
File "/home/django/django/db/models/sql/compiler.py", line 766, in as_sql
extra_select, order_by, group_by = self.pre_sql_setup(
~~~~~~~~~~~~~~~~~~^
with_col_aliases=with_col_aliases or bool(combinator),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/django/django/db/models/sql/compiler.py", line 85, in pre_sql_setup
self.setup_query(with_col_aliases=with_col_aliases)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/django/django/db/models/sql/compiler.py", line 74, in setup_query
self.select, self.klass_info, self.annotation_col_map = self.get_select(
~~~~~~~~~~~~~~~^
with_col_aliases=with_col_aliases,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/django/django/db/models/sql/compiler.py", line 316, in get_select
sql, params = self.compile(col)
~~~~~~~~~~~~^^^^^
File "/home/django/django/db/models/sql/compiler.py", line 575, in compile
sql, params = vendor_impl(self, self.connection)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/home/django/django/db/models/aggregates.py", line 460, in as_oracle
clone.set_source_expressions(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
[Func(expression, Value("YYYY-MM-DD"), function="TO_CHAR")])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/django/django/db/models/aggregates.py", line 116, in set_source_expressions
*exprs, self.filter, self.order_by = exprs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to include the actual exception in here which makes it impossible to diagnose unfortunately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the exception ValueError: not enough values to unpack (expected at least 2, got 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @charettes , any updates or we stick to the template parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code above was meant to point you in the right direction so you can figure it out yourself by moving things around. The full code is

def as_oracle(self, compiler, connection, **extra_context):
        # Oracle turns DATE columns into ISO 8601 timestamp including T00:00:00
        # suffixed when converting to JSON while other backends only include
        # the date part.
        source_expressions = self.get_source_expressions()
        expression = source_expressions[0]
        if isinstance(expression.output_field, DateField):
            clone = self.copy()
            clone.set_source_expressions(
                [
                    Func(expression, Value("YYYY-MM-DD"), function="TO_CHAR"),
                    *source_expressions[1:],
                ]
            )
            return clone.as_sql(compiler, connection, **extra_context)
        return self.as_sql(compiler, connection, **extra_context)


.. versionadded:: 6.0

.. class:: JSONArrayAgg(expression, output_field=None, sample=False, filter=None, default=None, **extra)
Copy link
Member

@charettes charettes Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where sample is coming from? Make sure to also include order_by.

Suggested change
.. class:: JSONArrayAgg(expression, output_field=None, sample=False, filter=None, default=None, **extra)
.. class:: JSONArrayAgg(expression, output_field=None, filter=None, order_by=None, default=None, **extra)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is still relevant.

@charettes
Copy link
Member

charettes commented Apr 30, 2025

You can forget the mention of trying to support order_by on MySQL it's simply too bugged.

It's not possible to implement using OVER() like they document it and using a subquery to work around this limitation yields the same result and is known to be broken.

One thing we could do is set allow_order_by = True and have the as_mysql implementation raise a NotSupportedError when self.order_by is present.

class JSONArrayAgg(Aggregate):
function = "JSON_ARRAYAGG"
output_field = JSONField()
arity = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark as allow_order_by = True and add tests for it.

Comment on lines 441 to 458
extra_context.setdefault(
"template", "%(function)s(%(distinct)s%(expressions)s RETURNING JSONB)"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment this still relevant, test_filter likely only passes on CI because they are running the tests against Postgres < 16, the filter (and order_by) are completely elided and result in this failure

======================================================================
ERROR: test_filter (aggregation.tests.JSONArrayAggTests.test_filter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
psycopg.ProgrammingError: the query has 0 placeholders but 1 parameters were passed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/django/source/tests/aggregation/tests.py", line 2706, in test_filter
    vals = Book.objects.aggregate(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/query.py", line 589, in aggregate
    return self.query.chain().get_aggregation(self.db, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/sql/query.py", line 626, in get_aggregation
    result = compiler.execute_sql(SINGLE)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/models/sql/compiler.py", line 1623, in execute_sql
    cursor.execute(sql, params)
  File "/django/source/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/django/source/django/db/utils.py", line 94, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
django.db.utils.ProgrammingError: the query has 0 placeholders but 1 parameters were passed

Comment on lines 450 to 474
if internal_type == "DateField":
extra_context.setdefault(
"template", "%(function)s(TO_CHAR(%(expressions)s, 'YYYY-MM-DD'))"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code above was meant to point you in the right direction so you can figure it out yourself by moving things around. The full code is

def as_oracle(self, compiler, connection, **extra_context):
        # Oracle turns DATE columns into ISO 8601 timestamp including T00:00:00
        # suffixed when converting to JSON while other backends only include
        # the date part.
        source_expressions = self.get_source_expressions()
        expression = source_expressions[0]
        if isinstance(expression.output_field, DateField):
            clone = self.copy()
            clone.set_source_expressions(
                [
                    Func(expression, Value("YYYY-MM-DD"), function="TO_CHAR"),
                    *source_expressions[1:],
                ]
            )
            return clone.as_sql(compiler, connection, **extra_context)
        return self.as_sql(compiler, connection, **extra_context)

Supported the JSON_ARRAYAGG aggregate function
and JSON_GROUP_ARRAY as a variant to add support
for sqlite database. Added tests for the JSON_ARRAYAGG
and JSON_GROUP_ARRAY aggregate functions.
Documented JSONArrayAgg and added a release note.
…mentation to mention support for order_by on JSONArrayAgg.
@lufafajoshua lufafajoshua requested a review from charettes July 29, 2025 11:05
…ified order_by tests to include ascending and descending order.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants