[HttpFoundation] Add documentation for StreamedJsonResponse
#17301
Merged
javiereguiluz
merged 1 commit into
symfony:6.3
from
alexander-schranz:feature/streamed-json-response
Jun 6, 2023
1e14c85
to
a414b97
Compare
a414b97
to
ec5fa8f
Compare
OskarStark
reviewed
Sep 28, 2022
OskarStark
reviewed
Sep 28, 2022
OskarStark
reviewed
Sep 29, 2022
OskarStark
reviewed
Sep 29, 2022
OskarStark
reviewed
Sep 29, 2022
OskarStark
reviewed
Sep 29, 2022
OskarStark
reviewed
Sep 29, 2022
chalasr
added a commit
to symfony/symfony
that referenced
this pull request
Dec 29, 2022
…ent JSON streaming (alexander-schranz)

This PR was squashed before being merged into the 6.3 branch.

Discussion
----------

[HttpFoundation] Add `StreamedJsonResponse` for efficient JSON streaming

| Q             | A
| ------------- | ---
| Branch?       | 6.2
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix #...
| License       | MIT
| Doc PR        | symfony/symfony-docs#17301

When large amounts of data are streamed via a JSON API, it can be difficult to keep resource usage low. For this I experimented with a different way of streaming data for JSON responses. It uses a combination of a `structured array` and `generators`, which gave much better results.

More can be read about it here: [https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine](https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine).

I thought it could be a great addition to Symfony itself, to make this kind of response easier to build and to make APIs more performant.

## Usage

<details><summary>First Version (replaced)</summary>

```php
class ArticleListAction
{
    public function __invoke(EntityManagerInterface $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return new StreamedJsonResponse(
            // JSON structure with replacer identifiers
            [
                '_embedded' => [
                    'articles' => '__articles__',
                ],
            ],
            // array of generators, with the replacer identifier used as key
            [
                '__articles__' => $articles,
            ]
        );
    }

    private function findArticles(EntityManagerInterface $entityManager): \Generator
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->toIterable();
    }
}
```

</details>

Updated Version (thx to `@ro0NL` for the idea):

```php
class ArticleListAction
{
    public function __invoke(EntityManagerInterface $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return new StreamedJsonResponse(
            // JSON structure containing generators, which are streamed
            [
                '_embedded' => [
                    'articles' => $articles, // a generator, which is streamed
                ],
            ],
        );
    }

    private function findArticles(EntityManagerInterface $entityManager): \Generator
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->toIterable();
    }
}
```

----

As proposed by `@OskarStark`, here is the full content of the blog post ["Efficient JSON Streaming with Symfony and Doctrine"](https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine/edit/main/README.md):

# Efficient JSON Streaming with Symfony and Doctrine

I read a tweet pointing out that we provide only a few items (max. 100) over our JSON APIs while serving 4K images on our websites.
I wondered why this is the case.

To see the main difference, we first need to know how images are streamed. Webservers today mostly use the sendfile feature for this, which is very efficient because it can stream a file chunk by chunk and does not need to load the whole file.

So I asked myself how we can achieve the same mechanism for our JSON APIs, with a little experiment.

As an example we will have a look at a basic entity which has the following fields defined:

- id: int
- title: string
- description: text

The response of our API should look like the following:

```json
{
  "_embedded": {
    "articles": [
      {
        "id": 1,
        "title": "Article 1",
        "description": "Description 1\nMore description text ..."
      },
      ...
    ]
  }
}
```

Normally, to provide this API we would do something like this:

```php
<?php

namespace App\Controller;

use App\Entity\Article;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Response;

class ArticleListAction
{
    public function __invoke(EntityManagerInterface $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return JsonResponse::fromJsonString(json_encode([
            'embedded' => [
                'articles' => $articles,
            ],
            'total' => 100_000,
        ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE));
    }

    // normally this method would live in a repository
    private function findArticles(EntityManagerInterface $entityManager): iterable
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->getResult();
    }
}
```

In most cases we will add some pagination to the endpoint so that our responses are not too big.

## Making the API more efficient

But there is also a way to stream this response efficiently.

First of all we need to adjust how we load the articles.
This can be done by replacing `getResult` with the more efficient [`toIterable`](https://www.doctrine-project.org/projects/doctrine-orm/en/2.9/reference/batch-processing.html#iterating-results):

```diff
- return $queryBuilder->getQuery()->getResult();
+ return $queryBuilder->getQuery()->toIterable();
```

Still, the whole JSON needs to be in memory before it is sent, so we also need to refactor how we create our response. We will replace our `JsonResponse` with the [`StreamedResponse`](https://symfony.com/doc/6.0/components/http_foundation.html#streaming-a-response) object.

```php
return new StreamedResponse(function() use ($articles) {
    // stream json
}, 200, ['Content-Type' => 'application/json']);
```

But the `json` format is not the best format for streaming, so we need some hacks to make it streamable.

First we define the basic structure of our JSON this way:

```php
$jsonStructure = json_encode([
    'embedded' => [
        'articles' => ['__REPLACES_ARTICLES__'],
    ],
    'total' => 100_000,
], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
```

Instead of the `$articles` we use a placeholder, at which we split the string into a `$before` and an `$after` variable:

```php
[$before, $after] = explode('"__REPLACES_ARTICLES__"', $jsonStructure, 2);
```

Now we first send the `$before` part:

```php
echo $before . PHP_EOL;
```

Then we stream the articles one by one. Here we need to keep the comma in mind, which has to be added after every article except the last one:

```php
foreach ($articles as $count => $article) {
    if ($count !== 0) {
        echo ',' . PHP_EOL; // if not first element we need a separator
    }

    echo json_encode($article, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
}
```

We also add a `flush` after every 500 elements:

```php
if ($count % 500 === 0 && $count !== 100_000) { // flush response after every 500
    flush();
}
```

After that we also send the `$after` part:

```php
echo PHP_EOL;
echo $after;
```

## The result

So in the end the whole action looks like the following:

```php
<?php

namespace App\Controller;

use App\Entity\Article;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\HttpFoundation\StreamedResponse;

class ArticleListAction
{
    public function __invoke(EntityManagerInterface $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return new StreamedResponse(function() use ($articles) {
            // define our JSON structure, with a placeholder instead of the articles
            $jsonStructure = json_encode([
                'embedded' => [
                    'articles' => ['__REPLACES_ARTICLES__'],
                ],
                'total' => 100_000,
            ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

            // split by placeholder
            [$before, $after] = explode('"__REPLACES_ARTICLES__"', $jsonStructure, 2);

            // send the part of the JSON before the placeholder first
            echo $before . PHP_EOL;

            // stream the articles one by one, each as its own JSON
            foreach ($articles as $count => $article) {
                if ($count !== 0) {
                    echo ',' . PHP_EOL; // if not first element we need a separator
                }

                if ($count % 500 === 0 && $count !== 100_000) { // flush response after every 500
                    flush();
                }

                echo json_encode($article, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
            }

            // send the part of the JSON after the placeholder last
            echo PHP_EOL;
            echo $after;
        }, 200, ['Content-Type' => 'application/json']);
    }

    private function findArticles(EntityManagerInterface $entityManager): iterable
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->toIterable();
    }
}
```

The metrics for 100000 Articles (nginx + php-fpm 7.4 - Macbook Pro 2013):

|                           | Old Implementation | New Implementation |
|---------------------------|--------------------|--------------------|
| Memory Usage              | 49.53 MB           | 2.10 MB            |
| Memory Usage Peak         | 59.21 MB           | 2.10 MB            |
| Time to first Byte        | 478ms              | 28ms               |
| Time                      | 2.335 s            | 0.584 s            |

This way we not only reduced the memory usage on our server, we also made the response faster. The memory usage was measured with `memory_get_usage` and `memory_get_peak_usage`. The "Time to first Byte" is the value reported by the browser, and the response times were measured with curl.

**Updated 2022-10-02 - (symfony serve + php-fpm 8.1 - Macbook Pro 2021)**

|                           | Old Implementation | New Implementation |
|---------------------------|--------------------|--------------------|
| Memory Usage              | 64.21 MB           | 2.10 MB            |
| Memory Usage Peak         | 73.89 MB           | 2.10 MB            |
| Time to first Byte        | 0.203 s            | 0.049 s            |
| Updated Time (2022-10-02) | 0.233 s            | 0.232 s            |

While the total time for a single response barely differs, the real gain is the lower memory usage, which kicks in when you have a lot of simultaneous requests. On my machine that is at more than 150 simultaneous requests, which is a high value; on a normal server it will be a lot lower.
While 150 simultaneous requests crash the old implementation, the new implementation still works with 220 simultaneous requests. That means about 46% more requests are possible.

## Reading Data in JavaScript

As we stream the data, our JavaScript on the other end should work the same way, so the data needs to be read in a streamed way.

Here I'm just following the example from the [Fetch API Processing a text file line by line](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch#processing_a_text_file_line_by_line).

So if we look at our [`script.js`](public/script.js), we split the object line by line and append it to our table. This is definitely not how JSON should be read and parsed. It is only meant as an example of how the response could be read from a stream.

## Conclusion

The implementation looks a little hacky. For maintainability it could be moved into its own factory which creates this kind of response.

Example:

```php
return StreamedResponseFactory::create(
    [
        'embedded' => [
            'articles' => ['__REPLACES_ARTICLES__'],
        ],
        'total' => 100_000,
    ],
    ['__REPLACES_ARTICLES__' => $articles]
);
```

The JavaScript part is definitely not ready for production, and if you use it you should probably create your own content type, e.g. `application/json+stream`, so that you parse the JSON this way only when you know it really is in this line-by-line format. There may be better libraries like [`JSONStream`](https://www.npmjs.com/package/JSONStream) to read the data, but at the current state I did not test them out. Let me know if somebody has experience with that and has solutions for it.

At least, what I think everybody should do when providing lists is to use [`toIterable`](https://www.doctrine-project.org/projects/doctrine-orm/en/2.9/reference/batch-processing.html#iterating-results) when possible when loading data via Doctrine, and to select specific fields instead of using the `ORM`, to avoid the hydration of objects.

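The factory idea from the conclusion can be sketched in plain PHP without any framework. The following is only a minimal sketch under stated assumptions: the function name `streamJsonWithPlaceholder`, its parameters, and the default of 500 items per flush are hypothetical and are not part of Symfony or of the original experiment's code. It uses its own counter for the separators, so it does not rely on the keys of the iterable.

```php
<?php

/**
 * Hypothetical helper (not part of Symfony): echoes $structure as JSON and
 * streams the items of $items in place of the placeholder string.
 */
function streamJsonWithPlaceholder(array $structure, string $placeholder, iterable $items, int $flushEvery = 500): void
{
    $flags = JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE;

    // Encode the skeleton once and split it around the quoted placeholder.
    $json = json_encode($structure, $flags);
    $parts = explode(json_encode($placeholder, $flags), $json, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException('Placeholder not found in structure.');
    }
    [$before, $after] = $parts;

    echo $before;

    $count = 0; // own counter, independent of the iterable's keys
    foreach ($items as $item) {
        if ($count > 0) {
            echo ','; // separator before every item except the first
        }
        echo json_encode($item, $flags);

        if (++$count % $flushEvery === 0) {
            flush(); // push the buffered chunk to the client
        }
    }

    echo $after;
}
```

Inside a `StreamedResponse` callback this could be called as `streamJsonWithPlaceholder($structure, '__REPLACES_ARTICLES__', $articles)`, where `$structure` contains `['__REPLACES_ARTICLES__']` at the position of the list.
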
Let me know what you think about this experiment and how you currently provide your JSON data.

The whole experiment can be checked out and tested via [this repository](https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine).

Join the discussion about this on [Twitter](https://twitter.com/alex_s_/status/1488314080381313025).

## Update 2022-09-27

Added a [StreamedJsonResponse](src/Controller/StreamedJsonResponse.php) class and trying to contribute this implementation to the Symfony core.

[https://github.com/symfony/symfony/pull/47709](https://github.com/symfony/symfony/pull/47709)

## Update 2022-10-02

Updated some statistics with a new machine and Apache benchmark tests for concurrent requests.

Commits
-------

ecc5355 [HttpFoundation] Add `StreamedJsonResponse` for efficient JSON streaming
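
For readers who want to see the idea behind the updated usage (generators placed directly in the structure) without Symfony, here is a minimal plain-PHP sketch. It is an assumption-laden illustration, not Symfony's implementation: the function name `streamJson` is hypothetical, it requires PHP 8.1 for `array_is_list`, it treats every generator as a JSON list and ignores generator keys, and it leaves out flushing.

```php
<?php

/**
 * Hypothetical sketch (not Symfony's implementation): echoes $data as JSON and
 * writes every generator found in the structure item by item as a JSON array.
 */
function streamJson(mixed $data): void
{
    $flags = JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE;

    if ($data instanceof \Generator) {
        // Assumption: generators are always lists, their keys are ignored.
        echo '[';
        $first = true;
        foreach ($data as $item) {
            echo $first ? '' : ',';
            $first = false;
            streamJson($item);
        }
        echo ']';

        return;
    }

    if (is_array($data) && !array_is_list($data)) {
        // Associative array: written as a JSON object, value by value.
        echo '{';
        $first = true;
        foreach ($data as $key => $value) {
            echo $first ? '' : ',';
            $first = false;
            echo json_encode((string) $key, $flags), ':';
            streamJson($value);
        }
        echo '}';

        return;
    }

    if (is_array($data)) {
        // List: written as a JSON array, element by element.
        echo '[';
        foreach ($data as $index => $value) {
            echo $index === 0 ? '' : ',';
            streamJson($value);
        }
        echo ']';

        return;
    }

    // Scalars and null are encoded directly.
    echo json_encode($data, $flags);
}
```

Only one item of the generator is encoded at a time, so memory usage stays flat regardless of how many rows the generator yields.
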
The code PR has been merged, it would be nice to finish this for 6.3 :)
MatTheCat
reviewed
Dec 29, 2022
993420e
to
461cb53
Compare
461cb53
to
ad83f47
Compare
The documentation has been updated and is ready for review.
ad83f47
to
858fd59
Compare
OskarStark
reviewed
May 16, 2023
94noni
reviewed
May 16, 2023
MatTheCat
reviewed
May 16, 2023
d3c074d
to
8a285e3
Compare
Alexander, thanks a lot for this nice contribution! Thanks to reviewers too! I did most of the requested changes while merging.
Docs for: symfony/symfony#47709
TODO