-
Notifications
You must be signed in to change notification settings - Fork 5.3k
drivers: media: pisp_be: Fix for job queue removal in stop_streaming() #6999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The existing code unconditionally removes jobs from the job_queue list when all the nodes in a node group have stopped streaming. This will also remove jobs for any other node groups as the job_queue is a common list. Fix this by only conditionally deleting jobs that belong to the current node group. Additionally, delete jobs as soon as the first node in the node group stops streaming. Running a job with an incomplete set of active nodes is invalid. Fixes: 880153e ("media: pisp_be: Split jobs creation and scheduling") Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Is it possible to be in a situation where jobs are queued for this node-group, which don't reference the node that was stopped? (It seems unlikely.) |
I think that would be a malformed job and probably fail validation? |
Sadly, this does not seem to be enough to fix raspberrypi/rpicam-apps#849. |
pispbe_schedule() currently takes a node group as a parameter, which is left over from before the job prepare/scheduling refactoring. This is now invalid, as jobs are executed in the order which they were queued. As part of this old code, there was a check if the current node group mached the node group of the job, and if unmathched, use a "continue" statement. This is invalid as there is no loop to iterate over any more. The reason this was not a compile bug is because of the for loop used as part of the scoped_guard macro. A consequence of breaking out of the scoped_guard loop early is that the job structure gets freed, but not actually removed from the queue and may be accessed after freeing. Fix this by removing the node group test in pispbe_schedule() as it is no longer valid to use. Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
With 1506a5e, I'm now pretty confident that raspberrypi/rpicam-apps#849 ought to be resolved. |
@njhollinghurst this is ready to be reviewed and merged if you are happy with it. |
LGTM. I assume the checkpatch error is just an artifact of a shallow clone? |
Yes, it's one of the current limitations of the checkpatch job. Looks like we can squash that one by adding |
Hi Naush, thanks for fixing this. Indeed I introduced this when moving the patches from the non-multi-group mainline version to the multi-group BSP version of the driver. Sorry about that, I recall I tested with 2 cameras, but I don't recall seeing this (however, it should happen all the times one camera is stopped while the other is left running ?) 1506a5e is certainly good, I don't think there is anything to upport to mainline right ? I'm a bit less sure about 3e4f99e. Thanks for fixing this, and sorry if you caught me on the week I was on leave. |
Not to worry! This bug did take some time to trigger - and in very specific timing conditions. So most of out testing will not have caught it.
Correct, I checked and there is no need to mainline this change.
My feeling is that it's best if the test is removed entirely. Since we now rely on a single job queue across all users, we should unconditionally run the first job in the queue to maintain strict FIFO ordering. There is no reason to jump to the "current" user's job in such cases. Again, there is no need to upstream this as the logic is different, i.e. only handles a single user.
No problem at all - hope you had a good rest 😄 |
kernel: INA260 driver support 70fb84a109c639637f0636281dbdb21ed8ffb000 upstream See: raspberrypi/linux#7004 kernel: drivers: media: pisp_be: Fix for job queue removal in stop_streaming See: raspberrypi/linux#6999 kernel: Improve PIO DMA performance See: raspberrypi/linux#6994
kernel: INA260 driver support 70fb84a109c639637f0636281dbdb21ed8ffb000 upstream See: raspberrypi/linux#7004 kernel: drivers: media: pisp_be: Fix for job queue removal in stop_streaming See: raspberrypi/linux#6999 kernel: Improve PIO DMA performance See: raspberrypi/linux#6994
The existing code unconditionally removes jobs from the job_queue list when all the nodes in a node group have stopped streaming. This will also remove jobs for any other node groups as the job_queue is a common list. Fix this by only conditionally deleting jobs that belong to the current node group.
Additionally, delete jobs as soon as the first node in the node group stops streaming. Running a job with an incomplete set of active nodes is invalid.
Fixes: 880153e ("media: pisp_be: Split jobs creation and scheduling")