Is there a way to get the response iterator directly when downloading artifacts? #1955

Closed
TCatshoek opened this issue Mar 30, 2022 · 3 comments
@TCatshoek
Contributor

Description of the problem, including code/CLI snippet

Artifact downloads support a streamed mode that wraps the iterator provided by the requests library and lets the user pass an action callable, which is called with each chunk the iterator yields.

I would like to be able to access the response.iter_content() iterator directly.
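For context, the chunk-dispatch pattern described above can be sketched without a live GitLab connection; `dispatch_chunks` is a hypothetical stand-in for the loop in `utils.response_content`, and the list of byte strings stands in for `response.iter_content()`:

```python
import io

def dispatch_chunks(chunks, action):
    # Pass each chunk to the action callable, skipping empty keep-alive
    # chunks, as utils.response_content does.
    for chunk in chunks:
        if chunk:
            action(chunk)

buf = io.BytesIO()
dispatch_chunks([b"part1-", b"part2-", b"part3"], buf.write)
print(buf.getvalue())  # b'part1-part2-part3'
```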

Expected Behavior

My use case would be something like this:

```python
iter_response = project.artifacts.download(
    ref_name="main", job="build", streamed=True, action='iterator'
)

do_things_with_iterator(iter_response)
```

(In my real project, do_things_with_iterator is actually a FastAPI StreamingResponse, which takes an iterator or generator. This would allow streaming artifact downloads to the client without first downloading the entire file server-side.)

Actual Behavior

As far as I can tell this is currently not possible.

I thought this would be relatively simple to implement, so I adapted the response_content function in utils.py to:

```python
from typing import Any, Callable, Iterator, Literal, Optional, Union

import requests


def response_content(
    response: requests.Response,
    streamed: bool,
    action: Optional[Union[Callable, Literal["iterator"]]],
    chunk_size: int,
) -> Optional[Union[bytes, Iterator[Any]]]:
    if streamed is False:
        return response.content

    if action is None:
        action = _StdoutStream()  # existing helper in gitlab/utils.py

    if action == "iterator":
        # New branch: hand the raw chunk iterator back to the caller.
        return response.iter_content(chunk_size=chunk_size)

    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:
            action(chunk)
    return None
```
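As a sanity check, the `"iterator"` branch can be exercised with a stub in place of a real requests.Response; `FakeResponse` is a hypothetical test double, and this simplified `response_content` omits the `_StdoutStream` fallback:

```python
class FakeResponse:
    """Minimal stand-in for requests.Response for this demonstration."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.content = b"".join(chunks)

    def iter_content(self, chunk_size=None):
        return iter(self._chunks)


def response_content(response, streamed, action, chunk_size):
    # Simplified version of the proposed change (no _StdoutStream fallback).
    if streamed is False:
        return response.content
    if action == "iterator":
        return response.iter_content(chunk_size=chunk_size)
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:
            action(chunk)
    return None


resp = FakeResponse([b"a", b"b", b"c"])
it = response_content(resp, streamed=True, action="iterator", chunk_size=1024)
print(list(it))  # [b'a', b'b', b'c']
```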

However, I ran into issues adapting the type annotations elsewhere, since some functions are designed to handle only bytes, not iterators. I did not have much time to spend on figuring this out, so if it seems worth the effort I can look into it further and create a pull request later.

Specifications

  • python-gitlab version: 3.2.0
  • API version you are using (v3/v4): v4
  • GitLab server version (or gitlab.com): gitlab.com
@TCatshoek
Contributor Author

Created a pull request #1956

Please close if there's a better way to do this that I missed.

@TCatshoek
Copy link
Contributor Author

In the meantime I found a (slightly ugly) workaround to get the desired behavior with python-gitlab 3.2.0 and FastAPI. It needs a supporting class that python-gitlab can write chunks to from a separate thread, which then gets wrapped in a generator that FastAPI can consume.

```python
import threading


class BytesLoop:
    """A thread-safe byte buffer: one thread writes, another reads."""

    def __init__(self, s=b''):
        self.buffer = s
        self.lock = threading.Lock()

    def read(self, n=-1):
        with self.lock:
            chunk = self.buffer if n == -1 else self.buffer[:n]
            self.buffer = b'' if n == -1 else self.buffer[n:]
            return chunk

    def write(self, s):
        with self.lock:
            self.buffer += s

    def is_empty(self):
        with self.lock:
            return len(self.buffer) == 0
```

And then we can read and write in our route:

```python
from threading import Thread

from fastapi.responses import StreamingResponse

# `router` is a fastapi.APIRouter and `gl` an authenticated gitlab.Gitlab
# client, both created elsewhere in the application.


@router.get("/artifact/{id}")
async def get_artifact(id: int):
    project = gl.projects.get(id)

    buffer = BytesLoop()

    def fill_buffer():
        project.artifacts.download(
            ref_name="main", job="build", streamed=True,
            action=buffer.write, chunk_size=1024 * 1024,
        )

    fill_buffer_thread = Thread(target=fill_buffer)
    fill_buffer_thread.start()

    def iter_buffer():
        # Keep yielding until the download thread finishes and the
        # buffer has been drained.
        while fill_buffer_thread.is_alive() or not buffer.is_empty():
            yield buffer.read()

    return StreamingResponse(iter_buffer(), media_type="application/zip")
```

Please use with caution: it seems to work fine for me, but I have not tested it exhaustively, and performance could probably be improved.
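One performance note: the iter_buffer loop above busy-waits, yielding empty chunks whenever the buffer is drained faster than it fills. A bounded queue.Queue with a sentinel value would block instead of spin. This is an untested sketch of that alternative; make_queue_stream and fake_fill are hypothetical names, and the fake producer stands in for project.artifacts.download:

```python
import queue
import threading


def make_queue_stream(fill, maxsize=8):
    """Run `fill` in a background thread; return an iterator over its chunks.

    `fill` receives a write callable (here, q.put). A None sentinel marks EOF.
    """
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            fill(q.put)  # e.g. pass q.put as python-gitlab's `action` callable
        finally:
            q.put(None)  # sentinel: producer is done

    threading.Thread(target=worker, daemon=True).start()

    def iterator():
        while True:
            chunk = q.get()  # blocks instead of busy-waiting
            if chunk is None:
                break
            yield chunk

    return iterator()


# Demo with a fake producer in place of project.artifacts.download:
def fake_fill(write):
    for part in (b"zip-", b"bytes-", b"here"):
        write(part)


print(b"".join(make_queue_stream(fake_fill)))  # b'zip-bytes-here'
```

The bounded maxsize also gives backpressure: the download thread blocks once the consumer falls eight chunks behind, instead of buffering the whole artifact in memory.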

@nejch
Member

nejch commented Jun 26, 2022

Done with #1956.

@nejch nejch closed this as completed Jun 26, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 3, 2023