@Zadigo Zadigo commented Aug 1, 2025

Hi everyone, this PR proposes updating sync_s3 to use boto3, in response to issue #1876. The motivation is mostly technical: with the current code, calling sync_s3 breaks down under certain conditions. The boto3-based implementation allows:

  • More flexible configuration with Config objects
  • Better support for advanced features such as addressing styles
  • Use botocore.exceptions.ClientError for all AWS errors
  • Better error handling and retry logic
  • Session-based configuration management

Opening a connection to S3

One of the main changes is how the connection used to transfer files to the bucket is created. A new connection is now created via the Session class:

session = boto3.Session(**{
    'aws_access_key_id': self.AWS_S3_ACCESS_KEY_ID,
    'aws_secret_access_key': self.AWS_S3_SECRET_ACCESS_KEY,
    'region_name': self.AWS_S3_REGION_NAME
})

Files are then uploaded by calling upload_file on the client (note that boto3 expects capitalized keyword names):

client.upload_file(**{
    'Filename': str(fullpath),
    'Bucket': self.AWS_STORAGE_BUCKET_NAME,
    'Key': file_key,
    'ExtraArgs': extra_args
})

However, since we do not know in advance what kind of files are being uploaded, large files are processed in a memory-efficient way by taking advantage of multipart upload:

from boto3.s3.transfer import TransferConfig

if file_size > 100 * 1024 * 1024:  # 100MB threshold
    config = TransferConfig(
        multipart_threshold=25 * 1024 * 1024,  # 25MB
        max_concurrency=10,
        multipart_chunksize=25 * 1024 * 1024,  # 25MB
        use_threads=True
    )

    # Pass the TransferConfig explicitly, otherwise it is never applied
    client.upload_file(
        Filename=str(fullpath),
        Bucket=self.AWS_STORAGE_BUCKET_NAME,
        Key=file_key,
        ExtraArgs=extra_args,
        Config=config
    )
else:
    # Small files use the default transfer settings
    client.upload_file(
        Filename=str(fullpath),
        Bucket=self.AWS_STORAGE_BUCKET_NAME,
        Key=file_key,
        ExtraArgs=extra_args
    )
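To make the size arithmetic concrete, here is a minimal sketch of how a threshold and a chunk size translate into a number of parts. The helper name plan_upload and its defaults are assumptions for illustration only; they are not part of this PR or of boto3, which does the actual splitting internally.

```python
import math

MB = 1024 * 1024


def plan_upload(file_size, threshold=25 * MB, chunksize=25 * MB):
    """Return (use_multipart, part_count) for a file of file_size bytes.

    Hypothetical helper: this sketch assumes files at or above the
    threshold are split into chunks of at most chunksize bytes.
    """
    if file_size < threshold:
        # Small file: a single request is enough
        return False, 1
    # Large file: the last part may be smaller than chunksize
    return True, math.ceil(file_size / chunksize)


if __name__ == "__main__":
    for size in (10 * MB, 100 * MB, 101 * MB):
        print(size // MB, "MB ->", plan_upload(size))
```

Note that 1024 * 25 is only 25 KB, so the byte values need the extra 1024 factor for the "25MB" comment to hold.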

Overriding s3host option

The previous s3host option maps to the endpoint_url parameter that boto3 uses in its client/resource configuration. Overriding it, together with the addressing style, requires us to use the boto3.session.Config class:

if self.s3host:
    if not self.s3host.startswith(('http://', 'https://')):
        self.s3host = f'https://{self.s3host}'

    client_config['endpoint_url'] = self.s3host
    # Configure for S3-compatible services
    # that use path-style addressing
    client_config['config'] = boto3.session.Config(
        s3={
            'addressing_style': 'path'
        }
    )
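The same normalization can be sketched as a small pure function, which makes it easy to check without touching self.s3host or needing boto3 installed. The name build_client_config is hypothetical, and the addressing style is returned here as a plain dict rather than wrapped in boto3.session.Config as in the PR code.

```python
def build_client_config(s3host=None):
    """Return client keyword arguments for an optional custom S3 host.

    Hypothetical helper: returns plain data only, so the result can be
    inspected without creating a real client.
    """
    client_config = {}
    if s3host:
        # Default to HTTPS when the host is given without a scheme
        if not s3host.startswith(('http://', 'https://')):
            s3host = f'https://{s3host}'
        client_config['endpoint_url'] = s3host
        # Plain dict stand-in for boto3.session.Config(s3={...})
        client_config['s3'] = {'addressing_style': 'path'}
    return client_config


if __name__ == "__main__":
    print(build_client_config('minio.local:9000'))
    print(build_client_config('http://localhost:9000'))
    print(build_client_config())
```

Keeping the logic free of side effects means the user-supplied option value is left untouched.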

Variables and settings

The boto3 implementation also expects new settings variables, AWS_S3_ACCESS_KEY_ID and AWS_S3_SECRET_ACCESS_KEY (note the S3), which is a breaking change compared to the previous boto implementation. Two other settings were renamed:

  • AWS_BUCKET_NAME was renamed to AWS_STORAGE_BUCKET_NAME
  • FILTER_LIST was renamed to AWS_S3_UPLOAD_FILTER_LIST

The new list of settings would be: AWS_S3_REGION_NAME, AWS_S3_ACCESS_KEY_ID, AWS_S3_SECRET_ACCESS_KEY, AWS_STORAGE_BUCKET_NAME, AWS_CLOUDFRONT_DISTRIBUTION
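Since the renames are breaking, a project could detect settings that still use the old names before running the command. The following is only a sketch of that idea: find_legacy_settings is a hypothetical helper that is not part of this PR, and a SimpleNamespace stands in for django.conf.settings.

```python
from types import SimpleNamespace

# Old name -> new name, as listed in this PR description
RENAMED_SETTINGS = {
    'AWS_BUCKET_NAME': 'AWS_STORAGE_BUCKET_NAME',
    'FILTER_LIST': 'AWS_S3_UPLOAD_FILTER_LIST',
}


def find_legacy_settings(settings):
    """Return (old, new) pairs where only the old name is defined."""
    return [
        (old, new)
        for old, new in RENAMED_SETTINGS.items()
        if hasattr(settings, old) and not hasattr(settings, new)
    ]


if __name__ == "__main__":
    # Stand-in for django.conf.settings in a project not yet migrated
    legacy = SimpleNamespace(AWS_BUCKET_NAME='my-bucket')
    for old, new in find_legacy_settings(legacy):
        print(f'{old} should be renamed to {new}')
```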

Enhancements

User experience

The command now reports more clearly what is happening under the hood, which is also more consistent with the overall Django experience. This is achieved mainly by using the built-in self.stdout.write and self.style helpers for management commands:

self.stdout.write(
    self.style.SUCCESS(
        f"    + OK Uploaded {filename} to {file_key}"
    )
)

Dry Run Mode

A new --dry-run option lets the user preview the files that would be uploaded to S3 without actually uploading anything:

python manage.py sync_s3 --dry-run
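As a rough illustration of how such a flag can gate the upload step, here is a standalone sketch using argparse, which Django's command parser is built on. The function names, the log wording, and the upload callback are assumptions; the actual implementation in this PR is not shown here.

```python
import argparse


def build_parser():
    """Build a parser with a --dry-run flag (hypothetical stand-in
    for what a command's add_arguments method would register)."""
    parser = argparse.ArgumentParser(prog='sync_s3')
    parser.add_argument(
        '--dry-run', action='store_true', dest='dry_run',
        help='List the files that would be uploaded without uploading them.'
    )
    return parser


def sync(files, upload, dry_run=False):
    """Call upload(path) for each file unless dry_run is set.

    Returns the log lines so callers can print or inspect them.
    """
    lines = []
    for path in files:
        if dry_run:
            lines.append(f'[dry-run] would upload {path}')
        else:
            upload(path)
            lines.append(f'uploaded {path}')
    return lines


if __name__ == "__main__":
    options = build_parser().parse_args(['--dry-run'])
    uploaded = []
    for line in sync(['css/site.css', 'js/app.js'], uploaded.append,
                     dry_run=options.dry_run):
        print(line)
```

Passing the upload step in as a callback keeps the dry-run branch trivial to verify: in dry-run mode the callback is never invoked.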

I'd be delighted to discuss these changes 🙂

Next steps

These are some of the next steps I plan to work on:

  • Improve testing and test functions for the new functionalities
  • Check for compatibility issues with certain versions of Django

Zadigo added 30 commits April 14, 2025 18:09