don't use attrs md5 when file was gpg-encrypted by us

Since 5fc2bbcc (2013-05-20 23:31:50 -0500), the md5 value we store in
the s3cmd-attrs header is that of the plaintext file, not the encrypted
one. But after download we were (incorrectly) comparing it against the
md5 of the still-encrypted file. This patch ignores the stored md5 when
the object carries the x-amz-meta-s3tools-gpgenc header.

We started storing the md5 value in the s3cmd-attrs header in 1703df7009
(Fri Jun 15 23:43:00 2012). So it's likely been broken for a couple of
years, and we'll have to deal with it (check both before and after
decryption, I suppose, in case it matches either). That'll be another patch.
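
The "check both" fallback mentioned above might look roughly like the following. This is only a sketch, not the follow-up patch: `md5_of_file` and `stored_md5_matches` are hypothetical helpers (not s3cmd functions), and the two paths are assumed to be the downloaded ciphertext and the result of decrypting it.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stored_md5_matches(stored_md5, encrypted_path, decrypted_path):
    """Accept the stored md5 if it matches either form of the file.

    Assumption: older uploads may have recorded the md5 of either the
    plaintext or the ciphertext, so both are tried.
    """
    candidates = (md5_of_file(encrypted_path), md5_of_file(decrypted_path))
    return stored_md5 in candidates
```

The cost is hashing the file twice, which only needs to happen for objects that carry the encryption marker.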

With the new Content-MD5 branch, too, calculating the md5 before encrypting
is a really, really bad idea; that's just broken design. Encryption
should be done before we calculate the MD5 of the thing being
uploaded. It was the filename swizzle that caught me off guard.
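
To illustrate the ordering argued for above, here is a minimal sketch that hashes the bytes that will actually go over the wire. `prepare_upload` and `fake_encrypt` are hypothetical names; s3cmd itself hands encryption to an external gpg command, and the XOR stand-in below is not real cryptography.

```python
import hashlib


def fake_encrypt(data):
    """Stand-in for gpg, for illustration only (a trivial XOR, not secure)."""
    return bytes(b ^ 0x5A for b in data)


def prepare_upload(plaintext, encrypt=None):
    """Return (payload, md5_hex), where md5_hex describes the payload.

    Encryption happens first, so the checksum always refers to the bytes
    that are uploaded and later downloaded.
    """
    payload = encrypt(plaintext) if encrypt is not None else plaintext
    return payload, hashlib.md5(payload).hexdigest()
```

With this ordering, a checksum taken of the downloaded (still encrypted) file can be compared directly against the stored value.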

Matt Domsch authored on 2014/04/23 09:37:26
Showing 1 changed file

S3/S3.py @ 6d13879

@@ -1143,10 +1143,14 @@ class S3(object):
                 response["md5"] = response["headers"]["etag"]
         md5_hash = response["headers"]["etag"]
-        try:
-            md5_hash = response["s3cmd-attrs"]["md5"]
-        except KeyError:
-            pass
+        if not 'x-amz-meta-s3tools-gpgenc' in response["headers"]:
+            # we can't trust our stored md5 because we
+            # encrypted the file after calculating it but before
+            # uploading it.
+            try:
+                md5_hash = response["s3cmd-attrs"]["md5"]
+            except KeyError:
+                pass
         response["md5match"] = md5_hash.find(response["md5"]) >= 0
         response["elapsed"] = timestamp_end - timestamp_start
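
The selection logic in this hunk can be tried out in isolation with a sketch like the one below. The function names are hypothetical, and the two dict arguments are assumed to have the same shape as `response["headers"]` and `response["s3cmd-attrs"]` in the diff.

```python
def expected_md5(headers, attrs):
    """Pick the checksum to compare against the downloaded file.

    Falls back to the ETag, and prefers the md5 stored in s3cmd-attrs only
    when the object was not gpg-encrypted by us.
    """
    md5_hash = headers["etag"]
    if "x-amz-meta-s3tools-gpgenc" not in headers:
        # Same effect as the try/except KeyError in the patch.
        md5_hash = attrs.get("md5", md5_hash)
    return md5_hash


def md5_matches(headers, attrs, local_md5):
    """Substring test, as in the patch, since the ETag may be wrapped in quotes."""
    return expected_md5(headers, attrs).find(local_md5) >= 0
```

For an object with the `x-amz-meta-s3tools-gpgenc` header, the stored plaintext md5 is ignored and the comparison runs against the ETag instead.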