
don't use attrs md5 when file was gpg-encrypted by us

Since 5fc2bbcc (2013-05-20 23:31:50 -0500), the md5 value we store in
the s3cmd-attrs header has been that of the plaintext file, not the
encrypted one. But after download we (incorrectly) compare it against
the md5 of the encrypted file. This patch fixes that by ignoring the
stored md5 when the object carries the x-amz-meta-s3tools-gpgenc header.

We started storing the md5 value in the s3cmd-attrs header in 1703df7009
(Fri Jun 15 23:43:00 2012). So it's likely been broken for a couple of
years, and we'll have to deal with it (check both before and after
decryption I suppose, in case it matches either). That'll be another patch.

With the new Content-MD5 branch too, calculating md5 before encrypting
is a really really bad idea - that's just broken design. Encryption
should be done before we calculate the MD5 of the thing being
uploaded. It was the filename swizzle that caught me off guard.
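
A minimal sketch of the ordering argued for above: encrypt first, then hash the bytes that will actually be sent. `prepare_upload` and the `encrypt` callable are hypothetical stand-ins for illustration; the real code path shells out to gpg and works on files, not in-memory bytes.

```python
import hashlib


def prepare_upload(plaintext, encrypt=None):
    """Return (body, md5_hex) where the md5 covers the bytes to upload.

    `encrypt` is a hypothetical callable taking and returning bytes;
    pass None to upload the data unencrypted.
    """
    body = encrypt(plaintext) if encrypt is not None else plaintext
    # Hash after encryption so the digest describes the uploaded object.
    return body, hashlib.md5(body).hexdigest()
```

With this ordering, the digest recorded at upload time and the digest of the downloaded object describe the same bytes, so no special case is needed for encrypted files.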

Matt Domsch authored on 2014/04/23 09:37:26
Showing 1 changed file
@@ -1143,10 +1143,14 @@ class S3(object):
                 response["md5"] = response["headers"]["etag"]
 
         md5_hash = response["headers"]["etag"]
-        try:
-            md5_hash = response["s3cmd-attrs"]["md5"]
-        except KeyError:
-            pass
+        if not 'x-amz-meta-s3tools-gpgenc' in response["headers"]:
+            # we can't trust our stored md5 because we
+            # encrypted the file after calculating it but before
+            # uploading it.
+            try:
+                md5_hash = response["s3cmd-attrs"]["md5"]
+            except KeyError:
+                pass
 
         response["md5match"] = md5_hash.find(response["md5"]) >= 0
         response["elapsed"] = timestamp_end - timestamp_start
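
The selection logic in this hunk can be sketched on its own as below, assuming `response` is a plain dict shaped like the one in the diff. `pick_expected_md5` and `GPG_HEADER` are hypothetical names, not part of s3cmd.

```python
GPG_HEADER = "x-amz-meta-s3tools-gpgenc"


def pick_expected_md5(response):
    """Choose the md5 string to compare the downloaded bytes against.

    Falls back to the ETag when the object was gpg-encrypted by us,
    because the stored s3cmd-attrs md5 then describes the plaintext.
    """
    expected = response["headers"]["etag"]
    if GPG_HEADER not in response["headers"]:
        try:
            expected = response["s3cmd-attrs"]["md5"]
        except KeyError:
            # No stored attrs (or no md5 in them): keep the ETag.
            pass
    return expected
```

The patch then tests `md5_hash.find(response["md5"]) >= 0`, a substring match rather than strict equality, which still succeeds if the ETag value is wrapped in quotes.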