Metadata on copy should now be consistent for multipart and non-multipart copies, on AWS and on other implementations

Note: a short memo on why this is a nightmare for performance:
** For AWS, 2 cases:
- COPY copies the metadata of the source to the destination, but you
can't modify it. Any additional metadata header is ignored.
- REPLACE sets the additional metadata headers that are provided
but does not copy any of the source headers.
So, to add to the existing metadata during a copy, you have to do an
object_info to get the original source headers, modify them, then use
REPLACE for the copy operation.

** For Minio and maybe other implementations:
- If additional headers are sent, they are set on the destination
on top of the source's original metadata, with both COPY and
REPLACE.
This is nice behavior, except that it differs from the AWS one.

As if that were not enough, there is another catch:
in all cases, for multipart copies, metadata is never copied
from the source.

All of this should now be handled automatically, at the cost of an
extra object_info request in the worst cases.

Florent Viard authored on 2020/04/18 10:49:49
Showing 1 changed files
@@ -822,6 +822,27 @@ class S3(object):
 
     def object_copy(self, src_uri, dst_uri, extra_headers=None,
                     src_size=None, extra_label="", replace_meta=False):
+        """Remote copy an object and eventually set metadata
+
+        Note: A little memo description of the nightmare for performance here:
+        ** FOR AWS, 2 cases:
+        - COPY will copy the metadata of the source to dest, but you can't
+        modify them. Any additional header will be ignored anyway.
+        - REPLACE will set the additional metadata headers that are provided
+        but will not copy any of the source headers.
+        So, to add to existing meta during copy, you have to do an object_info
+        to get original source headers, then modify, then use REPLACE for the
+        copy operation.
+
+        ** For Minio and maybe other implementations:
+        - if additional headers are sent, they will be set to the destination
+        on top of source original meta in all cases COPY and REPLACE.
+        It is a nice behavior except that it is different of the aws one.
+
+        As it was still too easy, there is another catch:
+        In all cases, for multipart copies, metadata data are never copied
+        from the source.
+        """
         if src_uri.type != "s3":
             raise ValueError("Expected URI type 's3', got '%s'" % src_uri.type)
         if dst_uri.type != "s3":
@@ -837,8 +858,12 @@ class S3(object):
                 acl = None
 
         multipart = False
-
         headers = None
+
+        if extra_headers or self.config.mime_type:
+            # Force replace, that will force getting meta with object_info()
+            replace_meta = True
+
         if replace_meta:
             src_info = self.object_info(src_uri)
             headers = src_info['headers']
@@ -865,9 +890,8 @@ class S3(object):
                 threshold = self.config.multipart_copy_chunk_size_mb * SIZE_1MB
 
             if src_size > threshold:
-                # Sadly, s3 is badly done as metadata will not be copied in
-                # multipart copy unlike what is done in the case of direct
-                # copy.
+                # Sadly, s3 has a bad logic as metadata will not be copied for
+                # multipart copy unlike what is done for direct copies.
                 # TODO: Optimize by re-using the object_info request done
                 # earlier earlier at fetch remote stage, and preserve headers.
                 if src_headers is None:
@@ -883,6 +907,7 @@ class S3(object):
         else:
             headers = SortedDict(ignore_case=True)
 
+        # Following meta data are updated even in COPY by aws
         if self.config.acl_public:
             headers["x-amz-acl"] = "public-read"
 
@@ -898,6 +923,7 @@ class S3(object):
             headers['x-amz-server-side-encryption-aws-kms-key-id'] = \
                 self.config.kms_key
 
+        # Following meta data are not updated in simple COPY by aws.
         if extra_headers:
             headers.update(extra_headers)
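
The metadata rules described in the commit message can be sketched as a small standalone helper. This is only an illustration under stated assumptions, not the code from this commit: the function names `needs_source_lookup` and `build_copy_headers` are hypothetical, plain dicts stand in for the project's header objects, and keeping only `x-amz-meta-*` headers from the source is a simplification chosen for the example. The `x-amz-metadata-directive` header with its `COPY` and `REPLACE` values is the one defined by the S3 copy API.

```python
def needs_source_lookup(extra_headers, multipart):
    """Return True when the source headers must be fetched before copying.

    On AWS, adding headers requires REPLACE, which drops the source metadata,
    and multipart copies never inherit metadata from the source at all.
    """
    return bool(extra_headers) or multipart


def build_copy_headers(source_headers, extra_headers=None, multipart=False):
    """Build the headers to send with a server-side copy (hypothetical helper).

    source_headers: headers from a prior lookup of the source object.
    extra_headers:  headers the caller wants to add or override.
    """
    extra = {k.lower(): v for k, v in (extra_headers or {}).items()}

    if not needs_source_lookup(extra, multipart):
        # Plain copy: the server carries the source metadata over by itself.
        return {"x-amz-metadata-directive": "COPY"}

    # Assumption for this sketch: only user metadata is carried over.
    headers = {k.lower(): v for k, v in source_headers.items()
               if k.lower().startswith("x-amz-meta-")}
    # Caller-provided headers win over the source values.
    headers.update(extra)

    if not multipart:
        # Single-request copy: ask the server to use exactly these headers.
        headers["x-amz-metadata-directive"] = "REPLACE"
    return headers


if __name__ == "__main__":
    src = {"x-amz-meta-owner": "alice", "ETag": '"abc"'}
    print(build_copy_headers(src))
    print(build_copy_headers(src, {"X-Amz-Meta-Project": "demo"}))
    print(build_copy_headers(src, multipart=True))
```

Sending the merged set with `REPLACE` should give the same result on AWS and on servers that already merge headers themselves, which is the consistency this commit aims for; the cost is the extra lookup of the source object.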