Adolph
Engineer

Description of Content-MD5 in Relation to AliyunCLI for OSS Operations

Posted time: Oct 8, 2016 9:33 AM
Abstract: This article describes problems that may be encountered when uploading files with a Content-MD5 header while using the AliyunCLI tool for OSS operations.
Recently, while handling support requests, we found that some users run OSS operations through AliyunCLI, the highly integrated command-line tool. The OSS team strongly recommends OSS's own tools (such as osscmd), which reduce the cost of learning and maintaining tools. However, among users who purchase and use a variety of cloud products, Aliyun CLI seems to be the more popular choice. Of course, we are also willing to solve any technical problems these users run into.

The usage scenario is uploading a local file together with a Content-MD5 header (to check file integrity) through AliyunCLI. It can be seen from the above link that the requirement can be achieved through the following commands:



(Careful readers may notice a problem with the documented put command of AliyunCLI. Have you spotted it? Yes: the put command should be followed by a "localfile" argument.)


The command the user ran was:
aliyuncli oss Put localfile oss://user_bucket_name --header "Content-MD5:userfile_md5"

The result is that localfile is uploaded regardless of what “userfile_md5” is. Does this mean the OSS MD5 check has failed? Of course not. Careful readers may notice that the cause is the misspelled option “--header”, which should be “--headers”. AliyunCLI does not report an error for an unknown option; it silently ignores it. As a result, the MD5 value supplied here is never written into the OSS request header and never reaches OSS, so there is nothing for OSS to check. Pay attention to the spelling of this option.

Next, we will describe in detail how to generate the Content-MD5 value:
The official OSS documentation explains the definition of Content-MD5 in detail. Because an HTTP header cannot carry binary values, the digest must be converted to a string with Base64 encoding. The OSS server applies the same method to the message body and compares the result with the Content-MD5 value in the message header to check data validity.
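As an illustration of that server-side comparison, here is a minimal Python sketch. It is not the code OSS actually runs; the function names `content_md5` and `body_matches_header` are hypothetical and chosen only for this example.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64-encode the raw 16-byte MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_matches_header(body: bytes, header_value: str) -> bool:
    """Recompute the value from the body and compare it with the header.

    Hypothetical helper: a sketch of the comparison described above,
    not the real OSS implementation.
    """
    return content_md5(body) == header_value.strip()


if __name__ == "__main__":
    body = b"hello"
    header = content_md5(body)
    print(body_matches_header(body, header))         # intact body
    print(body_matches_header(body + b"!", header))  # altered body
```

A request whose body was altered in transit would fail this comparison, which is the integrity check the header exists for.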


It clearly explains that the Content-MD5 value is obtained in two steps: compute the 128-bit MD5 digest of the content, then Base64-encode that digest.

The most convenient approach is to use ready-made shell commands:



Do you know what is wrong with the Content-MD5 generated through MD5 and base64 as described above?


An MD5 checksum is a 128-bit binary number, which is 16 octets in memory. Base64 turns every 3 input octets into 4 output characters, so the length grows by about 33% (plus padding) and reaches 4*⌈16/3⌉ = 24 octets. How can the result in the figure above be 44 octets?!
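The length arithmetic can be checked with a few lines of Python. This is only a sketch of the padded Base64 length formula; the helper name `base64_length` is made up for this example.

```python
import math


def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


# A raw MD5 digest is 16 bytes long.
print(base64_length(16))

# Input sizes whose Base64 encoding is exactly 44 characters long.
print([n for n in range(1, 64) if base64_length(n) == 44])
```

Only inputs of 31 to 33 bytes encode to 44 characters, so whatever was fed to base64 was not a 16-byte digest.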


There is an issue here!


Let's look at the details of the algorithm first:
HTTP/1.1 (RFC 2616, Section 14.15) gives the syntax of the Content-MD5 entity header field:
Content-MD5   = "Content-MD5" ":" md5-digest
md5-digest   = <base64 of 128 bit MD5 digest as per RFC 1864>


That is, the checksum encoding follows RFC 1864: the output of the MD5 algorithm is 128 bits long. Read in network byte order (big-endian), it is a sequence of 16 bytes of binary data. Those 16 bytes are then encoded with Base64, and the result is the value of the “Content-MD5” field.
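The byte-order point can be made concrete with a short Python sketch. The sample body `b"hello"` is an arbitrary stand-in chosen for illustration; it does not come from the original example.

```python
import base64
import hashlib

data = b"hello"  # arbitrary sample body, assumed for illustration

h = hashlib.md5(data)
as_number = int(h.hexdigest(), 16)        # the digest seen as a 128-bit integer
as_bytes = as_number.to_bytes(16, "big")  # big-endian: most significant byte first

# The big-endian byte sequence is exactly what digest() returns.
assert as_bytes == h.digest()

header_value = base64.b64encode(as_bytes).decode("ascii")
print(header_value)
```

In practice there is no need to go through the integer at all: `digest()` already returns the 16 bytes in the right order.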


In the shell command example above, we first run the MD5 command on the contents of the pom.xml file and obtain a 128-bit binary number, which can be written as 0x8b85f3afd5f694f3432c9c49ac57d7df. Next, we encode with base64 and obtain the final result, OGI4NWYzYWZkNWY2OTRmMzQzMmM5YzQ5YWM1N2Q3ZGYK. So what is the issue?
The issue is that base64 encoded the 32-character hexadecimal string “8b85f3afd5f694f3432c9c49ac57d7df” (plus a trailing newline, 33 bytes in all) rather than the 16-byte value 0x8b85f3afd5f694f3432c9c49ac57d7df!
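The difference can be reproduced in Python. The hexadecimal digest below was recovered by Base64-decoding the 44-character string quoted above; the decoded text ends with a newline, which is why the encoded input is 33 bytes long. Treat this as a sketch for comparison, not as part of the original commands.

```python
import base64

# Recovered by decoding the 44-character string from the example.
hex_digest = "8b85f3afd5f694f3432c9c49ac57d7df"

# Wrong: encode the hexadecimal text, including its trailing newline.
wrong = base64.b64encode((hex_digest + "\n").encode("ascii")).decode("ascii")

# Right: turn the hex text back into 16 raw bytes first, then encode.
right = base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")

print(len(wrong), wrong)
print(len(right), right)
```

The first line printed is the 44-character value from the faulty pipeline; the second is the 24-character value that belongs in the Content-MD5 header.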


It should be emphasized that the encoding is applied to the 128-bit binary digest itself, not to its hexadecimal text. A script for calculating the Content-MD5 value of a file is provided below:
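#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Print the Content-MD5 value (Base64 of the raw MD5 digest) of a file."""

import base64
import hashlib
import sys


def md5file(fobj):
    """Return the Content-MD5 string for a file object opened in binary mode."""
    m = hashlib.md5()
    while True:
        # Read in chunks so large files are not loaded into memory at once.
        d = fobj.read(8096)
        if not d:
            break
        m.update(d)
    # digest() returns the 16 raw bytes; encoding hexdigest() would be wrong.
    return base64.b64encode(m.digest()).decode('ascii')


if __name__ == '__main__':
    fname = sys.argv[1]
    with open(fname, 'rb') as f:
        print(md5file(f))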