Implement large file chunked upload + resumable upload yourself



PM: Hey, front-end guy, I have a 100 GB video to upload. Build me an upload page and get it to me before you leave work today, thanks.
Me: ...
I believe every front-end engineer has dealt with file upload requirements. For ordinary small files, we simply use an input of type file, construct a new FormData() object, and send it to the backend. With a UI library such as Ant Design or Element UI it is even simpler: just call the API directly. For more complex cases there are also many excellent third-party plugins on the market, such as WebUploader. But as aspiring engineers, how can we be satisfied with merely using plugins? Today we will implement one ourselves.
First, let's analyze the requirements
The upload component needs to support the following features:

Validate the file format
Accept any file, including very large video files (chunked upload)
If the network drops mid-upload, resume after reconnecting (resumable upload)
Show a progress bar
If the same file has been uploaded before, complete immediately (instant upload)

Division of labor between front end and back end:

Front end:

Check the file format
Slice the file and compute its MD5 hash
Send a check request with the file's hash to the server, to see whether a file with the same hash already exists
Calculate the upload progress
Notify the backend to merge the slices once the upload completes

Back end:

Check whether a file with the received hash already exists, and tell the front end if that hash has an unfinished upload
Receive the slices
Merge all slices

The architecture diagram is as follows

Now let's get into the concrete implementation.
1. Format verification
For uploaded files, we generally need to validate the format. A first pass only needs the file's suffix (extension) to decide whether it meets our upload restrictions:
// file path
var filePath = "file://upload/test.png";
// position of the last "."
var index = filePath.lastIndexOf(".");
// extract the suffix
var ext = filePath.substr(index + 1);
// print the result
console.log(ext);
// output: png
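
To actually enforce the restriction, compare the extracted suffix against a whitelist. A minimal sketch (the allowed list here is only an example):

// compare the suffix against an example whitelist, case-insensitively
var allowedExts = ["png", "jpg", "jpeg", "mp4"];
if (allowedExts.indexOf(ext.toLowerCase()) === -1) {
  console.warn("Unsupported file type: " + ext);
}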

However, this method has a drawback: the file suffix can be tampered with. For example, test.mp4 can simply be renamed test.mp4 -> test.png, bypassing the restriction and getting uploaded anyway. Is there a stricter check? Of course there is.
The idea is to identify the real file type by inspecting the file's binary data. A computer does not actually recognize a file type by its suffix; it distinguishes files by their "magic number". For many file types, the content of the first few bytes is fixed, so the type can be determined from those bytes. With a hexadecimal editor, you can view a file's binary data. Let's take test.png as an example:

Viewed in a hex editor, the first 8 bytes of a PNG image are 0x89 50 4E 47 0D 0A 1A 0A. Based on this, we can check the file format accordingly; a sketch follows.
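
Below is a minimal sketch of such a check, assuming a File object obtained from an input of type file; the helper name isPng is illustrative, not from any library:

// read the first 8 bytes of the file and compare them with the PNG
// signature shown above (the helper name is illustrative)
async function isPng(file) {
  const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
  const buffer = await file.slice(0, 8).arrayBuffer()
  const bytes = new Uint8Array(buffer)
  return PNG_SIGNATURE.every((byte, i) => bytes[i] === byte)
}

// usage inside a file-input change handler:
// const valid = await isPng(event.target.files[0])
// if (!valid) console.warn('Not a real PNG file')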
That is how to verify the real file type. For other types, such as mp4 or xls, you can likewise use tools to inspect their binary data and validate the format if you are interested.
Here are the binary signatures of some common file types:
1. JPEG/JPG - file header (2 bytes): FF D8; file trailer (2 bytes): FF D9
2. TGA - uncompressed, first 5 bytes: 00 00 02 00 00; RLE compressed, first 5 bytes: 00 00 10 00 00
3. PNG - file header (8 bytes): 89 50 4E 47 0D 0A 1A 0A
4. GIF - file header (6 bytes): 47 49 46 38 39(37) 61
5. BMP - file header (2 bytes): 42 4D ("BM")
6. PCX - file header (1 byte): 0A
7. TIFF - file header (2 bytes): 4D 4D or 49 49
8. ICO - file header (8 bytes): 00 00 01 00 01 00 20 20
9. CUR - file header (8 bytes): 00 00 02 00 01 00 20 20
10. IFF - file header (4 bytes): 46 4F 52 4D
11. ANI - file header (4 bytes): 52 49 46 46

2. File Slicing
Suppose we want to split a 1 GB video into 1 MB slices. Define DefaultChunkSize = 1 * 1024 * 1024 and use spark-md5 to compute the hash of the file content. How do we split the file? With the file object's File.prototype.slice method.
Note that slicing a larger file, say 10 GB, into 1 MB pieces produces about 10,000 slices. As we all know, JavaScript runs on a single-threaded model; if this computation happened on the main thread, the page would inevitably freeze. This is where the Web Worker comes on stage.
A Web Worker gives JavaScript a multi-threaded environment: the main thread can create Worker threads and hand some tasks off to them. While the main thread runs, the Worker thread runs in the background, and the two do not interfere with each other. If you are not familiar with Web Workers, they are worth learning about on your own; we will not go into detail here.
Here is some of the key code:
// upload.js

// create a Worker instance
const worker = new Worker('worker.js')
// send the file object and chunk size to the worker thread,
// kicking off the slicing and hash calculation
worker.postMessage({ file, chunkSize: DefaultChunkSize })

// when the worker finishes, it posts the slices and hash back to the main thread
worker.onmessage = (e) => {
  const { fileHash, chunkList } = e.data
  // ...
}

Worker thread code:
// worker.js

// spark-md5 must be available inside the worker; here we assume the
// library file sits next to worker.js
importScripts('spark-md5.min.js');

// receive the file object and chunk size from the main thread
onmessage = (e) => {
  const { file, chunkSize } = e.data;
  const blobSlice = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
  const chunks = Math.ceil(file.size / chunkSize);
  const chunkList = [];
  let currentChunk = 0;
  const spark = new SparkMD5.ArrayBuffer();
  const fileReader = new FileReader();

  fileReader.onload = function (e) {
    console.log('read chunk nr', currentChunk + 1, 'of', chunks);

    spark.append(e.target.result);
    currentChunk++;

    if (currentChunk < chunks) {
      loadNext();
    } else {
      const fileHash = spark.end();
      console.info('finished computing hash', fileHash);
      // the key point: once the calculation is done, notify the main
      // thread via postMessage, sending back the slices and the hash
      postMessage({ fileHash, chunkList });
    }
  };

  fileReader.onerror = function () {
    console.warn('oops, something went wrong.');
  };

  function loadNext() {
    const start = currentChunk * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    const chunk = blobSlice.call(file, start, end);
    chunkList.push(chunk);
    fileReader.readAsArrayBuffer(chunk);
  }

  loadNext();
};

With the worker thread above, we get the computed slices and the file's MD5 hash.
3. Resumable upload + instant upload + upload progress
With the slices and the MD5 hash in hand, we first ask the server whether the file already exists; a sketch of this check request follows the list below.

If it exists and the upload already completed, the server simply responds that the upload succeeded, which gives us "instant upload".
If it exists but some slices failed to upload, the server returns the names of the slices that were uploaded successfully. The front end uses that list to work out which slices remain and uploads only those, which gives us "resumable upload".
If it does not exist, start uploading from scratch. Note that when uploading slices concurrently, the concurrency must be capped; firing too many requests at once can overwhelm the browser (a simple pool is sketched after the upload code below).
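
The check request itself might look like the sketch below; the /check endpoint and the shape of its response are assumptions for illustration, not a fixed API:

// ask the server what it already has for this file hash
// (endpoint name and response shape are assumed for illustration)
async function checkFileExists(fileMd5Value) {
  const { data } = await axios.get(BaseUrl + "/check", {
    params: { fileMd5Value }
  })
  // data.file === true  -> the whole file is already there ("instant upload")
  // data.chunkList      -> chunks already uploaded, used for resuming
  return data
}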

// upload the chunks (run after the check request; filtering of
// already-uploaded chunks is omitted here for brevity)
async function checkAndUploadChunk(chunkList, fileMd5Value) {
  const requestList = []
  // queue an upload request for each chunk
  for (let i = 0; i < chunkList.length; i++) {
    requestList.push(upload({ chunkList, chunk: chunkList[i], fileMd5Value, i }))
  }

  // upload concurrently
  if (requestList?.length) {
    await Promise.all(requestList)
  }
}

// upload a single chunk
let current = 0 // chunks uploaded so far; shared across upload() calls
function upload({ chunkList, chunk, fileMd5Value, i }) {
  let form = new FormData()
  form.append("data", chunk) // the slice itself (a Blob)
  form.append("total", chunkList.length) // total number of chunks
  form.append("index", i) // index of the current chunk
  form.append("fileMd5Value", fileMd5Value)
  return axios({
    method: 'post',
    url: BaseUrl + "/upload",
    data: form
  }).then(({ data }) => {
    if (data.stat) {
      current = current + 1
      // compute the upload progress
      const uploadPercent = Math.ceil((current / chunkList.length) * 100)
    }
  })
}
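
As noted earlier, Promise.all fires every request at once. A simple way to cap the concurrency is a small task pool; the sketch below is one way to do it (the default limit of 4 is arbitrary):

// run at most `limit` upload tasks at a time; each task is a function
// that starts an upload when invoked
async function runWithConcurrencyLimit(taskFactories, limit = 4) {
  let index = 0
  async function runner() {
    while (index < taskFactories.length) {
      const current = index++
      await taskFactories[current]()
    }
  }
  const runners = Array.from(
    { length: Math.min(limit, taskFactories.length) },
    runner
  )
  await Promise.all(runners)
}

// usage, replacing the plain Promise.all above:
// await runWithConcurrencyLimit(
//   chunkList.map((chunk, i) => () => upload({ chunkList, chunk, fileMd5Value, i }))
// )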

After all slices are uploaded, send an upload-complete request to the backend, i.e. notify it to merge all the slices, which finishes the whole upload flow.
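
The completion request can be as simple as the sketch below; the /merge endpoint and its parameters are assumptions for illustration:

// front end: tell the server all chunks are up and it can merge them now
function notifyMerge(fileMd5Value, fileName) {
  return axios.post(BaseUrl + "/merge", { fileMd5Value, fileName })
}

On the server side, merging can be done by concatenating the chunk files in index order, for example (a Node.js sketch; the chunk naming and directory layout are assumed):

// server side (Node.js sketch): append chunks to the target file in order
const fs = require('fs')
const path = require('path')

function mergeChunks(chunkDir, targetPath, total) {
  const writeStream = fs.createWriteStream(targetPath)
  for (let i = 0; i < total; i++) {
    const chunkPath = path.join(chunkDir, String(i)) // chunks named by index
    writeStream.write(fs.readFileSync(chunkPath))
    fs.unlinkSync(chunkPath) // remove the chunk once appended
  }
  writeStream.end()
}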
