Methods to improve efficiency for large repositories - Alibaba Cloud DevOps

If your code repositories are large or your network connection is unstable, pushing and pulling can be slow and may even time out. Codeup supports Git Large File Storage (LFS), partial clone, and shallow clone to mitigate these issues. These features reduce local disk usage, shorten repository operations, and make large repositories easier to manage. This topic describes how to manage large repositories with them.

Git LFS

Scenarios

Git LFS is highly recommended for code repositories that have the following characteristic:

Rapid repository growth caused by non-text files committed to the repository

In some scenarios, you must commit non-text files such as art resources, algorithm models, and compilation artifacts to your code repository. Binary files are usually large and compress poorly, and Git stores every historical version of each file, so the repository grows quickly. When you clone the repository, all historical versions of all files are downloaded, which makes cloning slow.

Git LFS lets you manage files with specific extensions separately from regular Git objects. A clone downloads only the versions of those files that are checked out instead of their full history, which significantly reduces the amount of data transferred and the time required for cloning.
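As a rough sketch of the setup, the Python script below builds the standard Git LFS commands for tracking binary files by extension and shows the rule that git lfs track writes to .gitattributes. The patterns in BINARY_PATTERNS and the helper names are hypothetical examples, and the script only prints the commands (a dry run); it assumes git and git-lfs are installed if you choose to execute them.

```python
import shlex

# Hypothetical example patterns; replace them with the binary types in your repository.
BINARY_PATTERNS = ["*.psd", "*.onnx", "*.zip"]


def lfs_track_commands(patterns):
    """Return the git commands that enable Git LFS and track the given patterns."""
    # Sets up the Git LFS filters; git-lfs documents this as a once-per-user step.
    commands = [["git", "lfs", "install"]]
    for pattern in patterns:
        # Each call appends a matching rule to .gitattributes.
        commands.append(["git", "lfs", "track", pattern])
    # Stage .gitattributes so that every collaborator uses the same rules.
    commands.append(["git", "add", ".gitattributes"])
    return commands


def gitattributes_rule(pattern):
    """Return the .gitattributes rule that `git lfs track` writes for a pattern."""
    return f"{pattern} filter=lfs diff=lfs merge=lfs -text"


if __name__ == "__main__":
    # Dry run: print the commands. To execute them, pass each list to
    # subprocess.run(cmd, check=True) inside a Git repository.
    for cmd in lfs_track_commands(BINARY_PATTERNS):
        print(shlex.join(cmd))
    print(gitattributes_rule("*.psd"))
```

Tracking rules only affect files committed after the rules are in place; files already in history stay as regular Git objects unless history is rewritten.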

Usage notes

For more information about how to use Git LFS, see the following topics:

Shallow clone scenarios: application building

Application building typically requires only the latest code, so downloading the entire history wastes time. Shallow clone truncates the repository history and downloads only the commits within a specified depth, which is enough to build the latest version of an application and significantly reduces pull time. For more information about how to use shallow clone, see the git clone documentation.
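For a build pipeline, a shallow clone usually comes down to one git clone call with --depth. The sketch below builds that command, plus an optional git fetch --deepen call for the rare build step that needs more history. The repository URL, destination directory, and helper names are hypothetical placeholders.

```python
import shlex

# Hypothetical repository URL used only for illustration.
REPO_URL = "https://codeup.example.com/demo-org/demo-app.git"


def shallow_clone_command(url, dest, depth=1, branch=None):
    """Build a git clone command that downloads only the latest `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # --depth implies --single-branch, so only one branch's history is fetched.
    cmd = ["git", "clone", f"--depth={depth}"]
    if branch:
        # --branch also accepts a tag name, which is convenient for release builds.
        cmd += ["--branch", branch]
    return cmd + [url, dest]


def deepen_command(dest, extra_commits):
    """Fetch additional commits later if a build step turns out to need them."""
    return ["git", "-C", dest, "fetch", f"--deepen={extra_commits}"]


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(shlex.join(shallow_clone_command(REPO_URL, "build-src", depth=1, branch="main")))
```

If a job later needs the complete history, git fetch --unshallow converts the shallow clone into a full one.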

Partial clone scenarios

Partial clone lets you clone only part of a repository's objects. Unlike shallow clone, partial clone retains the full commit history, and missing objects are downloaded automatically when they are needed. You can use partial clone in the following scenarios:

Large repository size

Even if no binary files are committed, a repository grows steadily as developers keep committing, and cloning takes longer and longer. Developers usually work with the latest code and only occasionally need historical data. Partial clone lets you skip historical file contents during cloning to speed up the process, and downloads that data later if you need it.
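A common form of partial clone is a "blobless" clone: all commits and directory trees are downloaded, but file contents are fetched only when Git needs them. The sketch below builds that command with git's --filter option; the URL and helper names are hypothetical, and it assumes the server side supports partial clone filters.

```python
import shlex

# Hypothetical repository URL used only for illustration.
REPO_URL = "https://codeup.example.com/demo-org/big-repo.git"


def partial_clone_command(url, dest, blob_size_limit=None):
    """Build a partial clone command that keeps full history but defers file contents.

    blob:none downloads all commits and trees but no file contents (blobs);
    Git then fetches the blobs needed for the checked-out commit, and any
    others later, on demand. blob:limit=<size> skips only blobs larger than
    the given size, for example "1m".
    """
    if blob_size_limit is None:
        spec = "blob:none"
    else:
        spec = f"blob:limit={blob_size_limit}"
    return ["git", "clone", f"--filter={spec}", url, dest]


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(shlex.join(partial_clone_command(REPO_URL, "big-repo")))
```

Because commits and trees are all present, commands such as git log still see the full history; checking out an old commit triggers a fetch of the missing blobs, so it needs network access to the remote.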

Monorepo for microservices

In monorepo mode, all code is stored in a single large repository. For example, Android development uses this mode. By combining partial clone with sparse checkout, team members can download only the code they need instead of the entire repository each time.
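One way to combine the two features is to clone with a blob filter and --sparse, then select the directories you work on with git sparse-checkout set. The sketch below builds those two commands. The URL, directory names, and helper names are hypothetical, and it assumes a Git version that supports git clone --sparse (added in Git 2.25).

```python
import shlex

# Hypothetical monorepo URL and service directories used only for illustration.
REPO_URL = "https://codeup.example.com/demo-org/monorepo.git"
MY_DIRECTORIES = ["services/payment", "libs/common"]


def sparse_monorepo_commands(url, dest, directories):
    """Clone a monorepo with partial clone and check out only selected directories."""
    if not directories:
        raise ValueError("specify at least one directory to check out")
    return [
        # --filter=blob:none defers file contents; --sparse starts with only
        # the files in the repository root checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Add the directories you work on; Git downloads only their blobs.
        ["git", "-C", dest, "sparse-checkout", "set", *directories],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in sparse_monorepo_commands(REPO_URL, "monorepo", MY_DIRECTORIES):
        print(shlex.join(cmd))
```

Running git sparse-checkout set again with a different list changes the working set later; files outside it stay in history but are not written to disk.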

For more information about partial clone, see Introduction to partial clone.

Summary

The following features are not limited to the preceding scenarios and can be combined flexibly based on your business requirements:

Git LFS is suitable for scenarios where binary files of various types make the repository grow rapidly. If your code repository exceeds the repository capacity limit, you must move binary files to Git LFS.
Shallow clone is suitable for scenarios where you need only the commits within a specified depth rather than the full history, especially to speed up builds.
Partial clone retains the full repository history and, combined with sparse checkout, lets you download objects by type or by directory. Missing objects are downloaded automatically when they are needed.
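For binaries that are already in history, git lfs track alone does not shrink the repository; the git-lfs tool provides git lfs migrate for that purpose. The sketch below builds a typical sequence: inspect which file types use the most space, rewrite history so matching files become LFS pointers, then force-push. The patterns and helper names are hypothetical, and this is a generic git-lfs outline rather than a Codeup-specific procedure. Rewriting history changes commit IDs, so coordinate with your team first; branch protection rules may block the force push, and collaborators will need to re-clone.

```python
import shlex

# Hypothetical patterns for binaries committed before Git LFS was set up.
LEGACY_BINARY_PATTERNS = ["*.psd", "*.zip"]


def lfs_migrate_commands(patterns):
    """Build commands that move existing binaries in history into Git LFS."""
    include = ",".join(patterns)
    return [
        # Report which file types take up the most space.
        ["git", "lfs", "migrate", "info", "--everything"],
        # Rewrite history on all refs so matching files become LFS pointers.
        ["git", "lfs", "migrate", "import", f"--include={include}", "--everything"],
        # History was rewritten, so pushing requires --force and replaces
        # the remote branches and tags.
        ["git", "push", "--force", "--all"],
        ["git", "push", "--force", "--tags"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands; review them before executing anything.
    for cmd in lfs_migrate_commands(LEGACY_BINARY_PATTERNS):
        print(shlex.join(cmd))
```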