To build a knowledge base, you must first upload your documents from local storage or Object Storage Service (OSS) to Model Studio. This topic describes how to upload documents using the API.
RAM users: A RAM user must first be granted the AliyunBailianDataFullAccess system policy and join a workspace. This is not required if you use an Alibaba Cloud account.
If you are not familiar with RAM users, see Permissions.
The API does not support structured data tables: Use Application Data in the console to create structured data tables and upload data. You can also associate your knowledge base with ApsaraDB RDS to keep it updated automatically. For more information, see Step 2: Create a knowledge base.
Maximum upload limit: You can upload up to 100,000 documents in each workspace.
We recommend that you use the latest version of GenAI Service Platform SDK to call the interfaces in this topic.
Procedure
Follow the steps below to upload a file to Model Studio:
1. Request upload lease
Use the SDK to call ApplyFileUploadLease, which provides the URL (lease) for upload and the required parameters.
About online debugging and sample code
You can debug online and generate multi-language code samples in real time.
The FileName field must match the document name
The FileName in the request parameters of ApplyFileUploadLease must match the actual document name, including the suffix. Otherwise, an error occurs.
About the MD5 field value
The Md5 in the request parameters of ApplyFileUploadLease is the MD5 hash of your document. It is used for integrity verification. You can use hashlib in Python or MessageDigest in Java to calculate this value; other languages provide similar methods.
The MD5 value is a 32-character hexadecimal string. For example:
The MD5 value of the document is: c0c07fb456057128540a91ea9b06666c
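As a minimal sketch of the hashlib approach mentioned above, the following Python function computes the hexadecimal MD5 digest of a local document. The function name file_md5 and the chunk_size parameter are illustrative choices, not part of the Model Studio SDK.

```python
import hashlib


def file_md5(path, chunk_size=8192):
    """Return the hexadecimal MD5 digest of the file at `path`.

    `file_md5` and `chunk_size` are illustrative names, not SDK identifiers.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large documents are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be passed as the Md5 request parameter of ApplyFileUploadLease.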
Sample response for a successful ApplyFileUploadLease call
Keep the values of Data.FileUploadLeaseId, Data.Param.Method, Data.Param.Url, as well as X-bailian-extra and Content-Type of Data.Param.Headers from the response for later use.
Data.Param.Url is the lease. It is valid for a few minutes. Upload the document promptly.
To update a document in the knowledge base, call ApplyFileUploadLease again to obtain a new set of request parameters. Then follow the remaining steps to upload the updated document and import it into the knowledge base.
{
  "RequestId": "778C0B3B-59C2-5FC1-A947-36EDD1xxxxxx",
  "Success": true,
  "Message": "",
  "Code": "success",
  "Status": "200",
  "Data": {
    "FileUploadLeaseId": "1e6a159107384782be5e45ac4759b247.1719325231035",
    "Type": "HTTP",
    "Param": {
      "Method": "PUT",
      "Url": "https://bailian-datahub-data-origin-prod.oss-cn-hangzhou.aliyuncs.com/1005426495169178/10024405/68abd1dea7b6404d8f7d7b9f7fbd332d.1716698936847.pdf?Expires=1716699536&OSSAccessKeyId=TestID&Signature=HfwPUZo4pR6DatSDym0zFKVh9Wg%3D",
      "Headers": " \"X-bailian-extra\": \"MTAwNTQyNjQ5NTE2OTE3OA==\",\n \"Content-Type\": \"application/pdf\""
    }
  }
}
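The values to keep from this response can be collected in one place. The sketch below assumes the response has already been parsed into a Python dict with the same field names as the sample above. The helper name extract_lease and the keys of the returned dict are hypothetical. Because the sample shows Headers as a string containing a brace-less JSON object body, the sketch accepts either that string form or a regular dict; treat the string handling as an assumption about the sample's shape rather than documented behavior.

```python
import json


def extract_lease(response):
    """Collect the ApplyFileUploadLease values that the later steps need.

    `response` is assumed to be the parsed response body as a dict.
    `extract_lease` and the returned key names are illustrative only.
    """
    data = response["Data"]
    param = data["Param"]
    headers = param["Headers"]
    if isinstance(headers, str):
        # Assumption: the string holds the inside of a JSON object,
        # as in the sample response, so wrapping it in braces makes it parseable.
        headers = json.loads("{" + headers + "}")
    return {
        "lease_id": data["FileUploadLeaseId"],  # used later as LeaseId in AddFile
        "method": param["Method"],              # HTTP method for the upload
        "url": param["Url"],                    # the lease (pre-signed URL)
        "headers": {
            "X-bailian-extra": headers["X-bailian-extra"],
            "Content-Type": headers["Content-Type"],
        },
    }
```

Keeping these values together makes it harder to mix parameters from two different leases, which matters because each lease is only valid for a few minutes.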
2. Upload the document to temporary storage
Use Data.FileUploadLeaseId, Data.Param.Method, Data.Param.Url, as well as X-bailian-extra and Content-Type of Data.Param.Headers, provided in step 1, to upload your document from a local source or OSS to the temporary storage of Model Studio. Below is the sample code.
No online debugging
This step's code is not included in the SDK, so online debugging and automatic generation of multi-language sample code are not available. You need to manually write the code, referring to the sample code below.
About pre_signed_url
The pre_signed_url shown in the sample code below represents the Url field in Data.Param returned by ApplyFileUploadLease in step 1.
Python
Sample code
# Sample code is for reference only. Do not use it directly in production environments
import requests


def upload_file(pre_signed_url, file_path):
    try:
        # Set request headers
        headers = {
            "X-bailian-extra": "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step",
            "Content-Type": "Replace with the value of the Content-Type field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step"
        }
        # Read and upload the document
        with open(file_path, 'rb') as file:
            # Set the request method for document upload, which must match the value of the Method field in Data.Param returned by the ApplyFileUploadLease interface in the previous step
            response = requests.put(pre_signed_url, data=file, headers=headers)
        # Check response status code
        if response.status_code == 200:
            print("File uploaded successfully.")
        else:
            print(f"Failed to upload the file. ResponseCode: {response.status_code}")
    except Exception as e:
        print(f"An error occurred: {str(e)}")


def upload_file_link(pre_signed_url, source_url_string):
    try:
        # Set request headers
        headers = {
            "X-bailian-extra": "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step",
            "Content-Type": "Replace with the value of the Content-Type field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step"
        }
        # Set the request method for accessing OSS to GET
        source_response = requests.get(source_url_string)
        if source_response.status_code != 200:
            raise RuntimeError("Failed to get source file.")
        # Set the request method for document upload, which must match the value of the Method field in Data.Param returned by the ApplyFileUploadLease interface in the previous step
        response = requests.put(pre_signed_url, data=source_response.content, headers=headers)
        # Check response status code
        if response.status_code == 200:
            print("File uploaded successfully.")
        else:
            print(f"Failed to upload the file. ResponseCode: {response.status_code}")
    except Exception as e:
        print(f"An error occurred: {str(e)}")


if __name__ == "__main__":
    pre_signed_url_or_http_url = "Replace with the value of the Url field in Data.Param returned by the ApplyFileUploadLease interface in the previous step"
    # The document source can be local, upload local documents to Model Studio temporary storage
    file_path = "Replace with the actual local path of the document you need to upload"
    upload_file(pre_signed_url_or_http_url, file_path)
    # The document source can also be Alibaba Cloud Object Storage Service (OSS)
    # file_path = "Replace with the actual publicly accessible address of the document you need to upload from Alibaba Cloud OSS"
    # upload_file_link(pre_signed_url_or_http_url, file_path)
Java
Sample code
// Sample code is for reference only. Do not use it directly in production environments
import java.io.BufferedInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UploadFile {
    public static void uploadFile(String preSignedUrl, String filePath) {
        HttpURLConnection connection = null;
        try {
            // Create URL object
            URL url = new URL(preSignedUrl);
            connection = (HttpURLConnection) url.openConnection();
            // Set the request method for document upload, which must match the value of the Method field in Data.Param returned by the ApplyFileUploadLease interface in the previous step
            connection.setRequestMethod("PUT");
            // Allow output to the connection, as this connection is used for document upload
            connection.setDoOutput(true);
            connection.setRequestProperty("X-bailian-extra", "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step");
            connection.setRequestProperty("Content-Type", "Replace with the value of the Content-Type field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step");
            // Read and upload the document through the connection
            try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                 FileInputStream fileInputStream = new FileInputStream(filePath)) {
                byte[] buffer = new byte[4096];
                int bytesRead;
                while ((bytesRead = fileInputStream.read(buffer)) != -1) {
                    outStream.write(buffer, 0, bytesRead);
                }
                outStream.flush();
            }
            // Check response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Document upload successful
                System.out.println("File uploaded successfully.");
            } else {
                // Document upload failed
                System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void uploadFileLink(String preSignedUrl, String sourceUrlString) {
        HttpURLConnection connection = null;
        try {
            // Create URL object
            URL url = new URL(preSignedUrl);
            connection = (HttpURLConnection) url.openConnection();
            // Set the request method for document upload, which must match the value of the Method field in Data.Param returned by the ApplyFileUploadLease interface in the previous step
            connection.setRequestMethod("PUT");
            // Allow output to the connection, as this connection is used for document upload
            connection.setDoOutput(true);
            connection.setRequestProperty("X-bailian-extra", "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step");
            connection.setRequestProperty("Content-Type", "Replace with the value of the Content-Type field in Data.Param.Headers returned by the ApplyFileUploadLease interface in the previous step");
            URL sourceUrl = new URL(sourceUrlString);
            HttpURLConnection sourceConnection = (HttpURLConnection) sourceUrl.openConnection();
            // Set the request method for accessing OSS to GET
            sourceConnection.setRequestMethod("GET");
            // Get response code, 200 indicates a successful request
            int sourceFileResponseCode = sourceConnection.getResponseCode();
            // Read the document from OSS and upload through the connection
            if (sourceFileResponseCode != HttpURLConnection.HTTP_OK) {
                throw new RuntimeException("Failed to get source file.");
            }
            try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                 InputStream in = new BufferedInputStream(sourceConnection.getInputStream())) {
                byte[] buffer = new byte[4096];
                int bytesRead;
                while ((bytesRead = in.read(buffer)) != -1) {
                    outStream.write(buffer, 0, bytesRead);
                }
                outStream.flush();
            }
            // Check response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Document upload successful
                System.out.println("File uploaded successfully.");
            } else {
                // Document upload failed
                System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String preSignedUrlOrHttpUrl = "Replace with the value of the Url field in Data.Param returned by the ApplyFileUploadLease interface in the previous step";
        // The document source can be local, upload local documents to Model Studio temporary storage
        String filePath = "Replace with the actual local path of the document you need to upload";
        uploadFile(preSignedUrlOrHttpUrl, filePath);
        // The document source can also be OSS
        // String filePath = "Replace with the actual publicly accessible address of the document you need to upload from OSS";
        // uploadFileLink(preSignedUrlOrHttpUrl, filePath);
    }
}
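The samples above hard-code PUT, although the request method must match the Method field returned in step 1. As a hedged alternative, the Python sketch below builds the upload request directly from the lease values using only the standard library, so the method and headers always come from the response. The function names build_upload_request and send_upload, and the shape of the lease dict (method, url, headers keys), are assumptions made for illustration. The sketch passes the document as bytes, which suits small documents; it is not tuned for very large files.

```python
import urllib.request


def build_upload_request(lease, body):
    """Build (but do not send) the upload request described by a lease.

    `lease` is a hypothetical dict holding the step 1 values:
    {"method": Data.Param.Method, "url": Data.Param.Url,
     "headers": {"X-bailian-extra": ..., "Content-Type": ...}}.
    `body` is the document content as bytes.
    """
    return urllib.request.Request(
        lease["url"],
        data=body,
        headers=lease["headers"],
        method=lease["method"],  # taken from the lease instead of hard-coding PUT
    )


def send_upload(request, timeout=60):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would read the document into bytes, call build_upload_request, and then pass the result to send_upload while the lease is still valid.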
3. Add the document to Data Management
After you upload your document, Model Studio retains it in temporary storage for 12 hours. Within that period, use the SDK to call AddFile to add the document to Application Data.
About online debugging and sample code
You can debug online and generate multi-language code samples in real time.
About LeaseId in the request parameters
The LeaseId in the request parameters of AddFile is the Data.FileUploadLeaseId returned by ApplyFileUploadLease.
About CategoryType in the request parameters
Because the document is for knowledge base building, CategoryType can be omitted or set to UNSTRUCTURED.
Do not submit again
Once AddFile is successfully called, the LeaseId becomes invalid immediately. Do not resubmit using the same lease ID.
4. Check document parsing status
After AddFile is called successfully, Model Studio starts uploading and parsing the document. This may take some time (up to several hours during peak periods). In the meantime, you can check the status in Application Data or use the SDK to call DescribeFile. Once the document is uploaded and parsed, the Data.Status returned by DescribeFile becomes PARSE_SUCCESS. You can then import the document into your knowledge base.
About online debugging and sample code
You can debug online and generate multi-language code samples in real time.
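Because parsing can take a while, a client usually polls DescribeFile rather than calling it once. The sketch below shows one way to do that in Python. It does not call the SDK itself: get_status is a hypothetical callable that you supply, which should call DescribeFile and return the Data.Status string. The function name wait_until_parsed, the default interval, and the default timeout are assumptions. Only PARSE_SUCCESS is taken from this topic; any other status is simply treated as "not finished yet".

```python
import time


def wait_until_parsed(get_status, interval_seconds=10.0, timeout_seconds=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until it returns "PARSE_SUCCESS" or the timeout passes.

    `get_status` is a caller-supplied (hypothetical) function that wraps
    DescribeFile and returns Data.Status as a string.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "PARSE_SUCCESS":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Document not parsed in time; last status: {status}")
        # Wait before the next DescribeFile call to avoid hammering the API.
        sleep(interval_seconds)
```

Since parsing may take hours during peak periods, choose a timeout that fits your workload, and consider checking Application Data in the console if the timeout is reached.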