All Products
Search
Document Center

Intelligent Speech Interaction:RESTFUL API

Last Updated:Sep 26, 2023

The RESTful API of the short speech recognition service allows you to use the POST method to upload an audio file whose duration is up to 1 minute. The server returns the recognition result in JSON format in a response. You must make sure that the connection is not interrupted before the recognition result is returned.

Background information

  • Supports pulse-code modulation (PCM) encoded 16-bit mono audio files.

  • Supports the audio sampling rates of 8,000 Hz and 16,000 Hz.

  • Allows you to specify whether to add punctuation marks during post-processing and whether to convert Chinese numbers to Arabic numbers.

  • Allows you to configure hotwords and customize linguistic models in the Intelligent Speech Interaction console.

  • Supports multiple languages. You can specify the language to be recognized by selecting a model when you modify a project in the Intelligent Speech Interaction console. For more information, see Manage projects.

Prerequisites

  • A project is created in the Intelligent Speech Interaction console and the appkey of the project is obtained. For more information, see Create a project.

  • An access token that is obtained. For more information, see Obtain an access token.

Interaction process

The client sends an HTTP RESTful POST request with audio data to the server. Then, the server returns an HTTP response with the recognition result.

Interaction flowchart
Note

The server adds the task_id field to the response if the server encounters an error. The task_id field indicates the ID of the recognition task. You can record the task ID and submit a ticket to report the task ID and error message.

Service addresses

Type

Description

URL

Host

Access from external networks

You can use the URL to access the short speech recognition service from all clients over the Internet.

http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr

nls-gateway-ap-southeast-1.aliyuncs.com

Important

The following section describes how to access the short speech recognition service over the Internet. If your ECS instance is deployed in the China (Shanghai) region and you want to access the short speech recognition service over an internal network, use HTTP and replace the URL and host for Internet access with the URL and host for internal access.

Upload an audio file

The following example shows an HTTP request that is used to call the short speech recognition RESTful API:

POST /stream/v1/asr?appkey=23f5****&format=pcm&sample_rate=16000&enable_punctuation_prediction=true&enable_inverse_text_normalization=true HTTP/1.1
X-NLS-Token: 450372e4279bcc2b3c793****
Content-type: application/octet-stream
Content-Length: 94616
Host: nls-gateway-ap-southeast-1.aliyuncs.com

[audio data]

A complete request to call the short speech recognition RESTful API must contain the following elements: an HTTP request line, an HTTP request header, and an HTTP request body.

  • HTTP request line

    The HTTP request line contains a URL and request parameters, as shown in the following example:

    http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr?appkey=Yu1******uncS&format=pcm&sample_rate=16000&vocabulary_id=a17******d6b&customization_id=abd******ed8&enable_punctuation_prediction=true&enable_inverse_text_normalization=true&enable_voice_detection=true
    • URL

      Protocol

      URL

      Method

      HTTP/1.1

      nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr

      POST

    • Request parameters

      Parameter

      Type

      Required

      Description

      appkey

      String

      Yes

      The appkey of your project.

      format

      String

      No

      The audio encoding format. Valid values: pcm and opus. Default value: pcm.

      sample_rate

      Integer

      No

      The audio sampling rate. Valid values: 16,000 and 8,000. Default value: 16,000.

      vocabulary_id

      String

      No

      The ID of your hotword list. This parameter is not specified by default.

      customization_id

      String

      No

      The ID of your custom linguistic model. This parameter is not specified by default.

      enable_punctuation_prediction

      Boolean

      No

      Specifies whether to add punctuation marks during post-processing. Valid values: true and false. Default value: false.

      enable_inverse_text_normalization

      Boolean

      No

      Specifies whether to enable inverse text normalization (ITN) during post-processing. Valid values: true and false. Default value: false.

      enable_voice_detection

      Boolean

      No

      Specifies whether to enable voice detection. Valid values: true and false. Default value: false.

  • HTTP request header

    The HTTP request header consists of key-value pairs. Each key-value pair is a field that occupies a line. The key and the value in each pair are separated by a colon (:). The following table describes the fields of the HTTP request header.

    Field

    Type

    Required

    Description

    X-NLS-Token

    String

    Yes

    The token that is used for service authentication.

    Content-Type

    String

    Yes

    The content type. The value is application/octet-stream, which indicates that the data in the HTTP request body is a binary stream.

    Content-Length

    long

    Yes

    The length of the data in the HTTP request body. The length of the data is the same as the length of the audio file.

    Host

    String

    Yes

    The domain name of the server to which the HTTP request is sent. The value is nls-gateway-ap-southeast-1.aliyuncs.com.

  • The HTTP request body.

    The HTTP request body contains binary audio data. Therefore, you must set the Content-Type parameter in the HTTP request header to application/octet-stream.

Sample responses

After the client sends an HTTP request with audio data, the server returns a response with the recognition result in JSON format.

  • Success response

    {
        "task_id": "cf7b0c5339244ee29cd4e43fb97f****",
        "result": "Weather in Beijing",
        "status":20000000,
        "message":"SUCCESS"
    }

  • Error response

    The following example indicates an authentication token error:

    {
        "task_id": "8bae3613dfc54ebfa811a17d8a7a****",
        "result": "",
        "status": 40000001,
        "message": "Gateway:ACCESS_DENIED:The token 'c0c1e860f3*******de8091c68a' is invalid!"
    }

    Response parameters

    Parameter

    Type

    Description

    task_id

    String

    The task ID that must be 32 characters in length. You can record the task ID for troubleshooting.

    result

    String

    The speech recognition result.

    status

    Integer

    The status code.

    message

    String

    The description of the status code.

Status codes

Status code

Description

Solution

20000000

The request is successful.

None.

40000000

A client error occurred. This is the default status code for client errors.

Troubleshoot the error based on the error message or submit a ticket.

40000001

The client fails to pass the authentication.

Check whether the token that is used by the client is valid or expired.

40000002

The request is invalid.

Check whether the request that is sent by the client meets the requirements.

40000003

One or more parameters are invalid.

Check whether the parameter values are valid.

40000004

The client timed out.

Check whether the client did not send data to the server for a long period of time.

40000005

The number of requests exceeds the upper limit.

Check whether the number of concurrent connections or queries per second (QPS) value exceeds the upper limit.

41010101

The sampling rate of the uploaded audio file is not supported.

Check whether the sampling rate (8,000 or 16,000) that is specified in the code matches the sampling rate (8k or 16k) of the model to which the appkey is bound in the Intelligent Speech Interaction console.

50000000

A server error occurred. This is the default status code for server errors.

If the status code is returned a few times, ignore it. If the status code is returned multiple times, submit a ticket.

50000001

An internal gRPC call error occurred.

If the status code is returned a few times, ignore it. If the status code is returned multiple times, submit a ticket.

Quick test

Note

This demo uses a .wav audio file and a generic model. To use another audio file for the test, specify the audio coding format and sampling rate as required and select an appropriate model in the Intelligent Speech Interaction console.

  1. Download the nls-sample-16k.wav file.

  2. You can run the following cURL command to test the short speech recognition RESTful API:

    curl -X POST -H "X-NLS-Token: ${token}" http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr?appkey=${appkey} --data-binary @${audio_file}

    Example:

    curl -X POST -H "X-NLS-Token: 4a036531cfdd****" http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr?appkey=tt43P2u**** --data-binary @./nls-sample-16k.wav

Demo for Java

Dependencies:

<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>3.9.1</version>
</dependency>

<!-- http://mvnrepository.com/artifact/com.alibaba/fastjson -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
</dependency>

Request and response:

import com.alibaba.fastjson.JSONPath;
import com.alibaba.nls.client.example.utils.HttpUtil;
import java.util.HashMap;
public class SpeechRecognizerRESTfulDemo {
    private String accessToken;
    private String appkey;

    public SpeechRecognizerRESTfulDemo(String appkey, String token) {
        this.appkey = appkey;
        this.accessToken = token;
    }

    public void process(String fileName, String format, int sampleRate,
                        boolean enablePunctuationPrediction,
                        boolean enableInverseTextNormalization,
                        boolean enableVoiceDetection) {

        /**
         * Configure an HTTP RESTful POST request:
         * 1. Specify the HTTP protocol. 
         * 2. Set the domain name of the short speech recognition service to nls-gateway-ap-southeast-1.aliyuncs.com. 
         * 3. Set the request path to /stream/v1/asr for short speech recognition API requests. 
         * 4. Configure the following required request parameters: appkey, format, and sample_rate. 
         * 5. Configure the following optional request parameters: enable_punctuation_prediction, enable_inverse_text_normalization, and enable_voice_detection. 
         */
        String url = "http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr";
        String request = url;
        request = request + "?appkey=" + appkey;
        request = request + "&format=" + format;
        request = request + "&sample_rate=" + sampleRate;
        if (enablePunctuationPrediction) {
            request = request + "&enable_punctuation_prediction=" + true;
        }
        if (enableInverseTextNormalization) {
            request = request + "&enable_inverse_text_normalization=" + true;
        }
        if (enableVoiceDetection) {
            request = request + "&enable_voice_detection=" + true;
        }

        System.out.println("Request: " + request);

        /**
         * Configure the HTTP request header:
         * 1. Configure authentication parameters. 
         * 2. Set the Content-Type parameter to application/octet-stream. 
         */
        HashMap<String, String> headers = new HashMap<String, String>();
        headers.put("X-NLS-Token", this.accessToken);
        headers.put("Content-Type", "application/octet-stream");

        /**
         * Send the HTTP POST request and process the response that is returned by the server. 
         */
        String response = HttpUtil.sendPostFile(request, headers, fileName);

        if (response != null) {
            System.out.println("Response: " + response);
            String result = JSONPath.read(response, "result").toString();
            System.out.println("Recognition result:" + result);
        }
        else {
            System.err.println("Recognition failed.");
        }

    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("SpeechRecognizerRESTfulDemo need params: <token> <app-key>");
            System.exit(-1);
        }

        String token = args[0];
        String appkey = args[1];

        SpeechRecognizerRESTfulDemo demo = new SpeechRecognizerRESTfulDemo(appkey, token);

        String fileName = SpeechRecognizerRESTfulDemo.class.getClassLoader().getResource("./nls-sample-16k.wav").getPath();
        String format = "pcm";
        int sampleRate = 16000;
        boolean enablePunctuationPrediction = true;
        boolean enableInverseTextNormalization = true;
        boolean enableVoiceDetection = false;

        demo.process(fileName, format, sampleRate, enablePunctuationPrediction, enableInverseTextNormalization, enableVoiceDetection);
    }
}

HttpUtils class:

import okhttp3.*;
import java.io.File;
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class HttpUtil {

    private static String getResponseWithTimeout(Request q) {
        String ret = null;

        OkHttpClient.Builder httpBuilder = new OkHttpClient.Builder();
        OkHttpClient client = httpBuilder.connectTimeout(10, TimeUnit.SECONDS)
                .readTimeout(60, TimeUnit.SECONDS)
                .writeTimeout(60, TimeUnit.SECONDS)
                .build();

        try {
            Response s = client.newCall(q).execute();
            ret = s.body().string();
            s.close();
        } catch (SocketTimeoutException e) {
            ret = null;
            System.err.println("get result timeout");
        } catch (IOException e) {
            System.err.println("get result error " + e.getMessage());
        }

        return ret;
    }

    public static String sendPostFile(String url, HashMap<String, String> headers, String fileName) {
        RequestBody body;

        File file = new File(fileName);
        if (!file.isFile()) {
            System.err.println("The filePath is not a file: " + fileName);
            return null;
        } else {
            body = RequestBody.create(MediaType.parse("application/octet-stream"), file);
        }

        Headers.Builder hb = new Headers.Builder();
        if (headers != null && !headers.isEmpty()) {
            for (Map.Entry<String, String> entry : headers.entrySet()) {
                hb.add(entry.getKey(), entry.getValue());
            }
        }

        Request request = new Request.Builder()
                .url(url)
                .headers(hb.build())
                .post(body)
                .build();

        return getResponseWithTimeout(request);
    }

    public static String sendPostData(String url, HashMap<String, String> headers, byte[] data) {
        RequestBody body;

        if (data.length == 0) {
            System.err.println("The send data is empty.");
            return null;
        } else {
            body = RequestBody.create(MediaType.parse("application/octet-stream"), data);
        }

        Headers.Builder hb = new Headers.Builder();
        if (headers != null && !headers.isEmpty()) {
            for (Map.Entry<String, String> entry : headers.entrySet()) {
                hb.add(entry.getKey(), entry.getValue());
            }
        }

        Request request = new Request.Builder()
                .url(url)
                .headers(hb.build())
                .post(body)
                .build();

        return getResponseWithTimeout(request);
    }
}

Demo for C++

The demo for C++ uses the third-party library cURL to process HTTP requests and responses. Download the cURL library and demo.

The following table describes the content in the downloaded ZIP file.

File name or folder name

Description

CMakeLists.txt

The CMakeList file of the demo project.

demo

The restfulAsrDemo.cpp file is the demo of the short speech recognition RESTful API.

include

The curl folder is the header file directory of the cURL library.

lib

The lib folder contains the dynamic library curl-7.60. You can select one of the following versions based on the operating system:

  • Linux: Glibc 2.5, Gcc 4, or Gcc 5

  • Windows: Visual Studio 2013 or Visual Studio 2015

readme.txt

The description file.

release.log

The release notes.

version

The version number.

build.sh

The demo compilation script.

Compile and run the demo:

Assume that the demo is decompressed to the path/to directory. Run the following commands on the Linux terminal to compile and run the demo.

  • If the operating system supports CMake, run the following commands in sequence:

    1. Make sure that CMake 2.4 or later is installed in the Linux terminal.

    2. cd path/to/sdk/lib

    3. tar -zxvpf linux.tar.gz

    4. cd path/to/sdk

    5. ./build.sh

    6. cd path/to/sdk/demo

    7. ./restfulAsrDemo <your-token> <your-appkey>

  • If the operating system does not support CMake, run the following commands in sequence:

    1. cd path/to/sdk/lib

    2. tar -zxvpf linux.tar.gz

    3. cd path/to/sdk/demo

    4. g++ -o restfulAsrDemo restfulAsrDemo.cpp -I path/to/sdk/include -L path/to/sdk/lib/linux -lssl -lcrypto -lcurl -D_GLIBCXX_USE_CXX11_ABI=0

    5. export LD_LIBRARY_PATH=path/to/sdk/lib/linux/

    6. ./restfulAsrDemo <your-token> <your-appkey>

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include "curl/curl.h"

using namespace std;

#ifdef _WIN32
string UTF8ToGBK(const string& strUTF8) {
    int len = MultiByteToWideChar(CP_UTF8, 0, strUTF8.c_str(), -1, NULL, 0);
    unsigned short * wszGBK = new unsigned short[len + 1];
    memset(wszGBK, 0, len * 2 + 2);

    MultiByteToWideChar(CP_UTF8, 0, (char*)strUTF8.c_str(), -1, (wchar_t*)wszGBK, len);

    len = WideCharToMultiByte(CP_ACP, 0, (wchar_t*)wszGBK, -1, NULL, 0, NULL, NULL);

    char *szGBK = new char[len + 1];
    memset(szGBK, 0, len + 1);
    WideCharToMultiByte(CP_ACP, 0, (wchar_t*)wszGBK, -1, szGBK, len, NULL, NULL);

    string strTemp(szGBK);
    delete[] szGBK;
    delete[] wszGBK;

    return strTemp;
}

#endif

/**
 * Configure the response callback function for HTTP requests that are sent to call the short speech recognition RESTful API.
 * The recognition result is a JSON string.
 */
size_t responseCallback(void* ptr, size_t size, size_t nmemb, void* userData) {
    string* srResult = (string*)userData;

    size_t len = size * nmemb;
    char *pBuf = (char*)ptr;
    string response = string(pBuf, pBuf + len);
#ifdef _WIN32
    response = UTF8ToGBK(response);
#endif
    cout << "current result: " << response << endl;

    *srResult += response;
    cout << "total result: " << *srResult << endl;

    return len;
}

int sendAsrRequest(const char* request, const char* token, const char* fileName, string* srResult) {
    CURL* curl = NULL;
    CURLcode res;

    /**
    * Read the audio file.
    */
    ifstream fs;
    fs.open(fileName, ios::out  ios::binary);
    if (!fs.is_open()) {
        cerr << "The audio file is not exist!" << endl;
        return -1;
    }
    stringstream buffer;
    buffer << fs.rdbuf();
    string audioData(buffer.str());

    curl = curl_easy_init();
    if (curl == NULL) {
        return -1;
    }

    /**
    * Configure the HTTP request line.
    */
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "POST");
    curl_easy_setopt(curl, CURLOPT_URL, request);

    /**
    * Configure the HTTP request header.
    */
    struct curl_slist* headers = NULL;
    // token
    string X_NLS_Token = "X-NLS-Token:";
    X_NLS_Token += token;
    headers = curl_slist_append(headers, X_NLS_Token.c_str());
    // Content-Type
    headers = curl_slist_append(headers, "Content-Type:application/octet-stream");
    // Content-Length
    string content_Length = "Content-Length:";
    ostringstream oss;
    oss << content_Length << audioData.length();
    content_Length = oss.str();
    headers = curl_slist_append(headers, content_Length.c_str());

    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

    /**
    * Configure the HTTP request body.
    */
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, audioData.c_str());
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, audioData.length());

    /**
    * Configure the response callback function for HTTP requests.
    */
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, responseCallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, srResult);

    /**
    * Send the HTTP request.
    */
    res = curl_easy_perform(curl);

    // Release resources.
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);

    if (res != CURLE_OK) {
        cerr << "curl_easy_perform failed: " << curl_easy_strerror(res) << endl;
        return -1;
    }

    return 0;
}

int process(const char* request, const char* token, const char* fileName) {
    // The demo is initialized only once.
    curl_global_init(CURL_GLOBAL_ALL);

    string srResult = "";
    int ret = sendAsrRequest(request, token, fileName, &srResult);

    curl_global_cleanup();

    return ret;
}

int main(int argc, char* argv[]) {
    if (argc < 3) {
        cerr << "params is not valid. Usage: ./demo your_token your_appkey" << endl;
        return -1;
    }

    string token = argv[1];
    string appKey = argv[2];

    string url = "http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr";
    string format = "pcm";
    int sampleRate = 16000;
    bool enablePunctuationPrediction = true;
    bool enableInverseTextNormalization = true;
    bool enableVoiceDetection = false;
    string fileName = "sample.pcm";

    /**
    * Configure the RESTful request parameters.
    */
    ostringstream oss;
    oss << url;
    oss << "?appkey=" << appKey;
    oss << "&format=" << format;
    oss << "&sample_rate=" << sampleRate;
    if (enablePunctuationPrediction) {
        oss << "&enable_punctuation_prediction=" << "true";
    }
    if (enableInverseTextNormalization) {
        oss << "&enable_inverse_text_normalization=" << "true";
    }
    if (enableVoiceDetection) {
        oss << "&enable_voice_detection=" << "true";
    }

    string request = oss.str();
    cout << "request: " << request << endl;

    process(request.c_str(), token.c_str(), fileName.c_str());

    return 0;
}

Demo for Python

Note

Use the httplib module for Python 2.x, and use the http.client module for Python 3.x.

# -*- coding: UTF-8 -*-
# Import the httplib module for Python 2.x.
# import httplib
# Import the http.client module for Python 3.x.
import http.client

import json

def process(request, token, audioFile) :

    # Read the audio file.
    with open(audioFile, mode = 'rb') as f:
        audioContent = f.read()

    host = 'nls-gateway-ap-southeast-1.aliyuncs.com'

    * Configure the HTTP request header.
    httpHeaders = {
        'X-NLS-Token': token,
        'Content-type': 'application/octet-stream',
        'Content-Length': len(audioContent)
        }


    # Use the httplib module for Python 2.x.
    # conn = httplib.HTTPConnection(host)

    # Use the http.client module for Python 3.x.
    conn = http.client.HTTPConnection(host)

    conn.request(method='POST', url=request, body=audioContent, headers=httpHeaders)

    response = conn.getresponse()
    print('Response status and response reason:')
    print(response.status ,response.reason)

    body = response.read()
    try:
        print('Recognize response is:')
        body = json.loads(body)
        print(body)

        status = body['status']
        if status == 20000000 :
            result = body['result']
            print('Recognize result: ' + result)
        else :
            print('Recognizer failed!')

    except ValueError:
        print('The response is not json format string')

    conn.close()



appKey = 'Your appkey'
token = 'Your token'

# Specify the service request address.
url = 'http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr'

# Specify the audio file.
audioFile = '/path/to/nls-sample-16k.wav'
format = 'pcm'
sampleRate = 16000
enablePunctuationPrediction  = True
enableInverseTextNormalization = True
enableVoiceDetection  = False

# Configure the RESTful request parameters.
request = url + '?appkey=' + appKey
request = request + '&format=' + format
request = request + '&sample_rate=' + str(sampleRate)

if enablePunctuationPrediction :
    request = request + '&enable_punctuation_prediction=' + 'true'

if enableInverseTextNormalization :
    request = request + '&enable_inverse_text_normalization=' + 'true'

if enableVoiceDetection :
    request = request + '&enable_voice_detection=' + 'true'

print('Request: ' + request)


process(request, token, audioFile)

Demo for PHP

Note

The PHP demo uses cURL functions. Make sure that you have installed a PHP version later than 4.0.2 and cURL extensions.

<?php

function process($token, $request, $audioFile) {
    /**
     * Read the audio file.
     */
    $audioContent = file_get_contents($audioFile);
    if ($audioContent == FALSE) {
        print "The audio file is not exist!\n";
        return;
    }

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 120);

    /**
     * Configure the HTTP request line.
     */
    curl_setopt($curl, CURLOPT_URL, $request);
    curl_setopt($curl, CURLOPT_POST,TRUE);

    /**
     * Configure the HTTP request header.
     */
    $contentType = "application/octet-stream";
    $contentLength = strlen($audioContent);
    $headers = array(
        "X-NLS-Token:" . $token,
        "Content-type:" . $contentType,
        "Content-Length:" . strval($contentLength)
    );
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);

    /**
     * Configure the HTTP request body.
     */
    curl_setopt($curl, CURLOPT_POSTFIELDS, $audioContent);
    curl_setopt($curl, CURLOPT_NOBODY, FALSE);

    /**
     * Send the HTTP request.
     */
    $returnData = curl_exec($curl);

    curl_close($curl);

    if ($returnData == FALSE) {
        print "curl_exec failed!\n";
        return;
    }

    print $returnData . "\n";

    $resultArr = json_decode($returnData, true);

    $status = $resultArr["status"];
    if ($status == 20000000) {
        $result = $resultArr["result"];
        print "The audio file recognized result: " . $result . "\n";
    }
    else {
        print "The audio file recognized failed.\n";
    }

}


$appkey = "Your appkey";
$token = 'Your token';

$url = "http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr";
$audioFile = "/path/to/nls-sample-16k.wav";
$format = "pcm";
$sampleRate = 16000;
$enablePunctuationPrediction = TRUE;
$enableInverseTextNormalization = TRUE;
$enableVoiceDetection = FALSE;

/**
 * Configure the RESTful request parameters.
 */
$request = $url;
$request = $request . "?appkey=" . $appkey;
$request = $request . "&format=" . $format;
$request = $request . "&sample_rate=" . strval($sampleRate);
if ($enablePunctuationPrediction) {
    $request = $request . "&enable_punctuation_prediction=" . "true";
}
if ($enableInverseTextNormalization) {
    $request = $request . "&enable_inverse_text_normalization=" . "true";
}
if ($enableVoiceDetection) {
    $request = $request . "&enable_voice_detection=" . "true";
}

print "Request: " . $request . "\n";

process($token, $request, $audioFile);

?>

Demo for Node.js

Note

To install the request dependencies, run the npm install request --save command in the directory of the demo.

const request = require('request');
const fs = require('fs');


function callback(error, response, body) {
        if (error != null) {
                console.log(error);
        }
        else {
                console.log('The audio file recognized result:');
                console.log(body);
                if (response.statusCode == 200) {
                        body = JSON.parse(body);
                        if (body.status == 20000000) {
                                console.log('result: ' + body.result);
                                console.log('The audio file recognized succeed!');
                        } else {
                                console.log('The audio file recognized failed!');
                        }
                } else {
                        console.log('The audio file recognized failed, http code: ' + response.statusCode);
                }
        }
}

function process(requestUrl, token, audioFile) {
        /**
         * Read the audio file.
        */
        var audioContent = null;
        try {
                audioContent = fs.readFileSync(audioFile);
        } catch(error) {
                if (error.code == 'ENOENT') {
                        console.log('The audio file is not exist!');
                }
                return;
        }

        /**
         * Configure the HTTP request header.
        */
        var httpHeaders = {
                'X-NLS-Token': token,
                'Content-type': 'application/octet-stream',
                'Content-Length': audioContent.length
        };

        var options = {
                url: requestUrl,
                method: 'POST',
                headers: httpHeaders,
                body: audioContent
        };

        request(options, callback);
}


var appkey = 'Your appkey';
var token = 'Your token';

var url = 'http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr';
var audioFile = '/path/to/nls-sample-16k.wav';
var format = 'pcm';
var sampleRate = '16000';
var enablePunctuationPrediction = true;
var enableInverseTextNormalization = true;
var enableVoiceDetection  = false;

/**
 * Configure the RESTful request parameters.
 */
var requestUrl = url;
requestUrl = requestUrl + '?appkey=' + appkey;
requestUrl = requestUrl + '&format=' + format;
requestUrl = requestUrl + '&sample_rate=' + sampleRate;
if (enablePunctuationPrediction) {
        requestUrl = requestUrl + '&enable_punctuation_prediction=' + 'true';
}
if (enableInverseTextNormalization) {
        requestUrl = requestUrl + '&enable_inverse_text_normalization=' + 'true';
}
if (enableVoiceDetection) {
        requestUrl = requestUrl + '&enable_voice_detection=' + 'true';
}

process(requestUrl, token, audioFile);

Demo for .NET

Note

The demo relies on System.Net.Http and Newtonsoft.Json.Linq.

using System;
using System.Net.Http;
using System.IO;
using Newtonsoft.Json.Linq;

namespace RESTfulAPI
{
    class SpeechRecognizerRESTfulDemo
    {
        private string token;
        private string appkey;

        public SpeechRecognizerRESTfulDemo(string appkey, string token)
        {
            this.appkey = appkey;
            this.token = token;
        }

        public void process(string fileName, string format, int sampleRate,
            bool enablePunctuationPrediction,
            bool enableInverseTextNormalization,
            bool enableVoiceDetection)
        {
            /**
             * Configure an HTTP REST POST request.
             * 1. Specify the HTTP protocol.
             * 2. Set the domain name of the short speech recognition service to nls-gateway-ap-southeast-1.aliyuncs.com.
             * 3. Set the request path to /stream/v1/asr for short speech recognition API requests.
             * 4. Configure the following required request parameters: appkey, format, and sample_rate. 
             * 5. Configure the following optional request parameters: enable_punctuation_prediction, enable_inverse_text_normalization, and enable_voice_detection.
             */
            string url = "http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr";
            url = url + "?appkey=" + appkey;
            url = url + "&format=" + format;
            url = url + "&sample_rate=" + sampleRate;
            if (enablePunctuationPrediction)
            {
                url = url + "&enable_punctuation_prediction=" + true;
            }
            if (enableInverseTextNormalization)
            {
                url = url + "&enable_inverse_text_normalization=" + true;
            }
            if (enableVoiceDetection)
            {
                url = url + "&enable_voice_detection=" + true;
            }

            System.Console.WriteLine("URL: " + url);


            HttpClient client = new HttpClient();
            /**
             * Configure the HTTP request header.
             * Configure authentication parameters.
             */
            client.DefaultRequestHeaders.Add("X-NLS-Token", token);
            if (!File.Exists(fileName))
            {
                System.Console.WriteLine("The audio file dose not exist");
                return;
            }
            byte[] audioData = File.ReadAllBytes(fileName);

            /**
             * Configure the HTTP request Body.
             * The binary data of the audio file.
             * Set the Content-Type parameter to application/octet-stream.
             */
            ByteArrayContent content = new ByteArrayContent(audioData);
            content.Headers.Add("Content-Type", "application/octet-stream");

            /**
             * Send the HTTP POST request and process the response that is returned by the server. 
             */
            HttpResponseMessage response = client.PostAsync(url, content).Result;

            string responseBodyAsText = response.Content.ReadAsStringAsync().Result;
            System.Console.WriteLine("Response: " + responseBodyAsText);

            if (response.IsSuccessStatusCode)
            {
                JObject obj = JObject.Parse(responseBodyAsText);
                string result = obj["result"].ToString();
                System.Console.WriteLine("Recognition result:" + result);
            }
            else
            {
                System.Console.WriteLine("Response status code and reason phrase: " +
                    response.StatusCode + " " + response.ReasonPhrase);
                System.Console.WriteLine("Recognition failed.");
            }

        }

        static void Main(string[] args)
        {
            if (args.Length < 2)
            {
                System.Console.WriteLine("SpeechRecognizerRESTfulDemo need params: <token> <app-key>");
                return;
            }

            string token = args[0];
            string appkey = args[1];

            SpeechRecognizerRESTfulDemo demo = new SpeechRecognizerRESTfulDemo(appkey, token);

            string fileName = "nls-sample-16k.wav";
            string format = "pcm";
            int sampleRate = 16000;
            bool enablePunctuationPrediction = true;
            bool enableInverseTextNormalization = true;
            bool enableVoiceDetection = false;

            demo.process(fileName, format, sampleRate,
                enablePunctuationPrediction, enableInverseTextNormalization, enableVoiceDetection);

        }
    }
}

Demo for Go

package main

import (
    "fmt"
        "encoding/json"
        "net/http"
        "io/ioutil"
        "strconv"
        "bytes"
)

func process(appkey string, token string, fileName string, format string, sampleRate int,
             enablePunctuationPrediction bool, enableInverseTextNormalization bool, enableVoiceDetection bool) {
                         
    /**
     * Configure an HTTP REST POST request:
     * 1. Specify the HTTP protocol.
     * 2. Set the domain name of the short speech recognition service to nls-gateway-ap-southeast-1.aliyuncs.com.
     * 3. Set the request path to /stream/v1/asr for short speech recognition API requests.
     * 4. Configure the following required request parameters: appkey, format, and sample_rate. 
     * 5. Configure the following optional request parameters: enable_punctuation_prediction, enable_inverse_text_normalization, and enable_voice_detection.
     */                 
        var url string = "http://nls-gateway-ap-southeast-1.aliyuncs.com/stream/v1/asr"
        url = url + "?appkey=" + appkey
        url = url + "&format=" + format
        url = url + "&sample_rate=" + strconv.Itoa(sampleRate)
        if (enablePunctuationPrediction) {
            url = url + "&enable_punctuation_prediction=" + "true"
        }
        if (enableInverseTextNormalization) {
            url = url + "&enable_inverse_text_normalization=" + "true"
        }
        if (enableVoiceDetection) {
            url = url + "&enable_voice_detection=" + "false"
        }

    fmt.Println(url)
        
        /**
         * Retrieve the audio data and add the audio data to the HTTP request body. 
         . */
        audioData, err := ioutil.ReadFile(fileName)
        if err != nil {
            panic(err)
        }
        
        request, err := http.NewRequest("POST", url, bytes.NewBuffer(audioData))
        if err != nil {
            panic(err)
        }
        
        /**
     * Configure the HTTP request header:
     * 1. Configure authentication parameters. 
     * 2. Set the Content-Type parameter to application/octet-stream. 
     */
        request.Header.Add("X-NLS-Token", token)
        request.Header.Add("Content-Type", "application/octet-stream")
        
        /**
     * Send the HTTP POST request and process the response that is returned by the server. 
     */
        client := &http.Client{}
        response, err := client.Do(request)
        if err != nil {
            panic(err)
        }
        
        defer response.Body.Close()
        
        body, _ := ioutil.ReadAll(response.Body)
        fmt.Println(string(body))
        
        statusCode := response.StatusCode
        if (statusCode == 200) {
            var resultMap map[string]interface{}
            err = json.Unmarshal([]byte(body), &resultMap)
            if err != nil {
                panic(err)
            }
        
            var result string = resultMap["result"].(string)
            fmt.Println("Recognition succeeded:" + result)
        } else {
            fmt.Println("Recognition failed. HTTP StatusCode: " + strconv.Itoa(statusCode))
        }
        
}

func main() {

    var appkey string = "Your appkey"
        var token string = "Your token"
        
        var fileName string = "nls-sample-16k.wav"
        var format string = "pcm"
        var sampleRate int = 16000
        var enablePunctuationPrediction bool = true
        var enableInverseTextNormalization bool = true
        var enableVoiceDetection = false
        
        process(appkey, token, fileName, format, sampleRate, enablePunctuationPrediction, enableInverseTextNormalization, enableVoiceDetection)
        

}