Intelligent Speech Interaction: NUI SDK for iOS

Last updated: Apr 06, 2023

The real-time speech recognition service provides a Natural User Interaction (NUI) SDK for iOS. This topic describes how to download the NUI SDK for iOS, lists the key methods in the SDK, and provides sample code for you to use the SDK.

Prerequisites

  • You understand how the SDK works. For more information, see Overview.

  • The appkey of your project is obtained. For more information, see Create a project.

  • A token used to access the service is obtained. For more information, see Obtain a token.

Download and install the SDK

  1. Download the NUI SDK for iOS and sample code.

  2. Decompress the downloaded package to obtain the demo project and nuisdk.framework, the framework that you add to an iOS project to integrate the SDK.

    Note

    The demo project mixes Objective-C and C++ code, so source files that call the SDK must use the .mm (Objective-C++) file extension.

  3. Use Xcode to open the demo project.

    The sample code for the real-time speech recognition service is stored in the SpeechTranscriberViewController.mm file.

Key methods

  • nui_initialize: initializes the SDK.

    /**
         * Initialize the SDK. The SDK uses a singleton pattern. To initialize the SDK again, you must first release the SDK. Do not call the SDK on the user interface (UI) thread. Otherwise, the process may be blocked.
         * @param parameters: the parameters used in the initialization. For more information, see Overview.
         * @param listener: the event listener callback. For more information, see the following callback methods.
         * @param async_listener: the asynchronous callback mode. A value of nullptr specifies that the callback is executed synchronously.
         * @param level: the log level to use. The smaller the parameter value is, the more logs are recorded.
         * @param save_log: specifies whether to store logs in files. The debug_path parameter specifies the directory where log files are stored.
         * @return: the returned error code. For more information, see Error codes.
         */
    NuiResultCode nui_initialize(const char *parameters,
                        const NuiSdkListener *listener,
                        const NuiAsyncCallback *async_listener = nullptr,
                        NuiSdkLogLevel level = LOG_LEVEL_VERBOSE,
                        bool save_log = false);

    The following table lists the fields of the NuiSdkListener struct.

    | Name | Type | Description |
    | --- | --- | --- |
    | event_callback | FuncDialogListenerOnEvent | Called by the SDK to report an event to the client. |
    | user_data_callback | FuncDialogUserProvideData | Called by the SDK to obtain captured audio data from the client. |
    | audio_state_changed_callback | FuncDialogAudioStateChange | Called by the SDK to tell the client to start or stop recording. |
    | audio_extra_event_callback | FuncDialogAudioExtraEvent | Reserved. Reports special events. |
    | user_data | void * | User-defined data that corresponds to the first parameter of each callback. |

    FuncDialogListenerOnEvent: reports SDK events to the client.

    /**
         * Report an event to the client.
         * @param user_data: Reserved.
         * @param event: the event that is reported to the client. You can view possible events in the following table.
         * @param dialog: (reserved) the sequence number of the session.
         * @param wuw: the wake-up word recognition result.
         * @param asr_result: the recognition result of the audio stream.
         * @param finish: specifies whether the recognition task is completed.
         * @param code: the returned error code. This parameter is valid for the EVENT_ASR_ERROR event.
         */
        typedef void (*FuncDialogListenerOnEvent) (void *user_data,
        NuiCallbackEvent event, long dialog,
        const char *wuw, const char *asr_result, bool finish, int code);

    The following table lists the possible events in the SDK.

    | Name | Description |
    | --- | --- |
    | EVENT_VAD_START | The beginning of speech is detected. |
    | EVENT_VAD_END | The end of speech is detected. |
    | EVENT_ASR_PARTIAL_RESULT | An intermediate recognition result is returned. |
    | EVENT_ASR_RESULT | The final recognition result is returned. |
    | EVENT_ASR_ERROR | An error occurred. Determine the cause based on the returned error code. |
    | EVENT_MIC_ERROR | A recording error occurred. |
    | EVENT_SENTENCE_START | The beginning of a sentence is detected. This event is valid for the real-time speech recognition service. |
    | EVENT_SENTENCE_END | The end of a sentence is detected. This event is valid for the real-time speech recognition service. |
    | EVENT_SENTENCE_SEMANTICS | Reserved. |
    | EVENT_TRANSCRIBER_COMPLETE | The recognition task is completed. |

    FuncDialogUserProvideData: provides audio data to the SDK.

    /**
         * After a recognition task starts, the SDK calls this method continuously to read audio data from the client.
         * @param user_data: Reserved.
         * @param buffer: the buffer that the SDK provides for the client to fill with audio data.
         * @param len: the number of bytes of audio data that the SDK requests from the client.
         * @return: the number of bytes of audio data that the client actually wrote to the buffer.
         */
        typedef int (*FuncDialogUserProvideData)(void *user_data, char *buffer, int len);

    FuncDialogAudioStateChange: instructs the client to start or stop recording based on the value of NuiAudioState.

    /**
         * When the start, stop, or cancel method is called, the SDK uses this callback method to instruct the client to enable or disable recording.
         * @param user_data: Reserved.
         * @param state: specifies whether to enable recording.
         */
        typedef void (*FuncDialogAudioStateChange) (void *user_data, NuiAudioState state);
  • nui_set_params: sets SDK parameters in the JSON format.

    /**
         * Set parameters in the JSON format.
         * @param params: the request parameters. For more information, see Overview.
         * @param listener: the asynchronous callback mode. A value of nullptr specifies that the callback is executed synchronously.
         * @return: the returned error code. For more information, see Error codes.
         */
        NuiResultCode nui_set_params(const char *params, const NuiAsyncCallback *listener = nullptr);
  • nui_dialog_start: starts the recognition task.

    /**
         * Start the recognition task.
         * @param vad_mode: the voice activity detection (VAD) mode of the task. Use the push-to-talk (P2T) mode for a recognition task.
         * @param dialog_params: the parameters used for recognition. This parameter can be left empty.
         * @param listener: the asynchronous callback mode. A value of nullptr specifies that the callback is executed synchronously.
         * @return: the returned error code. For more information, see Error codes.
         */
        NuiResultCode nui_dialog_start(NuiVadMode vad_mode, const char *dialog_params, const NuiAsyncCallback *listener = nullptr);
  • nui_dialog_cancel: completes the recognition task.

    /**
         * When this method is called, the server returns the final recognition result to the client and completes the recognition task.
         * @param force: specifies whether to ignore the final recognition result and forcibly complete the recognition task. A value of false specifies that the server stops the task but waits until the final recognition result is returned.
         * @param listener: the asynchronous callback mode. A value of nullptr specifies that the callback is executed synchronously.
         * @return: the returned error code. For more information, see Error codes.
         */
        NuiResultCode nui_dialog_cancel(bool force, const NuiAsyncCallback *listener = nullptr);
  • nui_release: releases the SDK.

    /**
         * Release the SDK.
         * @param async_listener: the asynchronous callback mode. A value of nullptr specifies that the callback is executed synchronously.
         * @return: the returned error code. For more information, see Error codes.
         */
        NuiResultCode nui_release(const NuiAsyncCallback *async_listener = nullptr);
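
The callback types above are plain C function pointers whose first argument is user_data. The following is a minimal sketch of how such callbacks could be routed to a C++ object, assuming that the SDK passes the user_data field of NuiSdkListener back unchanged. The declarations in the sketch namespace are stand-ins that mirror the typedefs shown above (the real ones come from the SDK headers, and the enumerator values here are arbitrary). Session, pushAudio, readAudio, onEventThunk, and provideDataThunk are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <string>
#include <vector>

namespace sketch {

// Stand-ins that mirror the typedefs shown above. The enumerator list is
// abbreviated and its numeric values are arbitrary, not the SDK's.
enum NuiCallbackEvent { EVENT_ASR_PARTIAL_RESULT, EVENT_SENTENCE_END, EVENT_ASR_ERROR };

typedef void (*FuncDialogListenerOnEvent)(void *user_data, NuiCallbackEvent event, long dialog,
                                          const char *wuw, const char *asr_result,
                                          bool finish, int code);
typedef int (*FuncDialogUserProvideData)(void *user_data, char *buffer, int len);

// Hypothetical application object that owns the recorded audio and the results.
class Session {
public:
    // Called by the recorder: append captured bytes to the pending buffer.
    void pushAudio(const char *data, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.insert(pending_.end(), data, data + size);
    }

    // Copy at most len bytes into buffer and return the number actually copied.
    int readAudio(char *buffer, int len) {
        assert(buffer != nullptr && len >= 0);
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t count = std::min(pending_.size(), static_cast<std::size_t>(len));
        if (count > 0) {
            std::memcpy(buffer, pending_.data(), count);
            pending_.erase(pending_.begin(),
                           pending_.begin() + static_cast<std::ptrdiff_t>(count));
        }
        return static_cast<int>(count);
    }

    // Keep the latest recognition text or error code.
    void onEvent(NuiCallbackEvent event, const char *asr_result, int code) {
        if (event == EVENT_ASR_PARTIAL_RESULT || event == EVENT_SENTENCE_END) {
            lastResult = asr_result ? asr_result : "";
        } else if (event == EVENT_ASR_ERROR) {
            lastError = code;
        }
    }

    std::string lastResult;
    int lastError = 0;

private:
    std::mutex mutex_;
    std::vector<char> pending_;
};

// Plain functions with the callback signatures. They recover the Session from user_data.
void onEventThunk(void *user_data, NuiCallbackEvent event, long, const char *,
                  const char *asr_result, bool, int code) {
    static_cast<Session *>(user_data)->onEvent(event, asr_result, code);
}

int provideDataThunk(void *user_data, char *buffer, int len) {
    return static_cast<Session *>(user_data)->readAudio(buffer, len);
}

}  // namespace sketch
```

With the real SDK, the address of the Session object would presumably be assigned to the user_data field, and the two thunks to event_callback and user_data_callback. The object must then outlive the SDK instance that holds the pointer.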

Procedure

  1. Initialize the SDK and the recorder instance.

  2. Set request parameters based on your business requirements.

  3. Call the nui_dialog_start method to start the recognition task.

  4. In the audio_state_changed_callback callback, start or stop recording based on the value of AudioState.

  5. In the user_data_callback callback, provide the recorded audio data to the SDK.

  6. Obtain the recognition result in the EVENT_ASR_PARTIAL_RESULT and EVENT_SENTENCE_END callback events.

  7. Call the nui_dialog_cancel method to complete the recognition task.

  8. Call the nui_release method to release the SDK.
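
The order of the steps above can be sketched as a small state guard. This is an illustrative helper, not part of the SDK: NuiLifecycle and its methods are hypothetical names, the guard never calls the SDK, and the rule that the SDK is released only when no task is running is an assumption taken from the step order. Each method only reports whether the matching SDK call would be in order.

```cpp
#include <cassert>

// Hypothetical guard that tracks where the app is in the procedure.
class NuiLifecycle {
public:
    enum class State { Released, Initialized, Recognizing };

    // Step 1: the SDK is a singleton, so initialize only from the released state.
    bool initialize() { return transition(State::Released, State::Initialized); }

    // Step 3: start a task only after initialization and when no task is running.
    bool start() { return transition(State::Initialized, State::Recognizing); }

    // Step 7: complete the running task.
    bool cancel() { return transition(State::Recognizing, State::Initialized); }

    // Step 8: release the SDK once no task is running (assumption of this sketch).
    bool release() { return transition(State::Initialized, State::Released); }

    State state() const { return state_; }

private:
    // Move from one state to another, or refuse if the current state does not match.
    bool transition(State from, State to) {
        if (state_ != from) {
            return false;
        }
        state_ = to;
        return true;
    }

    State state_ = State::Released;
};
```

An app could check the return value before it makes the corresponding SDK call, for example to avoid initializing the singleton a second time without releasing it first.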

Sample code

Initialize the NUI SDK

NSString * initParam = [self genInitParams];
// Register the callbacks that the SDK invokes.
NuiSdkListener nuiListener;
nuiListener.event_callback = nuiDialogListenerOnEvent;
nuiListener.audio_state_changed_callback = nuiDialogAudioStateChange;
nuiListener.audio_extra_event_callback = nullptr;  // Reserved.
nuiListener.user_data = nullptr;
nuiListener.user_data_callback = nuiDialogUserProvideData;
[_nui nui_initialize:[initParam UTF8String]
            Listener:&nuiListener
       asyncCallback:nullptr
            logLevel:LOG_LEVEL_VERBOSE
             saveLog:save_log];

The genInitParams method generates a JSON string that contains the information about the resource directory and user. The user information contains the following parameters:

    [dictM setObject:id_string forKey:@"device_id"];
    [dictM setObject:@"" forKey:@"url"];
    [dictM setObject:@"" forKey:@"app_key"];
    [dictM setObject:@"" forKey:@"token"];
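
For code that builds the same string outside Objective-C, the following minimal C++ sketch assembles the four user fields shown above into a JSON object. The key names come from the snippet above. jsonEscape and buildUserParams are hypothetical helpers, and the resource directory entries that genInitParams also adds are omitted because their key names are not listed here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape a string so that it can be embedded in a JSON string literal.
std::string jsonEscape(const std::string &text) {
    std::string out;
    for (unsigned char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Other control characters are written as \u00XX.
                    char hex[7];
                    std::snprintf(hex, sizeof(hex), "\\u%04x", static_cast<unsigned>(c));
                    out += hex;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build the user part of the initialization parameters as a compact JSON object.
std::string buildUserParams(const std::string &deviceId, const std::string &url,
                            const std::string &appKey, const std::string &token) {
    return "{\"device_id\":\"" + jsonEscape(deviceId) +
           "\",\"url\":\"" + jsonEscape(url) +
           "\",\"app_key\":\"" + jsonEscape(appKey) +
           "\",\"token\":\"" + jsonEscape(token) + "\"}";
}
```

The c_str() pointer of the returned string has the const char * type that the parameters argument of nui_initialize expects. The string must stay alive for the duration of that call.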

Set the request parameters

Set the request parameters in the format of a JSON string, as shown in the following code:

-(NSString*) genParams {
    NSMutableDictionary *nls_config = [NSMutableDictionary dictionary];
    [nls_config setValue:@true forKey:@"enable_intermediate_result"];
    [nls_config setValue:@true forKey:@"enable_voice_detection"];
    NSMutableDictionary *dictM = [NSMutableDictionary dictionary];
    [dictM setObject:nls_config forKey:@"nls_config"];
    [dictM setValue:@(nuisdk::SERVICE_TYPE_SPEECH_TRANSCRIBER) forKey:@"service_type"];
    NSData *data = [NSJSONSerialization dataWithJSONObject:dictM options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}
NSString * parameters = [self genParams];
[_nui nui_set_params:[parameters UTF8String] asyncCallback:nullptr];

Start the recognition task

Call the nui_dialog_start method to start the recognition task.

[_nui nui_dialog_start:MODE_P2T dialogParam:[param_string UTF8String] asyncCallback:nullptr];

Handle callbacks

  • Implement the onNuiAudioStateChanged method. The SDK calls this method with the current value of AudioState, and the method starts or stops recording accordingly.

    -(void)onNuiAudioStateChanged:(nuisdk::NuiAudioState)state{
        TLog(@"onNuiAudioStateChanged state=%u", state);
        if (state == STATE_CLOSE || state == STATE_PAUSE) {
            [_voiceRecorder stop:YES];
        } else if (state == STATE_OPEN){
            self.recordedVoiceData = [NSMutableData data];
            [_voiceRecorder start];
        }
    }
  • Implement the onNuiNeedAudioData method to provide the recorded audio data to the SDK.

    -(int)onNuiNeedAudioData:(char *)audioData length:(int)len {
        static int emptyCount = 0;
        @autoreleasepool {
            @synchronized(_recordedVoiceData){
                if (_recordedVoiceData.length > 0) {
                    int recorder_len = 0;
                    if (_recordedVoiceData.length > len)
                        recorder_len = len;
                    else
                        recorder_len = _recordedVoiceData.length;
                    NSData *tempData = [_recordedVoiceData subdataWithRange:NSMakeRange(0, recorder_len)];
                    [tempData getBytes:audioData length:recorder_len];
                    tempData = nil;
                    NSInteger remainLength = _recordedVoiceData.length - recorder_len;
                    NSRange range = NSMakeRange(recorder_len, remainLength);
                    [_recordedVoiceData setData:[_recordedVoiceData subdataWithRange:range]];
                    emptyCount = 0;
                    return recorder_len;
                } else {
                    if (emptyCount++ >= 50) {
                        TLog(@"_recordedVoiceData length = %lu! empty 50times.", (unsigned  long)_recordedVoiceData.length);
                        emptyCount = 0;
                    }
                    return 0;
                }
            }
        }
        return 0;
    }
  • Implement the onNuiEventCallback method to handle the events that the SDK reports. Do not call an SDK method in the callbacks. Otherwise, a deadlock may occur.

    -(void)onNuiEventCallback:(nuisdk::NuiCallbackEvent)nuiEvent
                       dialog:(long)dialog
                    kwsResult:(const char *)wuw
                    asrResult:(const char *)asr_result
                     ifFinish:(bool)finish
                      retCode:(int)code {
        TLog(@"onNuiEventCallback event %d finish %d", nuiEvent, finish);
        if (nuiEvent == nuisdk::EVENT_ASR_PARTIAL_RESULT || nuiEvent == nuisdk::EVENT_SENTENCE_END) {
            TLog(@"ASR RESULT %s finish %d", asr_result, finish);
            // Guard against a NULL asr_result before converting it.
            NSString *result = asr_result ? [NSString stringWithUTF8String:asr_result] : @"";
            [myself showAsrResult:result];
        } else if (nuiEvent == nuisdk::EVENT_ASR_ERROR) {
            TLog(@"EVENT_ASR_ERROR error[%d]", code);
        } else if (nuiEvent == nuisdk::EVENT_MIC_ERROR) {
            TLog(@"MIC ERROR");
            [_voiceRecorder stop:true];
            [_voiceRecorder start];
        }
        if (finish) {
            [myself showStart];
        }
        return;
    }
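
Because SDK methods must not be called inside the callbacks, one common approach is to copy the event data inside the callback and act on it later from an application thread. The following is a minimal C++ sketch of that hand-off, not the demo project's implementation. PendingEvent and EventQueue are hypothetical names, and copying asr_result assumes that the pointer may not stay valid after the callback returns.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical copy of the values delivered to the event callback.
struct PendingEvent {
    int event;           // numeric value of the NuiCallbackEvent
    std::string result;  // copy of asr_result, empty if it was NULL
    bool finish;
    int code;
};

class EventQueue {
public:
    // Call from inside the SDK callback: copy the data and return immediately.
    void post(int event, const char *asr_result, bool finish, int code) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back(PendingEvent{event, asr_result ? asr_result : "", finish, code});
    }

    // Call from an application thread: take everything queued so far.
    std::vector<PendingEvent> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<PendingEvent> out(events_.begin(), events_.end());
        events_.clear();
        return out;
    }

private:
    std::mutex mutex_;
    std::deque<PendingEvent> events_;
};
```

The thread that drains the queue is then free to call SDK methods, such as nui_dialog_cancel after an error event, because it does so outside the callback.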

Complete the recognition task

[_nui nui_dialog_cancel:false asyncCallback:nullptr];