The real-time speech recognition service provides a Natural User Interaction (NUI) SDK for iOS. This topic describes how to download the NUI SDK for iOS, lists the key methods in the SDK, and provides sample code for you to use the SDK.
Prerequisites
You understand how the SDK works. For more information, see Overview.
The appkey of your project is obtained. For more information, see Create a project.
A token used to access the service is obtained. For more information, see Obtain a token.
Download and install the SDK
Decompress the downloaded package to obtain the demo project. The SDK is provided as nuisdk.framework, which you add to your iOS project to integrate the SDK.
Note: The demo project is written in Objective-C and C++. Source files that call the SDK must use the .mm (Objective-C++) file extension.
Use Xcode to open the demo project.
The sample code for the real-time speech recognition service is stored in the SpeechTranscriberViewController.mm file.
Key methods
nui_initialize: initializes the SDK.
/**
 * Initialize the SDK. The SDK uses a singleton pattern. To initialize the SDK again, you must first release it.
 * Do not call the SDK on the user interface (UI) thread. Otherwise, the UI may be blocked.
 * @param parameters: the parameters used in the initialization. For more information, see Overview.
 * @param listener: the NuiSdkListener structure that holds the callbacks. For more information, see the following callback methods.
 * @param async_listener: the asynchronous callback mode. A value of nullptr specifies that the method is executed synchronously.
 * @param level: the log level to use. The smaller the value, the more logs are recorded.
 * @param save_log: specifies whether to store logs in files. The debug_path parameter specifies the directory where log files are stored.
 * @return: the returned error code. For more information, see Error codes.
 */
NuiResultCode nui_initialize(const char *parameters,
                             const NuiSdkListener *listener,
                             const NuiAsyncCallback *async_listener = nullptr,
                             NuiSdkLogLevel level = LOG_LEVEL_VERBOSE,
                             bool save_log = false);
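Because SDK calls such as nui_initialize should not run on the UI thread, they are typically dispatched to a background thread. On iOS, Grand Central Dispatch (dispatch_async) is the usual tool; the following minimal C++ sketch shows the same idea with std::async. RunOffUiThread is a hypothetical helper, and the lambda passed to it stands in for a call such as nui_initialize. The sketch does not include the SDK header.

```cpp
#include <cassert>
#include <future>
#include <utility>

// Hypothetical helper: runs a potentially blocking call (for example, a wrapper
// around nui_initialize) on a separate thread and returns a future for its result.
template <typename Fn>
auto RunOffUiThread(Fn fn) -> std::future<decltype(fn())> {
    // std::launch::async forces execution on a new thread instead of deferring it.
    return std::async(std::launch::async, std::move(fn));
}
```

Keep the returned std::future alive until you need the result: a future obtained from std::async blocks in its destructor until the task finishes, so discarding it immediately on the UI thread would block that thread anyway.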
The following table lists the fields of the NuiSdkListener structure.
Name | Type | Description
event_callback | FuncDialogListenerOnEvent | Callback through which the SDK reports events to the application.
user_data_callback | FuncDialogUserProvideData | Callback through which the SDK requests recorded audio data from the application.
audio_state_changed_callback | FuncDialogAudioStateChange | Callback through which the SDK instructs the application to open or close the recorder.
audio_extra_event_callback | FuncDialogAudioExtraEvent | Reserved. Reports special events.
user_data | void * | User data that the SDK passes back as the first parameter of each callback.
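Because user_data is passed back as the first argument of each callback, it can carry a pointer to your own object so that a plain C callback can forward to a member function. The sample code in this topic sets user_data to nullptr and uses a global pointer (myself) instead; the following sketch shows the user_data alternative. EventCallbackSketch mirrors the FuncDialogListenerOnEvent signature with int in place of NuiCallbackEvent, and ResultCollector and OnEventTrampoline are hypothetical names. In Objective-C++ under ARC, the pointer would typically be passed as (__bridge void *)self.

```cpp
#include <cassert>
#include <string>

// Stand-in for FuncDialogListenerOnEvent; int replaces nuisdk::NuiCallbackEvent
// so that this sketch compiles without the SDK header.
typedef void (*EventCallbackSketch)(void *user_data, int event, long dialog,
                                    const char *wuw, const char *asr_result,
                                    bool finish, int code);

// Hypothetical application object that receives recognition results.
struct ResultCollector {
    std::string last_result;
    bool finished = false;
    void OnEvent(const char *asr_result, bool done) {
        if (asr_result != nullptr) last_result = asr_result;
        finished = done;
    }
};

// Free-function trampoline: recovers the object from user_data and forwards the call.
static void OnEventTrampoline(void *user_data, int /*event*/, long /*dialog*/,
                              const char * /*wuw*/, const char *asr_result,
                              bool finish, int /*code*/) {
    static_cast<ResultCollector *>(user_data)->OnEvent(asr_result, finish);
}
```

You would assign the trampoline to the callback field and the object's address to user_data; the object must outlive the recognition session.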
FuncDialogListenerOnEvent: reports SDK events to the application.
/**
 * Report an SDK event to the application.
 * @param user_data: the user_data value set in NuiSdkListener.
 * @param event: the event that occurred. For the possible values, see the following table.
 * @param dialog: (reserved) the sequence number of the session.
 * @param wuw: the wake-up word recognition result.
 * @param asr_result: the recognition result of the audio stream.
 * @param finish: specifies whether the recognition task is completed.
 * @param code: the returned error code. This parameter is valid only for the EVENT_ASR_ERROR event.
 */
typedef void (*FuncDialogListenerOnEvent)(void *user_data, NuiCallbackEvent event, long dialog,
                                          const char *wuw, const char *asr_result, bool finish, int code);
The following table lists the possible events in the SDK.
Name | Description
EVENT_VAD_START | The start of speech is detected.
EVENT_VAD_END | The end of speech is detected.
EVENT_ASR_PARTIAL_RESULT | An intermediate recognition result is generated.
EVENT_ASR_RESULT | The final recognition result is generated.
EVENT_ASR_ERROR | An error occurred. Use the returned error code to determine the cause.
EVENT_MIC_ERROR | A recording error occurred.
EVENT_SENTENCE_START | The start of a sentence is detected. This event applies to the real-time speech recognition service.
EVENT_SENTENCE_END | The end of a sentence is detected. This event applies to the real-time speech recognition service.
EVENT_SENTENCE_SEMANTICS | Reserved.
EVENT_TRANSCRIBER_COMPLETE | The recognition task is completed.
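An event callback usually reduces to a switch over these events. The following sketch maps each event to the action that the sample code in this topic takes: results from EVENT_ASR_PARTIAL_RESULT and EVENT_SENTENCE_END are displayed, EVENT_ASR_ERROR is logged with its error code, and EVENT_MIC_ERROR restarts the recorder. SketchEvent is a hypothetical stand-in for nuisdk::NuiCallbackEvent that reuses only the names from the table; the real enumerator values come from the SDK header.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for nuisdk::NuiCallbackEvent (names only, not values).
enum class SketchEvent {
    VadStart, VadEnd, AsrPartialResult, AsrResult, AsrError, MicError,
    SentenceStart, SentenceEnd, SentenceSemantics, TranscriberComplete
};

// Returns the action an application would typically take for each event.
std::string ActionFor(SketchEvent e) {
    switch (e) {
        case SketchEvent::AsrPartialResult:
        case SketchEvent::SentenceEnd:
            return "show_result";       // the text is in asr_result
        case SketchEvent::AsrError:
            return "log_error_code";    // code is valid only for this event
        case SketchEvent::MicError:
            return "restart_recorder";
        case SketchEvent::TranscriberComplete:
            return "task_done";
        default:
            return "ignore";
    }
}
```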
FuncDialogUserProvideData: provides audio data.
/**
 * After a recognition task starts, the SDK calls this method repeatedly to read audio data from the application.
 * @param user_data: the user_data value set in NuiSdkListener.
 * @param buffer: the SDK buffer into which the application copies audio data.
 * @param len: the maximum number of bytes of audio data to copy into buffer.
 * @return: the number of bytes actually copied.
 */
typedef int (*FuncDialogUserProvideData)(void *user_data, char *buffer, int len);
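The callback follows a pull model: the recorder appends audio on its own thread, and the SDK repeatedly asks for up to len bytes. The following sketch shows a thread-safe buffer with those semantics; the sample code in this topic does the same with NSMutableData and @synchronized. PcmBuffer is a hypothetical class and is not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical PCM buffer shared by the recorder thread (Push) and the
// FuncDialogUserProvideData callback (Pull).
class PcmBuffer {
public:
    // Appends recorded audio; called from the recorder thread.
    void Push(const char *data, int len) {
        if (len <= 0) return;
        std::lock_guard<std::mutex> lock(mu_);
        bytes_.insert(bytes_.end(), data, data + len);
    }

    // Copies at most len bytes into buffer and returns the number copied.
    // Returns 0 when no audio is available yet, as the sample code does.
    int Pull(char *buffer, int len) {
        if (len <= 0) return 0;
        std::lock_guard<std::mutex> lock(mu_);
        std::size_t n = std::min(bytes_.size(), static_cast<std::size_t>(len));
        auto end = bytes_.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(bytes_.begin(), end, buffer);
        bytes_.erase(bytes_.begin(), end);
        return static_cast<int>(n);
    }

private:
    std::mutex mu_;
    std::deque<char> bytes_;
};
```

Returning fewer bytes than requested, or 0, is allowed by the return contract above, so the callback never has to wait for the recorder.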
FuncDialogAudioStateChange: instructs the application to open or close the recorder based on the NuiAudioState value.
/**
 * When the start, stop, or cancel method is called, the SDK uses this callback to instruct the application to open or close the recorder.
 * @param user_data: the user_data value set in NuiSdkListener.
 * @param state: the recorder state to apply, such as STATE_OPEN, STATE_PAUSE, or STATE_CLOSE.
 */
typedef void (*FuncDialogAudioStateChange)(void *user_data, NuiAudioState state);
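The state callback only tells the application what to do with its own recorder. The following sketch mirrors the sample code's handling: STATE_OPEN starts a fresh recording with an empty buffer, and STATE_PAUSE and STATE_CLOSE both stop the recorder. SketchAudioState and FakeRecorder are hypothetical stand-ins for nuisdk::NuiAudioState and the demo's recorder.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for nuisdk::NuiAudioState (STATE_OPEN, STATE_PAUSE, STATE_CLOSE).
enum class SketchAudioState { Open, Pause, Close };

// Hypothetical recorder used only for illustration.
struct FakeRecorder {
    bool running = false;
    std::string pending;  // recorded audio not yet handed to the SDK
    void Start() { running = true; }
    void Stop() { running = false; }
};

// Opens or closes the recorder according to the state reported by the SDK.
void OnAudioStateChanged(FakeRecorder &rec, SketchAudioState state) {
    switch (state) {
        case SketchAudioState::Open:
            rec.pending.clear();  // start with an empty buffer, as the demo does
            rec.Start();
            break;
        case SketchAudioState::Pause:
        case SketchAudioState::Close:
            rec.Stop();
            break;
    }
}
```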
nui_set_params: sets SDK parameters in the JSON format.
/**
 * Set parameters in the JSON format.
 * @param params: the request parameters. For more information, see Overview.
 * @param listener: the asynchronous callback mode. A value of nullptr specifies that the method is executed synchronously.
 * @return: the returned error code. For more information, see Error codes.
 */
NuiResultCode nui_set_params(const char *params, const NuiAsyncCallback *listener = nullptr);
nui_dialog_start: starts the recognition task.
/**
 * Start the recognition task.
 * @param vad_mode: the voice activity detection (VAD) mode of the task. Use the push-to-talk mode (MODE_P2T) for a recognition task.
 * @param dialog_params: the parameters used for recognition. This parameter can be left empty.
 * @param listener: the asynchronous callback mode. A value of nullptr specifies that the method is executed synchronously.
 * @return: the returned error code. For more information, see Error codes.
 */
NuiResultCode nui_dialog_start(NuiVadMode vad_mode, const char *dialog_params, const NuiAsyncCallback *listener = nullptr);
nui_dialog_cancel: completes the recognition task.
/**
 * When this method is called, the server returns the final recognition result to the client and completes the recognition task.
 * @param force: specifies whether to ignore the final recognition result and forcibly complete the recognition task. A value of false specifies that the server stops the task but waits until the final recognition result is returned.
 * @param listener: the asynchronous callback mode. A value of nullptr specifies that the method is executed synchronously.
 * @return: the returned error code. For more information, see Error codes.
 */
NuiResultCode nui_dialog_cancel(bool force, const NuiAsyncCallback *listener = nullptr);
nui_release: releases the SDK.
/**
 * Release the SDK.
 * @param async_listener: the asynchronous callback mode. A value of nullptr specifies that the method is executed synchronously.
 * @return: the returned error code. For more information, see Error codes.
 */
NuiResultCode nui_release(const NuiAsyncCallback *async_listener = nullptr);
Procedure
Initialize the SDK and the recorder instance.
Set request parameters based on your business requirements.
Call the nui_dialog_start method to start the recognition task.
In the audio_state_changed_callback callback, open or close the recorder based on the NuiAudioState value.
In the user_data_callback callback, supply the recorded audio data to the SDK.
Obtain the recognition result in the EVENT_ASR_PARTIAL_RESULT and EVENT_SENTENCE_END callback events.
Call the nui_dialog_cancel method to complete the recognition task.
Call the nui_release method to release the SDK.
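The procedure above implies an ordering between calls, including the singleton rule that the SDK must be released before it is initialized again. The following sketch encodes that ordering as a small state machine that you could use, for example, in a debug wrapper. CallOrder is hypothetical and does not call the SDK; it additionally assumes that nui_set_params and nui_dialog_start are valid only while the SDK is initialized and no task is running, which this topic does not state explicitly.

```cpp
#include <cassert>

// Hypothetical checker for the call order in the procedure. Each method returns
// false if the corresponding SDK call would be out of order.
class CallOrder {
public:
    bool Initialize() { return Move(State::Released, State::Ready); }  // singleton: release first
    bool SetParams() { return state_ == State::Ready; }
    bool DialogStart() { return Move(State::Ready, State::Running); }
    bool DialogCancel() { return Move(State::Running, State::Ready); }
    bool Release() {
        if (state_ == State::Released) return false;
        state_ = State::Released;
        return true;
    }

private:
    enum class State { Released, Ready, Running };
    bool Move(State from, State to) {
        if (state_ != from) return false;
        state_ = to;
        return true;
    }
    State state_ = State::Released;
};
```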
Sample code
Initialize the NUI SDK
NSString * initParam = [self genInitParams];
// Register the C callbacks. user_data is passed back as the first argument of each callback.
NuiSdkListener nuiListener;
nuiListener.event_callback = nuiDialogListenerOnEvent;
nuiListener.audio_state_changed_callback = nuiDialogAudioStateChange;
nuiListener.audio_extra_event_callback = nullptr;
nuiListener.user_data = nullptr;
nuiListener.user_data_callback = nuiDialogUserProvideData;
// Specifies whether to write SDK logs to files.
BOOL save_log = NO;
[_nui nui_initialize:[initParam UTF8String] Listener:&nuiListener asyncCallback:nullptr logLevel:LOG_LEVEL_VERBOSE saveLog:save_log];
The genInitParams method generates a JSON string that contains the information about the resource directory and user. The user information contains the following parameters:
// Replace the empty strings with the service URL, your appkey, and your token.
[dictM setObject:id_string forKey:@"device_id"];
[dictM setObject:@"" forKey:@"url"];
[dictM setObject:@"" forKey:@"app_key"];
[dictM setObject:@"" forKey:@"token"];
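For code that builds the initialization string in plain C++ rather than with NSJSONSerialization, the user fields can be assembled as shown in the following sketch. It produces only the four keys listed above; the demo's genInitParams also writes resource-directory fields that are not shown here. BuildInitUserParams and JsonEscape are hypothetical helpers, and the escaping covers quotes, backslashes, and control characters, which NSJSONSerialization would otherwise handle for you.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes a string for use as a JSON string value.
std::string JsonEscape(const std::string &s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {
                    char buf[7];  // "\u00XX" plus the terminating null
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical C++ counterpart of the user fields written by genInitParams.
std::string BuildInitUserParams(const std::string &device_id, const std::string &url,
                                const std::string &app_key, const std::string &token) {
    return std::string("{\"device_id\":\"") + JsonEscape(device_id) +
           "\",\"url\":\"" + JsonEscape(url) +
           "\",\"app_key\":\"" + JsonEscape(app_key) +
           "\",\"token\":\"" + JsonEscape(token) + "\"}";
}
```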
Set the request parameters
Set the request parameters in the format of a JSON string, as shown in the following code:
-(NSString*) genParams {
NSMutableDictionary *nls_config = [NSMutableDictionary dictionary];
[nls_config setValue:@true forKey:@"enable_intermediate_result"];
[nls_config setValue:@true forKey:@"enable_voice_detection"];
NSMutableDictionary *dictM = [NSMutableDictionary dictionary];
[dictM setObject:nls_config forKey:@"nls_config"];
[dictM setValue:@(nuisdk::SERVICE_TYPE_SPEECH_TRANSCRIBER) forKey:@"service_type"];
NSData *data = [NSJSONSerialization dataWithJSONObject:dictM options:NSJSONWritingPrettyPrinted error:nil];
NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
return jsonStr;
}
NSString * parameters = [self genParams];
[_nui nui_set_params:[parameters UTF8String] asyncCallback:nullptr];
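The following sketch builds the same request keys as genParams in plain C++, as compact rather than pretty-printed JSON, which carries the same data. BuildRequestParams is a hypothetical helper. The integer value of nuisdk::SERVICE_TYPE_SPEECH_TRANSCRIBER is not given in this topic, so the caller passes it in from the SDK header instead of hard-coding it.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ counterpart of genParams. service_type should be
// nuisdk::SERVICE_TYPE_SPEECH_TRANSCRIBER from the SDK header.
std::string BuildRequestParams(bool intermediate, bool voice_detection, int service_type) {
    auto b = [](bool v) { return v ? std::string("true") : std::string("false"); };
    return "{\"nls_config\":{\"enable_intermediate_result\":" + b(intermediate) +
           ",\"enable_voice_detection\":" + b(voice_detection) +
           "},\"service_type\":" + std::to_string(service_type) + "}";
}
```

The resulting string would then be passed to nui_set_params in the same way as the Objective-C version.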
Start the recognition task
Call the nui_dialog_start method to start the recognition task.
// param_string: optional dialog parameters as a JSON string. It can be an empty string.
[_nui nui_dialog_start:MODE_P2T dialogParam:[param_string UTF8String] asyncCallback:nullptr];
Handle callbacks
The onNuiAudioStateChanged method handles the audio state callback: it opens or closes the recorder based on the NuiAudioState value.
-(void)onNuiAudioStateChanged:(nuisdk::NuiAudioState)state {
    TLog(@"onNuiAudioStateChanged state=%u", state);
    if (state == STATE_CLOSE || state == STATE_PAUSE) {
        [_voiceRecorder stop:YES];
    } else if (state == STATE_OPEN) {
        self.recordedVoiceData = [NSMutableData data];
        [_voiceRecorder start];
    }
}
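The onNuiNeedAudioData method handles the audio data callback: it copies up to len bytes of recorded audio into audioData and returns the number of bytes copied, or 0 if no audio is available yet.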
-(int)onNuiNeedAudioData:(char *)audioData length:(int)len {
    static int emptyCount = 0;
    @autoreleasepool {
        @synchronized(_recordedVoiceData) {
            if (_recordedVoiceData.length > 0) {
                int recorder_len = 0;
                if (_recordedVoiceData.length > len)
                    recorder_len = len;
                else
                    recorder_len = _recordedVoiceData.length;
                NSData *tempData = [_recordedVoiceData subdataWithRange:NSMakeRange(0, recorder_len)];
                [tempData getBytes:audioData length:recorder_len];
                tempData = nil;
                NSInteger remainLength = _recordedVoiceData.length - recorder_len;
                NSRange range = NSMakeRange(recorder_len, remainLength);
                [_recordedVoiceData setData:[_recordedVoiceData subdataWithRange:range]];
                emptyCount = 0;
                return recorder_len;
            } else {
                if (emptyCount++ >= 50) {
                    TLog(@"_recordedVoiceData length = %lu! empty 50times.", (unsigned long)_recordedVoiceData.length);
                    emptyCount = 0;
                }
                return 0;
            }
        }
    }
    return 0;
}
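The onNuiEventCallback method handles the events that the SDK reports: it displays intermediate and sentence-end results, logs error codes, and restarts the recorder after a recording error. Do not call SDK methods in the callbacks. Otherwise, a deadlock may occur.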
-(void)onNuiEventCallback:(nuisdk::NuiCallbackEvent)nuiEvent
                   dialog:(long)dialog
                kwsResult:(const char *)wuw
                asrResult:(const char *)asr_result
                 ifFinish:(bool)finish
                  retCode:(int)code {
    TLog(@"onNuiEventCallback event %d finish %d", nuiEvent, finish);
    if (nuiEvent == nuisdk::EVENT_ASR_PARTIAL_RESULT || nuiEvent == nuisdk::EVENT_SENTENCE_END) {
        TLog(@"ASR RESULT %s finish %d", asr_result, finish);
        NSString *result = [NSString stringWithUTF8String:asr_result];
        [myself showAsrResult:result];
    } else if (nuiEvent == nuisdk::EVENT_ASR_ERROR) {
        TLog(@"EVENT_ASR_ERROR error[%d]", code);
    } else if (nuiEvent == nuisdk::EVENT_MIC_ERROR) {
        TLog(@"MIC ERROR");
        [_voiceRecorder stop:true];
        [_voiceRecorder start];
    }
    if (finish) {
        [myself showStart];
    }
    return;
}
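Because SDK methods must not be called from inside callbacks, any follow-up SDK call (for example, starting a new task after EVENT_TRANSCRIBER_COMPLETE) has to be deferred to code that runs outside the callback. The sample code's EVENT_MIC_ERROR branch restarts only the recorder, which is not an SDK method. On iOS, dispatch_async to a serial queue is the usual way to defer work; the following C++ sketch shows the same pattern with a locked command queue. DeferredCalls is a hypothetical class: callbacks only Post work, and a thread that is not inside an SDK callback calls Drain.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical command queue for deferring SDK calls out of callbacks.
class DeferredCalls {
public:
    // Safe to call from inside a callback: it only holds a short-lived lock.
    void Post(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mu_);
        queue_.push_back(std::move(fn));
    }

    // Call from your own thread, outside any SDK callback. Returns the number of tasks run.
    int Drain() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mu_);
            batch.swap(queue_);
        }
        int n = 0;
        for (auto &fn : batch) {  // runs without holding the lock
            fn();
            ++n;
        }
        return n;
    }

private:
    std::mutex mu_;
    std::deque<std::function<void()>> queue_;
};
```

Running the queued work without holding the lock means a deferred task can itself Post more work without deadlocking the queue.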
Complete the recognition task
// false: stop the task but wait until the final recognition result is returned.
[_nui nui_dialog_cancel:false asyncCallback:nullptr];