
Intelligent Media Services: AI interviews

Last Updated: Dec 15, 2025

This topic provides a solution guide for implementing AI interviews, helping companies enhance recruitment quality and efficiency.

Background

In today's fast-paced, competitive business environment, traditional interview methods are often inefficient, subjective, and slow, failing to meet modern hiring demands. AI-powered interviews streamline candidate screening, shorten hiring cycles, and improve both efficiency and objectivity. By minimizing human bias, AI interviews enhance fairness and provide reliable, data-driven insights that improve candidate-role matching, giving companies a competitive edge in the war for talent.

Solution overview

image

The AI interview process consists of three main stages:

  • Pre-interview:

    • Candidate communication: Establish a clear process to inform candidates about the interview time, format, and what to expect. Provide detailed instructions on how to use the AI interview system to ensure a smooth experience.

    • Question bank setup: Design a targeted question bank based on the responsibilities, skills, and competencies required for each role, covering areas such as professional knowledge, work experience, problem-solving, and teamwork.

  • During the interview:

    • Audio/Video call: Select the appropriate interview format for the role and candidate.

    • Personalized interview: Configure the AI agent with specific parameters to provide a tailored interview for each candidate.

    • Anti-cheating detection: Monitor the candidate's facial expressions and actions in real time to detect potential cheating.

  • Post-interview:

    • Audio/Video archiving: Save the raw audio and video data from the interview session.

    • Transcript archiving: Transcribe the conversation into text and archive it for review.

Options

Interview modes

Our solution offers three interview modes. Specify the desired call type when creating your AI agent and integrate the corresponding SDK. You can experience these modes firsthand in our demo. To integrate the service, see the Developer guide.

Audio-only call

  • Interaction: The candidate sends audio; the AI interviewer responds with audio. Supports natural conversation and push-to-talk modes.

  • Cost: Low

Vision call

  • Interaction: The candidate sends audio and video; the AI interviewer responds with audio. Supports natural conversation and push-to-talk modes.

  • Cost: Medium

Video call

  • Interaction: The candidate sends audio and video; the AI interviewer responds with audio and video. Supports natural conversation and push-to-talk modes.

  • Cost: High

Client SDKs

For detailed SDK integration instructions, see Developer guide.

  • Web (Recommended): Supported environments:

    • Desktop browsers, such as Chrome.

    • Mobile H5, such as Alipay H5, DingTalk H5, and WeChat mini program H5.

    • In-app WebViews.

    Note
    • Use in native mobile browsers is not recommended due to potential WebRTC compatibility issues on some devices.

    • Direct integration with native WeChat Mini Program components is not supported. Use the H5 version within a mini program instead.

  • Android/iOS: Recommended for native applications on Android or iOS.

  • Other: For development on Windows or macOS desktops, contact us by joining our DingTalk group (ID: 106730016696).

Basic features

Personalized interviews

Alibaba Cloud provides a rich set of APIs to create a tailored interview for each candidate. You can achieve this by setting call startup parameters on the client or by configuring parameters during server-side initiation.

  • LLM prompt: Pass candidate details and job information as part of the initial prompt to enable the AI to conduct a more targeted interview. Modifiable during call: yes.

  • ASR language: Set the speech recognition language (such as Chinese or English). Modifiable during call: yes.

  • TTS voice: Set the AI interviewer's voice and timbre. Modifiable during call: yes.

  • Avatar: If using a VideoAgent with multiple avatars, specify which one to use for the call. Modifiable during call: no.

  • Welcome message: Set a custom welcome message for each candidate, such as "Hello, Alice. Welcome to your interview." Modifiable during call: no.

Conversation modes

Candidates have different speaking styles and speeds. To prevent the AI from interrupting a candidate who pauses to think, our solution offers three conversation modes:

  • Mode 1: Natural conversation with semantic endpointing (Recommended)

    The candidate and AI engage in a natural, full-duplex conversation. When a user pauses, the semantic recognition module uses context to determine whether they have finished speaking. If the user is likely still speaking, the AI waits (5 seconds by default, configurable) before responding; once the user has finished, the AI responds immediately. For details, see Semantic endpointing.

  • Mode 2: Push-to-Talk

    The candidate presses and holds a button to answer a question and releases it to finish. For details, see Push-to-Talk mode.

  • Mode 3: Natural conversation with a stop phrase

    The candidate says a specific phrase (such as "I'm finished") to signal the end of their turn. You can configure multiple stop phrases for a call.
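Mode 3 can be sketched as a simple client-side check on the ASR transcript. The phrase list is configurable per call; the phrases and function below are illustrative, not part of the SDK.

```python
# Sketch: detecting Mode 3 stop phrases in an ASR transcript.
# The phrase list is configurable per call; these values are examples.
STOP_PHRASES = ["i'm finished", "that's my answer", "i am done"]

def turn_is_complete(transcript: str) -> bool:
    """Return True if the transcript ends with a configured stop phrase."""
    text = transcript.lower().rstrip(" .!?")
    return any(text.endswith(phrase) for phrase in STOP_PHRASES)
```

Matching only at the end of the transcript avoids cutting the candidate off when a stop phrase happens to appear mid-answer.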

Send custom messages to clients

If you need to send custom information, such as test questions or informational cards, to the client in real time, our platform provides a dedicated channel for this. Once received, the client can render the content or perform any custom action.

image

There are two ways to implement this:

  • Method 1: Your server can send custom messages directly to the client. See Send proactive messages to clients.

  • Method 2: You can embed custom commands within the LLM's response.

    Note

    The custom commands can be marked with special characters, such as {} or []. These markers can be filtered out by the TTS node so they are not spoken aloud. Parse this content to handle custom business logic.
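Following the marker convention in the note above, Method 2 can be sketched as splitting the LLM response into spoken text and embedded commands. The `{...}` marker and command format are application-defined; this is not SDK code.

```python
import re

# Sketch: separating embedded commands (marked with {...}) from the text
# that should reach the TTS node. The command format inside the braces is
# application-defined; "show_card:q2" below is a made-up example.
CMD_PATTERN = re.compile(r"\{([^{}]*)\}")

def split_response(llm_output: str):
    """Return (spoken_text, commands) extracted from an LLM response."""
    commands = CMD_PATTERN.findall(llm_output)
    spoken = CMD_PATTERN.sub("", llm_output)
    # Collapse the whitespace left behind by removed markers.
    return " ".join(spoken.split()), commands

spoken, cmds = split_response(
    "Here is your next question. {show_card:q2} Please describe a project."
)
```

The client can then send `spoken` to TTS and dispatch each entry in `cmds` to its own business logic.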

Pass user information to the model

When multiple candidates are being interviewed simultaneously, the LLM needs to distinguish which input comes from which user. Our platform allows you to pass custom information, such as a UserID, through to the model. See Pass through business parameters to Alibaba Cloud Model Studio.
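The idea can be sketched as tagging every utterance with a candidate identifier before it reaches the model. The payload shape below is illustrative; see the pass-through business parameters documentation for the real format.

```python
# Sketch: tagging utterances with a UserID so concurrent interviews stay
# separated on the model side. The dict shape is illustrative only.

def tag_utterance(user_id: str, text: str) -> dict:
    return {"user_id": user_id, "text": text}

batch = [
    tag_utterance("cand-001", "I led a migration project."),
    tag_utterance("cand-002", "I mostly worked on frontend."),
]

# Group incoming messages by candidate before handing them to the model.
by_user = {}
for msg in batch:
    by_user.setdefault(msg["user_id"], []).append(msg["text"])
```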

Detect and handle user silence

You can monitor the timestamp of each user utterance by listening for the intent_recognized callback. See Agent callbacks for details. This allows you to handle cases where a user is silent for an extended period, for example by prompting the candidate to continue or ending the call.

Conversation archiving

You can save the audio data and text transcripts generated during the entire interview session. For instructions, see Data archiving.

Advanced features

Anti-cheating system

  • Invalid screen detection: Detects if the screen is obscured (such as by glare, a black screen, or a white screen) for a set duration. Executor: Real-time Conversational AI.

  • Number of people in the video: Provides a real-time callback with the number of people in the frame to detect extra people or if the candidate leaves. Executor: Real-time Conversational AI.

  • Electronic device detection: Provides a real-time callback if electronic devices (phones, watches, headphones) are detected in the frame. Executor: Real-time Conversational AI.

  • Frequent head shaking: Triggers if the candidate shakes their head twice within 5 seconds. Executor: Real-time Conversational AI.

  • Frequent head nodding: Triggers if the candidate nods twice within 5 seconds. Executor: Real-time Conversational AI.

  • Content overlap analysis: After the interview, you can perform plagiarism detection on the candidate's answers using an LLM to check for AI-generated content. Executor: your own application.
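On the application side, the real-time checks above arrive as callbacks that your server routes to its own handling. The event names and payload shape below are illustrative placeholders modeled on the check items, not the actual callback schema.

```python
# Sketch: routing real-time anti-cheating callbacks to application logic.
# Event names mirror the check items above; the payload shape is made up.

def handle_cheat_event(event: dict, log: list) -> None:
    kind = event.get("type")
    if kind == "person_count" and event.get("count", 1) != 1:
        # Extra person in frame, or candidate left the frame entirely.
        log.append(f"person_count={event['count']}")
    elif kind == "electronic_device":
        log.append("device_detected")
    elif kind in ("head_shake", "head_nod"):
        # Frequent head movement may indicate off-screen prompting.
        log.append(kind)
    elif kind == "invalid_screen":
        log.append("invalid_screen")
```

Logging flagged events with timestamps, rather than ending the call automatically, lets a human reviewer weigh them alongside the archived recording.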

Pre-interview phone notifications

Our platform provides outbound calling capabilities, which can be used to send automated invitations before the interview and result notifications afterward.