
When AI Agents Take Over Phones: How to Monitor on Mobile Devices

This article introduces how to detect AI-driven "non-human" Android operations performed via AccessibilityService, event injection, and ADB, using Alibaba Cloud Real User Monitoring (RUM).

Background

AI agent-based phone assistants have recently gone viral on social media. They use AI to automate phone operations for complex tasks. These tasks include placing orders, comparing prices, and searching. A user simply says, "Find me the cheapest iPhone." The AI then opens shopping apps, searches for products, compares prices, and places the order. This scenario of AI taking over phones reveals a new form of future human-computer interaction.

However, when AI starts operating phones at scale, traditional user behavior analysis faces critical data pollution issues, such as:

Inflated conversion rates: AI automated ordering interferes with conversion rate data. This leads to incorrect business decisions.

User path analysis failure: AI operation paths are highly optimized and repetitive. This pollutes the analysis of user behavior paths.

Recommendation algorithm bias: Recommendation models based on AI operation data deviate from real user preferences.

How do we detect "non-human" operations? Let's first break down how AI or scripts operate phones.

Technical Breakdown

Let's look at the principles of how AI agents operate phones.

flowchart TB
    subgraph Layer1["User Entry Layer"]
        A[User voice/text instructions]
    end
    
    subgraph Layer2["Screen Capture Layer"]
        A --> B[Get screen information]
        style B fill:#ffcccc
    end
    
    subgraph Layer3["Cloud Communication Layer"]
        B -->|Upload screen information| C[Cloud inference server]
        C -->|Return instructions| D[Phone operation instructions]
        style C fill:#99ccff
        style D fill:#cce5ff
    end
    
    subgraph Layer4["Operation Execution Layer"]
        D --> E[Execute operation instructions]
    end
    
    style Layer1 fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Layer2 fill:#fff5f5,stroke:#ff6666,stroke-width:2px
    style Layer3 fill:#f5f9ff,stroke:#6699ff,stroke-width:2px
    style Layer4 fill:#f5fff5,stroke:#66ff66,stroke-width:2px

It is divided into the following layers:

User entry layer: Users issue operation instructions via text or voice.

Screen capture layer: Gets raw screen information.

Cloud communication layer: Uploads screen information to a cloud inference server and receives operation instructions in return.

Operation execution layer: Click, swipe, long press, input, and so on.

To detect "non-human" operations from a mobile monitoring perspective, focus on the "Operation Execution Layer". Take the Android platform as an example. Three common technical paths in the "Operation Execution Layer" enable "non-human" operations:

● Input events using AccessibilityService

● Inject events using INJECT_EVENTS

● Inject events using adb shell input

In addition, custom ROMs and external hardware can also perform "non-human" operations; those approaches are out of scope for this article.

Input Events Using AccessibilityService

AccessibilityService is part of the Android accessibility framework. It was originally designed to help users with disabilities use their phones, but it also supports automation. It is the primary technical path that accessibility apps and game-automation tools use to automate operations.

AccessibilityService works in three phases:

Phase 1: Event listening

Phase 2: Screen reading

Phase 3: Automated operation

Phase 1: Event listening

When the application interface changes, such as a new page opening or a button status change, the system notifies registered accessibility services via AccessibilityEvent. The service listens for various event types. These include window status changes, content changes, and view scrolling.

Phase 2: Screen reading

The accessibility service retrieves the view hierarchy of the current active window. It uses the AccessibilityNodeInfo object to access all UI elements on the screen. These include:

● Text content, such as button text and input box content

● View properties, such as location, size, and clickability

● View hierarchy relationships, such as parent and child nodes

This lets the AI agent "see" screen content and understand the current interface state.

Phase 3: Automated operation

Based on the read screen content, the accessibility service performs two types of operations:

Node operations: Directly operate on UI nodes, such as clicking, long-pressing, and entering text.

Gesture operations: Execute complex touch gestures using the GestureDescription API. Examples include swiping, dragging, and multi-touch.

Inputting events using the accessibility service has the following features:

User authorization required: Users must manually enable the accessibility service in System Settings.

Screen content reading: Fully reads on-screen text and view hierarchy information.

Flexible operation capabilities: Supports complex operations such as clicking, swiping, long-pressing, and entering text.

Injecting Events Using INJECT_EVENTS

INJECT_EVENTS is an Android system-level permission. It lets applications directly inject touch events into the input system to simulate user operations. This is a low-level event injection mechanism provided by the Android system.

The INJECT_EVENTS mechanism also works in three phases:

Phase 1: Event construction

Phase 2: Permission authentication

Phase 3: System injection

Phase 1: Event construction

The application constructs a MotionEvent object by calling system APIs using Instrumentation or reflection. This object contains basic information such as touch coordinates and action types (ACTION_DOWN, ACTION_UP).

Phase 2: Permission authentication

The Android system checks if the caller has the INJECT_EVENTS permission. Ordinary applications cannot obtain this system-level permission. It is available only in the following cases:

● System applications (with system signature)

● Applications with root permissions

Phase 3: System injection

After passing permission authentication, the event enters the Android input subsystem. The input subsystem handles all input events, such as touches and keystrokes. It treats injected events as real hardware input events and distributes them to the current focus window.

Injecting events using INJECT_EVENTS has the following features:

Low-level injection: Events are injected directly into the input subsystem, below the application layer.

No user authorization required: Does not require manual user authorization, but requires system signature or root permissions.

Harder to detect: Injection occurs at the system level, making it harder for the application layer to detect.

Injecting Events Using adb shell input

adb shell input is a command-line tool provided by the Android Debug Bridge (ADB). It injects input events into a device over a USB or network connection and is common in development debugging and automated testing. It is essentially the same mechanism as INJECT_EVENTS, differing only in the calling entity and how permission is acquired.

The mechanism for injecting events using adb shell input works in four phases:

● Phase 1: Sending commands

● Phase 2: ADB protocol transmission

● Phase 3: Daemon process processing

● Phase 4: System injection

Phase 1: Sending commands

Send input commands from an ADB client on a PC or another remote host as follows:

adb shell input tap 500 1000                 # Tap coordinates (500, 1000)
adb shell input swipe 100 200 300 400        # Swipe from (100, 200) to (300, 400)
adb shell input text "hello"                 # Type the text "hello"

Phase 2: ADB protocol transmission

The ADB client sends commands to the ADB daemon process (adbd) on the Android device over USB or a TCP/IP network. The ADB protocol handles command serialization, transmission, and deserialization.

Phase 3: Daemon process processing

After the adbd daemon process accepts the command, it parses the command parameters and constructs the corresponding MotionEvent or KeyEvent objects. The adbd process runs with system permission (usually shell or root) and holds system-level privileges.

Phase 4: System injection

adbd invokes the system API (InputManager.injectInputEvent()) to inject the event into the input subsystem. This procedure follows the same final injection path as INJECT_EVENTS within an application.

Compared to INJECT_EVENTS, the adb shell input method for injecting events has the following characteristics:

● Requires transmission via the ADB protocol.

● Permission acquisition: Establishing an ADB connection grants permission. Modifying the application is not required.

● The underlying implementation of event injection is consistent with INJECT_EVENTS.

Detection of "non-human" Operations

Many cheats and scripts, including AI Agents that can operate phones, use the above solution. However, special groups (such as visually impaired users) also use accessibility services. Simple analysis of events based on feature values may lead to false positives. The following section outlines how to use collected event features and external environment features to assist in analyzing "non-human" operation events when they occur.

Detect AccessibilityService input events

To operate a phone using AccessibilityService, enable the corresponding accessibility service in System Settings. The Android system provides APIs to determine whether an accessibility service is running as follows:

● Supports detection of running accessibility services.

● Supports reading the accessibility service ID.

● Supports detection of whether screen content is read.

● Supports detection of the capability to operate applications.

// Detect whether any accessibility service is enabled
public boolean hasAccessibilityServiceRunning(Context context) {
    AccessibilityManager am = (AccessibilityManager) context.getSystemService(Context.ACCESSIBILITY_SERVICE);
    return am != null && am.isEnabled();
}

// Read the IDs of the enabled accessibility services
public void checkServiceId(AccessibilityManager am) {
    List<AccessibilityServiceInfo> enabledServices = am.getEnabledAccessibilityServiceList(AccessibilityServiceInfo.FEEDBACK_ALL_MASK);
    for (AccessibilityServiceInfo service : enabledServices) {
        // Get the service ID (usually "package name/class name")
        String id = service.getId();
    }
}

// Check if any enabled service has the capability to control the application
public boolean hasFullControlAgent(AccessibilityManager am) {
    List<AccessibilityServiceInfo> enabledServices = am.getEnabledAccessibilityServiceList(AccessibilityServiceInfo.FEEDBACK_ALL_MASK);
    for (AccessibilityServiceInfo service : enabledServices) {
        int capabilities = service.getCapabilities();

        // 1. Check if the service can read screen content
        boolean canRetrieve = (capabilities & AccessibilityServiceInfo.CAPABILITY_CAN_RETRIEVE_WINDOW_CONTENT) != 0;

        // 2. Check if the service can perform gestures
        boolean canPerform = (capabilities & AccessibilityServiceInfo.CAPABILITY_CAN_PERFORM_GESTURES) != 0;

        // Both together give the service full control of the application
        if (canRetrieve && canPerform) {
            return true;
        }
    }
    return false;
}

You can also check MotionEvent flags to determine if an accessibility service generated the event:

// Check if the event was generated by an accessibility service (requires Android 12, API 31, or later)
public boolean isAccessibilityEvent(MotionEvent event) {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
        int flags = event.getFlags();
        // 0x800 corresponds to the accessibility-event flag; the public constant
        // MotionEvent.FLAG_IS_ACCESSIBILITY_EVENT is only exposed on newer SDK
        // levels, so the raw value is used here
        return (flags & 0x800) != 0;
    }
    return false;
}
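The bitwise test itself can be exercised off-device. The sketch below is a minimal, plain-Java illustration of the flag checks above; the constant values mirror AOSP's MotionEvent flags (0x800 for the accessibility flag, 0x8 for FLAG_IS_GENERATED_GESTURE) and are duplicated here only so the logic runs without the Android framework.

```java
public class FlagCheckDemo {
    // Values mirror AOSP's MotionEvent flag constants
    static final int FLAG_IS_GENERATED_GESTURE = 0x8;
    static final int FLAG_IS_ACCESSIBILITY_EVENT = 0x800;

    // True if any bit of the mask is set in the event flags
    static boolean hasFlag(int flags, int mask) {
        return (flags & mask) != 0;
    }

    public static void main(String[] args) {
        int flags = FLAG_IS_ACCESSIBILITY_EVENT | 0x2; // plus an unrelated bit
        System.out.println(hasFlag(flags, FLAG_IS_ACCESSIBILITY_EVENT)); // true
        System.out.println(hasFlag(flags, FLAG_IS_GENERATED_GESTURE));   // false
    }
}
```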

Using the preceding methods, the application can detect running accessibility services. However, this may mistakenly flag normal accessibility services.

Detect events injected by INJECT_EVENTS

Events injected by INJECT_EVENTS typically have the following features:

● Event attributes may lack properties such as pressure value and touch area

● The flag may be FLAG_IS_GENERATED_GESTURE

● The event source might be SOURCE_UNKNOWN

The detection logic is as follows:

public boolean isEventInjected(MotionEvent event) {
    if (event == null) {
        return false;
    }
    
    // Method 1: Check Event Attributes
    // Injected events might lack properties such as pressure value and touch area
    boolean hasPressure = event.getPressure() > 0;
    boolean hasSize = event.getSize() > 0;
    boolean hasToolType = event.getToolType(0) != MotionEvent.TOOL_TYPE_UNKNOWN;
    
    // If the event lacks these basic properties, it might be injected (false positives are possible)
    if (!hasPressure && !hasSize && !hasToolType) {
        return true;
    }
    
    // Method 2: Check event flags
    int flags = event.getFlags();
    // FLAG_IS_GENERATED_GESTURE indicates a programmatically generated gesture
    if ((flags & MotionEvent.FLAG_IS_GENERATED_GESTURE) != 0) {
        return true;
    }
    
    // Method 3: Check the event source
    int source = event.getSource();
    if (source == InputDevice.SOURCE_UNKNOWN) {
        return true;
    }
    
    return false;
}

No single reliable method currently exists for events injected using INJECT_EVENTS. These events occur at a lower layer and can bypass application layer detection mechanisms. Detecting such injections often requires a multi-dimensional comprehensive detection approach (this also applies to other types of injected event detection). However, you can check event features to improve the success rate of detecting "non-human" operations.
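One way to realize such a multi-dimensional approach is a simple weighted score over the per-event and environment features described above. The sketch below is illustrative only: the signal names, weights, and threshold are hypothetical and would need tuning against labeled traffic.

```java
import java.util.Map;

public class InjectionScore {
    // Hypothetical weights for each detection signal; tune against real traffic
    static final Map<String, Integer> WEIGHTS = Map.of(
            "missing_pressure_and_size", 2,   // Method 1: absent touch properties
            "generated_gesture_flag", 3,      // Method 2: FLAG_IS_GENERATED_GESTURE set
            "source_unknown", 3,              // Method 3: SOURCE_UNKNOWN event source
            "accessibility_full_control", 2,  // a full-control accessibility service is enabled
            "adb_connected", 2                // ADB-related environment signals
    );

    // Sum the weights of the signals that fired
    static int score(Map<String, Boolean> signals) {
        int total = 0;
        for (Map.Entry<String, Integer> e : WEIGHTS.entrySet()) {
            if (signals.getOrDefault(e.getKey(), false)) {
                total += e.getValue();
            }
        }
        return total;
    }

    // Hypothetical threshold: require at least two corroborating signals
    static boolean isSuspicious(Map<String, Boolean> signals) {
        return score(signals) >= 4;
    }
}
```

Because any single signal can fire for legitimate users (for example, an accessibility service used by a visually impaired user), thresholding a combined score reduces false positives compared to flagging on one feature alone.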

Detect adb shell input injection events

Events injected using adb shell input are essentially the same as those injected using INJECT_EVENTS. However, because an ADB connection is required, you can detect them by checking the ADB Connection Status.

Detect whether ADB is enabled:

public static boolean isAdbEnabled(Context context) {
    return Settings.Global.getInt(
        context.getContentResolver(),
        Settings.Global.ADB_ENABLED, 0
    ) > 0;
}

Detect USB Connection Status:

private static boolean isUsbConnected(Context context) {
    Intent intent = context.registerReceiver(
        null,
        new IntentFilter(Intent.ACTION_BATTERY_CHANGED)
    );

    if (intent == null) return false;

    int plugged = intent.getIntExtra(BatteryManager.EXTRA_PLUGGED, -1);
    //  Check if powered via USB
    return plugged == BatteryManager.BATTERY_PLUGGED_USB;
}

Check if the ADB port is open (wireless debugging and emulator scenarios):

private static boolean isAdbPortOpen() {
    // Common ADB ports. 5555 is the default; some emulators use 5554-5585
    int[] ports = {5554, 5555, 5556, 5557, 5558, 5559, 5560};

    for (int port : ports) {
        try (Socket socket = new Socket()) {
            // A short timeout keeps the scan fast
            socket.connect(new InetSocketAddress("127.0.0.1", port), 50);
            Log.w(TAG, "isAdbPortOpen, opened. port: " + port);
            return true;
        } catch (Exception e) {
            // Connection refused or timed out: this port is closed
            Log.w(TAG, "isAdbPortOpen, closed. port: " + port);
        }
    }
    return false;
}

Check the debugger status:

public static boolean isDebuggerAttached() {
    return android.os.Debug.isDebuggerConnected();
}

Check the USB ADB status:

private static boolean isUsbAdbActive() {
    try {
        Class<?> systemPropertiesClass = Class.forName("android.os.SystemProperties");
        Method getMethod = systemPropertiesClass.getMethod("get", String.class);
        String usbState = (String) getMethod.invoke(null, "sys.usb.state");

        // Check if adb is included
        // Common return values:
        // "mtp,adb" -> MTP transfer enabled and ADB connected
        // "adb"     -> Charging only mode and ADB connected 
        // "mtp"     -> MTP only. ADB not enabled
        if (usbState != null && usbState.contains("adb")) {
            return true;
        }

        // Double validation: Some vendors might use persist.sys.usb.config
        String usbConfig = (String) getMethod.invoke(null, "persist.sys.usb.config");
        if (usbConfig != null && usbConfig.contains("adb")) {
            return true;
        }

    } catch (Exception e) {
        // Reflection failed or Insufficient Permissions (some newer Android versions might restrict reading)
        e.printStackTrace();
    }
    return false;
}

Use the preceding methods to collect ADB-related environment context during operations. When analyzing an operation, use this ADB status information to help estimate the likelihood that it was a "non-human" operation.
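Before reporting, the individual checks can be folded into a single attribute map. The sketch below only assembles the fields; handing them to the RUM SDK's custom-attribute reporting call is product-specific and not shown here. The key names are chosen to match the context.* fields used in the queries that follow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AdbContextCollector {
    // Fold the individual ADB checks into one attribute map.
    // Key names match the context.* fields queried later.
    static Map<String, Boolean> collect(boolean adbEnabled,
                                        boolean usbAdbActive,
                                        boolean adbPortOpen,
                                        boolean debuggerAttached) {
        Map<String, Boolean> ctx = new LinkedHashMap<>();
        ctx.put("adb_enabled", adbEnabled);
        ctx.put("usb_adb_active", usbAdbActive);
        ctx.put("adb_port_open", adbPortOpen);
        ctx.put("debugger_attached", debuggerAttached);
        return ctx;
    }
}
```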

Detect Abnormal Operations Using RUM Custom Queries

Report the fields produced by the preceding three detection methods to the Real User Monitoring product via the RUM SDK. You can then query and analyze those fields to quickly surface suspicious non-human operations. The following outlines several analysis scenarios.

Scenario 1: Detect users with enabled accessibility services

Analyze the enabling status and service ID of accessibility services to quickly identify users with potential non-human operations.

-- Query users who enabled accessibility services within the last hour and their operation counts
* and context.accessibility_enabled: true | 
SELECT 
  "user.name",
  "context.accessibility_service_id",
  COUNT(*) as operation_count,
  COUNT(DISTINCT "session.id") as session_count
GROUP BY 
  "user.name", "context.accessibility_service_id"
ORDER BY 
  operation_count DESC
LIMIT 100

Analysis: If a user has an abnormally high operation count and has enabled non-system accessibility services, pay close attention.

Scenario 2: Detect accessibility services with full control capabilities

Focus on analyzing accessibility services with the dual capabilities of reading screens and operating applications.

-- Query accessibility services with full control capabilities and the affected User Count
* and context.can_retrieve_window: true and context.can_perform_gestures: true | 
SELECT 
 "context.accessibility_service_id",
 COUNT(DISTINCT "user.name") as affected_users,
 COUNT(DISTINCT "device.id") as affected_devices
FROM log
GROUP BY "context.accessibility_service_id"
ORDER BY affected_users DESC

Analysis: Services with both screen reading and gesture operation capabilities are more likely used for automated operations.

Scenario 3: Detect operations in ADB connection environments

Analyze user operations under ADB Connection Status. This may indicate the presence of scripts or automation tools.

-- Query user operation features under ADB enabling status
* and (context.adb_enabled: true or context.usb_adb_active: true or context.adb_port_open: true) | 
SELECT 
 "user.name",
 "device.id",
 CASE 
 WHEN "context.adb_enabled" = true THEN 'ADB Enabled'
 WHEN "context.usb_adb_active" = true THEN 'USB-ADB Connected'
 WHEN "context.adb_port_open" = true THEN 'ADB Port Open'
 END as adb_status,
 COUNT(*) as event_count
FROM log
GROUP BY "user.name", "device.id", adb_status
ORDER BY event_count DESC
LIMIT 100

Analysis: Operations under ADB Connection Status may be automation scripts.

Scenario 4: Detect injected event features

Detect events possibly injected using INJECT_EVENTS based on event flags and missing properties.

-- Query events with injection features
* and event_type: "action" and (
 context.is_generated_gesture: true or context.event_source: "SOURCE_UNKNOWN" or (context.has_pressure: false and context.has_size: false and context.has_tool_type: false)
) | 
SELECT 
 "user.name",
 "device.id",
 COUNT(*) as injected_event_count,
 COUNT(DISTINCT "session.id") as session_count,
 ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY "user.name"), 2) as inject_ratio
FROM log
GROUP BY "user.name", "device.id"
HAVING injected_event_count > 50
ORDER BY inject_ratio DESC
LIMIT 100

Analysis: If the ratio of injected events for a user is too high (such as over 50%), it likely indicates non-human operations.
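The ratio that the Scenario 4 query computes with a window function can also be checked in plain code. The sketch below is an illustrative re-implementation: each boolean marks whether an event carried injection features, and the 50% threshold is the rule of thumb from the analysis above.

```java
import java.util.List;

public class InjectRatio {
    // Percentage of events that carried injection features
    static double injectRatio(List<Boolean> injectedFlags) {
        if (injectedFlags.isEmpty()) {
            return 0.0;
        }
        long injected = injectedFlags.stream().filter(f -> f).count();
        return 100.0 * injected / injectedFlags.size();
    }

    // Rule of thumb from the analysis: over 50% injected events is suspicious
    static boolean likelyNonHuman(List<Boolean> injectedFlags) {
        return injectRatio(injectedFlags) > 50.0;
    }
}
```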

Conclusion

This article analyzed the technical principles behind AI agents and scripts controlling mobile phones, and outlined how to collect event feature information for three technical paths. The rise of AI agents such as AutoGLM and Doubao marks a new stage in mobile interaction: AI can automatically perform complex interactions on a user's behalf. This represents technical progress, but it creates new challenges in detecting non-human operations. Accurate non-human detection in mobile monitoring requires checks across multiple dimensions, including the application runtime environment, accessibility service package names, device features, behavior features, and screen trajectory features. The Alibaba Cloud Real User Monitoring SDK now collects the relevant properties, and you can use them to build custom analyses of potential non-human operations. Note that this feature is in gray release; contact the author for more information.
