MobileAgentSDK is an Android library that wraps the C++ MobileAgent SDK and provides a Kotlin/Java-friendly interface. It connects to the MobileAgent server via WebSocket to execute, manage, and monitor agent tasks.
Download the SDK
Usage
Import the AAR
Copy the generated AAR file into your project and add the following dependency to your build.gradle file:
dependencies {
    implementation(files('libs/sdk-release.aar'))
}
Basic usage
Option 1: Use Flow (recommended)
Flow provides a reactive programming style that is well-suited for Kotlin coroutines:
import android.util.Log
import androidx.lifecycle.lifecycleScope
import com.wuying.mobileagentsdk.MobileAgentEvent
import com.wuying.mobileagentsdk.MobileAgentSdk
import com.wuying.mobileagentsdk.MobileAgentState
import kotlinx.coroutines.flow.collectLatest
import kotlinx.coroutines.flow.filterIsInstance
import kotlinx.coroutines.launch
// Kotlin's use {} (the counterpart of Java's try-with-resources) manages the lifecycle automatically.
MobileAgentSdk().use { sdk ->
    // Collect events in a coroutine.
    lifecycleScope.launch {
        sdk.events.collectLatest { event ->
            when (event) {
                is MobileAgentEvent.StateChanged -> {
                    when (event.state) {
                        MobileAgentState.CONNECTED -> {
                            Log.d("SDK", "Connected")
                            // Execute a task.
                            sdk.executeTask("Please help me open settings", 10)
                        }
                        MobileAgentState.DISCONNECTED -> {
                            Log.d("SDK", "Disconnected")
                        }
                        else -> {}
                    }
                }
                is MobileAgentEvent.TaskResult -> {
                    Log.d("SDK", "Task complete: ${event.resultText}")
                }
                is MobileAgentEvent.Stream -> {
                    Log.d("SDK", "Stream data: ${event.content}")
                }
                else -> {}
            }
        }
    }
    // You can also collect only specific types of events.
    lifecycleScope.launch {
        sdk.events
            .filterIsInstance<MobileAgentEvent.StateChanged>()
            .collect { event ->
                Log.d("SDK", "State: ${event.state}")
            }
    }
    lifecycleScope.launch {
        sdk.events
            .filterIsInstance<MobileAgentEvent.TaskResult>()
            .collect { event ->
                Log.d("SDK", "Task result: ${event.resultText}")
            }
    }
    // Connect to the server.
    sdk.connect("wss://localhost:30005/ws")
    // The SDK is automatically destroyed at the end of the use block.
}
Option 2: Use a callback
This approach is for projects that do not use coroutines:
import android.util.Log
import com.wuying.mobileagentsdk.MobileAgentCallback
import com.wuying.mobileagentsdk.MobileAgentSdk
import com.wuying.mobileagentsdk.MobileAgentState
MobileAgentSdk().use { sdk ->
    // Set the callback.
    sdk.setCallback(object : MobileAgentCallback {
        override fun onStateChanged(state: MobileAgentState) {
            when (state) {
                MobileAgentState.CONNECTED -> {
                    Log.d("SDK", "Connected")
                    sdk.executeTask("Please help me open settings", 10)
                }
                MobileAgentState.DISCONNECTED -> {
                    Log.d("SDK", "Disconnected")
                }
                else -> {}
            }
        }
        override fun onTaskResult(
            taskId: String,
            result: String,
            resultText: String,
            success: Boolean,
            actualSteps: Int
        ) {
            Log.d("SDK", "Task complete: $resultText")
        }
        override fun onStream(
            taskId: String,
            step: Int,
            agent: String,
            contentType: String,
            content: String
        ) {
            Log.d("SDK", "Stream data: $content")
        }
        // Implement other callbacks as needed.
    })
    // Connect to the server.
    sdk.connect("wss://localhost:30005/ws")
}
Lifecycle management
The SDK implements the AutoCloseable interface, so it works with Kotlin's use function (the counterpart of Java's try-with-resources):
// Recommended: Automatically manage the lifecycle.
MobileAgentSdk().use { sdk ->
    sdk.connect("wss://localhost:30005/ws")
    // Use the SDK...
} // Automatically calls close() to release resources.
// Or, manage the lifecycle manually.
val sdk = MobileAgentSdk()
try {
    sdk.connect("wss://localhost:30005/ws")
    // Use the SDK...
} finally {
    sdk.close() // or sdk.destroy()
}
API
Connection management
connect(url: String)
Connects to the server using a WebSocket URL:
sdk.connect("wss://localhost:30005/ws")
connectWithTicket(ticket: String)
Connects using a ticket (requires server-side support):
sdk.connectWithTicket("your-ticket-here")
disconnect()
Disconnects from the server:
sdk.disconnect()
Task management
executeTask(task: String, maxSteps: Int): String
Executes a task and returns the task ID:
val taskId = sdk.executeTask("Please help me open the settings app", 10)
pauseTask(taskId: String)
Pauses a task:
sdk.pauseTask(taskId)
resumeTask(taskId: String)
Resumes a task:
sdk.resumeTask(taskId)
cancelTask(taskId: String)
Cancels a task:
sdk.cancelTask(taskId)
State query
getState(): MobileAgentState
Gets the current connection state:
val state = sdk.getState()
when (state) {
    MobileAgentState.CONNECTED -> { /* Connected */ }
    MobileAgentState.CONNECTING -> { /* Connecting */ }
    MobileAgentState.DISCONNECTED -> { /* Disconnected */ }
}
Flow API
The SDK provides an events Flow that emits all event types. You can use filterIsInstance to filter for specific events:
// Collect all events.
sdk.events.collect { event ->
    when (event) {
        is MobileAgentEvent.StateChanged -> { /* Handle state changes. */ }
        is MobileAgentEvent.TaskResult -> { /* Handle task results. */ }
        is MobileAgentEvent.Stream -> { /* Handle stream data. */ }
        else -> { /* Handle or ignore the remaining event types. */ }
    }
}
// Collect only specific types of events.
sdk.events
    .filterIsInstance<MobileAgentEvent.StateChanged>()
    .collect { event ->
        // Handle only state changes.
    }
Event types:
| Event type | Description |
| StateChanged | Connection state changed. |
| TodoWrite | To-do list updated. |
| TaskStart | Task started. |
| StepStart | Step started. |
| Stream | Stream data received. |
| StepEnd | Step ended. |
| TaskResult | Task result received. |
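The event types above can also be handled with an exhaustive when expression, so the compiler flags any type that is left out. The following is a minimal sketch under stated assumptions: SketchEvent and describe are hypothetical names, and SketchEvent is a trimmed stand-in for MobileAgentEvent (fewer fields, state as a plain String) declared only so the example compiles without the SDK.

```kotlin
// Hypothetical stand-in for the SDK's MobileAgentEvent hierarchy.
// Field names follow the documented data classes; several fields are omitted.
sealed class SketchEvent {
    data class StateChanged(val state: String) : SketchEvent()
    data class TodoWrite(val count: Int) : SketchEvent()
    data class TaskStart(val taskId: String, val maxSteps: Int) : SketchEvent()
    data class StepStart(val taskId: String, val step: Int) : SketchEvent()
    data class Stream(val taskId: String, val step: Int, val content: String) : SketchEvent()
    data class StepEnd(val taskId: String, val step: Int, val status: String) : SketchEvent()
    data class TaskResult(val taskId: String, val resultText: String, val success: Boolean) : SketchEvent()
}

// A `when` used as an expression must cover every subtype of a sealed class,
// so no `else` branch is needed and a missing case is a compile error.
fun describe(event: SketchEvent): String = when (event) {
    is SketchEvent.StateChanged -> "state: ${event.state}"
    is SketchEvent.TodoWrite -> "to-do list updated (${event.count} items)"
    is SketchEvent.TaskStart -> "task ${event.taskId} started (max ${event.maxSteps} steps)"
    is SketchEvent.StepStart -> "step ${event.step} started"
    is SketchEvent.Stream -> "step ${event.step} stream: ${event.content}"
    is SketchEvent.StepEnd -> "step ${event.step} ended: ${event.status}"
    is SketchEvent.TaskResult ->
        if (event.success) "done: ${event.resultText}" else "failed: ${event.resultText}"
}
```

With the real SDK types, the same shape would apply inside a collect block, replacing SketchEvent with MobileAgentEvent.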
Callback API
The MobileAgentCallback interface provides the following callback methods. All methods have default empty implementations, allowing you to override only the ones you need:
| Method | Description |
| onStateChanged | Connection state changed. |
| | To-do list updated. |
| | Task started. |
| | Step started. |
| onStream | Stream data received. |
| | Step ended. |
| onTaskResult | Task result received. |
| onRawMessage | Raw, unparsed JSON message. |
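Because every callback method has an empty default, an implementation can override a single method and ignore the rest. The sketch below illustrates that pattern with a hypothetical stand-in interface, SketchCallback, which mirrors only three of the documented methods and uses a String in place of MobileAgentState. StreamCollector is likewise an illustrative name, not part of the SDK.

```kotlin
// Hypothetical stand-in for MobileAgentCallback, declared so the sketch
// compiles on its own. Each method has an empty default body.
interface SketchCallback {
    fun onStateChanged(state: String) {}
    fun onStream(taskId: String, step: Int, agent: String, contentType: String, content: String) {}
    fun onRawMessage(rawJson: String) {}
}

// Overrides only onStream; the other methods keep their no-op defaults.
class StreamCollector : SketchCallback {
    val chunks = mutableListOf<String>()

    override fun onStream(taskId: String, step: Int, agent: String, contentType: String, content: String) {
        chunks += content
    }
}
```

Calling onStateChanged or onRawMessage on a StreamCollector does nothing, while each onStream call appends its content to chunks.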
Data classes
MobileAgentState
An enum for the connection state:
enum class MobileAgentState {
    DISCONNECTED, // Disconnected
    CONNECTING,   // Connecting
    CONNECTED     // Connected
}
TodoItem
A data class for a to-do item:
data class TodoItem(
    val id: String,
    val title: String,
    val status: String,
    val details: String
)
MobileAgentEvent
A sealed class representing all possible event types:
sealed class MobileAgentEvent {
    data class StateChanged(val state: MobileAgentState) : MobileAgentEvent()
    data class TodoWrite(val todos: Array<TodoItem>, val count: Int) : MobileAgentEvent()
    data class TaskStart(val taskId: String, val task: String, val maxSteps: Int) : MobileAgentEvent()
    data class StepStart(val taskId: String, val step: Int) : MobileAgentEvent()
    data class Stream(val taskId: String, val step: Int, val agent: String, val contentType: String, val content: String) : MobileAgentEvent()
    data class StepEnd(val taskId: String, val step: Int, val status: String, val summary: String) : MobileAgentEvent()
    data class TaskResult(val taskId: String, val result: String, val resultText: String, val success: Boolean, val actualSteps: Int) : MobileAgentEvent()
}
Advanced usage
Parse agent thoughts
The MobileAgent uses the <conclusion> tag in its stream output to wrap the agent's thought process. You can extract this content from Stream events.
<conclusion> format
The agent uses XML tags to mark its thought process in its output:
<conclusion>This is the agent's thought process...</conclusion>
Kotlin implementation
The following code demonstrates how to parse the thought content as it arrives:
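import androidx.lifecycle.lifecycleScope
import com.wuying.mobileagentsdk.MobileAgentEvent
import com.wuying.mobileagentsdk.MobileAgentSdk
import kotlinx.coroutines.flow.filterIsInstance
import kotlinx.coroutines.launch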
class ConclusionParser {
    // Characters held back because they may still turn out to be part of a tag.
    private val pending = StringBuilder()
    private var inConclusion = false

    /**
     * Processes one chunk of stream data and returns the text found inside
     * <conclusion> tags, or null if the chunk produced no output.
     * Tags may be split across chunks.
     */
    fun processStream(content: String): String? {
        val result = StringBuilder()
        for (char in content) {
            // Look for the opening tag outside a conclusion and the closing tag inside one.
            val tag = if (inConclusion) CLOSE_TAG else OPEN_TAG
            pending.append(char)
            when {
                !tag.startsWith(pending) -> {
                    // Not a tag after all. A trailing '<' may start a new tag, so keep it pending.
                    val held = if (char == '<') 1 else 0
                    // Inside a conclusion the held-back text is thought content; outside it is dropped.
                    if (inConclusion) result.append(pending, 0, pending.length - held)
                    pending.delete(0, pending.length - held)
                }
                pending.length == tag.length -> {
                    // Complete tag: switch state and mark the boundary in the output.
                    pending.clear()
                    inConclusion = !inConclusion
                    result.append(if (inConclusion) "[Thought] " else "\n")
                }
                // Otherwise pending is a proper prefix of the tag; wait for more input.
            }
        }
        return if (result.isNotEmpty()) result.toString() else null
    }

    /**
     * Resets the parser state.
     */
    fun reset() {
        pending.clear()
        inConclusion = false
    }

    private companion object {
        const val OPEN_TAG = "<conclusion>"
        const val CLOSE_TAG = "</conclusion>"
    }
}
// Sample usage
MobileAgentSdk().use { sdk ->
    val parser = ConclusionParser()
    lifecycleScope.launch {
        sdk.events
            .filterIsInstance<MobileAgentEvent.Stream>()
            .collect { event ->
                // Parse and print the output.
                val output = parser.processStream(event.content)
                if (output != null) {
                    print(output)
                }
            }
    }
    sdk.connect("wss://localhost:30005/ws")
}
Simplified version
If real-time output is not required, you can parse the content after it has been fully received:
/**
 * Extracts all content from <conclusion> tags in a complete string.
 */
fun extractConclusions(content: String): List<String> {
    val conclusions = mutableListOf<String>()
    val regex = "<conclusion>(.*?)</conclusion>".toRegex(RegexOption.DOT_MATCHES_ALL)
    for (match in regex.findAll(content)) {
        val conclusion = match.groupValues[1].trim()
        if (conclusion.isNotEmpty()) {
            conclusions.add(conclusion)
        }
    }
    return conclusions
}
// Sample usage
lifecycleScope.launch {
    sdk.events
        .filterIsInstance<MobileAgentEvent.TaskResult>()
        .collect { event ->
            // Extract the thought process from the task result.
            val conclusions = extractConclusions(event.resultText)
            println("Agent's thought process:")
            conclusions.forEachIndexed { index, conclusion ->
                println("${index + 1}. $conclusion")
            }
        }
}
Output
[Thought] The user wants to open the Settings app. I need to find its package name and Activity.
[Thought] By querying the system app list, I found the Settings app: com.android.settings/.Settings
[Thought] Now I will start the Settings app.
Task complete: Successfully opened the settings app.
Using Flow and callbacks together
You can use both approaches at the same time without interference:
MobileAgentSdk().use { sdk ->
    // Use Flow to handle the main logic.
    lifecycleScope.launch {
        sdk.events
            .filterIsInstance<MobileAgentEvent.TaskResult>()
            .collect { event ->
                // Handle the task result.
            }
    }
    // Use a callback for logging.
    sdk.setCallback(object : MobileAgentCallback {
        override fun onRawMessage(rawJson: String) {
            // Log raw messages for debugging.
            Log.d("SDK_RAW", rawJson)
        }
    })
    sdk.connect("wss://localhost:30005/ws")
}
Error handling
try {
    MobileAgentSdk().use { sdk ->
        sdk.connect("wss://localhost:30005/ws")
    }
} catch (e: IllegalStateException) {
    Log.e("SDK", "Failed to create SDK", e)
} catch (e: Exception) {
    Log.e("SDK", "Connection error", e)
}
Check SDK status
val sdk = MobileAgentSdk()
// Check if the SDK is destroyed.
if (sdk.isDestroyed()) {
    Log.w("SDK", "The SDK has been destroyed.")
}
// Check if the SDK is valid.
sdk.checkNotDestroyed() // Throws an exception if destroyed.
ProGuard configuration
If you enable code obfuscation, you must preserve the SDK's native methods. The SDK already includes a proguard-rules.pro file, so you typically do not need extra configuration.
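For context, obfuscation is normally switched on per build type in the app module. The fragment below is a sketch of a typical release configuration in the Gradle Kotlin DSL (build.gradle.kts). It assumes a standard Android Gradle Plugin setup; the file names are the plugin's usual defaults and are not specific to this SDK.

```kotlin
android {
    buildTypes {
        release {
            // Enables code shrinking and obfuscation for release builds.
            isMinifyEnabled = true
            proguardFiles(
                // Default rules shipped with the Android Gradle Plugin.
                getDefaultProguardFile("proguard-android-optimize.txt"),
                // The app module's own rules file.
                "proguard-rules.pro"
            )
        }
    }
}
```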
If you need to configure the keep rules manually, add the following:
# MobileAgentSDK
-keep class com.wuying.mobileagentsdk.** { *; }
-keepclassmembers class com.wuying.mobileagentsdk.MobileAgentSdk {
    private native <methods>;
}
Dependencies
The SDK has minimal runtime dependencies:
dependencies {
    // The only runtime dependency
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3")
}
To remain lightweight, the SDK does not include any UI frameworks, such as AppCompat or Material.