Author: An Infrastructure Engineer Who Values Work-Life Balance
Friday afternoon, 4:30 PM. My manager drops this in our Slack channel:
Ingress NGINX will be retired in March 2026.
Choosing to remain with Ingress NGINX after its retirement leaves you and your users vulnerable to attack. None of the available alternatives are direct drop-in replacements. Migration takes time and engineering resources. About half of cloud native environments will be affected. You have two months left to prepare.
—— Kubernetes Steering Committee & Security Response Committee
Official statement: https://kubernetes.io/blog/2026/01/29/ingress-nginx-statement/
"Can you put together a migration plan by Monday?"
I looked at our cluster: 60+ Ingress resources, scattered configuration snippets. My brain started calculating how many late nights this would take. This wasn't just an optimization task - it was a hard security compliance requirement. Once ingress-nginx stops receiving updates, we'd be running vulnerable infrastructure with no patches available.
Then I remembered the OpenClaw setup I'd configured recently, along with the migration skill the Higress community just released.
Facing the ingress-nginx retirement, there are several alternatives to evaluate: Traefik, Kong, Envoy Gateway, Higress, and more.
During my evaluation, I referenced Sealos's migration experience. They completed their migration back in 2023 at scale - 2000+ tenants in ultra-high concurrency scenarios. Their detailed technical comparison article is worth reading: Sealos: Why We Switched from Nginx to Envoy/Higress (2000 Tenants in Production)
This kind of production validation at scale gave me confidence that Higress had proven stability and performance in demanding environments.
Before starting, I needed to teach OpenClaw these migration capabilities. The setup was straightforward - just provide these two skill links to OpenClaw:
https://github.com/alibaba/higress/tree/main/.claude/skills/higress-OpenClaw-integration
https://github.com/alibaba/higress/tree/main/.claude/skills/higress-wasm-go-plugin
Once configured, OpenClaw had complete migration capabilities - including automatically developing WASM plugins when needed, without any extra effort from me.
Across the entire migration validation, I spent less than 10 minutes actually typing. OpenClaw ran everything in a local Kind cluster, validated all scenarios, and delivered a detailed runbook. All I had to do was review it and execute in production.
Friday evening, 5 PM. I left the office on time.
I know some of you might ask: If AI can do all this, why not let it operate directly on production?
The answer: Because I still want to have a job until retirement.
Production is a red line. No automation tool should directly touch production environments. This isn't about whether AI is capable - it's about operational principles.
The skill design philosophy aligns perfectly with my values: AI experiments in simulation, humans execute in production. Clear separation of concerns, with full traceability if something goes wrong.
I simply told OpenClaw in Discord:
Analyze our current K8s cluster ingress-nginx config and prepare for migration to Higress
OpenClaw automatically executed these commands:
# Back up every Ingress resource across all namespaces
kubectl get ingress -A -o yaml > ingress-backup.yaml
# Dump the ingress-nginx controller ConfigMap (global nginx settings)
kubectl get configmap -n ingress-nginx ingress-nginx-controller -o yaml
# Count how often each nginx annotation is used across the cluster
kubectl get ingress -A -o yaml | grep "nginx.ingress.kubernetes.io" | sort | uniq -c
Within seconds, it produced an analysis report: 63 Ingress resources in total, 60 using only standard nginx annotations and 3 relying on configuration-snippet annotations. For those 3 snippets, OpenClaw detailed exactly what each one does.
This step is the essence of the entire workflow.
OpenClaw automatically created a local K8s cluster using Kind, then:
# Commands executed by OpenClaw
kind create cluster --name higress-migration-test

# Install Higress (running in parallel with nginx; assumes the Higress Helm repo is already added)
helm install higress higress/higress \
  -n higress-system --create-namespace \
  --set global.ingressClass=nginx \
  --set global.enableStatus=false
Key configuration: global.enableStatus=false
This parameter is crucial - it prevents Higress from updating the Ingress status field, avoiding conflicts with nginx. Both controllers coexist peacefully, each processing the same Ingress resources.
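If you want to see the coexistence for yourself, a quick spot-check like the sketch below works. The Ingress name and namespace here are placeholders I made up, not values from the actual report; the point is that the status field is still written by ingress-nginx while the same routes are also reachable through the higress-gateway service.
# Hypothetical spot-check: nginx still owns the Ingress status field...
kubectl -n demo get ingress web -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# ...while Higress serves the same routes through its own gateway service
kubectl -n higress-system get svc higress-gateway -o wide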
OpenClaw generated test scripts covering all 63 Ingress routes:
./scripts/generate-migration-test.sh > migration-test.sh
./migration-test.sh 127.0.0.1:8080
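For reference, a single generated test case probably looks something like the sketch below. The hostname and path are made up for illustration, and I'm assuming the gateway is exposed on 127.0.0.1:8080 via a port-forward or a Kind port mapping; the real script iterates over all 63 routes and compares against the responses currently served by nginx.
# Expose the Higress gateway locally for testing (assumption: port-forward)
kubectl -n higress-system port-forward svc/higress-gateway 8080:80 &

# One test case: send the route's Host header and check the status code
curl -sS -o /dev/null -w '%{http_code}\n' \
  -H 'Host: shop.example.com' \
  'http://127.0.0.1:8080/api/orders'
# Expect the same status code (and body) the route returns via ingress-nginx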
60 passed immediately because Higress natively supports standard nginx annotations.
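That compatibility is the whole point. An Ingress like the hedged example below (names and annotation values are illustrative, not copied from our cluster) keeps working as-is, because Higress interprets the common nginx.ingress.kubernetes.io annotations directly while watching the nginx ingress class.
# Illustrative only -- an unchanged Ingress that both controllers can serve
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop
  namespace: demo
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: shop-api
                port:
                  number: 80
EOF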
For the remaining 3 using snippets, OpenClaw analyzed and provided solutions:
| Original nginx config | Higress solution |
|---|---|
| User-Agent detection for mobile redirect | Built-in custom-response plugin |
| IP whitelist | Built-in ip-restriction plugin |
| Basic Auth | Built-in basic-auth plugin |
None of these required custom WASM plugins! Higress built-in plugins covered them all.
The key: Original Ingress resources remained completely unchanged. OpenClaw auto-generated corresponding plugin configurations (WasmPlugin CRDs) and validated them in the Kind environment.
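As a concrete illustration of what those generated configurations look like, here is a minimal sketch of a WasmPlugin for the IP-whitelist case. The plugin image URL, the Ingress reference, and the CIDR range are my assumptions for illustration, not the exact config OpenClaw produced.
# A minimal, illustrative ip-restriction config scoped to one Ingress
kubectl apply -f - <<'EOF'
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ip-restriction
  namespace: higress-system
spec:
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ip-restriction:1.0.0
  defaultConfigDisable: true
  matchRules:
    - ingress:
        - admin/admin-dashboard   # namespace/name of the Ingress that used the whitelist snippet
      config:
        ip_source_type: origin-source
        allow:
          - 10.0.0.0/8
EOF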
The above case went smoothly with built-in plugins. But we had another environment where things weren't so simple.
That environment's IoT platform had a Lua script implementing device online status reporting to Redis:
location /api/device/heartbeat {
    access_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()

        -- Get encrypted device ID from request params
        local encrypted_device = ngx.var.arg_d
        if not encrypted_device then
            ngx.exit(400)
        end

        -- AES decrypt device ID
        local device_id = aes_decrypt(encrypted_device, secret_key)
        if not device_id then
            ngx.log(ngx.ERR, "Failed to decrypt device ID")
            ngx.exit(403)
        end

        -- Connect to Redis and update online status
        red:connect("redis.internal", 6379)
        red:setex("device:online:" .. device_id, 300, os.time())
        red:close()
    }
}
This custom business logic (Redis operations + parameter decryption) couldn't be directly replaced by any built-in plugin. Previously, facing this situation meant either grinding through WASM development learning or finding excuses to delay migration.
The amazing part: I didn't have to do anything.
When OpenClaw analyzed compatibility and found this snippet couldn't be replaced by built-in plugins, it automatically invoked the higress-wasm-go-plugin skill and started the plugin development workflow. I just watched the entire process from the sidelines.
OpenClaw analyzed the Lua code and extracted the core logic: read the encrypted device ID from the "d" query parameter, AES-decrypt it, and write the device's online status to Redis with a 300-second TTL. Then it generated type-safe Go code:
// Auto-generated WASM plugin core logic
func onHttpRequestHeaders(ctx wrapper.HttpContext, cfg config.DeviceOnlineConfig) types.Action {
    // Read encrypted device ID parameter
    encryptedDevice := getQueryParam(ctx, "d")
    if encryptedDevice == "" {
        proxywasm.SendHttpResponse(400, "device-online.missing_param",
            nil, []byte("Missing device parameter"), -1)
        return types.ActionPause
    }

    // AES decrypt device ID
    deviceID, err := aesDecrypt(encryptedDevice, cfg.AESKey)
    if err != nil {
        proxywasm.LogErrorf("Failed to decrypt device ID: %v", err)
        proxywasm.SendHttpResponse(403, "device-online.decrypt_failed",
            nil, []byte("Invalid device ID"), -1)
        return types.ActionPause
    }

    // Async Redis update
    key := fmt.Sprintf("device:online:%s", deviceID)
    timestamp := fmt.Sprintf("%d", time.Now().Unix())
    err = cfg.RedisClient.SetEx(key, timestamp, cfg.TTL, func(response resp.Value) {
        if response.Error() == nil {
            proxywasm.LogInfof("Device %s online status updated", deviceID)
        }
        proxywasm.ResumeHttpRequest()
    })
    if err != nil {
        proxywasm.LogErrorf("Redis call failed: %v", err)
        return types.ActionContinue // Degradation: Redis failure doesn't block request
    }
    return types.HeaderStopAllIterationAndWatermark
}
The generated project included the plugin's config struct (AES key, Redis address, TTL), the request handler above, and the build scaffolding. OpenClaw then compiled it:
# OpenClaw auto-executes
cd device-online-plugin
go mod tidy
GOOS=wasip1 GOARCH=wasm go build -o main.wasm ./
The build succeeded and produced the main.wasm binary.
# Build OCI image and push to Harbor
docker build -t harbor.internal/higress-plugins/device-online:v1 .
docker push harbor.internal/higress-plugins/device-online:v1
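The Dockerfile behind that build isn't shown in OpenClaw's log excerpt; for proxy-wasm plugins consumed by Higress it is typically just a scratch image wrapping the compiled binary under the conventional name plugin.wasm. A sketch of what it likely contains, not the generated file itself:
# Sketch of the plugin image: a scratch image that ships only the wasm binary
cat > Dockerfile <<'EOF'
FROM scratch
COPY main.wasm plugin.wasm
EOF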
OpenClaw then auto-generated the WasmPlugin CRD and deployed it to the Kind environment:
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: device-online
  namespace: higress-system
spec:
  url: oci://harbor.internal/higress-plugins/device-online:v1
  phase: UNSPECIFIED_PHASE
  priority: 100
  defaultConfig:
    aesKey: "${DEVICE_AES_KEY}"
    redisCluster: "redis.internal:6379"
    ttl: 300
Then it ran the tests automatically:
# Normal request (valid encrypted device ID)
curl "http://localhost:8080/api/device/heartbeat?d=${ENCRYPTED_DEVICE_ID}"
# ✅ 200 OK
# Redis verification: redis-cli GET device:online:device123 -> current timestamp
# Missing parameter
curl "http://localhost:8080/api/device/heartbeat"
# ✅ 400 Bad Request
# Invalid encrypted data
curl "http://localhost:8080/api/device/heartbeat?d=invalid"
# ✅ 403 Invalid device ID
All passed.
| Phase | Time | Notes |
|---|---|---|
| Requirements analysis | 3s | AI parsing Lua code |
| Code generation | 10s | Full Go project generated |
| Build & compile | 3s | WASM compilation |
| Image push | 10s | Push to Harbor |
| Deploy & validate | 1m | CRD deployment + testing |
| Total | < 2 minutes | Zero manual coding |
Previously, this kind of work would take 1-2 days just to learn the proxy-wasm SDK, another 1-2 days for coding and debugging, plus testing and review - easily a week end-to-end. Now it takes under 2 minutes, and the generated code quality is better than what I'd write myself.
This is what AI-assisted development should be: not helping you autocomplete a few lines, but automating the entire DevOps workflow.
After all validation passed, OpenClaw generated a detailed runbook for me:
# Nginx to Higress Migration Runbook
## Pre-flight Checks
- [ ] Confirm all Ingress resources backed up
- [ ] Confirm DNS TTL lowered (recommend 60s)
- [ ] Confirm monitoring alerts configured
## Migration Steps
### 1. Install Higress (est. 5 min)
(Specific commands)
### 2. Deploy snippet replacement configs (est. 10 min)
For the 3 Ingress resources using snippets, implement equivalent functionality via plugins:
- Deploy custom-response plugin config (replaces User-Agent detection + mobile routing snippet)
- Deploy ip-restriction plugin config (replaces IP whitelist snippet)
- Deploy basic-auth plugin config (replaces auth snippet)
**Note: Original Ingress resources need NO modification, maintaining 100% compatibility**
### 3. Validate Higress routes (est. 10 min)
(Test commands and expected results)
### 4. Traffic switch (est. 5 min)
(DNS/LB switching steps)
### 5. Monitor metrics (ongoing)
(Checklist of metrics to watch)
## Rollback Plan
(One-click rollback commands)
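For context, the switch and rollback in our setup boil down to repointing DNS between two LoadBalancer IPs. A rough sketch follows; the service names are the chart defaults rather than values from the runbook, and the actual DNS change depends on your provider.
# Current entry point (ingress-nginx) and the new one (Higress)
kubectl -n ingress-nginx get svc ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'
kubectl -n higress-system get svc higress-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'

# Switch: point the wildcard DNS record (TTL already lowered to 60s) at the Higress IP.
# Rollback: point it back at the nginx IP -- the Ingress resources themselves never changed.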
Monday morning, runbook in hand, the production migration took 30 minutes.
Zero alerts. Zero rollback.
Key advantage: Original Ingress resources required absolutely no modification. Rollback is just switching back to nginx, configs remain intact, minimal risk.
Kind clusters cost almost nothing but catch 90% of problems. Let AI experiment in simulation - it's far safer than humans testing in production.
OpenClaw handled the analysis, validation, and documentation grunt work, but final execution was still me. This division makes sense - AI boosts efficiency, humans provide the safety net.
The nginx-to-higress-migration skill's design philosophy is crystal clear: the AI experiments and validates in a sandbox; humans review and execute in production.
If it were designed as "AI directly migrates your production," I'd never use it.
This is crucial - the runbook OpenClaw outputs traces back to actual test results in the Kind environment. Which Ingress needs changes, how to change it, expected outcomes - all validated, not AI making stuff up.
The biggest fear with AI is hallucination, especially for production operations. This workflow design minimizes that risk: run tests first, generate docs second, and back every conclusion in the docs with actual test logs.
The Kubernetes official statement is clear: You have two months.
Migration used to require proposal writing, reviews, scheduling, execution, and validation - at least a week end-to-end. But the situation is different now: after the ingress-nginx retirement there will be no more security updates, so continuing to use it means running completely unprotected.
The good news: with AI assistance, you can complete migration validation in a day. The core principle hasn't changed: production must be operated by humans, AI just helps you prepare and validate in advance.
If you're still on ingress-nginx, don't wait. Two months seems long, but between impact assessment, planning, resource coordination, staged rollout, stability observation... time really flies.
This skill can compress the "migration validation" step to 30 minutes, freeing up time for more important things - like convincing your manager to approve the budget, or planning rollback strategies in advance.
If you want to learn more about Alibaba Cloud API Gateway (Higress), please click: https://higress.ai/en/