Terraform: Automation practices - Top developer examples

Last Updated: Mar 06, 2025

This video introduces how to use Terraform to automate infrastructure tasks from the perspective of an independent developer, and walks through several demonstrations.

You can refer to the following transcript:

Hello, welcome back to AutoTalk, the Alibaba Cloud Open Platform Automation Series. In this episode, we present a case study on top automation developers. I'm Tiankai from Alibaba Cloud Open Platform. Before we begin, let's revisit why we pursue infrastructure automation and what exactly infrastructure automation is. First, I'll introduce two core concepts. The first is that infrastructure automation essentially transforms repetitive business tasks into code through various tools, enabling machines to execute these tasks more efficiently and in a standardized manner. The second point is that during this process, there are numerous options when it comes to tool selection. Whether we use APIs, SDKs, cloud control APIs, or tools like Terraform and ROS, the ultimate goal is for organizations to adopt the right technology stack and solutions that fit their needs, implement Infrastructure as Code (IaC), and improve efficiency. This diagram represents the overall landscape of IaC. However, since today's topic focuses on top developers, we will set aside discussions about organizational collaboration, processes, version control, and code management. Instead, we will zoom in on whether, as an independent operations engineer or SRE, I can solve some of my individual challenges using automation tools. We will have separate videos later to address organizational issues. Therefore, what I've highlighted in the red box refers to leveraging IaC code through a specific tool, such as Terraform, which we will use as an example. We'll explore common scenarios encountered during daily operations.

When discussing technical scenarios, I believe everyone faces different challenges in their daily work. To improve top developer efficiency, we aim to provide better technical support for every SRE or operations engineer who independently manages significant responsibilities. This way, their work becomes more efficient and standardized, freeing up time for more interesting projects.

Speaking of scenarios, here are some typical examples. The first is service activation. As you may know, when you create a new Alibaba Cloud account and want to purchase a service, some services must be activated before they can be purchased and used. Within organizations, these scenarios map to situations like project setup or employee onboarding, where individuals often need to activate services manually, step by step. This process is actually quite easy to automate, and we will delve into examples later.

The second scenario involves access control. Access control covers how long an employee's permissions remain valid and how automated permission configuration can ensure that employees in different roles, such as maintenance, business, and operations, are granted the appropriate permissions automatically once they join the organization. The third and fourth scenarios focus on cloud infrastructure: can we build infrastructure quickly? This is one of the main goals of automation. The fifth scenario is DevOps. After the infrastructure is deployed, can my DevOps processes work together to deploy applications onto the infrastructure? This enables an end-to-end automation transformation from the infrastructure layer to the application layer, further improving delivery and change efficiency.

Now let's take a look at the first scenario: activating services for a new account. This scenario refers to situations within an organization where we often need to create new accounts on Alibaba Cloud. As mentioned earlier, certain services require activation before you can purchase and use them on Alibaba Cloud. Typical scenarios include new employee onboarding, launching new business lines, or establishing new departments, all of which might necessitate the creation of new accounts.

Before the operations team hands over these accounts to the business team, are these accounts adequately prepared? Ideally, the business team shouldn't have to worry about or handle the pre-processing steps like service activation. Instead, the accounts should be ready for immediate use in business operations. Our goal is this: if I'm part of the operations team, I want to leverage automation to codify these tasks. I don't want to manually click through the console for each account, repeating the process dozens of times. That's tedious and inefficient. Moreover, during this manual process, it's easy to grant excessive permissions, activate unnecessary services, miss activating required services, or even activate the wrong ones. These mistakes lead to rework and further reduce efficiency.

Let's look at an example together. Terraform is actually a very useful tool. It provides Alibaba Cloud customers with a pre-built module specifically designed for batch service activation on new accounts. You can find this Terraform module on GitHub and adapt the code to your local environment without making many changes.

Let me pull the code down first. We can take a closer look together. In the example folder, there's the main file, the outputs file, and the variables file. In the outputs file, you'll see a list of variables representing the services I want to activate. If a service is not required, I just need to comment it out. For instance, in this demo, if I don't need to activate these two services, I'll comment them out. If the rest are marked as "on," I'll leave them as part of the baseline for account activation. This way, every time a new account is created, I run this script, and it will automatically activate the necessary services for that account.
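
As a rough illustration of the toggle pattern described above, the configuration might look like the sketch below. The module source path and input names are placeholders rather than the actual module's interface, so check the module's README on GitHub for the real values.

```
variable "services_to_activate" {
  description = "Services to activate as part of the new-account baseline"
  type        = list(string)
  default = [
    "oss",
    "sls",
    # "cdn",          # comment a line out to skip activating that service
    # "actiontrail",
  ]
}

module "service_activation" {
  source   = "./modules/service-activation" # placeholder path, not the real module source
  services = var.services_to_activate       # placeholder input name
}
```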

Since this is a demo environment, we're using an AccessKey pair. We'll copy the AccessKey pair. However, it's important to emphasize that this is just a demo. In real-world applications and production environments, we strongly advise against storing AccessKey pairs in plain text within the file. Instead, you can use local caching, runtime environment variables, or AccessKey pair management services to protect AccessKey pairs more effectively and manage regular key rotations. Now, let's move on to the init phase. During this process, I made a small mistake. I forgot to save the provider file, so it threw an error saying the file couldn't be found. Let me fast-forward through this. After saving the provider file, I'll run the command again.
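
A minimal sketch of what the provider file can look like when credentials are kept out of the code. It assumes the aliyun/alicloud provider, which can read the AccessKey pair from environment variables such as ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY instead of plain-text arguments:

```
terraform {
  required_providers {
    alicloud = {
      source = "aliyun/alicloud"
    }
  }
}

provider "alicloud" {
  # access_key and secret_key are intentionally omitted; the provider reads
  # them from environment variables (or a credentials profile) at runtime,
  # so no plain-text AccessKey pair ends up in the file.
  region = "cn-hangzhou" # example region
}
```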

Now you can see the preview results. Since Terraform supports the plan and apply commands, during the preview phase, you can see which services will be activated. Next, I'll run the apply command to activate these services for my current account. As you can see, the services have been activated. The entire process is quite simple and implemented in a lightweight manner.

The second scenario is using automation to quickly create cloud infrastructure. As you know, in the cloud era, the pace at which infrastructure changes is increasing rapidly. For internet businesses, gaming businesses, and certain specialized businesses, the demand for faster delivery is growing because the market is changing faster.

In this scenario, the business team requires the operations team to create an environment every two weeks. Once testing is complete, the environment should be promptly released. Then, in the next two-week cycle, the environment needs to be recreated. The frequent deletion and reconfiguration of environments can be extremely painful for the operations team without automation. Moreover, when multiple business teams simultaneously request environments, it becomes a very frustrating operational experience. The operations team hopes not only for rapid delivery but also for improved baseline accuracy.

As an operations engineer, if this task falls on your shoulders, what would you do? In this scenario, we expect the operations team to systematically organize their daily architecture. For example, commonly used services like ECS, Internet-facing SLB, VPC, security groups, and Auto Scaling can all be transformed into automated code. Let's take a look at the code together.

In this scenario, the operations team has provided the business team with a set of code that includes a VPC, a vSwitch, and a security group, along with an inbound policy. A public IP address is assigned, and port 8080 is opened to deploy the application. At this point, my environment is already halfway configured. Next, I proceed to configure the auto scaling group and define its rules. After completing this, I deploy a "hello world" application on Auto Scaling. Let's take a look: first, I execute the Terraform init command. Once Terraform initializes the environment, I let Terraform build the infrastructure for me. During the preview phase, it tells me that nine resources will be created. What are these nine resources?
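
A simplified sketch of the network layer described above, assuming the aliyun/alicloud provider; attribute names can vary slightly between provider versions, and the CIDR blocks and zone are placeholders rather than the values used in the video:

```
resource "alicloud_vpc" "demo" {
  vpc_name   = "demo-vpc"
  cidr_block = "172.16.0.0/16"
}

resource "alicloud_vswitch" "demo" {
  vpc_id     = alicloud_vpc.demo.id
  cidr_block = "172.16.1.0/24"
  zone_id    = "cn-hangzhou-h" # example zone
}

resource "alicloud_security_group" "demo" {
  vpc_id = alicloud_vpc.demo.id
}

# Inbound rule that opens port 8080 so the application can be reached.
resource "alicloud_security_group_rule" "allow_8080" {
  type              = "ingress"
  ip_protocol       = "tcp"
  port_range        = "8080/8080"
  cidr_ip           = "0.0.0.0/0"
  security_group_id = alicloud_security_group.demo.id
}
```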

Once I execute the command, I can grab a cup of coffee or some hot water and wait for the script to finish running. You can see that it's creating resources in an orderly sequence based on dependencies: starting with the VPC, then the security group, followed by the Internet-facing SLB instance. It brings up the auto scaling group, configures the corresponding rules, and deploys the "hello world" application on the infrastructure. This simplified scenario essentially achieves what we discussed earlier regarding automated application deployment. However, in real-world business processes, we typically start with infrastructure automation and infrastructure management scenarios. Once the creation is complete, you can see that a public IP address is generated. This public IP address is where my "hello world" application is deployed. To save time, I'll skip ahead since server startup takes a while, and the process simply refreshes during this period.
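
Continuing the sketch above, the scaling layer and the "hello world" deployment might look roughly like this; the image ID, instance type, and user data are placeholders, not the exact configuration from the demo:

```
resource "alicloud_ess_scaling_group" "demo" {
  scaling_group_name = "demo-asg"
  min_size           = 1
  max_size           = 2
  vswitch_ids        = [alicloud_vswitch.demo.id]
}

resource "alicloud_ess_scaling_configuration" "demo" {
  scaling_group_id  = alicloud_ess_scaling_group.demo.id
  image_id          = "aliyun_3_x64_20G_alibase_20230727.vhd" # example image
  instance_type     = "ecs.t6-c1m1.large"                     # example type
  security_group_id = alicloud_security_group.demo.id
  active            = true
  force_delete      = true

  # Boot script that serves a "hello world" page on port 8080.
  user_data = <<-EOT
    #!/bin/bash
    mkdir -p /srv/www && echo "hello world" > /srv/www/index.html
    cd /srv/www && nohup python3 -m http.server 8080 >/dev/null 2>&1 &
  EOT
}
```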

Now, assuming our application has been successfully deployed to the server, you can see that the "hello world" application is live. In this scenario, the operations team can hand over the infrastructure and its metadata to the application team for regular use. At this point, if we want to build our own CMDB or multi-cloud management platform in the future, this metadata becomes foundational and critical. Next, we need to release the environment.
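
Handing metadata over to the application team is typically done with output values. Here is a minimal sketch wired to the placeholder resources above; in the video the key value surfaced is the public IP serving the application:

```
output "vpc_id" {
  description = "VPC created for the test environment"
  value       = alicloud_vpc.demo.id
}

output "scaling_group_id" {
  description = "Auto Scaling group that runs the hello world application"
  value       = alicloud_ess_scaling_group.demo.id
}
```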

After two hours of testing, the business team might say, "We no longer need this environment." They ask the operations team to release the environment promptly to save costs for the organization. In this process, the operations team only needs to execute a destroy command. The resources requested in the application form will be destroyed based on the predefined deletion logic. This completes the entire lifecycle of the resources, from creation to destruction. Let's wait until the deletion is complete.

The SLB, security group, and vSwitch are removed in turn, followed by the last resource. Once the deletion is complete, we can verify the result by visiting the website to check whether the "hello world" application we deployed earlier is still accessible. As you can see, we created nine resources, and we are now deleting those same nine resources. Returning to the console, I notice that the service and public IP address no longer exist, confirming that the cleanup is complete. Now let's move forward.

The third scenario focuses on the baseline management of cloud resources on single instances. Why single instances? It aligns with today's theme: serving developers or individual operations engineers first. Managing existing resources is often a painful challenge. Before adopting IaC and automation tools, many organizations already have resources running in the cloud. Typically, businesses come first, and IaC frameworks are built later. Therefore, reverse-engineering cloud resources into code and managing their IaC code becomes critical.

The goal here is for an operations engineer to import existing cloud resources into the local environment and manage their full lifecycle. Let's look at a demo. In Demo 2, we recreate the environment with the nine resources demonstrated in the previous scenario.

In Demo 3, we assume that the nine resources created in Demo 2 represent historically existing resources from the console. During this process, I configure the import of these console resources. The import process is straightforward: I simply define the resource type and resource ID and execute the command. Terraform generates the corresponding IaC code and creates the tfstate file, which contains the resource metadata. Once these are aligned, I enter a virtuous cycle where I can further manage my infrastructure and align it with the baseline of existing infrastructure.
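
A minimal sketch of the import step, assuming Terraform 1.5 or later, where imports can be declared in code (older versions use the terraform import CLI command per resource); the resource ID is a placeholder for the one copied from the console:

```
# Declare which console resource maps to which resource block, then run
# `terraform plan -generate-config-out=generated.tf` (or write the block by
# hand) and `terraform apply` to record it in the tfstate file.
import {
  to = alicloud_vpc.existing
  id = "vpc-xxxxxxxxxxxxxxxx" # placeholder for the ID copied from the console
}

resource "alicloud_vpc" "existing" {
  vpc_name   = "existing-vpc"  # must match the actual VPC's configuration
  cidr_block = "172.16.0.0/16"
}
```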

This command, executed in Demo 3, imports a series of resources based on their IDs. There may be some rare mismatches. For example, if the value in the live environment is 0 but the range defined in the code is 20 to 60, we adjust the size in the code to fit the environment. You can see that there are nine resources to import. After running terraform apply, they are successfully imported and aligned with the current baseline. At this point, the tfstate files in Demo 2 and Demo 3 are synchronized. Now, regardless of which project you're working on, you can manage the overall automated baseline. Establishing this baseline significantly improves operational efficiency, accuracy, and standardization for daily operations. That is the end of this episode. If you have any questions or ideas about cloud automation, feel free to scan the QR code at the bottom of the screen to join our DingTalk group and connect with us. We look forward to your feedback and hope to see you in the next episode.