Measure the benefits of AI-assisted programming

This topic describes how to measure research and development (R&D) effectiveness, how AI-assisted programming improves R&D effectiveness, and how to measure the benefits of AI-assisted programming.

Understanding measurement: effectively differentiate metrics

To help R&D teams measure R&D effectiveness, we propose a three-part framework that groups metrics into the following types: capability and behavior, delivery effectiveness, and business outcome. This framework can be used to evaluate R&D work holistically.

Capability and behavior metrics: Reflect how teams work and their capabilities, which affect delivery efficiency and can be improved. The metrics include unit test coverage, the number of issues found in code scanning, deployment frequency, cyclomatic complexity, and decoupling level.
Delivery effectiveness metrics: Reflect the efficiency of technical teams and correlate with business outcomes but do not directly affect business outcomes. The metrics include speed, throughput, and quality of delivery.
Business outcome metrics: Reflect actual business performance and directly relate to company revenue, scale, and costs. These metrics can be directly used for performance evaluation. They include Generally Accepted Accounting Principles (GAAP) revenue, gross profit, net profit, costs, and monthly active users.

What is R&D effectiveness and how is it measured?

R&D effectiveness refers to the ability of a software development team to deliver high-quality value continuously and in a timely manner. R&D effectiveness includes the following aspects:

Ability to do the right things: Deliver effective value.
Ability to do things right: Continuity, speed, and quality. Quality is a constraint on speed, and continuity is a requirement for consistent speed and quality.

Measurement of R&D effectiveness

Effective metrics can drive the right improvement actions and shape future enhancement initiatives. The responsibilities of the team determine the types of metrics used. In most cases, technical teams are assessed based on the following aspects:

Efficiency: Speed (flow efficiency, the flow rate of a single work item) and throughput (resource efficiency, the number of work items completed within a specific period of time).
Quality: Delivery quality, which refers to the quality of deliverables after they leave the team.
Employee satisfaction: Employee satisfaction is a subjective survey metric that positively correlates with continuity.

How does AI-assisted programming improve R&D effectiveness?

AI-assisted programming utilizes AI to improve programming efficiency, which reflects coding skills and behaviors. The improvement can be measured based on the following aspects:

Coding efficiency: Proportion of developers' time spent on coding × Proportion of code that is AI-generated = Proportion of time saved by the code generator. For example, if developers spend 30% of their time on coding and 40% of the final code is generated by AI, then 30% × 40% = 12% of developers' time is saved.
Code defect density: Code defect density is a lagging indicator that reflects code quality, such as the number of defects per thousand lines of code.
Employee satisfaction in coding experience: Employee satisfaction in coding experience is a subjective metric that reflects how tools help employees with their programming work, such as tool usability and actual tool effectiveness.
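The defect density metric above can be sketched as a small calculation. This is only an illustration: the function name `defect_density_per_kloc` and the defect and line counts are hypothetical, not figures from any survey or from the tool.

```python
def defect_density_per_kloc(defect_count: int, lines_of_code: int) -> float:
    """Return the number of defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000)


# Hypothetical figures for two comparable periods, before and after adopting the tool.
before = defect_density_per_kloc(defect_count=45, lines_of_code=30_000)
after = defect_density_per_kloc(defect_count=36, lines_of_code=32_000)
print(f"before: {before:.3f} defects/KLOC, after: {after:.3f} defects/KLOC")
```

Because defect density is a lagging indicator, a comparison like this is only meaningful after the defects of both periods have had time to surface.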

Improvement in coding efficiency

In software development, coding efficiency is a key factor that impacts overall productivity. In addition to coding efficiency, factors such as requirements quality, collaboration processes, testing automation, and engineering capabilities for continuous integration and continuous delivery (CI/CD) affect development efficiency. The factors can be divided into two categories: individual efficiency, which refers to specific improvements, and collaboration efficiency, which refers to overall process improvements. From a problem-solving perspective, the factors can be divided into four key areas: bottlenecks, rework, technical debt, and incapacity.

Proportion of coding time and proportion of AI-generated code

Proportion of developers' time spent on coding × Proportion of code that is AI-generated = Proportion of time saved by the code generator. For example, if developers spend 30% of their time on coding and 40% of the final code is generated by AI, then 30% × 40% = 12% of developers' time is saved.

Survey results show that respondents spend 32% of their time writing or improving code, 35% on code management (19% on code maintenance, 12% on testing, and 4% on security issue response), and 23% on meetings and operational tasks.
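The formula above can be written as a short calculation. The sketch below is illustrative: `time_saved_share` is a hypothetical helper name, and combining the surveyed 32% coding share with a 40% AI-generated share is an assumption made only to show how the inputs plug in.

```python
def time_saved_share(coding_time_share: float, ai_code_share: float) -> float:
    """Share of total developer time saved = coding time share x AI-generated code share."""
    for name, value in (("coding_time_share", coding_time_share),
                        ("ai_code_share", ai_code_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return coding_time_share * ai_code_share


# The example from the text: 30% of time on coding, 40% of final code AI-generated.
example = time_saved_share(0.30, 0.40)

# Assumption for illustration: the surveyed 32% coding share with the same 40% AI share.
survey_based = time_saved_share(0.32, 0.40)

print(f"example: {example:.1%}, survey-based: {survey_based:.1%}")
```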

Comprehensive improvement in development behavior

During the development phase, programmers write, debug, and test code and retrieve information. Each task has areas for improvement, and the improvements can be quantified by using the following formula:

Specify a hypothetical baseline, which is the cost per unit of work without AI tools. This baseline reflects relevant statistical data of the enterprise. If data is unavailable, industry statistical data can be referenced. Additional costs associated with AI-driven efficiency improvements, such as the cost of revising accepted code, must also be considered because they may affect the accuracy of the manual baseline.

The following formula applies regardless of whether you use the first or second method: Behavior × Effect = Efficiency. Excessive statistical precision is unnecessary because it may lead to malpractice or additional management costs. The focus is to address the core questions and guide corresponding improvements.
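One possible reading of Behavior × Effect = Efficiency is sketched below. It assumes that behavior is how often a feature is used, and that effect is the time saved per use relative to the baseline, net of rework on accepted code. The class name, field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureUsage:
    name: str
    uses_per_week: float      # behavior: how often the feature is used
    baseline_minutes: float   # hypothetical cost per unit of work without AI tools
    assisted_minutes: float   # cost per unit with AI, including revising accepted code

    def minutes_saved_per_week(self) -> float:
        # effect: time saved per use; efficiency = behavior x effect
        effect = self.baseline_minutes - self.assisted_minutes
        return self.uses_per_week * effect


# Hypothetical per-developer figures, for illustration only.
usages = [
    FeatureUsage("code completion", uses_per_week=200, baseline_minutes=0.5, assisted_minutes=0.2),
    FeatureUsage("unit test generation", uses_per_week=10, baseline_minutes=20.0, assisted_minutes=8.0),
]

total_minutes = sum(u.minutes_saved_per_week() for u in usages)
print(f"estimated time saved: {total_minutes / 60:.1f} hours per developer per week")
```

Rough inputs are enough here. The point is to see which behaviors contribute most, not to produce a precise figure.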

Impact of improved development efficiency on overall R&D effectiveness

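Based on Little's Law, Speed = Amount of work in progress (WIP)/Throughput, where speed is measured as the average lead time of a work item (a shorter lead time means faster delivery). This means Throughput = Amount of WIP/Speed. The following aspects can be improved by using AI: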

Delivery speed: When individual work items are completed faster, throughput increases and the number of tasks in progress (task WIP) decreases significantly. As a result, fewer requirements are left pending, which reduces the overall WIP in product development and increases R&D speed.
Delivery certainty: Improved speed leads to a corresponding improvement in the time certainty of software development.
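The Little's Law relationship above can be illustrated with a small calculation, reading "speed" as the average lead time of a work item. The function name `lead_time` and the WIP and throughput figures are hypothetical.

```python
def lead_time(wip: float, throughput: float) -> float:
    """Little's Law: average lead time = work in progress / throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Hypothetical team: 30 work items in progress, 10 items completed per week.
baseline_weeks = lead_time(wip=30, throughput=10)

# Assumption: faster coding raises throughput to 12 items per week at the same WIP.
improved_weeks = lead_time(wip=30, throughput=12)

# If fewer items are kept in progress as well, the lead time shortens further.
reduced_wip_weeks = lead_time(wip=24, throughput=12)

print(baseline_weeks, improved_weeks, reduced_wip_weeks)
```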

Employee satisfaction in coding experience

To assess employee satisfaction with AI Coding Assistant, feedback can be collected by using surveys to identify areas for improvement. The design of the questionnaire involves three elements: user persona, user satisfaction, and user efficiency. Sample questionnaire:

User persona

How many years of programming experience do you have?
- Less than 1 year.
- 1-3 years.
- 3-5 years.
- 5-10 years.
- More than 10 years.
What is your main role at work?
- Junior developer.
- Intermediate developer.
- Senior developer.
- Architect.
- Technical manager.
- Other (please specify).
What programming languages do you commonly use? (multiple choices)
- Java.
- Python.
- C++.
- JavaScript.
- Go.
- Ruby.
- PHP.
- SQL.
- XML.
- Other (please specify).
How often do you use AI Coding Assistant?
- Multiple times a day.
- Once a day.
- A few times a week.
- A few times a month.
- Rarely.

User satisfaction

How satisfied are you with AI Coding Assistant? (Rate from 1 to 5, with 5 indicating the highest level of satisfaction)
What are your thoughts on the following descriptions of using AI Coding Assistant?
- Visually comfortable, and operations align with my coding habits.
- No sense of being disturbed.
- Smooth learning curves and intuitive operations.
- Usefulness of the generated suggested code.
- Accurate responses to questions.
- Rapid code and Q&A generation.
- Rarely encounters errors.

User efficiency

To what extent has AI Coding Assistant improved your coding efficiency? (single choice)
- Significantly improved.
- Slightly improved.
- No change.
- Slightly decreased.
- Significantly decreased.
What are your thoughts on the following statements after using AI Coding Assistant?
- My work is more fulfilling.
- I am more confident when coding.
- Higher efficiency when using familiar programming languages.
- Faster progress when using unfamiliar programming languages.
- Reduced writing of repetitive code.
- Keeps me focused on writing without disturbing my flow.
- Reduced use of search engines.
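Responses to a questionnaire like the one above can be summarized with a few lines of code. The sketch below is a minimal example: the record layout (`satisfaction`, `efficiency`), the function name `summarize`, and the sample responses are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Answer options of the single-choice coding efficiency question above.
EFFICIENCY_OPTIONS = [
    "Significantly improved",
    "Slightly improved",
    "No change",
    "Slightly decreased",
    "Significantly decreased",
]


def summarize(responses: list[dict]) -> dict:
    """Return the mean satisfaction rating and the share of each efficiency answer."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(r["efficiency"] for r in responses)
    total = len(responses)
    return {
        "mean_satisfaction": mean(r["satisfaction"] for r in responses),
        "efficiency_share": {opt: counts.get(opt, 0) / total for opt in EFFICIENCY_OPTIONS},
    }


# Hypothetical sample responses (satisfaction is rated from 1 to 5).
sample = [
    {"satisfaction": 5, "efficiency": "Significantly improved"},
    {"satisfaction": 4, "efficiency": "Significantly improved"},
    {"satisfaction": 4, "efficiency": "Slightly improved"},
    {"satisfaction": 3, "efficiency": "No change"},
    {"satisfaction": 4, "efficiency": "Slightly improved"},
]

summary = summarize(sample)
print(summary["mean_satisfaction"])
print(summary["efficiency_share"])
```

Segmenting the same summary by the user persona answers, such as years of experience or role, can show where the tool helps most.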

The survey results may be formatted similarly to the example in the following image:

How to measure the effectiveness of AI-assisted programming?

To determine whether to use adoption rate or AI-generated code ratio, make sure that you understand the definitions and calculation logic of both methods:

	Adoption rate	AI-generated code ratio
Description	The ratio of the number of accepted code completions to the number of suggestions within a specific period of time. Formula: Adoption rate = Number of accepted code completions/Number of suggestions.	The ratio of the number of lines of AI-generated code accepted by developers to the number of lines of code changes within a specific period of time. Formula: AI-generated code ratio = Number of accepted lines of AI-generated code/Number of lines of code changes.
Advantages	Reflects the quality of tool-recommended code in an intuitive manner. Can be used to evaluate the effectiveness of tool suggestions.	Intuitively reflects the actual amount of AI-generated code used. Can exclude invalid code acceptance and focus only on the code used.
Disadvantages	The number of suggestions (denominator) is determined by the tool. Frequent suggestions may result in a low adoption rate. The number of accepted code completions (numerator) does not always represent true value because developers may accept invalid or unnecessary code.	Requires regular identification of code modifications, which increases calculation complexity. Needs to differentiate between AI-generated code and manually written code.
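The two formulas in the table can be compared side by side in code. This is a sketch under stated assumptions: the function names and the counts are hypothetical, and returning 0.0 for an empty period is a design choice of this example, not something the tool prescribes.

```python
def adoption_rate(accepted_completions: int, suggestions: int) -> float:
    """Adoption rate = accepted code completions / suggestions."""
    if suggestions == 0:
        return 0.0  # no suggestions in the period: report 0 instead of dividing by zero
    return accepted_completions / suggestions


def ai_generated_code_ratio(accepted_ai_lines: int, changed_lines: int) -> float:
    """AI-generated code ratio = accepted AI-generated lines / lines of code changes."""
    if changed_lines == 0:
        return 0.0  # no code changes in the period
    return accepted_ai_lines / changed_lines


# Hypothetical counts for one team over one month.
print(f"adoption rate: {adoption_rate(300, 1200):.0%}")
print(f"AI-generated code ratio: {ai_generated_code_ratio(4000, 10000):.0%}")
```

The two values answer different questions. The adoption rate depends on how often the tool makes suggestions, whereas the AI-generated code ratio depends on how much accepted code ends up among the code changes.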

Why not use the proportion of AI-generated code committed to the repository for the calculation? Key reasons:

No attribution in version control: Version control systems cannot differentiate between AI-generated code and manually written code. The code is committed by the individual who submits the code, not by AI.
Increased complexity: Pursuing commit rates makes measurement unnecessarily complex. Efforts to achieve concurrent builds in the production environment introduce additional variables.

To measure the effectiveness of AI-assisted programming, we recommend that you use the AI-generated code ratio. If the AI-generated code ratio cannot be determined, the adoption rate is an acceptable alternative. Note, however, that overemphasis on statistical precision is unnecessary.

Specific methods to measure the benefits of AI coding tools

To effectively measure the benefits of AI Coding Assistant on efficiency, observations and analyses can be performed based on the following aspects:
- Tool usage:
  - Number of developers: Count the number of developers who use AI coding tools.
  - Popularity: Count the number of active users and their frequency of activity.
- Behavior: Frequency of using specific features. Count how often developers use specific features such as code completion, unit test generation, and code comment generation.
- Effect: Adoption rate or effective generation ratio. Calculate the ratio of the number of accepted lines of AI-generated code to the total number of lines of code changes.
- Improvement in development efficiency: Observe changes in coding efficiency before and after developers use AI coding tools to establish correlation. You can also use the following formula to obtain statistics on individual efficiency improvements: "Behavior × Effect = Efficiency".
- Contribution to R&D efficiency: R&D efficiency involves multiple aspects, such as requirements quality, collaboration processes, testing automation, and CI/CD engineering capabilities. However, improved efficiency in the development phase significantly contributes to overall R&D efficiency.
- Establish causal relationships through systems thinking: Analyze causal relationships between behaviors, efficiency, and outcomes from a holistic system perspective. Identify key leverage points, which are the areas of improvement that provide the greatest benefits.
- Measurement principles: Measurement metrics must address a fundamental question: Do AI coding tools truly improve development efficiency? The metrics must guide appropriate improvement actions rather than cause misdirection.
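The tool usage and behavior counts listed above could be derived from a usage event log. The sketch below assumes a hypothetical log of (developer, feature, day) records. The record layout, feature names, and function name `usage_summary` are illustrative and do not describe an export format of any specific tool.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical usage events: (developer, feature, day of use).
events = [
    ("alice", "code_completion", date(2024, 5, 6)),
    ("alice", "code_completion", date(2024, 5, 7)),
    ("alice", "unit_test_generation", date(2024, 5, 7)),
    ("bob", "code_completion", date(2024, 5, 6)),
    ("bob", "code_comment_generation", date(2024, 5, 8)),
]


def usage_summary(events):
    """Count developers, active days per developer, and uses per feature."""
    active_days = defaultdict(set)
    for developer, _feature, day in events:
        active_days[developer].add(day)
    feature_counts = Counter(feature for _developer, feature, _day in events)
    return {
        "developer_count": len(active_days),
        "active_days": {dev: len(days) for dev, days in active_days.items()},
        "feature_counts": dict(feature_counts),
    }


summary = usage_summary(events)
print(summary)
```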