Key Basics of CPU
Date: Oct 31, 2022
Abstract: This article introduces several key pieces of basic knowledge about the CPU.
About CPU and program execution
1. Running a program is, in essence, executing the large stream of instructions that the program is made of.
Once the part of the program to be executed has been loaded into memory, the CPU fetches an instruction from memory, decodes it (determines its type and operands; put simply, the CPU works out what the instruction is), and then executes it. It then fetches the next instruction, decodes it, executes it, and so on until the program exits.
2. These three steps (fetch, decode, execute) form the basic cycle of the CPU.
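The fetch-decode-execute cycle above can be sketched as a toy interpreter. The instruction format and the three opcodes here are invented for illustration; they do not correspond to any real ISA.

```python
# Toy fetch-decode-execute loop over an invented 3-opcode "ISA".
# Each instruction is a (opcode, operand) pair.

def run(program):
    acc = 0          # a single accumulator "register"
    pc = 0           # program counter: index of the next instruction
    while pc < len(program):
        instr = program[pc]          # fetch: read the instruction at PC
        pc += 1                      # PC now points at the next instruction
        opcode, operand = instr      # decode: split into type and operand
        if opcode == "LOAD":         # execute: act on the decoded instruction
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return acc

print(run([("LOAD", 1), ("ADD", 2), ("HALT", 0)]))  # 3
```

The loop never does anything but fetch, decode, and execute, which is exactly the basic cycle described above.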
3. Each CPU has its own set of instructions that it can execute (these instructions are provided by the CPU itself and can be inspected with tools such as CPU-Z).
Because different CPU architectures have different instruction sets, an x86 processor cannot run ARM programs, and an ARM processor cannot run x86 programs. (Intel and AMD CPUs both use the x86 instruction set, while the vast majority of mobile phones use the ARM instruction set.)
Note: an instruction set can be described at the hardware level or the software level. The hardware instruction set is the set of executable instructions provided by the CPU itself. A software instruction set refers to instructions (routines) provided by a language's program library; once that library is installed, those instructions can be used.
4. Because the CPU takes far longer to fetch instructions or data from memory than to execute an instruction, the CPU contains a number of general-purpose registers to hold key variables, temporary data, and other information.
The CPU therefore has to provide specific instructions that read data from memory into registers and store register contents back into memory.
It also has to provide basic operation instructions such as addition, subtraction, and NOT/AND/OR (the supported basic operations are what the ALU provides); multiplication and division are computed out of these simpler operations, which is why they are much slower. This is also why algorithm analysis often ignores the number of additions and subtractions when estimating time complexity and counts the multiplications and divisions instead.
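The claim in point 4 that multiplication is built from simpler operations can be illustrated with shift-and-add, the classic way integer multiply is composed from only addition and bit shifts. This is a sketch of the general technique, not the algorithm of any specific CPU:

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiply two non-negative ints using only add, shift, and bit test."""
    result = 0
    while b:
        if b & 1:            # lowest bit of b set -> add the current a
            result += a
        a <<= 1              # shift a left (doubles it)
        b >>= 1              # shift b right (drops the processed bit)
    return result

print(shift_add_multiply(6, 7))  # 42
```

One multiplication of n-bit numbers costs up to n additions here, which is the intuition behind multiplication being slower than addition.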
5. Besides the general-purpose registers there are some special registers. Typical examples:
PC (program counter): holds the memory address of the next instruction to fetch. After an instruction is fetched, the register is updated to point to the instruction after it.
Stack pointer: points to the top of the current stack in memory. The stack holds one stack frame per function call; a frame stores the function's input parameters, local variables, and temporary variables that are not kept in registers.
PSW (program status word): holds control bits such as the CPU's priority and its working mode (user mode or kernel mode).
6. When the CPU switches processes, the state held in the registers for the current process must be written out to the appropriate place in memory (the process's kernel stack space) and saved. When switching back to that process, the saved state is copied from memory back into the registers. In other words, a context switch must save the scene and later restore it.
7. To improve performance, the CPU no longer runs fetch -> decode -> execute as a single serial path; instead it provides an independent fetch unit, decode unit, and execution unit for the three stages. This forms the pipeline model.
For example, while the last unit of the pipeline, the execution unit, is executing the nth instruction, the decode unit can be decoding instruction n+1, and the fetch unit ahead of it can be reading instruction n+2. This is a three-stage pipeline; longer pipelines also exist.
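The three-stage overlap in point 7 can be made concrete with a small schedule: at clock cycle t the fetch unit works on instruction t, the decode unit on instruction t-1, and the execution unit on instruction t-2. This is an idealized sketch with no stalls or hazards:

```python
# Idealized 3-stage pipeline schedule: no stalls, no hazards.
def pipeline_schedule(n_instructions: int):
    """Return (cycle, fetching, decoding, executing) tuples; None = idle."""
    rows = []
    for cycle in range(n_instructions + 2):   # pipeline drains over 2 extra cycles
        fetch = cycle if cycle < n_instructions else None
        decode = cycle - 1 if 0 <= cycle - 1 < n_instructions else None
        execute = cycle - 2 if 0 <= cycle - 2 < n_instructions else None
        rows.append((cycle, fetch, decode, execute))
    return rows

for cycle, f, d, e in pipeline_schedule(4):
    print(f"cycle {cycle}: fetch={f} decode={d} execute={e}")
```

Four instructions finish in 6 cycles instead of the 12 a strictly serial fetch-decode-execute sequence would need, which is the whole point of pipelining.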
8. A further optimization is the superscalar architecture. This design separates the fetch, decode, and execution units and provides a large number of execution units; multiple fetch + decode front ends then run in parallel. For example, two fetch + decode lines may work in parallel, each placing its decoded instructions into a buffer from which the execution units pick them up and execute them.
9. Except in some embedded systems, most CPUs have two working modes: kernel mode and user mode. The mode is controlled by a single bit in the PSW register.
10. In kernel mode, the CPU can execute every instruction in the instruction set and use all of the hardware's capabilities.
11. In user mode, the CPU is only allowed to execute a subset of the instruction set. Broadly speaking, instructions related to I/O and memory protection are forbidden in user mode, as are certain other privileged instructions; for example, user-mode code cannot set the PSW's mode-control bit to kernel mode.
12. If code running in user mode needs a privileged operation, it must issue a system call asking the kernel to perform the operation on its behalf. Concretely, a system call executes a trap instruction that transfers the CPU into the kernel; when the privileged operation completes, another instruction returns the CPU to user mode.
13. Besides system calls, it is even more common for hardware events to trap into the kernel. A trap returns control of the CPU to the operating system, so that the operating system can decide how to handle the hardware exception.
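The user-mode/kernel-mode round trip in point 12 is usually hidden behind ordinary library calls. In Python, for example, os.write is a thin wrapper that ends up issuing the write system call: the process traps into the kernel, the kernel performs the privileged I/O, and control returns to user mode with the result.

```python
import os

# os.write wraps the write(2) system call: a trap into the kernel
# for a privileged I/O operation, then a return to user mode with
# the number of bytes written.
n = os.write(1, b"hello from a syscall\n")   # fd 1 = standard output
print(n)  # byte count reported by the kernel
```

On Linux you can watch the same call appear as a write(2) line in strace output, which makes the trap visible from outside the process.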
About the basic composition of CPU
1. The CPU performs operations (addition +, multiplication *, logical AND/OR/NOT, and so on), for example c = a + b.
2. An operation involves data input, processing, and data output. Here a and b are the input data, the addition is the processing, and c is the output data.
3. The CPU needs storage (its various registers) to hold input and output data. Several common registers (some were introduced above):
MAR (memory address register): holds the memory address of the data to be accessed; stores an address value.
MDR (memory data register): holds data just read from memory or about to be written to memory; stores a data value.
AC (accumulator): holds intermediate results of arithmetic and logic operations; stores a data value.
PC (program counter): holds the address of the next instruction to execute; stores an address value.
CIR (current instruction register): holds the instruction currently being executed.
4. The CPU also contains the commonly used basic computing circuits (such as adders) that carry out the operations; this part is called the Arithmetic Logic Unit (ALU).
5. The CPU also contains a Control Unit (CU), which is responsible for moving data from memory into the ALU for processing and storing the results back into memory. The control unit also issues various control signals.
6. The control unit knows where to move data and which operation to perform (an addition? a logical operation?) because the instructions tell it: each instruction corresponds to one basic operation; an addition, for example, corresponds to one instruction.
7. For example: copy the values held in two MDR registers (the two data items fetched from memory) into the ALU, perform the addition specified by the instruction, copy the result back into an MDR register, and finally write it to memory.
8. The structure described above is the von Neumann architecture, which is the structure of today's computers.
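The c = a + b walk-through in points 6 and 7 can be assembled into a miniature von Neumann machine. The register names follow the list above (MAR, MDR, AC, PC, CIR); the memory layout and the opcodes are invented for this sketch:

```python
# Miniature von Neumann machine computing c = a + b.
# One memory holds both the program and the data; opcodes are invented.
memory = {
    0: ("LOAD", 100),    # AC <- mem[100]          (a)
    1: ("ADD", 101),     # AC <- AC + mem[101]     (b)
    2: ("STORE", 102),   # mem[102] <- AC          (c)
    3: ("HALT", None),
    100: 2,              # a
    101: 3,              # b
}

ac, pc = 0, 0
while True:
    mar = pc                 # MAR: address of the instruction to fetch
    cir = memory[mar]        # CIR: the fetched instruction (arrives via MDR)
    pc += 1                  # PC: advance to the next instruction
    op, addr = cir
    if op == "LOAD":
        mdr = memory[addr]   # MDR: data read from memory
        ac = mdr             # AC: accumulator receives the operand
    elif op == "ADD":
        ac = ac + memory[addr]   # the ALU's addition
    elif op == "STORE":
        memory[addr] = ac    # write AC back to memory via MDR
    elif op == "HALT":
        break

print(memory[102])  # 5
```

Storing program and data in the same memory, as this dictionary does, is exactly the defining trait of the von Neumann design.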

About CPU multi-core and multi-threading
1. The number of physical CPUs is determined by the number of sockets on the motherboard. Each CPU can have multiple cores, and each core may support multiple hardware threads.
2. To the OS, each core of a multi-core CPU (each core is a small chip of its own) appears as an independent CPU.
3. On a hyper-threaded CPU, each core can run multiple hardware threads (in practice two: 1 core with 2 threads, 2 cores with 4 threads, 4 cores with 8 threads), and each thread appears as a virtual logical CPU (Windows, for example, calls these "logical processors"); to the OS, each such thread is also an independent CPU.
This is something of a trick played on the operating system: physically there is still only one core, but the hyper-threaded CPU presents it as multiple logical CPUs in the expectation that hyper-threading will speed up program execution.
4. To benefit from hyper-threading, the operating system's scheduler needs optimizations that are specifically aware of hyper-threading.
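On a real system you can inspect the logical-CPU count the OS was presented with. A sketch using the Python standard library: os.cpu_count reports logical CPUs (hardware threads), not physical cores, and the exact numbers depend on the machine. The /proc/cpuinfo part is Linux-specific and is skipped elsewhere.

```python
import os

# Logical CPUs = sockets * cores-per-socket * threads-per-core.
logical = os.cpu_count()
print(f"logical CPUs visible to the OS: {logical}")

# On Linux, /proc/cpuinfo lets you distinguish cores from threads:
# distinct "core id" values correspond to physical cores.
try:
    with open("/proc/cpuinfo") as f:
        core_ids = {line.split(":")[1].strip()
                    for line in f if line.startswith("core id")}
    if core_ids:
        print(f"distinct core ids seen: {len(core_ids)}")
except OSError:
    pass  # not Linux, or /proc unavailable
```

If the logical count is twice the core count, hyper-threading (or an equivalent SMT feature) is enabled.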
5. A core with hyper-threading is more capable than the same core without it, but an individual hardware thread is no match for a fully independent core.
6. The hardware threads on a core share that core's resources.
For example, suppose each core has only one "engine" resource: once thread 1 (one virtual CPU) is using the "engine", thread 2 cannot use it and must wait.
The main aim of hyper-threading is therefore to feed more independent instructions into the pipeline (see the discussion of pipelines above) so that threads 1 and 2 compete for the core's pipeline resources as little as possible. In this sense, hyper-threading builds on the superscalar architecture.
7. Multithreading means each core can hold the state of multiple threads; for example, thread 1 of a core may be idle while thread 2 is running.
8. Hardware multithreading is not parallel processing in the true sense: at any instant each core can still run only one process, because threads 1 and 2 share the core's resources. You can think of each core as owning a single exclusive "right to execute a process"; if thread 1 holds it, thread 2 cannot.
However, threads 1 and 2 can still work in parallel in many respects: instructions can be fetched in parallel, decoded in parallel, and executed in parallel. So although a single core executes only one process at a time, the two threads can help each other speed that process up.
Moreover, if thread 1 holds the core's execution right and its process issues an I/O request at that moment, the execution right can pass to thread 2, i.e. the core switches to thread 2. This switch between hardware threads is very lightweight. (As Wikipedia puts it: if resources for one process are not available, another process can continue if its resources are available.)
9. Multithreading can produce the following phenomenon: if two processes are to be scheduled on a 2-core, 4-thread CPU, only two hardware threads will actually be running. If both of those threads land on the same core, the other core idles completely and is wasted. The preferable outcome is for each core to schedule one of the two processes.
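To steer processes onto specific logical CPUs, as in the desirable outcome of point 9, Linux exposes CPU affinity. A sketch using os.sched_setaffinity (a Linux-only API; which logical-CPU numbers map to distinct physical cores is machine-specific and must be checked on the target system):

```python
import os

pid = 0  # 0 means "the calling process"

if hasattr(os, "sched_getaffinity"):          # these APIs exist only on Linux
    allowed = os.sched_getaffinity(pid)
    print(f"currently allowed logical CPUs: {sorted(allowed)}")

    # Pin this process to a single logical CPU, then restore the mask.
    one_cpu = min(allowed)
    os.sched_setaffinity(pid, {one_cpu})
    print(f"after pinning: {sorted(os.sched_getaffinity(pid))}")
    os.sched_setaffinity(pid, allowed)        # undo the pinning
```

Pinning two busy processes to logical CPUs on different cores avoids the same-core pile-up described above; pinning them to two threads of one core reproduces it.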
About the cache on the CPU
1. The fastest "cache" is the CPU's own registers. They are made of the same material as the CPU and sit inside it, closest to the execution units, so accessing them incurs essentially no delay (<1 ns). Their capacity, however, is tiny: well under 1 KB. A rough estimate of general-purpose register capacity:
32-bit CPU: 32 registers * 32 bits = 128 bytes
64-bit CPU: 64 registers * 64 bits = 512 bytes
2. Below the registers sit the CPU caches, divided into L1, L2, and L3. Each successive level is roughly an order of magnitude slower than the one above it, but also larger.
3. Each core has its own L1 cache, which comes in two parts: the L1 instruction cache (L1-icache), which stores decoded instructions, and the L1 data cache (L1-dcache), which stores very frequently accessed data.
4. The L2 cache stores recently used memory data; more precisely, it stores data that the CPU is likely to need again soon.
5. On most multi-core CPUs each core has its own L2 cache, though some designs share an L2 cache among several cores. In any case, L1 is private to each core (shared only by the hardware threads within that core).
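The effect of this cache hierarchy can be observed from ordinary code: iterating a 2-D array along rows (contiguous memory, cache-friendly) versus down columns (strided, cache-hostile) performs the same arithmetic but touches memory in a different order. A small sketch with plain Python lists; interpreter overhead dominates the absolute timings, so the gap is far smaller than in C, but the access-pattern idea is the same:

```python
import time

N = 500
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    # Walk each row left to right: consecutive elements are adjacent,
    # so a fetched cache line is fully used before moving on.
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_col_major(m):
    # Walk down each column: consecutive accesses jump between rows,
    # which in a low-level language defeats the cache.
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total

for fn in (sum_row_major, sum_col_major):
    t0 = time.perf_counter()
    result = fn(matrix)
    print(f"{fn.__name__}: sum={result} in {time.perf_counter() - t0:.4f}s")
```

Both functions compute the same sum; any timing difference comes purely from the order in which memory is visited.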