Without Massive Computing Power, There Is No Metaverse

Introduction: On September 26, 2022, the immersive talk show 'Breaking the "Metaverse"' will be broadcast simultaneously on multiple official channels, including the Alibaba Cloud Developer Community, the Alibaba Cloud Developer Video Account, and Alibaba Cloud Kankan. He Zhan, head of NVIDIA's Omniverse business in China; Lou Yanxin, founder of Sandcore Technology; and Zhang Xintao, an elastic computing product expert at Alibaba Cloud, shared their understanding of the industry, implementation cases, bottlenecks and challenges, and more.

A pioneer of the digital world, a technology artist, and a "living fossil" of cloud computing: what ideas emerge when these three heavyweights meet? Click on the video below to watch the full episode.

Video: 'Breaking the "Metaverse"' | A Conversation with Industry Leaders
The following is an edited transcript of the program:
Q1 What are the metaverse and immersive experience?

He Zhan: How do you two understand the metaverse, and what does it look like in your minds?

Zhang Xintao: We think it is the next-generation Internet. In the future, our daily activities, including clothing, food, housing, transportation, study, and work, will no longer take place on mobile phones and PCs. Instead, we will have some kind of lightweight XR terminal and conduct all of it there.

He Zhan: My understanding is that it is a new evolution of the Internet, in which we interact through media such as XR or even future brain-computer interfaces.

Lou Yanxin: I think we have not yet reached a 100% metaverse. Rather, we are building a new generation of the Internet together, and we are all still contributing to it.

He Zhan: Yes, this ecosystem has to be built jointly.

He Zhan: The word "immersion" is also very interesting. How should we understand it?

Lou Yanxin: "Immersive" is an adjective, and it describes a feeling: users place themselves in a space and then become part of it. With AR, VR, and similar technologies, and especially with VR, that feeling is more direct.

Zhang Xintao: Besides the VR equipment and high-quality content Mr. Lou just mentioned, immersion also requires interaction. Interaction poses a huge challenge for cloud computing and the chip industry, because it needs real-time feedback, that is, real-time computation. For example, if the user touches a flower in the virtual space, the flower moves; if the user is wearing haptic gloves, they also feel corresponding tactile feedback. All of this has to be computed in real time.

We need enough technical means to "deceive" the human brain, so that it believes it is in the real world and cannot distinguish virtual space from real space. When the brain can no longer tell the difference, immersion comes naturally.

He Zhan: A few weeks ago I was fortunate to attend Alibaba's U Design Week, where many sessions (talks and exhibition areas) showcased algorithm optimization for vision, virtual-real interaction with haptics using VR glasses and gloves, and smell simulation using a wearable collar.

For example, when a chocolate cake appears on screen in a movie, the collar releases a chocolate scent; when a particularly smelly stable appears, that odor is synthesized in real time as well. Taste is simulated by plugging a small pad into a mobile phone. But every one of these sensory simulations depends on one thing: computing.

Lou Yanxin: Different senses have different simulation costs. What we do now is mostly visual and auditory. Touch is more expensive, and its cost cannot yet be brought down to a level everyone can afford. But all of our work is essentially about reproducing and simulating the senses.

He Zhan: Yes, you also mentioned hearing. A while ago I tried an audio experience at a university: wearing headphones, I listened to a piece of music rendered through AI-generated channels, and the sound seemed to travel from my left ear to my right ear. We have just discussed immersion across sight, hearing, taste, smell, and touch. If all five senses are to be simulated in real time, as Xintao said, a pair of glasses alone cannot support it.

Lou Yanxin: Yes, there is a long way to go.

Q2 What immersive experiences have you put into practice?

He Zhan: Has Alibaba Cloud recently launched any related scenarios or applications? Please share them with us.

Zhang Xintao: We have recently launched many projects. The most interesting one is the Metaverse concert (produced by Bizhen Technology). Everyone, whether a performer or an audience member, is a character in this metaverse space, and the surrounding environment is not limited by physical space, so there can be many visual and sound special effects. Performers are not bound by physics either: they can grow larger or smaller, which is impossible on a real stage. It is a new art form.

He Zhan: As I understand it, this differs from an ordinary virtual concert: it supports switching between different scenes and lets the audience and performers interact in real time.

Lou Yanxin: Speaking of the metaverse concert Xintao just described, we also held a concert some time ago. Rather than a single performer, it was a band that used motion capture to drive characters in the virtual space, performing for everyone in real time. What was special is that the concert could be watched either in VR or on a flat screen, and each device offered a different way to interact.

During one of the songs, VR viewers could fly through the entire space. The characters were in front of a black hole, and so were the performers. While flying, viewers could draw colored light trails, which also became part of the stage effects. The difference is that these effects were produced not by the performers or the stage but by the audience's interaction.

He Zhan: By "flat screen", do you mean a mobile phone or tablet?

Lou Yanxin: Yes. The entire performance is rendered in the cloud, and the rendered output is streamed to VR headsets as well as tablets and mobile phones. This performance was actually part of a larger event. We have developed a new platform called "Daqian", a space aggregation platform for hosting all kinds of performances and exhibitions in virtual space. We have also built a fully cloud-based version of the entire space, so whether users attend exhibitions, performances, or other activities, they can enter through cloud rendering.

He Zhan: Is this platform also real-time?

Lou Yanxin: Yes, it is fully real-time, and it also runs on NVIDIA GPUs.

He Zhan: How many concurrent users can the projects you have done so far support?

Lou Yanxin: In terms of network concurrency in the traditional sense, it can probably reach thousands of people.

He Zhan: I remember that at GTC, our technology conference, at the end of last year, Lao Huang (NVIDIA founder Jensen Huang) personally demonstrated a conversation with his own virtual human, a digital avatar that interacts in real time.

Lou Yanxin: One of our works happens to have been shortlisted for the Venice International Film Festival this year, and it is being shown in Venice right now. It is a theatrical performance project whose performance space can transform into several different stages. Throughout the performance, the actors wore motion capture suits, and each actor performed for six audience members. But the actors were in Paris while the audience was in Venice, so the piece was presented through a transnational motion capture data transmission scheme, with everything computed in real time.

Q3 Why do immersive experiences need cloud computing?

He Zhan: I have a question for Xintao, who is a cloud expert. Why is "immersive experience" so closely tied to cloud computing? How do you see it? Does delivering an "immersive experience" really require that much computing power?

Zhang Xintao: It is indeed very demanding on computing power. For example, I just mentioned Lao Huang talking with his own virtual human, which means there has to be a language AI behind it, and such a language AI is very complex. Even the world's leading companies still run into many kinds of problems with their language models, and they need huge computing clusters to run them. Every word I say to the virtual human and every answer it gives mobilizes a large amount of computing power behind the scenes.

The other is 3D rendering. To support participation from remote locations, you have to find suitable compute nodes and network transmission, and reduce network latency to a level low enough that users cannot perceive any delay while interacting. There are many challenges like this.

He Zhan: Especially for a major event like this, there is no room for failure.

Zhang Xintao: Delivering such computing power stably looks relatively simple, but it is actually very challenging. Our own mobile phones freeze and our PCs fail, but the cloud cannot afford such moments. Take Alipay: a user may be paying at a hospital at that very moment, and a failure then would be a serious problem. The other factor is scale. Some concerts may be very popular, and you need to plan for 20,000 or 30,000 users. Because the cloud has a huge pool of computing resources, it can provide that capacity immediately.

He Zhan: Yes, the Alipay example really struck me. These are small things in daily life, but without stable computing power behind them, a small disruption can turn into a big one. Mr. Lou, why did you choose Alibaba Cloud for your business?

Lou Yanxin: When we were building the "Daqian" platform, many of our plans were based on end devices. We had to decide whether a 1080 or some other graphics card would be the computing power baseline. Put bluntly, we had to plan around the level of the user's graphics card.

Later, once we started using the cloud and built a cloud version, we were finally relieved: we no longer had to worry about this problem. Today the Daqian platform supports both cloud and on-device access.

We also need to support transnational scenarios, because our actors may be in China while performing for overseas audiences. So we have to consider how nodes are deployed and which providers can offer that capability. After looking around, we found that only Alibaba Cloud could provide it, so we chose Alibaba Cloud from the very beginning.

He Zhan: To sum up, without the cloud, having to pick a hardware standard as the computing power baseline is very painful.

Lou Yanxin: Yes, it's really painful.

He Zhan: We have just talked about hundreds and thousands of concurrent users, and tens of thousands were also mentioned. Can we really reach tens of thousands today?

Zhang Xintao: A few years ago, a customer built an application that marked a leap for cloud computing: more than 13,000 GPUs served a single app at the same time, and tens of millions of people were logged in and using it simultaneously.

He Zhan: I suspect only the Chinese market could produce concurrency on that scale.

Q4 What challenges need to be overcome in the XR field?

He Zhan: Do you think there are other technologies in XR or VR that still need to improve?

Zhang Xintao: The challenges here are still considerable. Our current computing power, communication capability, and computing scale are far from what we have just been imagining. For example, rendering a very high-definition digital human today basically requires NVIDIA's most powerful chips, and even that may not be enough. So we may need to work with engine companies to see whether the workload can be parallelized.

On the AI side, today's large language models, many AIGC capabilities, and the ability to recognize human micro-expressions are all still at the level of weak artificial intelligence. If the virtual human has a very low IQ, users will certainly not feel immersed, because they will subconsciously think it is a machine. But if it is highly intelligent, able to recognize your expressions and understand your emotions, users will feel it is truly a virtual person. We believe that computing, communication, and various algorithms still need theoretical breakthroughs in some areas.

Lou Yanxin: Obtaining computing power stably and at low cost is always difficult. The VR devices in common use today are standalone all-in-one headsets, and whether from Pico or Meta, they run on mobile ARM chips whose performance is far below a 1080 and may not even reach a 6600.

The computing power of consumer-grade VR devices is very limited, but what we want to build far exceeds it. We want to create gorgeous, engaging scenes, but we cannot deliver them to everyone. The cloud can indeed help here by providing this capability. But enabling ordinary consumers to obtain this computing power stably and at low cost is something that requires joint effort.

Another point I want to raise is interoperability. In VR today, much of what people build ends up as isolated information islands. On the "Daqian" platform, a virtual space created by many different people, we have to consider this issue on its own: what the format is and what the interface is.

I think that in the future, at the asset format level, everyone may gradually embrace USD (Universal Scene Description). But USD alone is not enough, because USD describes assets; we also need logic, such as how users play and interact in the engine, and USD does not specify that logic. I believe everyone participating in the Metaverse Standards Forum is discussing this issue: how we can jointly build an interconnected network architecture for the metaverse, in which assets and information can flow between platforms.

He Zhan: Yes, I heard the demand for computing power just now. In fact, our CloudXR is also working with Alibaba Cloud. As for the open standards you mentioned, we were among the first batch of members to join, 36 companies in total. Everyone has been discussing data format standards, scene description standards, material definition standards, and standards for the digital economic system, and all members are participating in drafting them. This is genuinely hard, because the goal is to let everyone communicate with one another.

Lou Yanxin: Yes, I think we are somewhat back in the Web 1.0 era, when everyone started building a new network architecture together at the same time.

He Zhan: Yes, which is why we now describe USD as the HTML of the next-generation Internet, or of the next-generation metaverse.

Lou Yanxin: Yes, an exchange format.
