Without massive computing power, the metaverse is just a mirage

Introduction: On September 26, 2022, the immersive interview program "Breaking Down the Metaverse" was broadcast simultaneously on several official channels, including the Alibaba Cloud developer community, the Alibaba Cloud developer video account, and the Alibaba Cloud WeChat account. He Zhan, business lead of NVIDIA Omniverse in China, Lou Yanxin, founder of SANE Technology, and Zhang Xintao, Alibaba Cloud elastic computing product expert, shared their understanding of the industry, real-world cases, and bottleneck challenges.

A pioneer of the digital world, a technological artist, and a living fossil of cloud computing: what views can these three bring to the table? Click the video below to watch the program highlights.


Video: Breaking Down the "Metaverse" | Expert Dialogue

The following is a transcript of the program:

Q1: What are the metaverse and the immersive experience?

He Zhan: How do you understand the metaverse? What is the metaverse in your mind?

Zhang Xintao: We think it is the next generation of the Internet. In the future, our food, clothing, housing, transportation, study, and work will no longer be carried out on mobile phones and PCs. Instead, we will have a similarly lightweight XR terminal and move all of these activities there.

He Zhan: As I understand it, it is a new evolution of the Internet, where we hope to interact through media such as XR and, eventually, brain-computer interfaces.

Lou Yanxin: I think we have not yet realized even close to 100% of the metaverse concept. Rather, we are jointly building a new generation of the Internet, and we are still laying its building blocks.

He Zhan: Yes, this ecosystem needs to be built together.

He Zhan: The word "immersive" is also very interesting. How should we understand immersion?

Lou Yanxin: Immersion itself describes a feeling: users place themselves in a scene and become part of it. AR and VR, especially VR, give you that feeling most directly.

Zhang Xintao: Beyond the VR equipment and high-quality content that Mr. Lou just mentioned, immersion also needs interaction, and interaction poses a huge challenge to the cloud computing and chip industries. It requires real-time feedback, that is, real-time computation. For example, if a user touches a flower in the virtual space, the flower should move, and the corresponding gloves should deliver tactile feedback, all of which must be computed in real time.
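To make "real-time computation" concrete: interactive feedback in XR has a hard latency budget. A commonly cited industry rule of thumb for motion-to-photon latency in VR is roughly 20 ms; that figure and the stage timings below are illustrative assumptions, not numbers from the program. A minimal sketch that checks whether a pipeline's stage timings fit such a budget:

```python
# Rough motion-to-photon budget check for an interactive XR pipeline.
# The 20 ms target is an industry rule of thumb, not a figure from the program.
BUDGET_MS = 20.0

def fits_budget(stages, budget_ms=BUDGET_MS):
    """stages: dict of stage name -> milliseconds. Returns (ok, total_ms)."""
    total = sum(stages.values())
    return total <= budget_ms, total

# Example: a cloud-rendered pipeline where the network eats most of the budget.
# All timings below are assumed values for illustration.
pipeline = {
    "input_sampling": 1.0,
    "network_rtt": 10.0,    # assumed round trip to a nearby cloud node
    "render": 6.0,
    "encode_decode": 2.0,
    "display_scanout": 2.0,
}
ok, total = fits_budget(pipeline)
# total is 21.0 ms here, so this pipeline just misses a 20 ms budget.
```

Shaving the network round trip, for example by rendering on a nearby cloud node, is often the only lever large enough to bring such a pipeline back under budget, which is one reason node placement matters so much in the discussion below.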

We need enough technical means to "trick" the human brain into believing it is in the real world, with no way to distinguish virtual space from real space. Once the brain cannot tell the difference, immersion arises naturally.

He Zhan: I was actually lucky enough to participate in Alibaba's U Design Week a few weeks ago, where many sessions, both talks and exhibition areas, introduced algorithm optimization in vision and haptics: virtual-real interaction through VR glasses and gloves, and smell simulation through a wearable collar.

For example, when a chocolate cake appears in a movie scene, the collar emits a chocolate aroma; when a particularly smelly stable appears on screen, that odor is synthesized in sync, in real time. For taste, a small pad plugs into the phone and simulates taste sensations. But no sensory channel can be realized without one thing: computation.

Lou Yanxin: Different senses have different simulation costs. What we are doing now is mostly visual and auditory. Touch is more expensive, and its cost cannot yet be reduced to the point where everyone can afford it. But all of our work is about restoring and simulating the senses.

He Zhan: Yes, and you just mentioned hearing. I experienced an auditory demo at a university some time ago: wearing a headset, an AI-generated channel threads a piece of music through space so that it seems to run from the left ear to the right ear. We have been discussing immersion across vision, hearing, taste, smell, and touch. If all five senses are to be realized in real time, then, as Xintao said, a pair of glasses alone cannot support it.

Lou Yanxin: Yes, there is a long way to go.

Q2: What immersive experience practices can you share?

He Zhan: Does Alibaba Cloud have any relevant production scenarios and applications recently? Please share them with us.

Zhang Xintao: We have had many projects land recently. The most interesting one is a metaverse concert (made by Bizhen Technology). Both performers and attendees are avatars in the metaverse space. The surrounding environment is not limited by physical space and produces many visual and sound effects. The performers are not bound by physics either: they can grow larger or shrink smaller. It is a new art form that cannot be produced on a real stage.

He Zhan: I understand that this differs from an ordinary virtual concert: it supports switching between scenes and lets the audience and performers interact in real time.

Lou Yanxin: We also produced a metaverse concert like the one Mr. Zhang just described. It is not a single performer but a band, driving characters in virtual space through motion capture to perform for everyone in real time. What is special is that the concert can be watched both in VR and on flat screens, with a different interaction mode on each device.

In VR, there is one song during which you can fly through the whole space. The character is in front of a black hole, and so is the performer. During the flight, you can draw colored lines and special effects, which become part of the stage effects. But those effects are produced not by the performers or the stage, but by the audience's interaction.

He Zhan: By flat screens, you mean mobile phones or tablets?

Lou Yanxin: Yes. The whole performance is rendered in the cloud and then distributed to VR headsets, tablets, and mobile phones. This performance was actually part of a larger activity of ours. Our newly developed platform, called "Daqian", is a space-aggregation platform that can host various performances and exhibitions in virtual space. We have also developed a fully cloud-based version covering the entire space: whether users attend exhibitions, performances, or other activities, everything can be accessed through cloud rendering.

He Zhan: Is this platform still real-time?

Lou Yanxin: Yes, it is completely real-time, and it also runs on NVIDIA GPUs.

He Zhan: How much concurrency can you support now?

Lou Yanxin: In the traditional sense of network concurrency, it can reach thousands of users.

He Zhan: I remember that at last year's GTC, our technology conference, Lao Huang (Jensen Huang, founder of NVIDIA) personally demonstrated a conversation with his own virtual avatar. It was the first time I saw a real person interact with a digital avatar in real time.

Lou Yanxin: We have a work this year that was nominated for the Venice International Film Festival and is showing in Venice right now. It is a theatrical performance project in which the performance space transforms across several different stages. Throughout the show, the actor wears a motion-capture suit, and one actor performs for six audience members. But during the whole performance the actor is in Paris while the audience is in Venice, so it is actually delivered through a transnational motion-capture data transmission scheme, with the whole process computed in real time.

Q3: Why must cloud computing be used for the immersive experience?

He Zhan: I have a question for Xintao, our cloud expert. Why is the "immersive experience" so strongly tied to cloud computing? How do you see this? Does realizing an immersive experience really demand that much computing power?

Zhang Xintao: It really does require very high computing power. Take the example I just mentioned of Lao Huang talking with his own virtual avatar: behind it there must be a language AI, and that language AI is very complex. Even the language models of the world's leading enterprises still have many kinds of problems, and they need a huge computing cluster. Every word I say to the virtual human, and every answer it gives, means mobilizing a large amount of computing power behind the scenes.

The other part is 3D rendering. To realize remote rendering, you must find the right computing node and the right network path, and bring the network latency down so low that you cannot perceptibly feel it while interacting. There are many challenges.
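As a sketch of the node-selection step Xintao describes (the region names and hostnames below are hypothetical, and this is not an Alibaba Cloud API), a client could probe candidate rendering regions and pick the one with the lowest measured round-trip time:

```python
import socket
import time

# Hypothetical rendering regions and endpoints, for illustration only.
CANDIDATE_NODES = {
    "cn-shanghai": ("render-sh.example.com", 443),
    "cn-beijing": ("render-bj.example.com", 443),
    "eu-central": ("render-fr.example.com", 443),
}

def probe_rtt(host, port, timeout=1.0):
    """Measure one TCP connect round trip in seconds; return None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

def pick_lowest_latency(nodes):
    """Return the region name with the smallest measured RTT, or None."""
    results = {}
    for region, (host, port) in nodes.items():
        rtt = probe_rtt(host, port)
        if rtt is not None:
            results[region] = rtt
    if not results:
        return None
    return min(results, key=results.get)
```

A real scheduler would probe repeatedly and also weigh node load, but a single TCP connect time is a serviceable first approximation of network proximity.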

He Zhan: Especially for major events like these, failures are simply not acceptable.

Zhang Xintao: Stably delivering that kind of computing power sounds simple but is actually very challenging. Our phones crash and our PCs fail, but the cloud cannot afford such downtime. Take Alipay: a user might be paying at a hospital at that very moment, and a failure right then is a serious problem. The other issue is scale. Some concerts may be extremely popular, and you have to plan for 20,000 or 30,000 concurrent users. The cloud can provide that immediately because it has a huge pool of computing resources.

He Zhan: Yes, the Alipay example really struck me. It is a small thing in daily life, but without stable computing support, a small thing becomes a big problem. I also want to ask Mr. Lou: why did you choose Alibaba Cloud for your business?

Lou Yanxin: In the past, while building the "Daqian" platform, many of our plans were based on end devices. We had to decide whether a 1080 or some other graphics card should be the baseline of computing power; to put it plainly, we planned around what tier of graphics card the user's computer has.

Later, after we came into contact with the cloud and built a cloud version on it, we finally felt relieved. The current platform supports both cloud and end-device access.

We also had to consider transnational deployment, because our performers may be in China while we play for audiences overseas, so we had to consider how to deploy nodes and which provider could give us such capabilities. It seemed that only Alibaba Cloud could, so we chose Alibaba Cloud from the beginning.

He Zhan: Let me conclude: without the cloud, it would be painful to pick some hardware standard as the baseline of computing power.

Lou Yanxin: Yes, it's really painful.

He Zhan: You just mentioned concurrency of hundreds, thousands, even tens of thousands of users. Can we really achieve tens of thousands of concurrent users?

Zhang Xintao: A few years ago, one of our customers built an application that basically achieved a leap in cloud computing: more than 13,000 GPUs serving one app at the same time, with tens of millions of people logged into that app simultaneously.

He Zhan: I guess only the Chinese market can have such a large amount of concurrency.

Q4: What challenges in XR still need to be solved?

He Zhan: What technologies do you think need to be improved in XR or VR?

Zhang Xintao: The challenges here are still quite big. Our current computing capacity, communication capacity, and computing scale are far from the level we just imagined. For example, if we want to render a very high-fidelity digital human today, even NVIDIA's most powerful chip may not be able to handle it alone, so we may need to work with engine companies to try to parallelize the work.

On the AI side, you will find that our current large language models, much of AIGC, and the ability to recognize human micro-expressions are all still weak AI. If a virtual human has a low IQ, users cannot feel immersed, because they subconsciously know it is a machine, right? But if it is intelligent, can recognize your expressions, and can understand your emotions, then users will treat it as a genuine virtual person. We believe that computing, communication, and various algorithms all still need theoretical breakthroughs.

Lou Yanxin: Obtaining computing power stably and cheaply is always difficult. The chips in current VR devices are those of the standalone headsets everyone commonly uses. Whether Pico or Meta, the chips inside are mobile ARM chips, far below a 1080 and perhaps not even at the level of a 6600.

The computing power of the consumer-grade VR devices used by the general public is very limited, but what we want to do far exceeds it. We want to build very gorgeous and interesting scenes, but the device cannot deliver them. The cloud can really help here and provide that capability. However, how ordinary consumers can obtain this computing power cheaply and stably truly requires joint effort.

The other point I want to raise is interoperability. In VR, many of the things we build are information silos. The "Daqian" platform we are working on is precisely an aggregation of virtual spaces created by different people, and for that alone we must decide what the format is and what the interfaces are.

I think that at the asset-format level, we may gradually embrace USD (Universal Scene Description) in the future. But USD alone is not enough, because USD describes assets, and we also need logic: how users play and interact in the engine is not regulated by USD. I believe that participants in the Metaverse Standards Forum are all discussing this very issue: how to jointly build an interconnected network architecture for the metaverse in which assets and information can circulate.
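To make concrete what "USD describes assets" means, here is a minimal hand-written USDA layer (file names, prim names, and values are purely illustrative). The format expresses composed assets and transforms, but there is no place in it to express gameplay or interaction logic, which is exactly the gap Lou points out:

```usda
#usda 1.0
(
    defaultPrim = "Stage"
)

def Xform "Stage"
{
    # Reference an external asset file; the path is hypothetical.
    def Xform "Avatar" (
        prepend references = @./avatar.usd@
    )
    {
        double3 xformOp:translate = (0, 0, 5)
        uniform token[] xformOpOrder = ["xformOp:translate"]
    }
}
```

The `references` arc is what lets independently authored assets be composed into one scene, which is why USD keeps coming up in interoperability discussions.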

He Zhan: Yes. Given the demand for computing power we just heard, our CloudXR is also cooperating with Alibaba Cloud. In addition, we were among the first to join this open standard. A total of 36 enterprises have been discussing standards for data formats, scene description, material definition, and what some call the digital economic system. Developing something that everyone can interoperate with is genuinely difficult.

Lou Yanxin: Yes, I think it is as if we have gone back to the Web 1.0 era and are starting to build a new network architecture together.

He Zhan: Yes, so we now also describe USD as the HTML of the next-generation Internet, or of the metaverse.

Lou Yanxin: Yes, an interchange format.
