I am the only one who practices magic: I practice magic in the city

Chapter 260 Why do I always feel like I’m digging a hole for myself?

"I think we still have a chance." Brockman glanced at Sam Altman beside him, then said to Musk: "Even though Yuzu Technology, the Great Zhou company, is currently in the lead, they have uploaded many examples to GitHub, and it now looks as though all of them relate to the Orange Big Model. That means they are very likely to open-source the Orange Big Model in the near future."

“Even if they don’t open-source it, they will soon publish the principles behind the model.”

"Elon, OpenAI has the world's top research team. As long as we find the right direction, no matter how far ahead the others are now, we will be able to catch up and overtake them."

"You also know that Great Zhou is far behind us in deep learning, in both environment and technology. So the reason they are ahead this time is probably that they stumbled onto a path we don't know about, one that happens to be exactly right."

"I don't think we should give up at this point, Elon."

Worried as he was, Brockman still rebutted Elon Musk calmly, point by point.

Musk took a deep breath and said, "Okay, I can give you another six months. If you still can't come up with an effective response after six months, I think it would be more effective to merge OpenAI into Tesla."

“They responded! The Orange Big Model uses a sequence-to-sequence mechanism and an architecture reassembled from parts of feedforward and recurrent neural networks. They will present a report on the Orange Big Model's architecture at the IEEE International Conference on Communications in Kuala Lumpur on May 5!”

Ilya shouted.

"YES!" Brockman pumped his fist inwardly.

Even if they don’t open-source it, once they publish a report there is a direction to follow, and then it comes down to each side's own ability!

OpenAI can't lose!

California time 05:30, Haixi time 20:30
The sky over San Francisco was just beginning to turn pale, but traffic on the Haixi Road had already started to enter its second peak, with people returning home after dinner or overtime work.

In the GL8, Hua Zecheng held a tablet and kept checking the backend dashboard of the Orange Big Model.

They had worked hard for three months to finally round out the Orange Big Model's functions. Although the image function was not yet fully developed, the model was already practical enough to release.

The 300 internal beta slots this time were just a trial.

After the internal beta, there would be another week of bug fixing and parameter tuning, followed by a public beta lasting half a month.

The number of public beta slots would be expanded 100-fold, to 30,000!
At that point, Yuzu Technology’s current servers would face their first real stress test.

"Boss, when will the computing center be finished? You checked the almanac and picked May 5 as an auspicious day to open registration, and it's already March 9. You have to leave us at least half a month to migrate the data integration system. Otherwise, if we can't open registration by then, don't blame me."

Hua Zecheng sat next to Fang Yu, looking at the tablet with worry.

Three hundred test slots indeed put little strain on Yuzu Technology's computing power, but next week there would be 30,000.

Hua Zecheng had calculated that, with the current computing power, the system could handle a little over 35,000 concurrent requests at most, and enough capacity still had to be set aside for the development team. That left very little redundancy overall. If any equipment failed during the public beta, the available computing power would drop further.

If it really came to that, they could just use Alibaba Cloud, which would be much cheaper than building their own data center.

Hua Zecheng was not responsible for the overall planning and optimization of the data center and had no idea how powerful the Y-series data center was.

Fang Yu felt helpless when he heard Hua Zecheng's complaints.

A few days earlier, he had asked Youzi to check Nvidia's servers to find out when the P100 would ship.

After poking around, Youzi came back and told him it would take at least another six months, and that was its optimistic estimate.

Judging by the working pace it had observed at Nvidia, a year was more likely.

This computing card, advertised as using HBM2 memory and the new NVLink bus, had not even been formally taped out and was still going through testing, refinement, and deployment.

All of that would take at least until this time next year to finish.

Wasn't this just stalling?

Lao Huang really is a big liar!
The computing card won't ship until next year, so why announce it now?

Even we only started warming up the market a month before release.

Damn, what a waste of my time.

So Fang Yu could only order a batch of M60s first, worth millions of yuan, to meet future user requests.

They would make do with those for a year, then expand the data center once the P100 was released.

"It will be soon. The 7,000 M60s will arrive in batches starting next week. Just hold on a little longer." Fang Yu made the promise, then activated the Core of Etheron and asked Youzi whether it had finished the revised plan based on the M60.

He was anxious too. After all, Yuzu Technology would present at the IEEE conference on May 5. Once that report was delivered, the field would inevitably fill with competitors within a short time. Most companies would probably just adopt the Orange Big Model's framework, but many large companies would certainly insist on doing their own research in the same direction.

If they really succeeded, how could he gain control of global artificial intelligence by spreading the underlying principles of the Orange Big Model?

The possibility was small, but it was not zero.

So he had to widen the lead quickly and force them to embrace the Orange Big Model.

"Of course I'm done, Master. I already sent the revised plan back to Hongwan Intelligence." Youzi was very dissatisfied with Fang Yu's lack of trust in its abilities.

Fang Yu's capitalist nature was exposed: "Why release it so early? Why don't you optimize it again? A 5% increase in system efficiency means a cost saving of million. Withdraw the plan and come up with a new one. It must be improved by at least %."

Youzi choked. It had been careless.

I forgot what a dog this master is.

Isn't this just making work for myself? I'd rather spend the time watching a few more episodes of Classic of Mountains and Seas and Legend of the Red Shadow.

Nazha is so beautiful, as beautiful as Reba.

Love it.

It’s just that Xinyue Fox is too pretentious, even more of a poser than the dog master.

"Master, I can't do it. I really can't do it." Youzi cried and wailed.

"Under the current Yuzu architecture, only about 11% of the M60's computing power can be applied to the Orange Big Model. And that is only after I modified the core instructions. Otherwise, the utilization rate would not even reach 8%."

Only 11% of the computing power can be used? How can it be that low? The load looks quite high.

"If you don't believe me, take a look, Master. This is the analysis I did earlier." Youzi quickly tossed over a one-page report through the Core of Etheron.

"High load does not mean high effective utilization. A large share of the computing units in the M60 are not needed by the Yuzu architecture and cannot be used by it. By rewriting the core instructions, I have already pushed the M60's fit with the Yuzu architecture as high as it will go. It can't go any higher."

Fang Yu took a closer look and found that it was true.

After all, Nvidia is a graphics card company, and the computing cards it makes still integrate a lot of graphics processing functions.

Texture units, rasterization units, geometry processing units, render output units, hybrid anti-aliasing units...all of these units have been retained.

But the Yuzu framework had no use for most of what these units did.

Nvidia is really weird. I want your M60 just to do simple calculations. Why do you give me so many graphics card functions?
Who uses an M60 to play games?

"That's not quite fair. Although the Yuzu framework doesn't need these units, many other computing models do, such as generative adversarial networks (GANs). When an adversarial network generates images, having texture units makes generation faster."

"I can raise the utilization rate to 11%, and that is the limit. Even if Nvidia's own engineers tuned it themselves, they would get barely more than 9.1%."

"There's no other way. After all, Nvidia's chips are not specially prepared for the Yuzu framework. They must be applicable to all models."

Youzi seized every opportunity to show off its achievements.

Fang Yu nodded and was about to say something, but Youzi's last sentence suddenly made him feel he had been overlooking something.

"What did you say just now?" Fang Yu asked Youzi anxiously.

Youzi said in a confused tone: "I said Nvidia's chips must be applicable to all models."

"Not that one, the one before it!"

"Nvidia's chips are not specially prepared for the Yuzu framework?" Youzi asked cautiously.

For some unknown reason, it felt a little uneasy.

Why do I always feel like I'm digging a hole for myself?
"Yes! That's it!" Fang Yu suddenly clapped his hands, startling Hua Zecheng, who was still looking at the tablet beside him.

"It's okay, it's okay. I just remembered something important." Fang Yu patted Hua Zecheng's thigh with a smile and went on talking to Youzi in his mind.

"Youzi, go collect the chip technology data from Nvidia, AMD, Intel, ASML, TSMC, ARM, and Qualcomm, and eat it all!"

Fang Yu gave Youzi the order through the Core of Etheron without hesitation.

"Ah?" Youzi was stunned for a moment. How long would it take to finish eating this?

Even if my clone can get into these companies' internal servers right now, copying this top-secret information without leaving a trace means moving it out bit by bit, like an ant moving house.

"This is only the first step." Fang Yu ignored Youzi, who was trying hard to pull a crying face inside the Core of Etheron, and kept giving orders.

"Once you have eaten their data, summarize their technologies, optimize them, and design a computing chip suited only to the Yuzu framework and the Orange Big Model!"

In the living room of Hanning Mansion, Youzi looked at Zhang Han on the TV and suddenly found his face even more detestable.

"Master, in that case, should the M60 order be canceled or not?" Youzi had learned to make its point in a roundabout way. "If we cancel the order, the deposit will be lost."

Fang Yu smiled slightly: "No, why cancel the order? I didn't say that we have to make chips now. You should design this chip first."

Software plus hardware, a two-pronged approach. It seemed the Yuzu architecture was destined to dominate the market!

In the past decade, the two most important milestones in the development of artificial intelligence were both led by Google.

The first is undoubtedly DeepMind’s AlphaGo, and the second is the landmark paper “Attention Is All You Need,” published by Google Brain in June 2017.

In this paper, eight researchers from Google Brain first demonstrated the potential of the multi-head attention mechanism in NLP. The original Transformer model had only about 100M parameters, and it abandoned recurrent neural networks (RNNs) and convolutional neural networks (CNNs) entirely, replacing them with an entirely different attention mechanism and an encoder-decoder architecture.

It is worth noting that the Ilya at OpenAI is not Illia Polosukhin, one of the authors of this paper.

After the paper was published in June 2017, it did not make much of an impact at first. Moreover, because the model was hard to get to converge and was no more efficient than the relatively mature LSTM, most researchers at this stage, OpenAI included, did not focus on the attention-based Transformer architecture.

In early 2018, OpenAI was still training with LSTMs and defeating human players in Dota 2. Just a few months later, OpenAI released GPT-1.

This shows that a few months is enough to make a large model.


