Xueba's computing system

Chapter 25 Difficulties in Data Collection

It was already very hot walking on the road in Zijin City during the day in early August, so Lin Yuan chose to get up early in the morning.

He did not take the subway but rode his electric scooter, because building data channels for the system would require him to visit every business in the area, and he did not want to do that on foot.

The area that Liu Huzi managed covered the most prosperous area in the entire HX District. There were countless office buildings here, and the Zijin branch of Haotuan Takeaway was also nearby.

Lin Yuan had applied to the company for a chance to test deliveries outside the office, because some updates to the food delivery platform still needed to be tested on the spot by a dedicated person before they went online. Generally speaking, programmers did not go out in person, but in order to get out during work hours, Lin Yuan took the initiative to take over this task.

Lin Yuan was not clear about the process of building data channels for merchants.

To find out, he deliberately skipped breakfast, found a breakfast shop early in the morning, and sat down to observe carefully.

According to previous experience, the computing system would show a prompt when it finished loading data. However, even after Lin Yuan had sat in the breakfast shop and almost finished two buns, the system still had not responded.

In the end, the system did not give any response until he finished drinking the soy milk and walked out the door.

[Can you only load data that has already been collected? Can't you actively organize and obtain it based on the location?]

[As a system, you should at least have some dignity.]

No matter how much Lin Yuan complained, the system played dead and gave no response.

On the main road early in the morning, Lin Yuan just stood by the roadside, watching the cars coming and going, feeling a little lost.

If a data channel could not be built for merchants, then it would naturally be impossible to build one for riders either. In that case, the two most critical points in the entire food delivery routing problem - merchants and riders - would be completely disconnected from the algorithm.

What AI, what artificial intelligence, what ChatGPT. No matter how loud the name is, no matter how high the level is, in the end they all come down to one thing - being data-driven.

No matter how powerful an AI model is, it is driven by data. Data is the source of everything. This is also true for computing systems.

Data represents direction and destination. Without it, even a luxury car worth tens of millions would not know where to go.

If the data channel construction method Lin Yuan envisioned was not feasible, then the trouble would not be limited to the food delivery algorithm optimization project in front of him. The bigger trouble would lie in the way the computing system itself could be used.

Whether the computing system can complete data collection automatically under simple guidance, or whether the data must first be collected and then thrown to it, is like the difference between autonomous driving and manual driving.

There is a huge difference here.

Just as you cannot do other things while driving manually, if data had to be collected by hand and then sent to the computing system, Lin Yuan would spend a great deal of time in the future dealing with data collection.

To push the driving analogy further: if the purpose of the car is to take people somewhere, then the only difference between the two is whether the people in the car can turn their attention to other things. But what if the purpose is not to carry people at all, but simply to get the car to a certain place?

In other words, if the purpose of driving is to get the car from one place to another, then there will be a world of difference between autonomous driving and manual driving.

Because if it is self-driving, people don't need to be in the car. People only need to set the destination for the car and then leave it alone. One person can handle thousands of cars. But manual driving is not possible. One person can only handle one car.

What is this called?

This is called underlying principles shaping upper-level applications.

The advantages of the underlying principles often make an exponential difference when fed back to the upper-level applications. This is the case with data collection.

Manual collection is like manual driving: one person can only handle one data node. But with automatic collection, one person can handle N data nodes.

If Lin Yuan really had to collect data manually, there would be no point in continuing the food delivery algorithm optimization project. He could not squat at the doorsteps of all the merchants every day, or sit on the back seat of every rider's electric scooter, constantly recording the data they generated.

This is the law of science. When you only see one car, you don't think there is much difference between automatic driving and manual driving. But when you expand your vision to include countless cars, the huge difference becomes apparent.

This is also one of the reasons why so many big technology companies are willing to spend huge amounts of money to bet on autonomous driving.

But this is beside the point. Lin Yuan was standing in the morning breeze. The gradually rising temperature in the air was like his slowly burning mood.

After truly embarking on the path of IT, Lin Yuan had gradually come to two major realizations.

One is to habitually explore problems and grasp their essence when encountering them. The other is to truly understand the importance of direction.

These two points are not empty words.

Lin Yuan did not despair over the setback. He began to carefully analyze the characteristics of the system's data channel, trying to grasp the essence of the problem.

The computing system could easily obtain the takeaway data that Haotuan had already collected and exported from its backend. It was not sensitive to the total amount of data and could quickly load even very large datasets. In other words, what the system cared about was the form of the data.

The collected takeaway data was not in the vector form that an AI model can ultimately consume.

The takeaway data is generally like this: in a certain month of a certain year, Zhang San received an order (number: order123) at place A, then went to place B where the merchant was located, spent a certain amount of time waiting for the meal, and then took a certain route and time to deliver the meal to place C where the customer was located.

It is impossible to feed such data directly into a real-world AI model for calculation. What people call AI is the thing that comes out at the end, not the process that produces it.

This is contrary to common sense - AI is actually an algorithm, and AI algorithms are produced, but the production process is not AI at all.

This is like pouring manure on the fruits and vegetables in the field, and delicious fruits and vegetables will grow in the field. The fruits and vegetables are delicious, but the stuff poured on them is obviously not edible.

However, this is only true for AI models in the real world. This is not the case with computing systems, which can directly load these unprocessed data for calculation.

Before real-world AI models perform calculations, they usually process the takeaway data into matrix vectors.

The AI model is cold and doesn’t care what the data you throw at it means. In its eyes, it is just a matrix vector. So the takeaway data needs to be converted into cold numbers like [-1, 23, 321, ...].

These numbers represent real food delivery data. For example, if a food delivery order is delivered on a sunny day, a parameter in the matrix vector may be represented by the number "1", and then the number "0" may be used to represent a cloudy day.
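As a rough illustration of the preprocessing that real-world models need, the sketch below turns one delivery record into a list of numbers. Only the sunny = 1 and cloudy = 0 convention and the order number order123 come from the text; the `DeliveryRecord` class, its other fields, and the sample values are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass


@dataclass
class DeliveryRecord:
    """One takeaway order. All field names here are hypothetical."""
    order_id: str            # e.g. "order123"; an identifier, not a feature
    weather: str             # "sunny" or "cloudy"
    wait_minutes: float      # time spent waiting for the meal at the merchant
    delivery_minutes: float  # time spent travelling to the customer


# Convention from the text: sunny -> 1, cloudy -> 0.
WEATHER_CODES = {"sunny": 1.0, "cloudy": 0.0}


def encode(record: DeliveryRecord) -> list[float]:
    """Convert a record into the plain numbers a model can consume."""
    return [
        WEATHER_CODES[record.weather],
        record.wait_minutes,
        record.delivery_minutes,
    ]


# Made-up sample values for illustration only.
sample = DeliveryRecord("order123", "sunny", 8.0, 21.5)
vector = encode(sample)
print(vector)
```

The order number is left out of the vector because it only identifies the order and carries no meaning as a quantity.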

But the computing system was different. Lin Yuan had tested it before: the takeaway data did not need to be preprocessed at all and could be loaded into the system directly. It seemed that the system itself could perform the data preprocessing.

This is consistent with the nature of the system - after all, the system is like a living computer that can change its hardware parameters on demand.

So Lin Yuan naturally thought of finding a breakthrough from this point.

(End of this chapter)
