1. Even small data contains wisdom
In the beginning, big data was not big at all. How much data was there back then? Today everyone reads e-books and follows the news online, but when those of us born in the 1980s were young, the amount of information was far smaller. We read books and newspapers, and how many words does a week of newspapers add up to? Outside the big cities, the library of an ordinary school held only a few bookshelves. Only later, with the arrival of information technology, did the volume of information start to grow.
First, let's look at the data in big data. It falls into three types: structured data, unstructured data, and semi-structured data.
Structured data: data with a fixed format and a limited length. A completed form is a typical example: nationality: People's Republic of China, ethnicity: Han, gender: male.
Unstructured data: data of variable length with no fixed format, and there is more and more of it. Web pages are an example: some are very long, others end after a few words. Voice and video are unstructured data as well.
Semi-structured data: data in formats such as XML or HTML. If you do not work in technology you may not be familiar with these, but that does not matter.
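To make the three types concrete, here is a minimal Python sketch. The sample values (the name, the hobbies, the sentence) are invented for illustration and do not come from any real data set.

```python
import xml.etree.ElementTree as ET

# Structured: fixed fields with bounded values, like the completed form in the text.
structured_row = {
    "nationality": "People's Republic of China",
    "ethnicity": "Han",
    "gender": "male",
}

# Semi-structured: tagged, but the number and kind of elements can vary per record.
semi_structured = (
    "<person><name>Li</name>"
    "<hobby>running</hobby><hobby>reading</hobby></person>"
)
root = ET.fromstring(semi_structured)
hobbies = [h.text for h in root.findall("hobby")]

# Unstructured: free text of any length, with no fields at all.
unstructured = "Ran five kilometres this morning, then read the news online."
word_count = len(unstructured.split())
```

The structured record can be queried by field name directly, the semi-structured one has to be parsed before its tags can be used, and the unstructured text offers nothing but raw words until it is processed further.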
In fact, data by itself is not useful; it must be processed. The readings your fitness band collects on your daily run are data, and so are the countless web pages on the Internet. We call this Data. Data itself is of no use, but it contains something very important called Information.
Data is messy; only after sorting and cleaning can it be called information. Information contains many patterns, and the patterns we summarize from it are called Knowledge, and knowledge changes fate. There is plenty of information, but some people look at it and see nothing, while others saw the future of e-commerce in it, or the future of live streaming, and that is why they became so successful. If you do not extract knowledge from information, all you do on the Internet is browse your friends' posts and remain a bystander.
With knowledge in hand, some people apply it in practice and do very well; this is called Intelligence, or wisdom. Having knowledge does not necessarily mean having wisdom. Many scholars are very knowledgeable and can analyze what has already happened from every angle, but when it comes to doing things themselves, they cannot turn that knowledge into wisdom. Many entrepreneurs are great precisely because they apply the knowledge they have gained in practice and end up building large businesses.
The application of data is divided into four steps: data, information, knowledge, and wisdom.
The final stage is what many businesses want. Their thinking goes: I have collected so much data; can I use it to make my next decision and improve my product? For example, when a user watches a video, an advertisement pops up beside it for exactly what he wants to buy; when a user listens to music, the service recommends other music he really wants to hear.
Every mouse click and every piece of text a user enters in my app or website is data to me. I want to extract something from it, let it guide practice, and turn it into wisdom, so that users get hooked on my application and cannot pull themselves away: once they come to my site they do not want to leave, and they just keep clicking and buying.
Many people say they want to cut off the Internet at home on Double Eleven (the November 11 online shopping festival): "My wife keeps buying and buying. She buys A and the site recommends B, and she says, 'Oh, B is just what I like. Honey, I want to buy it.'" How is this program so impressive, so smart that it knows my wife better than I do? How is this done?
2. How data is refined into wisdom
Data processing is divided into several steps, and only at the end does wisdom emerge.
The first step is data collection. You must have data first, and there are two ways to collect it:
The first way is to pull it, known professionally as scraping or crawling. A search engine works this way: it downloads the information on the Internet into its own data center so that you can search it. When you run a search, the result is a list. Why does the search engine company have this list? Because it has already fetched the data. But once you click a result, the site you land on no longer belongs to the search engine. For example, if Sina publishes a news story and you find it through Baidu, the results page you see before clicking is served from Baidu's data center, while the page you get after clicking is served from Sina's data center.
The second way is push: many terminals can collect data for me. For example, the Xiaomi band can upload your daily running, heart-rate, and sleep data to the data center.
The second step is data transfer. It is usually done with a queue, because the volume of data is very large and data must be processed before it is useful. The system cannot process everything at once, so the data has to line up and be handled gradually.
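The queueing idea can be sketched with Python's standard `queue` module. This is a single-process illustration only: the `readings` being produced and the doubling used as "processing" are placeholders, not part of any real pipeline.

```python
import queue
import threading

buffer = queue.Queue(maxsize=100)   # bounded: the producer waits when it is full
processed = []

def consumer():
    """Take items off the queue one at a time until the sentinel arrives."""
    while True:
        item = buffer.get()
        if item is None:            # sentinel value: no more data is coming
            break
        processed.append(item * 2)  # stand-in for real processing

worker = threading.Thread(target=consumer)
worker.start()

for reading in range(1000):         # data arrives in a burst
    buffer.put(reading)             # blocks while the queue is full
buffer.put(None)
worker.join()
```

Because the queue is bounded, a burst of incoming data does not overwhelm the consumer; the producer simply waits in line until there is room.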
The third step is data storage. Today data is money, and holding data is equivalent to holding money. How else would a website know what you want to buy? It knows because it has your purchase history. This information cannot be given to others; it is very valuable, so it has to be stored.
The fourth step is data processing and analysis. The data stored in the previous step is raw data, which is mostly messy and contains a lot of junk, so it needs to be cleaned and filtered to obtain high-quality data. High-quality data can then be analyzed, either to classify it or to discover relationships within it, and this yields knowledge.
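A minimal sketch of what cleaning and filtering might look like. The records and the three rules (drop duplicates, drop rows with a missing user, drop rows with an unparsable price) are assumptions chosen for illustration; real pipelines define their own rules.

```python
raw_records = [
    {"user": "u1", "item": "diapers", "price": "59.9"},
    {"user": "u1", "item": "diapers", "price": "59.9"},   # exact duplicate
    {"user": "",   "item": "beer",    "price": "12.0"},   # missing user
    {"user": "u2", "item": "beer",    "price": "abc"},    # unparsable price
    {"user": "u3", "item": "  Beer ", "price": "12.0"},   # untidy text
]

def clean(records):
    """Normalize fields, then drop junk rows and duplicates."""
    seen = set()
    result = []
    for rec in records:
        user = rec["user"].strip()
        item = rec["item"].strip().lower()
        try:
            price = float(rec["price"])
        except ValueError:
            continue                  # junk: price is not a number
        if not user:
            continue                  # junk: no user to attribute the row to
        key = (user, item, price)
        if key in seen:
            continue                  # duplicate of a row already kept
        seen.add(key)
        result.append({"user": user, "item": item, "price": price})
    return result

cleaned = clean(raw_records)
```

Of the five raw rows, only two survive, and the untidy `"  Beer "` has been normalized to `"beer"` so that later analysis treats it the same as every other beer purchase.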
For example, in the often-told story of Walmart's beer and diapers, analysis of purchase data showed that men who buy diapers tend to buy beer at the same time. Discovering this relationship between beer and diapers is acquiring knowledge; applying it in practice by placing the beer and diaper shelves close together is wisdom.
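One simple way to look for this kind of relationship is to count how often items appear in the same basket. The sketch below uses five made-up baskets, not real retail data, and the `confidence` helper is a hypothetical name for the usual "of the baskets with A, what share also has B" measure.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets, for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Count every unordered pair of items bought together.
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

def confidence(a, b):
    """Share of baskets containing a that also contain b."""
    return pair_counts[frozenset((a, b))] / item_counts[a]

diapers_to_beer = confidence("diapers", "beer")
```

In this toy data, two of the three baskets with diapers also contain beer. A high value like this is the "knowledge"; deciding to move the shelves is the step that turns it into wisdom.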
The fifth step is data retrieval and mining. Retrieval means search; as the saying goes, "for external matters ask Google, for internal matters ask Baidu." Both search engines load the analyzed data into their search systems, so that when people want information, a single search finds it.
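A common data structure behind "a single search finds it" is an inverted index, which maps each word to the documents that contain it. The sketch below is a toy version with three invented documents; it is not a description of how Google or Baidu are actually built.

```python
from collections import defaultdict

# Toy document collection, invented for illustration.
documents = {
    1: "big data needs cloud computing",
    2: "cloud computing needs big data",
    3: "data contains information",
}

# Build the inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The expensive work (reading every document) is done once, ahead of time; answering a query such as `search("cloud data")` is then just a few set intersections.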
The other part is mining. Simply returning search results no longer satisfies people; relationships also need to be dug out of the information. In financial search, for example, when someone searches for a company's stock, shouldn't information about the company's executives be surfaced as well? Suppose you found only the stock, saw that it was rising strongly, and bought it, while in fact an executive had just made a statement very unfavorable to the stock, and it fell the next day. Wouldn't that hurt a great many investors? That is why it is important to mine the relationships in data with various algorithms and build a knowledge base.
3. In the era of big data, when everyone gathers firewood, the fire burns high
When the amount of data is small, a few machines can handle it. As the amount of data grows and even the most powerful server cannot cope, what then? At that point the power of many machines has to be pooled, with everyone working together to get the job done.
For data collection: in the IoT, thousands of sensing devices are deployed in the field, collecting large amounts of temperature, humidity, monitoring, and power data. For an Internet search engine, all the web pages on the Internet have to be downloaded. Obviously one machine cannot do this; many machines are needed to form a web crawler system, with each machine downloading a portion and all of them working at the same time, so that a massive number of pages can be downloaded in a limited time.
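One way to let each machine download "a portion" is to assign every URL to a machine by hashing it. The sketch below only shows the assignment step and downloads nothing; the `example.com` URLs, the worker count of 4, and the `assign_worker` helper are assumptions made for illustration.

```python
import hashlib

def assign_worker(url, num_workers):
    """Map a URL to a worker number in a stable, repeatable way."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_workers

# Placeholder URLs standing in for the pages to be crawled.
urls = [f"http://example.com/page/{i}" for i in range(1000)]
num_workers = 4

assignments = {w: [] for w in range(num_workers)}
for url in urls:
    assignments[assign_worker(url, num_workers)].append(url)
```

Because the hash of a URL never changes, every machine can work out on its own which URLs belong to it, and no page is downloaded twice.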
For data transfer: a queue held in memory would be overwhelmed by that much data, so disk-based distributed queues were created. Such a queue can be served by many machines at once: however much data you have, as long as there are enough queues and the pipe is thick enough, it can carry the load.
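The disk-based part of this idea can be sketched as an append-only file with a read offset. This is a single-machine toy, with a hypothetical `DiskQueue` class and file name; a distributed queue would spread many such files across machines, which is not shown here.

```python
import os
import tempfile

class DiskQueue:
    """Append-only file used as a queue; the offset marks what has been read."""

    def __init__(self, path):
        self.path = path
        self.offset = 0
        open(path, "ab").close()      # create the file if it does not exist

    def put(self, message):
        # Messages are stored one per line, so they must not contain newlines.
        with open(self.path, "ab") as f:
            f.write(message.encode("utf-8") + b"\n")

    def get(self):
        """Return the next unread message, or None if the queue is drained."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            line = f.readline()
        if not line:
            return None
        self.offset += len(line)
        return line.rstrip(b"\n").decode("utf-8")

tmp_dir = tempfile.mkdtemp()
q = DiskQueue(os.path.join(tmp_dir, "partition-0.log"))
for i in range(3):
    q.put(f"reading-{i}")
first = q.get()
second = q.get()
```

Since the messages live on disk rather than in memory, the backlog can grow far beyond what RAM would hold, at the cost of slower access.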
For data storage: the file system of one machine certainly cannot hold everything, so a large distributed file system is needed, which combines the hard disks of many machines into one big file system.
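A rough sketch of the core idea: cut a file into fixed-size blocks and spread the blocks over several machines. The machine names, the block size of 512 bytes, and the round-robin placement are assumptions for illustration; real distributed file systems also handle replication and failures, which this sketch leaves out.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, machines):
    """Round-robin placement: block i goes to machine i mod N."""
    placement = {m: [] for m in machines}
    for i, block in enumerate(blocks):
        placement[machines[i % len(machines)]].append((i, block))
    return placement

def read_back(placement):
    """Collect the blocks from every machine and reassemble the file in order."""
    indexed = [pair for stored in placement.values() for pair in stored]
    return b"".join(block for _, block in sorted(indexed))

file_bytes = bytes(range(256)) * 10               # a 2,560-byte stand-in file
machines = ["node-a", "node-b", "node-c"]         # hypothetical machine names
blocks = split_into_blocks(file_bytes, 512)
placement = place_blocks(blocks, machines)
restored = read_back(placement)
```

To the user the result still looks like one file, even though no single machine holds all of it.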
For data analysis: a large amount of data may need to be split up, counted, and summarized, and one machine may not be able to handle it. Hence distributed computing, which divides a large amount of data into small parts; each machine processes one part, and many machines work in parallel, so the job finishes quickly. For example, the well-known TeraSort benchmark sorts 1 TB of data, roughly 1,000 GB. On a single machine this would take several hours, but parallel processing has completed it in 209 seconds.
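The divide-and-process-in-parallel idea can be sketched as a range-partitioned sort. This is not TeraSort itself: the data is 10,000 random integers, the three boundaries are picked by hand, and threads in one process stand in for separate machines.

```python
import bisect
import random
from concurrent.futures import ThreadPoolExecutor

def partition_by_range(values, boundaries):
    """Split values so that everything in bucket i sorts before bucket i + 1."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        buckets[bisect.bisect_right(boundaries, v)].append(v)
    return buckets

random.seed(0)
data = [random.randrange(1_000_000) for _ in range(10_000)]
boundaries = [250_000, 500_000, 750_000]          # hand-picked split points

# Step 1: divide the data into parts by value range.
buckets = partition_by_range(data, boundaries)

# Step 2: each worker sorts its own part independently.
with ThreadPoolExecutor(max_workers=4) as pool:
    sorted_buckets = list(pool.map(sorted, buckets))

# Step 3: the parts are already in range order, so concatenating them is enough.
result = [v for bucket in sorted_buckets for v in bucket]
```

Because the buckets are split by value range, no final merge is needed: the sorted parts can simply be placed one after another.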
So what is big data? Put bluntly, it is what happens when one machine cannot finish the job and everyone does it together. But as data volumes keep growing, many small companies also need to process quite a lot of data. What can these small companies do without that many machines?
4. Big data needs cloud computing, cloud computing needs big data
Speaking of this, cloud computing comes to mind. Doing these things takes a lot of machines, and what you really want is to have them whenever you need them, and as many as you need.
For example, a company that uses big data to analyze its finances might run the analysis once a week. Keeping a hundred or a thousand machines sitting there to be used once a week is very wasteful. Could we bring out those thousand machines when it is time to compute, and let them do other work when it is not?
Who can do this? Only cloud computing can give big data workloads elasticity at the resource layer. Cloud computing also deploys big data on its PaaS platform as a very important general-purpose application. A big data platform makes many machines work on one task together; that is not something ordinary people can develop, nor something ordinary people can operate. It takes a team of dozens or even hundreds of people to get it running.
So, just as with databases, it takes a group of professionals to run this kind of thing. Today the public clouds basically all offer big data solutions. When a small company needs a big data platform, it does not need to buy a thousand machines. It requests them on the public cloud, the thousand machines appear with the big data platform already deployed on them, and the company only needs to load its data and start computing.
Cloud computing requires big data, big data requires cloud computing, and the two are combined.