Kai-Fu Lee, President of Google China since July 2005, recently gave a presentation about cloud computing at the 17th International World Wide Web Conference. John Breslin highlights some interesting ideas from Kai-Fu Lee's keynote.
Lee mentions six properties of cloud computing from Google's perspective:
- User-centric. “If data is all stored in the Cloud - images, messages, whatever - once you're connected to the Cloud, any new PC or mobile device that can access your data becomes yours. Not only is the data yours, but you can share it with others.”
- Task-centric. “The applications of the past - spreadsheets, e-mail, calendar - are becoming modules, and can be composed and laid out in a task-specific manner. (...) Google considers communication to be a task,” and that's the reason why Gmail integrates a chat feature for instant communication.
- Powerful. “Having lots of computers in the Cloud means that it can do things that your PC cannot do. For example, Google Search is faster than searching in Windows or Outlook or Word” because a Google query hits at least 1000 machines.
- Accessible. Having your data in the cloud means you can instantly get more information from different repositories - Google's universal search is one example of simultaneous search. “Traditional web page search does IR / TF-IDF / PageRank stuff pretty well on the Web at large, but if you want to do a specific type of search, for restaurants, images, etc., web search isn't necessarily the best option. It's difficult for most people to get to the right vertical search page in the first place, since they usually can't remember where to go. Universal search is basically a single search that will access all of these vertical searches.”
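The fan-out-and-merge idea behind universal search can be sketched in a few lines. This is a toy illustration, not Google's actual architecture: the vertical "indexes" are hypothetical in-memory dictionaries, and the scores are made up for the example.

```python
# Toy sketch of universal search: send one query to several vertical
# indexes and merge the scored hits into a single ranked result list.
# The verticals and scores below are invented for illustration.

VERTICALS = {
    "web":    {"pizza": [("example.com/pizza", 0.9)]},
    "images": {"pizza": [("img/pizza.jpg", 0.7)]},
    "local":  {"pizza": [("Joe's Pizzeria", 0.8)]},
}

def universal_search(query):
    hits = []
    # Fan the query out to every vertical index.
    for name, index in VERTICALS.items():
        for doc, score in index.get(query, []):
            hits.append((score, name, doc))
    # Merge: sort all hits by score, highest first.
    return [(name, doc) for score, name, doc in sorted(hits, reverse=True)]

results = universal_search("pizza")
# Highest-scoring hit comes first, regardless of which vertical produced it.
```

The user issues one query; the ranking step decides which vertical's results surface first, which is exactly why they no longer need to remember where each vertical search lives.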
- Intelligent. “Data mining and massive data analysis are required to give some intelligence to the masses of data available (massive data storage + massive data analysis = Google Intelligence).”
- Programmable. “For fault tolerance, Google uses GFS for distributed disk storage. Every piece of data is replicated three times. If one machine dies, a master redistributes the data to a new server. There are around 200 clusters (some with over 5 PB of disk space on 500 machines). BigTable is used for distributed memory. The largest cells in BigTable are 700 TB, spread over 2,000 machines. MapReduce is the solution for new programming paradigms. It cuts a trillion records into a thousand parts on a thousand machines. Each machine will then load a billion records and will run the same program over these records, and then the results are recombined. While in 2005, there were some 72,000 jobs being run on MapReduce, in 2007, there were two million jobs (use seems to be increasing exponentially).” This recent video has more information about Google's infrastructure.
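The partition/map/recombine flow Lee describes can be sketched at toy scale. This is a minimal single-process illustration of the MapReduce idea (here, a word count), not Google's implementation: the "machines" are just list slices, and the record set is a few strings instead of a trillion records.

```python
from collections import defaultdict

# Map phase: each "machine" runs the same program over its chunk of records,
# producing a partial word count.
def map_chunk(records):
    counts = defaultdict(int)
    for record in records:
        for word in record.split():
            counts[word] += 1
    return counts

# Reduce phase: the partial results from all machines are recombined.
def reduce_counts(partials):
    total = defaultdict(int)
    for partial in partials:
        for word, count in partial.items():
            total[word] += count
    return dict(total)

records = ["the cloud", "the web", "cloud computing"]
# Cut the records into two parts, standing in for two machines.
chunks = [records[i::2] for i in range(2)]
partials = [map_chunk(chunk) for chunk in chunks]
result = reduce_counts(partials)
# result counts "the" and "cloud" twice each, "web" and "computing" once.
```

At Google's scale the same structure holds; only the partitioning, scheduling, and fault tolerance (re-running a chunk when a machine dies, as GFS handles for storage) make it an infrastructure problem rather than a loop.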