Hadoop Tutorial
Hadoop - Big Data Solutions
Traditional Approach
In this approach, an enterprise has a single computer to store and process its big data. For storage, programmers rely on database vendors of their choice, such as Oracle or IBM. The user interacts with the application, which in turn handles the data storage and analysis.
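To make the picture concrete, here is a minimal sketch of that architecture in Java: one application talks to one database server over plain JDBC and asks that single server to do both the storage and the analysis. The connection URL, credentials, table, and column names are illustrative placeholders, not part of the original tutorial.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TraditionalApp {
  public static void main(String[] args) throws Exception {
    // One application, one database server: the server both stores the
    // data and performs the analysis (here, an aggregation query).
    // URL, credentials, and schema are placeholders; a vendor JDBC driver
    // (Oracle, IBM DB2, etc.) must be on the classpath.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@//dbhost:1521/SALES", "app_user", "secret");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT region, SUM(amount) FROM orders GROUP BY region")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
      }
    }
  }
}

Everything in this design, storage, computation, and analysis, funnels through that one server, which is exactly the limitation discussed next.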

Limitation
This approach works fine for applications that process modest volumes of data, volumes that a standard database server can accommodate and that stay within the limits of the processor handling them. But when it comes to huge, ever-growing volumes of data, pushing everything through a single database becomes a bottleneck, and processing such data is a tedious task.
Google’s Solution
Google solved this problem with an algorithm called MapReduce. It divides the task into small parts, assigns them to many computers, and collects their results, which, when integrated, form the final result dataset.
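As a rough sketch of the idea (not Google's implementation), the following Java program counts words with the same divide-process-merge pattern: a parallel stream stands in for the many computers, each input line is processed independently in the map step, and the partial counts are merged into one result dataset. The sample data and class name are invented for illustration.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class MapReduceSketch {
  public static void main(String[] args) {
    // The "dataset", split into small, independent records.
    List<String> lines = Arrays.asList(
        "big data needs big storage",
        "data is processed in parallel",
        "results are merged into one dataset");

    // Map step: each line is tokenized independently, so the work can be
    // spread across many workers (here, the threads of a parallel stream).
    // Reduce step: the per-word counts from all workers are merged into
    // a single result dataset.
    ConcurrentMap<String, Long> wordCounts = lines.parallelStream()
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        .collect(Collectors.groupingByConcurrent(word -> word, Collectors.counting()));

    wordCounts.forEach((word, count) -> System.out.println(word + " -> " + count));
  }
}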

Hadoop
Using the solution provided by Google, Doug Cutting and his team developed an open-source project called HADOOP.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different nodes. In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
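The standard illustration of this model is the WordCount example from the Apache Hadoop documentation, sketched below: the Mapper emits a (word, 1) pair for every word in its input split, the Reducer sums those pairs for each word, and the framework runs many Mapper and Reducer tasks in parallel across the cluster. Input and output paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in the input split it is given.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts collected from all mappers for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}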
