Internet Technologies: A Concise Tutorial
Search Engines
Introduction
A search engine is a huge database of internet resources such as web pages, newsgroups, programs, and images. It helps users locate information on the World Wide Web.
A user can search for any information by submitting a query in the form of keywords or a phrase. The search engine then looks for relevant information in its database and returns it to the user.
Search Engine Components
A search engine generally has three basic components: a web crawler, a database, and a search interface.
Web crawler
It is also known as a spider or bot. It is a software component that traverses the web to gather information.
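As a rough illustration of how a crawler traverses links, the sketch below does a breadth-first walk over a tiny in-memory link graph. The `TOY_WEB` dictionary, its URLs, and the `crawl` function are hypothetical stand-ins for real network fetching; a production crawler would also handle politeness rules, errors, and duplicate content.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("home page about search", ["a.example/docs", "b.example/"]),
    "a.example/docs": ("crawler documentation", ["a.example/"]),
    "b.example/": ("image gallery", ["a.example/docs", "c.example/"]),
    "c.example/": ("newsgroup archive", []),
}

def crawl(seed, fetch, max_pages=100):
    """Breadth-first traversal from seed; returns {url: text} for fetched pages."""
    seen = {seed}            # URLs already queued, so no page is visited twice
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        result = fetch(url)
        if result is None:   # page could not be fetched; skip it
            continue
        text, links = result
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

pages = crawl("a.example/", TOY_WEB.get)
print(list(pages))
# ['a.example/', 'a.example/docs', 'b.example/', 'c.example/']
```

The `seen` set is what keeps the traversal from looping when pages link back to each other.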
Search Engine Working
The web crawler, the database, and the search interface are the major components that make a search engine work. Search engines use the Boolean operators AND, OR, and NOT to restrict and widen the results of a search. The search engine performs the following steps:
- The search engine looks up the keyword in the index of its predefined database instead of going directly to the web to search for it.
- It then uses software to search for the information in the database. This software component is known as the web crawler.
- Once the web crawler finds the pages, the search engine shows the relevant web pages as results. Each retrieved result generally includes the page title, the size of the text portion, the first several sentences, and so on.
- The user can click any of the search results to open it.
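The Boolean operators mentioned above can be sketched as set operations over an index. This is a minimal illustration, assuming a hypothetical pre-built `INDEX` that maps each term to the set of document IDs containing it; the helper names `and_`, `or_`, and `not_` are invented for this example.

```python
# Hypothetical pre-built index: term -> set of IDs of documents containing it.
INDEX = {
    "search": {1, 2, 4},
    "engine": {1, 4},
    "image": {2, 3},
    "crawler": {3, 4},
}
ALL_DOCS = {1, 2, 3, 4}

def docs(term):
    """Look up a term in the index; unknown terms match no documents."""
    return INDEX.get(term.lower(), set())

def and_(a, b):
    return a & b            # restricts: documents must match both sides

def or_(a, b):
    return a | b            # widens: documents may match either side

def not_(a):
    return ALL_DOCS - a     # excludes: every document except those in a

narrow = and_(docs("search"), docs("engine"))
wide = or_(docs("search"), docs("image"))
excluded = and_(docs("search"), not_(docs("crawler")))

print(sorted(narrow))    # [1, 4]
print(sorted(wide))      # [1, 2, 3, 4]
print(sorted(excluded))  # [1, 2]
```

Because the lookup goes to the index rather than to the web, each query is answered with a few set operations.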
Architecture
The search engine architecture comprises the three basic layers listed below:
- Content collection and refinement
- Search core
- User and application interfaces
Search Engine Processing
Indexing Process
The indexing process comprises the following three tasks:
- Text acquisition: It identifies and stores documents for indexing.
- Text transformation: It transforms documents into index terms or features.
- Index creation: It takes the index terms created by text transformation and creates data structures to support fast searching.
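The three indexing tasks can be sketched as a small pipeline that ends in an inverted index. This is only an illustrative sketch: the sample documents, the `STOPWORDS` set, and the function names `acquire`, `transform`, and `build_index` are assumptions, and real engines apply much richer transformations (stemming, link analysis, and so on).

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "the", "to", "of", "and", "is"}  # illustrative, not exhaustive

def acquire():
    """Text acquisition: return (doc_id, raw text) pairs. Hard-coded for the sketch."""
    return [
        (1, "The crawler traverses the Web."),
        (2, "A search engine ranks web pages."),
        (3, "The index supports fast searching of pages."),
    ]

def transform(text):
    """Text transformation: lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_index(documents):
    """Index creation: map each index term to a sorted list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in documents:
        for term in transform(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_index(acquire())
print(index["web"])    # [1, 2]
print(index["pages"])  # [2, 3]
```

Looking up a term is then a single dictionary access, which is the kind of fast search the index creation step is meant to support.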
Query Process
The query process comprises the following three tasks:
- User interaction: It supports the creation and refinement of the user query and displays the results.
- Ranking: It uses the query and the indexes to create a ranked list of documents.
- Evaluation: It monitors and measures effectiveness and efficiency. It is done offline.
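To make ranking and evaluation concrete, the sketch below scores documents by how often the query terms occur in them and then measures precision and recall against a hand-labelled relevant set. The term-frequency scoring, the `DOCS` collection, and the relevance labels are all assumptions chosen for illustration; the source does not say which ranking or evaluation measures a given engine uses.

```python
from collections import Counter

# Hypothetical document collection: doc ID -> text.
DOCS = {
    "d1": "search engine index search",
    "d2": "web crawler gathers pages",
    "d3": "search interface for the web",
}

def rank(query, docs):
    """Score documents by summed query-term frequency; best matches first."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:                      # keep only documents that match
            scores[doc_id] = score
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))

def precision_recall(retrieved, relevant):
    """Offline evaluation: compare retrieved results with known relevant docs."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

results = rank("search web", DOCS)
print(results)  # ['d1', 'd3', 'd2']

# Assumed relevance judgments for this query, labelled by hand.
precision, recall = precision_recall(results, {"d1", "d3"})
```

Here two of the three retrieved documents are relevant and both relevant documents are retrieved, so precision is 2/3 and recall is 1.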
Examples
The following are some of the search engines available today:
| Search Engine | Description |
| --- | --- |
| Google | It was originally called BackRub. It is the most popular search engine globally. |
| Bing | It was launched in 2009 by Microsoft. It also delivers Yahoo's search results. |
| Ask | It was launched in 1996 and was originally known as Ask Jeeves. It includes support for math, dictionary, and conversion questions. |
| AltaVista | It was launched by Digital Equipment Corporation in 1995. Since 2003, it has been powered by Yahoo technology. |
| AOL.Search | It is powered by Google. |
| LYCOS | It is a top 5 internet portal and the 13th largest online property according to Media Metrix. |
| Alexa | It is a subsidiary of Amazon and is used for providing website traffic information. |