WSCE: A Crawler Engine for Large-Scale Discovery of Web Services

Eyhab Al-Masri and Qusay H. Mahmoud

Abstract

This paper addresses issues relating to the efficient access and discovery of Web

services across multiple UDDI Business Registries (UBRs). The ability to explore

Web services across multiple UBRs is becoming a challenge, particularly as the size and

magnitude of these registries increase. As Web services proliferate, finding an

appropriate Web service across one or more service registries using existing registry

APIs (i.e. UDDI APIs) raises a number of concerns such as performance, efficiency,

end-to-end reliability, and most importantly the quality of returned results. Clients should not

have to endlessly search accessible UBRs to find appropriate Web services,

particularly when operating via mobile devices. Finding relevant Web services should

be time effective and highly productive. In an attempt to enhance the efficiency of

searching for businesses and Web services across multiple UBRs, we propose a novel

exploration engine, the Web Service Crawler Engine (WSCE). WSCE is capable of

crawling multiple UBRs and enables the establishment of a centralized Web

services’ repository which can be used for large-scale discovery of Web services. The

paper presents experimental validation, results, and analysis of the presented ideas.

1. Introduction

The continuous growth and propagation of the internet have been some of the

main factors behind information overload, which in many instances acts as a deterrent to

quick and easy discovery of information. Web services are internet-based, modular

applications, and the automatic discovery and composition of Web services are an

emerging technology of choice for building understandable applications used for

business-to-business integration and are of an immense interest to governments,

businesses, as well as individuals. As Web services proliferate, the same dilemma

perceived in the discovery of Web pages will become tangible and the searching for

specific business applications or Web services becomes challenging and time

consuming particularly as the number of UDDI Business Registries (UBRs) begins to

multiply.

In addition, decentralizing UBRs adds another level of complexity on how to

effectively find Web services within these distributed registries. Decentralization of

UBRs is becoming tangible as new operating systems, applications, and APIs are

already equipped with built-in functionalities and tools that enable organizations or

businesses to publish their own internal UBRs for intranet and extranet use such as the

Enterprise UDDI Services in Windows Server 2003, WebSphere Application Server,

Systinet Business Registry, jUDDI, to name a few. Enabling businesses or

organizations to self-operate and manage their own UBRs will maximize the likelihood

of having a significant increase in the number of business registries and therefore,

clients will soon face the challenge of finding Web services across hundreds, if not

thousands of UBRs.

At the heart of the Service Oriented Architecture (SOA) is a service registry

which connects and mediates service providers with clients as shown in Figure 1.

Service registries extend the concept of an application-centric Web by allowing

clients (or conceivably applications) to access a wide range of Web services that

match specific search criteria in an autonomous manner.

Without publishing Web services through registries, clients will not be able to locate

services in an efficient manner, and service providers will have to devote extra

efforts in advertising their services through other channels. There are several

companies that offer Web-based Web service directories such as WebServiceList [1],

RemoteMethods [2], WSIndex [3], and XMethods.net [4]. However, due to the fact

that these Web-based service directories fail to adhere to Web services’ standards

such as UDDI, it is likely that they become vulnerable to being unreliable sources

for finding relevant Web services, and may become disconnected from the Web

services environment as in the cases of BindingPoint and SalCentral which closed

their Web-based Web service directories after many years of exposure.

Apart from having Web-based service directories, there have been numerous

efforts that attempted to improve the discovery of Web services [5,6,9,21]; however,

many of them have failed to address the issue of handling discovery operations across

multiple UBRs. Due to the fact that UBRs are hosted on Web servers, they are

dependent on network traffic and performance, and therefore, clients that are looking

for appropriate Web services are susceptible to performance issues

when carrying out multiple UBR search requests. To address the above-mentioned

issues, this work introduces a framework that serves as the heart of our Web Services

Repository Builder (WSRB) architecture [7] by enhancing the discovery of Web

services without requiring any modifications to existing standards. In this paper, we

propose the Web Service Crawler Engine (WSCE) which actively crawls accessible

UBRs and collects business and Web service information. Our architecture enables

businesses and organizations to maintain autonomous control over their UBRs while

allowing clients to perform search queries adapted to large-scale discovery of Web

services. Our solution has been tested, and the results show high performance rates

when compared with other existing models.
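
To make the crawl-and-aggregate idea concrete, the following Python sketch shows one way several registries could be folded into a single searchable repository. It is an illustration under stated assumptions, not the WSCE implementation: the names ServiceRecord, Repository, crawl, and the fetcher functions are hypothetical, and each registry is modeled as an in-memory function rather than a real UDDI inquiry call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ServiceRecord:
    # Hypothetical, simplified stand-in for a registry's service entry.
    business: str
    name: str
    wsdl_url: str


@dataclass
class Repository:
    # Central store keyed by WSDL URL; also remembers which registries listed it.
    records: Dict[str, ServiceRecord] = field(default_factory=dict)
    sources: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, registry: str, record: ServiceRecord) -> None:
        # Keep the first copy of a service, but note every registry that lists it.
        self.records.setdefault(record.wsdl_url, record)
        self.sources.setdefault(record.wsdl_url, []).append(registry)

    def search(self, keyword: str) -> List[ServiceRecord]:
        needle = keyword.lower()
        return [r for r in self.records.values()
                if needle in r.name.lower() or needle in r.business.lower()]


# Assumption: a fetcher returns every record one registry exposes.
Fetcher = Callable[[], Iterable[ServiceRecord]]


def crawl(registries: Dict[str, Fetcher]) -> Repository:
    repo = Repository()
    for registry_name, fetch in registries.items():
        try:
            for record in fetch():
                repo.add(registry_name, record)
        except ConnectionError:
            # An unreachable registry should not abort the whole crawl.
            continue
    return repo


# Example data (invented for illustration only).
def ubr_a() -> List[ServiceRecord]:
    return [ServiceRecord("Acme", "StockQuote", "http://a.example/stock?wsdl")]


def ubr_b() -> List[ServiceRecord]:
    return [
        ServiceRecord("Acme", "StockQuote", "http://a.example/stock?wsdl"),
        ServiceRecord("Globex", "WeatherReport", "http://g.example/weather?wsdl"),
    ]


def ubr_down() -> List[ServiceRecord]:
    raise ConnectionError("registry unreachable")


repo = crawl({"ubr-a": ubr_a, "ubr-b": ubr_b, "ubr-down": ubr_down})
print([r.name for r in repo.search("weather")])
```

In this sketch a client issues one query against the repository instead of one request per registry, and a registry that is down costs the crawler a skipped entry rather than costing the client a failed search.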

The remainder of this paper is organized as follows. Section two discusses related

work. Section three discusses some of the limitations with existing UBRs. Section

four discusses the motivations for WSCE. Section five presents our Web service

crawler engine’s architecture. Experiments and results are discussed in Section six,

and finally, conclusions and future work are discussed in Section seven.

2. Related Work

Discovery of Web services is a fundamental area of research in ubiquitous computing.

Many researchers have focused on discovering Web services through a centralized

UDDI registry [8,9,10]. Although centralized registries can provide effective methods

for the discovery of Web services, they suffer from problems associated with having

centralized systems, such as a single point of failure and bottlenecks. In addition, other

issues relating to the scalability of data replication, providing notifications to all

subscribers when performing any system upgrades, and handling versioning of

services from the same provider have driven researchers to find other alternatives.

Other approaches focused on having multiple public/private registries grouped into

registry federations [6,12] such as METEOR-S for enhancing the discovery process.

METEOR-S provides a discovery mechanism for publishing Web services over

federated registries but this solution does not provide the means for articulating

advanced search techniques which are essential for locating appropriate business

applications. In addition, having federated registry environments can potentially

allow inconsistent policies to be employed, which will have a significant impact on

the practicability of conducting inquiries across them. Furthermore, federated

registry environments will have increased configuration overhead, additional

processing time, and poor performance in terms of execution time when performing

service discovery operations. A desirable solution would be a Web services’ crawler

engine such as WSCE that can facilitate the aggregation of Web service references,

resources, and description documents, and can provide clients with a standard,

universal access point for discovering Web services distributed across multiple

registries.

Several approaches focused on applying traditional Information Retrieval (IR)

techniques or using keyword-based matching [13,14] which primarily depend on

analyzing the frequency of terms. Other attempts focused on schema matching [15,16]

which tries to understand the meanings of the schemas and suggest any trends or

patterns. Other approaches studied the use of supervised classification and

unsupervised clustering of Web services [17], artificial neural networks [18], or using

unsupervised matching at the operation level [19].
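
As a rough illustration of frequency-based keyword matching, the Python sketch below scores each service description by the share of its words that match the query. It is a deliberately minimal stand-in and not the method of any cited work; the function names and the sample descriptions are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_score(query: str, description: str) -> float:
    # Fraction of description tokens that are query terms (plain term frequency).
    terms = tokenize(description)
    if not terms:
        return 0.0
    counts = Counter(terms)
    return sum(counts[t] for t in tokenize(query)) / len(terms)


def rank(query: str, descriptions: Dict[str, str]) -> List[Tuple[str, float]]:
    scored = [(name, tf_score(query, text)) for name, text in descriptions.items()]
    # Highest score first; the sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented descriptions, kept short on purpose.
descriptions = {
    "QuoteService": "Returns stock quote",
    "TickerFeed": "Live equity prices",
    "WeatherReport": "Weather forecast by city",
}

ranked = rank("stock quote", descriptions)
print(ranked[0][0])
```

With descriptions this short, a service such as the hypothetical TickerFeed receives a score of zero even though it is plausibly relevant, because none of its few words literally match the query.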

Other approaches focused on the peer-to-peer framework architecture for service

discovery and ranking [20], providing a conceptual model based on Web service

reputation [21], and providing a keyword-based search engine for querying Web

services [22]. However, many of these approaches provide a very limited set of search

methods (e.g., search by business name or business location) and attempt to apply

traditional IR techniques that may not be suitable for services’ discovery since Web

services often contain or provide very brief textual descriptions of what they offer. In

addition, the Web services’ structure is complex and only a small portion of text is

often provided.

WSCE enhances the process of discovering Web services by providing advanced

search capabilities for locating proper business applications across one or more UDDI

registries and any other searchable repositories. In addition, WSCE provides a

high-performance and reliable discovery mechanism, while current approaches are mainly

dependent on external resources which in turn can significantly impact the ability to

provide accurate and meaningful results. Furthermore, current techniques do not take

into consideration the ability to predict, detect, or recover from failures at the Web

service host, or keep track of any dynamic updates or service changes.

3. UDDI Business Registries (UBRs)

Business registries provide the foundation for the cataloging and classification of

Web services and other additional components. A UDDI Business Registry (UBR)

serves as a service directory for the publishing of technical information about Web

services [23]. UDDI is an initiative originally backed by several technology

companies including Microsoft, IBM, and Ariba [24] and aims at providing a focal

point where all businesses, including their Web services, meet in an open and

platform-independent framework. Hundreds of other companies have endorsed the

UDDI initiative including HP, Intel, Fujitsu, BEA, Oracle, SAP, Nortel Networks,

WebMethods, Andersen Consulting, Sun Microsystems, to name a few. E-Business

XML (ebXML) is another service registry standard that focuses more on the

collaboration between businesses [27]. Although commonalities between UDDI and

ebXML registries present opportunities for interoperability between them [26], the
