Original Foreign-Language Text
WSCE: A Crawler Engine for Large-Scale Discovery of Web Services
Eyhab Al-Masri and Qusay H. Mahmoud
Abstract
This paper addresses the efficient access to and discovery of Web services across
multiple UDDI Business Registries (UBRs). Exploring Web services across multiple
UBRs is becoming a challenge, particularly as the size and magnitude of these
registries increase. As Web services proliferate, finding an appropriate Web service
across one or more service registries using existing registry APIs (i.e., UDDI APIs)
raises a number of concerns, such as performance, efficiency, end-to-end reliability,
and, most importantly, the quality of returned results. Clients should not have to
search accessible UBRs endlessly to find appropriate Web services, particularly when
operating via mobile devices. Finding relevant Web services should be time-effective
and highly productive. To enhance the efficiency of searching for businesses and Web
services across multiple UBRs, we propose a novel exploration engine, the Web Service
Crawler Engine (WSCE). WSCE is capable of crawling multiple UBRs and enables the
establishment of a centralized Web services repository that can be used for
large-scale discovery of Web services. The paper presents experimental validation,
results, and analysis of the presented ideas.
1. Introduction
The continuous growth and propagation of the Internet have been among the main
causes of information overload, which in many instances deters quick and easy
discovery of information. Web services are Internet-based, modular applications, and
the automatic discovery and composition of Web services is an emerging technology of
choice for building understandable applications used for business-to-business
integration; it is of immense interest to governments, businesses, and individuals.
As Web services proliferate, the same dilemma encountered in the discovery of Web
pages will become tangible, and searching for specific business applications or Web
services will become challenging and time-consuming, particularly as the number of
UDDI Business Registries (UBRs) begins to multiply.
In addition, the decentralization of UBRs adds another level of complexity to
finding Web services effectively within these distributed registries.
Decentralization of UBRs is becoming tangible as new operating systems,
applications, and APIs come equipped with built-in functionality and tools that
enable organizations or businesses to publish their own internal UBRs for intranet
and extranet use, such as the Enterprise UDDI Services in Windows Server 2003,
WebSphere Application Server, Systinet Business Registry, and jUDDI, to name a few.
Enabling businesses or organizations to operate and manage their own UBRs is likely
to increase the number of business registries significantly; clients will therefore
soon face the challenge of finding Web services across hundreds, if not thousands, of
UBRs.
At the heart of the Service-Oriented Architecture (SOA) is a service registry,
which mediates between service providers and clients, as shown in Figure 1.
Service registries extend the concept of an application-centric Web by allowing
clients (or conceivably applications) to access a wide range of Web services that
match specific search criteria in an autonomous manner.
Without publishing Web services through registries, clients will not be able to locate
services efficiently, and service providers will have to devote extra effort to
advertising their services through other channels. Several companies offer
Web-based Web service directories, such as WebServiceList [1], RemoteMethods [2],
WSIndex [3], and XMethods.net [4]. However, because these Web-based service
directories do not adhere to Web services standards such as UDDI, they are likely to
become unreliable sources for finding relevant Web services and may become
disconnected from the Web services environment, as in the cases of BindingPoint and
SalCentral, which closed their Web-based Web service directories after many years of
exposure.
Apart from Web-based service directories, there have been numerous efforts to
improve the discovery of Web services [5,6,9,21]; however, many of them have failed
to address the issue of handling discovery operations across multiple UBRs. Because
UBRs are hosted on Web servers, they depend on network traffic and performance, and
clients looking for appropriate Web services are therefore susceptible to
performance issues when carrying out multiple UBR search requests. To address the
above-mentioned issues, this work introduces a framework that serves as the heart of
our Web Services Repository Builder (WSRB) architecture [7] by enhancing the
discovery of Web services without requiring any modifications to existing standards.
In this paper, we propose the Web Service Crawler Engine (WSCE), which actively crawls
accessible UBRs and collects business and Web service information. Our architecture
enables businesses and organizations to maintain autonomous control over their UBRs
while allowing clients to perform search queries adapted to large-scale discovery of
Web services. Our solution has been tested, and the results show high performance
rates compared with other existing models.
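
The crawl-and-collect idea described above can be illustrated with a minimal
sketch. It is not the WSCE implementation: the record fields, the class and
function names, and the use of plain callables in place of real UDDI inquiry
calls are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceEntry:
    """One collected Web service record (hypothetical schema)."""
    registry: str     # name of the UBR the entry was collected from
    business: str     # business that published the service
    name: str         # service name
    description: str  # short textual description


@dataclass
class CentralRepository:
    """Centralized store that the crawler fills and clients query."""
    entries: dict = field(default_factory=dict)

    def add(self, entry: ServiceEntry) -> None:
        # Key on (registry, business, name) so that re-crawling a registry
        # overwrites an entry instead of duplicating it.
        self.entries[(entry.registry, entry.business, entry.name)] = entry

    def search(self, keyword: str) -> list:
        kw = keyword.lower()
        return [e for e in self.entries.values()
                if kw in e.name.lower() or kw in e.description.lower()]


def crawl(registries: dict, repo: CentralRepository) -> int:
    """Visit every registry and copy its listings into the repository.

    `registries` maps a registry name to a callable that returns
    (business, service name, description) tuples. The callable is a
    stand-in for a real registry inquiry and is purely hypothetical.
    Returns the number of listings processed.
    """
    processed = 0
    for registry_name, fetch in registries.items():
        try:
            listings = fetch()
        except ConnectionError:
            continue  # unreachable registry: skip it and keep crawling
        for business, name, description in listings:
            repo.add(ServiceEntry(registry_name, business, name, description))
            processed += 1
    return processed
```

In this sketch a client issues one query against the repository instead of one
query per registry, and each registry stays under its owner's control because
the crawler only reads from it.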
The remainder of this paper is organized as follows. Section two discusses related
work. Section three discusses some of the limitations of existing UBRs. Section
four discusses the motivations for WSCE. Section five presents the architecture of
our Web service crawler engine. Experiments and results are discussed in Section six,
and conclusions and future work are discussed in Section seven.
2. Related Work
Discovery of Web services is a fundamental area of research in ubiquitous computing.
Many researchers have focused on discovering Web services through a centralized
UDDI registry [8,9,10]. Although centralized registries can provide effective methods
for the discovery of Web services, they suffer from problems associated with
centralized systems, such as single points of failure and bottlenecks. In addition,
other issues, including the scalability of data replication, notifying all
subscribers when performing system upgrades, and handling versioning of services from
the same provider, have driven researchers to seek alternatives.
Other approaches focused on grouping multiple public/private registries into
registry federations [6,12], such as METEOR-S, to enhance the discovery process.
METEOR-S provides a discovery mechanism for publishing Web services over
federated registries, but this solution does not provide the means for articulating
advanced search techniques, which are essential for locating appropriate business
applications. In addition, federated registry environments can lead to inconsistent
policies being employed, which significantly affects the practicability of
conducting inquiries across them. Furthermore, federated registry environments incur
increased configuration overhead, additional processing time, and poor execution-time
performance when performing service discovery operations. A desirable solution
would be a Web services crawler engine such as WSCE that can facilitate the
aggregation of Web service references, resources, and description documents, and can
provide clients with a standard, universal access point for discovering Web services
distributed across multiple registries.
Several approaches focused on applying traditional Information Retrieval (IR)
techniques or using keyword-based matching [13,14], which primarily depend on
analyzing the frequency of terms. Other attempts focused on schema matching [15,16],
which tries to understand the meanings of the schemas and to suggest trends or
patterns. Other approaches studied the use of supervised classification and
unsupervised clustering of Web services [17], artificial neural networks [18], or
unsupervised matching at the operation level [19].
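
As a rough illustration of keyword-based matching that depends on term
frequency, the sketch below scores each service description by how often the
query terms occur in it. It does not reproduce the method of any cited work;
the function names and the scoring formula (query-term occurrences divided by
description length) are assumptions chosen for simplicity.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_score(query: str, description: str) -> float:
    """Relative frequency of the query terms within the description."""
    tokens = tokenize(description)
    if not tokens:
        return 0.0  # an empty description can never match
    counts = Counter(tokens)
    # Count each distinct query term once, however often it is repeated
    # in the query itself.
    hits = sum(counts[term] for term in set(tokenize(query)))
    return hits / len(tokens)


def rank(query: str, services: dict) -> list:
    """Return names of matching services, best score first.

    `services` maps a service name to its textual description.
    Services that score zero are dropped; ties are broken by name.
    """
    scored = [(tf_score(query, desc), name) for name, desc in services.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]
```

Because the score is computed only from the words present in the description,
a service whose description does not happen to contain the query terms is never
returned, however relevant it may be.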
Other approaches focused on a peer-to-peer framework architecture for service
discovery and ranking [20], a conceptual model based on Web service reputation [21],
and a keyword-based search engine for querying Web services [22]. However, many of
these approaches provide a very limited set of search methods (e.g., search by
business name or business location) and attempt to apply traditional IR techniques
that may not be suitable for service discovery, since Web services often provide only
a very brief textual description of what they offer. In addition, the structure of
Web services is complex, and only a small portion of text is often provided.
WSCE enhances the process of discovering Web services by providing advanced
search capabilities for locating proper business applications across one or more UDDI
registries and any other searchable repositories. In addition, WSCE allows for a
high-performance and reliable discovery mechanism, whereas current approaches depend
mainly on external resources, which in turn can significantly impair their ability to
provide accurate and meaningful results. Furthermore, current techniques do not take
into consideration the ability to predict, detect, or recover from failures at the
Web service host, or to keep track of dynamic updates or service changes.
3. UDDI Business Registries (UBRs)
Business registries provide the foundation for the cataloging and classification of
Web services and other additional components. A UDDI Business Registry (UBR)
serves as a service directory for publishing technical information about Web
services [23]. UDDI is an initiative originally backed by several technology
companies, including Microsoft, IBM, and Ariba [24], and aims to provide a focal
point where all businesses, including their Web services, come together in an open
and platform-independent framework. Hundreds of other companies have endorsed the
UDDI initiative, including HP, Intel, Fujitsu, BEA, Oracle, SAP, Nortel Networks,
WebMethods, Andersen Consulting, and Sun Microsystems, to name a few. E-Business
XML (ebXML) is another service registry standard that focuses more on
collaboration between businesses [27]. Although commonalities between UDDI and
ebXML registries present opportunities for interoperability between them [26],
UDDI remains the de facto industry standard for Web service discovery [21].
Although UDDI provides ways of locating businesses and of interfacing with them
electronically, it is limited to a single search criterion. The keyword-based search
techniques offered by UDDI make it impractical to assume that it can be very useful
for Web service discovery or composition. In addition, a client should not have to
search UBRs endlessly to find an appropriate Web service. As Web services
proliferate and the number of UBRs increases, limited search capabilities are likely
to yield less meaningful search results, which makes performing search queries across
one or multiple UBRs very time-consuming and less productive.
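
To make the contrast with single-criterion search concrete, the sketch below
combines several criteria in one query over locally held records. It does not
use the UDDI API; the record fields, the function name, and the matching rules
are hypothetical and serve only to show what a multi-criteria query could look
like once service information is available in one place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A locally stored service record (hypothetical fields)."""
    business: str
    service: str
    location: str
    description: str


def find_services(records: list,
                  business: Optional[str] = None,
                  location: Optional[str] = None,
                  keyword: Optional[str] = None) -> list:
    """Return the records that satisfy every supplied criterion.

    Criteria left as None are ignored, so a call with one argument
    behaves like a single-criterion search, while a call with several
    arguments narrows the result with a logical AND.
    """
    def matches(r: Record) -> bool:
        if business is not None and business.lower() not in r.business.lower():
            return False  # business name must contain the given text
        if location is not None and location.lower() != r.location.lower():
            return False  # location must match exactly, ignoring case
        text = (r.service + " " + r.description).lower()
        if keyword is not None and keyword.lower() not in text:
            return False  # keyword must occur in the name or description
        return True

    return [r for r in records if matches(r)]
```

With a single criterion the client must filter the remaining results itself;
combining criteria in one call moves that filtering to the place where the data
is held.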
3.1. Limitations with Current UDDI
Apart from the limited search capabilities offered by UDDI, the existing UDDI
standard has other major limitations and shortcomings. These include: (1) UDDI was
intended to be used only for Web service discovery; (2) UDDI registration is
voluntary, and it therefore risks becoming passive; (3) UDDI does not provide any
guarantees of the validity and quality of the information it contains; (4) UDDI is
disconnected from the current Web; (5) UDDI is incapable of providing Quality of
Service (QoS) measurements for registered Web services, which could provide helpful
information to clients when choosing appropriate Web services; (6) UDDI does not
clearly define how service providers can advertise pricing models; and (7) UDDI
neither maintains nor provides any Web service life-cycle information (i.e., Web
services across stages).
Other limitations with the current UDDI standard [23] are shown in Table 1.