Representing Web Applications As Knowledge Graphs
Yogesh Chandrasekharuni
Skil Inc, USA
yogesh@skil.ai
Abstract—Traditional methods for crawling and parsing web applications predominantly rely on extracting hyperlinks from initial pages and recursively following linked resources. This approach constructs a graph where nodes represent unstructured data from web pages, and edges signify transitions between them. However, these techniques are limited in capturing the dynamic and interactive behaviors inherent to modern web applications. In contrast, the proposed method models each node as a structured representation of the application’s current state, with edges reflecting user-initiated actions or transitions. This structured representation enables a more comprehensive and functional understanding of web applications, offering valuable insights for downstream tasks such as automated testing and behavior analysis.
I. INTRODUCTION
Web applications require rich data representation for downstream tasks such as automation testing, user behavior analysis, and functional verification. Traditional web parsers operate through a structured yet simplistic algorithm, sketched in code after the list below:
1) Initialize a queue with the starting page.
2) Set a maximum depth (if applicable) and initialize the current depth to zero.
3) While the queue is not empty and the maximum depth is not exceeded:
   a) Dequeue the next page from the queue.
   b) If the page has not been visited:
      i) Navigate to the page.
      ii) Extract the desired data and store it as a node.
      iii) Extract all hyperlinks from the page.
      iv) Add all unseen and unvisited hyperlinks to the queue.
      v) Mark the current page as visited.
   c) Increment the depth if moving to a new level.
4) Stop when all pages are visited or the maximum depth is reached.
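In code, this procedure amounts to a breadth-first traversal. A minimal Python sketch is given below, assuming pages are fetched with the requests library and hyperlinks extracted with Beautiful Soup; the stored node payload (raw page text) is purely illustrative.

```python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def crawl(start_url, max_depth=3):
    """Breadth-first hyperlink crawl: returns {url: unstructured page text}."""
    queue = deque([(start_url, 0)])  # (page, depth)
    visited = set()
    nodes = {}

    while queue:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue

        # Navigate to the page, extract the desired data, and store it as a node.
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        nodes[url] = soup.get_text(" ", strip=True)
        visited.add(url)

        # Extract all hyperlinks and enqueue unseen pages one level deeper.
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if link not in visited:
                queue.append((link, depth + 1))

    return nodes
```

In such a graph every edge is a hyperlink transition and every node holds unstructured text, which is precisely the limitation discussed next.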
While this approach effectively scrapes static web applications, it falls short in handling dynamic applications, where significant portions of the application are unreachable through simple hyperlink navigation. Modern web applications often follow structured user flows, which involve interaction beyond hyperlinks. For instance, in an e-commerce site, reaching the checkout page might require several actions: searching for a product, adding it to the cart, entering a delivery location, and only then accessing the checkout. Traditional parsers, which rely solely on clicking hyperlinks, cannot capture such dynamic flows and are limited in their ability to represent the application’s state accurately.
Additionally, many web applications exhibit variability at the same endpoint depending on the user’s context. For example, a checkout page may display “Ready to purchase” for one user and “Item cannot be delivered to your location” for another, based on the delivery address provided.
In this work, the proposed solution overcomes these limitations by representing each unique state of a web application as a node, with edges defined by specific actions taken within the application. This method captures the full complexity of user flows, allowing for a more accurate and interpretable knowledge representation of web applications.
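A minimal sketch of this representation is given below. The State fields, action labels, and e-commerce URLs are hypothetical placeholders used only to illustrate the node/edge structure, not the exact schema used in this work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    """A structured snapshot of the application at one point in a user flow."""
    url: str
    title: str
    visible_elements: tuple  # e.g. buttons, forms, messages rendered for this user


@dataclass
class WebAppGraph:
    """Knowledge graph: nodes are application states, edges are user actions."""
    edges: list = field(default_factory=list)  # (source_state, action, target_state)

    def add_transition(self, source: State, action: str, target: State) -> None:
        self.edges.append((source, action, target))


# Hypothetical e-commerce flow: the checkout state is reachable only
# through a chain of user actions, never through a plain hyperlink.
home = State("/", "Home", ("search box",))
results = State("/search?q=lamp", "Search results", ("product card",))
cart = State("/cart", "Cart", ("delivery address form",))
checkout = State("/checkout", "Checkout", ("Ready to purchase",))

graph = WebAppGraph()
graph.add_transition(home, "search for a product", results)
graph.add_transition(results, "add product to cart", cart)
graph.add_transition(cart, "enter delivery location", checkout)
```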
II. BACKGROUND
Early web crawlers, such as the World Wide Web Wanderer (1993) [1], were primarily designed to map the size of the web by collecting basic HTML from static websites [2]. As the web expanded, tools like JumpStation emerged, becoming the first search engine to use crawlers for indexing web content [3]. These early systems, however, were limited to handling static web content, as dynamic web pages driven by JavaScript and AJAX had not yet become widespread.
The emergence of dynamic content significantly complicated the process of web scraping for traditional parsers. Frameworks such as Beautiful Soup (2004) were introduced to facilitate the extraction of structured data from increasingly complex web pages. Although effective for parsing static HTML content, these tools were inherently limited in their capacity to handle dynamic, JavaScript-driven web elements or to interact with user-initiated events. As modern web applications began to rely heavily on dynamic content loading and client-side interactions, more advanced methodologies became necessary to accurately capture these behaviors. Several tools have been developed to address these challenges. Selenium [4] is widely used for automating browser interactions, allowing developers to simulate user actions such as clicking, typing, and submitting forms.
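As an illustration, a minimal Selenium sketch of such simulated interaction might look like the following; the URL and element locators are hypothetical placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://shop.example.com")  # placeholder URL

# Type a query into a search field and submit the enclosing form.
search_box = driver.find_element(By.NAME, "q")  # hypothetical locator
search_box.send_keys("table lamp")
search_box.submit()

# Click a dynamically rendered element that a hyperlink crawler never sees.
driver.find_element(By.CSS_SELECTOR, "button.add-to-cart").click()

driver.quit()
```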
To address these limitations, visual web scraping tools like Octoparse [5] emerged, offering user-friendly interfaces that allowed non-programmers to automate the extraction of both static and dynamic website data. These tools simulate user behavior, such as clicks and form submissions, to capture data. However, tools like Octoparse lack self-exploration capabilities and are unable to reason through or autonomously navigate