html.rar_C# HTML parsing_chtml parsing_html C# resource - CSDN Library


Uploaded 2022-09-22 20:50:15, 19KB RAR



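When building a search engine, we need to search the HTML content of web pages, which inevitably means parsing the HTML: splitting out each node and extracting the content between nodes. This article introduces two ways to parse HTML in C#.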
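The idea of splitting a page into nodes and reading the content between them can be sketched with a very small tokenizer. This is only an illustration, not one of the two methods described in this article: `NodeSplitter` and `Split` are hypothetical names, and the pattern assumes simple, well-formed markup with no `<` inside text, comments, or scripts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NodeSplitter
{
    // Hypothetical helper: splits markup into tag tokens and text tokens.
    // Assumes simple, well-formed input (no '<' inside text, comments, or scripts).
    public static List<string> Split(string html)
    {
        var tokens = new List<string>();
        // Either a whole tag "<...>" or a run of characters up to the next '<'.
        foreach (Match m in Regex.Matches(html, @"<[^>]+>|[^<]+"))
        {
            string token = m.Value.Trim();
            if (token.Length > 0)
            {
                tokens.Add(token);
            }
        }
        return tokens;
    }

    public static void Main()
    {
        string html = "<html><body><p>Hello</p> <a href=\"/docs\">Docs</a></body></html>";
        foreach (string token in Split(html))
        {
            // Tokens that start with '<' are tags; everything else is text between nodes.
            string kind = token.StartsWith("<") ? "tag " : "text";
            Console.WriteLine(kind + ": " + token);
        }
    }
}
```

Whitespace-only text between tags is dropped here to keep the output short; a real indexer might want to keep it.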

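The first way to parse HTML in C#:

Use System.Net.WebClient to download the web page into a local file or a string, then analyze it with regular expressions. This approach suits applications such as a web crawler that need to analyze many web pages.

It is probably the most direct approach, and the first one most people think of.

An example taken from the web, which extracts every href: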

    using System;
    using System.Net;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace HttpGet
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                // Download the page as raw bytes and decode it as UTF-8.
                WebClient client = new WebClient();
                byte[] page = client.DownloadData("http://www.google.com");
                string content = Encoding.UTF8.GetString(page);

                // Verbatim string, so backslashes reach the regex engine unchanged;
                // "" is an escaped double quote inside a verbatim string.
                string regex = @"href=[""'](http://|\./|/)?\w+(\.\w+)*(/\w+(\.\w+)?)*(/|\?\w*=\w*(&\w*=\w*)*)?[""']";
                Regex re = new Regex(regex);
                MatchCollection matches = re.Matches(content);

                // Print each matched href attribute on its own line.
                foreach (Match match in matches)
                {
                    Console.WriteLine(match.Value);
                }
            }
        }
    }

Some crawlers use a similar approach for their HTML parsing.
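The listing above prints the whole matched attribute, quotes and `href=` included. As a variation, here is a minimal sketch that captures only the URL and runs on an in-memory string, so it can be tried without a network connection. The pattern is deliberately looser than the one in the listing (it accepts any quoted value), and `HrefExtractor`, `Extract`, and the sample markup are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Group 1 remembers the opening quote; \1 requires the same quote to close the value.
    // The named group "url" holds the attribute value itself.
    private static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the value of every quoted href attribute in the markup.
    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }

    public static void Main()
    {
        string html = "<a href=\"http://example.com/a\">A</a> <A HREF='/b?x=1'>B</A>";
        foreach (string url in Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Unquoted attribute values and hrefs that span several lines are not handled; those cases are one reason to prefer a real HTML parser.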

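The second way to parse HTML in C#:

Use Winista.Htmlparser.Net to parse the HTML. It is an open-source HTML parser for the .NET platform. The source code can be downloaded online and is easy to find with a search engine, so it is not provided here. It also comes with English help documentation. If you cannot find it, leave your email address.

In my opinion this is a good solution for parsing HTML on the .NET platform, and it covers most HTML parsing needs.

I wrote an example of my own: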


JonSco
