
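Python web-scraping regex exercises with code answers
Regex exercises
1 Given the string:
info = '<a href="http://www.baidu.com">baidu</a>'
use the re module to extract the URL "http://www.baidu.com" and the link text "baidu".
2 Given the string "one1two2three3four4", use a regex to output "1234".
3 Given the string: text = "JGood is a handsome boy, he is cool, clever, and so on..."
find all words containing 'oo'.
Regex exercise answers: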
import re
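# 1 Given the string:
# info = '<a href="http://www.baidu.com">baidu</a>'
# use the re module to extract the URL "http://www.baidu.com" and the link text "baidu".
info = '<a href="http://www.baidu.com">baidu</a>'
# Earlier attempts, kept for comparison:
# pattern1 = re.compile(r'http:.+.com')          # ['http://www.baidu.com'] (URL only; the dots are unescaped)
# pattern1 = re.compile(r"[a-z.]*baidu[.a-z]*")  # ['www.baidu.com', 'baidu'] (drops "http://")
# pattern1 = re.compile(r"[w.]*baidu\.*\w*")     # ['www.baidu.com', 'baidu'] (drops "http://")
# Capture the href value and the link text as two groups.
pattern1 = re.compile(r'<a href="(.*?)">(.*?)</a>')
f1 = pattern1.findall(info)
print(f1)  # [('http://www.baidu.com', 'baidu')]
# print(f1[0])  # ('http://www.baidu.com', 'baidu')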
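# 2 Given the string "one1two2three3four4", use a regex to output "1234".
info1 = "one1two2three3four4"
pattern2 = re.compile(r'\d')  # \d already matches exactly one digit; {1} is redundant
f2 = pattern2.findall(info1)  # ['1', '2', '3', '4']
print(''.join(f2))  # 1234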
# 3 Given the string: text = "JGood is a handsome boy, he is cool, clever, and so on..."
# find all words containing 'oo'.
info3 = "JGood is a handsome boy, he is cool, clever, and so on..."
pattern3 = re.compile(r'\w*oo\w*')  # word characters, then 'oo', then more word characters
f3 = pattern3.findall(info3)
print(f3)  # ['JGood', 'cool']