## CAPTCHA recognition with machine learning in Python 3
### Directory structure
```
-- iconset
-- ...
-- jpg
-- captcha.gif
-- py
-- crack.py
```
### Required libraries
`pip3 install pillow` or `easy_install Pillow`
### Download the required files
[CAPTCHA recognition with machine learning in Python 3](https://github.com/TTyb/)
> 1. Read the image and print its histogram
```
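#!/usr/bin/python3.4
# -*- coding: utf-8 -*-
# From: https://zhuanlan.zhihu.com/p/24222942
# The Zhihu column was written for Python 2; this version is ported to Python 3.
from PIL import Image

# Open the CAPTCHA and count the pixels for each color value
im = Image.open("../jpg/captcha.gif")
his = im.histogram()
print(his)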
```
Output:
`[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 2, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 3, 1, 3, 3, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 132, 1, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 15, 0, 1, 0, 1, 0, 0, 8, 1, 0, 0, 0, 0, 1, 6, 0, 2, 0, 0, 0, 0, 18, 1, 1, 1, 1, 1, 2, 365, 115, 0, 1, 0, 0, 0, 135, 186, 0, 0, 1, 0, 0, 0, 116, 3, 0, 0, 0, 0, 0, 21, 1, 1, 0, 0, 0, 2, 10, 2, 0, 0, 0, 0, 2, 10, 0, 0, 0, 0, 1, 0, 625]`
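The list has 256 entries, one per color value (0-255); each entry is the number of pixels with that value. The last entry is 625, so value 255 (white) is the most common. Pair each value with its count: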
```
values = {}
for i in range(0, 256):
    values[i] = his[i]
# Sort by count: key=lambda x: x[1] sorts on the second field of each pair; x[0] would sort on the first
temp = sorted(values.items(), key=lambda x: x[1], reverse=True)
# print(temp)
```
Output:
`[(255, 625), (212, 365), (220, 186), (219, 135), (169, 132), (227, 116), (213, 115), (234, 21), (205, 18), (184, 15), (241, 10), (248, 10), (191, 8), (198, 6), (155, 3), (157, 3), (158, 3), (167, 3), (228, 3), (56, 2), (67, 2), (91, 2), (96, 2), (109, 2), (122, 2), (127, 2), (134, 2), (140, 2), (168, 2), (176, 2), (200, 2), (211, 2), (240, 2), (242, 2), (247, 2), (43, 1), (44, 1), (53, 1), (61, 1), (68, 1), (79, 1), (84, 1), (92, 1), (101, 1), (103, 1), (104, 1), (107, 1), (121, 1), (126, 1), (129, 1), (132, 1), (137, 1), (149, 1), (151, 1), (153, 1), (156, 1), (165, 1), (170, 1), (171, 1), (175, 1), (186, 1), (188, 1), (192, 1), (197, 1), (206, 1), (207, 1), (208, 1), (209, 1), (210, 1), (215, 1), (223, 1), (235, 1), (236, 1), (253, 1), (0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (10, 0), (11, 0), (12, 0), (13, 0), (14, 0), (15, 0), (16, 0), (17, 0), (18, 0), (19, 0), (20, 0), (21, 0), (22, 0), (23, 0), (24, 0), (25, 0), (26, 0), (27, 0), (28, 0), (29, 0), (30, 0), (31, 0), (32, 0), (33, 0), (34, 0), (35, 0), (36, 0), (37, 0), (38, 0), (39, 0), (40, 0), (41, 0), (42, 0), (45, 0), (46, 0), (47, 0), (48, 0), (49, 0), (50, 0), (51, 0), (52, 0), (54, 0), (55, 0), (57, 0), (58, 0), (59, 0), (60, 0), (62, 0), (63, 0), (64, 0), (65, 0), (66, 0), (69, 0), (70, 0), (71, 0), (72, 0), (73, 0), (74, 0), (75, 0), (76, 0), (77, 0), (78, 0), (80, 0), (81, 0), (82, 0), (83, 0), (85, 0), (86, 0), (87, 0), (88, 0), (89, 0), (90, 0), (93, 0), (94, 0), (95, 0), (97, 0), (98, 0), (99, 0), (100, 0), (102, 0), (105, 0), (106, 0), (108, 0), (110, 0), (111, 0), (112, 0), (113, 0), (114, 0), (115, 0), (116, 0), (117, 0), (118, 0), (119, 0), (120, 0), (123, 0), (124, 0), (125, 0), (128, 0), (130, 0), (131, 0), (133, 0), (135, 0), (136, 0), (138, 0), (139, 0), (141, 0), (142, 0), (143, 0), (144, 0), (145, 0), (146, 0), (147, 0), (148, 0), (150, 0), (152, 0), (154, 0), (159, 0), (160, 0), (161, 0), (162, 0), (163, 0), (164, 0), (166, 0), (172, 0), (173, 
0), (174, 0), (177, 0), (178, 0), (179, 0), (180, 0), (181, 0), (182, 0), (183, 0), (185, 0), (187, 0), (189, 0), (190, 0), (193, 0), (194, 0), (195, 0), (196, 0), (199, 0), (201, 0), (202, 0), (203, 0), (204, 0), (214, 0), (216, 0), (217, 0), (218, 0), (221, 0), (222, 0), (224, 0), (225, 0), (226, 0), (229, 0), (230, 0), (231, 0), (232, 0), (233, 0), (237, 0), (238, 0), (239, 0), (243, 0), (244, 0), (245, 0), (246, 0), (249, 0), (250, 0), (251, 0), (252, 0), (254, 0)]`
Pick out the 10 most frequent colors:
```
# The 10 most frequent colors
# for j, k in temp[:10]:
#     print(j, k)
# 255 625
# 212 365
# 220 186
# 219 135
# 169 132
# 227 116
# 213 115
# 234 21
# 205 18
# 184 15
```
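To make the two steps above concrete without needing the CAPTCHA file, here is a minimal sketch that builds the same kind of 256-entry histogram from a plain list of pixel values and then ranks it. The pixel list and the helper names `histogram` and `top_colors` are made up for illustration; with Pillow, `im.histogram()` supplies the counts instead.

```python
def histogram(pixels):
    """Count how often each value 0..255 occurs in a flat list of pixel values."""
    counts = [0] * 256
    for value in pixels:
        counts[value] += 1
    return counts


def top_colors(counts, n=10):
    """Return the n (value, count) pairs with the highest counts."""
    ranked = sorted(enumerate(counts), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical pixel data: mostly white (255), some 212, a few 220.
pixels = [255] * 6 + [212] * 3 + [220] * 2
his = histogram(pixels)
print(len(his))            # 256
print(top_colors(his, 3))  # [(255, 6), (212, 3), (220, 2)]
```

`sorted(enumerate(counts), ...)` yields the same `(value, count)` pairs as building a dict and sorting its `items()`, without the intermediate dict.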
> 2. Build a new image without noise
Create a blank image with a white background:
```
# Take the size of the original and create a new image filled with white (255)
im2 = Image.new("P", im.size, 255)
# print(im2.size)
# (84, 22)
```
By inspecting the image, the original author found that the digits are drawn in color 220 (gray) and color 227 (red).
![](http://images2015.cnblogs.com/blog/996148/201612/996148-20161208140808413-2029784108.gif)
Copy the pixels that have these colors, coordinate by coordinate, into the new white image:
```
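# (84, 22) = (width, height) = (size[0], size[1])
# Walk over the y coordinates
for y in range(im.size[1]):
    # Walk over the x coordinates
    for x in range(im.size[0]):
        # Get the color value (palette index) of the pixel at (x, y)
        pix = im.getpixel((x, y))
        # These are the colors of the digits we want:
        # 220 (gray) and 227 (red)
        if pix == 220 or pix == 227:
            # Write black (0) into im2
            im2.putpixel((x, y), 0)
# im2 is now a black-and-white binary image
# im2.show()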
```
`Black-and-white binary image`
![](http://images2015.cnblogs.com/blog/996148/201612/996148-20161209104616976-1371975816.png)
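The filtering loop above applies one rule per pixel: digit colors become black, everything else stays white. A minimal sketch of that rule on a nested list standing in for the image, so it runs without Pillow; the grid values and the function name `keep_colors` are hypothetical.

```python
def keep_colors(grid, wanted, ink=0, background=255):
    """Return a new grid with `ink` where the pixel value is in `wanted`, else `background`."""
    return [[ink if value in wanted else background for value in row] for row in grid]


# Hypothetical 3x4 image: 220 and 227 are digit colors, 212 is noise.
grid = [
    [255, 220, 212, 255],
    [212, 227, 220, 255],
    [255, 255, 212, 227],
]
binary = keep_colors(grid, wanted={220, 227})
for row in binary:
    print(row)
# [255, 0, 255, 255]
# [255, 0, 0, 255]
# [255, 255, 255, 0]
```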
> 3. Split the image
**x runs along the image width, y along the image height**
Cut the image into vertical strips:
```
# Vertical cut
# Find the start and end x coordinate of each letter
inletter = False
foundletter = False
start = 0
end = 0
letters = []
for x in range(im2.size[0]):
    # A column belongs to a letter if any of its pixels is not white
    for y in range(im2.size[1]):
        pix = im2.getpixel((x, y))
        if pix != 255:
            inletter = True
    # First inked column after a gap: a letter starts here
    if not foundletter and inletter:
        foundletter = True
        start = x
    # First blank column after a letter: the letter ends here
    if foundletter and not inletter:
        foundletter = False
        end = x
        letters.append((start, end))
    inletter = False
```
Output:
`# [(6, 14), (15, 25), (27, 35), (37, 46), (48, 56), (57, 67)]`
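(6, 14) means one vertical strip, cut from x=6 to x=14.
Save the strips to disk to inspect them. This step is optional; it only lets you look at the result.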
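The column scan can also be sketched on a nested list, which makes it easy to try on a toy example. This is an illustrative variant, not the original code: the grid and the name `find_letter_spans` are made up, and unlike the loop above it also closes a letter that touches the right edge of the image.

```python
def find_letter_spans(grid, background=255):
    """Return (start, end) column ranges that contain ink.

    `end` is the first blank column after the letter, matching the
    (start, end) pairs produced by the loop above.
    """
    width = len(grid[0]) if grid else 0
    spans = []
    start = None
    for x in range(width):
        has_ink = any(row[x] != background for row in grid)
        if has_ink and start is None:
            start = x                 # a letter begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # the letter ended at the previous column
            start = None
    if start is not None:
        # A letter touching the right edge is closed at the image width.
        spans.append((start, width))
    return spans


# Hypothetical 2x6 binary image: 0 = ink, 255 = background.
grid = [
    [255,   0, 0, 255, 255, 0],
    [255, 255, 0, 255,   0, 0],
]
print(find_letter_spans(grid))  # [(1, 3), (4, 6)]
```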
```
# Save the cut-out strips
import time
count = 0
for letter in letters:
    # The crop box is (left, upper, right, lower)
    im3 = im2.crop((letter[0], 0, letter[1], im2.size[1]))
    # Changed to name each file by timestamp
    # im3.save("../jpg/%s.gif" % (time.strftime('%Y%m%d%H%M%S', time.localtime())))
    count += 1
# This gives the 6 saved strips
```
What the strips look like:
![](http://images2015.cnblogs.com/blog/996148/201612/996148-20161208143321069-1542917297.png)
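One caveat about the commented-out `save` call: the timestamp has one-second resolution, so strips cut within the same second would get the same file name and overwrite each other. A minimal sketch of one way around this appends the loop counter to the name; the helper `strip_filename` and the example timestamp are hypothetical.

```python
import time


def strip_filename(index, stamp=None, folder="../jpg"):
    """Build a file name from a timestamp plus the strip's index so names do not collide."""
    if stamp is None:
        stamp = time.strftime("%Y%m%d%H%M%S", time.localtime())
    return "%s/%s_%d.gif" % (folder, stamp, index)


# Made-up timestamp, fixed here so the result is reproducible.
names = [strip_filename(i, stamp="20240101120000") for i in range(6)]
print(names[0])         # ../jpg/20240101120000_0.gif
print(len(set(names)))  # 6
```

Inside the loop this would be used as `im3.save(strip_filename(count))`.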
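> 4. Train and recognize
The method used is **vector-space image recognition**
`Convert a reference image into vector a and the strip to be recognized into vector b; the larger cos(a, b), the smaller the angle between them and the closer the two are to coinciding`
Formula for the angle between two vectors in space: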
![](http://images2015.cnblogs.com/blog/996148/201612/996148-20161209092720663-899187747.png)
![](http://images2015.cnblogs.com/blog/996148/201612/996148-20161209093023944-1652411562.png)
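To see the idea on numbers, here is a minimal sketch that turns tiny made-up "images" into dict vectors and picks the reference with the largest cosine. Everything in it is an assumption for illustration: the 2x2 pixel lists, the 1 = ink / 0 = background encoding, and the helper names `build_vector`, `cosine` and `best_match`.

```python
import math


def build_vector(pixels):
    """Turn a flat sequence of pixel values into a {position: value} dict."""
    return {index: value for index, value in enumerate(pixels)}


def cosine(a, b):
    """cos of the angle between two dict vectors; 0.0 if either has zero length."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value ** 2 for value in a.values()))
    norm_b = math.sqrt(sum(value ** 2 for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(unknown, templates):
    """Return (label, score) of the template with the largest cosine."""
    scores = [(label, cosine(unknown, vec)) for label, vec in templates.items()]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical 2x2 "images", flattened row by row; 1 = ink, 0 = background.
templates = {
    "diagonal": build_vector([1, 0, 0, 1]),
    "top_row": build_vector([1, 1, 0, 0]),
}
unknown = build_vector([1, 0, 0, 1])
label, score = best_match(unknown, templates)
print(label, round(score, 3))
```

The unknown strip is identical to the `diagonal` reference, so their cosine is 1 up to floating-point rounding, while it shares only one inked pixel with `top_row`, giving a cosine of 0.5.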
The formula, written as code:
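```
# Cosine formula
import math


class VectorCompare:
    # Magnitude of a vector:
    # square root of the sum of squares
    def magnitude(self, concordance):
        total = 0
        # Python 2's concordance.iteritems() fails in Python 3 with
        # "'dict' object has no attribute 'iteritems'"; use concordance.items()
        for word, count in concordance.items():
            total += count ** 2
        return math.sqrt(total)

    # cos of the angle between two vectors
    def relation(self, concordance1, concordance2):
        topvalue = 0
        # Same iteritems() -> items() change as above
        for word, count in concordance1.items():
            # The source text is cut off at this point; the rest of this method is
            # reconstructed from the cosine formula above (dot product divided by
            # the product of the magnitudes) and is an assumption.
            if word in concordance2:
                topvalue += count * concordance2[word]
        return topvalue / (self.magnitude(concordance1) * self.magnitude(concordance2))
```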