Argo: an automated scanner crawler for collecting website URLs, built on go-rod and combining static and dynamic crawling - CSDN Library

60 files in total

go: 35

jpg: 5

yml: 4


73 views | uploaded 2024-05-24 11:06:07 | 2.19MB ZIP


Argo is an automated general crawler for automatically obtaining website URLs. Argo is an automated scanner crawler for collecting website URLs, built on go-rod with a combined static and dynamic approach.zip (60 files)

content

go.mod 1KB

.github

workflows

build.yml 1KB

go.sum 8KB

LICENSE 35KB

headless

dvwa.yml 487B

configs

config.yml 541B

cmd

argo.go 7KB

docs

测试.md 75B

README_EN.md 7KB

build

test.sh 67B

build.sh 104B

pkg

utils

zip.go 2KB

file.go 1KB

page.go 534B

md5.go 146B

base64.go 140B

other.go 436B

date.go 119B

vector

vector.go 1KB

inject

after

open_hook.js 807B

close_hook.js 332B

after_test.js 30B

auto.js.bak 5KB

auto.go 6KB

inject.go 2KB

before

before_test.js 31B

req

req.go 3KB

engine

normalize.go 4KB

tab.go 8KB

filter_test.go 3KB

filter.go 2KB

template.go 1KB

result.go 5KB

engine.go 11KB

updateself

update.go 5KB

static

robotstxt.go 1KB

robotstxt_test.go 919B

notfoud.go 342B

parse_test.go 320B

sitemapxml.go 1KB

regex.go 589B

metadata.go 317B

sitemapxml_test.go 968B

parse.go 6KB

conf

conf.go 6KB

playback

playback.go 2KB

log

log.go 510B

format.go 1KB

match.go 7KB

test

docker-compose.yml 210B

.gitignore 63B

imgs

debug.jpg 148KB

leakless.png 74KB

Argo交流群.jpg 487KB

result_html.jpg 71KB

result_excel.jpg 124KB

logo.jpg 84KB

demo.gif 1.28MB

README.md 9KB

# Argo

<div align=center><img width="250" height="250" src="imgs/logo.jpg"/></div>

Chinese | [English](./README_EN.md)

An automated general-purpose crawler built on go-rod for automatically collecting a website's URLs. Being built on go-rod, it naturally runs on a headless browser.

## Features

1. Smart triggering of page events: for example, if a click adds new DOM nodes, those nodes are processed first.
2. Smart login to websites (CAPTCHAs are not supported yet).
3. Hooks all traffic: go-rod's HijackRequests captures all browser traffic and outputs the requests and responses.
4. URL deduplication: everything stored and output has already been deduplicated.
5. Multiple output formats: txt, json, xlsx, html.
6. Replay of YAML-format scripts, executing the steps in order.
7. Optional visible browser window and debug output.
8. Proxy support.
9. URL depth control.
10. Option to control whether the full request/response base64 strings are stored (json format).
11. Self-update.
12. A local or remote browser can be specified.

## Installation

Download the latest release from https://github.com/Ciyfly/Argo/releases

There is no need to download Chrome manually: when the program runs, it downloads Chrome automatically.

```
./argo -h
NAME:
   argo - -t http://testphp.vulnweb.com/

USAGE:
   argo [global options] command [command options] [arguments...]

VERSION:
   v1.0

AUTHOR:
   Recar <https://github.com/Ciyfly>

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -v  print the version

   Browser

   --slow value  The default delay time for operating after enabling (default: 1000)
   --trace       Display operation elements after interface opens? (default: false)

   Config

   --browsertimeout value       Set max browser run time, close if limit exceeded. Unit is seconds. (default: 900)
   --chrome value               Specify the Chrome executable path, e.g. --chrome /opt/google/chrome/chrome
   --maxdepth value             Scrape web content with increasing depth by crawling URLs, stop at max depth. (default: 5)
   --remote value               Specify remote Chrome address, e.g. --remote http://127.0.0.1:3000
   --tabcount value, -c value   The maximum number of tab pages that can be opened (default: 10)
   --tabtimeout value           Set max tab run time, close if limit exceeded. Unit is seconds. (default: 15)

   Data

   --email value                Default email if logging in. (default: "argo@recar.com")
   --password value, -p value   Default password if logging in. (default: "argo123")
   --phone value                Default phone if logging in. (default: "18888888888")
   --username value, -u value   Default username if logging in. (default: "argo")

   Debug

   --debug             Output debug info? (default: false)
   --dev               Enable dev mode, activates browser interface and stops after page access for dev purposes. (default: false)
   --testplayback      Directly end if open, after specified playback script execution. (default: false)
   --unheadless, --uh  Default interface disabled? Use 'uh' to enable it. (default: false)

   OutPut

   --format value     Output format separated by commas, txt, json, xlsx, html supported. (default: "txt,json")
   --outputdir value  save output to directory
   --quiet            Enable quiet mode to output only the URL information that has been retrieved, in JSON format (default: false)
   --save value       Result saved as 'target' by default. Use '--save test' to save as 'test'.

   Update

   --update  update self (default: false)

   Use

   --norrs                        No storage of req-res strings, saves memory, suitable for large scans. (default: false)
   --playback value               Support replay like headless YAML scripts
   --proxy value                  Set up a proxy, for example, http://127.0.0.1:3128
   --target value, -t value       Specify the entry point for testing
   --targetsfile value, -f value  The file list has targets separated by new lines, like other tools we've used before.
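```

## Usage

### Test http://testphp.vulnweb.com/

```shell
./argo -t http://testphp.vulnweb.com/ --format txt
```

![](imgs/demo.gif)

### Test DVWA (requires login)

```shell
./argo -t http://192.168.192.128:8080/ -u admin -p password --format txt
```

![](imgs/dvwa.gif)

### Configure a proxy

```shell
./argo -t http://testphp.vulnweb.com/ --format txt --proxy http://127.0.0.1:3128
./argo -t http://testphp.vulnweb.com/ --format txt --proxy http://username:password@127.0.0.1:3128
```

### Log in to DVWA with playback

```shell
./argo -t http://192.168.192.128:8080/ --playback headless/dvwa.yml --format txt
```

### Specify a target file with -f (multiple targets)

Targets are currently processed one at a time, in order, so only one browser is ever running. If a target requires login, remember to add the username and password flags; currently only a single set of credentials is supported.

```shell
cat targets.txt
http://testphp.vulnweb.com/
http://192.168.192.128:8080/
# run argo
./argo -f targets.txt --format txt
```

### Specify a browser

Two flags are available: one points to a locally downloaded browser, the other to a remote browser. For a remote browser you can use https://github.com/browserless/chrome: run the container so it listens on a port, then pass that address to Argo.

```
# Specify the local browser path
./argo -t http://192.168.192.128:8080/ --chrome chrome_path
# Specify the remote browser IP and port
./argo -t http://192.168.192.128:8080/ --remote http://127.0.0.1:3000
```

### Set the browser timeout and page timeout

The browser timeout defaults to 900 s (`--browsertimeout`); the per-tab page timeout defaults to 15 s (`--tabtimeout`).

### Control the event-trigger interval with --slow

The default is 1000 ms (1 s): after an event such as typing or clicking, Argo waits this interval before triggering the next one.

```shell
./argo -t http://192.168.192.128:8080/ --slow <milliseconds>
```

### Show the browser window with --uh

With `--uh`, the browser window is displayed while the program runs, which is useful for debugging. You can also enable `--trace` to follow the elements that events are triggered on.

```shell
./argo -t http://192.168.192.128:8080/ --uh
```

### Skip storing request/response base64 strings

Storing them consumes memory and reduces performance.

```
./argo -t http://192.168.192.128:8080/ --norrs
```

### URL depth control

This section gives the default as 3 (the `-h` output above lists 5). URLs beyond the maximum depth are discarded.

```
./argo -t http://192.168.192.128:8080/ --maxdepth <depth>
```

### Self-update

The updater checks the latest version on GitHub, compares it with the local one, and automatically downloads the build for your platform. If the download is slow, you can download the release manually instead.

```
./argo -t http://192.168.192.128:8080/ --update
```

### Debug output

```shell
./argo -t http://192.168.192.128:8080/ --debug
```

Debug output shows detailed information about URL generalization, deduplication, and parsing, as in the figure below:

![](imgs/debug.jpg)

### Multiple output formats

For example, the HTML output looks like this:

![](imgs/result_html.jpg)

The Excel output looks like this:

![](imgs/result_excel.jpg)

## Notes

This started as an assignment from w8ay's Knowledge Planet (知识星球) community and is also related to my recent work, so I built it. The design and implementation build on the work of many experienced researchers. If you run into any problem, feel free to open an issue or contact me. There is still plenty to improve; a tool like this needs time and testing to mature. Next, I plan to test how far automation can take it and to better support Web 2.0 sites.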
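The README describes URL deduplication (feature 4), mentions "generalization" in the debug output, and says URLs beyond `--maxdepth` are discarded. A minimal sketch of how those two rules could fit together is shown here; it is not Argo's implementation. The placeholder scheme (`{int}`/`{str}`), the `generalize` and `frontier` names, and the depth comparison are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strconv"
	"strings"
)

// generalize maps a URL to a dedup key (hypothetical scheme): scheme, host
// and path are kept, each query value is replaced by a type placeholder,
// and keys are sorted, so "?id=1&cat=a" and "?cat=b&id=2" share one key.
func generalize(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	keys := make([]string, 0, len(q))
	for k := range q {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		placeholder := "{str}"
		if _, err := strconv.Atoi(q.Get(k)); err == nil {
			placeholder = "{int}"
		}
		parts = append(parts, k+"="+placeholder)
	}
	key := strings.ToLower(u.Scheme) + "://" + strings.ToLower(u.Host) + u.Path
	if len(parts) > 0 {
		key += "?" + strings.Join(parts, "&")
	}
	return key, nil
}

// frontier is a hypothetical crawl queue that enforces a depth limit and
// drops URLs whose generalized key has already been seen.
type frontier struct {
	maxDepth int
	seen     map[string]bool
	queue    []string
}

func newFrontier(maxDepth int) *frontier {
	return &frontier{maxDepth: maxDepth, seen: make(map[string]bool)}
}

// add reports whether the URL was accepted into the queue.
func (f *frontier) add(raw string, depth int) bool {
	if depth > f.maxDepth {
		return false // beyond the maximum depth: discard
	}
	key, err := generalize(raw)
	if err != nil || f.seen[key] {
		return false // unparsable or same shape as an earlier URL
	}
	f.seen[key] = true
	f.queue = append(f.queue, raw)
	return true
}

func main() {
	f := newFrontier(3)
	fmt.Println(f.add("http://testphp.vulnweb.com/artists.php?artist=1", 1))   // true
	fmt.Println(f.add("http://testphp.vulnweb.com/artists.php?artist=2", 1))   // false: same shape
	fmt.Println(f.add("http://testphp.vulnweb.com/listproducts.php?cat=1", 4)) // false: too deep
	fmt.Println(len(f.queue))                                                  // 1
}
```

Keying on the generalized shape rather than the literal URL means a page with thousands of `?id=N` links is visited once, which keeps a crawl bounded; the trade-off is that pages whose content genuinely differs by parameter value are skipped.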
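## References

- http://blog.fatezero.org/2018/04/09/web-scanner-crawler-02/
- https://pkg.go.dev/github.com/go-rod/go-rod-chinese
- https://chat.openai.com/

## FAQ

If antivirus software flags leakless.exe as shown below, you can trust it: go-rod uses it to clean up leftover Chrome processes. Its source is at https://github.com/ysmood/leakless, and you can also compile it yourself and replace the binary.

![](imgs/leakless.png)

Argo's release binaries are compiled automatically by GitHub Actions, though you can of course compile it yourself. If the first run fails with error while loading shar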
