[Free] UI based on the Chinese LLM evaluation dataset factool (.zip resource) - CSDN Library

43 files in total

ts: 11

tsx: 9

jsonl: 6

AI source code

Points required: 0; 83 views; uploaded 2023-12-24 13:14:04; 1.01MB ZIP


UI based on the Chinese LLM evaluation dataset factool.zip (43 files)


yarn.lock 158KB

prisma

schema.prisma 425B

tailwind.config.js 2KB

components.json 333B

.env.example 630B

src

pages

_app.tsx 361B

index.tsx 2KB

api

trpc

[trpc].ts 543B

env.mjs 1KB

styles

globals.css 2KB

ds.ts 254B

utils

api.ts 2KB

shadcn.ts 167B

components

factool-main.tsx 5KB

factool-sub.tsx 2KB

input.tsx 835B

button.tsx 2KB

dropdown-menu.tsx 7KB

label.tsx 727B

table.tsx 3KB

types

global.ts 73B

server

api

trpc.ts 3KB

root.ts 481B

routers

factool.ts 677B

example.ts 410B

datasets

math

math.jsonl 95KB

scientific

scientific.jsonl 116KB

chinese

factool_output.jsonl 2.41MB

dataset_chinese.jsonl 566KB

README.md 2KB

knowledge_qa

knowledge_qa.jsonl 38KB

code

code.jsonl 105KB

db.ts 399B

postcss.config.cjs 107B

package.json 2KB

public

favicon.ico 15KB

next.config.mjs 673B

tailwind.config.ts 171B

prettier.config.mjs 177B

.eslintrc.cjs 391B

.gitignore 598B

tsconfig.json 761B

README.md 460B
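The benchmark prompts and the Factool outputs are shipped as JSON Lines (`.jsonl`) files in the listing above. This page does not document their fields, so the following is a minimal TypeScript sketch that assumes only the JSONL convention of one JSON value per line. It parses a file's text and lists the top-level keys that actually appear. `parseJsonl` and `topLevelKeys` are hypothetical helper names and are not part of the project.

```typescript
// Parse JSON Lines text: each non-empty line holds one JSON value.
// No field names are assumed, so records are returned as `unknown`.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // skip blank lines, e.g. a trailing newline
    try {
      records.push(JSON.parse(trimmed));
    } catch (err) {
      // Report the 1-based line number so a bad record is easy to locate.
      throw new Error(`Invalid JSON on line ${index + 1}: ${(err as Error).message}`);
    }
  });
  return records;
}

// Collect the distinct top-level keys of all object records, sorted.
// This lets you inspect a file's real schema before relying on any field.
function topLevelKeys(records: unknown[]): string[] {
  const seen: { [key: string]: true } = {};
  for (const record of records) {
    if (record !== null && typeof record === "object" && !Array.isArray(record)) {
      for (const key of Object.keys(record)) seen[key] = true;
    }
  }
  return Object.keys(seen).sort();
}

// Example with made-up records (not the dataset's real schema).
const sample = '{"id": 1, "text": "a"}\n{"id": 2}\n';
const parsed = parseJsonl(sample);
console.log(parsed.length, topLevelKeys(parsed));
```

In Node.js, you can pass in a file's contents with `parseJsonl(fs.readFileSync(path, "utf8"))`, where `path` points at one of the extracted `.jsonl` files.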

## ChineseFactEval: A Factuality Benchmark for Chinese LLMs

We release a dataset for benchmarking the factuality of Chinese LLMs. See our full [report](https://GAIR-NLP.github.io/ChineseFactEval) for details.

The benchmark contains 125 prompts across 7 scenarios: general domain, scientific research, medical, law, finance, math, and Chinese modern history. We use it to evaluate six Chinese LLMs alongside GPT-4: Yiyan (文心一言), Doubao (豆包), Baichuan (百川), ChatGLM (智谱清言), Sensetime (商量), and ABAB.

The authors of the report did the annotation collectively. For the complex responses in the medical and law domains, we also used Factool to assist with annotation. Factool is a tool-augmented framework for detecting factual errors in LLM-generated text. The Factool results have been released as well.

## Factuality Leaderboard for Chinese LLMs

The leaderboard reports each chatbot's factuality in each scenario as score / maximum score. The best score in each column is in bold, and rows are sorted by total.

| LLM | General | Scientific Research | Medical | Law | Finance | Math | Chinese Modern History | Total |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| GPT-4 | **61 / 94** | **13.5 / 21** | 9 / 20 | 19 / 47 | **12 / 21** | 26 / 52 | **43 / 46** | **183.5 / 301** |
| Doubao (豆包) | 49 / 94 | 3 / 21 | **12 / 20** | **20 / 47** | 7 / 21 | 11 / 52 | 37 / 46 | 139 / 301 |
| Yiyan (文心一言) | 34 / 94 | 6 / 21 | 8 / 20 | 13 / 47 | 1.5 / 21 | **37 / 52** | 23 / 46 | 122.5 / 301 |
| Sensetime (商量) | 44 / 94 | 3 / 21 | 6 / 20 | 12 / 47 | 7 / 21 | 0 / 52 | 31.5 / 46 | 103.5 / 301 |
| ChatGLM (智谱清言) | 27.5 / 94 | 0 / 21 | 3 / 20 | 13 / 47 | 8 / 21 | 15 / 52 | 23 / 46 | 89.5 / 301 |
| ABAB | 34.5 / 94 | 1.5 / 21 | 9 / 20 | 15 / 47 | 5 / 21 | 6 / 52 | 6 / 46 | 77 / 301 |
| Baichuan (百川) | 18 / 94 | 0 / 21 | 3 / 20 | 7 / 47 | 3 / 21 | 2 / 52 | 30.5 / 46 | 63.5 / 301 |
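The Total column is a plain sum of the per-scenario scores and maxima (94 + 21 + 20 + 47 + 21 + 52 + 46 = 301). As a result, a scenario with a large maximum such as General (94) carries more weight than a small one such as Medical (20). Below is a minimal TypeScript sketch that recomputes the totals from the per-scenario cells and converts them to percentages. The scores are copied from the table. The names `Cell`, `Row`, `leaderboard`, and `total` are illustrative only and do not come from the project.

```typescript
// Scenario order shared by every row (matches the table's columns).
const SCENARIOS = [
  "General",
  "Scientific Research",
  "Medical",
  "Law",
  "Finance",
  "Math",
  "Chinese Modern History",
] as const;

// One table cell: [score, maximum score].
type Cell = [number, number];

interface Row {
  model: string;
  cells: Cell[]; // one cell per entry in SCENARIOS, same order
}

// Per-scenario cells copied from the leaderboard table.
const leaderboard: Row[] = [
  { model: "GPT-4", cells: [[61, 94], [13.5, 21], [9, 20], [19, 47], [12, 21], [26, 52], [43, 46]] },
  { model: "Doubao", cells: [[49, 94], [3, 21], [12, 20], [20, 47], [7, 21], [11, 52], [37, 46]] },
  { model: "Yiyan", cells: [[34, 94], [6, 21], [8, 20], [13, 47], [1.5, 21], [37, 52], [23, 46]] },
  { model: "Sensetime", cells: [[44, 94], [3, 21], [6, 20], [12, 47], [7, 21], [0, 52], [31.5, 46]] },
  { model: "ChatGLM", cells: [[27.5, 94], [0, 21], [3, 20], [13, 47], [8, 21], [15, 52], [23, 46]] },
  { model: "ABAB", cells: [[34.5, 94], [1.5, 21], [9, 20], [15, 47], [5, 21], [6, 52], [6, 46]] },
  { model: "Baichuan", cells: [[18, 94], [0, 21], [3, 20], [7, 47], [3, 21], [2, 52], [30.5, 46]] },
];

// Sum scores and maxima across all scenarios, as the Total column does.
function total(row: Row): Cell {
  if (row.cells.length !== SCENARIOS.length) {
    throw new Error(`${row.model}: expected ${SCENARIOS.length} cells, got ${row.cells.length}`);
  }
  let score = 0;
  let max = 0;
  for (const [s, m] of row.cells) {
    score += s;
    max += m;
  }
  return [score, max];
}

// Print each model's total and the corresponding percentage.
for (const row of leaderboard) {
  const [score, max] = total(row);
  const pct = (100 * score) / max;
  console.log(`${row.model}: ${score} / ${max} (${pct.toFixed(1)}%)`);
}
```

Running the script reproduces the Total column exactly. For example, GPT-4's 183.5 / 301 comes out at 61.0%. Summing raw scores is what the table does. If each scenario should count equally, averaging the seven per-scenario rates instead could give a different picture.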
