README

Turkish Named Entity Recognition (NER) Model

This model is a fine-tuned version of "dbmdz/bert-base-turkish-cased", trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).

Fine-tuning parameters:

task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01
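
A token-classification model usually needs integer ids for the labels listed above. The sketch below builds those mappings in plain Python. The names id2label and label2id follow the usual Hugging Face config convention, but the assumption that ids follow the list order is ours and is not stated in the parameters.

```python
# Label list copied from the fine-tuning parameters above.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Assumption: ids follow the list order (0 for 'O', 1 for 'B-PER', ...).
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in id2label.items()}

# Entity types are the distinct suffixes after the B-/I- prefix.
entity_types = sorted(
    {label.split('-', 1)[1] for label in label_list if label != 'O'}
)

print(id2label)
print(entity_types)
```

The B- prefix marks the first word of an entity and I- marks a continuation, so the seven labels cover three entity types (person, organization, location) plus 'O' for everything else.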

Usage:

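from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
# aggregation_strategy="first" labels each word with the tag of its first sub-token
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")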

See "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for how to use the aggregation_strategy parameter to group entities.
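
To make the idea of entity grouping concrete, here is a simplified, library-free sketch of what a "first"-style aggregation does: each word takes the tag of its first WordPiece, and consecutive B-/I- tags of the same type are merged into one span. The helper names word_level_tags and group_entities are hypothetical, the "##" continuation check is a simplification, and the example tags are made up by hand rather than produced by the model. This is not the pipeline's actual implementation.

```python
from typing import Dict, List, Optional, Tuple


def word_level_tags(pieces: List[str], piece_tags: List[str]) -> Tuple[List[str], List[str]]:
    """Rebuild words from WordPiece tokens, keeping the first piece's tag."""
    words: List[str] = []
    tags: List[str] = []
    for piece, tag in zip(pieces, piece_tags):
        if piece.startswith("##") and words:
            # Continuation piece: extend the word, ignore this piece's tag.
            words[-1] += piece[2:]
        else:
            words.append(piece)
            tags.append(tag)
    return words, tags


def group_entities(words: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Merge word-level BIO tags into entity spans."""
    entities: List[Dict[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        nonlocal current_words, current_type
        if current_words:
            entities.append({"entity_group": current_type, "word": " ".join(current_words)})
        current_words, current_type = [], None

    for word, tag in zip(words, tags):
        if tag == "O":
            flush()
            continue
        prefix, ent_type = tag.split("-", 1)
        # Start a new span on B-, or when the entity type changes mid-span.
        if prefix == "B" or ent_type != current_type:
            flush()
            current_type = ent_type
        current_words.append(word)
    flush()
    return entities


# Hand-written example; "##maz" is deliberately tagged 'O' to show that
# only the first piece's tag ('I-PER') counts for the word "Yılmaz".
pieces = ["Ahmet", "Yıl", "##maz", "Ankara", "şehrinde", "çalışıyor"]
piece_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]

words, tags = word_level_tags(pieces, piece_tags)
entities = group_entities(words, tags)
print(entities)
```

With the hand-written tags above, the two person words are merged into a single PER span and "Ankara" becomes a LOC span. The real pipeline additionally reports scores and character offsets, which this sketch omits.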

Reference test results:

Accuracy: 0.9933935699477056
F1: 0.9592969472710453
Precision: 0.9543530277931161
Recall: 0.9642923563325274
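
As a quick sanity check, the F1 figure above should be the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the published numbers. It assumes the score was computed in this standard way, which the model card does not state explicitly.

```python
# Values copied from the reference test results above.
precision = 0.9543530277931161
recall = 0.9642923563325274
reported_f1 = 0.9592969472710453

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"reported F1:   {reported_f1:.6f}")
print(f"difference:    {abs(f1 - reported_f1):.2e}")
```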

Evaluation results on the test sets proposed in the paper "Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.":

Test Set   Accuracy  Precision  Recall  F1-Score
20010000   0.9946    0.9871     0.9463  0.9662
20020000   0.9928    0.9134     0.9206  0.9170
20030000   0.9942    0.9814     0.9186  0.9489
20040000   0.9943    0.9660     0.9522  0.9590
20050000   0.9971    0.9539     0.9932  0.9732
20060000   0.9993    0.9942     0.9942  0.9942
20070000   0.9970    0.9806     0.9439  0.9619
20080000   0.9988    0.9821     0.9649  0.9735
20090000   0.9977    0.9891     0.9479  0.9681
20100000   0.9961    0.9684     0.9293  0.9485
Overall    0.9961    0.9720     0.9516  0.9617

akdeniz27/bert-base-turkish-cased-ner

Author: akdeniz27

token-classification transformers

Created: 2022-03-02 23:29:05+00:00

Updated: 2024-06-18 09:42:03+00:00

Files (10)

.gitattributes

README.md

config.json

model.onnx ONNX

model.safetensors

pytorch_model.bin

special_tokens_map.json

tokenizer.json

tokenizer_config.json

vocab.txt