README

Turkish Named Entity Recognition (NER) Model

This model is a fine-tuned version of "dbmdz/bert-base-turkish-cased", trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).

Fine-tuning parameters:

task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01 
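
The label_list above implies an integer id for each tag. The following is a minimal sketch of the id2label / label2id mappings that such a label set would produce, assuming ids are assigned in list order; the card does not state the order, so config.json remains the authoritative source. The names id2label, label2id and entity_types are illustrative.

```python
# Label set copied from the fine-tuning parameters above.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Assumption: ids follow list order; check config.json to confirm.
id2label = dict(enumerate(label_list))
label2id = {label: idx for idx, label in id2label.items()}

# Entity types are the tag suffixes that follow the B-/I- prefix.
entity_types = sorted({label.split('-', 1)[1] for label in label_list if label != 'O'})

print(id2label[1], label2id['I-LOC'], entity_types)
```

The BIO scheme gives each of the three entity types (PER, ORG, LOC) a "begin" and an "inside" tag, plus a single O tag for tokens outside any entity, which accounts for the seven labels.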

Usage:

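from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
# aggregation_strategy="first" groups sub-word tokens into whole entities.
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")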

See "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for how to use the aggregation_strategy parameter to group entities.
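
To make the grouping idea concrete: with aggregation_strategy="first", a word split into several sub-tokens takes the tag of its first sub-token, and neighbouring words of the same entity are then merged into one span. The sketch below illustrates only the second step on word-level BIO tags. It is a simplified stand-in, not the transformers implementation, and the function name group_entities and the sample words and tags are hypothetical.

```python
def group_entities(words, tags):
    """Merge word-level BIO tags into entity spans (simplified illustration)."""
    entities = []
    current = None  # entity being built, or None when outside any entity
    for word, tag in zip(words, tags):
        if tag == 'O':
            current = None
            continue
        prefix, entity_type = tag.split('-', 1)
        # Extend the open entity only for an I- tag of the same type.
        if prefix == 'I' and current is not None and current['entity_group'] == entity_type:
            current['word'] += ' ' + word
        else:
            # A B- tag, or an I- tag with nothing to continue, starts a new entity.
            current = {'entity_group': entity_type, 'word': word}
            entities.append(current)
    return entities


# Hypothetical predictions, one tag per word.
words = ['Ali', 'Veli', 'dün', 'Ankara', 'Üniversitesi', 'ziyaret', 'etti']
tags = ['B-PER', 'I-PER', 'O', 'B-ORG', 'I-ORG', 'O', 'O']

for entity in group_entities(words, tags):
    print(entity['entity_group'], entity['word'])
```

The real pipeline additionally returns a score and character offsets for each group; those are omitted here to keep the sketch short.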

Reference test results:

  • Accuracy: 0.9933935699477056
  • F1: 0.9592969472710453
  • Precision: 0.9543530277931161
  • Recall: 0.9642923563325274
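
As a consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures. The snippet below is a small sketch using the values listed above; the variable names are illustrative.

```python
# Precision and recall as reported in the reference test results.
precision = 0.9543530277931161
recall = 0.9642923563325274

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.6f}")
```

Accuracy is not part of this relation: it is computed over all tokens, most of which are tagged O, which is why it sits well above the entity-level scores.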

Evaluation results on the test sets proposed in the paper "Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.":

  Test set   Accuracy   Precision   Recall   F1 score
  20010000   0.9946     0.9871      0.9463   0.9662
  20020000   0.9928     0.9134      0.9206   0.9170
  20030000   0.9942     0.9814      0.9186   0.9489
  20040000   0.9943     0.9660      0.9522   0.9590
  20050000   0.9971     0.9539      0.9932   0.9732
  20060000   0.9993     0.9942      0.9942   0.9942
  20070000   0.9970     0.9806      0.9439   0.9619
  20080000   0.9988     0.9821      0.9649   0.9735
  20090000   0.9977     0.9891      0.9479   0.9681
  20100000   0.9961     0.9684      0.9293   0.9485
  Overall    0.9961     0.9720      0.9516   0.9617

akdeniz27/bert-base-turkish-cased-ner

Author: akdeniz27

Tags: token-classification, transformers
Downloads: 132.8K, Likes: 25

Created: 2022-03-02 23:29:05+00:00

Updated: 2024-06-18 09:42:03+00:00

Files (10)

.gitattributes
README.md
config.json
model.onnx
model.safetensors
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt