Turkish Named Entity Recognition (NER) Model
This model is a fine-tuned version of "dbmdz/bert-base-turkish-cased", trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).
Fine-tuning parameters:
task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512
learning_rate = 2e-5
num_train_epochs = 3
weight_decay = 0.01
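Token-classification models store the label set above as integer-id mappings (`label2id` / `id2label`) in their config. A minimal pure-Python sketch of building those mappings from the `label_list` shown above:

```python
# label_list copied from the fine-tuning parameters above (IOB2 scheme:
# person, organization, and location entities plus the "outside" tag).
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Standard mappings expected by token-classification model configs.
label2id = {label: i for i, label in enumerate(label_list)}
id2label = {i: label for i, label in enumerate(label_list)}

print(label2id)
print(id2label)
```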
Usage:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")
See "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for how to use the aggregation_strategy parameter to group tokens into entities.
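As a rough illustration of what aggregation_strategy="first" does (a simplified pure-Python sketch, not the actual pipeline code, with made-up token/label pairs): subword tokens are merged back into words, each word takes the label of its first subword, and adjacent words continuing the same entity type are joined into one span.

```python
# Illustrative subword tokens with per-token labels ("##" marks a
# continuation subword, as in BERT WordPiece tokenization).
tokens = [("Mustafa", "B-PER"), ("##'nın", "I-PER"),
          ("Ankara", "B-LOC"), ("##'ya", "I-LOC")]

def aggregate_first(tokens):
    # Step 1: merge subwords into words; a word keeps its FIRST subword's label.
    words = []
    for tok, label in tokens:
        if tok.startswith("##") and words:
            words[-1][0] += tok[2:]        # continuation: append to current word
        else:
            words.append([tok, label])     # new word, label taken from first subword
    # Step 2: merge adjacent words of the same entity type into spans.
    entities = []
    for word, label in words:
        etype = label.split("-")[-1]       # strip the B-/I- prefix
        if entities and label.startswith("I-") and entities[-1]["entity_group"] == etype:
            entities[-1]["word"] += " " + word
        else:
            entities.append({"entity_group": etype, "word": word})
    return entities

print(aggregate_first(tokens))
```

The real pipeline additionally carries scores and character offsets; this sketch only shows the grouping logic.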
Reference test results:
- Accuracy: 0.9933935699477056
- F1: 0.9592969472710453
- Precision: 0.9543530277931161
- Recall: 0.9642923563325274
| Test Set | Accuracy | Precision | Recall | F1 |
|----------|----------|-----------|--------|------|
| 20010000 | 0.9946 | 0.9871 | 0.9463 | 0.9662 |
| 20020000 | 0.9928 | 0.9134 | 0.9206 | 0.9170 |
| 20030000 | 0.9942 | 0.9814 | 0.9186 | 0.9489 |
| 20040000 | 0.9943 | 0.9660 | 0.9522 | 0.9590 |
| 20050000 | 0.9971 | 0.9539 | 0.9932 | 0.9732 |
| 20060000 | 0.9993 | 0.9942 | 0.9942 | 0.9942 |
| 20070000 | 0.9970 | 0.9806 | 0.9439 | 0.9619 |
| 20080000 | 0.9988 | 0.9821 | 0.9649 | 0.9735 |
| 20090000 | 0.9977 | 0.9891 | 0.9479 | 0.9681 |
| 20100000 | 0.9961 | 0.9684 | 0.9293 | 0.9485 |
| Overall  | 0.9961 | 0.9720 | 0.9516 | 0.9617 |
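As a sanity check on the figures above: F1 is the harmonic mean of precision and recall, so recomputing it from the reported overall precision and recall should reproduce the reported F1.

```python
# Reported overall precision and recall from the results above.
precision = 0.9543530277931161
recall = 0.9642923563325274

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # should agree with the reported F1 of 0.95929...
```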
akdeniz27/bert-base-turkish-cased-ner
Author: akdeniz27
token-classification
transformers
Downloads: 132.8K
Likes: 25
Created: 2022-03-02 23:29:05+00:00
Updated: 2024-06-18 09:42:03+00:00
Files on Hugging Face (10):
.gitattributes
README.md
config.json
model.onnx
model.safetensors
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt