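README

Model Card

Model description

This model is a fine-tuned version of distilbert/distilbert-base-cased, trained on a subset of the finer-139 dataset.

On the test set (a selected subset; the selection process is described below), it achieves the following results: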

entity	precision	recall	f1-score	support
DebtInstrumentBasisSpreadOnVariableRate1	0.35	1.00	0.51	9
DebtInstrumentFaceAmount	0.06	0.46	0.10	13
DebtInstrumentInterestRateStatedPercentage	0.11	1.00	0.19	8
LineOfCreditFacilityMaximumBorrowingCapacity	0.11	0.62	0.19	13
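One way to read the table above: recall and support together imply how many gold entities were found, and dividing that count by precision gives a rough idea of how many spans the model predicted for that label. The sketch below only does this arithmetic on the reported figures. Because the table is rounded to two decimals, the implied prediction counts are approximations derived here, not values reported by the author.

```python
# Figures copied from the test-set table: (precision, recall, support).
REPORTED = {
    "DebtInstrumentBasisSpreadOnVariableRate1": (0.35, 1.00, 9),
    "DebtInstrumentFaceAmount": (0.06, 0.46, 13),
    "DebtInstrumentInterestRateStatedPercentage": (0.11, 1.00, 8),
    "LineOfCreditFacilityMaximumBorrowingCapacity": (0.11, 0.62, 13),
}


def implied_counts(precision, recall, support):
    """Back out approximate counts from rounded precision/recall/support."""
    true_positives = round(recall * support)  # recall = TP / support
    predicted = true_positives / precision    # precision = TP / predicted
    return true_positives, predicted


for label, (p, r, s) in REPORTED.items():
    tp, predicted = implied_counts(p, r, s)
    print(f"{label}: about {tp} correct out of roughly {predicted:.0f} predicted spans")
```

High recall combined with low precision means the model finds most of the gold spans but also predicts many spans that are not in the gold annotations.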

Training and evaluation data

The model was built using a subset of the original finer-139 dataset.

The subset was obtained with the following steps:

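1. Instead of the original set of 139 entities, only a subset of 4 entities is covered, namely:
   - DebtInstrumentInterestRateStatedPercentage
   - LineOfCreditFacilityMaximumBorrowingCapacity
   - DebtInstrumentBasisSpreadOnVariableRate1
   - DebtInstrumentFaceAmount
   These 4 entities were chosen because they are the most common entities in the original dataset. Any other entity in the original dataset is treated as "O".
2. Records longer than 200 tokens (words) are removed. (The remaining data still covers most cases.)
3. Records that contain no entities are removed.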

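All three steps above were applied to the "train" and "validation" splits of the finer-139 dataset. For the "test" split, however, step 3 (removing records that contain no entities) was not applied, because we still want to see how the fine-tuned model handles the more general case.

For speed, however, we randomly selected 1000 records to build a smaller test set for our experiments.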
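The subset construction described above can be sketched in plain Python. This is an illustration, not the author's actual preprocessing code: it assumes each record is a dict with `tokens` and `ner_tags` fields, that the tags have already been converted to strings such as "B-DebtInstrumentFaceAmount", and that test-set sampling uses a seed chosen here for reproducibility. The function names and the toy records are hypothetical.

```python
import random

KEPT_ENTITIES = {
    "DebtInstrumentInterestRateStatedPercentage",
    "LineOfCreditFacilityMaximumBorrowingCapacity",
    "DebtInstrumentBasisSpreadOnVariableRate1",
    "DebtInstrumentFaceAmount",
}
MAX_TOKENS = 200


def remap_label(label):
    """Step 1: keep B-/I- tags of the 4 chosen entities, map the rest to "O"."""
    if label == "O":
        return "O"
    _prefix, _, entity = label.partition("-")
    return label if entity in KEPT_ENTITIES else "O"


def preprocess(records, drop_empty):
    """Apply step 1, step 2 and (optionally) step 3 to a list of records."""
    kept = []
    for record in records:
        tags = [remap_label(tag) for tag in record["ner_tags"]]
        if len(record["tokens"]) > MAX_TOKENS:  # step 2: drop long records
            continue
        if drop_empty and all(tag == "O" for tag in tags):  # step 3
            continue
        kept.append({"tokens": record["tokens"], "ner_tags": tags})
    return kept


def sample_test_set(records, size=1000, seed=42):
    """Randomly pick `size` records; the seed value is an assumption."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


# Toy records (hypothetical, "SomeOtherEntity" is a placeholder label).
toy = [
    {"tokens": ["a", "$", "5"], "ner_tags": ["O", "O", "B-DebtInstrumentFaceAmount"]},
    {"tokens": ["b", "1"], "ner_tags": ["O", "B-SomeOtherEntity"]},
]
train_like = preprocess(toy, drop_empty=True)   # train/validation: steps 1 to 3
test_like = preprocess(toy, drop_empty=False)   # test: steps 1 and 2 only
```

Keeping step 3 behind a `drop_empty` flag mirrors the description: the same function serves the train and validation splits (flag on) and the test split (flag off).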

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

seed: 42
learning_rate: 2e-5
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
num_train_epochs: 3
weight_decay: 0.01
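The hyperparameter names above match fields of `TrainingArguments` in transformers, so one plausible way to reproduce the configuration is sketched below. The model card does not state that the Trainer API was used, so treat this as an assumption; `output_dir` is a hypothetical path, and every field not listed above is left at the library default.

```python
# Hyperparameters exactly as listed above.
HYPERPARAMETERS = {
    "seed": 42,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def build_training_arguments(output_dir="finer-subset-run"):
    """Create TrainingArguments from the listed values.

    `output_dir` is a hypothetical path; all other fields keep their
    library defaults, which may differ from the author's setup.
    """
    # Imported lazily so the dict above can be inspected without
    # transformers and torch installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```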

Training results

Epoch	Training Loss	Validation Loss	Precision	Recall	F1	Accuracy
1	No log	0.056148	0.715729	0.796148	0.753799	0.980751
2	0.093300	0.059250	0.781487	0.826645	0.803432	0.982387
3	0.017500	0.064857	0.785185	0.850722	0.816641	0.983058
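As a quick consistency check on the table above, the F1 column can be recomputed from the precision and recall columns with the usual harmonic-mean formula. The values below are copied from the table; the helper name `f1_score` is introduced here for illustration.

```python
# (precision, recall, reported F1) per epoch, copied from the table above.
EPOCHS = {
    1: (0.715729, 0.796148, 0.753799),
    2: (0.781487, 0.826645, 0.803432),
    3: (0.785185, 0.850722, 0.816641),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for epoch, (p, r, reported) in EPOCHS.items():
    computed = f1_score(p, r)
    print(f"epoch {epoch}: computed F1 = {computed:.6f}, reported F1 = {reported:.6f}")
```

The recomputed values agree with the reported ones to within rounding. Note that the validation loss rises slightly from epoch to epoch while precision, recall, and F1 still improve.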

Library versions (during model development)

jupyterlab: 4.3.5
transformers: 4.48.3
torch: 2.6.0
datasets: 3.3.2
pandas: 2.2.3
matplotlib: 3.10.1
seaborn: 0.13.2
seqeval: 1.2.2
evaluate: 0.4.3
accelerate: 1.5.1
scikit-learn: 1.6.1
onnxruntime: 1.21.0
onnx: 1.17.0
optimum['exporters']: 1.24.0
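To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` could look like the sketch below. The distribution names are assumed to match the names in the list (with `optimum` standing in for the exporters extra), and local build suffixes such as `+cu124` are stripped before comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
PINNED = {
    "jupyterlab": "4.3.5",
    "transformers": "4.48.3",
    "torch": "2.6.0",
    "datasets": "3.3.2",
    "pandas": "2.2.3",
    "matplotlib": "3.10.1",
    "seaborn": "0.13.2",
    "seqeval": "1.2.2",
    "evaluate": "0.4.3",
    "accelerate": "1.5.1",
    "scikit-learn": "1.6.1",
    "onnxruntime": "1.21.0",
    "onnx": "1.17.0",
    "optimum": "1.24.0",
}


def compare_versions(pinned=PINNED):
    """Return {package: (listed, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            # Drop local build tags such as "+cu124" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in compare_versions().items():
    print(f"{name}: listed {wanted}, installed {installed or 'not installed'}")
```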

How to use the model

Original PyTorch model

Example usage

from transformers import pipeline

ner_pipeline = pipeline('token-classification', model='superbean/distilbert-base-cased-finer-finetuned')

test_text = "( 3 ) In February 2020 , the committed amount under the facility was temporarily increased $ 75.0 million to $ 150.0 million , which expires on May 29 , 2020 ."

result = ner_pipeline(test_text)

ONNX model

Example usage

from optimum.pipelines import pipeline
from optimum.onnxruntime import ORTModelForTokenClassification

ort_model = ORTModelForTokenClassification.from_pretrained('superbean/distilbert-base-cased-finer-finetuned', subfolder='onnx')
ort_ner_pipeline = pipeline('token-classification', model=ort_model, accelerator='ort')

test_text = "( 3 ) In February 2020 , the committed amount under the facility was temporarily increased $ 75.0 million to $ 150.0 million , which expires on May 29 , 2020 ."

result = ort_ner_pipeline(test_text)

For both usages above, you can expect the output of result to look similar to the following:

# [
#     {
#         'entity': 'B-LineOfCreditFacilityMaximumBorrowingCapacity',
#         'score': np.float32(0.5120859),
#         'index': 18,
#         'word': '75',
#         'start': 93,
#         'end': 95
#     }, {
#         'entity': 'I-LineOfCreditFacilityMaximumBorrowingCapacity',
#         'score': np.float32(0.68770176),
#         'index': 19,
#         'word': '.',
#         'start': 95,
#         'end': 96
#     }, ..., {
#         'entity': 'I-LineOfCreditFacilityMaximumBorrowingCapacity',
#         'score': np.float32(0.9414747),
#         'index': 26,
#         'word': '0',
#         'start': 115,
#         'end': 116
#     }
# ]
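The output above has one entry per word piece, so a number such as 75.0 arrives as several B-/I- tagged fragments. A minimal sketch of merging such fragments into entity spans is shown below. The helper `group_entities` and the toy input are hypothetical and hand-written in the same shape as `result`; the toy records are not real model output.

```python
def group_entities(token_predictions, text):
    """Merge B-/I- word-piece predictions into entity spans.

    A new span starts at every "B-" tag, when the entity type changes,
    or when the token index is not consecutive with the previous token.
    """
    spans = []
    current = None
    for token in sorted(token_predictions, key=lambda t: t["index"]):
        if token["entity"] == "O":
            current = None
            continue
        prefix, _, entity_type = token["entity"].partition("-")
        continues = (
            current is not None
            and prefix == "I"
            and entity_type == current["entity_type"]
            and token["index"] == current["last_index"] + 1
        )
        if continues:
            current["end"] = token["end"]
            current["last_index"] = token["index"]
            current["scores"].append(float(token["score"]))
        else:
            current = {
                "entity_type": entity_type,
                "start": token["start"],
                "end": token["end"],
                "last_index": token["index"],
                "scores": [float(token["score"])],
            }
            spans.append(current)
    return [
        {
            "entity_type": span["entity_type"],
            "text": text[span["start"]:span["end"]],
            "start": span["start"],
            "end": span["end"],
            "score": min(span["scores"]),  # weakest fragment bounds the span
        }
        for span in spans
    ]


# Hand-written toy input in the same shape as `result`; NOT real model output.
toy_text = "up to $ 250.0 million"
toy_predictions = [
    {"entity": "B-DebtInstrumentFaceAmount", "score": 0.9, "index": 4, "word": "250", "start": 8, "end": 11},
    {"entity": "I-DebtInstrumentFaceAmount", "score": 0.8, "index": 5, "word": ".", "start": 11, "end": 12},
    {"entity": "I-DebtInstrumentFaceAmount", "score": 0.7, "index": 6, "word": "0", "start": 12, "end": 13},
]
toy_spans = group_entities(toy_predictions, toy_text)
```

The transformers token-classification pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping inside the pipeline, which may be preferable to a hand-rolled helper.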

superbean/distilbert-base-cased-finer-finetuned

Author: superbean

Created: 2025-03-14 23:35:44+00:00

Updated: 2025-03-16 07:23:44+00:00
