README

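Model card for tsilva/clinical-field-mapper-causal_lm

This model is a fine-tuned version of distilbert/distilgpt2, trained on the tsilva/clinical-field-mappings dataset. Its purpose is to normalize healthcare database column names to a standardized set of target column names.

Task

This is a causal language model that maps free-text field names to standardized schema terms.

Usage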

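from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tsilva/clinical-field-mapper-causal_lm")
model = AutoModelForCausalLM.from_pretrained("tsilva/clinical-field-mapper-causal_lm")

def predict(input_text):
    # The prompt is the raw column name followed by the "|" separator.
    inputs = tokenizer(input_text + "|", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    # Decode the full sequence (prompt plus generated tokens) and print it.
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Example: a noisy column name.
predict('cardi@')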
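
The predict function above prints the full decoded sequence, prompt included. Below is a minimal sketch for isolating the predicted name. It assumes the decoded text has the form input|target; the "|" separator comes from the prompt above, but the output layout and the helper name extract_target are assumptions, not something this model card documents.

```python
def extract_target(decoded: str, separator: str = "|") -> str:
    """Return the text after the first separator, or "" if none is found.

    Assumes (hypothetically) that the decoded text looks like "input|target".
    """
    # partition splits on the first occurrence only, so any later
    # separator characters stay inside the returned target.
    _, found, target = decoded.partition(separator)
    if not found:
        return ""
    target = target.strip()
    # Keep only the first line in case generation runs past the target.
    return target.splitlines()[0].strip() if target else ""
```

To use it, predict would need to return the decoded string instead of printing it, so that the result can be passed to extract_target.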

Evaluation results

Training accuracy: 98.24%
Validation accuracy: 89.84%
Test accuracy: 89.35%
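
The card does not state how accuracy is computed. One plausible reading is exact-match accuracy over predicted target names; the sketch below illustrates that reading only. The function name and the whitespace stripping are assumptions, not the author's evaluation code.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # A prediction counts only if it matches the reference exactly.
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)
```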

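Training details

Random seed: 42
Planned epochs: 50
Completed epochs: 14
Early stopping triggered: yes
Final training loss: 1.3344
Final evaluation loss: 1.1981
Optimizer: adamw_bnb_8bit
Learning rate: 0.0005
Batch size: 512
Precision: fp16
DeepSpeed enabled: yes
Gradient accumulation steps: 1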
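
For readers who want to reproduce a similar setup, the values above can be written as keyword arguments using Hugging Face TrainingArguments parameter names. This is a sketch, not the author's training script. Whether 512 is a per-device or a global batch size is not stated, so the per-device mapping is an assumption. The DeepSpeed configuration and the early-stopping criterion are not given in the card and are left out.

```python
# Hyperparameters from the list above, keyed by Hugging Face
# TrainingArguments parameter names.
training_kwargs = {
    "seed": 42,
    "num_train_epochs": 50,              # planned; training stopped after 14
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 512,  # assumption: could be a global size
    "gradient_accumulation_steps": 1,
    "fp16": True,
    "optim": "adamw_bnb_8bit",
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
```

On a machine with a CUDA GPU and bitsandbytes installed, these could be passed as TrainingArguments(output_dir=..., **training_kwargs), where output_dir is whatever directory you choose.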

License

Specify your license here (e.g., Apache 2.0, MIT)

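Limitations and bias

The model was trained on a specific clinical field-mapping dataset.
Performance may differ on column names outside the training distribution.
Validate model outputs before relying on them in production.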
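
One simple way to validate outputs is to accept a prediction only if it belongs to a known set of target names, and to route everything else to manual review. The sketch below assumes you maintain such a set yourself. The function name and the case-insensitive matching are illustrative choices, not part of the model or its dataset.

```python
from typing import Iterable, Optional


def validate_prediction(predicted: str, allowed_targets: Iterable[str]) -> Optional[str]:
    """Return the canonical target name if the prediction is allowed, else None."""
    # Map lowercased names back to their canonical spelling.
    lookup = {name.lower(): name for name in allowed_targets}
    # Compare case-insensitively and ignore surrounding whitespace.
    return lookup.get(predicted.strip().lower())
```

A None result signals that the generated name is not in the schema and should not be written to a production mapping without review.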

Author: tsilva

text-generation transformers

Created: 2025-05-05 16:54:19+00:00

Updated: 2025-05-05 17:39:53+00:00

Files (20)

.gitattributes

README.md

added_tokens.json

artifacts/2025-05-05T17-01-24Z/2025-05-05T17-01-24Z.zip

artifacts/2025-05-05T17-01-24Z/config.json

artifacts/2025-05-05T17-01-24Z/requirements.txt

artifacts/2025-05-05T17-39-49Z/2025-05-05T17-39-49Z.zip

artifacts/2025-05-05T17-39-49Z/config.json

artifacts/2025-05-05T17-39-49Z/requirements.txt

config.json

evaluation_report.json

generation_config.json

merges.txt

model.safetensors

onnx/model.onnx ONNX

special_tokens_map.json

tokenizer.json

tokenizer_config.json

training_args.bin

vocab.json