README

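NCI Binary Propaganda Detector v2

This model is the first stage of the NCI (Narrative Control Index) two-stage propaganda detection pipeline. It performs binary classification to detect whether a text contains any propaganda technique.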

Model description

Model type: Binary text classifier
Base model: answerdotai/ModernBERT-base
Training data: synapti/nci-binary-classification (24,517 train, 1,727 validation, 1,729 test)
Language: English
License: Apache 2.0

Performance metrics

Metric	Value
Accuracy	99.4%
Precision	98.9%
Recall	100.0%
F1 score	99.4%
False positive rate	1.47%
False negative rate	0.00%

Confusion matrix (test set, n=1,729)

                         Predicted
                         No propaganda | Has propaganda
Actual no propaganda:              736 |             11
Actual has propaganda:               0 |            982
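
The headline metrics follow directly from the four counts in the confusion matrix. A minimal sketch that recomputes them (the variable names are illustrative, not taken from the training code):

```python
# Counts from the test-set confusion matrix (n = 1,729).
tn, fp = 736, 11   # actual no_propaganda: correctly rejected, falsely flagged
fn, tp = 0, 982    # actual has_propaganda: missed, correctly flagged

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)

print(f"accuracy={accuracy:.1%} precision={precision:.1%} recall={recall:.1%}")
print(f"f1={f1:.1%} fpr={false_positive_rate:.2%} fnr={false_negative_rate:.2%}")
```

Rounded to the same precision, these values match the reported metrics.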

Threshold analysis

Threshold	Accuracy	Precision	Recall	F1
0.3	99.2%	98.6%	100%	99.3%
0.4	99.2%	98.7%	100%	99.3%
0.5	99.4%	98.9%	100%	99.4%
0.6	99.7%	99.4%	100%	99.7%
0.7	99.7%	99.5%	100%	99.7%

Recommended threshold: 0.5 (default), or 0.6 to reduce false positives
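
Applying a non-default threshold means comparing the probability of the has_propaganda class against that threshold instead of taking the argmax. A minimal sketch in plain Python, assuming the two logits are ordered [no_propaganda, has_propaganda]; the helper names are hypothetical:

```python
import math

def propaganda_probability(logits):
    """Softmax over [no_propaganda, has_propaganda] logits; returns P(has_propaganda)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def classify(logits, threshold=0.5):
    """Label a text as has_propaganda only if its probability reaches the threshold."""
    prob = propaganda_probability(logits)
    label = "has_propaganda" if prob >= threshold else "no_propaganda"
    return label, prob

# A borderline example whose probability falls between 0.5 and 0.6.
logits = [0.0, 0.2]
print(classify(logits, threshold=0.5))
print(classify(logits, threshold=0.6))
```

With the stricter 0.6 threshold, the borderline example is no longer flagged, which is the mechanism behind the lower false positive rate in the table.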

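Training details

Loss function: Focal Loss (gamma=2.0, alpha=0.25) to handle class imbalance
Optimizer: AdamW, weight decay 0.01
Learning rate: 2e-5, warmup ratio 0.1
Batch size: 16 (effective 32 with gradient accumulation)
Epochs: 5, with early stopping (patience=3)
Best model selection: based on F1 score on the validation set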
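
To illustrate what the focal loss settings do, here is a per-example sketch of the common alpha-balanced formulation, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The actual training implementation is not shown in this card, so the assignment of alpha to the positive class and the clamping constant are assumptions:

```python
import math

def focal_loss(p_positive, target, gamma=2.0, alpha=0.25, eps=1e-12):
    """Alpha-balanced focal loss for a single example.

    p_positive: predicted probability of class 1 (has_propaganda)
    target:     true label, 0 or 1
    Assumption: alpha weights class 1 and (1 - alpha) weights class 0.
    """
    p_t = p_positive if target == 1 else 1.0 - p_positive
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.30, 1)  # wrong side of 0.5: keeps most of its loss
print(f"easy={easy:.6f} hard={hard:.6f} ratio={hard / easy:.0f}x")
```

The (1 - p_t)^gamma factor shrinks the contribution of well-classified examples, so the gradient is dominated by the harder ones.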

Usage

Using the Transformers pipeline

from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2"
)

result = detector("The radical left is DESTROYING our country!")
# [{"label": "has_propaganda", "score": 0.99}]

result = detector("The Federal Reserve announced a 0.25% rate increase.")
# [{"label": "no_propaganda", "score": 0.98}]
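
By default the pipeline returns only the top label and its score. To compare against a custom threshold you need the probability of has_propaganda regardless of which label won. A minimal sketch, assuming the two scores come from a softmax and therefore sum to 1; the helper name is hypothetical:

```python
def to_propaganda_probability(prediction):
    """Convert one pipeline result dict into P(has_propaganda)."""
    score = prediction["score"]
    if prediction["label"] == "has_propaganda":
        return score
    # Top label is no_propaganda, so the other class gets the remainder.
    return 1.0 - score

print(to_propaganda_probability({"label": "has_propaganda", "score": 0.99}))
print(to_propaganda_probability({"label": "no_propaganda", "score": 0.98}))
```

Usage with the detector above would be `to_propaganda_probability(detector(text)[0])`.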

Using AutoModel

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-binary-detector-v2")

text = "Wake up, people! They are hiding the truth from you!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    propaganda_prob = probs[0, 1].item()

print(f"Propaganda probability: {propaganda_prob:.2%}")

Two-stage pipeline (recommended)

For full propaganda analysis with technique identification:

from transformers import pipeline

# Stage 1: binary detection
binary_detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector-v2"
)

# Stage 2: technique classification
technique_classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier-v2",
    top_k=None
)

text = "Some text to analyze..."

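# Run stage 1. The score is the confidence of the predicted label;
# raise 0.5 (for example to 0.6) to reduce false positives.
binary_result = binary_detector(text)[0]
if binary_result["label"] == "has_propaganda" and binary_result["score"] >= 0.5:
    # Run stage 2 only when propaganda is detected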
    techniques = technique_classifier(text)[0]
    detected = [t for t in techniques if t["score"] >= 0.3]
    print(f"Detected techniques: {[t['label'] for t in detected]}")
else:
    print("No propaganda detected")
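
The gating logic above can be wrapped in a reusable function. The sketch below takes the two pipelines as arguments so it can be exercised with stand-in callables; the function name, the return format, and the technique labels in the stubs are hypothetical and not part of either model:

```python
def analyze(text, binary_detector, technique_classifier,
            binary_threshold=0.5, technique_threshold=0.3):
    """Run stage 1, then stage 2 only if propaganda is detected."""
    binary = binary_detector(text)[0]
    flagged = (binary["label"] == "has_propaganda"
               and binary["score"] >= binary_threshold)
    if not flagged:
        return {"has_propaganda": False, "techniques": []}
    scores = technique_classifier(text)[0]
    techniques = [s["label"] for s in scores if s["score"] >= technique_threshold]
    return {"has_propaganda": True, "techniques": techniques}


# Stand-in callables that mimic the pipeline output shapes, so the sketch
# runs without downloading models. The technique labels are made up.
def fake_binary(text):
    return [{"label": "has_propaganda", "score": 0.97}]

def fake_techniques(text):
    return [[{"label": "technique_a", "score": 0.81},
             {"label": "technique_b", "score": 0.12}]]

print(analyze("Some text to analyze...", fake_binary, fake_techniques))
```

In real use, pass the `binary_detector` and `technique_classifier` pipelines in place of the stubs.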

Labels

Label ID	Label name	Description
0	no_propaganda	Text contains no propaganda techniques
1	has_propaganda	Text contains one or more propaganda techniques

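Intended use

Primary use cases

Media literacy tools and browser extensions
Content moderation assistance
Research on information manipulation
Educational platforms for critical thinking

Out of scope

Censorship or automated content removal
Political targeting or surveillance
Serving as a single-source arbiter of truth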

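Limitations

Optimized for English text
Performance may degrade on very short texts (<10 words)
Trained mainly on political and news content; domain shift may affect performance
Should be used as one signal among several, not as the sole basis for a decision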
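
One way to act on the short-text limitation is to flag such inputs so that downstream code treats their scores with extra caution. A minimal sketch, assuming whitespace-separated tokens are an acceptable approximation of words; the helper name is hypothetical:

```python
def needs_caution(text, min_words=10):
    """Return True when the text is shorter than the length the model handles well."""
    return len(text.split()) < min_words

for sample in [
    "Wake up, people!",
    "The committee published its quarterly report on regional housing prices "
    "and mortgage lending trends this morning.",
]:
    print(needs_caution(sample), "-", sample)
```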

Citation

If you use this model, please cite:

@misc{nci-binary-detector-v2,
  author = {Synapti},
  title = {NCI Binary Propaganda Detector v2},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector-v2}
}

synapti/nci-binary-detector-v2

Author: synapti

text-classification transformers

Downloads: 62.2K, likes: 0

Created: 2025-12-10 13:25:06+00:00

Updated: 2025-12-10 15:07:08+00:00


Files (13)

.gitattributes

README.md

config.json

model.safetensors

onnx/config.json

onnx/model.onnx

onnx/special_tokens_map.json

onnx/tokenizer.json

onnx/tokenizer_config.json

special_tokens_map.json

tokenizer.json

tokenizer_config.json

training_args.bin