README

language:
- ko
license: apache-2.0
tags:
- video-classification
- driver-behavior-detection
- swin-transformer
- video-swin
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: video-classification
model-index:
- name: driver-behavior-swin3d-t
  results:
  - task:
      type: video-classification
      name: Video Classification
    metrics:
    - type: accuracy
      value: 0.9805
      name: Accuracy
    - type: f1
      value: 0.9757
      name: Macro F1

Driver Behavior Detection Model (Epoch 7)

A model for detecting abnormal driver behavior, based on the Video Swin Transformer.

Model Description

Architecture: Video Swin Transformer Tiny (swin3d_t)
Backbone pretraining: Kinetics-400
Parameters: 27.85M
Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)
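Video decoders commonly yield frames in [T, H, W, C] order, whereas the input layout above is channels-first. The sketch below illustrates the reordering on plain nested lists. `thwc_to_cthw` and `shape_of` are hypothetical helpers written for this example and are not part of `model.py`; with an array library the same step is a single axis transpose.

```python
def thwc_to_cthw(clip):
    """Reorder one clip from [T][H][W][C] nested lists to [C][T][H][W].

    Hypothetical helper for illustration only. The batch dimension B is
    added afterwards by wrapping the result in one more list.
    """
    frames = len(clip)
    height = len(clip[0])
    width = len(clip[0][0])
    channels = len(clip[0][0][0])
    return [
        [
            [[clip[t][h][w][c] for w in range(width)] for h in range(height)]
            for t in range(frames)
        ]
        for c in range(channels)
    ]


def shape_of(nested):
    """Return the shape of a regular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# Tiny stand-in clip: 2 frames of 1x2 pixels with 3 channels each.
tiny_clip = [
    [[[10, 11, 12], [13, 14, 15]]],
    [[[20, 21, 22], [23, 24, 25]]],
]
reordered = thwc_to_cthw(tiny_clip)
print(shape_of(tiny_clip), "->", shape_of(reordered))
```

A real clip would have shape (30, 224, 224, 3) before the reordering and (3, 30, 224, 224) after it.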

Classes (5)

Label	Class	F1 Score
0	Normal	0.97
1	Drowsy driving	0.99
2	Searching for objects	0.96
3	Phone use	0.96
4	Driver assaulted	1.00
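To turn a model output into one of the labels above, a decoding step along the following lines may help. This is a minimal sketch that assumes the model returns one raw logit per class in label order; the English class names are translations, and `decode` is a hypothetical helper rather than something shipped in the repository.

```python
import math

# Label indices follow the class table; the names are English translations.
CLASS_NAMES = {
    0: "normal",
    1: "drowsy driving",
    2: "searching for objects",
    3: "phone use",
    4: "driver assaulted",
}


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label index, class name, probability) for the top class."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, CLASS_NAMES[label], probs[label]


# Made-up logits for a single clip, for illustration only.
example_logits = [0.2, 3.1, -0.5, 0.4, -1.0]
print(decode(example_logits))
```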

Performance (Epoch 7)

Metric	Value
Accuracy	98.05%
Macro F1	0.9757
Validation samples	1,371,062
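Macro F1 is the unweighted mean of the per-class F1 scores. As a quick consistency check, the sketch below averages the rounded per-class values from the class table; because those inputs are rounded to two decimals, the result can only be expected to match the reported 0.9757 to within about 0.005.

```python
# Per-class F1 scores as listed in the class table (rounded to two decimals).
PER_CLASS_F1 = {0: 0.97, 1: 0.99, 2: 0.96, 3: 0.96, 4: 1.00}


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores."""
    scores = list(per_class_f1.values())
    return sum(scores) / len(scores)


print(f"{macro_f1(PER_CLASS_F1):.4f}")
```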

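Training Configuration

Parameter	Value
Hardware	2x NVIDIA RTX A6000 (48 GB)
Distributed training	DDP (DistributedDataParallel)
Batch size	32 (16 per GPU x 2 GPUs)
Gradient accumulation	4 steps
Effective batch size	128
Optimizer	AdamW (lr=1e-3, weight decay=0.05)
Scheduler	OneCycleLR
Mixed precision	FP16
Loss function	Cross-entropy with label smoothing (0.1)
Regularization	Mixup (alpha=0.4), Dropout (0.3)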
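The arithmetic behind the effective batch size, and the soft targets that label smoothing and Mixup produce, can be sketched as follows. This is an illustration under assumptions: smoothing uses the uniform-mixture form (as in PyTorch's `CrossEntropyLoss(label_smoothing=...)`), the Mixup weight is drawn from Beta(alpha, alpha), and smoothing is applied before mixing. The actual training script is not part of this card, so its details may differ.

```python
import random

PER_GPU_BATCH = 16
NUM_GPUS = 2
ACCUM_STEPS = 4
NUM_CLASSES = 5
SMOOTHING = 0.1
MIXUP_ALPHA = 0.4


def effective_batch(per_gpu, gpus, accum_steps):
    """Number of samples contributing to one optimizer step."""
    return per_gpu * gpus * accum_steps


def smoothed_target(label, num_classes=NUM_CLASSES, eps=SMOOTHING):
    """One-hot target blended with a uniform distribution of weight eps."""
    return [
        (1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
        for k in range(num_classes)
    ]


def mixup_targets(label_a, label_b, lam):
    """Convex combination of two smoothed targets with mixing weight lam."""
    target_a = smoothed_target(label_a)
    target_b = smoothed_target(label_b)
    return [lam * a + (1.0 - lam) * b for a, b in zip(target_a, target_b)]


rng = random.Random(0)  # fixed seed so the example is repeatable
lam = rng.betavariate(MIXUP_ALPHA, MIXUP_ALPHA)  # lam ~ Beta(alpha, alpha)

print(effective_batch(PER_GPU_BATCH, NUM_GPUS, ACCUM_STEPS))
print(smoothed_target(1))
print(mixup_targets(1, 3, lam))
```

With alpha below 1, the Beta distribution concentrates near 0 and 1, so most mixed clips stay close to one of the two originals.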

Files

File	Size	Description
`pytorch_model.bin`	121 MB	PyTorch weights (FP32)
`model.onnx`	164 MB	ONNX model for mobile deployment
`config.json`	1.2 KB	Model configuration
`model.py`	6.9 KB	Model architecture code
`convert_coreml_macos.py`	2.2 KB	CoreML conversion script

Platform-Specific Usage

PyTorch (server/desktop)

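import torch
from model import DriverBehaviorModel  # defined in model.py from this repository

# Build the 5-class architecture; the fine-tuned weights come from the checkpoint below.
model = DriverBehaviorModel(num_classes=5, pretrained=False)

# Load the checkpoint on CPU; the state dict is stored under the "model" key.
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()  # switch to inference mode (disables dropout)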

iOS (CoreML)

Copy model.onnx to a macOS machine.
Run the conversion script:

python convert_coreml_macos.py

Add the generated DriverBehavior.mlpackage to your Xcode project.

Android (ONNX Runtime)

// build.gradle
implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'

// Kotlin
val session = OrtEnvironment.getEnvironment()
    .createSession(assetManager.open("model.onnx").readBytes())

val output = session.run(mapOf("video_input" to inputTensor))

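Preprocessing (all platforms)

Input shape: [1, 3, 30, 224, 224]  (batch, channels, frames, height, width)
Channel order: RGB
Normalization: (pixel / 255.0 - mean) / std
  - mean = [0.485, 0.456, 0.406]
  - std = [0.229, 0.224, 0.225]
Resize: 224x224 (bilinear interpolation)
Frames: 30 frames, uniformly sampled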
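The frame sampling and normalization steps above can be sketched in plain Python. The exact sampling rule is not specified here, so the evenly spaced, endpoint-inclusive rule below is an assumption, and both `uniform_frame_indices` and `normalize_pixel` are hypothetical helper names. Resizing is omitted because it depends on the image library in use.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)
NUM_FRAMES = 30


def uniform_frame_indices(total_frames, num_samples=NUM_FRAMES):
    """Pick num_samples indices evenly spread over [0, total_frames - 1].

    Assumed rule: endpoints included, nearest-index rounding. Clips shorter
    than num_samples repeat some indices; longer clips skip frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples == 1:
        return [0]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def normalize_pixel(rgb):
    """Apply (value / 255 - mean) / std to one 8-bit RGB pixel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


print(uniform_frame_indices(90))
print(normalize_pixel((255, 128, 0)))
```

In practice the same normalization would be applied to whole arrays at once; the per-pixel form here only makes the formula explicit.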

Dataset

Total videos: 243,979
Total samples (windows): 1,371,062
Window size: 30 frames
Stride: 15 frames
Resolution: 224x224
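With a 30-frame window and a 15-frame stride, consecutive samples overlap by half a window. The sketch below shows how such windows could be enumerated for one video and computes the average number of windows per video from the totals above. How the original pipeline treats videos shorter than one window, or leftover tail frames, is not stated, so dropping them here is an assumption.

```python
WINDOW = 30
STRIDE = 15
TOTAL_VIDEOS = 243_979
TOTAL_WINDOWS = 1_371_062


def window_starts(num_frames, window=WINDOW, stride=STRIDE):
    """Start indices of all full windows that fit inside a video."""
    if num_frames < window:
        return []  # assumption: videos shorter than one window are skipped
    return list(range(0, num_frames - window + 1, stride))


# A hypothetical 100-frame video; each window covers [start, start + 30).
starts = window_starts(100)
print(starts)

# Average number of windows per video implied by the dataset totals.
print(round(TOTAL_WINDOWS / TOTAL_VIDEOS, 2))
```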

Training Progress

Epoch	Accuracy	Macro F1
5	97.35%	0.9666
6	97.74%	0.9720
7	98.05%	0.9757

License

This model is for research use only.

Citation

@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}

koreashin/Driver_monitoring

Author: koreashin


Created: 2026-01-15 00:24:47+00:00

Updated: 2026-01-19 04:21:13+00:00
