README

Model Description

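This model predicts the probability that a URL points to a phishing site.
To learn what a phishing attack is, see the Wikipedia page:
https://en.wikipedia.org/wiki/Phishing (this is not a phishing link 😜)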

Model type: LinearSVM
Task: Binary classification
License: MIT
Repository: https://github.com/pirocheto/phishing-url-detection

Evaluation

Metric	Value
roc_auc	0.986844
accuracy	0.948568
f1	0.948623
precision	0.947619
recall	0.949629
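
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below only reuses the numbers listed above; it does not rerun the evaluation.

```python
# Values copied from the evaluation table above
precision = 0.947619
recall = 0.949629

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # prints 0.948623, matching the reported f1
```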

How to Get Started with the Model

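Using pickle in Python is discouraged: deserializing data carries security risks and can lead to code injection. Pickle files also lack portability across Python versions and interoperability with other languages. Read more about this in the Hugging Face documentation.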
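
To make the risk concrete, here is a minimal and harmless sketch of how a pickle stream can make the loader call an arbitrary function. The `Payload` class is hypothetical and written only for this illustration; a malicious file could name a far more dangerous callable than the one used here.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle stream tells the loader to call eval()."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair; the loader calls it on load
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading evaluates "6 * 7" instead of rebuilding a Payload instance
result = pickle.loads(data)
print(result)  # prints 42
```

The same mechanism applies to any loader built on pickle, which is why a pickled model should only be loaded from a source you trust.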

We recommend the ONNX model, which is safer. Besides being lighter and faster, it can be used from any language supported by ONNX Runtime.

Below are some examples to get you started. For other languages, refer to the ONNX documentation.

<details> <summary>Python - ONNX - [Recommended 👍]</summary>

import numpy as np
import onnxruntime
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.onnx"
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Initialize the ONNX Runtime session with the pre-trained model
sess = onnxruntime.InferenceSession(
    model_path,
    providers=["CPUExecutionProvider"],
)

urls = [
    "https://clubedemilhagem.com/home.php",
    "http://www.medicalnewstoday.com/articles/188939.php",
]
inputs = np.array(urls, dtype="str")

# Run the ONNX model on the input data; output [1] holds the class probabilities
results = sess.run(None, {"inputs": inputs})[1]

for url, proba in zip(urls, results):
    print(f"URL: {url}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f} %")
    print("----")

</details>
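
The Python example above prints a raw probability. If a yes/no decision is needed, one option is to compare the phishing probability with a cutoff. The following is a minimal sketch, assuming each row has the form `[p_legitimate, p_phishing]` as in the example; the `label_urls` helper, the 0.5 threshold, and the sample values are illustrative assumptions, not part of the model.

```python
def label_urls(urls, probas, threshold=0.5):
    """Pair each URL with a label based on its phishing probability.

    `probas` is assumed to hold rows of [p_legitimate, p_phishing].
    `threshold` is a hypothetical cutoff; tune it for your own use case.
    """
    labels = []
    for url, proba in zip(urls, probas):
        label = "phishing" if proba[1] >= threshold else "legitimate"
        labels.append((url, label))
    return labels


# Made-up probabilities for illustration only (not real model output)
example_urls = ["https://example.com/a", "https://example.com/b"]
example_probas = [[0.10, 0.90], [0.80, 0.20]]
labels = label_urls(example_urls, example_probas)
```

Lowering the threshold flags more URLs at the cost of more false alarms; raising it does the opposite.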

<details> <summary>NodeJS - ONNX - [Recommended 👍]</summary>

const ort = require('onnxruntime-node');

async function main() {
    
    try {
        // Make sure model.onnx has been downloaded
        // Create an ONNX inference session with the specified model
        const model_path = "./model.onnx";
        const session = await ort.InferenceSession.create(model_path);

        const urls = [
            "https://clubedemilhagem.com/home.php",
            "http://www.medicalnewstoday.com/articles/188939.php",
        ]
        
        // Create an ONNX tensor from the input data
        const tensor = new ort.Tensor('string', urls, [urls.length,]);
        
        // Run the inference session with the input tensor
        const results = await session.run({"inputs": tensor});
        const probas = results['probabilities'].data;
        
        // Display the result for each URL
        urls.forEach((url, index) => {
            const proba = probas[index * 2 + 1];
            const percent = (proba * 100).toFixed(2);
            
            console.log(`URL: ${url}`);
            console.log(`Likelihood of being a phishing site: ${percent}%`);
            console.log("----");
        });

    } catch (e) {
        console.log(`Failed to run inference on the ONNX model: ${e}.`);
    }
};

main();

</details>
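
The expression `index * 2 + 1` in the NodeJS snippet above works because `.data` is a flat array. Assuming the probabilities tensor has shape `[n, 2]` in row-major order, row `i` occupies positions `2 * i` and `2 * i + 1`, and the second one is the phishing probability. A sketch of the same indexing in Python, with made-up values and a hypothetical `phishing_probas` helper:

```python
def phishing_probas(flat, n_classes=2, phishing_index=1):
    """Pick the phishing-class probability of each row from a flat, row-major buffer."""
    n_rows = len(flat) // n_classes
    return [flat[i * n_classes + phishing_index] for i in range(n_rows)]


# Made-up flat buffer for two URLs: [p_legit_0, p_phish_0, p_legit_1, p_phish_1]
flat = [0.1, 0.9, 0.8, 0.2]
result = phishing_probas(flat)
print(result)  # prints [0.9, 0.2]
```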

<details> <summary>JavaScript - ONNX - [Recommended 👍]</summary>

<!DOCTYPE html>
<html>
  <head>
    <title>Get Started with JavaScript</title>
  </head>
  <body>
    <!-- Import ONNXRuntime Web from a CDN -->
    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
    <script>
      // Use an async context to call the onnxruntime functions
      async function main() {
        try {
          const model_path = "./model.onnx";
          const session = await ort.InferenceSession.create(model_path);

          const urls = [
          "https://clubedemilhagem.com/home.php",
          "http://www.medicalnewstoday.com/articles/188939.php",
          ];

          // Create an ONNX tensor from the input data
          const tensor = new ort.Tensor("string", urls, [urls.length]);

          // Run the inference session with the input tensor
          const results = await session.run({ inputs: tensor });
          const probas = results["probabilities"].data;

          // Display the result for each URL
          urls.forEach((url, index) => {
            const proba = probas[index * 2 + 1];
            const percent = (proba * 100).toFixed(2);

            document.write(`URL: ${url} <br>`);
            document.write(
              `Likelihood of being a phishing site: ${percent} % <br>`
            );
            document.write("---- <br>");
          });
        } catch (e) {
          document.write(`Failed to run inference on the ONNX model: ${e}.`);
        }
      }
      main();
    </script>
  </body>
</html>

</details>

<details> <summary>Python - Pickle - [Not recommended ⚠️]</summary>

import joblib
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.pkl"

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

urls = [
    "https://clubedemilhagem.com/home.php",
    "http://www.medicalnewstoday.com/articles/188939.php",
]

# Load the downloaded model with joblib
model = joblib.load(model_path)

# Predict the class probabilities for each URL
probas = model.predict_proba(urls)

for url, proba in zip(urls, probas):
    print(f"URL: {url}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f} %")
    print("----")

</details>

pirocheto/phishing-url-detection

Author: pirocheto

text-classification sklearn

Created: 2023-11-17 18:36:30+00:00

Updated: 2023-12-01 09:59:20+00:00

Files (5)

.gitattributes

README.md

config.json

model.onnx ONNX

model.pkl