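README

Face Parsing

Example image and output

Semantic segmentation model fine-tuned from nvidia/mit-b5 on CelebAMask-HQ for face parsing. For additional options, see the Transformers Segformer docs.

ONNX model for web inference contributed by Xenova.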

Usage in Python

The exhaustive list of labels can be extracted from config.json.

id	label	note
0	background	background
1	skin	skin
2	nose	nose
3	eye_g	eyeglasses
4	l_eye	left eye
5	r_eye	right eye
6	l_brow	left eyebrow
7	r_brow	right eyebrow
8	l_ear	left ear
9	r_ear	right ear
10	mouth	area between lips
11	u_lip	upper lip
12	l_lip	lower lip
13	hair	hair
14	hat	hat
15	ear_r	earring
16	neck_l	necklace
17	neck	neck
18	cloth	clothing
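As a minimal sketch of how the label list might be read from a config.json-style file: the excerpt below is a hand-written stand-in, not the real file, and it assumes the mapping lives under an `id2label` key with string ids (the way Transformers serializes configs). The helper name `load_id2label` is hypothetical.

```python
import json

# Hand-written excerpt in the assumed config.json shape; ids are JSON
# strings and the label names are taken from the table above.
CONFIG_EXCERPT = """
{
  "id2label": {
    "0": "background",
    "1": "skin",
    "2": "nose",
    "13": "hair"
  }
}
"""


def load_id2label(config_text: str) -> dict[int, str]:
    """Parse config text and return {class id: label name} with int keys."""
    config = json.loads(config_text)
    return {int(key): name for key, name in config["id2label"].items()}


id2label = load_id2label(CONFIG_EXCERPT)
# Reverse lookup, handy when selecting a class by name.
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[13])
print(label2id["skin"])
```

Once the model is loaded as in the example that follows, the same mapping should also be available as `model.config.id2label`, which avoids parsing the file by hand.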

import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# convenience expression for automatically determining the device
device = (
    "cuda"
    # device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps"
    # device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load the image processor and model
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on the image (gradients are not needed for prediction)
inputs = image_processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(logits,
                size=image.size[::-1], # H x W
                mode='bilinear',
                align_corners=False)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
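The label map produced above is an H x W array of class ids, so a single region can be isolated with a comparison. The following is a rough sketch under stated assumptions: the small array stands in for `labels_viz`, `ID2LABEL` is a subset of the table above, and the helper names `class_mask` and `label_pixel_counts` are hypothetical.

```python
import numpy as np

# Subset of the label table above, used only for this illustration.
ID2LABEL = {0: "background", 1: "skin", 13: "hair"}

# Stand-in for `labels_viz`: a tiny H x W array of class ids.
labels_viz = np.array([
    [0, 13, 13, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
])


def class_mask(label_map: np.ndarray, class_id: int) -> np.ndarray:
    """Boolean mask that is True where the pixel belongs to class_id."""
    return label_map == class_id


def label_pixel_counts(label_map: np.ndarray, id2label: dict) -> dict:
    """Count pixels per label name for the classes present in the map."""
    ids, counts = np.unique(label_map, return_counts=True)
    return {id2label.get(int(i), str(int(i))): int(c) for i, c in zip(ids, counts)}


hair_mask = class_mask(labels_viz, 13)
counts = label_pixel_counts(labels_viz, ID2LABEL)
hair_share = hair_mask.sum() / labels_viz.size  # fraction of pixels labeled hair

print(counts)
print(f"hair covers {hair_share:.1%} of the image")
```

A boolean mask like `hair_mask` can be passed directly to `plt.imshow` or used to index the original image array once both have the same height and width.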

Usage in the browser (Transformers.js)

import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

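// important to prevent errors since the model files are likely remote on HF hub
env.allowLocalModels = false;

// instantiate image segmentation pipeline with pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// image to segment (same example URL as in the Python usage above)
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";

// async inference since it could take a few seconds
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... }}
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... }}
//    ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}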
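For comparison with the Python usage above, a similar one-mask-per-label structure can be derived from an argmax label map. This is only an illustrative sketch: the tiny array stands in for a real label map, `split_into_masks` is a hypothetical helper, and the 0/255 uint8 encoding is an assumption chosen so each mask is viewable as a grayscale image, not a statement about how Transformers.js encodes its masks.

```python
import numpy as np

# Subset of the label table, used only for this illustration.
ID2LABEL = {0: "background", 1: "skin", 13: "hair"}

# Stand-in for an H x W label map such as `labels_viz`.
label_map = np.array([
    [0, 13, 13, 0],
    [0, 1, 1, 0],
])


def split_into_masks(label_map: np.ndarray, id2label: dict) -> list:
    """Return one {"label", "mask"} record per class present in the map."""
    records = []
    for class_id in np.unique(label_map):  # sorted ascending by class id
        mask = np.where(label_map == class_id, 255, 0).astype(np.uint8)
        records.append({"label": id2label[int(class_id)], "mask": mask})
    return records


masks = split_into_masks(label_map, ID2LABEL)
for record in masks:
    covered = int((record["mask"] > 0).sum())
    print(f"Found {record['label']} ({covered} pixels)")
```

Only classes that actually occur in the label map produce a record, so the length of the list varies from image to image.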

p5.js

Since p5.js uses an animation loop abstraction, we need to take care when loading the model and making predictions.

// ...

// asynchronously load transformers.js and instantiate the model
async function preload() {
  // load the transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...

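Full p5.js example

Model Description

Developed by: Jonathan Dinu
Model type: Transformer-based semantic segmentation image model
License: non-commercial research and educational purposes
Resources for more information: Transformers docs on Segformer and/or the original research paper.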

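Limitations and Bias

Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate societal biases. The CelebAMask-HQ dataset used for fine-tuning is large but not necessarily diverse or representative. Also, the images are of... just celebrities.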

jonathandinu/face-parsing

By jonathandinu

image-segmentation transformers

↓ 318K ♥ 209

Created: 2022-07-06 01:22:42+00:00

Updated: 2026-02-18 22:38:54+00:00
