# DAD-3DHeads Benchmark
This is the official repository for [DAD-3DHeads Benchmark](https://www.pinatafarm.com/research/dad-3dheads/) evaluation.
DAD-3DHeads is a novel benchmark with an evaluation protocol for the quantitative assessment of dense 3D head fitting, i.e. 3D head estimation from dense annotations.
## Evaluation metrics
Given a single monocular image, the aim is to densely align a 3D head to it.
The goodness-of-fit of the *predicted mesh* to the *pseudo ground-truth mesh* (from here on, *GT mesh*) provided in the [DAD-3DHeads dataset](https://www.pinatafarm.com/research/dad-3dheads/dataset) measures the quality of the pose fitting as well as face and head shape matching.
DAD-3DHeads Benchmark consists of 4 metrics:
1) **Reprojection NME**: normalized mean error of the reprojected 3D vertices onto the image plane, taking X and Y coordinates into account.
We use head bounding box size for normalization.
The metric is computed on [MultiPIE](https://www.researchgate.net/publication/240446286_Multi-PIE) 68 landmarks.
We resort to this classical configuration in order to cover the widest range of methods, as not all of them follow the same topology ([FLAME](https://flame.is.tue.mpg.de)) as the DAD-3DHeads dataset and DAD-3DNet do.
**To have this metric evaluated, your submission has to contain 68 predicted 2D landmarks.**
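As a rough illustration only (not the official implementation in `benchmark.py`), the metric boils down to a mean per-landmark Euclidean error divided by the head bounding box size; the exact bounding box definition used below is an assumption:
```python
import numpy as np

def reprojection_nme(pred_2d: np.ndarray, gt_2d: np.ndarray, bbox_size: float) -> float:
    """Mean Euclidean error of the 68 reprojected 2D landmarks, normalized by bbox size.

    pred_2d, gt_2d: (68, 2) arrays of X and Y image-plane coordinates.
    bbox_size: scalar size of the GT head bounding box (assumed here to be
               sqrt(width * height); the official definition is in benchmark.py).
    """
    per_landmark_error = np.linalg.norm(pred_2d - gt_2d, axis=1)  # (68,) pixel distances
    return float(per_landmark_error.mean() / bbox_size)
```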
2) **$Z_n$ accuracy**: a novel metric in our evaluation protocol.
As our annotation scheme is conditioned only upon the model prior and the reprojection onto the image, we cannot guarantee that the absolute depth values are as accurate as sensor data. We address this issue by measuring the *relative* depth as an ordinal value of the $Z$-coordinate.
For each of the $K$ vertices $v_i$ of the GT mesh, we choose the $n$ closest vertices $\{v_{i_1}, ..., v_{i_n}\}$, and calculate which of them are closer to (or further from) the camera. Then, for every predicted vertex $w_i$, we check whether this configuration is the same:
$$Z_n = \frac{1}{K}\frac{1}{n}\sum_{i=1}^K \sum_{j=1}^n\Big((v_i \succeq_z v_{i_j}) == (w_i \succeq_z w_{i_j})\Big).$$
We do so on the “head” subset of the vertices only (see Fig. 12 in the Supplementary of the [DAD-3DHeads paper](https://arxiv.org/abs/2204.03688)).
**To have this metric evaluated, the submission has to contain predicted mesh in FLAME topology (5023 3D vertices).**
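A simplified NumPy sketch of this computation is given below; it assumes FLAME-topology vertex arrays, a precomputed `head_idx` subset, and a `neighbor_idx` table of the $n$ nearest GT vertices per head vertex (the exact indices and $n$ live in the benchmark code):
```python
import numpy as np

def zn_accuracy(gt_verts: np.ndarray, pred_verts: np.ndarray,
                head_idx: np.ndarray, neighbor_idx: np.ndarray) -> float:
    """Fraction of (vertex, neighbor) pairs whose depth ordering matches the GT ordering.

    gt_verts, pred_verts: (5023, 3) vertices in FLAME topology.
    head_idx:     (K,) indices of the "head" subset of vertices.
    neighbor_idx: (K, n) indices of the n closest GT vertices for each head vertex.
    """
    gt_z = gt_verts[head_idx, 2][:, None]         # (K, 1) GT depth of each head vertex
    gt_order = gt_z >= gt_verts[neighbor_idx, 2]  # (K, n) GT ordering vs. its neighbors

    pred_z = pred_verts[head_idx, 2][:, None]
    pred_order = pred_z >= pred_verts[neighbor_idx, 2]

    return float((gt_order == pred_order).mean())
```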
3) **Chamfer distance**: as the $Z_n$ metric is valid only for predictions that follow the FLAME mesh topology, we add the Chamfer distance to measure the accuracy of the fit.
To ensure generalization to any number of predicted vertices, we measure a one-sided Chamfer distance from our GT mesh to the predicted one.
We align them by seven keypoint correspondences (see [RingNet](https://ringnet.is.tue.mpg.de) and the [NoW Benchmark](https://github.com/soubhiksanyal/now_evaluation) for reference), and compute the distances over the “face” subset of the vertices only (see Fig. 12 in the Supplementary).
**To have this metric evaluated, the submission has to contain the 7 aforementioned 3D landmarks** (see the image below, and refer to the [NoW Benchmark](https://github.com/soubhiksanyal/now_evaluation) for more details).
They are required for the rigid alignment between the predicted mesh and the GT mesh.
<p align="center">
<img src="images/landmarks_7_annotated.png">
</p>
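For intuition only, a one-sided Chamfer distance can be sketched with a KD-tree as below; the official evaluation relies on *kaolin*, and the rigid 7-point alignment is assumed to have already been applied to the predicted vertices:
```python
import numpy as np
from scipy.spatial import cKDTree

def one_sided_chamfer(gt_face_verts: np.ndarray, pred_verts_aligned: np.ndarray) -> float:
    """Mean distance from each GT "face" vertex to its nearest predicted vertex.

    gt_face_verts:      (M, 3) GT vertices restricted to the "face" subset.
    pred_verts_aligned: (N, 3) predicted vertices, already rigidly aligned to the
                        GT mesh via the 7 keypoint correspondences.
    """
    nearest_dist, _ = cKDTree(pred_verts_aligned).query(gt_face_verts, k=1)  # (M,) distances
    return float(nearest_dist.mean())
```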
4) **Pose error**: we measure the accuracy of the pose prediction based on rotation matrices:
$$Error_{pose} = \|I - R_{pred} R_{GT}^{T}\|_F$$
To compare the matrices $R_{pred}$ and $R_{GT}$, we calculate the difference rotation $R_{pred} R_{GT}^T$, and measure the Frobenius norm of the matrix $I - R_{pred} R_{GT}^T$.
**To have this metric evaluated, the submission has to contain a $3\times3$ rotation matrix.**
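This is a direct translation of the formula above (for reference only):
```python
import numpy as np

def pose_error(r_pred: np.ndarray, r_gt: np.ndarray) -> float:
    """Frobenius norm of I - R_pred @ R_gt^T for two 3x3 rotation matrices."""
    return float(np.linalg.norm(np.eye(3) - r_pred @ r_gt.T, ord="fro"))
```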
For more details, see the [DAD-3DHeads paper](https://arxiv.org/abs/2204.03688).
```
DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image
Tetiana Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiří Matas, Viktoriia Sharmanska
CVPR 2022
```
## Installation
Make sure you have DAD-3DHeads requirements installed (see **Installation** section in the ***README.md*** in the parent **DAD-3DHeads** folder).
Install [*kaolin*](https://kaolin.readthedocs.io/en/latest/notes/installation.html) by running the commands below:
```
git clone --recursive https://github.com/NVIDIAGameWorks/kaolin
cd kaolin
git checkout v0.12.0
python setup.py develop
```
Please check this [webpage](https://kaolin.readthedocs.io/en/latest/notes/installation.html) if you run into any trouble with *kaolin* installation.
## Evaluation
Download the DAD-3DHeads dataset from the [DAD-3DHeads project webpage](https://www.pinatafarm.com/research/dad-3dheads/dataset), and predict 3D faces for all validation/test images.
Your submission should be a `.json` file with the following contents:
```
{
    'item_ID': {
        '68_landmarks_2d': list (len 68) of lists (len 2) of floats - 68 predicted 2D landmarks,
        'N_landmarks_3d': list (arbitrary len) of lists (len 3) of floats - N predicted 3D landmarks,
        '7_landmarks_3d': list (len 7) of lists (len 3) of floats - 3D coordinates of 7 landmarks for rigid alignment,
        'rotation_matrix': list (len 3) of lists (len 3) of floats - 3x3 matrix
    },
    'item_ID': {...},
    ...
}
```
In other words, it should be a `dict` with the `item_ID`s as keys, and the corresponding predictions as values.
Each prediction is itself a dict with the keys (or a subset of them) `'68_landmarks_2d', 'N_landmarks_3d', '7_landmarks_3d', 'rotation_matrix'`, while the values are lists of lists of floats.
Please be careful to follow this exact file format and way of arranging your predictions.
Please see the `data/sample_submission.json` for the reference.
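For instance, a submission could be assembled along these lines (a hypothetical helper, not part of the benchmark code; only the key names are taken from the format above):
```python
import json

def build_prediction(landmarks_2d, vertices_3d, landmarks_7_3d, rotation_matrix):
    """Pack one image's predictions into the expected per-item dict.

    Keys may be omitted if you do not want the corresponding metric evaluated.
    """
    return {
        "68_landmarks_2d": [[float(x), float(y)] for x, y in landmarks_2d],       # (68, 2)
        "N_landmarks_3d": [[float(c) for c in v] for v in vertices_3d],           # (N, 3)
        "7_landmarks_3d": [[float(c) for c in p] for p in landmarks_7_3d],        # (7, 3)
        "rotation_matrix": [[float(c) for c in row] for row in rotation_matrix],  # (3, 3)
    }

def save_submission(predictions: dict, path: str = "my_submission.json") -> None:
    """predictions maps item_ID -> the per-item dict from build_prediction()."""
    with open(path, "w") as f:
        json.dump(predictions, f)
```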
To evaluate on the DAD-3DHeads validation set,
* generate the GT `.json` for the validation set:
  * run `python generate_gt.py <base_path>`
  * `<base_path>` is the path to the folder where you store `DAD-3DHeadsDataset`
* run `python benchmark.py <your_submission_path>`.
Note that GT annotations are only provided for the validation set. **In order to evaluate your model on the DAD-3DHeads test set, please submit the test set predictions to the following e-mail:**
* dad3dheads@gmail.com
## License
This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg
By using this code, you acknowledge that you have read the license terms, understand them, and agree to be bound by them.
If you do not agree with these terms and conditions, you must not use the code.
## Citing
The codebase for DAD-3DHeads Benchmark belongs to the [DAD-3DHeads project](https://www.pinatafarm.com/research/dad-3dheads/).
If you use the DAD-3DHeads Benchmark code and/or its evaluation results (implicitly or explicitly) for your research projects, please cite the following paper:
```
@inproceedings{dad3dheads,
title={DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image},
author={Martyniuk, Tetiana and Kupyn, Orest and Kurlyak, Yana and Krashenyi, Igor and Matas, Ji\v{r}\'{i} and Sharmanska, Viktoriia},
booktitle={Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}
```