Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation

Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo, Shingo Murata, Mehdi Nourbakhsh, and Yoshito Tsuji

Autodesk Research, United States

Obayashi AI Design Lab, Japan

Abstract

Volumetric design is the first and critical step for professional building design, where architects not only depict the rough 3D geometry of the building but also specify the programs to form a 2D layout on each floor. Though 2D layout generation for a single story has been widely studied, there is no developed method for multi-story buildings. This paper focuses on volumetric design generation conditioned on an input program graph. Instead of outputting dense 3D voxels, we propose a new 3D representation named voxel graph that is both compact and expressive for building geometries. Our generator is a cross-modal graph neural network that uses a pointer mechanism to connect the input program graph and the output voxel graph, and the whole pipeline is trained using the adversarial framework. The generated designs are evaluated qualitatively by a user study and quantitatively using three metrics: quality, diversity, and connectivity accuracy. We show that our model generates realistic 3D volumetric designs and outperforms previous methods and baselines.

1. Introduction

Volumetric design (also called massing design or schematic design) is the first step when an architect designs a building on a given land site. Based on the local building codes applied to the site, the building can only be designed within a valid design space, which is usually not a regular cuboid. For instance, daylight restrictions impose a slanted line as an upper bound so that the building does not cast too much shadow over its neighbors. Within the valid design space, a volumetric design not only depicts the volumetric 3D shape of the building, but also produces 2D program layouts for each story. An example is illustrated in Figure 2. The architect then uses the finalized volumetric design to gradually develop all the details for construction, including façade design, interior design, structure systems, etc. While volumetric design is the foundation of the design and construction process, making a good volumetric design usually requires a significant amount of time and effort. An efficient pipeline for generating volumetric designs would therefore have a great impact on the architecture and construction industry.

Figure 1. Our model takes in a program graph (also called bubble diagram) and a design space in voxel graph representation, and outputs a variety of volumetric designs. Professional architects can convert the output into detailed building design efficiently.

Contributed equally.

Generating realistic 2D room layouts has been a popular topic for many years. Existing methods include optimization-based [14, 1] and learning-based [28, 17, 11, 5] approaches. Recently, researchers have started looking at how to integrate program graphs into layout generation tasks using graph neural networks (GNNs) [17, 11, 5]. A program graph, also called a bubble diagram, is a graph that illustrates the relations between programs or rooms and is a common representation used by professional architects to explore design ideas. Similar to House-GAN [17], this paper also focuses on the graph-conditioned layout generation task. The task requires the output layouts to be compatible with the input program graphs on which they are conditioned. However, there is no literature on extending the task to 3D. Our goal is to produce multiple layouts, which stack up and form a volumetric design for a multi-story building.

Figure 2. Left: an example of valid design space. Right: an example of volumetric design within the valid design space.

arXiv:2104.13316v1 [cs.LG] 27 Apr 2021

Though it might seem straightforward to transfer previous 2D approaches to 3D, there are several challenges and limitations when applying previous approaches:

• Compared to their 2D counterparts, 3D program graphs are not only larger in size, but also more complex, with additional inter-story relations. The output design space also grows with the number of stories.

• The raw rasterized output used in previous works cannot produce clean corners and edges due to the fine discretization of pixels. For instance, boundaries are usually jagged, rooms can be poorly aligned or overlap each other, and some rooms may have small dents or bulges.

• Volumetric images (usually defined as 3D regular grids with uniformly discretized voxels) are structurally closer to rectangular buildings than other 3D representations, such as point clouds or meshes. However, this dense representation is neither computationally nor memory efficient for polygonal rooms. Moreover, voxels that lie within the regular grid but outside the irregular valid design space consume unneeded memory and computation.

To overcome these challenges and limitations, we propose the voxel graph, a novel 3D representation that can encode irregular voxel grids with non-uniform space partitioning. To bridge the input program graph and the output voxel graph, we design a pointer-based cross-modal module in our generative adversarial graph network. The pointer module can be used not only for message passing, but also as a decoder that outputs a probability distribution over a dynamic set of valid programs.

We also work with professional architects to create a synthetic dataset that contains 120,000 volumetric designs based on realistic building requirements. We evaluate our model qualitatively and quantitatively, and it outperforms existing methods by a large margin in all three metrics: quality, diversity, and connectivity accuracy.

In summary, our main contributions are: 1) a new 3D representation, the voxel graph; 2) a graph-conditioned generative adversarial network (GAN) using GNNs and a pointer-based cross-modal module; 3) an automated pipeline to generate valid volumetric designs through simple interaction; and 4) a synthetic dataset that contains 120,000 volumetric designs and their corresponding program graphs. We will share the code, model, and dataset.

2. Related Work

2.1. Voxel Representations

Regular grid representations using voxels, such as occupancy grids, have been studied since the 3D extension of 2D convolution. To achieve 3D shape synthesis, researchers build encoder-decoder models, such as deep belief networks [29], variational auto-encoders (VAE) [12], generative adversarial networks (GAN) [27, 22], and energy-based models [30]. However, because it uses a dense representation for sparse occupancy, the voxel representation is notorious for its cubic computational cost and poor scalability to higher resolutions and larger sizes. Existing methods to mitigate the problem include sparse convolution [8, 7, 4] and octree representations [20, 25, 26].

Our proposed voxel graph combines voxel-based and graph-based representations by encoding voxels into graph nodes. A similar idea was proposed in Point-Voxel CNN [13]. To enhance local modeling capability, Point-Voxel CNN has a high-resolution point-based branch as well as a low-resolution voxel-based branch for point cloud encoding. Another feature of our voxel graph is the ability to support non-uniform space partitioning. Polyfit [15, 6] reconstructs 3D models by selecting space partition planes extracted from point clouds. BSP-Net [3] learns to generate compact meshes using binary space partitioning. NeuralSim and NeuralSizer [2] also use graphs to represent structure grids (i.e., columns and beams) of buildings instead of dense voxels.

Figure 3. Left: the hierarchical program graph. Right: the irregular grid with non-uniform voxel size and the equivalent voxel graph.

2.2. Graph-conditioned Layout Generation

To the best of our knowledge, there is no prior work on learning-based 3D layout generation. We therefore review several works on graph-conditioned 2D layout generation. Graph2Plan [11] generates bounding boxes for each room, and refines box locations with a cascaded refinement network. The input graphs are retrieved based on user constraints and outline similarity. The user can get various layouts by feeding different graphs, but the model itself cannot produce variation. House-GAN [17] proposes a graph-conditioned GAN, where the generator and discriminator are built upon a relational architecture, ConvMPN [31]. Xinhan Di et al. [5] use a similar adversarial approach on interior design with doors, windows, and furniture. LayoutGMN [19] learns to predict structural similarity between two layouts with an attention-based graph matching network. Wamiq Para et al. [18] explore the idea of generative modeling using constraint generation for layouts.

3. Representation and Data Collection

The goal of this paper is to generate 3D volumetric designs given a program graph and a valid design space. The program graph illustrates the intra-story and inter-story relations between programs. Besides the program graph and the valid design space, architects consider other design conditions in industry practice. The floor area ratio (FAR), derived by dividing the total floor area of the building by the total area of the parcel, should not exceed a regulation limit. In addition, the target program ratio (TPR) defines the approximate ratio between programs, for example, office : corridor : restroom : elevator : stairs = 50 : 20 : 15 : 5 : 10. Both TPR and FAR are encoded into the program graph as described in Section 3.2 and are used as the model input.
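As a small worked example of the two design conditions, the sketch below computes a FAR and normalizes a TPR. The function names and the parcel and story areas are hypothetical values chosen only for illustration; the program ratio reuses the 50 : 20 : 15 : 5 : 10 example from the text.

```python
def floor_area_ratio(story_areas, parcel_area):
    """FAR = total floor area of the building / area of the parcel."""
    return sum(story_areas) / parcel_area


def program_ratio(program_areas):
    """Normalize per-program areas (or weights) so the ratios sum to 1."""
    total = sum(program_areas.values())
    return {name: area / total for name, area in program_areas.items()}


# Hypothetical 3-story building on a 1600 m^2 parcel.
far = floor_area_ratio([800.0, 800.0, 400.0], parcel_area=1600.0)
tpr = program_ratio(
    {"office": 50, "corridor": 20, "restroom": 15, "elevator": 5, "stairs": 10}
)
print(far)            # 1.25
print(tpr["office"])  # 0.5
```

A design satisfies the FAR condition when the computed value does not exceed the regulation limit for the site.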

Another input is a valid design space, which may be irregular due to building codes. The design space can be further partitioned freely based on the architect's decisions or statistical heuristics. In practice, before starting the design process, architects usually partition the space by considering construction standards, structure systems, and conventional modules. Inspired by this partitioning process, we devise the voxel graph representation described in Section 3.3.

3.1. Data Collection

Since there is no publicly available dataset for volumetric designs from real buildings, we create a synthetic dataset with 120,000 volumetric designs for commercial buildings using parametric models. The site of each design is bounded within 40 × 40 × 50 m³, where different site conditions are randomly generated. The heuristics behind the parametric models are based on the rules and knowledge provided by professional architects. Although these parametric models are able to explore possible volumetric designs, they are not capable of fitting the constraints. Therefore, we generate the designs first and then compute the voxel graph, program graph, FAR, and TPR for each design. Please refer to the supplementary material for more details and visualization of the synthetic dataset. The dataset can also be used to explore other learning-based design tools or relevant tasks in computer vision and graphics.

3.2. Hierarchical Program Graph

Given a building datum, we first construct 2D program graphs for each story. Each program node feature includes the program type and the story level. Here, we consider 6 program types: lobby/corridor, restroom, stairs, elevator, office, and mechanical room. A program edge indicates that two programs are connected by a door or opening. To construct the 3D program graph, we stack all 2D program graphs and chain the stairs and elevators, since they are the only paths for moving vertically. In practice, the 3D program graph also represents the circulation of the building.

Recall that there are two other design condition inputs: FAR and TPR. The FAR limit is stored as a graph-level feature. As for TPR, we add one level of hierarchy on top of the 3D program graph. We create one master program node for each program type and connect it to all program nodes of the same type. These edges allow the master node to allocate different area sizes to each program node through message passing. Please refer to the left of Figure 3.
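The construction described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' data pipeline: the function name, the input layout (one `(program_types, edges)` pair per story), and the rule of linking every same-type stairs or elevator node between adjacent stories are all hypothetical simplifications.

```python
from collections import defaultdict

VERTICAL = {"stairs", "elevator"}  # programs that connect adjacent stories


def build_hierarchical_graph(stories):
    """stories: one (program_types, edges) pair per story, where edges
    index into that story's program_types list."""
    nodes, edges, by_story = [], [], []
    for level, (types, local_edges) in enumerate(stories):
        ids = []
        for t in types:
            ids.append(len(nodes))
            nodes.append({"type": t, "story": level})
        edges += [(ids[a], ids[b]) for a, b in local_edges]
        by_story.append(ids)
    # Chain stairs and elevators between consecutive stories
    # (assumption: every same-type pair on adjacent stories is linked).
    for lower, upper in zip(by_story, by_story[1:]):
        for a in lower:
            for b in upper:
                same = nodes[a]["type"] == nodes[b]["type"]
                if same and nodes[a]["type"] in VERTICAL:
                    edges.append((a, b))
    # One master node per program type, linked to every node of that type.
    clusters = defaultdict(list)
    for i, node in enumerate(nodes):
        clusters[node["type"]].append(i)
    master_edges = [(t, i) for t, members in clusters.items() for i in members]
    return nodes, edges, dict(clusters), master_edges


stories = [
    (["lobby/corridor", "stairs", "office"], [(0, 1), (0, 2)]),
    (["lobby/corridor", "stairs", "office", "restroom"], [(0, 1), (0, 2), (0, 3)]),
]
nodes, edges, clusters, master_edges = build_hierarchical_graph(stories)
print(edges)               # intra-story edges plus the stairs chain (1, 4)
print(clusters["stairs"])  # [1, 4]
```

The `clusters` dictionary plays the role of the node cluster of a program type: it records which program nodes each master node is connected to.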

3.3. Voxel Graph

To overcome the challenges and limitations listed in Section 1, we devise a 3D representation called the voxel graph. Each node represents a voxel, and the voxel information (coordinate and dimension) is stored as node features. Unlike volumetric images with voxel grids, the voxel graph does not assume regular grids and consumes memory only for occupied voxels. Moreover, it allows non-uniform space partitioning, which avoids the over-discretization caused by a uniform voxel size.

Theoretically, voxel nodes can encode arbitrary 3D primitives, but in this paper, only cuboids of varying sizes are used to build up the approximated valid design space. When parsing the data, the space partition is defined by the projection of all 2D layouts. In real-world practice, walls tend to align across different stories for structural stability or construction considerations, which reduces the number of voxels in the space partition. Next, we turn the voxels into graph nodes and store the voxel information (location and dimension) as node features and the program type as node labels. A node mask is also stored to mark nodes that are left unused and do not have any program type. Lastly, a voxel edge connects two voxel nodes if they share a face. The final voxel graph should look like an irregular cubic lattice, as illustrated on the right of Figure 3.
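To make the face-sharing rule concrete, the following sketch builds a voxel graph from a list of axis-aligned cuboids of varying sizes. The tuple layout `(x, y, z, dx, dy, dz)`, the function names, and the brute-force pairwise check are assumptions made for illustration; a partially overlapping face is treated as a shared face here.

```python
def shares_face(a, b, tol=1e-9):
    """a, b: cuboids (x, y, z, dx, dy, dz), with (x, y, z) the minimum corner.
    Two cuboids share a face if they touch along exactly one axis and
    overlap with positive extent along the other two."""
    touching, overlapping = 0, 0
    for axis in range(3):
        a_lo, a_hi = a[axis], a[axis] + a[axis + 3]
        b_lo, b_hi = b[axis], b[axis] + b[axis + 3]
        if abs(a_hi - b_lo) < tol or abs(b_hi - a_lo) < tol:
            touching += 1
        elif min(a_hi, b_hi) - max(a_lo, b_lo) > tol:
            overlapping += 1
    return touching == 1 and overlapping == 2


def build_voxel_graph(cuboids):
    """Return node features (location, dimension) and face-sharing edges."""
    nodes = [{"position": c[:3], "dimension": c[3:]} for c in cuboids]
    edges = [
        (i, j)
        for i in range(len(cuboids))
        for j in range(i + 1, len(cuboids))
        if shares_face(cuboids[i], cuboids[j])
    ]
    return nodes, edges


# Four cuboids of non-uniform size on two stories (hypothetical values).
cuboids = [
    (0, 0, 0, 4, 4, 3),  # story 0, large voxel
    (4, 0, 0, 2, 4, 3),  # story 0, narrow voxel next to it
    (0, 0, 3, 4, 4, 3),  # story 1, directly above the first voxel
    (4, 4, 3, 2, 2, 3),  # story 1, touches the others only along edges
]
nodes, edges = build_voxel_graph(cuboids)
print(edges)  # [(0, 1), (0, 2)]
```

The pairwise check is quadratic in the number of voxels, which is acceptable for a sketch; the memory footprint still scales with the number of voxel nodes rather than with a dense bounding grid.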

4. Method

We formulate the framework as a graph-conditioned GAN. The generator is composed of two GNNs, one for the program graph and one for the voxel graph, connected by a cross-modal pointer module. The discriminator is composed of a GNN with two decoders that evaluate the design at both the building level and the story level. An overview of our model is illustrated in Figure 4.

4.1. Generator

4.1.1 Program GNN

Our generator starts with a program graph neural network to encode the input program graph. Denote the random program noise as $z_i$, the FAR limit as $F$, the feature of program node $i$ as $x_i$, the neighbors of node $i$ as $Ne(i)$, the node cluster of $i$'s program type as $Cl(i)$, the target program ratio of $i$'s program type as $r_{Cl(i)}$, a multi-layer perceptron as $\mathrm{MLP}$, mean pooling as $\mathrm{Mean}$, and the concatenation operator as $[\cdot, \cdot]$. We first map the node feature to the embedding space (1), then compute message passing $T$ times. In each message passing step, we compute the message from neighboring nodes (2) and mean pool all nodes with the same program type as the master node embedding (3). Lastly, we update the node embeddings with residual learning to avoid gradient vanishing (4). After $T = 5$ steps of message passing, the final embedding of program node $i$ is denoted as $x_i^T$.

$$x_i^0 = \mathrm{MLP}_{enc}([x_i, z_i, F]) \tag{1}$$

$$m_i^t = \frac{1}{|Ne(i)|} \sum_{j \in Ne(i)} \mathrm{MLP}_{message}([x_i^t, x_j^t]) \tag{2}$$

$$c_i^t = \mathrm{Mean}_{j \in Cl(i)}(\{x_j^t\}) \tag{3}$$

$$x_i^{t+1} = x_i^t + \mathrm{MLP}_{update}([x_i^t, m_i^t, c_i^t, r_{Cl(i)}, F]) \tag{4}$$
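One message passing step of Eqs. (2) to (4) can be sketched in plain Python as below. This is an illustrative sketch only: the single-layer `make_mlp` stand-in, the toy graph, and all variable names are assumptions, and the sketch assumes that the pooled master embedding from Eq. (3) is concatenated into the update input and that every node has at least one neighbor.

```python
import math
import random


def make_mlp(in_dim, out_dim, rng):
    """One tanh layer with random weights: a stand-in for a trained MLP."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda v: [math.tanh(sum(a * b for a, b in zip(row, v))) for row in w]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def program_gnn_step(x, neighbors, cluster, ratio, far, mlp_message, mlp_update):
    """x[i]: embedding of node i; neighbors[i]: ids in Ne(i);
    cluster[i]: ids in Cl(i); ratio[i]: target ratio of i's program type."""
    new_x = []
    for i, xi in enumerate(x):
        m = mean([mlp_message(xi + x[j]) for j in neighbors[i]])  # Eq. (2)
        c = mean([x[j] for j in cluster[i]])                      # Eq. (3)
        delta = mlp_update(xi + m + c + [ratio[i], far])          # Eq. (4)
        new_x.append([a + b for a, b in zip(xi, delta)])          # residual
    return new_x


rng = random.Random(0)
d = 4
x = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]
neighbors = [[1], [0, 2], [1]]     # a 3-node chain
cluster = [[0, 2], [1], [0, 2]]    # nodes 0 and 2 share a program type
ratio, far = [0.7, 0.3, 0.7], 2.0  # made-up TPR entries and FAR limit
mlp_message = make_mlp(2 * d, d, rng)
mlp_update = make_mlp(3 * d + 2, d, rng)
for _ in range(5):                 # T = 5 steps, as in the text
    x = program_gnn_step(x, neighbors, cluster, ratio, far, mlp_message, mlp_update)
```

Because the update is residual, an update network that outputs zeros leaves the embeddings unchanged, which is the property that helps gradients flow through many steps.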

4.1.2 Voxel GNN

The input voxel features $v_k$ and voxel noise $z_k$ are first encoded by the voxel GNN encoder. To better encode the story index, we choose positional encoding (PE) as proposed in [23] and add it to the processed embedding (5). Instead of appending the absolute coordinates to the voxel features, we use the relative displacements $p_k - p_l$ in message computation (6). Voxel node embeddings are updated with residual learning (7).

$$v_k^0 = \mathrm{MLP}_{enc}([v_k, z_k]) + PE(story_k) \tag{5}$$

$$n_k^t = \frac{1}{|Ne(k)|} \sum_{l \in Ne(k)} \mathrm{MLP}_{message}([v_k^t, v_l^t, p_k - p_l]) \tag{6}$$

$$v_k^{t+1} = v_k^t + \mathrm{MLP}_{update}([v_k^t, n_k^t]) \tag{7}$$
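The two ingredients that distinguish the voxel GNN, the story positional encoding in Eq. (5) and the relative displacement in Eq. (6), can be sketched as follows. The sinusoidal form follows the common formulation of [23]; the function names and the toy inputs are assumptions for illustration.

```python
import math


def positional_encoding(story, dim):
    """Sinusoidal encoding of an integer story index, in the style of [23]."""
    pe = []
    for i in range(0, dim, 2):
        angle = story / (10000 ** (i / dim))
        pe.append(math.sin(angle))
        if i + 1 < dim:
            pe.append(math.cos(angle))
    return pe


def message_inputs(k, embeddings, positions, neighbors):
    """Build the concatenated input [v_k, v_l, p_k - p_l] for each neighbor l."""
    inputs = []
    for l in neighbors[k]:
        displacement = [a - b for a, b in zip(positions[k], positions[l])]
        inputs.append(embeddings[k] + embeddings[l] + displacement)
    return inputs


embeddings = [[1.0, 2.0], [3.0, 4.0]]
positions = [(0, 0, 0), (4, 0, 0)]
neighbors = [[1], [0]]
print(positional_encoding(0, 4))                            # [0.0, 1.0, 0.0, 1.0]
print(message_inputs(0, embeddings, positions, neighbors))  # one input per neighbor
```

Using displacements rather than absolute coordinates means that translating the whole design space leaves the message inputs unchanged.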

4.1.3 Pointer-based Cross-Modal Module

After processing the program graph with the program GNN, the final embedding of the program nodes can be viewed as the virtual "blueprint" of a design. Therefore, it is necessary to "look" at this blueprint to generate the output. To bridge the program graph and the voxel graph, we introduce a pointer-based cross-modal module. Inspired by the applications [21, 16] of the Pointer Network [24] in natural language processing and mesh generation tasks, we construct a pointer module to achieve message passing between the voxel nodes and all the program nodes on the same story. We cannot use a fixed-length output to model the program type distribution since 1) different stories can have different numbers of program nodes to choose from, for example, one floor has five rooms and another has seven; and 2) if there are two program nodes with the same program type, such as two restrooms on the same floor, we want to differentiate between the two nodes.

The pointer module returns three terms: $mask_k^t$, $att_k^t$, and $v_k^{t+1}$ (8). $mask_k^t$ is used as a soft prediction of whether the voxel node $k$ is used or not (9). If it is not used, it is left unused and has no program type. Otherwise, $att_k^t$ is the attention distribution over the set of program nodes on the same floor (10, 11). An updated embedding $v_k^{t+1}$ is computed by the weighted sum of the program embeddings multiplied by the soft prediction $mask_k^t$ with residual learning (12).

$$mask_k^t, att_k^t, v_k^{t+1} = \mathrm{Pointer}(v_k^t, \{x_i^T\}) \tag{8}$$

$$mask_k^t = \sigma(\mathrm{MLP}(v_k^t)) \tag{9}$$

$$e_{k,i}^t = \theta^\top \tanh(W_x x_i^T + W_v v_k^t) \tag{10}$$

$$att_k^t = \mathrm{gumbel\_softmax}(e_k^t) \tag{11}$$

$$v_k^{t+1} = v_k^t + mask_k^t \sum_i att_{k,i}^t x_i^T \tag{12}$$

Figure 4. An overview of Building-GAN. Top: the Program GNN, Voxel GNN, and Cross-Modal Pointer Module for the generator. Bottom: the discriminator with the building and story level decoders.
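A single call of the pointer module, Eqs. (9) to (12), can be sketched in plain Python as follows. This is a hedged illustration rather than the released implementation: the parameter names `theta`, `W_x`, `W_v`, and `w_mask`, the use of a linear layer in place of the mask MLP, the temperature `tau`, and the assumption that program and voxel embeddings share one dimension are all choices made for the sketch.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matvec(W, v):
    return [dot(row, v) for row in W]


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = [
        (l - math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9)))) / tau
        for l in logits
    ]
    peak = max(noisy)
    exps = [math.exp(n - peak) for n in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def pointer(v_k, programs, theta, W_x, W_v, w_mask, tau, rng):
    """v_k: voxel embedding; programs: embeddings of the program nodes
    on the same story as voxel k (the list may have any length)."""
    mask = sigmoid(dot(w_mask, v_k))  # Eq. (9), linear stand-in for the MLP
    wv = matvec(W_v, v_k)
    e = [
        dot(theta, [math.tanh(a + b) for a, b in zip(matvec(W_x, x_i), wv)])
        for x_i in programs
    ]                                 # Eq. (10)
    att = gumbel_softmax(e, tau, rng)  # Eq. (11)
    mixed = [sum(a * x_i[j] for a, x_i in zip(att, programs)) for j in range(len(v_k))]
    v_next = [v + mask * m for v, m in zip(v_k, mixed)]  # Eq. (12)
    return mask, att, v_next


rng = random.Random(0)
d = 3
rand_vec = lambda: [rng.uniform(-1, 1) for _ in range(d)]
rand_mat = lambda: [rand_vec() for _ in range(d)]
theta, W_x, W_v, w_mask = rand_vec(), rand_mat(), rand_mat(), rand_vec()
for n_programs in (5, 7):  # stories with five and seven program nodes
    programs = [rand_vec() for _ in range(n_programs)]
    mask, att, v_next = pointer(rand_vec(), programs, theta, W_x, W_v, w_mask, 1.0, rng)
    print(n_programs, len(att))
```

The attention vector has one entry per program node on the story, so the same parameters handle a floor with five rooms and a floor with seven, and two restrooms on one floor receive separate entries.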

We experiment with different ways to integrate the pointer module. It can be placed after every several message passing steps in the voxel GNN. Our baseline model uses 12 steps of message passing and calls the pointer module once every 2 steps. Please refer to the supplementary material for the complete model and algorithm. Conceptually, these pointer modules should gradually improve the design. Note that the output $att_k^t$ indicates which program node the voxel node is associated with, and hence its program type, instead of merely predicting the program type.

4.2. Discriminator

Our discriminator is trained to distinguish whether a given design is generated by the generator or sampled from the dataset. We therefore adopt an architecture similar to the voxel GNN, but without the pointer modules. The program type predictions are concatenated to the encoded voxel node features. After $T = 12$ message passing steps, two separate decoders are used. A graph-level max-pooling decoder evaluates the design as a whole (13), while a story-level max-pooling decoder evaluates the per-story layouts individually (14).

$$D_{global} = \mathrm{MLP}^{dec}_{global}\big(\mathrm{Max}_{k}(\{v_k^T\})\big) \tag{13}$$

$$D_{story} = \mathrm{Mean}_{story\ s}\Big(\mathrm{MLP}^{dec}_{story}\big(\mathrm{Max}_{k \in s}(\{v_k^T\})\big)\Big) \tag{14}$$
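The two pooling decoders can be sketched as below, assuming element-wise max pooling over voxel embeddings followed by a scalar decoder, with the story-level scores averaged over stories. The function names are hypothetical, and `sum` is used as a trivial stand-in for the two decoder MLPs.

```python
def max_pool(vectors):
    """Element-wise maximum over a non-empty list of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]


def discriminator_scores(embeddings, story_of, dec_global, dec_story):
    """embeddings[k]: final embedding of voxel k; story_of[k]: its story index.
    dec_global and dec_story map a pooled vector to a scalar score."""
    global_score = dec_global(max_pool(embeddings))
    stories = {}
    for emb, s in zip(embeddings, story_of):
        stories.setdefault(s, []).append(emb)
    per_story = [dec_story(max_pool(group)) for group in stories.values()]
    story_score = sum(per_story) / len(per_story)
    return global_score, story_score


embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
story_of = [0, 0, 1]  # two voxels on story 0, one on story 1
scores = discriminator_scores(embeddings, story_of, dec_global=sum, dec_story=sum)
print(scores)  # (5.0, 3.5)
```

The global score sees the whole building at once, while the story score can penalize a single implausible floor even when the overall massing looks reasonable.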

4.3. Loss

We use the WGAN-GP [9] loss with the gradient penalty coefficient set to 10. The two decoder outputs from the discriminator are equally weighted. The gradient penalty is computed by linearly interpolating the cross-modal attention between real data and generated output, while fixing the voxel graph connectivity.
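For reference, the generic WGAN-GP critic objective of [9] can be written as follows. This is a sketch of the standard formulation, not a formula taken from this paper: $D$ stands for the equally weighted combination of the two decoder outputs, $a$ and $\tilde{a}$ for the real and generated cross-modal attention, and these symbol names are illustrative assumptions.

```latex
\begin{aligned}
\hat{a} &= \epsilon\, a + (1 - \epsilon)\, \tilde{a}, \qquad \epsilon \sim U[0, 1], \\
L_D &= \mathbb{E}\big[D(\tilde{a})\big] - \mathbb{E}\big[D(a)\big]
      + \lambda\, \mathbb{E}\Big[\big(\lVert \nabla_{\hat{a}} D(\hat{a}) \rVert_2 - 1\big)^2\Big],
      \qquad \lambda = 10 .
\end{aligned}
```

Only the attention is interpolated; the voxel graph connectivity is shared by the real and generated samples, so the interpolated point $\hat{a}$ is defined on the same graph.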

4.4. Evaluation Metric

We evaluate the generated designs in terms of quality, diversity, and connectivity accuracy. The quality and diversity of the output designs are evaluated with the Fréchet Inception Distance (FID) score [10]. The FID score has demonstrated high correlation with human judgement and has been widely used in many 2D and 3D studies. Our reference model is based on a larger version of 3D Descriptor Net [30]. We replace all convolution layers with 6 residual blocks due to the higher complexity of our data. Then we flatten the embedding to a 128-dimensional tensor using a convolution operation and pass it to a dense layer for loss computation. The FID score is measured over 10,000 samples. We also run a user study with architects to measure the quality in Section 5.5.

The connectivity accuracy (Con.) is measured by the number of program (room) connections observed in both the generated design and the program graph, divided by the number of edges in the program graph. Note that a connection is considered inaccurate only when two rooms are connected in the program graph but disconnected in the voxel graph, since there is then no shared wall in which to put a door. It is accurate when two rooms are connected in the voxel graph but
