This page reproduces the content of http://www.slideshare.net/hirokiwaterfield/28th-cv-3.


- 28th CV Study Group @ Kanto (CV勉強会@関東)

コンピュータビジョン最先端ガイド5 (Computer Vision Cutting-Edge Guide, vol. 5)

Multi-View Stereo #3

Mar. 28, 2015

Hiroki Mizuno

1 - Scope of this talk

• 3-D reconstruction methods from multiple images

1. Introduction

2. System overview and practical notes

1. Image acquisition

2. Camera parameter estimation

3. Dense shape reconstruction

1. State-of-the-art MVS examples

3. Multi-View Stereo (多眼ステレオ)

1. Photo-consistency

2. Depth-map reconstruction

3. Mesh reconstruction from depth maps

4. Reconstruction results

4. Conclusion

2 - Dense shape reconstruction

• Output of Structure from Motion (SfM)

– Camera parameters

– A 3-D point cloud of inter-camera correspondences (feature points)

= sparse shape reconstruction

SfM

Borrowed from bundler's Examples

3 - Dense shape reconstruction

4 - Dense shape reconstruction

http://www.di.ens.fr/pmvs/gallery.html 5 - Dense shape reconstruction

http://www.di.ens.fr/pmvs/gallery.html 6 - Multi-View Stereo

• What is Multi-View Stereo?

– High-accuracy, high-density 3-D shape reconstruction from calibrated multi-view images

• Advantages

– High resolution

– Capture speed

– Cost

• Drawbacks

– Requires surface texture

– Real-time reconstruction is difficult

7 - Multi-View Stereo algorithm survey

• A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms

– CVPR 2006

– Authors :

• Steven M. Seitz

• Brian Curless

• James Diebel

• Daniel Scharstein

• Richard Szeliski

• A taxonomy of algorithms

• Benchmarks and rankings of reconstruction performance

8 - Multi-View Stereo taxonomy

• Scene representation

• Photo-consistency measure

• Visibility model

• Shape prior

• Reconstruction algorithm

• Initialization requirements

9 - Scene representation

• How the 3-D space being reconstructed is represented

– Volume

– Polygon Mesh

– Set of depth maps

10 - Scene representation

• Volume

– 3-D space divided into a regular grid

• Voxel

• Level-Set

Voxel

• Each grid cell stores a binary object occupancy

Level-Set

• Each grid cell stores the distance to the nearest surface
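As an illustrative sketch (not from the slides), the two volumetric encodings differ only in what each cell stores; the sphere and grid size below are arbitrary choices:

```python
import numpy as np

# Sample a 32^3 grid over the cube [-1, 1]^3.
n = 32
axis = np.linspace(-1.0, 1.0, n)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
r = np.sqrt(x**2 + y**2 + z**2)

# Level-set representation: each cell stores the signed distance
# to the nearest surface point (here a sphere of radius 0.5;
# negative inside the object, positive outside).
level_set = r - 0.5

# Voxel representation: each cell stores a binary occupancy,
# i.e. whether the cell center lies inside the object.
voxels = level_set < 0.0
```

The level-set form keeps sub-cell surface precision (the zero crossing can sit between grid points), which is why level-set MVS methods extract smoother meshes than plain voxel carving at the same resolution.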

11 - Scene representation

• Polygon Mesh

– A set of vertices and the faces connecting them

12 - Scene representation

• Set of depth maps

– The set of per-pixel depth values for each camera
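Illustratively (this code is not from the slides), a depth map plus pinhole intrinsics K yields one camera-space 3-D point per pixel; the intrinsics and constant-depth map below are made up:

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H x W, depth along the optical axis)
    to camera-space 3-D points using pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                   # normalized viewing rays
    return rays * depth[..., None]                    # scale each ray by its depth

# Hypothetical intrinsics and a flat scene 2 m in front of the camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
points = backproject(np.full((480, 640), 2.0), K)
```

The principal point (320, 240) backprojects to (0, 0, 2), i.e. straight down the optical axis, which is a quick sanity check on the intrinsics.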

13 - Photo-consistency measure

• How appearance agreement across images is measured

– Scene space

– Image space

• Reflectance model

– Most algorithms assume the Lambertian model

• Appearance is independent of viewpoint

• Shading depends only on the light source and the surface orientation

– Some recent algorithms (as of 2006) also handle BRDFs

14 - Photo-consistency measure

• Scene space

– Project points in the scene into each camera and compute photo-consistency

– Photo-consistency is commonly computed with SSD or NCC
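The two scores can be sketched directly (illustrative code, not from the survey); patches are small gray-level windows:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: 0 for identical patches,
    grows with any intensity difference."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized cross-correlation: +1 for patches identical up
    to an affine gain/bias in intensity, near 0 for unrelated ones."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

patch = np.random.rand(7, 7)                              # a toy 7x7 gray patch
assert ssd(patch, patch) == 0.0
assert abs(ncc(patch, 2.0 * patch + 3.0) - 1.0) < 1e-9    # gain/bias invariant
```

The gain/bias invariance of NCC is why it is preferred over SSD when exposure or lighting differs between views, at the cost of being undefined on constant (textureless) patches.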

(Slide 15 embeds Figure 11.3 and Section 11.1.1, "Rectification", from Szeliski, Computer Vision: Algorithms and Applications (Sept. 3, 2010 draft), p. 538. Figure 11.3, "Epipolar geometry": (a) the epipolar line segment corresponding to one ray; (b) the corresponding set of epipolar lines and their epipolar plane. The excerpt notes that the epipolar geometry of a camera pair follows from their relative pose and calibrations, can be computed from seven or more point matches via the fundamental matrix (or five or more for the calibrated essential matrix), and that rectifying (warping) the images so that corresponding horizontal scanlines are epipolar lines allows scanlines to be matched independently.)

15 - Photo-consistency measure

• Image space

– Project the camera images onto the scene and compute photo-consistency
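Image-space (plane-sweep) matching hinges on the plane-induced homography; a minimal numpy sketch under a hypothetical calibration, for a fronto-parallel plane Z = d in the reference frame:

```python
import numpy as np

def plane_homography(K_ref, K_src, R, t, d):
    """Homography induced by the fronto-parallel plane Z = d in
    the reference frame (normal n = e_z).  (R, t) maps reference
    camera coordinates to source camera coordinates, so a plane
    point X satisfies X_src = (R + t n^T / d) X."""
    n = np.array([0.0, 0.0, 1.0])
    return K_src @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)

# Made-up calibration: a camera translated 0.1 to the right sees a
# point on the d = 2 plane shifted by fx * tx / d = 5 pixels.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
H = plane_homography(K, K, np.eye(3), np.array([0.1, 0.0, 0.0]), 2.0)
x = H @ np.array([70.0, 60.0, 1.0])   # reference pixel (70, 60)
assert np.allclose(x[:2] / x[2], [75.0, 60.0])
```

Sweeping d through a range of hypotheses, warping each source image by its H(d), and scoring the agreement per pixel is exactly the plane-sweep construction in the embedded book excerpt.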

(Slide 16 embeds Figure 11.6 and the plane-sweep discussion from Szeliski, Computer Vision: Algorithms and Applications (2010 draft), Section 11.1, p. 541. Figure 11.6, "Sweeping a set of planes through a scene" (Szeliski and Golland 1999): (a) the set of planes seen from a virtual camera induces a set of homographies in any other source (input) camera image; (b) the warped images from all the other cameras can be stacked into a generalized disparity space volume I~(x, y, d, k) indexed by pixel location (x, y), disparity d, and camera k. Sweeping d through a series of disparity hypotheses maps each input image into the virtual camera through the family of homographies x~_k ~ (H~_k + t_k [0 0 d]) x~ (Eq. 11.3), whose members, parameterized by the addition of a rank-1 matrix, are related through a planar homology. The choice of virtual camera is application dependent; often one input camera is used as the reference, yielding a depth map registered to that image.)

16 - Visibility model

• How each camera's visibility (visible vs. occluded) is decided

– The occlusion problem

• Geometric

– Tackles occlusion rigorously

– Fundamentally a chicken-and-egg problem, so it is handled by, e.g., constraining the camera placement

• Quasi-geometric

– Uses approximate geometry

– Build a coarse reconstruction (e.g. a Visual Hull) first, then compute photo-consistency

• Outlier-based

– Treats occluded views as outliers and ignores them

– The approach described in "photo-consistency from multiple images" also falls into this class

17 - Shape prior

• Prior knowledge about the shape

– Photo-consistency alone fails

– Especially in textureless regions

• Minimal surface

– Assumes surfaces are smooth

– Struggles with high-curvature regions

– Level-set and mesh-based algorithms

• Maximal surface

– Space-carving-style approaches

– Stop as soon as a photo-consistent solution is found

– Can represent high curvature

– Reconstructions tend to come out larger overall

– Voxel coloring, space carving

• Image-based

– Depths of neighboring pixels are smooth

– 2D Markov Random Field
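The image-based prior is usually posed as an energy with per-pixel data costs plus a pairwise smoothness term. As an illustrative toy version (not from the slides), a single scanline with an |d_i - d_{i+1}| smoothness term can be minimized exactly by dynamic programming:

```python
import numpy as np

def scanline_dp(data_cost, smooth_weight):
    """Minimize sum_i data_cost[i, d_i] + w * sum_i |d_i - d_{i+1}|
    over one scanline by dynamic programming (Viterbi).
    data_cost: (num_pixels, num_labels) matching costs."""
    n, num_labels = data_cost.shape
    labels = np.arange(num_labels)
    cost = data_cost[0].astype(float)
    back = np.zeros((n, num_labels), dtype=int)
    for i in range(1, n):
        # pairwise |d - d'| smoothness between pixels i-1 and i
        pair = smooth_weight * np.abs(labels[:, None] - labels[None, :])
        total = cost[:, None] + pair            # (prev_label, cur_label)
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], labels] + data_cost[i]
    # backtrack the optimal labeling
    d = np.empty(n, dtype=int)
    d[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):
        d[i - 1] = back[i, d[i]]
    return d

# Toy data: every pixel prefers label 0 except a noisy one at index 2.
data_cost = np.ones((5, 3))
data_cost[:, 0] = 0.0
data_cost[2] = [1.0, 1.0, 0.0]
```

With the smoothness term on, the noisy pixel snaps back to its neighbors; with it off, the result degenerates to a per-pixel winner-take-all. Full 2-D MRFs use the same energy but need graph cuts or belief propagation, since exact DP only works on chains.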

18 - Reconstruction algorithm

• 3D volume

– Compute a cost function at every grid cell of a volume

– Then extract a surface

– Voxel coloring, volumetric MRF

• Evolving surface

– Gradually evolve a surface toward the solution

– Level-set, space carving

• Depth map

– Compute multiple depth maps independently, then merge them

• Feature points

– Perform a sparse reconstruction, then interpolate
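As an illustrative sketch of the depth-map route (not from the slides), depth maps already resampled into a common reference view can be merged per pixel with a robust median:

```python
import numpy as np

def fuse_depth_maps(depth_maps):
    """Per-pixel median over a stack of depth maps that have been
    resampled into a common reference view; NaN marks pixels with
    no estimate.  The median rejects isolated outlier depths that
    a single view may contribute."""
    return np.nanmedian(np.stack(depth_maps, axis=0), axis=0)

# Three toy 2x2 depth maps: one outlier pixel, one missing pixel.
d1 = np.ones((2, 2))
d2 = np.ones((2, 2)); d2[1, 1] = 9.0      # outlier in view 2
d3 = np.ones((2, 2)); d3[0, 1] = np.nan   # no estimate in view 3
fused = fuse_depth_maps([d1, d2, d3])
```

Real systems (e.g. the TV-L1 fusion discussed later) replace the plain median with visibility-aware voting, but the per-pixel robust-consensus idea is the same.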

19 - Initialization requirements

• What each algorithm needs at initialization

– Rough bounding box or volume

• Space carving

• Level-set (needs a high-quality initial estimate)

– Foreground/background segmentation

• Silhouettes

– Range of disparity or depth values

• Image-space algorithms

20 - Benchmark Datasets

(Slides 21–22 embed the "Multi-view data sets" section of Seitz et al., CVPR 2006, including Figure 1, "Multi-view datasets with laser-scanned 3D models" (temple, dino, bird, dogs), and Figure 2, "The 317 camera positions and orientations for the temple dataset", with the 47 ring cameras highlighted. The slide's own summary and the recoverable details:)

• Camera arrangement: 47 viewpoints (the temple ring dataset; 317 positions for the full hemisphere)

• Camera resolution: 640 x 480 (CCD camera on the Stanford spherical gantry, positioned on a one-meter-radius sphere to roughly 0.01-degree accuracy; one pixel spans roughly 0.25 mm on the object surface)

• Object sizes: Temple 10 x 16 x 8 cm, Dino 7 x 9 x 7 cm

• Ground truth: laser-scanned reference models (Cyberware Model 15 stripe scanner, 0.25 mm scan resolution, 0.05–0.2 mm accuracy; roughly 200 scans per object aligned and merged on a 0.25 mm grid)

22 - State-of-the-art MVS examples

• "Silhouette and stereo fusion for 3D object modeling"

– CVIU 2004

– ターンテーブルを使い、10度ごとに画像取得

– Visual Hul → Polygon Mesh復元

– レーザレンジセンサーレベルの復元に成功

Input Image

Reconstructed Model 頂点数 114,496点

Gouraud shading

Textured

23

Fig. 16. Reconstructions using our proposed approach. Left: one original image used in the

reconstruction. Middle: Gouraud shading reconstructed models (45843, 83628 and 114496

vertices respectively). Right: textured models.

23 - State-of-the-art MVS examples

• "A Global y Optimal Algorithm for Robust TV-L1"

– ICCV2007

– 中間データとしてDepth-Mapを保持

– 複数のDepth-Mapを併合することでポリゴン

メッシュ復元
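The paper's alternating optimization includes a closed-form point-wise step, v = max(-1, min(1, u + θ(N⁺ - N⁻))); a minimal numpy sketch of that single update (variable names are mine, not the paper's):

```python
import numpy as np

def tvl1_pointwise_update(u, n_plus, n_minus, theta):
    """Point-wise step of the TV-L1 range-image fusion:
    v = max(-1, min(1, u + theta * (N+ - N-))), where N+ counts
    range images voting for a carved voxel at x and N- those
    confidently voting for an occluded one; the implicit surface
    u is kept in the band [-1, 1]."""
    return np.clip(u + theta * (np.asarray(n_plus, dtype=float) - n_minus),
                   -1.0, 1.0)

# Toy 2x2 voxel slab: unanimous votes saturate, mixed votes nudge u.
u = np.zeros((2, 2))
n_plus = np.array([[5, 0], [1, 2]])
n_minus = np.array([[0, 5], [3, 2]])
v = tvl1_pointwise_update(u, n_plus, n_minus, 0.4)
```

This is only the data-term half of each iteration; the full method alternates it with a total-variation smoothing step on u before extracting the zero level set as the mesh.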

(Figure 3 of the paper: selected depth images and the final mesh (379,958 triangles) for the "Dino" dataset. Figure 4: the "Statue" dataset, consisting of 40 source views; the final mesh has 230,460 triangles. The embedded excerpt discusses the simplified TV-L1 energy over per-voxel carving votes N⁺(x) and occlusion votes N⁻(x), whose point-wise minimization step has the closed-form solution v = max(-1, min(1, u + θ(N⁺ - N⁻))), along with a weighted-TV extension of the range-image fusion framework.)

24 - State-of-the-art MVS examples

• Advantages of depth-map reconstruction

– The problem lives in the 2-D image domain rather than in 3-D

– Real-time reconstruction is also possible

25 - State-of-the-art MVS examples

• "Towards Internet-scale Multi-view Stereo"

– CVPR 2010

– 3-D reconstruction as a point cloud with normals

– Large-scale MVS

(The slide embeds the paper's first page: Yasutaka Furukawa (Google Inc.), Brian Curless (University of Washington), Steven M. Seitz (Google Inc. / University of Washington), Richard Szeliski (Microsoft Research). The approach decomposes a photo collection into overlapping clusters of images that are processed in parallel, then merges the resulting reconstructions with robust filtering; it was the first unstructured MVS approach demonstrated at city scale. Figure 1: dense reconstruction of Piazza San Marco (Venice) from 13,703 images with 27,707,825 reconstructed MVS points.)

Piazza San Marco (Venice)

Number of viewpoints: 13,703

Number of points: 27,707,825

26 - State-of-the-art MVS examples

• Challenges of large-scale MVS

– The view-clustering problem

• Cluster the SfM output into the sets of viewpoints MVS needs

– Parallelized across a PC cluster

• Even so, reconstruction takes hours

27 - Freely available software

Structure from Motion (SfM)

– Bundler

• Photo Tourism uses this

– Voodoo Camera Tracker

• SfM from video

Multi-View Stereo (MVS)

– Patch-based Multi-view Stereo (PMVS)

– Poisson Surface Reconstruction

• Mesh generation from point clouds with normals

Web Service

– My3DScanner (service discontinued?)

• Bundler + PMVS + Poisson Surface Reconstruction

– Photosynth

– Automatic Reconstruction Conduit

Viewer

– MeshLab

28