reconstruct 3D structures from collection of 2D images taken from diff viewpoints
triangulation: 2D points to 3D structure
Given:
- Two or more images of the same scene.
- Known camera matrices P and P’ (camera intrinsics and extrinsics)
- Corresponding 2D points x and x’ in images
Goal: Find the 3D point X that projects to x and x’
and
-
applying camera matrix to 3D point gives pixel point
-
homogeneity , lose depth
-
backprojection
- apply pseudo-inverse of P on x, and connect the points — this doesn’t give full 3D position tho since we don’t know depth
-
fix: we enforce co-linear constraints:
- cross-product
- this equality removes scale factor from
-
with two linear equations per view and at least two views, we can solve for X with singular value decomposition (SVD) (similar to camera calibration & pose estimation)
-
triangulation requires two cameras to have enough equations to solve for all unknowns
- rays intersect at 3D object point
challenges
- noise may prevent rays from intersecting well
- fix: add more rows to matrix with more cameras
- singular value decomposition (SVD) provides least-squares solution (best fit)
epipolar geometry: constraints between views
by enforcing constraints, reduce complexity of matching points between images
- baseline: line connecting camera optical centers O and O’
- epipoles (e,e’): where baseline intersects the image planes
- projection of o’ on the image plane
- epipolar plane: plane formed by baseline and 3D point X
- epipolar line: intersection of epipolar plane and image plane (all possible matches are here) epipolar constraint
- for point x in first image, its match x’ in second image must lie on epipolar line l’
- reduces search space to 1D, so matching is more efficient importance
- no need for depth sensors - pure geometry-based 3D reconstruction
- fundamental
Where are epipoles?
essential (E) & fundamental (F) matrices
encodes camera motion
- - encodes rotation and translation between cameras
- constraint: for points in 2D camera coordinates,
- properties
- rank = 2 (due to cross-product)
- singular values
- multiplying a point by E tells us the epipolar line in the second view
- diff from image homographies coz
- E maps point to line
- homography maps point to point
-
- constraints: for points in image coordinates,
- estimation:
- use 8-point algorithm and RANSAC
- solvable from correspondences, doesn’t need known camera poses
- Recovering camera motion
- decompose E into R and t (up to scale ambiguity)
- triangulate 3D points with recovered poses
SfM Pipeline
combines everything for full 3D reconstruction
Given many images, how can we
- figure out where they were all taken from?
- build a 3D model of the scene?
Calibrate → Triangulate
- Feature Matching
- extract features
- find correspondences
- Estimate motion between images by calculating F
- use RANSAC for outliers
- Recover poses
- decompose F (or E) into R and t
- Triangulate 3D points to estimate 3D structure
- solve for X with svd
- bundle adjustment
- non-linear optimization
- for added views
- determine motion using all known 3D points that have correspondence in new image
- add structure by estimating new points in new image