Kintinuous: Spatially Extended KinectFusion

2) If above a specified threshold, virtually translate the TSDF so that the camera is once again centred:
   a) Extract the surface from the region no longer in the TSDF and add it to the pose-graph.
   b) Initialise the new region entering the TSDF as unmapped.

The main configurable component of this system is a movement threshold b, which denotes the distance in meters in all directions from the current origin that the camera may move before the TSDF recentres. The system functions identically to the original KinectFusion algorithm while the camera remains within the region encompassed by the movement threshold b. Upon crossing this boundary in any one of the three translation dimensions x, y and z, the TSDF volume is virtually translated about the camera pose (in discrete voxel units) to shift the camera to within a distance of less than v_m from the TSDF origin. The new pose of the camera C_{i+1} after a boundary crossing is given as

    C_{i+1} = (R_{i+1}, t'_{i+1})    (2)

where R_{i+1} is obtained from the standard KinectFusion algorithm and t'_{i+1} is calculated by firstly obtaining the number of whole voxel units u crossed since the last boundary crossing:

    u = ⌊ t_{i+1} / v_m ⌋    (3)

Subtracting u from the unshifted t_{i+1} gives

    t'_{i+1} = t_{i+1} − u v_m    (4)

Finally, we compute the global position of the new TSDF, g_{i+1}, as

    g_{i+1} = g_i + u    (5)

where the position of the initial TSDF is given by g_0 = (0, 0, 0). Note that in the case where the camera does not cross any boundary between two frames, the vector u is zero and as such g_{i+1} = g_i. Following a boundary crossing, g_{i+1} can be used in conjunction with t'_{i+1} to produce a new pose in our pose-graph (detailed in Section III-C), while the u quantity is used to extract a "slice" of the TSDF surface which moves out of one side of the cubic 3D voxel array, before we calculate a cyclical offset for future interactions with the array (detailed next in Section III-B).

Both volume integration and raycasting access individual elements in the TSDF volume through direct array indexing. Functionality with a continuously changing base index has been achieved by modifying how this indexing is performed. When looking up an element at index (x, y, z) in a cubic 3D array stored in row-major order with dimension v_s, the 1D position a can be calculated as

    a = x + y v_s + z v_s²    (6)

By use of the modulo operation with the latest g_i vector described in Section III-A, the indices passed into Equation 6 can be modified to function seamlessly with a cycling base index provided by the value in g_i:

    x' = (x + g_{i_x}) mod v_s
    y' = (y + g_{i_y}) mod v_s    (7)
    z' = (z + g_{i_z}) mod v_s

Also mentioned in Section III-A is the extraction of the surface that falls outside of the boundary of the TSDF when it recentres. The u quantity calculated in Equation 3 is used in conjunction with g_i to index a 3D slice of the TSDF from which to extract surface points. Surface points are extracted by casting rays orthogonally along each dimension of the TSDF slice, where zero crossings in the distance values are extracted as vertices of the reconstructed surface. The 3D slice which was just ray cast is then zeroed to allow new surface measurement integration. The extracted vertices are downloaded from GPU memory and added to the point cloud set M_i for the camera pose at time i.

This orthogonal ray casting can result in duplicate points if the TSDF voxel array is obliquely aligned to the reconstructed surface. In order to remove these duplicate points we apply a voxel grid filter. This filter overlays a voxel grid (in our implementation with a leaf size of v_m) on the extracted point cloud and returns a new point cloud with a point for each voxel that represents the centroid of all points that fell inside that voxel.

C. Pose-Graph Representation
To represent the external mesh we employ a pose-graph representation, Q, where each pose stores an associated surface slice.
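The recentring arithmetic of Equations 3-5 and the cyclic indexing of Equations 6-7 can be sketched as follows. This is an illustrative CPU sketch only: the struct and member names are hypothetical, and the system's real implementation performs this indexing on the GPU.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Illustrative sketch of the cyclic TSDF bookkeeping; names are
// hypothetical and the paper's real implementation is GPU-based.
struct CyclicTsdf {
    int vs;                 // voxel array dimension v_s
    double vm;              // metric size of one voxel, v_m
    std::array<int, 3> g;   // global TSDF position g_i, in voxels

    // Eq. 7 (cyclic offset, kept non-negative) then Eq. 6 (row-major
    // 1D index into the cubic voxel array).
    std::int64_t index(int x, int y, int z) const {
        auto wrap = [this](int v) { return ((v % vs) + vs) % vs; };
        std::int64_t xp = wrap(x + g[0]);
        std::int64_t yp = wrap(y + g[1]);
        std::int64_t zp = wrap(z + g[2]);
        return xp + yp * vs + zp * vs * static_cast<std::int64_t>(vs);
    }

    // Boundary crossing: whole voxel units crossed u (Eq. 3), the
    // shifted translation t' (Eq. 4), and the updated g (Eq. 5).
    std::array<double, 3> recentre(const std::array<double, 3>& t) {
        std::array<double, 3> tPrime;
        for (int d = 0; d < 3; ++d) {
            int u = static_cast<int>(std::floor(t[d] / vm)); // Eq. 3
            tPrime[d] = t[d] - u * vm;                       // Eq. 4
            g[d] += u;                                       // Eq. 5
        }
        return tPrime;
    }
};
```

Note that the double modulo in `wrap` keeps indices valid when g_i becomes negative, i.e. when the camera moves in the negative direction along an axis.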
B. Implementation
In order to efficiently allow the TSDF voxel array to move relative to the tracked surface and recentre itself around the camera, as opposed to moving considerable regions of voxels within the array, we simply treat the array as cyclical and hence need only update the base index offsets for each dimension of the array. There are two main parts of the KinectFusion algorithm which require indexed access to the TSDF volume: 1) Volume Integration and 2) Raycasting.

Each time the camera pose moves across the movement boundary and triggers a TSDF cycle, a new element is added into Q. In reality we only add to the pose-graph on boundary crossings that result in a minimum number of points extracted from the TSDF surface, as a pose without any associated surface is useless. Figure 2 shows a simplified 2D example of the movement of the camera and the TSDF virtual frame over the course of five consecutive frames, while Figure 3 shows the final step in greater detail, highlighting the t' vector. At the end of the example shown, Q would contain three elements, from times i = 0, 2 and 4.

As new poses and associated TSDF volume slices are streamed from GPU memory, they are immediately fed into a greedy mesh triangulation algorithm described by Marton et al. [10]. The result is a high polygon count surface representation of the mapped environment. Additionally, as this is stored in system memory, the size of the computed mesh surface is bounded only by available memory, with sample values detailed in Table I.

E. Visual Odometry
A critical shortcoming of the original KinectFusion system is its inability to function in environments with a low number of 3D geometric features. An obvious approach to ameliorating this issue is to incorporate visual odometry (VO) into the processing pipeline.
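Returning to the slice extraction of Section III-B: the voxel grid filter used there to remove duplicate points amounts to bucketing points by voxel cell and emitting each cell's centroid. A minimal CPU sketch, with an illustrative `Point` type and function name (the actual system operates on GPU-extracted slice clouds):

```cpp
#include <cmath>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Point { double x, y, z; };

// Bucket points into cubic cells of side 'leaf' and replace the
// contents of each cell by its centroid, removing duplicate points.
std::vector<Point> voxelGridFilter(const std::vector<Point>& cloud,
                                   double leaf) {
    struct Acc { double x = 0, y = 0, z = 0; int n = 0; };
    std::map<std::tuple<std::int64_t, std::int64_t, std::int64_t>, Acc> cells;
    for (const Point& p : cloud) {
        auto key = std::make_tuple(
            static_cast<std::int64_t>(std::floor(p.x / leaf)),
            static_cast<std::int64_t>(std::floor(p.y / leaf)),
            static_cast<std::int64_t>(std::floor(p.z / leaf)));
        Acc& a = cells[key];
        a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
    }
    std::vector<Point> out;
    out.reserve(cells.size());
    for (const auto& kv : cells)
        out.push_back({kv.second.x / kv.second.n,
                       kv.second.y / kv.second.n,
                       kv.second.z / kv.second.n});
    return out;
}
```

With a leaf size equal to the voxel size v_m, points duplicated by the orthogonal ray casts of a single voxel collapse to one representative point per cell.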
As part of one of the more camera Pi is shown for each of the five frames. The current TSDF Virtual experimental aspects of our system we have substituted the window in global space is shown in teal, the current movement houndary is KinectFusion ICP odometry estimate with the output of the shown in dark brown and the underlying voxel grid quantization is shown in light dashed lines. This simplified example has b= l and vs feature-based FOVIS visual odometry system [7]. Results from this development are presented independently in Section VI which demonstrate the potential advantage of combining both approaches 84 IV SYSTEM ARCHITECTURE In order to provide truly constant time mapping and locali sation, bounded only by memory capacity, we adopted a multi level threaded system architecture based on parallel processin and thread wait and signal mechanisms in between each level This design allows the"front-end"TSDF tracking and surface integration to run without having to wait for operations on the outputted point cloud slices to complete Upon encountering a boundary crossing and subsequently outputing a new pose-graph element, the top level TSDF the t' quantity at the end of the example shown in Figure/ Presented by thread pushes back this new element to a vector of such Fig. 3. The figure gives a more detailed look at the value elements and signals the CloudsliceProcessor thread to begin processing the new data. The CloudsliceProcessor determines from times i=0, 2 and 4. 
The CloudSliceProcessor determines the transformation for the extracted point cloud of each slice required for map rendering and stitching, and also carries out the voxel grid downsampling discussed in Section III-B. Once this component is finished processing, it signals all other ComponentThreads so they may begin processing of their own.

Each element n in Q is composed of four components computed at the corresponding pose time i, such that

    Q_n = (g_i, t_i, R_i, M_i)

D. Mesh Generation
The process described in the three previous sections results in a system which continuously streams a dense point cloud to main system memory as the camera moves through an environment. Given the quality, density and incremental nature of this point cloud, real-time streaming mesh computation is feasible. By extracting with a slight overlap in each TSDF volume slice (two voxels in our implementation) we can produce a seamless and continuous mesh.

We note that there are other approaches to surface extraction, such as marching cubes; however, in the current work we have exploited the existing raycasting operation to minimise the time to extract the data, delegating the mesh construction procedure to the CPU.

A scalable and easily extendable modular system of threaded independent components is achieved by means of inheritance and polymorphism in C++: we derive a base class for all bottom level ComponentThreads, allowing easy creation and destruction of such threads and sharing of commonly used data (e.g. the pose-graph and mapped surface so far). As an example, the mesh generation functionality of our system discussed in Section III-D is implemented as such a thread, along with many of the work-in-progress extensions we discuss later in Section VII. This modular design provides a very cohesive and maintainable interface for processing the output of the main TSDF tracking and surface construction thread.
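The ComponentThread base class just described might look like the following. All class, method and member names here are hypothetical illustrations of the inheritance-and-polymorphism design, not the actual source:

```cpp
#include <memory>
#include <string>
#include <vector>

// Illustrative sketch of the ComponentThread design: a common base
// class with shared data, and derived components such as mesh
// generation that override the processing hook.
struct SharedState {                  // data shared between components
    std::vector<std::string> poseGraphLog;
};

class ComponentThread {
public:
    explicit ComponentThread(SharedState& shared) : shared_(shared) {}
    virtual ~ComponentThread() = default;
    virtual void process() = 0;       // run when signalled with new data
protected:
    SharedState& shared_;
};

class MeshGenerationThread : public ComponentThread {
public:
    using ComponentThread::ComponentThread;
    void process() override {
        // Placeholder for triangulating the latest cloud slice.
        shared_.poseGraphLog.push_back("mesh slice triangulated");
    }
};
```

New bottom-level components are then created and destroyed uniformly through the base-class interface, which is the maintainability property the text attributes to this design.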
The only limitations are those imposed by processing power, such as the number of CPU cores and the amount of main system memory. Using this incremental system architecture in combination with the techniques described in Section III, we have been able to densely map extended areas in real-time with no performance impact as the size of the surface grows.

V. EXPERIMENTAL EVALUATION
As part of the ongoing development of the system we have carried out a number of experiments evaluating both the qualitative performance of the system and the computational performance of each of the components. This section describes the experimental setups we used.

A. Hardware
For all tests we ran our system on a standard desktop PC running 32-bit Ubuntu 10.10 Linux with an Intel Core i7-2600 3.4GHz CPU, 8GB DDR 1333MHz RAM and an nVidia GeForce GTX 560 Ti 2GB GPU. The RGB-D camera used was a standard unmodified Microsoft Kinect.

B. Datasets
All data was recorded on a laptop computer with a human controlling the Kinect. The capture rate was 15FPS due to the hardware limitations of the capture platform. We evaluated three datasets in the context of continuous dense visual SLAM and mesh generation, as well as a fourth dataset which evaluates the performance of FOVIS odometry in place of ICP odometry.

Fig. 4. Map reconstruction of indoor handheld research lab dataset.

As can be seen from the results in the video, along with the data presented in Table I, the scale of the environments mapped is considerably larger than what was previously possible with the standard KinectFusion algorithm. Furthermore, the detail of the resulting maps and their ability to model occlusions within the environment is maintained (e.g. see inset on Figure 4). A reduction in model smoothness is apparent in the FOVIS dataset; this is due to the fact that the ICP odometry matches depth frames against the dense volume accumulated from numerous past frames, whereas FOVIS odometry is purely based on frame-to-frame correspondences.
A value of v_s = 512 and b = 1.4 was used for all tests. The four datasets were as follows:
1) Walking within a research lab (LAB).
2) Walking within and between two floors of an apartment (APT).
3) Driving within a suburban housing estate at night, with the camera pointing out of a passenger window (CAR).
4) Walking the length of a straight corridor (the VO vs. ICP evaluation dataset) (CORR).

VI. RESULTS
In this section we present both qualitative and computational performance results for all four datasets.

A. Qualitative Performance
Qualitative results are provided in the associated video contribution, where we show the output of the system over the four datasets mentioned in the previous section. It should be noted that in the associated video the visualisation thread was running at a lower priority than the main TSDF update thread. For this reason, at points where the surface has grown quite large the system will appear to slow down.

B. Computational Performance
Given the asynchronous nature of the different components of the system, we evaluate the computational performance of each level of the thread hierarchy discussed in Section IV separately. In particular we are interested in the framerate of the top level TSDF front-end tracking and surface mapping thread, the speed of the intermediate CloudSliceProcessor (CSP) thread and finally the performance of the bottom level ComponentThreads, namely the mesh generation module. We also evaluate the computational performance of FOVIS visual odometry estimation versus the original ICP odometry estimation.

In Table II we present execution time results for each system component on all datasets. These measurements were recorded with all visualisation modules disabled. The values shown for the TSDF and CloudSliceProcessor components are the maximum average execution times in milliseconds, where the average is computed over a thirty frame sliding window throughout the experiments.
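The "maximum average" statistic just described, i.e. the peak of a thirty-frame sliding-window mean of per-frame execution times, can be computed as follows. This is a sketch of the statistic, not the paper's instrumentation code:

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Maximum of the sliding-window mean of per-frame execution times,
// as reported for the TSDF and CSP rows of Table II (window = 30).
double maxWindowedAverage(const std::vector<double>& timesMs,
                          std::size_t window = 30) {
    std::deque<double> w;
    double sum = 0.0, best = 0.0;
    for (double t : timesMs) {
        w.push_back(t);
        sum += t;
        if (w.size() > window) { sum -= w.front(); w.pop_front(); }
        if (w.size() == window) best = std::max(best, sum / window);
    }
    return best;
}
```

Keeping a running sum rather than re-averaging the window each frame makes the statistic O(1) per frame, which matters when it is gathered alongside real-time tracking.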
However, this slowdown is restricted to the visualisation thread only, while tracking and mapping is in fact running in real-time and not dropping any frames. Our video contribution and point clouds are available at http://www.cs.nuim.ie/research/vision/data/rgbd2012. We have also provided figures of the final outputs of the system on the APT, CAR and LAB datasets (the CAR and LAB reconstructions are shown in Figures 5 and 4 respectively).

The values listed in the Mesh Generator row of Table II indicate the average and maximum number of cloud slices in the mesh generation thread queue when the data is played in real-time at capture rate (15FPS in all cases), formatted in the order average/maximum/total number of cloud slices. The execution time of the TSDF update for each dataset shows that although the data was only captured at 15FPS, the system is capable of executing at the full 30FPS frame rate of the Kinect sensor.

TABLE I
SUMMARY OF PROCESSING AND MAP STATISTICS FOR EACH OF THE DATASETS DESCRIBED IN SECTION V-B.

Dataset                                   | LAB                | APT              | CAR             | CORR
TSDF Volume Size (m)                      | 6                  | 6                | 6               | 6
Pose-to-pose Odometry (m)                 | 31.07              | 42.31            | 136.18          | 56.08
Bounding Cuboid (m) [x, y, z]             | 14.05, 4.99, 10.54 | 11.12, 6.93, 6.28| —, —, 8.4       | 5.25, 8.3, 58.67
Bounding Cuboid Volume (m³)               | 739.7              | 483.8            | 151332          | 2558.77
Vertices (without overlap), size (MB)     | 1.2×10⁶, 18.34     | 1.2×10⁶, 19.8    | 8.8×10⁵, 13.57  | 1.6×10⁶, 24.59
Mesh Triangles (with overlap), size (MB)  | 2.3×10⁶, 82        | 2.5×10⁶, 93      | 2.4×10⁶, 62.97  | 3.1×10⁶, 107

Fig. 5. Map reconstruction of an outdoor dataset captured from a car. Inset: zoomed region showing detail level of the final map.

TABLE II
COMPUTATIONAL PERFORMANCE RESULTS FOR THE LAB, APT, CAR AND CORR DATASETS, SHOWING THE MAXIMUM AVERAGE COMPUTATION TIMES FOR THE KINECTFUSION (TSDF) AND CLOUDSLICEPROCESSOR (CSP) COMPONENTS, AND THE AVERAGE/MAX/TOTAL QUEUE SIZES FOR MESH GENERATION.

Dataset    | LAB          | APT          | CAR          | CORR
TSDF (ms)  | 33.94 ± 3.54 | 33.83 ± 4.85 | 36.79 ± 4.49 | 41.25 ± 7.03
CSP (ms)   | 3.53 ± 3.4   | 3.52 ± 3.38  | 2.62 ± 2.39  | 4.7 ± 3.87
Mesh Gen.  | 1.27/5/135   | 1.36/7/178   | 2.57/25/254  | 2.28/6/220
Odometry   | ICP          | ICP          | ICP          | FOVIS

Analysing the performance of the FOVIS odometry replacement, we note that it has poorer computational performance than the original CUDA-implemented KinectFusion ICP odometry estimator, at 14.71 ± 4.39 ms versus 10.54 ± 0.21 ms respectively.
This is to be expected given that the FOVIS implementation is CPU based. However, it is still sufficient to execute in real-time given the frame rate of the captured data. This increase in execution time is reflected in the TSDF value for the CORR dataset in Table II.

Additionally, it can be seen that our extension to the original algorithm does not affect the real-time performance of the system. The processing done by the CloudSliceProcessor intermediate thread, while asynchronous to the TSDF thread, does not make the pool of ComponentThreads wait more than 2-4 ms for an outputted slice from the TSDF tracking module. The performance of this particular thread is affected by the number of points extracted from the surface in a given slice of the TSDF volume. With regard to mesh generation, a queue length of 1 implies that the mesh generator is keeping up with the output of the CloudSliceProcessor and real-time mesh construction is succeeding. On average our system keeps up in producing the mesh surface and catches up when computationally feasible.

VII. CURRENT INVESTIGATIONS
In the long term the aim of this work is to incorporate a full SLAM approach including loop closure detection, mesh reintegration (i.e. back into the TSDF), and global pose and mesh optimisation. In this section we discuss ongoing work on extensions to the system with regard to these and other issues.

A. Loop Closure Detection and Handling
In order to provide visual loop closure detection we have integrated the DBoW place recognition system [5] in conjunction with SURF feature descriptors as a separate ComponentThread. Based on the well established bag-of-words model, when the system identifies a loop closure a pose constraint is computed between the matched frames.
The large maximum queue length value is expected in the CAR dataset due to the fast motion of the camera; many large slices are outputted by the TSDF and CloudSliceProcessor in such runs.

As this pose constraint is between two RGB images, it is then propagated back to the centre of the TSDF virtual frame in order to properly adjust the associated poses. This is made possible by transforming back by the t' vector associated with the matched frames. We have also experimented with integrating our pose-graph with iSAM [8] for smoothing and global pose-graph optimisation. Given the dense structure that is produced by the TSDF front-end, we have observed that a mesh deformation of some kind is required in conjunction with the iSAM optimisation which adjusts poses.

B. Map Reintegration
Another aspect of the system we are currently investigating is the reintegration of previously mapped surfaces into the front-end TSDF as they are encountered, aiding in loop closure and drift reduction. One of the approaches we are experimenting with involves rendering the computed mesh surface at the estimated position of the camera in the global frame, from the camera's perspective. Subsequently we synthesize a depth map from this mesh render (similar to the work of Fallon et al. [4]) and either merge this with the current Kinect depth map or interleave it with the live Kinect data and pass it in independently. Figure 6 shows some preliminary results of this work.

Fig. 6. Shown is the cumulative generated mesh as viewed from the camera estimate and a depth map synthesized from this render.

C. Mapping Improvements
One of the advantages of capturing such a dense and rich scene model in real-time is the possibility of higher level processing and reasoning, such as object recognition and other semantic tasks. One of the first steps forward in this direction we are currently exploring is the integration of texture in the surface building process. In addition to this, we are looking at fitting surface models to the data, such as planes and curves, using sample consensus methods. The extraction of such primitives from the data provides higher level representations more suitable for semantic understanding and also reduces the overall complexity of the data.

VIII. CONCLUSION
We have developed an algorithm which extends the KinectFusion framework to overcome one of the principal limitations of the original system by Newcombe et al.: the inability to work over extended areas. In addition to this we have also implemented a real-time triangular mesh generation module for representing the extended scale map, thereby maintaining the ability to characterise topological and occlusion relationships within the environment. We have also integrated the FOVIS visual odometry library into the processing pipeline and evaluated its potential in increasing robustness and reducing overall drift.

The system is organised as a set of hierarchical multi-threaded components which are capable of operating in real-time. The software framework we have developed facilitates the easy creation and integration of new modules which then also have a minimal performance impact on the main front-end dense TSDF tracking and surface reconstruction module. In the future we will extend the system to implement a full SLAM approach including loop closure detection, mesh reintegration and global pose and mesh optimisation.

ACKNOWLEDGMENTS
Research presented in this paper was funded by a Strategic Research Cluster grant (07/RC/168) by Science Foundation Ireland under the Irish National Development Plan and the Embark Initiative of the Irish Research Council for Science, Engineering and Technology.

REFERENCES
[1] KinectFusion extensions to large scale environments. http://www.pointclouds.org/blog/srcs/theredia/index.php, May 2012.
[2] C. Audras, A. I. Comport, M. Meilland, and P. Rives. Real-time dense RGB-D localisation and mapping. In Australian Conference on Robotics and Automation, Monash University, Australia, December 2011.
[3] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard. An evaluation of the RGB-D SLAM system. In Proceedings of the IEEE Int. Conf. on Robotics and Automation (ICRA), St. Paul, MN, USA, May 2012.
[4] M. F. Fallon, H. Johannsson, and J. J. Leonard. Efficient scene simulation for robust Monte Carlo localization using an RGB-D camera. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, May 2012.
[5] D. Galvez-Lopez and J. D. Tardos. Real-time loop detection with bags of binary words. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 51-58, September 2011.
[6] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 2012.
[7] A. S. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy. Visual odometry and mapping for autonomous flight using an RGB-D camera. In International Symposium on Robotics Research (ISRR), Flagstaff, Arizona, USA, August 2011.
[8] M. Kaess, A. Ranganathan, and F. Dellaert. iSAM: Incremental smoothing and mapping. IEEE Transactions on Robotics (TRO), 24(6):1365-1378, December 2008.
[9] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Proceedings Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR'07), Nara, Japan, November 2007.
[10] Z. C. Marton, R. B. Rusu, and M. Beetz. On fast surface reconstruction methods for large and noisy datasets. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 2009.
[11] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, ISMAR'11, pages 127-136, Washington, DC, USA, 2011. IEEE Computer Society.
[12] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2320-2327, November 2011.
[13] K. Pirker, M. Ruther, G. Schweighofer, and H. Bischof. GPSlam: Marrying sparse geometric and dense probabilistic visual mapping. In Proceedings of the British Machine Vision Conference. BMVA Press, 2011.
[14] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 2011.
[15] J. Stuehmer, S. Gumhold, and D. Cremers. Real-time dense geometry from a handheld camera. In Pattern Recognition (Proceedings DAGM), pages 11-20, Darmstadt, Germany, September 2010.
