ORB-SLAM vs. DSO: Is it possible to develop DSO to the same level?

Hello Silvan,

It seems you have many questions spanning different aspects of a very broad subject, so I will start from the core of your question: "Is ORBSLAM better than DSO, and in what ways?"

About ORBSLAM:

Well, ORBSLAM is just an open-source package that successfully puts together most of the algorithms involved in the standard monocular visual SLAM pipeline with POINT FEATURES (make a note of this, because DSO is essentially a new approach to tracking), i.e.:

1) Tracking (the so-called "ORB" features, i.e., FAST corners with an assigned orientation, described by rotated BRIEF binary descriptors).

2) Reconstruction of the scene (use the point correspondences to reconstruct the scene up to scale from a homography or an essential matrix) and creation of a map (i.e., 3D points).

3) Camera resectioning (use the current map and the newly tracked features to recover the new camera pose, i.e., position and orientation).

4) Bundle adjustment: refine the map and the camera poses by iteratively minimizing the reprojection error.

5) (Optionally) Loop Closure.
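Steps 3) and 4) both revolve around projecting map points into a camera and measuring how far they land from the tracked features. The sketch below, in plain Python, assumes an undistorted pinhole camera with intrinsics `(fx, fy, cx, cy)` and the convention `X_cam = R X + t`; the function names are illustrative and not taken from ORBSLAM or PTAM. Resectioning minimizes this cost over a single pose with the map held fixed, while bundle adjustment minimizes it over poses and points jointly; real systems do that with Levenberg-Marquardt and robust (e.g. Huber) kernels, which is not shown here.

```python
def project(K, R, t, X):
    """Pinhole projection of world point X for pose (R, t), with X_cam = R X + t."""
    fx, fy, cx, cy = K
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy


def reprojection_cost(K, poses, points, observations):
    """Sum of squared reprojection errors (the quantity bundle adjustment minimizes).

    poses:        list of (R, t), R as a 3x3 nested list, t as a 3-vector
    points:       list of 3D map points
    observations: list of (camera_index, point_index, (u, v)) measured pixels
    """
    total = 0.0
    for cam, pt, (u, v) in observations:
        R, t = poses[cam]
        pu, pv = project(K, R, t, points[pt])
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total
```

For example, a point at `(0, 0, 5)` seen by a camera with identity pose projects onto the principal point `(cx, cy)`; a measurement one pixel to the right and two pixels down contributes `1 + 4 = 5` to the cost.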

DSO:

From what I have read, DSO is not really intended as an open-source tool in the same way as ORBSLAM, but you can of course download the code as well and play with it. It involves the same steps in the pipeline (i.e., reconstruction, resectioning and bundle adjustment), but in addition it uses a new approach to tracking: instead of point features, it relies on more robust image structures such as edges. This is a more reliable type of tracking, but in general it is a difficult approach if you are a beginner.
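To make the difference from point features concrete: a direct method compares raw intensities instead of matching descriptors. The following 1D toy sketch (plain Python, not DSO's code) keeps only the pixels whose intensity gradient exceeds a threshold and then picks the shift that minimizes the photometric error over those pixels. DSO itself optimizes a full 6-DoF pose, inverse depths and a brightness model with Gauss-Newton on image pyramids; the exhaustive search over integer shifts here is a deliberate simplification, and the threshold value is an arbitrary assumption.

```python
def high_gradient_pixels(signal, threshold):
    """Indices whose central-difference gradient magnitude is at least threshold."""
    return [p for p in range(1, len(signal) - 1)
            if abs(signal[p + 1] - signal[p - 1]) / 2 >= threshold]


def photometric_error(ref, cur, pixels, shift):
    """Sum of squared intensity differences between ref[p] and cur[p + shift]."""
    return sum((cur[p + shift] - ref[p]) ** 2 for p in pixels)


def align_direct_1d(ref, cur, threshold, max_shift):
    """Exhaustively search integer shifts in [-max_shift, max_shift].

    The caller must choose max_shift so that p + shift stays inside cur
    for every selected pixel p.
    """
    pixels = high_gradient_pixels(ref, threshold)
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: photometric_error(ref, cur, pixels, s))


# A bright blob that moved 3 pixels to the right between the two "frames".
ref = [0] * 10 + [10, 50, 100, 50, 10] + [0] * 10
cur = [0] * 13 + [10, 50, 100, 50, 10] + [0] * 7
```

Note that the flat regions never enter the cost: only the flanks of the blob, where the gradient is large, constrain the alignment, which is the intuition behind selecting high-gradient pixels.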

The COMMON ancestor: PTAM ( http://www.robots.ox.ac.uk/~gk/PTAM/ )

Now, both ORBSLAM and most of the work by the group of Sturm, Engel, Cremers, etc. share a common point of reference, which is "Parallel Tracking and Mapping" (PTAM). I would dare say that ORBSLAM is essentially an improved version of PTAM with added features (such as loop closure), but it essentially follows the same design as far as tracker functionality is concerned (which is what really made PTAM successful). DSO is probably more advanced in that respect (in the sense that elements of PTAM are no longer recognizable). Both packages can also be used through ROS wrappers, although ROS is not strictly required (I, for example, prefer not to write code that depends on ROS, for a number of reasons).
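The "parallel" in PTAM refers to splitting the system into two threads: a tracker that must keep up with the frame rate, and a mapper that runs the expensive map optimization on selected keyframes in the background. A minimal Python sketch of that split using the standard `threading` and `queue` modules follows; the tracking and mapping bodies are placeholders, and the fixed every-N-frames keyframe rule is an assumption for illustration only.

```python
import queue
import threading


def run_parallel_tracking_mapping(frames, keyframe_every=5):
    """Track every frame in the calling thread; map keyframes in a worker thread."""
    keyframe_queue = queue.Queue()
    map_keyframes = []          # shared map, written only by the mapping thread
    map_lock = threading.Lock()

    def mapping_worker():
        while True:
            kf = keyframe_queue.get()
            if kf is None:      # sentinel: tracking has finished
                break
            # Placeholder for the expensive part (triangulation, bundle adjustment).
            with map_lock:
                map_keyframes.append(kf)

    mapper = threading.Thread(target=mapping_worker)
    mapper.start()

    poses = []
    for i, frame in enumerate(frames):
        # Placeholder for per-frame tracking: read the current map under the lock.
        with map_lock:
            n_map = len(map_keyframes)
        poses.append((frame, n_map))
        if i % keyframe_every == 0:
            keyframe_queue.put(frame)   # hand off without waiting for the mapper
    keyframe_queue.put(None)
    mapper.join()
    return poses, map_keyframes
```

The design point is that the tracker never blocks on map optimization: it only enqueues keyframes, so a slow bundle adjustment delays map updates rather than frame processing.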

What should you use?

If you are just beginning the visual SLAM journey, then perhaps PTAM or ORBSLAM would be the most suitable place to start. I know that PTAM has been forked by many people, but the code is very old and almost impossible to compile out of the box: its dependencies are all open source on the one hand, but completely undocumented on the other, and the makefile will not work. Note that the original PTAM does not do loop closure, but this is a secondary issue for a beginner, and there are many algorithms for it. There is a ROS version of PTAM called ETHPTAM (essentially the original code with additions such as an OctMap representation class and IMU feedback), which to the best of my knowledge is still in use and can probably be compiled without significant effort.

OpenCV PTAM:

If you want to consider more options, you may want to take a look at my recent work on PTAM. I spent 3 months removing all of PTAM's original dependencies and extensively modifying the original code so that, among other things, it runs only with OpenCV (from linear algebra to image operations). So, if you are familiar with OpenCV and don't wish to use ROS (or anything else, for that matter), it may suit you. You can find the link to the repository (gptam) in the comments below the video:

https://www.youtube.com/watch?v=5-focgGIF_A&t=7s

Bear in mind that I am working on a beta version (which will no longer be called PTAM) with different options for SLAM initialization, bundle adjustment, etc., but it is not complete at the moment and I haven't uploaded it to a repository yet. If you decide to use it, feel free to contact me anytime.

I sincerely hope the above is helpful in your work.

George

Silvan Heim

Hello George,

First, I want to thank you for your long and interesting answer.

I have read about PTAM, but I considered it old and outdated and had not planned to use it; it seems that I might have to reconsider my judgment... :)

You list the features ORB-SLAM provides. I would add two more features, which are actually quite important for the pipeline I plan to implement:

1) Relocalization: when camera tracking is lost, ORB-SLAM can automatically relocalize itself (assuming the camera is in a region that has been mapped before) by searching for the most similar keyframes in the bag-of-words keyframe database ORB-SLAM builds while mapping.

2) Map reuse: ORB-SLAM can be put in a map-reuse mode, where no new keypoints are triangulated and only the camera is localized, which allows long-term operation without the map size growing without bound.
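The relocalization idea in point 1) can be sketched as a keyframe-retrieval problem. Below, each keyframe is assumed to be already reduced to a list of visual-word IDs (in ORB-SLAM, this quantization of ORB descriptors is done with DBoW2's vocabulary; here the word IDs are simply given). Vectors are tf-idf weighted, L1-normalized and compared with the score `s = 1 - 0.5 * |v1 - v2|_1`, which as far as I know is the L1 score DBoW2 uses. A real system would query an inverted index instead of scoring every keyframe, and would verify the best candidates geometrically (e.g. PnP with RANSAC) before accepting a pose; all function names are hypothetical.

```python
import math
from collections import Counter


def idf_weights(keyframes):
    """Inverse document frequency of each word over the keyframe database."""
    n = len(keyframes)
    df = Counter(w for kf in keyframes for w in set(kf))
    return {w: math.log(n / d) for w, d in df.items()}


def bow_vector(words, idf):
    """tf-idf weighted, L1-normalized bag-of-words vector as a dict."""
    counts = Counter(words)
    v = {w: (c / len(words)) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = sum(abs(x) for x in v.values())
    return {w: x / norm for w, x in v.items()} if norm > 0 else {}


def l1_score(v1, v2):
    """1 for identical normalized vectors, 0 for vectors with disjoint words."""
    keys = set(v1) | set(v2)
    return 1.0 - 0.5 * sum(abs(v1.get(k, 0.0) - v2.get(k, 0.0)) for k in keys)


def best_keyframe(query_words, keyframes):
    """Index of the keyframe most similar to the query frame."""
    idf = idf_weights(keyframes)
    q = bow_vector(query_words, idf)
    scores = [l1_score(q, bow_vector(kf, idf)) for kf in keyframes]
    return max(range(len(keyframes)), key=scores.__getitem__)
```

For instance, with keyframes `[[1, 2, 3, 3], [4, 5, 6], [7, 8, 9, 9]]` and a query `[4, 5, 6, 6]`, `best_keyframe` returns index 1, the only keyframe that shares words with the query.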

Well, I'm not sure whether these features are very difficult to implement, but ORB-SLAM already has them, and that's very good. :)

Regarding loop closure, DSO's predecessor from the same lab, LSD-SLAM, which also uses the direct approach (but outputs dense maps), has loop-closing capabilities. As far as I have understood from its paper, the algorithm tries to close loops by comparing the current keyframe with older, nearby keyframes, and when it finds a valid transformation, the loop is closed.

But I'm not so sure how easy relocalization would be with direct methods. The features provided by feature-based algorithms can be used directly for the bag-of-words approach, which makes searching for keyframes easy. I don't see how one could do bag-of-words without features, so I'm not sure what a bag-of-words approach would require in a direct algorithm.

I will definitely have a look at OpenCV PTAM! I will be using ROS, though, since the system my pipeline will "live" in is a ROS environment.

Kind regards,

Silvan

George Terzakis

No problem, Silvan. Without knowing the nature of your SLAM application, I just want to make a few notes:

1) Map closure is a major issue and requires a lot of effort, depending on the application. I agree that you should take what ORBSLAM offers, but bear in mind, in case you haven't looked into it, that OpenCV now comes with "OpenFABMAP", which is essentially one step ahead of simple, naive bag-of-words scene recognition (it basically models probabilistic dependencies between the features via a Bayes net called the "Chow-Liu" tree). I am not sure whether it is already in ORBSLAM (probably not, but you could integrate it with some effort).

Alternatively, there is PTAMM (PTAM with multiple maps) by Robert Castle, which actually does map matching ( http://www.robots.ox.ac.uk/~bob/research/research_ptamm.html ). It is also a bit old now, but you can "snatch" the part that does loop closure (the rest is exactly the original code). I have wanted to look into this for a while now, but I couldn't find the time.

2) Now, relocalization, if I understand your description correctly, is also something that PTAM does (it is intuitive and not super-reliable, but it works for reasonable camera motion and favorable indoor conditions). I am not sure whether ORBSLAM is any better in this regard; I am guessing yes. However, the only way to tell is to compare the code and, of course, try it.

3) The environment of your application is crucial. PTAM performs well indoors, but things are very different in outdoor sequences and you really need to change things around, primarily in terms of how keyframes are registered (you don't have the luxury of moving the camera yourself; rather, it is mounted on a vehicle that does not regulate its speed according to how well SLAM is working). Again, there is a good chance that ORBSLAM has improved in this respect, but then again there are many similarities in the way the pipeline is implemented, so it may not perform as you expect in outdoor scenes.

4) Finally, I like the approach of the Sturm, Engel et al. group of taking tracking to the next level (i.e., edgelets instead of points). Although I haven't done anything in this direction, I think it is worth going down this path (Klein and Murray have already discussed this as an improvement to PTAM).
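Regarding the Chow-Liu tree in point 1): it approximates the joint distribution of visual-word occurrences with a tree whose edges maximize the total pairwise mutual information between words. The sketch below builds only that tree (Prim's algorithm on the complete graph weighted by empirical mutual information) from binary word-presence vectors; it is not OpenFABMAP's API, and FAB-MAP's actual place-recognition inference on top of the tree is not shown. Function names are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(obs, i, j):
    """Empirical mutual information between binary word-presence variables i and j."""
    n = len(obs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for o in obs if o[i] == a and o[j] == b) / n
            if p_ab == 0:
                continue  # 0 * log(0) is taken as 0
            p_a = sum(1 for o in obs if o[i] == a) / n
            p_b = sum(1 for o in obs if o[j] == b) / n
            mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def chow_liu_tree(obs):
    """Edges (i, j) of a maximum-mutual-information spanning tree (Prim's algorithm).

    obs: list of equal-length 0/1 tuples, one per training image.
    """
    k = len(obs[0])
    weight = {(i, j): mutual_information(obs, i, j)
              for i, j in combinations(range(k), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < k:
        # Heaviest edge connecting the current tree to a word not yet in it.
        best = max((e for e in weight if (e[0] in in_tree) != (e[1] in in_tree)),
                   key=weight.__getitem__)
        edges.append(best)
        in_tree.update(best)
    return edges


# Words 0 and 1 always co-occur; word 2 is independent of both.
observations = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
```

In this example, words 0 and 1 carry all the mutual information (`log 2`), so the edge `(0, 1)` is always part of the tree, while word 2 is attached through a zero-weight edge.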

Again, I'm glad if I was able to help.

Best

George

Silvan Heim

Hello George,

1) ORB-SLAM relies on DBoW2 for bag-of-words, but I don't know whether DBoW2 incorporates the non-naive approach. But thank you for the tip, I'll keep OpenFABMAP in mind.

2) PTAM seems to have the problem (at least the version ORB-SLAM is compared to) that keyframes are constantly added even if the scene does not change, while ORB-SLAM stops adding new keyframes if the scene content does not change. The ORB-SLAM authors show a graph in their paper comparing the two algorithms to show exactly this.

3) I will just have to try it and see how well it works. :) The goal of my work right now is mainly to get a good pipeline working as soon as possible. The use case of the pipeline is actually an indoor scene, more specifically SLAM in a lecture/conference room to localize the camera(s) and build a map of the scene. So, very basic computer vision stuff.

4) I also find the approach very interesting, especially because DSO manages to combine joint nonlinear optimization (over a sliding window of keyframes) with the direct approach. If I understand it correctly, DSO uses not only edgelets but actually all regions with sufficient intensity gradient.
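To illustrate the keyframe-insertion difference described in point 2), here are two simplified policies. The first adds a keyframe based purely on elapsed frames and camera motion, so a camera that keeps moving around an already-mapped room keeps producing keyframes; the second adds one only when the current frame tracks noticeably fewer map points than its reference keyframe, in the spirit of the criterion described in the ORB-SLAM paper. All names and thresholds are assumptions for illustration, and neither function is the exact rule of PTAM or ORB-SLAM (both use additional conditions).

```python
def motion_based_should_add(frames_since_last_kf, baseline,
                            min_frames=20, min_baseline=0.1):
    """Add a keyframe once enough frames have passed and the camera has moved enough."""
    return frames_since_last_kf >= min_frames and baseline >= min_baseline


def content_based_should_add(n_tracked, n_ref_tracked,
                             min_tracked=50, max_overlap=0.9):
    """Add a keyframe only if the frame sees substantially new content.

    n_tracked:     map points tracked in the current frame
    n_ref_tracked: map points tracked in the reference keyframe
    """
    if n_tracked < min_tracked:
        return False  # tracking too weak to create a reliable keyframe
    return n_tracked < max_overlap * n_ref_tracked
```

A camera panning back and forth over the same part of a lecture room (moving, but still tracking about 95% of the reference keyframe's points) triggers the first rule but not the second, which is why the map stays bounded with a content-based rule.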

You helped a lot. I am rather new to this field, so I find it very interesting to discuss what I have learned from the papers with people, and also to get a more realistic view of the papers and algorithms, because of course people tend to show their algorithms in papers only from their best side... :)
