Is it legitimate to use geographic occurrence, specifically the altitudinal or bathymetric range of species, as character states in an ancestral-state reconstruction? Can origins be inferred this way? Are there examples in the literature?
Yes, though the efficacy of this kind of character remains little tested. Recent work by Rick Ree is the best place to start: look into GeoSSE (Goldberg, Lancaster, & Ree, 2011) or DEC as presented by Clark et al. (2008). A bigger tree is better. DIVA is the most commonly used method, but it is an inferior one.
I remember some examples from palaeontology; see e.g. Nesbitt and colleagues' (2009) paper on the Triassic dinosaur Tawa and references therein. They employ both the DEC and DIVA methods.
If you want to try DIVA, DEC, etc., you should definitely have a look at this R package, which can compare those (and other) models in a likelihood framework: http://cran.r-project.org/web/packages/BioGeoBEARS/index.html
I'm not sure about coding the occurrence, but if you also have a phylogeny, you might want to check out PhyloMapper: http://www.evotutor.org/LemmonLab/PhyloMapper1.html
One approach that has been used is to optimize your areas of occurrence on your phylogeny (there are multiple ways to do this). A recent paper that did this is Nicholson et al. (2012) in Zootaxa; they used the method to infer ancestral areas and historical biogeography.
Inferring ancestral areas via ancestral-state reconstructions is actually common practice in biogeography and systematics. I encourage you to take a look at BioGeoBEARS which "... is an R package, authored by Nicholas J. Matzke, that is designed to perform inference of biogeographic history on phylogenies, and also model testing and model choice of the many different possible models of how biogeography may evolve on a phylogeny (dispersal, vicariance, founder-event speciation, DEC, DIVA, BAYAREA, etc.)
"BioGeoBEARS" is short for "BioGeography with Bayesian (and likelihood) Evolutionary Analysis in R Scripts". It implements the LAGRANGE (Ree & Smith 2008) DEC model (2 free parameters) as well as models with fewer or more free parameters. Standard model-testing procedures may then be applied."
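To make the "2 free parameters" concrete for the altitudinal case, here is a minimal Python sketch (not BioGeoBEARS or LAGRANGE code) of how elevation bands could be treated as areas in a DEC-style model. The band names, the function names, and the rate values are all hypothetical, and only the along-branch (anagenetic) part is shown: range expansion at rate d per occupied area and local extinction at rate e. The cladogenetic range-inheritance step is left out.

```python
from itertools import combinations

def dec_ranges(areas):
    """All candidate ranges: the empty (extinct) range plus every non-empty subset of areas."""
    ranges = [frozenset()]
    for k in range(1, len(areas) + 1):
        ranges.extend(frozenset(c) for c in combinations(areas, k))
    return ranges

def dec_rate_matrix(areas, d, e):
    """Instantaneous rate matrix over ranges, with dispersal rate d and extinction rate e."""
    ranges = dec_ranges(areas)
    index = {r: i for i, r in enumerate(ranges)}
    n = len(ranges)
    Q = [[0.0] * n for _ in range(n)]
    for r in ranges:
        if not r:
            continue  # the empty range is absorbing: a lineage that is extinct everywhere stays extinct
        i = index[r]
        for a in areas:
            if a in r:
                Q[i][index[r - {a}]] += e           # local extinction in an occupied area
            else:
                Q[i][index[r | {a}]] += d * len(r)  # dispersal into a new area from each occupied area
        Q[i][i] = -sum(Q[i])                        # rows of a rate matrix sum to zero
    return ranges, Q

# Hypothetical coding: three elevation bands treated as "areas".
bands = ["low", "mid", "high"]
ranges, Q = dec_rate_matrix(bands, d=0.1, e=0.02)
for r in ranges:
    print(sorted(r))
```

With three bands there are already eight possible ranges, and the count doubles with every added band, which is one reason analyses of this kind usually keep the number of areas small.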
Although there are several methods for this, my opinion runs somewhat contrary to the idea. I'm working on a paper in which I will discuss this further, but basically the uncertainty of the inference increases with time, so inferring ancestral areas in deep time is harder than inferring them for more recent times. Thus, it is worth reflecting when the results ..
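The point about uncertainty growing with time can be illustrated with a deliberately simple sketch. It assumes a single lineage and a symmetric two-state Markov model (say, "lowland" vs "highland") with a hypothetical switching rate q; neither assumption comes from the discussion above. Under that model the probability that the ancestor had the same state as its descendant is 1/2 + 1/2 exp(-2qt), which decays toward a coin flip as t grows.

```python
import math

def prob_same_state(q, t):
    """P(ancestor and descendant share a state) after time t under a symmetric 2-state model."""
    return 0.5 + 0.5 * math.exp(-2.0 * q * t)

# Hypothetical switching rate (per million years) and node ages (million years).
q = 0.05
for t in (1, 10, 50, 100):
    print(f"t = {t:>3} Myr: P(same state) = {prob_same_state(q, t):.3f}")
```

Real reconstructions use many tips at once, which recovers some information, but the same decay applies along every branch, so deep nodes remain the hardest to infer.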
A late comment, but I'll say it all the same. One can infer ancestral areas if one believes in them and believes that they will follow a particular recipe. But just because a recipe can be made it does not mean that the ingredients are edible (realistic).
I'm currently writing an article about this. Several factors in the distribution of species cannot be "mapped" on a tree, since they are not heritable.
This is interesting. Given that the lithosphere has changed over time and that the distribution of species has changed over time, there were ancestral areas. Discovering them is the hard part, as both Grehan and Oliveira noted. To simplify this a little: when we combine areas with species distributions we get areas of endemism, and I am not so sure there is not some form of inheritance, or descent with modification, with regard to areas of endemism. They have histories. They are hierarchically organized in time and space. Changes in distribution are rarely just one-off random events, so descent with modification of species distributions may mimic genetic inheritance more than we think. Some of these histories may be depicted as tree structures, but others may include so much reticulation that inference is perhaps impossible. That does not necessarily mean the history did not occur in some way similar to inheritance (or descent with modification).
I think Crother makes a very interesting point. However, I worry about how this is being done today. I think current approaches are too simplistic: they do not take into account much of the empirical knowledge we have about the distribution of species. What I mean is that it may well be possible to estimate an ancestral area, but it is not as simple a task as reconstructing an ancestral character state on a phylogeny (at least in most cases).
I think yes, because today there are several methods for mapping characters: likelihood-based methods such as dispersal-extinction-cladogenesis (DEC), which is implemented in LAGRANGE; DIVA, which does not take branch lengths into account; Bayesian methods such as BAYAREA; and so forth. Parsimony was proposed first; in it, all evolutionary events carry the same weight. Maximum likelihood came next and takes into account the probabilities of the various events. Bayesian inference came last; it relates the likelihood of an event to the likelihood of the tree and accounts for uncertainty in the tree. All methods have advantages and disadvantages, as outlined below.

One of the most widely used methods for reconstructing ancestral states is parsimony. Among its assumptions are that (1) the tree used is the true tree, (2) all relevant extant taxa are included, and (3) characters are coded correctly. These assumptions are shared by other methods such as maximum likelihood, but even meeting all of them does not guarantee reliable reconstructions. Parsimony also assumes that changes on all branches are equally likely, so it ignores information about branch lengths. Maximum likelihood, in contrast, takes differing branch lengths into account, but it depends on rates of evolution: for more quickly evolving characters the reconstruction is less accurate. Furthermore, maximum likelihood offers advantages when gain and loss probabilities are asymmetric. On the other hand, phylogenetic uncertainty and the branch lengths of the phylogeny can lead to different interpretations of the ancestral state at a node. Different reconstruction methods exist for multistate discrete data and for continuous data, and under these methods the topology influences reconstruction accuracy.
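For readers who have not seen parsimony reconstruction spelled out, a minimal sketch of the Fitch downpass in Python follows. The tree, the taxon names, and the "low"/"high" elevation coding are invented for illustration. Note that the input contains no branch lengths at all, which is exactly the limitation described above.

```python
def fitch(tree, tip_states):
    """Fitch downpass on a binary tree of nested tuples.

    Returns (set of most-parsimonious states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):                  # a tip: its observed state, no changes
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:                                 # children agree on at least one state
        return shared, left_changes + right_changes
    # children disagree: keep both options and count one change
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical five-taxon tree with elevation coded as a two-state character.
tree = ((("A", "B"), "C"), ("D", "E"))
tip_states = {"A": "low", "B": "low", "C": "high", "D": "high", "E": "high"}

root_set, changes = fitch(tree, tip_states)
print(root_set, changes)
```

On this toy tree the root is reconstructed as high elevation with a single change to low on the branch leading to A and B, regardless of how long or short any branch might be.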
Parsimony-based reconstructions are sensitive to rapid evolution and to unequal probabilities of gains and losses, and parsimony has been found unable to distinguish accurately between homology and convergence. Among the limitations of maximum likelihood, the most common is the assumption that every character evolves at a constant rate through time and along all branches; this is a disadvantage because maximum likelihood then fails to infer changes on short branches. As for other limitations, such as dependence on taxon sampling, some authors have demonstrated that more taxa are not necessarily needed to improve ancestral reconstruction under the Fitch method, assuming the true phylogenetic tree is given, and this trend was also found with the maximum likelihood method under several conditions. In conclusion, I think any of these methods could be used, as long as the differences between the methods and their algorithms are taken into account.
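The contrast with likelihood can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of any package: it assumes a two-state character (0 = low, 1 = high elevation), a constant-rate Markov model with possibly asymmetric rates q01 and q10 that are supplied by hand rather than estimated, and a flat prior at the root. The tree format and all names are hypothetical.

```python
import math

def transition_probs(q01, q10, t):
    """2x2 transition matrix P(t) for a two-state Markov model with rates q01 and q10."""
    s = q01 + q10
    pi0, pi1 = q10 / s, q01 / s                # stationary frequencies
    decay = math.exp(-s * t)
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]

def partial_likelihoods(node, tip_states, q01, q10):
    """Felsenstein pruning: L[s] = P(tip data below node | node is in state s)."""
    if isinstance(node, str):                  # a tip: certain about its observed state
        return [1.0 if tip_states[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:                 # internal node: list of (child, branch length)
        child_l = partial_likelihoods(child, tip_states, q01, q10)
        p = transition_probs(q01, q10, length)
        for s in (0, 1):
            result[s] *= sum(p[s][k] * child_l[k] for k in (0, 1))
    return result

def root_marginals(tree, tip_states, q01, q10, prior=(0.5, 0.5)):
    """Marginal probability of each state at the root, given fixed rates."""
    like = partial_likelihoods(tree, tip_states, q01, q10)
    weighted = [prior[s] * like[s] for s in (0, 1)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Two tips in different states: A sits on a short branch, B on a long one.
tree = [("A", 0.1), ("B", 2.0)]
tip_states = {"A": 0, "B": 1}
print(root_marginals(tree, tip_states, q01=1.0, q10=1.0))
```

Fitch parsimony would call this root ambiguous, while the likelihood calculation favours the state of the tip on the short branch. Changing the branch lengths or making q01 and q10 unequal shifts the answer, which shows both the extra information likelihood uses and its dependence on the assumed rates.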
Some papers that you can read:
Cunningham et al. 1998. Reconstructing ancestral character states: a critical reappraisal. Trends Ecol. Evol. 13(9).
Omland. 1999. The assumptions and challenges of ancestral state reconstructions. Syst. Biol. 48(3): 604-611.
Huelsenbeck & Bollback. 2001. Empirical and hierarchical Bayesian estimation of ancestral states. Syst. Biol. 50(3): 351-366.
Webster & Purvis. 2002. Testing the accuracy of methods for reconstructing ancestral states of continuous characters. Proc. R. Soc. Lond. B 269: 143-149.
Li et al. 2008. More taxa are not necessarily better for reconstruction of ancestral character states. Syst. Biol. 57(4): 647-653.
The description given by Omar certainly describes the recipe, but whether such recipes, with their inbuilt assumptions about what congruence or incongruence of taxon-area relationships is supposed to represent, really mean anything in reality is another matter. I would treat such methods only as unsubstantiated theoretical speculation.