Major roads, freeways, railroads, rivers, mountains, etc. are good boundaries. Here in the United States, we tend to use census tracts, as that makes it simple to assign demographic data to a zone.
As we need to get socio-demographic data for each zone, the simplest criterion would be data availability. I usually use district boundaries as transportation zones, since statistical data are widely available from statistical or municipal offices. If we use a natural boundary such as a river, or even a major man-made boundary such as a major highway, the socio-demographic data will need to be interpolated, which reduces accuracy.
In my case I have a problem with availability, because some data about residents are not available from the statistics at the lower level. I have data on two levels: municipality and settlement. I agree with you that if you do not have the data, you need to interpolate the socio-demographic data, which will reduce the accuracy. Do you have an overview of the error introduced by interpolation?
If you don't have readily available statistics or census data, then what I have done for operations and maintenance is create zones based on geographic accessibility. For example, where a railroad track bisected a neighborhood with no connections across the tracks, I broke the neighborhood into two separate zones for maintenance purposes, as it was effectively two different neighborhoods accessed from two different points on the arterial network.
The simplest interpolation is based on area. Unfortunately, this kind of interpolation assumes that the density is uniform over each area, which is certainly not the case. If you have the actual distribution of the data, the interpolation error can be measured as the difference between the actual distribution and the assumed uniform distribution.
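To make the area-based approach concrete, here is a minimal sketch in Python. It assumes you already have the overlap area between each source unit (e.g. a municipality) and each target zone, for example from a GIS overlay. The function names, the dictionary layout, and the numbers are all made up for illustration; this is not taken from any particular package.

```python
def areal_interpolate(source_totals, overlap_areas):
    """Split each source unit's total across zones in proportion to overlap area.

    source_totals: {source_id: total, e.g. population}
    overlap_areas: {(source_id, zone_id): area of the intersection}
    Assumes uniform density inside each source unit.
    """
    # Total area of each source unit, summed over its pieces.
    source_area = {}
    for (src, _zone), area in overlap_areas.items():
        source_area[src] = source_area.get(src, 0.0) + area

    # Each zone receives the area share of every source unit it overlaps.
    estimates = {}
    for (src, zone), area in overlap_areas.items():
        share = area / source_area[src]
        estimates[zone] = estimates.get(zone, 0.0) + source_totals[src] * share
    return estimates


def interpolation_error(estimates, actual):
    """Return (mean absolute error per zone, total absolute error / total actual)."""
    abs_errors = [abs(estimates[z] - actual[z]) for z in actual]
    mae = sum(abs_errors) / len(abs_errors)
    relative = sum(abs_errors) / sum(actual.values())
    return mae, relative


# Hypothetical example: one municipality "M" of 1000 residents,
# covering zone "A" (3 km^2) and zone "B" (1 km^2).
source_totals = {"M": 1000}
overlap_areas = {("M", "A"): 3.0, ("M", "B"): 1.0}
estimates = areal_interpolate(source_totals, overlap_areas)

# Hypothetical "true" counts, e.g. from a survey, used only to measure the error.
actual = {"A": 900, "B": 100}
mae, relative = interpolation_error(estimates, actual)
print(estimates, mae, relative)
```

In this made-up case the uniform-density assumption gives 750 and 250 residents, against actual counts of 900 and 100, so 30% of the population ends up assigned to the wrong zone. The error can only be computed where you have actual counts to compare against, so in practice you would check it on a few areas where finer data exist.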
Merriam-Webster defines Crowdsourcing as "the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers" (source: http://www.merriam-webster.com/dictionary/crowdsourcing).
To supplement existing datasets, one might obtain O-D flows from sources such as a social networking site (like Facebook, where users enter fields identifying where they live and where they work). There are other methods for crowdsourcing, including surveys.
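As a rough illustration of how home/work pairs could be turned into O-D flows, here is a minimal Python sketch. It assumes the records have already been collected as (home place, work place) pairs and that you have a lookup from place names to zone IDs. The function name, the lookup, and the sample records are hypothetical, and nothing here touches any real social network API.

```python
from collections import Counter


def od_flows(records, place_to_zone):
    """Count trips per (origin zone, destination zone) pair.

    records: iterable of (home_place, work_place) tuples
    place_to_zone: {place name: zone id}
    Records whose home or work place has no known zone are skipped.
    """
    flows = Counter()
    for home, work in records:
        origin = place_to_zone.get(home)
        destination = place_to_zone.get(work)
        if origin is None or destination is None:
            continue  # cannot assign this record to a zone
        flows[(origin, destination)] += 1
    return flows


# Hypothetical lookup and sample records.
place_to_zone = {"Northside": 1, "Riverside": 2, "Downtown": 3}
records = [
    ("Northside", "Downtown"),
    ("Northside", "Downtown"),
    ("Riverside", "Downtown"),
    ("Riverside", "Northside"),
    ("Unknown Town", "Downtown"),  # skipped: no zone for the home place
]

flows = od_flows(records, place_to_zone)
for (origin, destination), count in sorted(flows.items()):
    print(origin, destination, count)
```

Counts like these come from a self-selected sample, so they would normally need to be scaled or weighted against known totals before being used alongside survey or census data.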
I agree with everything you mentioned. I am familiar with the survey method, but I am not familiar with collecting data from social networking sites. What is your experience with the latter? I think it is very interesting.
Personally, I've never used social networks for data collection. So far, my research involves modeling smaller networks, and I've been able to collect the data I need via conventional methods. The few times I've modeled larger networks, the data was collected and provided ahead of time.
The link below takes you to Facebook's Automated Data Collection Terms and the application page for performing Automated Data Collection on Facebook. Maybe this will provide better insight: