Prompted by the recent "Big Data" hype, I have tried to analyze the possible sources of such data for research projects. The interesting finding is that there does not appear to be a huge amount of "usable" data in the open space. But I would love to be wrong.

The definition of "usable" in this context is roughly:

  • it has spatial and temporal context, and
  • it contains information that is relevant to individuals or to society.

What I see is:

  • More than half of internet traffic is streaming video. A good part of this is porn, followed by blockbusters (Netflix and co.) and then "cats and dogs" videos (YouTube).
  • The next big thing is photos. The situation appears to be better here: quite a few freely available geo-referenced photos can be found on photo-sharing platforms.
  • Geo-referenced tweets and the like don't qualify as big data. The same goes for blogs and similar sources.
  • Facebook, Google+ & co. have more data, but these platforms are mainly used for sharing information with friends and family, so it is not open data.
  • Environmental open data, other than satellite data, probably does not qualify as big data either. Satellite data seems to be OK.

To cut it short, the amount of "usable" open data on the web (as defined above) appears to be quite small. Much of the company-owned data cannot be explored because of privacy concerns (unless you work for the NSA). Besides, companies such as Google and Facebook base their business on exclusive access to this data.

Does this mean that open data research is essentially about companies digging through the data they somehow collect on their own (plus satellite data), or am I missing the point?
