Cassandra is a distributed database, Spark is a platform for running parallel computations, and Hadoop likewise lets you run distributed programs. All three are software projects managed by the Apache Software Foundation.
Transportation data comes from three kinds of sources: government agencies, private studies (usually carried out under government sponsorship to plan new roads or to justify additional investment in an existing road), and crowdsourcing. The first and third are the most likely to be available. Which sources actually exist depends on the country under study, so you would need to establish that first.