I'm working on a piece of research for this - the data set is in the area of inpatient activity (hospitals). I would value your thoughts/ideas on this.
not sure I understand the question (so many "users" there, not sure who they are or even whether they're all the same!)
anyway, if the question is about improving the understanding, cleaning, etc. of a large data set by allowing the users of this data set ("users" = "analysts") to interact with each other, I think the Netflix challenge forum gives an interesting insight into what can happen when different teams can disclose their findings to others, discuss them, and receive feedback.
the forum is still online: http://www.netflixprize.com/community/
here, of course, "users" means "data-miners" or "analysts"
now, if the question is about people ("users" = "patients") interacting with their own data through some kind of network, well, that's another story !
Hello, thank you both for the comments. I didn't explain my question very well. To clarify, the 'users' I refer to are not patients. The 'users' are separate groups who use the data. They are all healthcare professionals, doctors in the main, but they have different uses for the data. One group would use the data for disease surveillance, another for public health decisions, another for casemix, and the final one for health-economic decisions. So their uses really are different. This is why I am researching whether bringing all of these different groups together as a "user group" for this data makes an effective network (I'm not talking about the social or technical merits of meeting). I am presuming that it is a good use of their time, but I am going to try to find some real-life examples as evidence to back this up. Thanks again.
Generally, a users' group can serve to help its members more fully understand the depth of detail available in the data, as well as its limitations. Analysts do this all of the time, either formally or informally. In this context, the collaboration is more data-driven: how can it be used to answer questions A, B, or C?
When the users are healthcare providers, I'm not so sure; members of the healthcare community tend to be focused on their own area of interest, and are less likely to "cross-pollinate" unless that area overlaps with that of another set of healthcare professionals. In that sense, the collaboration (which is how they would probably view a user group) is more outcome-driven. If healthcare providers can't anticipate how the others' use of the data can benefit their own, they're not as likely to participate.
If you had a group of users where each user presented how they have used the data, then the group could collectively learn from each other's experiences. The same applies for users of software (for instance SAS has users' groups). People often stumble upon issues for which they either find their own resolution or could ask other users for suggestions.
Cross-department and cross-discipline analytical work is a wonderful thing.
But I think healthcare providers are somewhat less likely to engage in it unless there's a clear, tangible benefit that they can readily see. One reason for this is very likely that, unlike many analysts, healthcare providers tend to have additional demands on their time: they're often initiating and editing grant proposals; they have to address IRB issues; they have to teach and/or administer patient care and/or engage in administrative activities; the list goes on.
Given the right motivation, I think the different data users would be very willing to collaborate across areas. But first you have to get (and hold) their attention.
At my work we have a listserv for posting questions and thoughts about a very complex health care data set. I find it more useful than a users group meeting. The list is made up of a mix of individuals who have in common a need to access and use various portions of the data. The listserv allows a direct answer to a question from people who have been around the data longer and have had to deal with the same or similar issues. Basically it keeps us from reinventing the wheel over and over. Just monitoring the questions begins to give one a feel for the depths of the data and the best approaches to the questions that I need to answer with it. Often the problem is that there are several locations for the same data item, such as date of death. There are discussions on the reliability of the various locations in the data set.
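As a rough illustration of the date-of-death point, here is a minimal Python sketch of how a group might write down its agreed preference order once, so that everyone picks the same value and sees when the locations disagree. The source names (death_registry, discharge_record, billing_file) and their ranking are hypothetical, invented for this example; a real list would come out of exactly the kind of reliability discussion described above.

```python
from datetime import date
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical source names, listed from most to least trusted.
# The actual ranking would be whatever the data's users agree on.
SOURCE_PRIORITY: Sequence[str] = ("death_registry", "discharge_record", "billing_file")


def resolve_date_of_death(
    candidates: Mapping[str, Optional[date]],
    priority: Sequence[str] = SOURCE_PRIORITY,
) -> Tuple[Optional[date], Optional[str], bool]:
    """Return (chosen date, source used, whether populated sources disagree)."""
    # Keep only sources that actually hold a value, in priority order.
    populated = {
        source: candidates[source]
        for source in priority
        if candidates.get(source) is not None
    }
    if not populated:
        return None, None, False
    # Dicts preserve insertion order, so the first key is the most trusted source.
    best_source = next(iter(populated))
    conflict = len(set(populated.values())) > 1
    return populated[best_source], best_source, conflict


record = {
    "billing_file": date(2020, 3, 2),
    "discharge_record": date(2020, 3, 1),
    "death_registry": None,
}
chosen, source, conflict = resolve_date_of_death(record)
```

Returning the conflict flag alongside the chosen date keeps disagreements visible, so they can be raised on the list instead of being silently resolved.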
Returning to your question: as for a users group meeting, I would assume that there are other venues where the results of the data analyses are presented, and I would consider partnering with those meetings to provide a connection point.
It is a worthwhile aim, especially if the different users have different interfaces to the same database. If they use different databases (even within the same organisation), it is likely to be an uphill battle.
To answer your question, I would think about scenarios where analytics performed by one group of experts would help the decisions of another group. For instance, if the disease surveillance group noticed that a certain disease has a high chance of spreading rapidly in the coming months, this information would help the group in charge of economic decisions to invest more in treatments for that disease. Defining such patterns would give a clear view of how different groups can benefit from each other's experiences and how this information helps them make better decisions.
Thanks, Desara, for your insight. I tend to agree with you about the pattern identification. After some literature searching, I have found nothing directly on this research question. I did find some good reviews and articles on the benefits of clinical networks. Generally in healthcare there are three types of networks: quality improvement collaboratives, communities of practice, and multidisciplinary/interprofessional teams. There is literature on these. There is very little which combines any of these networks with their use of datasets in the healthcare context. Outside of healthcare there is plenty in the ICT/computer industry literature on 'user groups' and a massive amount on datasets, but again little on my exact question. Thanks for all the comments.
We work with a variety of enormous population health databases that we maintain longitudinally (1970 forward). We established a SAS user group for programmers who work with the data sets we use. The most important benefit has been the sharing of data management skills, and particularly SAS macros, so that researchers using the same data can get the same results. One example is our macro that incorporates CDC decision rules to impute race/ethnicity when it is unknown or missing, or that standardizes the classification of race/ethnicity over time when definitions have changed. Other examples are our macro based on CDC rules to calculate indicators such as inter-pregnancy interval, and macros to classify hospital diagnoses into standard categories based on the AHRQ Clinical Classifications Software (CCS). Sharing methods to standardize calculations on healthcare data is enormously important to improving the use of these data sets. The user group is of true benefit to organizations that lack the high-level expertise needed to really mine the enormous resource available in large data sets. User groups are an efficient use of scarce high-level programming skills, and they strengthen the skills of junior programmers. Definitely recommend user groups.
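To make the "same data, same results" point concrete, here is a minimal Python sketch of a shared indicator function. It is not our SAS macro and does not reproduce the CDC rules. It assumes a simplified definition of inter-pregnancy interval (months from one delivery to the estimated conception of the next, with conception taken as the next delivery date minus gestational age), and the function name, the average month length, and the handling of missing values are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# Average month length in days; an assumption of this sketch, not a CDC rule.
DAYS_PER_MONTH = 30.4375


def interpregnancy_interval_months(
    previous_delivery: date,
    next_delivery: date,
    next_gestation_weeks: Optional[float],
) -> Optional[float]:
    """Months from one delivery to the estimated conception of the next.

    Returns None when gestational age is missing or the dates are
    inconsistent, so every analyst treats those cases the same way.
    """
    if next_gestation_weeks is None:
        return None
    # Estimate conception by stepping back the gestational age from delivery.
    estimated_conception = next_delivery - timedelta(weeks=next_gestation_weeks)
    interval_days = (estimated_conception - previous_delivery).days
    if interval_days < 0:
        return None
    return round(interval_days / DAYS_PER_MONTH, 1)


ipi = interpregnancy_interval_months(date(2010, 1, 1), date(2012, 1, 1), 40)
```

The value of sharing something like this lies less in the arithmetic than in fixing the edge cases (missing gestational age, out-of-order dates) in one place, so two researchers cannot quietly make different choices.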
The analysis of "Big Data" is very much an emerging area that requires an interdisciplinary approach almost by necessity. Current approaches require deep skills on the business/clinical side; computer science skills for designing algorithms around current data structures and architecture; advanced stats and math modeling, since old methods may need to be rethought or adapted to the challenges/questions at hand; and possibly specialized knowledge of industry-standard tools such as SAS, Hadoop, Java/C++/C# macros or extensions, etc. And I haven't even gotten into pattern analysis, fuzzy hamming, information theory, propensity modeling, dynamic multilayer perceptron classification systems using new maxent modeling (now available in SAS these days), Big O concerns, NoSQL vs. SQL, MAD skills (yes, this is a skill set). Whew! Soooo... I think the need to leverage the resources of various user groups tends to be rather obvious when dealing with big data.