Sunday 8 June 2014


Open Science and the Web

TONY HEY
Microsoft Research Connections

VIDEO


OVERVIEW: Turing Award winner Jim Gray envisioned a world where all research literature and all research data were online and interoperable. He believed that such a distributed, global digital library could significantly increase the research "information velocity" and improve the scientific productivity of researchers. The last decade has seen significant progress in the move towards open access to scholarly research publications and the removal of barriers to access and re-use. But barrier-free access to the literature alone only scratches the surface of what the revolution of data-intensive science promises. Recently, in the US, the White House has called for federal agencies to make all research outputs (publications and data) openly available. But to make this effort effective, researchers need better tools to capture and curate their data, and Jim Gray called for 'letting 100 flowers bloom' when it came to research data tools. Universities have the opportunity and the obligation to cultivate the next generation of professional data scientists who can help define, build, manage, and preserve the necessary data infrastructure. This talk will cover some of the recent progress made in open access and open data, and will discuss some of the opportunities ahead.

READINGS:
    Fox, G., Hey, T., & Trefethen, A. (2013). Where does all the data come from? In Data-Intensive Science (p. 115).
    Hey, T. (2010). The next scientific revolution. Harvard Business Review, 88(11), 56-63.
    Hey, T., Tansley, S., & Tolle, K. (Eds.) (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.

http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx 
http://eprints.rclis.org/9202/1/heyhey_final_web.pdf




33 comments:

  1. With the increasing popularity of machine learning in scientific research, are we moving away from research constrained by a priori hypotheses? What are the implications of favouring data-driven over hypothesis-driven research?

    Replies
    1. The two are complementary! One does not rule out or supersede the other.

    2. I think that with regard to data-driven research, it becomes important that those conducting the research are experts well versed in the scientific problems, rather than just generalized "data experts", as Tony Hey mentioned. Machine learning can then be a useful tool for researchers who are already immersed in a field but are limited by their ability to process large volumes of data.

  2. Tony Hey: "The data will be the next battle ground". Dame Wendy Hall mentioned yesterday that the software used for analysing the data should also be published together with the research results, data, etc.

    I think all the macros, scripts, software packages, etc. should be made available in order for the research to be really reproducible. It could work just like open-source software development: you clone a repository, you compile the source, you run a script, and that reproduces the whole analysis.

    There are two problems with that. First, you cannot publish commercial software (e.g. from MS :)). The second is psychological: making the whole research procedure accessible increases the probability that others will find mistakes in the research...
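
    A minimal sketch of what such a one-command reproduction entry point could look like, in Python (the URL, file names, column name and checksum below are hypothetical placeholders, not any real project's):

      # reproduce.py -- toy sketch of a "clone, run, reproduce" entry point.
      # One script fetches the exact archived dataset, checks its integrity
      # against a checksum published with the paper, and reruns the analysis.
      import csv
      import hashlib
      import pathlib
      import urllib.request

      DATA_URL = "http://example.org/study/raw.csv"  # hypothetical data archive
      EXPECTED_SHA256 = "<checksum published with the paper>"

      def fetch_data(path="raw.csv"):
          """Download the archived dataset and verify it is byte-identical."""
          if not pathlib.Path(path).exists():
              urllib.request.urlretrieve(DATA_URL, path)
          digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
          assert digest == EXPECTED_SHA256, "data differ from the published version"
          return path

      def analyse(path):
          """Rerun the (toy) analysis: here, just the mean of one column."""
          with open(path, newline="") as f:
              values = [float(row["measurement"]) for row in csv.DictReader(f)]
          return sum(values) / len(values)

      if __name__ == "__main__":
          print("reproduced result:", analyse(fetch_data()))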

    Replies
    1. To make all software packages publicly available, the government or some other entity would need to fund the developers. Academics are funded and can afford to publish open-access data. Non-academic software developers, in contrast, need to charge to make a living.

    2. A partial solution would be not to publish the software packages, but to use open formats for the data: http://en.wikipedia.org/wiki/Open_format. Something like Schema.org or Linked Data.
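
      As a toy illustration, describing a dataset in an open, machine-readable format (here JSON-LD with Schema.org vocabulary) needs nothing proprietary to write or read; the field values below are invented:

        # open_format.py -- write and read a dataset description in an open
        # format using only the Python standard library. Values are made up.
        import json

        record = {
            "@context": "http://schema.org",
            "@type": "Dataset",
            "name": "Example survey responses",
            "description": "Raw responses collected for a hypothetical study.",
            "url": "http://example.org/study/raw.csv",
        }

        # Any JSON-aware tool can consume this; no proprietary package is needed.
        with open("dataset.jsonld", "w") as f:
            json.dump(record, f, indent=2)

        with open("dataset.jsonld") as f:
            print(json.load(f)["name"])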

  3. Is there a central Open Access repository that stands out and is becoming the standard for depositing scientific articles at universities?

    Replies
    1. No need for a central repository: the authors' institutional repositories suffice, and their contents are then harvested by Google Scholar, SCOPUS and eventually WoS. One does not deposit directly into Google either...

  4. I find the idea of a “fourth paradigm” in scientific research quite fascinating. Tony Hey has mentioned the appearance of specialized “data analysts.” My question for Tony Hey is whether or not he thinks that this “fourth paradigm” might lead to a more extensive division of labor in research, such that some members of the research community might specialize in analyzing publicly available data sets, rather than generating new data themselves.

  5. Are there any harms in making scientific articles completely public as Open Access?

    Even articles that have not been peer-reviewed could be put in an Open Access repository. How do we verify the validity of these articles? Is there a level of trust? Do the administrators of Open Access repositories evaluate these articles against some criteria?

    Replies
    1. Institutional repositories have metadata tags indicating whether the text is refereed, published, etc.

  6. Do we need, in future, an organisation like the W3C to manage open data access, in order to harmonise metadata and also the data itself?

  7. About the idea that scientists and programmers take so much time to get working together: how do we speed that up? When you're dealing with teams of students, six months over and over is a lot of time!

  8. How do researchers comply with the new requirement that NIH-funded research must be published open access? Don't most academic journals reserve the rights to the publication? Does this force researchers to publish only in certain journals, or to acknowledge NIH funding on only some of their work?

    The journals that do give the option to publish open access charge upwards of $2,000 to do so. Does the researcher have to front this money?

    Finally, if all researchers begin publishing in repositories such as arXiv, what will we use as 'academic' currency? Currently the impact factor of journals seems to matter the most. Will we take a positive step and start evaluating each paper individually? Perhaps through the number of citations to the specific paper rather than the journal?

    Replies
    1. The solution for access during publisher open-access embargoes is to deposit immediately and let users rely on the repository's "Almost-OA" Button: http://j.mp/CopyRequestButton

      Arxiv contains both pre-refereeing preprints and refereed postprints. Peer review is still essential, and metadata tags indicate whether or not a paper is refereed.

    2. Interesting system! Glad it exists.

  9. Thanks, Tony Hey. Interesting comments about Watson on Jeopardy. I agree, it is really a simulation of just one part of intelligence.

    Replies
    1. Every time something is achieved in the field of artificial intelligence, people say that it was not 'real intelligence'. I hope that someday this line of thinking will reach the point where we will have to admit that 'real intelligence' does not exist at all and that we are all philosophical zombies :).

    2. I am sure that I am conscious and that I can feel (based on Descartes' proposition). Therefore, I cannot be a zombie.

      Whether we want to call feeling 'real intelligence' is another issue. I think we should say that AI is nearing 'human capacity', or nearing 'the ability to feel as we humans do', rather than using the term 'real intelligence'.

    3. Philosophical zombies? http://users.ecs.soton.ac.uk/harnad/Papers/Harnad/harnad95.zombies.html

    4. It's not that feeling is real intelligence. It is that real intelligence is felt.

  10. In the very near future, all data management and processing will be conducted by machine agents such as Watson (who 'knows' nothing about 'anything'). Already today, Watson supports medical diagnostic decision-making based on huge databases of medical data. Another example is a machine AI agent becoming a board member of a venture capital company: http://www.huffingtonpost.co.uk/2014/05/15/artificial-intelligence-board-directors_n_5329370.html. The point is that raw data will become opaque to human agents and will be mostly, if not entirely, mediated by machines with increasing processing and cognitive competences.

    Replies
    1. Isn't this perspective a little bit radical, and even scary? Are "machines with increasing processing and cognitive competences" capable of the judgement, the creativity to come up with new ideas or questions, and the flexibility in the face of new challenges that a human being is capable of?
      Who supports medical diagnostic decisions based on machine analysis of medical data? I can't help but disagree. Medicine is one of the disciplines in which I believe a machine will never replace the judgement and flexibility of a skilled doctor. It is a discipline based on repeated observation, not only on information analysis (leaving aside the fact that part of the "medical data" is biased and affected by many private interests and by the variety of methods). Every patient is a unique human being, and algorithms may not apply to each one of them. Patients from different countries or contexts often need different approaches. Is a machine sensitive enough to perceive those subtleties and deal with them? I doubt it.
      Medicine is just an example, but even though I think the emergence of these new technologies and analysis possibilities is great, we should be very careful not to conclude prematurely that "machine agents conducting data management and processing" will replace human understanding of the phenomena around us.

    2. Or did I misunderstand your comment? I do think these tools are useful, but I consider them only that: tools. Never a replacement for what a human mind can do.

    3. Human doctors, together with machine intelligence that can compare individual cases to huge databases of historical cases, will offer diagnostic and prognostic capabilities beyond anything available today. I do not think that we are at the stage of comparing the capacities of machine agents to those of humans; that might take a long while. Still, the complementarity of human and AI agents is a huge advantage. This, of course, cannot correct the unfortunate situation of corruption and private interests, but then that is not a problem of machines but of humans. The practices of human doctors are also often biased or seriously distorted by personal interests, pressures, etc., so we cannot blame the machines for that...

  11. The dataset visualisation tools are very interesting artifacts (especially the maps). Can those tools link different datasets directly on the map, in an interactive way?

  12. I believe that universal open science is a wonderful goal to pursue, despite the conflicts that may arise from the economic interests of some private parties. I also believe the obstacles vary according to the scientific discipline.
    In medicine, for example, the existence of free platforms that gather recent advances in medical knowledge (e.g., Medscape, Epocrates) has changed the clinical practice of their users for good. Having immediate access to trustworthy medical information is definitely beneficial, not only for clinicians but also for their patients. However, given the private interests of the pharmaceutical industry, I believe universal access to the original data behind some of those publications is still a long way off. Also, some people believe that the universalization of medical knowledge may lead to self-medication and misunderstanding of the information by patients, increasing the risk of worsening disease and of side effects of certain drugs.
    How can we counter these arguments and spread the idea of open science to as many knowledge areas as possible?

  13. I also have a question on Open Data: depending on where and how a research project is pursued, differences in methodology and data collection may arise. Is there any kind of regulation software or tool through which we can assess the quality and homogeneity of the uploaded data, so that the analysis is valid everywhere?
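
    To illustrate the kind of automated check I mean, here is a toy sketch in Python (the required columns and the completeness threshold are invented for the example):

      # quality_check.py -- toy sketch of a homogeneity check that a repository
      # could run on uploaded tabular data before accepting it.
      import csv
      import sys

      REQUIRED_COLUMNS = {"subject_id", "measurement", "unit", "collected_on"}
      MIN_COMPLETENESS = 0.95  # at least 95% of cells must be non-empty

      def check(path):
          with open(path, newline="") as f:
              reader = csv.DictReader(f)
              missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
              if missing:
                  return False, "missing columns: " + ", ".join(sorted(missing))
              rows = list(reader)
          cells = [v for row in rows for v in row.values()]
          filled = sum(1 for v in cells if isinstance(v, str) and v.strip())
          completeness = filled / max(len(cells), 1)
          if completeness < MIN_COMPLETENESS:
              return False, "only %.0f%% of cells are filled" % (100 * completeness)
          return True, "ok"

      if __name__ == "__main__":
          valid, reason = check(sys.argv[1])
          print("accepted" if valid else "rejected:", reason)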

    Replies
    1. Not yet. Far from it! Especially across all possible fields!

  14. I have been very interested in the idea of an open-source ontology, an ambitious project. Could anyone give me some references on it?

  15. I’m very interested in the ideas of accessibility and research "information velocity". I don’t think accessibility and connectivity between researchers and documents are sufficient to benefit from all of the content of a scientific document. For my PhD project, I’m working on the notion of readability and the transmission of complex ideas, and I’m studying how Wikipedia can help a reader gain a better understanding of a scientific article. How possible, and how urgent, do you think it is to improve the infrastructure and connectivity between documents, to improve the ease and velocity of transmitting innovations and complex scientific models?

  16. The fourth paradigm sounds like a digitized and automated form of grounded theory (http://www.groundedtheoryonline.com/what-is-grounded-theory).
