Sunday 8 June 2014


Network Ready Research: 
The Role of Open Source and Open Thinking




CAMERON NEYLON
PLOS (Public Library of Science)



OVERVIEW: The highest principle of network architecture design is interoperability. Metcalfe's Law says a network's value can scale as some exponent of the number of connections. Our job in building networks is to ensure that those connections are as numerous, operational, and easy to create as possible. Informatics is a science of networks: of physical interactions, genetic control, degree of similarity, or ecological interactions, amongst many others. Informatics is also amongst the most networked of research communities and amongst the most open in the sharing of research papers, research data, tools, and even research in process in online conversations and writing. Lifting our gaze from the networks we work on to the networks we occupy is a challenge. Our human networks are messy and contingent, and our machine networks are clogged with things we can't use, even if we could access them. What principles can we apply to build our research to make the most of the network infrastructure we have around us? Where are the pitfalls and the opportunities? What will it take to configure our work so as to enable "network ready research"?

READINGS:
    Molloy, J. C. (2011). The Open Knowledge Foundation: Open data means better science. PLoS Biology, 9(12), e1001195.
    Whyte, A., & Pryor, G. (2011). Open science in practice: Researcher perspectives and participation. International Journal of Digital Curation, 6(1), 199-213.


21 comments:

  1. "How do you support the unexpected?" This is the best single statement/question raised at the summer school so far. I think this touches on the evolution of intelligence and on catalyzing innovation. Normally any dogma is busy reinforcing itself and does very little to stay open to the opportunities for profound change that often present themselves unexpectedly.

  2. One way to identify opportunities for change is by examining the dynamics of systems. Peter Csermely (http://www.amazon.com/Weak-Links-Universal-Stability-Collection/dp/3540311513) developed a few criteria that identify turning points in network dynamics. Might be a direction to address the problem.

    Replies
    1. This is an excellent example of what I had in mind. Using analytical approaches that help us understand when the dynamics of a network are going to change could help us decide when and where to put our efforts and limited resources.

  3. I found Cameron Neylon’s discussion of different kinds of openness quite interesting. One of the main points I took home from this talk is that being open, over and above questions of open access to data, means that “my work can help someone.” How are the different kinds of openness related? Can they “catalyze” each other?

    Replies
    1. Good question. My intuition is that when we talk about 'network effects' and how the system changes at scale, that is a form of catalysis, but it's hard to demonstrate that in a formal sense. One way to ask the question is whether all the different forms of openness contribute to reducing some common parameter, what I labelled as "friction".

      This was in part the point I was making when I said I was unsure whether my model was an analogy or a true model of the system. That's a question that will require more analysis.

  4. Talking about sharing, how do you get people to recognize your contribution when you're facing, say, credibility prejudice (in Miranda Fricker's sense)?

    Replies
    1. The right to speak does not necessarily imply the right to be listened to. This cuts both ways: sometimes it comes out as a way of saying that engaging does not mean you always have to respond. But as you point out, it also means that those without power will still struggle to be heard even when they are (formally) enabled to speak.

      There aren't any simple answers to power imbalances beyond trying to create systems that recognise the possibility of such imbalances, seek to identify them, and work to ameliorate them. More open systems of credit and prestige can help here, as can orthogonal systems that challenge existing power structures.

  5. So do we expect to be rewarded? I think we do not need to be rewarded. It's true that we need money for almost everything. There are no plans for the unexpected; however, we focus on the present. We are following technological innovations. We are already in the transition.

  6. The idea of exploiting an existing network’s resources to carry out an innovative project seems very interesting. The difficulty, however, is how to estimate the cost of this kind of project (comparing doing the project with experts vs. with the general public). For me, researchers can’t go ahead with a project without having the right estimate for the next step.

    Replies
    1. Agreed. The question of how to estimate costs and resource needs, particularly externalities and how these might play out, is very hard. I would argue that studying these systems will offer ways to make those estimates that will improve over time.

      But also there is a place for intuition and creativity. Having sufficient headspace to take some risks is also important, not just taking a pure return on investment approach.

  7. I noted the question of whether massive collaboration in mathematics is really productive or the opposite. It seems to me that it is important to consider the productivity of collaboration relative to the field of study.

    Replies
    1. Translation:

      Collaboration varies with the field.

  8. Dr. Neylon's talk went very nicely with Dr. Gloor's ideas of intrinsically motivated innovation and Dr. Heylighen's ideas about alignment and the reduction of friction. I will ponder Dr. Neylon's questions about how to support the unexpected. I like that he stresses usability and reach in whatever research people will choose to pursue.

  9. I find the metaphor of reducing friction very powerful. It's all about building infrastructure to support an activity. How can we apply this to open source research? Where is the friction which prevents open source research from taking off, and how can it be reduced?

    Replies
    1. I think you mean either open access (to research articles) or open data, Nicole. "Open source" is specific to software (open your code, or even give it away). The friction with articles comes not from the authors but from the publishers. With data it is the authors' need for first-exploration rights, and with software it's whether or not they want to reveal it and/or sell it...

    2. There are many different types of friction. Some of them have to do with just access to see and find things, some with licensing and rights to actually use things. But many other things can cause friction. A badly written paper creates more friction than necessary; data that is made available in a non-standard or unhelpful way can also cause friction.

      And all of these things may interact in unexpected ways. For instance, it is conceivable that providing access to a lot of poorly formatted data could increase friction overall. The core question is which kinds of friction are causing the most problems in a particular space at the moment: where is the point where our efforts can make the most difference? It's not always the case that we agree on where that point is, but I think we can agree that we want to apply our limited resources (mostly time and money) to those places that will make the most difference.

    3. What you're saying here relates well to what has been discussed about data visualization and different semantic networking strategies. When it is difficult to understand data that is presented in an unintuitive way, this creates friction for the user. Similarly, when different companies use proprietary strategies to create semantic webs, leading to an incompatibility between systems, we, as a user community, cannot benefit from the combined efforts of multiple groups. Jim Hendler touches on this in his talk.

  10. During the discussion after these presentations we talked about how open we can, or should, be with our data. Dr. Harnad pointed out that he wouldn’t mind publishing his data as soon as he acquired it (very commendable), but that some of his friends would. Although I would like to share Stevan’s opinion, I think I fall into the same category as his friends.

    When working on research, or any other project, there is always one step that appears way bigger than all other steps. It feels somewhat disappointing for someone to collect all the data just to have someone else take their hard work and complete the final step. Both our school system and industry encourage my camp. We are evaluated individually. If we don’t highlight our contribution we get little recognition. I’m sure we can manage to share data immediately, we’ll just need to consider some things ahead of time.

    Replies
    1. The question is very much one of credit. Because credit is given for published articles, not for data that gets used, people focus entirely on publishing articles. This doesn't just limit data sharing; it means whole classes of people contributing to research are undervalued. So this is a bigger problem than just data sharing, but if we can find ways to solve it in particular places (and we have in particular places) then benefits will flow from that.

  11. Dear Cameron, thank you for sharing your ambitious vision! I have one question: I’m working on how a selection of Wikipedia articles can improve the understanding of a specific scientific article, and I’m studying which models are best at predicting the dynamic construction of knowledge. I would like to work with ontologies. Do you think it’s possible to model the possible positive rhythm of knowledge diffusion that results from open access? Which ontological, statistical or machine learning models can we use for that?

    Replies
    1. I think it's hard. I don't know enough about all the technical choices to give you a detailed answer, but naively there seem to be two broad approaches. You can build agent-based models in which parameters to do with knowledge diffusion can be manipulated, or you can attempt to observe actual knowledge diffusion "in the wild" and seek to find models that describe that activity.

      Both of these have fairly deep problems of observer bias: agent-based (or abstract) models will be biased by the model design, and observational approaches will be biased by what you decide to observe and what can be observed. Also, if you seek to compare closed access to open access you will have a bias because you will (by definition) have limited access to the closed access corpus!

      In either case, careful construction of comparisons between closed and open can help to make results more reliable. Overall, my approach would be to explore the parameter space using an agent-based modelling approach and then seek observational means of attempting to match the models to real behaviour. I'd probably use machine learning and topic modelling to attempt to observe knowledge diffusion in real systems, but others have had good results with simple word frequencies as well (for new terms, for instance).

      But always check and re-check how your accessible corpus is biasing your results. That will be the big issue to tackle I suspect.
