Sunday, 8 June 2014

Web Science: It's All in the Mind 

University of Southampton, Electronics & Computer Science

OVERVIEW: This year we celebrate the 25th anniversary of the World Wide Web. Twenty-five years ago there were no web sites; by 1994 there were 800, and today it is estimated there are nearly a billion. This growth is not solely down to the technology: it is because we - as individuals, organisations and society - create the content that makes the Web grow. This socio-technical aspect of the Web was the founding principle of Web Science. In this talk we will discuss the theory and practice of Web Science (past, present and future) and conjecture about the nature of collective intelligence on the Web. Will the Web ever develop a mind of its own?

    Berners-Lee, T., Hall, W., Hendler, J., Shadbolt, N., & Weitzner, D. (2006). Creating a Science of the Web. Science, 313(5788), 769-771.
    Berners-Lee, T., Hall, W., Hendler, J. A., O'Hara, K., Shadbolt, N., & Weitzner, D. J. (2006). A framework for web science. Foundations and Trends in Web Science, 1(1), 1-130.
    Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., & Weitzner, D. (2008). Web science: an interdisciplinary approach to understanding the web. Communications of the ACM, 51(7), 60-69.
    O'Hara, K., Contractor, N. S., Hall, W., Hendler, J. A., & Shadbolt, N. (2013). Web Science: understanding the emergence of macro-level features on the World Wide Web. Foundations and Trends in Web Science, 4(2-3), 103-267.
    Tiropanis, T., Hall, W., Shadbolt, N., De Roure, D., Contractor, N., & Hendler, J. (2013). The Web Science Observatory. IEEE Intelligent Systems, 28(2), 100-104.

Towards a Global Brain: 
The Web as a Self-organizing, Distributed Intelligence

Vrije Universiteit Brussel, ECCO - Evolution, Complexity and Cognition research group


OVERVIEW: Distributed intelligence is an ability to solve problems and process information that is not localized inside a single person or computer, but that emerges from the coordinated interactions between a large number of people and their technological extensions. The Internet and in particular the World-Wide Web form a nearly ideal substrate for the emergence of a distributed intelligence that spans the planet, integrating the knowledge, skills and intuitions of billions of people supported by billions of information-processing devices. This intelligence becomes increasingly powerful through a process of self-organization in which people and devices selectively reinforce useful links, while rejecting useless ones. This process can be modeled mathematically and computationally by representing individuals and devices as agents, connected by a weighted directed network along which "challenges" propagate. Challenges represent problems, opportunities or questions that must be processed by the agents to extract benefits and avoid penalties. A link's weight is increased whenever an agent extracts benefit from a challenge propagated along it. My research group is developing such a large-scale simulation environment in order to better understand how the web may boost our collective intelligence. The anticipated outcome of that process is a "global brain", i.e. a nervous system for the planet that would be able to tackle both global and personal problems.
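
The reinforcement rule described in this overview can be illustrated with a very small sketch. It is not the research group's simulation environment: the function name `propagate`, the `skills` table, the `gain` step and the toy network are all hypothetical choices made for illustration. A challenge is forwarded along links chosen in proportion to their weight, and the link that delivers it to an agent able to benefit is strengthened.

```python
import random

def propagate(weights, skills, start, challenge, rng, max_hops=5, gain=0.1):
    """Route one challenge through the network; return the agent that benefits, or None."""
    current = start
    for _ in range(max_hops):
        links = weights.get(current, {})
        if not links:
            return None  # dead end: nobody left to pass the challenge to
        targets = list(links)
        # Stronger links are proportionally more likely to carry the challenge.
        nxt = rng.choices(targets, weights=[links[t] for t in targets])[0]
        if challenge in skills.get(nxt, set()):
            links[nxt] += gain  # benefit extracted: reinforce the link just used
            return nxt
        current = nxt
    return None

# Hypothetical toy network: agent "a" can forward to "b" or "c".
weights = {"a": {"b": 1.0, "c": 1.0}, "b": {}, "c": {}}
skills = {"a": set(), "b": {"repair"}, "c": set()}
rng = random.Random(0)
for _ in range(200):
    propagate(weights, skills, "a", "repair", rng)
print(weights["a"])  # the a->b link grows; a->c stays at 1.0
```

Only the useful link is ever reinforced here, so over repeated challenges the network routes more traffic to the agent that can handle them. This sketch omits penalties and the weakening of useless links, which the overview also mentions.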


    Heylighen, F. (2014). Return to Eden? Promises and Perils on the Road to a Global Superintelligence. In B. Goertzel and T. Goertzel (Eds.), The End of the Beginning: Life, Society and Economy on the Brink of the Singularity.
    Heylighen, F. (2013). Self-organization in Communicating Groups: the emergence of coordination, shared references and collective intelligence. In Complexity Perspectives on Language, Communication and Society (pp. 117-149). Springer Berlin Heidelberg.

Mapping the Brain Connectome

Montreal Neurological Institute 
McGill University, Biomedical Engineering

OVERVIEW: The study of macroscopic neural connectivity using neuroimaging has exploded in recent years, with applications in many areas of clinical and basic neuroscience.  These approaches yield metrics of information flow across a network that are not accessible with focal metrics such as functional activation, metabolism or anatomical morphometry. However, there remain fundamental issues, both technical and conceptual, in reducing connectivity information from different imaging techniques into a holistic model of neural connectivity.  We will discuss different forms of connectivity, as defined by structural and functional correlation (MRI, fMRI, PET) and DTI tractography, with illustrations in normal and disordered brain.
    He, Y., & Evans, A. (2010). Graph theoretical modeling of brain connectivity. Current Opinion in Neurology, 23(4), 341-350.
    Bullmore, E. T., & Bassett, D. S. (2011). Brain graphs: graphical models of the human brain connectome. Annual Review of Clinical Psychology, 7, 113-140.
    Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: a structural description of the human brain. PLoS Computational Biology, 1(4), e42.

Web Impact on Society 

University of Southampton, Web Science

OVERVIEW: The Web is not just an engineered technical artefact because the Web architecture (HTTP, HTML and URIs) is only the kernel of an enormously complex social-technical machine. Phenomena like online banking, Web TV, internet shopping, e-government and social networking are the names that we give to human activities and human agendas that have co-opted the capabilities of this web architecture. While we may look to the Web to offer a source of "big data" for "social analytics", one of the goals of Web Science is to try to find a perspective that helps us to understand the bigger "socio-technical" picture of the Web, and hence to better interpret the data that we harvest from the Web. By looking at specific examples of how the Web has grown and developed (such as open access, open government data), we can start to see some of the principles and mechanisms of the socio-technical Web.
    Tinati, R., Carr, L., Halford, S., Pope, C. (2013) The HTP Model: Understanding the Development of Social Machines, WWW2013 Workshop: The Theory and Practice of Social Machines
    Tinati, R., Carr, L., Halford, S., Pope, C. (2014) (Re)Integrating the Web: Beyond ‘Socio-Technical’, WWW2014

Open Science and the Web

Microsoft Research Connections


OVERVIEW: Turing Award winner Jim Gray envisioned a world where all research literature and all research data were online and interoperable. He believed that such a distributed, global digital library could significantly increase the research "information velocity" and improve the scientific productivity of researchers. The last decade has seen significant progress in the move towards open access to scholarly research publications and the removal of barriers to access and re-use. But barrier-free access to the literature alone only scratches the surface of what the revolution of data-intensive science promises. Recently, in the US, the White House has called for federal agencies to make all research outputs (publications and data) openly available. But in order to make this effort effective, researchers need better tools to capture and curate their data, and Jim Gray called for 'letting 100 flowers bloom' when it came to research data tools. Universities have the opportunity and obligation to cultivate the next generation of professional data scientists who can help define, build, manage, and preserve the necessary data infrastructure. This talk will cover some of the recent progress made in open access and open data, and will discuss some of the opportunities ahead.

    Fox, G., Hey, T., & Trefethen, A. (2013). Where Does All the Data Come From? Data-Intensive Science, 115.
    Hey, T. (2010). The next scientific revolution. Harv Bus Rev, 88(11), 56-63.
    The Fourth Paradigm: Data-Intensive Scientific Discovery. Book, 2009.

Scholarly Big Data: Information Extraction and Data Mining

Pennsylvania State University
Information Sciences and Technology

Overview: Collections of scholarly documents are usually not thought of as big data. However, large collections of scholarly documents often have many millions of publications, authors, citations, equations, figures, etc., and large scale related data and structures such as social networks, slides, data sets, etc. We discuss scholarly big data challenges, insights, methodologies and applications. We illustrate scholarly big data issues with examples of specialized search engines and recommendation systems based on the SeerSuite software. Using information extraction and data mining, we illustrate applications in such diverse areas as computer science, chemistry, archaeology, acknowledgements, citation recommendation, collaboration recommendation, and others.

    Khabsa, M & Giles, C.L. (2014) The Number of Scholarly Documents on the Web. PLOS ONE 10.1371/journal.pone.0093949
    Caragea, C., Wu, J., Ciobanu, A., Williams, K., Fernández-Ramírez, J., Chen, H. H., ... & Giles, L. (2014). CiteSeerX: A Scholarly Big Dataset. In Advances in Information Retrieval (pp. 311-322). Springer International Publishing.
    Flake, G. W., Lawrence, S., Giles, C. L., & Coetzee, F. M. (2002). Self-organization and identification of web communities. Computer, 35(3), 66-70.

New Models of Scholarly Communication for Digital Scholarship

University of Pittsburgh, School of Information Science

OVERVIEW: Contemporary research and scholarship increasingly use large-scale datasets and computationally intensive processing. Cultural shifts in the scholarly community challenge long-standing practices of academic institutions and call into question the efficacy and fairness of traditional models of scholarly communication. Scholars are also calling for greater authority in the publication of their works and rights management. Agreement is growing on how best to manage and share massive amounts of diverse and complex information objects. Open standards and technologies allow interoperability across institutional repositories. Content-level interoperability based on semantic web and linked open data standards is becoming more common. Information research objects are increasingly thought of as social as well as data objects, promoting knowledge creation and sharing and possessing qualities that promote new forms of scholarly arrangements and collaboration. This talk will present alternative paths for expanding the scope and reach of digital scholarship and robust models of scholarly communication necessary for full reporting. The overall goals are to increase research productivity and impact, and to give scholars a new type of intellectual freedom of expression.

    Griffin, S. (2013) Scholarly Communication: New Models for Digital Scholarship Workflows Coalition for Networked Information, Spring 2013 Meeting
    Griffin, S. et al (2014) The Denton Declaration: An Open Data Manifesto 
    Borgman, C.L. (2013) Digital Scholarship and Digital Libraries: Past, Present, and Future Theory and Practice of Digital Libraries Conference, September 2013

    Calhoun, K (2014) Exploring Digital Libraries: Foundations, practice, prospects Facet Publishing London, UK

Transformations in Scholarly Communication in the Digital World

Université de Montréal
École de bibliothéconomie et des sciences de l'information

Overview: Digital technologies, which are easy to update, reuse, access and transmit and which require little space, have changed how researchers produce and disseminate scientific knowledge. Based on quantitative studies in the sociology of science, this talk will discuss these transformations, highlighting three aspects: the increase of scientific collaboration, the diversification of publication venues, and the use of social media.

    Wallace, M. L., Lariviere, V., & Gingras, Y. (2012). A small world of citations? The influence of collaboration networks on citation practices. PloS one, 7(3), e33339.
    Lariviere, V., Gingras, Y., & Archambault, E. (2006). Canadian collaboration networks: A comparative analysis of the natural sciences, social sciences and the humanities. Scientometrics, 68(3), 519-533.
    Bollen, J., Van de Sompel, H., Hagberg, A., & Chute, R. (2009). A principal component analysis of 39 scientific impact measures. PloS one, 4(6), e6022.

Web Impact Metrics for Research Assessment

University of Wolverhampton, Statistical Cybermetrics

Overview: Web metrics are being increasingly explored in the assessment of research impact. Hyperlinks, web citations, and URL citations can today be systematically compared with conventional measures (e.g., Web of Science citation counts). Formal citations are also being extracted from web databases and digital libraries by CiteSeer and Google Scholar, and from the huge digitized database of Google Books. These may prove informative as alternative and supplementary citation impact metrics, especially in the social sciences, arts and humanities, where traditional citation indexes are not available or have insufficient coverage. New web impact metrics come from citations in online syllabi and course reading lists, which reflect the educational impact of research, and from download counts of academic publications, which reflect reading and usage. Social impact metrics or Altmetrics (including social bookmarks, tweets, online reading of scientific publications, and viewings of online academic videos) are also emerging. Web impact metrics need to be used cautiously in research evaluation, however, because they still suffer from a generic lack of quality control compared with traditional citation metrics.
    Kousha, K. & Thelwall, M. (2014). Web Impact Metrics for Research Assessment. In: B. Cronin & C.R. Sugimoto, (Eds), Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact, MIT Press.
    Thelwall, M., Vaughan, L., & Bjorneborn, L. (2005). Webometrics. ARIST, 39(1), 81-135.
    Kousha, K., & Thelwall, M. (2007). Google Scholar citations and Google Web/URL citations: A multidiscipline exploratory analysis. Journal of the American Society for Information Science and Technology, 58(7), 1055-1065.

Humanexus: Envisioning Communication and Collaboration

Indiana University
Department of Information and Library Science


Overview: This presentation opens with a screening of Humanexus, an award-winning semi-documentary that visualizes human communication from the Stone Age to today and beyond. The film aims to make tangible the enormous changes in the quantity and quality of our collective knowledge and the impact of different media and distribution systems on knowledge exchange. It is followed by a presentation and discussion of recent collaborative work on scholarly communication and collaboration. Last but not least, everyone will be invited to explore the Information Visualization MOOC (for free or for IU credits) and to visit the Places & Spaces: Mapping Science exhibit on display at the summer school.

    Stipelman, Brooke A., Hall, Kara L., Zoss, Angela, Okamoto, Janet, Stokols, Dan, and Börner, Katy (submitted) Mapping the Impact of Transdisciplinary Research: A Visual Comparison of Investigator Initiated and Team Based Tobacco Use Research Publications. The Journal of Translational Medicine and Epidemiology.
    Bollen, Johan, David Crandall, Damion Junk, Ying Ding, and Katy Börner. 2014. From funding agencies to scientific agency: Collective allocation of science funding as an alternative to peer review. EMBO Reports 15 (1): 1-121.
    Mazloumian, Amin, Dirk Helbing, Sergi Lozano, Robert Light, and Katy Börner. 2013. Global Multi-Level Analysis of the 'Scientific Food Web'. Scientific Reports 3, 1167.
    Börner, Katy, Noshir S. Contractor, Holly J. Falk-Krzesinski, Stephen M. Fiore, Kara L. Hall, Joann Keyton, Bonnie Spring, Daniel Stokols, William Trochim, and Brian Uzzi. 2010. A Multi-Level Systems Perspective for the Science of Team Science. Science Translational Medicine 2 (49): 49(cm)24.

Relevant books:
    Börner, Katy, and David E. Polley. 2014. Visual Insights: A Practical Guide to Making Sense of Data. Cambridge, MA: The MIT Press.
    Scharnhorst, Andrea, Katy Börner, and Peter van den Besselaar, eds. 2012. Models of Science Dynamics: Encounters Between Complexity Theory and Information Science. Springer Verlag.
    Börner, Katy, Mike Conlon, Jon Corson-Rikert, and Ying Ding, eds. 2012. VIVO: A Semantic Approach to Scholarly Networking and Discovery. Morgan & Claypool Publishers LLC.
    Börner, Katy. 2010. Atlas of Science: Visualizing What We Know. The MIT Press.
        Information Visualization MOOC
        Places & Spaces: Mapping Science exhibit 

Visualizing Dynamic Interactions

Institut National de Recherche en Informatique et en Automatique (INRIA) Saclay - Île-de-France

OVERVIEW: Graphs are powerful mathematical structures for modeling and representing many natural phenomena. In trying to explore and make sense of graphs collected in the wild — such as social interactions stored by social network sites or correlations between brain signals obtained using fMRI — visualization is often used. However, traditional visualization techniques are limited to sparse graphs: dense graphs are unreadable. Much progress has been made recently using matrix-based and hybrid visualizations to explore large and dense networks. Although understanding the visualization of the adjacency matrix of a graph is not as immediate as the traditional node-link representation, it does not suffer from most of its drawbacks and only takes a few minutes to grasp, a very reasonable time considering its expressive power. I’ll show how this relatively novel representation can be used to visualize many types of graphs, even dynamic graphs, with no limitation on density and good scalability. I'll show some results on social networks and brain signals.
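
To make the matrix representation concrete, here is a minimal sketch that turns an edge list into an adjacency matrix and prints it as a character grid. It is only an illustration of the idea, not the speaker's tools: the function names `adjacency_matrix` and `render` and the sample graph are assumptions, and the graph is treated as undirected.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # mirror the entry: edges have no direction
    return matrix

def render(nodes, matrix):
    """Draw the matrix as text: '#' marks an edge, '.' marks no edge."""
    width = max(len(n) for n in nodes)
    header = " " * (width + 1) + " ".join(n.ljust(width) for n in nodes)
    rows = [header]
    for name, row in zip(nodes, matrix):
        cells = " ".join(("#" if x else ".").ljust(width) for x in row)
        rows.append(name.ljust(width) + " " + cells)
    return "\n".join(rows)

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
print(render(nodes, adjacency_matrix(nodes, edges)))
```

Every possible edge has its own cell, so adding edges never causes the crossings and overlaps that make dense node-link drawings hard to read. The cost is that paths are harder to follow by eye.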

    Wybrow, M., Elmqvist, N., Fekete, J. D., von Landesberger, T., van Wijk, J. J., & Zimmer, B. (2014). Interaction in the Visualization of Multivariate Networks. In Multivariate Network Visualization (pp. 97-125). Springer International Publishing. 
    Bach, B., Pietriga, E., & Fekete, J. D. (2014, April). Visualizing Dynamic Networks with Matrix Cubes. In SIGCHI Conference on Human Factors in Computing Systems (CHI).

Visual Tools for Interacting with Large Networks

McGill University
School of Information Studies


OVERVIEW: Useful real-world networks tend to be large and complex, which makes them difficult for humans to browse and navigate. Visual interfaces can mitigate this problem, but these tools inevitably suffer from scalability issues, which have led to the development of various clutter reduction techniques such as sampling and filtering. We present and discuss ongoing work concerning visual tools for information exploration and retrieval using large semantic ontology networks (e.g., Library of Congress Subject Headings, Medical Subject Headings, personal information folder structures), which aim to help searchers describe and recognize the information they seek, and discover previously unknown and valuable topics.

    Ellis, G., & Dix, A. (2007). A taxonomy of clutter reduction for information visualisation. IEEE Transactions on Visualization and Computer Graphics, 13, 1216-1223. 
    Gruber, T. (2008). Ontology. In Liu, Ling; Özsu, M. Tamer. Encyclopedia of Database Systems. Springer-Verlag.
    Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., & Giannopoulou, E. (2007). Ontology visualization methods - a survey. ACM Computing Surveys, 39(4, article 10), 1-43.
    von Landesberger, T., Kuijper, A., Schreck, T., Kohlhammer, J., van Wijk, J. J., Fekete, J. D., & Fellner, D. W. (2011). Visual Analysis of Large Graphs: State-of-the-Art and Future Research Challenges. Computer Graphics Forum, 30(6), 1719-1749.

Collaborative Innovation Networks

Center for Collective Intelligence

OVERVIEW: No disruptive innovation is the result of a lone inventor; each is the work of a small group of like-minded individuals, working together in close collaboration to get their cool idea off the ground. They leverage the concept of swarm creativity, where this small team - the Collaborative Innovation Network (COIN) - empowered by the collaborative technologies of the Internet and social media, turns its creative labor of love into a product that changes the way we think, work, or spend our day.
This talk describes a series of ongoing projects at the MIT Center for Collective Intelligence with the goal of analyzing the new idea creation process through tracking human interaction patterns on three levels:
On the global level, macro- and microeconomic indicators such as the valuation of companies and consumer indices, or election outcomes, are predicted based on social media analysis on Twitter, blogs, and Wikipedia. On the organizational level, productivity and creativity of companies and teams are measured through extracting 'honest signals' from communication archives such as company e-mail. On the individual level, individual and team creativity is analyzed through face-to-face interaction with sociometric badges and personal e-mail logs.
The talk introduces the concepts of coolhunting, finding new trends by finding the trendsetters, and coolfarming, helping the trendsetters get their idea over the tipping point. The talk also presents the concept of 'Virtual Mirroring', increasing individual and team creativity by analyzing and optimizing five inter-personal interaction variables of honest communication: 'strong leadership', 'rotating leaders', 'balanced contribution', 'fast response', and 'honest sentiment.'

    Gloor, P. A., Krauss, J., Nann, S., Fischbach, K., & Schoder, D. (2009, August). Web science 2.0: Identifying trends through semantic social network analysis. In Computational Science and Engineering, 2009. CSE'09. International Conference on (Vol. 4, pp. 215-222). IEEE.
    Kleeb, R., Gloor, P. A., Nemoto, K., & Henninger, M. (2012). Wikimaps: dynamic maps of knowledge. International Journal of Organisational Design and Engineering, 2(2), 204-224.
    Gloor, P. (2010) Coolfarming - Turn Your Great Idea Into The Next Big Thing AMACOM, NY
    Gloor, P.  (2006) Swarm Creativity, Competitive Advantage Through Collaborative Innovation Networks. Oxford 

Network Ready Research: 
The Role of Open Source and Open Thinking

PLOS (Public Library of Science)

OVERVIEW: The highest principle of network architecture design is interoperability. Metcalfe's Law says a network's value can scale as some exponent of the number of connections. Our job in building networks is to ensure that those connections are as numerous, operational, and easy to create as possible. Informatics is a science of networks: of physical interactions, genetic control, degree of similarity, or ecological interactions, amongst many others. Informatics is also amongst the most networked of research communities and amongst the most open in the sharing of research papers, research data, tools, and even research in process in online conversations and writing. Lifting our gaze from the networks we work on to the networks we occupy is a challenge. Our human networks are messy and contingent, and our machine networks are clogged with things we can't use, even if we could access them. What principles can we apply to build our research to make the most of the network infrastructure we have around us? Where are the pitfalls and the opportunities? What will it take to configure our work so as to enable "network ready research"?

    Molloy, J. C. (2011). The open knowledge foundation: open data means better science. PLoS Biology, 9(12), e1001195.
    Whyte, A., & Pryor, G. (2011). Open science in practice: Researcher perspectives and participation. International Journal of Digital Curation, 6(1), 199-213.

Learning Along with Others

Psychological and Brain Sciences
Indiana University


Overview: We have developed internet-enabled experimental platforms to explore group patterns that emerge when people can see and imitate the solutions, innovations, and choices of their peers over several rounds. Experiments and simulations show that there is a systematic relation between the difficulty of a problem search space and the optimal social network for transmitting solutions. With more complex search spaces, people imitate prevalent options, options that become increasingly prevalent, high-scoring options, and solutions similar to their own, and they do so particularly during the early stages of an extended search process. Historical records of baby names show that naming choices are influenced both by the frequency of a name and, increasingly, by its “momentum” in the recent past.

    Goldstone, R. L., Wisdom, T. N., Roberts, M. E., & Frey, S. (2013). Learning along with others. Psychology of Learning and Motivation, 58, 1-45. 
    Wisdom, T. N., Song, X., & Goldstone, R. L. (2013). Social Learning Strategies in Networked Groups. Cognitive Science, 37(8), 1383-1425.
    Theiner, G., Allen, C., & Goldstone, R. L. (2010). Recognizing group cognition. Cognitive Systems Research, 11(4), 378-395.
    Frey, S., & Goldstone, R. L. (2013). Cyclic game dynamics driven by iterated reasoning. PLoS One, 8(2).
    Roberts, M. E., & Goldstone, R. L. (2011). Adaptive Group Coordination and Role Differentiation. PLoS One, 6, 1-8.
    Gureckis, T. M., & Goldstone, R. L. (2009). How you named your child: Understanding the relationship between individual decision-making and collective outcomes. Topics in Cognitive Science, 1, 651-674.

Enculturated Cognition 

Macquarie University, Philosophy

 OVERVIEW: What is the relationship between culture and cognition? In this talk I show how we might think of the development of recent cognitive abilities - such as reading, writing and mathematics - as being the result of high fidelity social learning in richly structured socio-cultural niches. The influence of representational systems and new technologies on our cognitive abilities for complex mathematical, narrative and scientific thinking should not be underestimated.

    Menary, R. (2013). Cognitive integration, enculturated cognition and the socially extended mind. Cognitive Systems Research, 25, 26-34. 
    Menary, R. (Ed.). (2010). The extended mind. MIT Press.

Social and Semantic Web: Adding the Missing Links

INRIA Research Center of Sophia-Antipolis


OVERVIEW: Since the mid-90s the Web has re-opened in read-write mode and, almost as a side effect, paved the way to numerous new social media applications. Today, the Web is no longer perceived as a document system but as a virtual place where persons and software interact in mixed communities. These large-scale interactions create many problems -- in particular, reconciling the formal semantics of computer science (e.g. logics, ontologies, typing systems, etc.) on which the Web architecture is built, with the soft semantics of people (e.g. posts, tags, status, etc.) on which the Web content is built. Wimmics, among other research labs, studies methods, models and algorithms to bridge formal semantics and social semantics on the Web. We focus on the characterization of typed graph formalisms to model and capture these different pieces of knowledge, and on hybrid operators to process them jointly. This talk will describe the basics of semantic web formalisms and introduce different initiatives using these frameworks to represent, reason about, and support social media and social applications on the web.

    Nicolas Marie, Myriam Ribiere, Fabien Gandon, Florentin Rodio, Discovery Hub: on-the-fly linked data exploratory search,  Proc. of I-Semantics 2013, Graz, Austria
    Michel Buffa, Nicolas Delaforge, Guillaume Erétéo, Fabien Gandon, Alain Giboin, Freddy Limpens: ISICIL: Semantics and Social Networks for Business Intelligence. SOFSEM 2013: 67-85
    Nathalie Aussenac-Gilles, Fabien Gandon, From the knowledge acquisition bottleneck to the knowledge acquisition overflow: A brief French history of knowledge acquisition, International Journal of Human-Computer Studies, Volume 71, Issue 2, Pages 157-165, February 2013
    Guillaume Erétéo, Fabien Gandon, and Michel Buffa, SemTagP: Semantic Community Detection in Folksonomies, IEEE/WIC/ACM International Conference on Web Intelligence, August 2011, Lyon.
    Freddy Limpens, Fabien Gandon and Michel Buffa, Helping Online Communities to Semantically Enrich Folksonomies, Web Science Conference, April, 2010, Raleigh, NC, USA.
    Guillaume Erétéo, Michel Buffa, Fabien Gandon, and Olivier Corby. Analysis of a Real Online Social Network using Semantic Web Frameworks. In Proc. International Semantic Web Conference, ISWC'09, Washington, USA, October 2009

Mining Patterns from Linked Data

OVERVIEW: The Web of Data (WoD) can be seen as a global database made of multiple datasets. These datasets are published separately, by using new or reusing existing schemas on the Web, yet get interlinked through either direct references between data items or indirect ones, i.e., identity links between items representing the same entity. The technology underlying the WoD, called Linked Data (LD), allows for the construction of a global data graph in which data items are vertices related by edges of different nature. Entities, aka resources, as well as their links, aka properties, are globally identified through URLs. Besides this inherent graph structure, parts of the WoD can behave as a traditional, i.e., relational, database.
    After substantial efforts on the standards for publishing and querying LD on the Web, and lately on the interlinking and cleansing of sets of LD, the next big issue is properly extracting new knowledge from the WoD. The Data Mining (DM) discipline is about finding chunks of useful knowledge hidden in the data. DM methods are roughly divided into predictive ones, where past experience is analyzed in order to guess the outcome of an unfolding situation, and descriptive ones, whose aim is to provide insights into the regularities in the data without a specific goal. Mining LD is both useful and challenging for many reasons, not the least among them being the rich and complex graph structure induced by a large variety of link types, the availability of domain knowledge expressed as schemas and even full-blown ontologies, the heterogeneity in the modelling goals behind individual datasets, etc.
    In this talk we discuss the implications of LD for a specific branch of descriptive DM, called pattern mining. We present two different mining methods that are complementary in many respects. The first one targets usage regularities: it analyses the consumption of resources from the WoD by the users of a specific semantic application and summarizes it as behavioural patterns. The second one mines purely descriptive patterns from a dataset of multiple resource types; these patterns are expressed in a WoD-compliant language and therefore support ontology design.

    M Rouane-Hacene, M Huchard, A Napoli, P Valtchev, Relational concept analysis: mining concept lattices from multi-relational data. Annals of Mathematics and Artificial Intelligence 67 (1), 81-108, 2013
    MH Rouane, M Huchard, A Napoli, P Valtchev, A proposal for combining formal concept analysis and description logics for mining relational data. Formal Concept Analysis (vol. of LNCS), 51-65, Springer, 2007
    M Adda, P Valtchev, R Missaoui, C Djeraba, A framework for mining meaningful usage patterns within a semantically enhanced web portal. Proc. of the Third C* Conf. on Computer Science and Software, 138-147, ACM, 2010
    M Adda, P Valtchev, R Missaoui, C Djeraba, Toward recommendation based on ontology-powered web-usage mining. IEEE Internet Computing 11 (4), 45-52, 2007

Bursts, Cascades, and Time Allocation

Northwestern University
Dynamics of Complex Systems and Networks Group


OVERVIEW: In this talk, I will present recent results on three distinct but related problems concerning Web Science and the Mind: bursts in the temporal distribution of words, cascading dynamics in diverse network systems, and human allocation of time. In each case I will discuss key properties, the principles governing these properties, and opportunities their modeling offers for monitoring and controlling complex behavior.

    Cornelius, S. P., Kath, W. L., & Motter, A. E. (2013). Realistic control of network dynamics. Nature Communications, 4:1942.
    Altmann, E. G., Pierrehumbert, J. B., & Motter, A. E. (2009). Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS One, 4(11), e7678.
    Motter, A. E., & Albert, R. (2012). Networks in motion. Physics Today, 65(4), 43-48.

Controllability and Observability of Complex Systems

Northeastern University
Center for Complex Network Research
Physics Department


OVERVIEW: The ultimate proof of our understanding of complex systems is reflected in our ability to control them. Although control theory offers mathematical tools for steering engineered systems towards a desired state, a framework to control complex systems is lacking. In this talk I will show that many dynamic properties of complex systems can be studied quantitatively, via a combination of tools from control theory, network science and statistical physics. In particular, I will focus on two dual concepts, i.e. controllability and observability, of general complex systems. Controllability concerns our ability to drive the system from any initial state to any final state within finite time, while observability concerns the possibility of deducing the system's internal state from observing its input-output behavior. I will show that by exploring the underlying network structure of complex systems one can determine the driver (or sensor) nodes that with time-dependent inputs (or measurements) will enable us to fully control (or observe) the whole system.
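
The overview does not say how driver nodes are determined. The cited Liu, Slotine & Barabasi (2011) paper relates them to a maximum matching of the directed network: a node with no matched incoming link needs its own input signal, and a perfectly matched network still needs one input. The sketch below follows that idea using a simple augmenting-path matching. The function names and the integer node labelling are assumptions made for illustration, and the recursive search is only suitable for small graphs.

```python
def max_matching(n, edges):
    """Maximum matching on a directed graph with nodes 0..n-1.

    Each edge u->v may be matched if u has no other matched outgoing edge
    and v has no other matched incoming edge. Returns {v: u} for matched edges.
    """
    succ = {u: [] for u in range(n)}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # v -> u means edge u->v is in the matching

    def try_augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in matched_by or try_augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())
    return matched_by

def driver_nodes(n, edges):
    """Nodes without a matched incoming edge; at least one node is always needed."""
    matched = max_matching(n, edges)
    unmatched = [v for v in range(n) if v not in matched]
    return unmatched or [0]  # perfectly matched: any single node will do, pick node 0

print(driver_nodes(4, [(0, 1), (1, 2), (2, 3)]))       # directed chain
print(len(driver_nodes(4, [(0, 1), (0, 2), (0, 3)])))  # star: hub plus all but one leaf
```

A chain can be steered from its head alone, whereas a hub cannot independently steer all of its leaves, so the star needs more driver nodes. By the duality mentioned in the overview, the same procedure on the graph with all edges reversed gives a candidate set of sensor nodes.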

    Liu, Y. Y., Slotine, J. J., & Barabasi, A. L. (2011). Controllability of complex networks. Nature, 473(7346), 167-173.
    Zhao, C., Wang, W. X., Liu, Y. Y., & Slotine, J. J. (2014). Universal Symmetry in Complex Network Control. arXiv preprint arXiv:1403.0041.

You Can't Hide: Predicting Personal Traits in Social Media

University of Maryland
Computer Science


Overview: People share a huge amount of personal information online. With over a billion people on social media, this is opening up new abilities for researchers to predict a range of personal attributes that reveal how we live, think, and interact, even as people may try to keep this information private. This presentation will cover the methods and results in this area and argue for the future science and policy these advances demand.

    Golbeck, J. (2013). Analyzing the social web.
    Golbeck, J., Robles, C., Edmondson, M., & Turner, K. (2011, October). Predicting personality from Twitter. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom) (pp. 149-156). IEEE.
    Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802-5805.
    Golbeck, J., Robles, C., & Turner, K. (2011, May). Predicting personality with social media. In CHI'11 Extended Abstracts on Human Factors in Computing Systems (pp. 253-262). ACM.