Visual Analytics for Discovering Network Structure Beyond Communities
OVERVIEW: To understand the formation, evolution, and function of complex systems, it is crucial to understand the internal organization of their interaction networks. Partly due to the impossibility of visualizing large complex networks, resolving network structure remains a challenging problem. In this talk, I will describe an approach that overcomes this difficulty by combining the visual pattern recognition ability of humans with the high processing speed of computers to develop an exploratory method for discovering groups of nodes characterized by common network properties, including but not limited to communities of densely connected nodes. Without any prior information about the nature of the groups, the method simultaneously identifies the number of groups, the group assignment, and the properties that define these groups. The results of applying our method to real networks suggest that most group structures lurk undiscovered in the fast-growing inventory of social, biological, and technological networks of scientific interest.
READINGS:
Nishikawa, T., & Motter, A. E. (2011). Discovering network structure beyond communities. Scientific reports, 1, 151Keim, D., Andrienko, G., Fekete, J. D., Görg, C., Kohlhammer, J., & Melançon, G. (2008). Visual analytics: Definition, process, and challenges (pp. 154-175). Springer Berlin Heidelberg.
Federico, P., Aigner, W., Miksch, S., Windhager, F., & Zenk, L. (2011. A visual analytics approach to dynamic social networks. Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies. ACM.
Li, K., Guo, L., Faraco, C., Zhu, D., Chen, H., Yuan, Y., ... & Liu, T. (2012). Visual analytics of brain networks. NeuroImage, 61(1), 82-97.
In the described method, are the results of human input always consistent. If there are inconsistencies how are they resolved by the algorithm?
ReplyDeleteAlthough I expect human inputs to be relatively consistent when the structural groups are clearly defined, it is possible to have variation across different users. It would be interesting to systematically study such variations.
DeleteJe ne crois pas avoir compris quelles sont les attributs utilisées pour calculer la distance de Hamming pour le clustering hiérarchique. Comment on définit la distance entre deux noeuds quelconque?
ReplyDeleteTranslation : I don't think I understood which attributes were used to calculate the Hamming Distance for the Hierarchical Clustering. How do you define the distance between two random nodes?
DeleteThe hamming distance between a given pair of nodes is calculated between the vector of group assignment of the two nodes, which summarizes the feedback provided by the user from the set of projections. The hamming distance in this case can be defined as the number of times the user has put the two nodes in different groups, divided by the total number of projections the user has examined.
DeleteIs there any benchmarking / comparison of the presented method with other advanced community detection methods, e.g. multidimensional scaling (http://en.wikipedia.org/wiki/Multidimensional_scaling)?
ReplyDeleteMultidimensional scaling is a method of visualizing a high-dimensional dataset of objects with prescribed labels by finding the low-dimensional projection that make these prescribed groups as visually separable as possible. The method I described is for solving a clustering problem, whose goal is to discover groups that are not known a priori, given a dataset of unlabeled objects.
DeleteThank you for the talk. My question for Professor Nishikawa concerns the relation between the structure and the dynamics of complex networks. How do structural analysis techniques such as those presented in this talk relate to the dynamics that might unfold in those networks? Is it the case that structural descriptions constrain possible dynamics?
ReplyDeleteYes, that's precisely the idea. An example is coherent oscillation of the instantaneous frequencies of node voltages in power grid, which I think is generally related to communities structure and/or symmetry of the network. There is still a lot of interesting work that remains to be done to make clear how the structure exactly constrains the network dynamics.
DeleteThank you for the reply!
DeleteYou're welcome!
DeleteA followup question: in cases where there are multiple possibilities of division, what are the criteria of chosing one upon the others.?
ReplyDeleteGood point -- It's true that human mind can oscillate between more than one possible pattern of division. The idea for our method is to appeal to the ability of the user to intuitively recognize the clustering pattern. I would thus suggest the user to choose the first one that came to his or her mind. To what extent this affects the consistency of the results is an interesting open question.
DeleteIn your talk you presupposed that number and space representations are connected one to another. Is there some study that shows that this is correct? Have we discovered such thing in human or animal’s mind?
ReplyDeleteThe "number" representation of a network in the node property space does not need to be related to any physical "space" representation of the network. The distance between two nodes in the node property space represents an abstract "distance" that measures the structural difference between these nodes. Does this answer your question?
DeleteIn the social network of karate clubs graph you showed, I'm not sure I understand how each node was classified as well-connected, peripheral, or part of a community. Some peripheral notes had up to 3 links, but there were "well-connected" nodes with as few as 2 links. There were nodes in the bottom right of the graph that looked similar to the "community" nodes in the top. Were there certain subtle rules for classification that I missed?
ReplyDeleteGood point -- there is indeed a node with only 2 links in the "well-connected" group and a node with 4 links in the "periphery" group. Note that the terms I chose to describe these groups provide descriptions of a *typical* node in the group. The description is also approximate in the sense that the separation between the groups is best expressed by a combination of node properties that includes more than what I used to describe in my talk (although others make much smaller contribution to the separation). Depending on the goal of the user, these nodes you pointed out may be considered a type of classification error, and could be removed manually by the user.
DeleteDear Takashi, Thanks for your beautiful presentation ! I have three questions: (1) Which are the visual analytics methods that are performant, easy to apply and easy to combine with ontologies? What do you think about fuzzy hierarchical clustering? (2) Did you have experience of Conditional Random Fields to model and visualize sequences of interactions, is it so a state of the art excellent model ? (3) What’s the difference between hierarchical clustering and decision trees?
ReplyDeleteThank you for your interest! (1) Visual analytics, as I understand, is more of a philosophy/strategy of data analysis and decision making, in which analytical methods are integrated with visual interaction with a human user, than a set of specific methods. I don't claim to be an expert in visual analytics, but my impression is that people develop specific platforms for individual classes of problems they are trying to solve (which could be those related to ontology). I am not too familiar with fuzzy versions of hierarchical clustering. (2) No, I don't have any experience with Conditional Random Fields. (3) A decision tree is a way to visualize relationships between events, decision, and consequences as a tree, and I don't think it is related to clustering problems. Hierarchical clustering solves a clustering problem and produces a hierarchy of nested groups of objects, which can be visualized as a tree (which in that context is called a dendrogram).
DeleteProfessor Harnad's question about subitizing and numerosity:
ReplyDeleteSubitizing but not estimation of numerosity requires attentional resources, David C. Burr.
Thank you very much for the reference!
DeleteI like the fact that the visual analytics are directed to improve human analytical reasoning and this is thus this interaction what leads to the identification of interconnected nodes as groups. But, isn't it possible that individual human performance variability affects the analysis? Has the interobservator variability/agreement on this method been measured?
ReplyDeleteThis is indeed an excellent point that needs to be addressed in future work, as I mentioned in response to a question after the talk. I do believe, though, that human inputs will be fairly consistent across different users if there are well-defined structural groups in the network.
DeleteSubitizing and Numerosity
ReplyDeleteThank you, Stevan, for this interesting point and for giving me an opportunity to interact with the great audience, both online and offline!
DeleteSo, I can use your software in order to create clusters groups? Besides Matlab, do you know another software to create and analyze clusters groups a easy way? Thank you TAKASHI NISHIKAWA.
ReplyDeleteYes, the software allows you to apply the method I described in the talk to a network dataset and identify structural groups. Unfortunately, we currently have only Matlab versions of the software. The one available online is a demo-version that allows you to play with a pre-selected set of network datasets, but not to import new datasets. The full (Matlab) version with the functionality to import arbitrary network datasets is ready to be shared with anyone interested -- just let me know.
DeleteThe method seems robust and my question is about time processing, if you use a multiple projection, this method will take more time to infer (depend on the number of projection). Can we find link between number of projection, network scale and satisfied grouping result in order to reduce time processing ?
ReplyDeleteYou're right, there is a fundamental trade-off here: the more projections you use, the better the chance of capturing structural groups, but the more time it takes to complete the process. The question you asked about the relation between these relevant parameters points to an excellent directions for future research.
DeleteHow does your visual analytics method hold against other similar visual analytics methods? We (the LACTAO unit of the LANCI: www.lanci.uqam.ca) have been doing some naively on our side to extract concepts from texts, but we never had time to survey the litterature on it.
ReplyDeleteIt is hard to compare with other visual analytics methods because, as I mentioned in the talk, the method we developed was inspired by the general approach of visual analytics, but in an atypical way. We developed our method specifically to solve a clustering problem for network, but it is applicable to general clustering problems. The problem of extracting concepts from texts, however, sounds like a harder problem that doesn't quite fit into the class of clustering problems.
DeleteHave you applied these techniques to sets of data which we have already divided into groups? If so, do the results show groupings similar to what we have already done, or do they present alternative groups that are equally or more simple to analyze? For example, we could group facebook users and see if they are divided into the online groups they are members of.
ReplyDeleteOops, I think I posted as a separate comment, not as a reply to your comment, so I'm reposting it as a reply:
DeleteSorry, I apparently overlooked your comment earlier. The networks of karate club and American college football teams, which I presented during the talk, are such examples. The technique led to interpretable groupings of nodes that are different from the known groups. We have not pursued the facebook networks, but they would be an interesting example.
Sorry, I apparently overlooked your comment earlier. The networks of karate club and American college football teams, which I presented during the talk, are such examples. The technique led to interpretable groupings of nodes that are different from the known groups. We have not pursued the facebook networks, but they would be an interesting example.
ReplyDeleteA method that humans use to infer structure in the world is by making analogies with existing concepts or categories. Perhaps we can use metrics based on similarity between networks to facilitate their recognition and description of them, especially between networks in different domains such as biological, social, and technological.
ReplyDeleteI also wonder if multi-dimensional data could be perceived better by using multiple modalities, like audio to represent temporal dimensions.
I think your suggestion of using (automated) similarity metrics to facilitate (human) recognition hits exactly the main point of visual analytics. Such combination could produce results that neither of them can produce alone. Using audio sounds like a potentially interesting idea, although I don't yet see how exactly to use it to represent temporal dimensions. On the other hand, it seems easier to me to think of a way to use animation (temporal dimension) to represent some other dimension.
DeleteThis comment has been removed by the author.
ReplyDelete