The Promises and Pitfalls of Latent Attribute Inference
OVERVIEW: The composition of a group determines much of its behavior (are people old or young, PhDs or illiterate, artists or scientists?). As a result, organizations, governments, and companies are deeply interested in being able to quickly learn the makeup of groups. In order to approach this problem, we've been developing technologies for inferring the demographics of Twitter populations from the textual content and networks that the users in them produce. Our methods stand out as the most accurate in the literature. In this talk, I'm going to give an overview of the latent attribute inference problem, discuss the advances that we've made in solving it, and highlight some of the big issues that still need to be tackled.
READINGS:
Ruths, D. A., Nakhleh, L., Iyengar, M. S., Reddy, S. A., & Ram, P. T. (2006). Hypothesis generation in signaling networks. Journal of Computational Biology, 13(9), 1546-1557.
Ruths, J., & Ruths, D. (2014). Control profiles of complex networks. Science, 343, 1373-1376.
When you want to extract information such as the toys people want for Christmas, it would be interesting to use NLP pattern extraction with machine learning. Starting from training data that comes from Twitter, you try to infer the patterns in that data and use them to find new data.
Translation: When you want to extract information from Twitter (for example, what people want for Christmas), it would be interesting to use NLP pattern extraction with machine learning. You build a training set of manually extracted tweets in which people say what they want for Christmas. From this training set, you can infer patterns and use those patterns to find the gifts people want for Christmas on Twitter.
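The pattern-extraction idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method presented in the talk: the function names `learn_patterns` and `extract_gifts`, the three-word context window, and the example tweets are all hypothetical, and simple prefix counting stands in for a real machine-learning model.

```python
import re
from collections import Counter


def learn_patterns(labeled, min_count=2, window=3):
    """Learn context patterns from (tweet, gift) pairs labeled by hand.

    A pattern is the last `window` words that precede the labeled gift.
    Only patterns seen at least `min_count` times are kept.
    """
    counts = Counter()
    for tweet, gift in labeled:
        text = tweet.lower()
        idx = text.find(gift.lower())
        if idx <= 0:  # gift not found, or nothing precedes it
            continue
        words = text[:idx].split()
        if words:
            counts[" ".join(words[-window:])] += 1
    return [p for p, c in counts.items() if c >= min_count]


def extract_gifts(tweets, patterns):
    """Apply the learned patterns to unlabeled tweets and return the gifts found."""
    found = []
    for tweet in tweets:
        text = tweet.lower()
        for pattern in patterns:
            # Capture the words after the pattern, up to "for christmas",
            # a punctuation mark, or the end of the tweet.
            regex = re.escape(pattern) + r"\s+([\w ]+?)(?:\s+for christmas|[.!?,]|$)"
            match = re.search(regex, text)
            if match:
                found.append(match.group(1).strip())
                break
    return found


# Hypothetical training set, labeled manually.
labeled = [
    ("All I want is a new bike for Christmas", "new bike"),
    ("Honestly all I want is a puppy for Christmas!", "puppy"),
    ("Dear Santa, please bring me a telescope", "telescope"),
    ("Santa please bring me a drone this year", "drone"),
]
# Hypothetical unlabeled tweets.
new_tweets = [
    "What I really want is a skateboard for Christmas",
    "Santa, bring me a red scooter!",
    "Merry Christmas everyone",
]

patterns = learn_patterns(labeled)
print(patterns)
print(extract_gifts(new_tweets, patterns))
```

A real system would need far more training data and a more robust notion of pattern, but the two steps are the ones the comment describes: infer patterns from labeled tweets, then reuse them on new tweets.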
It’s scary to know how much information you can get from social media, even when people hide a lot of information about themselves. The people you are friends with reveal so much about you. But you have to presuppose that people on Twitter follow users who resemble them, which is not always the case. Someone might follow a political adversary, for example. Do you take this possibility into account in your study?
Dear Derek, thank you very much for your very fruitful presentation! I have one question: how do you think your methods could be applied to scholarly big data to improve the accessibility of scientific content, by predicting the difficulties non-experts have in understanding it and by adding or building new cognitive or knowledge resources?
Very interesting talk. My question for Professor Ruths concerns what to do with the information obtained through latent attribute inference once we have it. In what sectors do you see this technology being used most in the future (business, government, etc.)? What kinds of social or political measures, or platforms, could be used to coordinate, say, supply and demand once this information has been collected?
If there is a demand for Twitter census information, the information itself will change, perhaps much as Google's PageRank algorithm pushed companies to focus on search engine optimization. The automobile lobby might create bots to follow the traffic news blogs, skew the true cyclist-to-driver ratio, and convince policy makers that we need to remove bike lanes to make room for trucks.
How a demand for information on gaming-system posts could change future blogging, I'm not sure.
People are more inclined to use social tools to express themselves and their opinions. I think the idea of using Twitter as a survey tool is great. Maybe we could introduce this use in clinical research in medicine, because it seems to me that it is easier for patients to respond with a tweet than to fill out a long survey.
About prediction, I think it’s really a new and very promising way to tap into people’s needs or wishes, because people are less stressed on social media, but there is a lot to do to avoid redundant tweets and to clean up this mass of information.
Thank you, Derek Ruths. It is a good beginning, and I congratulate you. Now, to predict human behavior and to predict what people want to say, we have to draw inferences from many layers, pass through some filters, and analyze from different angles: culture, sex, sexual orientation, ..., manner of expression, because everyone has their own way of expressing themselves. I would like to discuss this with you. How can I contact you?
I find it more than a bit concerning that people are being labelled in general, whatever the reason for doing so. The issue is not the accuracy of the labeling but the very idea that human individuals are "quantified" by a vector of a few numbers. Social systems are largely reflexive, which means that the very process of labeling is going to affect culture and the way people describe themselves and their fellow humans. It is true that humans have formed simplified representations of their fellow humans since the beginning of civilization, but the extent to which this is happening today and the extreme abstraction used are unprecedented. It is important to note that this concern is not about privacy - that anybody can derive my sexual preferences, what coffee I drink, and my level of income. It is about the very image of the human as it is drawn by such technologies.
Thanks for the talk!
You advocated for building your own algorithms for data mining. I know that some big companies use software like SAS for data mining/social analytics. What is your view on these kinds of software packages? I used SAS a bit at my last job, and my impression was that it operates too much like a black box, where I couldn't fully customize constraints, and this makes it difficult to get meaningful results.
1) I'm curious how much more closely the Twitter census data would match the formal census if we accounted for differences in age, socioeconomic status, and other variables.
2) Another interesting point Derek raised afterwards was the mismatch between Facebook and census data on language. While only 10% of Montrealers claim English as their first language on the census, 40% of the Facebook communication Montrealers do is in English. This may have implications for language laws.
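Point 1 above can be made concrete with a small post-stratification sketch: compute a rate within each demographic group on Twitter, then reweight those rates by the census composition instead of the Twitter composition. Everything here is an assumption for illustration: the function name `poststratified_estimate`, the age groups, and all of the numbers are invented, and nothing in the talk confirms that this adjustment was used.

```python
def poststratified_estimate(sample_share, population_share, group_rate):
    """Return (raw, adjusted) estimates of an overall rate.

    sample_share:     fraction of the Twitter sample in each group
    population_share: fraction of the census population in each group
    group_rate:       rate of the behavior observed within each group
    """
    raw = sum(sample_share[g] * group_rate[g] for g in group_rate)
    adjusted = sum(population_share[g] * group_rate[g] for g in group_rate)
    return raw, adjusted


# Hypothetical numbers: the Twitter sample skews young.
twitter_share = {"18-29": 0.6, "30-49": 0.3, "50+": 0.1}
census_share = {"18-29": 0.2, "30-49": 0.4, "50+": 0.4}
# Hypothetical share of each group that tweets about biking to work.
bikes_to_work = {"18-29": 0.3, "30-49": 0.2, "50+": 0.1}

raw, adjusted = poststratified_estimate(twitter_share, census_share, bikes_to_work)
print(round(raw, 2), round(adjusted, 2))
```

With these invented numbers the estimate drops from about 0.25 to about 0.18 once the age mix is corrected. The adjustment only helps to the extent that the people in each group who use Twitter behave like the people in that group who do not.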
Preface: I really, really enjoyed your talk and was very impressed with the research you do. I ask my question because I am interested in making a difference-- I do not intend any hostility toward your work.
You mentioned many times that today's biggest problems have to do with resource allocation. But if we look at the demographics of Twitter users, are these really the people who need resource allocation? Someone in another talk mentioned that Twitter users were primarily urban young adults, but it can be broader than that-- those who have reliable internet access or those who have the time and leisure to tweet about wanting Justin Bieber for Christmas are in the tip top of the world population resource-wise.
I guess I'm just curious if you have any ideas for how your research (or maybe any social web research) can help those (a) without social media, (b) without internet, or (c) who need resources the most.
This author has a similar opinion: "The Unexotic Underclass," MIT Entrepreneurship Review.
I'm curious as to whether Twitter users, if they knew that you were using their data to allocate resources, for example, would start trying to manipulate the functioning of the system. I don't think this is so far-fetched considering that, in your example, companies were able to throw off your results. Twitter users can also see what's trending and may be motivated to jump on board with some trend on Twitter that may throw off the intended functioning of a system attached to these data. For example, a group of bikers may want a city to install bike lanes and use a contest (post "I love to bike to work" to win a free iPhone) to encourage Twitter users to post specific tweets, making it appear that they bike to work every morning when in fact they don't. Is there some way of controlling for this besides manually removing erroneous results after noticing what's happening (like you did with the fantastic T-shirt example)?