Sunday 8 June 2014

The Social Data Revolution: Are We Ready?




OVERVIEW: Social data and big data are being billed as the next big thing: the key to gaining a competitive advantage and increasing profitability for companies both big and small. The increasing importance of data analysis in decision making has boosted demand for employees with analytical skill sets, popularizing career paths that lead to big data jobs. But is enterprise ready for the shift? What challenges will companies face over the next 5 years when integrating data-based decision making? We will explore some of these broad issues in this talk.
    READINGS


    Theoret, Claude G., and Guido Vieira (2012) System and Method for Performing Analysis on Information, Such as Social Media, U.S. Patent Application 13/705,940, filed December 5, 2012.
    Videos
    Start-up Fest

27 comments:

  1. You said the data acquisition problem has been solved by tricking users into giving it away for free. Is there any hope for people who don't want to be part of the data acquisition? Other than completely leaving the internet?

  2. Some numbers:

    If 20,000 (0.05%) Twitter accounts post 50% of all tweets, each individual account averages 25,000 tweets per day (17 tweets per minute, every minute of the day). This smells of bots.

    If 220 M emails are sent every minute, and there are 4 B internet users (quite an overestimate) this averages about 80 emails/person/day. Again, bots and spam.
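
The arithmetic in this comment can be checked with a short script. This is a minimal sketch that takes the comment's figures at face value (25,000 tweets per account per day, 220 M emails per minute, 4 B internet users); the function names are illustrative, and none of the inputs are verified here.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def per_minute(per_day: float) -> float:
    """Convert a daily rate into a per-minute rate."""
    return per_day / MINUTES_PER_DAY


def per_person_per_day(per_minute_total: float, people: float) -> float:
    """Convert a global per-minute total into a per-person daily rate."""
    return per_minute_total * MINUTES_PER_DAY / people


# Figures quoted in the comment (assumed, not verified).
tweets_per_account_per_day = 25_000
emails_per_minute = 220e6
internet_users = 4e9

tweet_rate = per_minute(tweets_per_account_per_day)
email_rate = per_person_per_day(emails_per_minute, internet_users)

print(f"{tweet_rate:.1f} tweets per minute per account")  # 17.4
print(f"{email_rate:.1f} emails per person per day")      # 79.2
```

Both results agree with the rounded values in the comment (17 tweets per minute and about 80 emails per person per day), so the comment's numbers are at least internally consistent.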

    Replies
    1. His presentation said 50% of the content read and shared on Twitter is created by the top 20,000. I'm sure there are plenty of bots that create tweets which are never read or shared.

    2. Take a look at the original study from the Mashable post:

      http://mashable.com/2011/03/28/twitter-study-consumed/

  3. From Wikipedia:

    The Chinese word for "crisis" (simplified Chinese: 危机; traditional Chinese: 危機; pinyin: wēijī) is frequently invoked in Western motivational speaking because the word is composed of two sino-characters that can represent "danger" and "opportunity". However this analysis is fallacious because the character pronounced jī (simplified Chinese: 机; traditional Chinese: 機) has other meanings besides "opportunity".

  4. It’s scary to learn that there are so many robots using social media. This gives false information to people who don’t know this. We have been talking about the influence that social media has on people and also about how humans tend to imitate others. So through social media, if it’s not already the case, people can control many people. It could be very dangerous!

  5. How do you deliver the information you have extracted from the Web and social media to your clients? Is it simply an API that answers queries the way Google does (with the difference, for example, that you return tweets from Twitter that match our query)? Or do you already perform an analysis of the data? How do you anticipate the type of analysis that clients will need?

    In short, who will want to use the API?

  6. This was a really fascinating talk. My question for Dr. Théoret concerns the analogy between the oil and data industries. Since the 1970s, we have had a number of major energy crises where oil is concerned. Are there signs that there may be an analogous crisis in the field of data? If not, what protects the data industry from such a crash?

    Replies
    1. I think all it will take to burst this bubble is a government regulation restricting non-consensual data mining.

  7. You said you need a minimum of 5 years (Master's degree equivalent) to have the data mining knowledge required to "ride the wave" of big data. Do you have any ideas about how to make these skills more accessible? Especially in the United States, the price tag of university education encourages economic inequality and stagnation.

    Replies
    1. I'm sure people said the same thing about high-tech skills. But now a lot of IT is outsourced all around the world.

      Probably the same with industrial technology.

  8. Very interesting talk, and I thank you for it. My question is: in your talk you highlight only the measurement of social analysis information, but what about collective intelligence, collaborative work, the global brain… is there any chance that these are going to be part of this “big bang”?

  9. What can be done to change the "age + sex + whatever => interest" machine, built to sell us more stuff, which drives the dynamics of the web? I do not see how more and better data crunching can solve this.

  10. Thank you, Claude Théoret. So for your question, "Are We Ready for the Social Data Revolution?" (No/Yes), I think your answer would be NO. And you propose that universities work together as a team. Is that right?

  11. Dear Claude, thank you very much for your beautiful presentation! I have one question on the theme of veracity. It seems to be formalizable/computable. But how can it be linked with the perhaps less formalizable/computable notion of relevance? (1) What definition would you give of it in terms of machine learning and in terms of description logic? (2) Which criterion do you think is more important for progress in text analytics: (a) Relevance>Veracity; (b) Relevance<Veracity?

  12. That was an impressive presentation! Great data :-)
    1. Do you claim that social data will provide a significant number of jobs when most other fields are moving towards labor-free productivity?

    2. Can you foresee future limits to social data proliferation? After all, when all humans are connected sometime in the future, even if they produce data 24/7 (full life logging), at some point there will be saturation. On top of this, there will be the internet of things, which will be the next big producer of data. Can we forecast a time when the bulk of data will no longer be produced by humans?

  13. (cont'd) What role do you see for AI in the future of refining social data?

  14. Analysing health issues in Montreal in real time could not be more precise than hospital admission information, and not everyone is going to write about their health problems. In this case, what credibility can you give to this measurement?

  15. You showed us that some social media (e.g. Flickr) are starting to decline. Do you think that social big data is going to depend strongly on people's interests and behavior? If so, how can we use these sources of information if we expect that they can collapse within a few months?

  16. Something that has always intrigued me is the psychology behind how appealing or unappealing a social network may be to humans as a group and also to sub-groups of humans. Besides our innate need to share, to be heard and to hear about other people's lives or ideas, what do you think was the main characteristic of social media that led us to pursue this explosion of data sharing? Or the main trait of human needs that led us to it? In other words, WHY do you think this Social Data Revolution happened? What pushed us towards it?

  17. You said in your presentation that social data is useless to us until we know how to refine and process it. Then you mentioned that there will be a big shortage of qualified professionals in this field. Do you think the solution to how we process and refine the social data will come incrementally, from more people working on the problem? Or do you think it will be a few players who revolutionize the way we process and understand this data?

  18. Some people in academia find it very hard to form suitable partnerships with the private sector. They complain that private actors often have little regard for the needs of research or training, and will try to get exclusivity on all the work, even when they never paid a dime. Why is it so hard? How can we get the private sector and academia to work together for their common benefit?

  19. I wonder how people calculate the amount of social data produced over the history of humanity? Is this number normalized by the number of people? It seems to me that the "big data hype" is related more to the fact that the data is being accumulated in one place (the web, Twitter, hypertext, etc.) in an accessible manner than to the fact that there is more data. After all, living organisms have always generated social data, but it was much more distributed and never aggregated into one giant fire-hose threatening to wash us away.
    So the problem seems to be that we want to see the whole stream at once, rather than the sheer amount of data. Maybe some sort of distributed processing (i.e. directing data to relevant 'locations' of the net) rather than trying to capture the whole stream at once could help?

    Replies
    1. I think Claude is talking about documented social data. Unless something is documented (put into writing or numbers), it is not really data. Considering that literacy was never widespread until recently, I'm comfortable believing that we are now producing more data in a year than in all of prior history.

      On another note, future historians may have trouble sifting through big data to deduce the facts and sentiments of our time. Current historians have far fewer documents and artifacts to create a story with.

  20. On the career opportunities of data mining.

    I'm sure there are currently lots of job opportunities in data mining and that some of the skills are transferable to other jobs. However, the future of data mining that Claude paints lacks definite support. No one has ever successfully predicted stock market trends over more than a few years. In the discussion period, Claude countered this argument by saying that we can predict what research area is going to win the Nobel prize. This is a poor comparison because the predictions are made shortly before the competition and do not extend ahead for decades.

    A number of events could burst the data mining bubble: government-imposed regulations, a devaluation of the information, excess garbage data produced by bots, excess competition, and so on. If we data-mine economic history, it becomes apparent that these predictions are spurious.

    I'm not suggesting we don't look into careers in this field. Rather, I'm saying we could benefit from maintaining realistic views.

  21. My concern with this trend towards modelling and decision making based on available data is that it seems like a reactive approach. It could potentially squash innovation. Steve Jobs didn't turn Apple into the wild success that it is by basing his decisions on consumer data; he created something that people wanted before they even knew that they wanted it. Perhaps we can get to a point in our modelling where this is possible. But predictions from a model based on available data are not the same thing as creativity and inspiration.
