Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki

DF6: Thomson Reuters


Note: I took a snapshot of the page at 12:15 on 17 May and sent it to the Course Assistant for grading [aweigend].


  • Tom and James are interested in learning from each of you one insight that you gleaned in this class that you did not know before this quarter.
  • Furthermore, Tom is an avid reader (think Reuters!), so if you have any cool links about social data to share, please add them to Part 2 below.
  • Finally, think about how you get your news: friend feeds (fb, twitter), traditional media (Stanford Daily, TV, radio), blogs (techcrunch), etc. Nothing to add here; this is to be ready for a conversation about the implications of the shift in information consumption for Thomson Reuters.


  • Add one insight in the form of one bullet point (a couple of lines, up to a crisp single paragraph) with your SUNetID in square brackets at the end (see example below).
  • Please read what other students have posted below and add your point at an appropriate location. Feel free to comment on points made by others below it as a sub-bullet (using Tab).


  • Any insight posted here before the deadline will get 2.5 points -- we believe that putting your name behind your point will assure quality. If it sparks good comments, you can get up to 5 points total.
  • Any link posted that the teaching team finds useful will get one additional point.

1. Insights

  • Amazon uses social data to help customers make better decisions for themselves [aweigend]
    • Due to their particular contexts and interests, proponents and naysayers of social data collection and usage usually talk right past each other. Taking context further into account might reveal some insights. Consider: On the one hand, examples such as Amazon purchase suggestions (see aweigend above) and the video a bit below on the disconnect between old-fashioned advertising and consumers suggest that bringing consumers into the data game will humanize the relationship between buyers and sellers. Enlightened consumers will share information with enlightened sellers who will make enlightened purchase suggestions, etc. On the other hand, the TED talk posted below suggests that customization based on consumer data builds users into an information bubble. We end up seeing only what we likely want to see; we are never challenged. (This has been shown by U. Chicago's James Evans to be a problem for academics using Google Scholar and browsing online journals or ISI.) This latter picture suggests that social-data-based customization, rather than humanizing and enlightening, narrows and impoverishes. In nearly every SDR class discussion, proponents of each perspective have found it difficult to produce synthetic ideas that take into account the very different dynamics attendant on the introduction of social data; conversations have been reduced to "for" or "against" the Revolution. Perhaps it is time for a more nuanced agenda. [jblyon]
  • People are irrational. We freely post data to Facebook for the world to see, yet we are hesitant to give the same data to the class. Nonetheless, we all eventually relinquished some data for the sake of HW3 [dereklim]
  • The scale of modern social data/networks makes even low-probability events nearly certain for some individual in the network (most notably for high-risk incidents like data breaches). [klemens]
  • When an individual is a drop in an ocean of 500 million users, he/she is nearly anonymous and the exposure of their data is nearly harmless. It is quite different if the individual is one in a group of 51 people who are close enough to identify each other, but far enough to not be friends. It is the grey area between anonymity and intimacy. [mile]
  • The distinction made in class between people's two different identities, one in a personal life and the other in a private life, is an important one. The reason many people did not want to share their Facebook data in class was most probably this distinction in identity. [skurian]
  • As per usual, South Park captures that distinction really well. "Stan, Poke Your GRANDMA!!!" [ykcelik]
    • What if Facebook were offline? Here's a funny take on the debate, showing how one's willingness to share information is very much based on the context (online vs offline environment). The people are shocked at being asked to share information with strangers on the street, even though they already share it with over 500 million strangers online. Sounds familiar! [mcd3]
  • Actually, we are probably facing a radical transformation of the concept of privacy. Quoting Mark Zuckerberg: "The greater openness social networks bring to human interactions is probably the greatest transformative force in our generation, absent a major war." Social norms appear to be shifting, people are willing to share information about themselves more widely, all the more so if they can benefit from enhanced services in return. [leogrim]
  • The study of influencers in social networks is key to knowing where to put social media marketing dollars so as to maximize return on investment. [rvarshne]
  • The notion of influence on the web is very ambiguous. For one, it seems that everyone has an intuitive definition of influence, but the challenge lies in quantifying it. There is no objective standard; rather, various private companies are racing to produce the gold-standard influence index. Also, the very existence of “influencers” on the web is disputed. Despite the concept's rise in popularity, with much credit to Malcolm Gladwell, the idea that modeling social networks via transition probabilities (e.g. using Markov chains) is a more accurate representation is gaining ground. [davidkim]
    • Indeed, DeGroot provided an early model of how a group might reach consensus by representing belief updating as a Markov process. From this model, we can find a metric for influence. See this blog for an example and here for the original paper. [csholley]
  • The separation between private access and public access to your social data is permeable. Social data in the form of pictures or other media uploaded online to share privately with friends or acquaintances could later be used to stamp a social identity on you in public. Hence it is imperative to be cautious about the data one puts online, since the future implications of such data are unpredictable. [skurian]
  • Will people discovery algorithms that are deployed on social and dating sites ever be as good/fair as serendipitous in-person people discovery? Do we really want to know a person's complete profile before we meet them for the first time? Would this not lead to preconceived notions of the person and not give them a fair chance of giving a first impression? [rvarshne]
  • 1) Gamification and viral marketing are important, but they don't mean much if you don't have a good enough product. People will come because they were "marketed", but they won't stay for a product they don't need. 2) Think about who your users are and what they need before building a product. Take Branchout for instance: do recommendations from your Facebook friends tell much? They won't add value if HR departments don't believe in them or if job seekers don't believe that HR will believe them. To sum up, build a good product before dashing into the social media marketing wave. [yanghe]
  • Gamification is a great concept, but it is difficult to tell where the line lies between creating a great experience and invading private life. Here is a great talk on what might happen as the virtual and real worlds converge! [anomikos]
  • Artificial intelligence plays a role in constantly improving search/recommendation relevancy. As the amount of information that must be handled/generated exceeds the capacity of a human being, an algorithm that can learn and improve over time is crucial. For example, although Quora focuses on quality by using human labor to monitor content, Tracy concedes that they are looking for ways to establish an automatic filter. [paoj]
    • Indeed many industry experts I have spoken to have claimed that "AI has been a long time coming", but has never really fulfilled its imagined potential - so far. [misrab]
    • Yet, we can already be amazed at what companies manage to achieve thanks to our data. Highlighting these achievements would probably alleviate some of the privacy concerns or as The Economist puts it: "If users knew how the data were used, they would probably be more impressed than alarmed". Also see related readings from The Economist and Slate. [leogrim]
  • The continuing trend of social geotagging will dramatically increase the volume of hyperlocal content. Instead of leaving these data stored by each application separately (e.g. Twitter, Google Places, and Foursquare), a public universal platform could boost usage of geo data, for example by community-oriented ad networks. On the other hand, a standard platform may be limited by some type of scarce resource, as in the case of IPv4 address space exhaustion. [yliu717]
  • Web 1.0 was all about broadcasting; very unidirectional. Web 2.0 not only introduced the notion of recommendation systems and personalization (also part of Web 3.0) but also allowed people to interact with companies the way they interact with their friends through social media. The biggest change is that as online conversations become more people-centric, companies are shifting away from the center of attention and becoming less relevant. [ykcelik]
  • The amount of social data created every year is growing exponentially. Data is being created for almost everything we do, and the information is now readily available (What are we buying at the supermarket? How far are we scrolling down the webpage? What are we searching?), but mining and storing the data has become the new problem. [mcd3]
  • A wealth of information entails a poverty of attention for human beings. Therefore the future of algorithms is to "do the thinking for us": we need to put our efforts into building processes to extract and analyze the information that we need, rather than doing it ourselves. [lecat]
    • Here is a great insight from the Economist on the subject: "the mind can handle seven pieces of information in its short-term memory and can generally deal with only four concepts or relationships at once. If there is more information to process, or it is especially complex, people become confused." [lecat]
  • A/B testing is an effective tool for finding timely, local optima in product design and marketing. [csholley]
  • Despite the fact that the Internet "grew up" with users hiding their identity in chat rooms, on message boards, etc., it is through the very opposite of this notion, confirmed and real identity, that true social interaction has thrived on the web. Without this "real" identity, services such as Facebook, Quora, Amazon recommendations, LinkedIn, etc. would not be able to exist in their current incarnations. [mcreilly]
  • In today's attention economy, simple design and gamification of the user experience are becoming ever more important for the success of applications which rely on the voluntary socialization of data. People enjoy being told the next steps to take when using an unfamiliar application, and celebrating "milestones" in a user's willingness to share is what leads to the successful sharing and accumulation of data en masse. [babchick]
  • The big picture of the social data industry is as follows: A) Data-gathering infrastructures with wise incentive schemes, such as Facebook and LinkedIn. B) Data analysis and taking action on the data: recommendation systems, people discovery, best-deal offers, etc., such as Geomium. C) Visualizing the data for getting insights and decision making such as [alirezaf]
    • The order of this list is very important: without data gathering, there is no analysis. Without analysis, there's nothing to visualize. The whole chain starts with the incentive systems. You can be very successful in the social data industry even if you fail B and C, as long as you're very good at A. [sboer]
  • The scale and type of social data being created are unprecedented: not only do we produce more information than we can store, we as people can only process so much. Thus a lot of this information never touches "human hands", instead bouncing between intelligent sensors and devices. 'Tis a Brave New World. [misrab]
    • I'm all for the more data the better, but is there ever too much? Some people go crazy from information overload. [bgarg]
  • There was a time when a certain 'sketchiness' or stigma was associated with online dating or having an online dating profile. However, that is going away really quickly. [addy]
  • The topic of data ownership is highly contentious. Even though we are sharing our own data with networks such as Facebook, LinkedIn and Twitter, it is not we who actually own the data. While we maintain control over what data is being shared, once shared, the data might as well be public. [awcheng]
    • It's interesting that many people in our class said they would share their own Facebook information with the class but were worried about sharing their friends' information. But their friends have already chosen to share this information with Facebook, so why do we feel the need to protect what they have already shared? [emedjuck]
  • One insight is the sheer volume of social data that is collected about each web user (exabytes). This social data is used for wide and varied purposes, ranging from helping consumers make better decisions about what they buy to catering to tastes as determined by the data, including in the online dating scene and "islands" of Facebook friends. [shinjini]
  • Social data can, and should, be used to create products that get better over time. This means not only incorporating user-generated data into the system to provide a better service for individuals or groups but also allowing individuals to edit/manage their own data. [tholley]
    • This goes hand-in-hand with the exponentially increasing amount of information being created, begging for intelligent automatic assimilation and processing. [misrab]
  • Social data has the power to change individual behavior. For example, if you had access to aggregated statistics on the riskiness of every investor's portfolio, your investment behavior may change after learning of the relative riskiness of your portfolio (i.e. "Your portfolio is riskier than 85% of all investors..."). [jhyu]
  • Collecting social data is a two-way street. The best way to get the user to willingly and accurately provide data about themselves is to offer some value in return. Ideally a feedback mechanism would be built in to reward data sharing and propagate the trend. [bnixon]
  • Interesting take on the social effects of an increasingly personalized web:

    • Thanks for posting this video. The idea of people becoming isolated because of the highly personalized nature of the information that the web returns to them is really interesting - i.e. two users type "Egypt" but get totally different results from the same search engine. I also really liked the comment by Eric Schmidt of Google: "It will be very hard for people to watch or consume something that has not in some sense been tailored for them". [evapi]
  • With the unprecedented social data contributed by each user online, two things are becoming more and more important: 1) offering platforms that help users contribute and share data, as the information production-consumption relationship has shifted to information collaboration; 2) designing user-driven metrics to deeply understand users in order to better profile them and recommend products or information to them. [yszhou]
  • The SDR has increased information transparency to the point where companies have had to shift from corporate metrics to consumer metrics (e.g. Amazon helping customers make better decisions for themselves), and that's a good thing! Ancient business models built on information asymmetry (cars, insurance, real estate) will soon face huge dangers from newcomers that embrace the revolution and comply with a new set of customer expectations. [bbulcke]
    • I like this point about how asymmetry has essentially flipped businesses around and thus given opportunities to the consumers [bying]
    • The newfound leverage of consumers to easily share their perspectives of a company or product may sometimes result in less consumer surplus, as these preferences are better understood by producers, who in turn reset prices accordingly (up). [csholley]
  • The Social Data Revolution has changed the way advertisers create their marketing campaigns today - check this out: Microsoft Digital Advertising commissioned the making of a new controversial commercial called the "Breakup". The commercial makes some statements about the relationship between today's advertiser and today's consumer, and clearly challenges the advertisers by showing how they have stopped listening to their consumers. Check the "Bring the Love back" blog here and the video [evapi]:

  • As more social data becomes available online, the more we are able to manipulate it for our own purposes. This leads to a problem of measurement, which has become increasingly important and difficult to answer. What defines success for a crowdsourced Q&A on Quora? How can we measure influence on Twitter? Are people-discovery mechanisms helpful? The more we are empowered by social data, the more algorithms we need to develop to understand if we are doing anything useful. [yiliu2]
    • Related: As more of what defines who we are becomes available through data online (e.g., opinions on twitter, purchasing habits on Amazon, friends/social events on fb, math proficiency on Khan Academy, interests on Ted), it will become increasingly important to aggregate across these sites to paint a picture of one's identity. However, the algorithms (incl. what variables, with what weightings) to do so are nascent, unstandardized and subjective. The bodies and companies that build these algorithms will significantly shape what it means to have 'power' online, and therefore, shape online behavior (similar to how PageRank prized hyperlinks & popularity to proxy relevance). [aegupta]
  • Data is everywhere, and more and more people are realizing the importance of this. At TEDxSV this year, multiple speakers talked about it. Here is a video that was shown. It might not be totally related, but I thought people would enjoy it. [lyue]
  • Social data does not equal social media. Social data is not a marketing medium to reach consumers but rather a way to improve not only products and customer relationships but also a company's internal workings. [abriano]
  • Governments have huge amounts of data that until now have been kept mostly secret. These data have the power to change the world. Here's an interesting video by Tim Berners-Lee, inventor of the World Wide Web, about the US releasing government data: [lduplan]
  • Competitive advantage in organizations comes primarily from the data, not from the proprietary algorithms used to manipulate that data. Too much focus on algorithms results in losing sight of the data itself. You can have the fanciest algorithm in the world, but if your underlying data is crap, you are just deceiving yourself. [bgarg]
  • There are few things in life that appreciate with time: land, wine, and now data. Most companies and products depreciate with time, but as companies use more and more data about their customers and business, they gain a unique opportunity to build solid barriers to entry around their core competencies. It was also fascinating to learn that Google's use of data has built new products that weren't possible before (e.g. translation and spell check). [bying]
  • The emergence of social networking has changed not only the way people behave but also the way corporations function. Marketing has since shifted from outbound (i.e., businesses reach out to customers) to inbound (i.e., businesses find ways to get found by customers and their friends). Targeted advertising and viral marketing have become more effective and successful, especially for small businesses like Zynga, as people share more and more of their personal or non-personal data. Social networking has become a medium not only for people to communicate but also for businesses to truly connect with customers. Check out Clara Shih's The Facebook Era (Facebook Marketing) in Reading and on YouTube [pasi]
  • The abundance of social data can differentiate the information presented to a target customer. Depending on the customer's age, education, page likes, group memberships, interests and even friends' likes extracted from SNS, such as Facebook, Twitter and Quora, a merchant will choose the best approach to market to this customer. Personalization has changed the way a merchant/service provider and a customer communicate. For example, just yesterday, launched new features incorporating social network data into search results. In search results, it will display related pages your friends "like". This is a good example of social data improving search result relevance. [yenan]
  • If we look at enough data, we will always be able to find something that looks like a correlation. The real value lies in understanding why sets of data are correlated. [emedjuck]
    • Interesting correlation found by OkCupid, but even they point out they are not sure why this is happening
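
A minimal Python sketch of emedjuck's point about chance correlations. It compares one random series against many other unrelated random series and reports the strongest correlation found. The function names, the series length of 10, and the count of 1,000 candidate series are illustrative assumptions, not anything taken from the class.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(num_series=1000, length=10, seed=0):
    """Search unrelated noise series for the one that best 'explains' a noise target."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(length)]
    best = 0.0
    for _ in range(num_series):
        candidate = [rng.gauss(0, 1) for _ in range(length)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print(f"strongest |r| among pure noise: {strongest_chance_correlation():.2f}")
```

Every series here is independent noise, so any strong correlation the search turns up is a coincidence. That is why understanding the mechanism matters more than the coefficient itself.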

  • It is important to understand how we design our products to leverage social data. Are our products designed to capture social data? Do our products learn from the social data captured and improve over time? How can we incentivize our users to create social data? How can we build systems and processes that allow our product managers to leverage social data and plug the captured data directly into the product development process? Gone are the days when product development teams set up focus groups and conducted surveys of randomly sampled users to understand user needs and user experience. In the world of web 3.0, the constraints around collecting user metrics are simply disappearing, and hence we should make it trivially easy for our customers/users to share social data, and improve and iterate our products by learning from the social data created by our customers/users. [vkarthik]
    • Iteration is a key theme here, in that the most successful Web 2.0 companies are the ones that follow a mantra of "Push and Play, Launch and Learn". An abundance of social data and user metrics allows companies to make targeted changes to their product and then quickly gauge user response, taking much of the guesswork out of product iterations. Unlike car companies, which get locked into the models they release for a year regardless of whether they flop, Web 2.0 companies can use incremental roll-outs and, more importantly, immediately roll back and fix problems as they see them, iterating rapidly to address user issues and improve their product. [rahulgi]
  • For me a big eye opener was the ability of "customization" to make a huge difference to the bottom line of a company. The future is about customization, and those corporations that can provide each user with a customized experience (i.e. based on the tastes and habits of the customer) will enjoy bigger profits. Andreas mentioned how Amazon would send email recommendations to its users after learning the time of week when they are most likely to open their email. This might seem trivial, but a back-of-the-envelope calculation reveals that it may end up earning Amazon many millions of dollars in revenue: assume Amazon has 2M "active" users worldwide. Assume that, by sending recommendation emails at the right time, they increase viewing of their emails. Assume this leads to just one additional sale per user per year. Assume the average sale is $50. This would make a difference of $100M per year to Amazon. [iyera]
  • Social media has fundamentally shifted the landscape of everything we do, from the way businesses function to the way we interact with others. This shift has created an overwhelming amount of social data with tremendous potential utility. To best exploit that potential, we need to make social data useful for both businesses and average users/consumers. It is important to deliver utility to users so that they generate further data. And the key to delivering utility is to put it in an appropriate context that users can best appreciate and relate to. I believe users' geo-location is an important context that enables the exploitation of social data's potential. [kancao]
      • An example of the shift initiated by social data:
      • Paradigm Shift in Data Culture:
        • Do Not Have -> Somewhere Already
        • Cannot Get -> Can
        • Must not use -> Embrace It
        • Be secretive -> Be transparent
        • Information Asymmetry -> Information symmetry (increase trust & likelihood of user sharing) [rahulgi]
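
The back-of-the-envelope Amazon estimate in iyera's bullet above can be written out as a short Python sketch. The inputs (2M active users, one additional sale per year, read here as per user, and a $50 average sale) are the assumptions stated in that bullet, not real Amazon figures; the function name and the "conservative" scenario are hypothetical additions for illustration.

```python
def extra_annual_revenue(active_users, extra_sales_per_user, average_sale_usd):
    """Extra yearly revenue = users x additional sales per user x average sale value."""
    return active_users * extra_sales_per_user * average_sale_usd


# Figures assumed in the bullet, not actual Amazon data.
estimate = extra_annual_revenue(2_000_000, 1, 50)
print(f"${estimate:,}")  # prints $100,000,000

# Hypothetical sensitivity check: only 1 in 10 users makes the extra purchase.
conservative = extra_annual_revenue(2_000_000, 0.1, 50)
```

The result scales linearly with each assumption, so even if only one user in ten made the extra purchase, the estimate would still be around $10M per year.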

  • We have to seriously consider social search as one of the main ways through which to receive news today. We can see the growing importance of this mechanism through this recent news article: http://www.pcworld.com/article/228057/bing_facebook_deepen_ties_threaten_google_1.html. But in truth, all of these +1 and "Like" mechanisms are just trying to make information more relevant. Well, relevancy has a high correlation to the amount of inputted information. I personally would be fine with news services using my Facebook data to give me highly tailored news, for example. [arievans]
  • Another way to look at this is how much privacy we are willing to give up. A recent blog article ( 05/15/92/) by a good friend of mine explains that perhaps "privacy" is altogether a notion of the old world and something that does not quite have a place in today's technological landscape. The reason is that in general the sharing of information through the ages has led to technological innovation and the advancement of mankind. Though there are fears about what could happen, in truth there have always been dangerous consequences of using information wrongly. [arievans]
  • Social data, including private, sensitive ones, are now being constantly mined on the internet, as if they are precious commodities, by numerous organizations, including Facebook, Google, and Apple Inc., through millions of computers and other mobile devices. Depending on human imagination, ingenuity, and greed, they can be used for anything from providing better maps for traffic navigation (Location Data) to predicting companies’ share prices (Facebook “likes”). Many uses are proving controversial, some are downright devious. As such, these social data are like atomic energy; they possess enormous potential power. In the right hands, they can provide mankind with a bottomless source of clean energy; in the wrong hands, they could become atomic bombs capable of devastating humanity as we know it. [varistha]
  • From this course, I learned the idea of inverse blogging that has been adopted by Quora. Instead of the blogger sending most of the info to the public, it is the public who post most of the info to the blogger's question. New social data websites such as Quora are shaping the landscape of today's Internet through forming new standards. For example, Quora's way of asking and answering questions is a contrast to our traditional way of searching for already answered questions on Google. [huanl]
  • As social networking sites (particularly Facebook) have permeated the online landscape and brought real identity to the internet, there is a "back to the future" phenomenon occurring in how people conduct themselves: online behavioral norms are starting to look a lot like those in real life, governed by many of the age-old social constraints people feared the internet had broken. As these sites become better over time at replicating people's actual real life social networks (Facebook today is a crude representation), and as people become more aware/accepting of how private the internet really is (i.e. not very), this trend is likely to continue. [jlee2010]
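
The DeGroot consensus model mentioned in csholley's comment on influence above can be sketched in a few lines of Python. Each agent repeatedly replaces their belief with a trust-weighted average of everyone's beliefs. The stationary distribution of the trust matrix, viewed as a Markov chain, then gives each agent's weight in the final consensus, which is one possible influence metric. The 3-person trust matrix, the initial beliefs, and all function names are made-up illustrations, not taken from the linked blog or paper.

```python
def degroot_step(trust, beliefs):
    """One round of updating: each agent takes a trust-weighted average of all beliefs."""
    return [sum(w * b for w, b in zip(row, beliefs)) for row in trust]


def run_to_consensus(trust, beliefs, rounds=200):
    """Apply the update repeatedly; beliefs converge when the chain is well behaved."""
    for _ in range(rounds):
        beliefs = degroot_step(trust, beliefs)
    return beliefs


def influence_weights(trust, iterations=1000):
    """Approximate the stationary vector pi (pi = pi * trust) by power iteration."""
    n = len(trust)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * trust[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical trust matrix: row i says how much agent i weights each agent's opinion.
# Every row sums to 1.
TRUST = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
INITIAL_BELIEFS = [1.0, 0.0, 0.0]  # only agent 0 starts out believing the claim

if __name__ == "__main__":
    print("influence weights:", influence_weights(TRUST))
    print("consensus beliefs:", run_to_consensus(TRUST, INITIAL_BELIEFS))
```

In this toy network the middle agent, whom both others listen to, ends up with twice the influence weight of either neighbor, and the consensus belief equals the influence-weighted average of the initial beliefs.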

2. Readings

Thank you :)