Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki


Student representatives election (figure out constituencies)
Speaker wishes (based on

Class_02: Social Data Landscape

Date: March 31, 2011
Audio: weigend_stanford2011.02_2011.03.31.mp3
Initial authors: Shalini Kurian, Mike Leatherbee, Yusuf Celik

Key Points

  • To gauge the dimensions of social data, we looked at the relevance of data about people. Examples of strategies used by companies like Amazon showed the vast reach of the data that can be gathered about people: demographic data to create context about customers, psychographic data to better understand how to modify products, and behavioral data to match patterns for better suggestions to customers, yielding greater conversion. Relationships between people, and their engagement with others and with websites, are also rich sources for shaping business strategies that target customers in a more personalized and effective manner.
  • The present way of data mining focuses on traditional characteristics such as gender, income level, age etc. The future of data mining, however, will focus on people's Intents (both implicit and explicit intents).
  • We answered a big question: what can we do with this social data? The answer, simply put, is A LOT! Efficient recommendation systems, targeted advertising, and personalized news are some of the major applications that this kind of social data makes possible.
  • The social network shapes this data, which can then be used to modulate behavior (e.g., parking prices that depend on the time of day). An important consequence of collecting such social data is the emphasis on privacy. People need to become more aware of what information they share about themselves and, more importantly, who is going to see it. An example cited in class was an individual who was stopped from entering the US because of two controversial books on her wish list, which was available to Homeland Security. Privacy should therefore be taken seriously. (We need more information about this case; it is possible that our lack of sufficient context is leading us to a conclusion about lack of privacy. What if she was reading a book on how to make a Molotov cocktail? What if, instead of her, it were a Libyan national? Would it not warrant some investigation?)


Dimensions of Social Data

The social data landscape can be described along three dimensions: individuals, connections, and community. Information about all three dimensions for a given market segment helps us anticipate behaviors and design tools that allow us to engage more closely with consumers. For example, by finding a community of individuals who celebrate Mother's Day, identifying an individual within the community who has a mother, and discovering the connection between the individual and his/her mother, we would be able to offer the individual purchasing suggestions that are consistent with his/her mother's tastes. By gathering and combining the information from these three dimensions, we can improve the online purchasing experience for the individual, as well as help strengthen a real relationship between a mother and her daughter/son.

In the following sections we discuss the three dimensions in more depth.

Data About People

The preliminary question to look at is what do we want to know about individuals? Every social network mirrors a graph where we are interested in the nodes, the arcs and the community (entire social network).


  • Focus on the node, which represents the attributes of the individual.
    • We are interested in the demographics and psychographics of individuals (e.g., gender, age, nationality, etc.).
    • Amazon has 500 attributes for individuals, which include:
      • Shipping address: this attribute matters a great deal to the user (because you want things shipped to yourself), so the information should be very reliable.
      • RFM or Recency, Frequency and Monetary (total amount purchased) are key attributes for user profile and important metrics for Amazon's business strategy.
      • Product category. The challenge is how to relate and map products to each other (complementarity, substitutability, etc.) so that we can understand the "nearness" of different products. One way to map items is by analyzing the behavior of users. Co-purchasing behavior implies that products are complements, while co-clicking behavior implies that products are substitutes. One aspect to consider is the order in which co-purchases happen, because the order matters when Amazon suggests a complementary item. For example, offering a spare battery after the purchase of a laptop is more effective than offering a laptop after the purchase of a spare battery.

      • The percentage of co-clicks that result in a purchase of an item increases the relevance of the item for the user.
      • Gender is "estimated" by checking the user's name against external data, such as data from the US Census Bureau.
      • Email response rate is relevant for increasing the likelihood that the individual actually reads a future email from Amazon. For example, Amazon's strategy is to send emails on the same day of the week as the user's last click on an Amazon email, 5 minutes before the time of day of that click.
    • Amazon's secret for obtaining user data is PHAME:
Then we iterate or use the “f-word”: feedback. This is very different from typical data mining.
    • The future way of getting to know user behavior includes measuring intent, such as "wish-lists" (C2W - customer to world). Users can be profiled by search behavior, price sensitivity, follower/leader tendency, reaction to offers, etc.
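
The point above about the order of co-purchases can be sketched in Python. This is a minimal illustration under stated assumptions, not Amazon's actual method: the function names, the `min_ratio` threshold, and the toy purchase histories are all hypothetical. The idea is to count, per user, which item was bought before which, and to suggest item B after item A only when "A then B" is clearly more common than "B then A".

```python
from collections import Counter
from itertools import combinations


def ordered_copurchase_counts(purchase_histories):
    """Count, across users, how often one item was bought before another.

    purchase_histories maps a (hypothetical) user id to a list of
    (timestamp, item) tuples.
    """
    counts = Counter()
    for events in purchase_histories.values():
        # Sort by timestamp so the earlier purchase comes first in each pair.
        ordered = [item for _, item in sorted(events)]
        pairs = {(first, second)
                 for first, second in combinations(ordered, 2)
                 if first != second}
        # Each ordered pair counts at most once per user.
        counts.update(pairs)
    return counts


def suggest_after(item, counts, min_ratio=2.0):
    """Items bought after `item` at least `min_ratio` times as often as before it."""
    suggestions = []
    for (first, second), n_after in counts.items():
        if first != item:
            continue
        n_before = counts.get((second, first), 0)
        if n_after >= min_ratio * max(n_before, 1):
            suggestions.append((second, n_after))
    # Most frequent follow-up purchases first, ties broken alphabetically.
    return [name for name, _ in sorted(suggestions, key=lambda s: (-s[1], s[0]))]


# Toy data (invented for illustration).
histories = {
    "u1": [(1, "laptop"), (5, "spare battery")],
    "u2": [(2, "laptop"), (3, "spare battery")],
    "u3": [(4, "spare battery"), (9, "laptop")],
    "u4": [(1, "laptop"), (2, "laptop bag")],
}
counts = ordered_copurchase_counts(histories)
print(suggest_after("laptop", counts))
print(suggest_after("spare battery", counts))
```

With this toy data the battery is suggested after the laptop, but the laptop is not suggested after the battery, which matches the asymmetry described in class.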

  • Focus on the arcs, which represent the relationships between individuals.
    • To measure connections between people, you can use tools such as the "gift button" in Amazon. Ways of identifying the type of relationship between individuals include looking at purchases made on given dates, such as Mother's Day.
    • In the case of Facebook, if an individual “tags” someone, you can assume that they hang out with that person. Nevertheless, relationships are not binary. Normally one person has a greater interest in the relation than the other. This can be measured by the number of comments from one individual to the other, and the quality and quantity of the content shared.
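
The idea that one person usually has a greater interest in a relationship than the other can be made concrete with a small Python sketch. It assumes, purely for illustration, that the only signal is the number of comments each person sends to the other; the function name and the scoring formula are hypothetical choices, not something presented in class.

```python
from collections import Counter


def tie_asymmetry(comments, a, b):
    """Score in [-1, 1]; positive means `a` comments on `b` more than the reverse.

    comments is a list of (author, recipient) pairs, one per comment.
    """
    sent = Counter(comments)
    a_to_b = sent[(a, b)]
    b_to_a = sent[(b, a)]
    total = a_to_b + b_to_a
    if total == 0:
        # No interaction observed in either direction.
        return 0.0
    return (a_to_b - b_to_a) / total


# Toy data (invented): Ann writes to Bob three times as often as Bob to Ann.
comments = [("ann", "bob")] * 6 + [("bob", "ann")] * 2
print(tie_asymmetry(comments, "ann", "bob"))
print(tie_asymmetry(comments, "bob", "ann"))
```

A fuller version would also weigh the quality and quantity of the shared content, as the notes mention; comment counts alone are the simplest proxy.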

  • Focus on the community (collective intelligence).
    • Where the person lives/works (indicated either by the shipping address or the IP address) is an indicator of community patterns. The behavioral tendencies of individuals within a community may be a way to anticipate the behaviors of other individuals in the same community (e.g., fashion, needs, leisure, etc.). These predictions can be further improved by knowing the influence of individuals.
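
As a rough illustration of anticipating behavior from community patterns, the Python sketch below uses the most popular product category in a community as a baseline guess for any other member of that community. It assumes the community is identified by the ZIP code of the shipping address; the function name and the data are hypothetical.

```python
from collections import Counter, defaultdict


def community_top_category(purchases):
    """Map each community (here: a ZIP code) to its most purchased category.

    purchases is a list of (zip_code, category) pairs, one per purchase.
    """
    by_community = defaultdict(Counter)
    for zip_code, category in purchases:
        by_community[zip_code][category] += 1
    return {zip_code: counter.most_common(1)[0][0]
            for zip_code, counter in by_community.items()}


# Toy data (invented for illustration).
purchases = [
    ("94305", "textbooks"),
    ("94305", "textbooks"),
    ("94305", "bikes"),
    ("10001", "umbrellas"),
]
baseline = community_top_category(purchases)
print(baseline["94305"])
```

This baseline ignores the influence of individuals; weighting each purchase by the buyer's influence would be one way to refine it.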

Data From People

Knowingly or unknowingly, people share information about themselves with the world. It may seem creepy to some, but George Orwell's 1984 prediction that we would live in a society of ubiquitous surveillance by a central entity seems partially correct. The only difference is that the surveilling entity is not a government. Instead it is a dispersed group of companies (or organizations) whose sole objective is not to control our actions, but rather to modify our consumption behaviors in order to increase their performance (or reach their foundational objective). What may make some people uneasy is the fine line that exists between modifying our consumption behaviors and controlling our actions.

A few examples of how organizations are gathering our information:
  • Think about mobile carriers: they know how fast their customers drive, who they are in the car with, at what time they wake up and go to sleep, who they sleep with, where they spend most of their time, etc.
  • Similarly, fotologs, transportation cards and loyalty cards gather information about people's likes, habits, purchasing behavior, nearness to other individuals, etc.
  • Google has a record of much of our web activity (thanks in part to its popular internet browser, Google Chrome).

Engagement Data

Engagement is also understood as interactivity. It is, in some way, a measure of the level of influence that organizations can have on individuals. For instance, if an individual has a history of clicking on an email shortly after the organization sends it, this suggests that the individual can be reached quickly and does not dismiss emails from the organization. Another way of measuring engagement is how often the individual purchases products "suggested" by Amazon. If an individual frequently purchases a second or third complementary product from the list of suggestions that Amazon offers, this behavior indicates a high level of engagement with the messages that Amazon is sending him/her.
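
The email example can be turned into two simple numbers: the share of emails that were clicked and the typical delay between sending and clicking. The Python sketch below is only one possible way to measure this; the function name, the use of the median, and the data are assumptions made for illustration.

```python
from statistics import median


def email_engagement(events):
    """Return (click_rate, median_latency) for a list of email events.

    events is a list of (sent_at, clicked_at) pairs in minutes;
    clicked_at is None when the email was never clicked.
    """
    latencies = [clicked - sent for sent, clicked in events if clicked is not None]
    click_rate = len(latencies) / len(events) if events else 0.0
    median_latency = median(latencies) if latencies else None
    return click_rate, median_latency


# Toy data (invented): four emails, one of them never clicked.
events = [(0, 12), (100, 130), (200, None), (300, 306)]
click_rate, median_latency = email_engagement(events)
print(click_rate, median_latency)
```

A high click rate combined with a short median latency would correspond to the "reached quickly" individual described above.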

Examples of tools that serve for measuring engagement:
  • Forms, click rates, games and virtual gifts (particularly the price of the gift) are ways of measuring the level of engagement of individuals with the website and with other individuals.

Web Strategy By Jeremiah Owyang

Opportunities of Social Data

  • What can we do with social data?:
    • Create more effective advertising campaigns by matching the offer with the likely consumer desire (on a given day there are 15 billion ads that go through bidding processes, each of which must solve an optimization problem that combines the user profile with the company's willingness to pay for reaching the user).
      • Click here for an interesting article on how technology is helping marketers use big data to improve the consumer experience
    • Product, service, and even search result recommendations.
    • Introduce people who are searching for each other.

    • Sustainability, such as:
      • Carpooling (combining destinations with current user's geolocation). Example: Zimride


      • Congestion pricing schemes (such as SFPark)

  • Using geolocation and push notifications to route first responders to medical crises before ambulances can arrive.

    • Traditional industries moving forward: airlines and restaurants

  • Use social data to prevent Fraud
Lending Club, a peer-to-peer lending service that matches borrowers with investors, uses social data to verify that a user’s information checks out, in order to protect his or her identity. Lending Club compares application information from a credit file against information that is publicly available. If there is a mismatch, the company has reason to perform additional, stricter identification procedures.
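
The comparison step described above can be sketched in Python as a field-by-field check. This is not Lending Club's actual procedure, which the notes do not describe in detail; the field names, the normalization rule, and the records are hypothetical.

```python
def mismatched_fields(application, public_record, fields=("name", "employer", "city")):
    """List the fields whose values differ between the two records.

    Values are compared after lowercasing and collapsing whitespace.
    Fields missing from either record are skipped rather than flagged.
    """
    def normalize(value):
        return " ".join(str(value).lower().split())

    mismatches = []
    for field in fields:
        if field in application and field in public_record:
            if normalize(application[field]) != normalize(public_record[field]):
                mismatches.append(field)
    return mismatches


# Toy records (invented for illustration).
application = {"name": "Jane Doe", "employer": "Acme Corp", "city": "Palo Alto"}
public_record = {"name": "jane  doe", "employer": "Globex", "city": "Palo Alto"}
flags = mismatched_fields(application, public_record)
print(flags)
```

A non-empty result would be the trigger for the stricter identification procedures mentioned above.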

  • Improve artificial intelligence and related applications e.g. speech recognition

Leveraging users' queries and millions of audio files, Google has been able to train statistical models and develop speech recognition technologies that actually work, outperforming earlier systems based only on logic and explicit linguistic rules. For deeper coverage and analysis of Google's technology, see the very interesting article by Farhad Manjoo from Slate: Now You're Talking!

Implications For Society
  • Changes in behavior
    • Insurance companies may set different prices based on your online social data: they may charge you by the mile, depending on the time of day and the route you use.
  • Privacy can no longer be taken for granted.

  • To understand how our notion of privacy has evolved over the generations of the web, check out this interactive visual demonstrating Facebook's default privacy settings from 2005 through 2010. In 2005, most data was only available to your friends. By 2010, default settings allowed your data to be viewed by the entire internet.
  • The ownership of the data becomes irrelevant: the network owns the data. Here's an interesting article on social data ownership. In short, the notion of digital assets (as distinct from physical assets) has yet to be worked out.

Article on Implications of Social Data Revolution

Administrative Issues

  • If you are interested in attending the Data 2.0 Conference in SFO, please write Andreas. More info here.
  • DogFood #1: Imagine that on September 13, this year, you could have data from hundreds of thousands of mobile phone users, and you can put something on those phones. You can measure something which will be combined with geolocations. What would you want to put on those people's phones? Something you could measure.