Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki


Date: May 31, 2011
Audio: weigend_stanford2011.19_2011.05.31.mp3
Transcript: weigend_stanford2011.19_2011.05.31.doc
Initial authors: Joachim Lyon, Ari Evans

1. A view across the decades...

The Past

  • 70s Building computers
  • 80s Connecting computers
  • 90s Connecting pages
  • 2000s Connecting people (e.g., Facebook)

The Present (well, this decade anyhow...)

The data created this year will exceed all the data that humankind had created, cumulatively, through the end of last year.
  • 2010s will be about:
    • Sensors--"How can we make sense of sensors?"
    • Instrumenting Businesses (This approach is instrumental, naturally)
    • Collaboration--How can we enable collaboration? Ultimately it will be about enabling behavior change.
  • Here are some current predictions about technological trends for the decade:
  • Geo will likely be a key to what the 2010s hold: we have learned that merely knowing a user's location provides a wealth of additional information (both a good and a bad thing!)
  • We also posit that the 2010s will be about human-computer symbiosis: letting computers do what they are good at and letting humans do what we are good at.

Above image from Microsoft Research

2. Perspectives on communication

What are some important dimensions of communication?

  • Structured vs. Unstructured
    • Examples of less and more ad-hoc structures include: a Facebook Wall or Group, hashtags such as #followfriday, and @-mentions such as @aweigend
  • Symmetry vs. Asymmetry
  • Relevance-emphasis (old) vs. Chronological-emphasis (new)
    • A pithy summary of this comparison: "Relevance vs. Recency"
    • An example of chronological emphasis is how Twitter gives us the web in real time.
  • Synchronous vs. Asynchronous
    • This is one of the more traditional dimensions of communication. However, most contexts and technologies are subtler than a simple either/or, and thus harder to understand or pass judgment on. For example, different cultures may show surprising, "counterintuitive" preferences along these scales in certain contexts.
  • Push vs. Pull
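
The "Relevance vs. Recency" distinction in the list above can be made concrete with a minimal sketch. The `Post` type, the `relevance` score, and the sample posts are all hypothetical and chosen only for illustration; how a real system computes relevance is outside the scope of these notes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    text: str
    posted_at: datetime
    relevance: float  # hypothetical score in [0, 1]; its computation is assumed


def by_relevance(posts):
    """Relevance emphasis: best match first, regardless of age."""
    return sorted(posts, key=lambda p: p.relevance, reverse=True)


def by_recency(posts):
    """Chronological emphasis: newest first, regardless of match quality."""
    return sorted(posts, key=lambda p: p.posted_at, reverse=True)


# Made-up sample data.
posts = [
    Post("Classic tutorial", datetime(2009, 3, 1), relevance=0.9),
    Post("Breaking update", datetime(2011, 5, 31), relevance=0.4),
    Post("Last month's summary", datetime(2011, 4, 15), relevance=0.6),
]

relevance_order = [p.text for p in by_relevance(posts)]
recency_order = [p.text for p in by_recency(posts)]
```

The same three posts come out in opposite orders under the two emphases, which is the whole point of the distinction: the ranking rule, not the content, decides what a user sees first.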

How does the structure of communication systems affect behavior?
  • Now we can reach a billion people -- this takes relatively trivial effort compared to what it would have taken two decades ago.
  • It takes ten minutes, or less, to invite 200 people to your party: what does this mean for how we structure our social lives?
  • Generation of new marketplaces, as evidenced by this 2010 TechCrunch article on disruptive business models: 10 Rocking Business Plans
  • "The 24/7 experiment" -- That is, experimentation is much cheaper; in many contexts it is now easier to prototype continuously. Instead of putting time, effort, and money into complete, validated specs up front, we can now just "try out" small ideas.
    • Zynga is well known for capitalizing on the ability to experiment rapidly. According to the LA Times: "Among Zynga's key strengths is its ability to track the performance of each feature and design element through what's known as A/B testing. Simply put, Zynga compares players in two camps, one with a feature and one without, to see how well the feature does. If the feature has a high 'click compulsion,' or high rate of players clicking on it, Zynga incorporates it into the game for everyone." (LA Times article)
  • Pro: We can keep in touch more easily with more people at greatly reduced cost.
  • Con: Teasing out the real signal from the noise may become increasingly difficult as communication gets easier over time.
  • Con: What about the value of disconnecting? Will it become prohibitively difficult to do so? Perhaps many believe this is already the case.
    • Do we need to disconnect? Does knowing the weather while on the go on a camping trip count as "cheating?"
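
The A/B testing idea mentioned in the list above (two groups of players, one with a feature and one without) can be sketched as follows. This is one common way to compare two rates, a pooled two-proportion z-test, and is not a description of how Zynga actually evaluates features. The player counts, the "engaged" metric, and the 0.05 significance level are all assumptions made for illustration.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the difference between two rates, using a pooled variance."""
    rate_a = hits_a / n_a
    rate_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / std_err


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up numbers: group A plays without the feature, group B with it.
# "Engaged" is a hypothetical per-player success metric.
players_a, engaged_a = 1000, 100
players_b, engaged_b = 1000, 130

z = two_proportion_z(engaged_a, players_a, engaged_b, players_b)
p = two_sided_p_value(z)

# Assumed decision rule: roll the feature out only if B is better
# and the difference is unlikely to be noise.
roll_out = z > 0 and p < 0.05
```

Because each experiment is this cheap to evaluate, the cost of "trying out" a small idea is dominated by building it, not by judging it.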

Our communication technologies are allowing us to instrument and "read out" the world. Consider the following comparison:

  • Social Media: Listening to what users are saying.
  • Social Data: Watching what users are doing.

3. The importance of distinctions (and how they lead to learning)

Distinctions don't present binary right/wrong choices; they present a set of subtle tradeoffs that we should understand, in context, more deeply. For example:

  • Implicit vs. Explicit data companies
    • Implicit data companies include those that rely primarily on scraping data, or more simply those where users don't have to enter much themselves: e.g., Wikinvest
    • Explicit data companies include those that rely on user contributions: e.g., Facebook, Yelp
    • Some companies are both, but all of them should be clear about what they are actually doing and whether it aligns with their business approach.
  • Network Node vs. Edge Attributes
    • These elements are used very differently in marketing strategies
  • Symmetrical vs. Asymmetrical Communication/Influence
    • For example, compare Facebook (symmetrical) vs. Twitter (mostly asymmetrical) relationship structures
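
Two of the distinctions above, node vs. edge attributes and symmetrical vs. asymmetrical relationships, can be sketched with a small graph structure. All names and attribute values here are hypothetical; the helpers `add_friendship` and `add_follow` are illustrative stand-ins and not any platform's actual API.

```python
# Node attributes describe a person.
nodes = {
    "alice": {"city": "Palo Alto"},
    "bob": {"city": "Berlin"},
    "carol": {"city": "Shanghai"},
}


def add_friendship(edges, a, b, **attrs):
    """Symmetrical tie: stored in both directions with the same attributes."""
    edges[(a, b)] = dict(attrs)
    edges[(b, a)] = dict(attrs)


def add_follow(edges, follower, followee, **attrs):
    """Asymmetrical tie: stored in one direction only."""
    edges[(follower, followee)] = dict(attrs)


def is_symmetric(edges):
    """True if every edge has a matching edge in the reverse direction."""
    return all((dst, src) in edges for (src, dst) in edges)


def one_way_edges(edges):
    """Edges with no reverse counterpart."""
    return sorted((s, d) for (s, d) in edges if (d, s) not in edges)


# Edge attributes describe a relationship, not a person.
friend_edges = {}
add_friendship(friend_edges, "alice", "bob", since=2008)

follow_edges = {}
add_follow(follow_edges, "alice", "bob", since=2010)
add_follow(follow_edges, "bob", "alice", since=2010)
add_follow(follow_edges, "carol", "alice", since=2011)
```

In the friendship graph, symmetry holds by construction. In the follow graph it has to be checked, and the one-way edges (here, carol following alice) are exactly where influence flows in a single direction.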

4. Understanding Individuals as Building Blocks

  • Identity
    • Just what is our identity anyway? Just Tweets?
  • Aging
    • If we age, if paper ages, should our data age too?
      • Does timestamping information achieve this same effect?
    • What would it mean to "age" a digital bit?
    • When and how do we in fact need a more persistent identity?
  • Data created by individuals: there are many kinds!
    • Geo-location (time+space) is probably the most important but most underrated piece of social data.
    • Intention vs. Attention Stream -- there's a difference! What are we actually measuring?
    • Short vs. Long Clicks
      • Yet another distinction: A click is not just a click
      • We want to know: Did the user come back? Did s/he convert?
  • How does behavior change, and what metrics or frameworks will help us usefully track and then make sense of such change?
    • Metrics are critical in terms of their ability to underwrite and ground evaluative experiments; but this means the metrics must not be arbitrary -- they must actually matter, be meaningful, in light of the social data.
    • A relevant framework here is Kahneman & Tversky's work on gain/loss reactions: people feel and act very differently about the same absolute value depending on whether it is framed as a gain or as a loss. This social psychological work was one of the nails in the coffin for the traditional rational economics view of behavior, dubbed homo economicus. (Economics subsequently co-opted such work, calling it behavioral economics. If you can't beat them, co-opt them!)
    • Another useful framework is to observe the relationship between user expectations and actual events. For example:

Good Event

Bad Event
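
Returning to the short vs. long click distinction in the list above, a minimal sketch of "a click is not just a click" might look like this. The `Click` record, the 30-second threshold, and the rule that a conversion always counts as a long click are assumptions made for illustration, not definitions from the lecture.

```python
from dataclasses import dataclass

LONG_CLICK_SECONDS = 30.0  # assumed threshold, purely illustrative


@dataclass(frozen=True)
class Click:
    user: str
    dwell_seconds: float  # time spent before the user came back
    converted: bool       # whether the visit ended in a conversion


def classify(click, threshold=LONG_CLICK_SECONDS):
    """Label a click 'long' if the user converted or stayed at least `threshold` seconds."""
    if click.converted or click.dwell_seconds >= threshold:
        return "long"
    return "short"


def summarize(clicks):
    """Count short and long clicks."""
    counts = {"short": 0, "long": 0}
    for click in clicks:
        counts[classify(click)] += 1
    return counts


# Made-up sample data.
clicks = [
    Click("u1", dwell_seconds=4.0, converted=False),    # bounced straight back
    Click("u2", dwell_seconds=95.0, converted=False),   # stayed and read
    Click("u3", dwell_seconds=12.0, converted=True),    # quick, but converted
]

summary = summarize(clicks)
```

A raw click count would report three clicks here. Separating them shows that only two indicate the user found what they wanted.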

5. Thinking about data

Who owns the data?

Impact on business models & data strategies

  • Moving from making people pay for proprietary data towards free distribution, allowing people to build apps on top of such data.
    • For example, dataSF (Jay Nath)
  • Platforms (Facebook, iOS, Android, etc.) becoming power players as viable new business strategies emerge with win-win-win arrangements for all parties involved

6. Class exercise: Generate succinct pairwise "trends"

  • Data Mining vs. Instrumenting the World/People
  • Transaction Economy vs. Interaction Economy
  • Info Asymmetry vs. Info Symmetry
  • Generalized Push Information vs. Specialized Pull Information
  • Power of None vs. Power of One (e.g., eBay, Yelp)
  • No Strings Attached (NSA) vs. Strings always attached
  • Push vs. Pull

7. Towards the future: what sectors/industries have and have not realized the power of a social data mindset?

Some notable examples of unrealized potential:

  • Finance Sector
    • Using social data to discover insider trading should be trivial; yet it still plagues our system.
  • Research
    • "Still built around constraints that no longer exist" -- aweigend
    • Data collection and analysis can be separated. aweigend shared a story from his time at the Santa Fe Institute, where he and a colleague released several data sets for anyone to analyze.