Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki

Class_14: People Discovery

Date: May 12, 2011
Audio: weigend_stanford2011.14_2011.05.12.mp3
Other: Notes
Initial authors: Jonathon Klemens, Brian Bulcke, Yi Liu
Initial notes:


  • DF6 (due on Tuesday)
  1. Insight learned from class (delivered to the wiki)
    • Post using SUNet ID to provide legitimacy
  2. Suggested Readings to prepare Tom Glocer for his visit next week (delivered to the wiki)

  • DF7 (due 5/24)
  1. Work with data from San Francisco Open Data Initiative
  2. Create idea for app to interact with data
  3. Ties back to DF1; more details to come

  • Next Week: Tom and James, come prepared!


Jonathan Chang of Facebook

People Discovery by Andreas

Jonathan Chang

Audio from Class: Jonathon_Chang_SDR_2011
Chang is a member of the Data Science team at Facebook. There he explores Bayesian probabilistic modeling, topic modeling, data mining, and their application to large-scale systems. Jonathan earned his B.S. in Electrical and Computer Engineering from Caltech in 2003, and inches ever closer to finishing his PhD at Princeton.
His blog: Please Scoop Me!

  • Attended Caltech and Princeton; at Princeton he did graduate research on machine learning

"Would the Real Lady Gaga Please Stand Up?"

Do you think Johnson & Johnson would prefer to target potential consumers who are friends with a specific person or potential consumers who like a specific thing? The answer lies at the core of Facebook: people vs. entities. Which is more interesting?


What are entities?

Jonathan describes entities as anything in the graph that is not a person (places, pages, stores, movies, etc.). Entities are much more fluid and always changing: a new frontier.

Users can check in at, share, link to, comment on, like, post about, interact with, connect to, and display all these different entities. Entities are generic nodes with properties attached.
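To make "generic nodes with properties attached" concrete, here is a minimal sketch of a graph in which people and entities share one node type and every interaction is a typed edge. The class names, the `kind` labels, and the edge types ("check_in", "like") are hypothetical illustrations, not Facebook's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A generic graph node: a person or an entity (page, place, movie, ...)."""
    node_id: str
    kind: str  # hypothetical labels such as "person", "page", "place"
    properties: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (source_id, edge_type, target_id)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, source_id: str, edge_type: str, target_id: str) -> None:
        # Each interaction (check-in, like, comment, ...) becomes one typed edge.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("add both endpoints before connecting them")
        self.edges.append((source_id, edge_type, target_id))

    def neighbors(self, node_id: str, edge_type: Optional[str] = None) -> list:
        # Targets of outgoing edges, optionally filtered by edge type.
        return [t for s, e, t in self.edges
                if s == node_id and (edge_type is None or e == edge_type)]


g = Graph()
g.add_node(Node("alice", "person", {"name": "Alice"}))
g.add_node(Node("stanford", "place", {"name": "Stanford University"}))
g.add_node(Node("gaga", "page", {"name": "Lady Gaga"}))
g.connect("alice", "check_in", "stanford")
g.connect("alice", "like", "gaga")
print(g.neighbors("alice"))          # ['stanford', 'gaga']
print(g.neighbors("alice", "like"))  # ['gaga']
```

Because a person and a page are the same kind of object here, the same traversal code works whether you start from a person or from an entity.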

Facebook consciously manages the entities present on its system in order to preserve the network and the social data on the network.

Eric Sun studied how information propagates through these social networks and found that top influencers (hubs) did not exist! Instead, information spreads through a social network much the way epidemic models describe diseases spreading.
Similarly, Duncan Watts, a network theory scientist at Yahoo, performed a series of experiments showing that highly connected people are not crucial social hubs. He believes that whether a message propagates widely through the network is largely a matter of chance.
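The epidemic analogy can be made concrete with the independent cascade model, a standard textbook model of spreading. This is a swapped-in illustration, not the methodology of the studies above: each newly reached person gets one chance to pass a message to each friend, succeeding with probability `p`. The toy network and the value `p=0.3` are hypothetical; the sketch only lets you compare how far a message travels from a well-connected node versus an ordinary one.

```python
import random


def independent_cascade(adjacency, seeds, p, rng):
    """Simulate one spread of a message under the independent cascade model.

    Each newly reached node gets exactly one chance to pass the message to
    each neighbor, succeeding independently with probability p, much like an
    infected individual exposing contacts in an epidemic model.
    """
    reached = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in reached and rng.random() < p:
                    reached.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return reached


def average_reach(adjacency, seed, p, trials=1000, rng_seed=0):
    """Average number of people reached when the message starts at `seed`."""
    rng = random.Random(rng_seed)
    total = sum(len(independent_cascade(adjacency, [seed], p, rng))
                for _ in range(trials))
    return total / trials


# Hypothetical toy network: "hub" knows everyone; the others form a chain.
adjacency = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b"],
    "b": ["hub", "a", "c"],
    "c": ["hub", "b", "d"],
    "d": ["hub", "c"],
}
print(average_reach(adjacency, "hub", p=0.3))
print(average_reach(adjacency, "d", p=0.3))
```

Each run is random, so what matters is the distribution of outcomes over many trials rather than any single cascade.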
What do Jonathan and his team do?

Clean up this entity space! It's really messy, and the to-do list is very long.

EX: How many Stanford pages are there, and how many are actually distinct? These copies dilute the strength of the community as well as the social data anchored to every person, entity, and interaction. It's hard to tell what is a page, what is a group, etc.
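One simple way to start answering "how many of these pages are actually different" is to flag pairs of page names that look alike. The sketch below is an assumption of ours, not Facebook's method: it normalizes names, compares their character trigrams with Jaccard similarity, and uses a hypothetical threshold of 0.6.

```python
import re
from itertools import combinations


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def trigrams(text):
    """Set of overlapping 3-character substrings, padded to mark word edges."""
    padded = f"  {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Share of trigrams the two sets have in common."""
    return len(a & b) / len(a | b) if a or b else 1.0


def likely_duplicates(page_names, threshold=0.6):
    """Return (name, name, score) for pairs at or above the threshold."""
    grams = {name: trigrams(normalize(name)) for name in page_names}
    pairs = []
    for x, y in combinations(page_names, 2):
        score = jaccard(grams[x], grams[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 2)))
    return sorted(pairs, key=lambda pair: -pair[2])


# Hypothetical page names.
pages = ["Stanford University", "stanford university!", "Stanford Univ.",
         "Princeton University"]
for pair in likely_duplicates(pages):
    print(pair)
```

Name similarity alone only suggests candidates; as the notes point out, deciding whether two pages are really the same thing still needs better page metadata.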

Facebook faces a huge challenge in striking a balance between user freedom and network integrity.
EX: Which is the REAL Madonna fan page? Is this it? Madonna
An interesting question is how to design a mechanism so that only the official Madonna page gets created on Facebook, and other fans join it instead of creating new ones.

Difference between MySpace and Facebook:
MySpace is about self-expression!
Facebook is about self-identity! (a conscious decision by Facebook to blend network structure with a user-centric experience)

Possible solutions to try and replace Jonathan's job
SOL: Entity verification could resolve the issue! (Twitter does this) ANS: Nope! Verification is not scalable!
SOL: Examine entities through their connections to verify authenticity! ANS: This is what Jonathan implemented: pages can like and connect to other pages.
SOL: Madonna needs to create signals showing she is the real Madonna, so the burden is on the page creator, e.g., by pre-releasing a song.
SOL: Data cleaning, which has to be done algorithmically. The metadata on the pages needs to improve first: better structure instead of one block of text.
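The "examine entities through connections" idea can be sketched as a simple count: among several candidate pages, prefer the one that pages already known to be trustworthy link to. Everything here is hypothetical: the page ids, the set of trusted pages, and the plain counting rule, which stands in for whatever Facebook actually uses.

```python
def authenticity_scores(page_likes, trusted_pages, candidates):
    """Rank candidate pages by how many trusted pages like them.

    page_likes maps each page id to the set of page ids that page likes.
    """
    scores = {
        candidate: sum(1 for page in trusted_pages
                       if candidate in page_likes.get(page, set()))
        for candidate in candidates
    }
    # Highest score first; ties keep the order given in `candidates`.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pages: three trusted music-industry pages and one fan club.
page_likes = {
    "record_label": {"madonna_official"},
    "music_magazine": {"madonna_official", "madonna_fans_4ever"},
    "tv_network": {"madonna_official"},
    "fan_club": {"madonna_fans_4ever"},
}
trusted = {"record_label", "music_magazine", "tv_network"}
print(authenticity_scores(page_likes, trusted,
                          ["madonna_fans_4ever", "madonna_official"]))
# [('madonna_official', 3), ('madonna_fans_4ever', 1)]
```

Only likes from trusted pages count, so the fan club's like does not boost the fan page. This also shows the weak spot of the approach: it needs some initial set of pages to trust, which is exactly the verification step that does not scale.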


From Jonathan's Talk
  • It is hard to effectively change when you are as large as Facebook
Q: Is Facebook consciously trying to blur the line between people, entities, etc.?
A: Nope! It's just a big mess! It's very hard to get 100k users to move over to the correct fan page or group page. The sheer size of the user base limits the design process.
  • Cleaning Social Data is a very real problem for Facebook and social data as a whole
Q: Why haven't entities on Facebook been constructed more like entries on Wikipedia?
A: Facebook wants to structure the information, i.e., the location, people, time, date, area, food, likes, and dislikes behind every entity, every person, and every connection between them. However, Facebook now lets users submit changes to improve pages, which are then approved either automatically or manually, and it has just rolled out duplicate suggestions. Also, it is hard to rule out that the unofficial Lady Gaga page is itself an important online community.
  • "Edges" are very hard to navigate as a social data scientist
Q: Likes can get old and my preferences might change, but Facebook still pushes older likes over new or unknown material. How do you manage that?
A: It comes down to assigning weights to the edges (edges of connections? Must clear this up with the team.)
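One common way to keep old likes from dominating is to let an edge's weight decay with time. The sketch below assumes exponential decay with a hypothetical 30-day half-life. It illustrates the weighting idea from the answer above, not Facebook's actual scheme.

```python
import math


def decayed_weight(interaction_ages_days, half_life_days=30.0):
    """Sum an edge's interactions, discounting each one by its age in days.

    With exponential decay, an interaction that is half_life_days old counts
    half as much as one from today.
    """
    decay_rate = math.log(2) / half_life_days
    return sum(math.exp(-decay_rate * age) for age in interaction_ages_days)


old_like = decayed_weight([400])           # one like from over a year ago
recent_likes = decayed_weight([1, 3, 5])   # three likes from the past week
print(old_like < recent_likes)  # True
```

The half-life is the one tuning knob. A short half-life makes the system react quickly to changing preferences, and a long one keeps stable, long-standing interests visible.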

More Jonathan

If you are interested in seeing some more of the challenges that face Jonathan at Facebook, enjoy this "media quality rich" lecture video:

A Facebook engineering blog post highlights a great visualization of friend connections on a global scale: Visualizing Friendships

Jonathan's Side Note


Gladwell's book The Tipping Point popularized thinking of social change as an epidemic, and real social change is better explained that way than through the agency of influencers. Instead of thinking of information moving through a network, think of a virus taking over an organism.

The 7 Most Viral YouTube Videos of All Time

People Discovery

Audio from class: People_Discovery_SDR_2011

What is People Discovery?

  1. Discovering a person
  2. Discovering properties of a person
  3. Discovering relationships between people
    • e.g., is he really my friend?

  1. People are social animals
  2. Facilitate job search and networking
  3. Find Mr. or Miss Right (or Right Now)
  4. Digitize real world friendships
  5. Learning new things, discovery

  • Plaxo
    • Contact Management
  • LinkedIn
    • Attribute (especially by company and position)
  • Facebook (Friend Finder)
    • Network (using personal info)
  • Twitter
    • Knowledge sharing
    • "Facebook friends are the people you know; the people you follow on Twitter are the people you wish you knew."
  • Quora
    • Topical knowledge sharing / common interests
    • Discovering the person who knows the answer to your question.

Is it more interesting to recommend people or cars?

Class Survey: Most prefer recommending and discovering people.

Why do most people prefer discovering other people instead of cars (or other objects)?
  • People require mutual matching or mutual interest
  • People adapt / change over time
  • There are friends of friends but not cars of cars (i.e. discovering one person can lead to discovery of other people)
  • People get an emotional response from the discovery
  • They could also potentially get recommendations for cars (i.e. social commerce)
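The "friends of friends" point is what makes people recommendation work: each friend opens a path to new candidates. A minimal sketch, assuming a hypothetical friendship map, suggests non-friends ranked by how many friends they share with the user. Mutual-friend counting is a common baseline, not necessarily what any particular service does.

```python
from collections import Counter


def friend_of_friend_suggestions(friends, user, limit=3):
    """Rank people who are not yet `user`'s friends by number of mutual friends."""
    direct = friends.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themself and people they are already friends with.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    return counts.most_common(limit)


# Hypothetical friendship graph (friendships are symmetric).
friends = {
    "ana": {"ben", "cai", "dev"},
    "ben": {"ana", "cai", "eli"},
    "cai": {"ana", "ben", "eli", "fay"},
    "dev": {"ana", "fay"},
    "eli": {"ben", "cai"},
    "fay": {"cai", "dev"},
}
print(friend_of_friend_suggestions(friends, "eli"))  # [('ana', 2), ('fay', 1)]
```

There is no "cars of cars" equivalent of this traversal, which is one reason people are a richer recommendation target.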

Is people discovery online even possible? Or is it just the electronic manifestation of a discovery that happened offline?
  • Some companies, like Redux, try to offer a true people discovery service that caters to people who are genuinely looking for friends online.
  • We discussed in class whether the "people discovery" on services like Facebook or LinkedIn actually introduces you to new people you haven't yet met or simply "fills in" your digital network with relationships you've already established offline.
  • People change over time. However, people discovery services like Plaxo don't delete a contact when you stop talking to him or her.
  • Interestingly, the class survey showed that a few people have met only 50% of their Facebook friends in person. This is a very different use case from how most people use Facebook.

How do we gauge the effectiveness of friend recommendations?

  • If possible, it is most effective to meet the person in real life.

However, when we cannot meet them in real life, how can we measure the results?
We can measure the users' actions after the recommendation and after the initial engagement between the users:
  1. Count total and per week messages between the users.
  2. Count total and per week likes on content from the other user.
  3. Count photos viewed between users.
  4. Count chats and/or threads between the users.
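The counts listed above can be computed from a log of interactions between the two users. This is a minimal sketch, assuming a hypothetical event format of `(day, kind)` tuples and a measurement window that starts on the day the recommendation was accepted. It reports both the total and the per-week rate for each kind of interaction.

```python
from collections import Counter
from datetime import date


def engagement_summary(events, start, end):
    """Count interactions between two users inside [start, end].

    events is a list of (day, kind) tuples, where kind is a label such as
    "message", "like", "photo_view", or "chat". Returns (totals, per_week).
    """
    window = [(day, kind) for day, kind in events if start <= day <= end]
    totals = Counter(kind for _, kind in window)
    weeks = ((end - start).days + 1) / 7  # window length in weeks, both ends inclusive
    per_week = {kind: count / weeks for kind, count in totals.items()}
    return dict(totals), per_week


# Hypothetical log for one recommended pair, starting on the day they connected.
events = [
    (date(2011, 5, 13), "message"), (date(2011, 5, 14), "message"),
    (date(2011, 5, 14), "like"), (date(2011, 5, 20), "message"),
    (date(2011, 5, 21), "chat"), (date(2011, 5, 24), "message"),
    (date(2011, 5, 25), "like"), (date(2011, 6, 1), "message"),  # outside window
]
totals, per_week = engagement_summary(events, date(2011, 5, 12), date(2011, 5, 25))
print(totals)    # {'message': 4, 'like': 2, 'chat': 1}
print(per_week)  # {'message': 2.0, 'like': 1.0, 'chat': 0.5}
```

Per-week rates make pairs with different window lengths comparable. Tracking them week by week would also show whether a recommended connection stays active or fades out.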

How Facebook measures strength/effectiveness of edges:

Economics View of People Discovery
  • How to quantify the cost to the people you introduce and cost to you when you are introduced
  • – sign up for meals with strangers with similar interests
  • empowers you to discover new people and attend their events
  • - a new way to meet people