Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki

HW2: Influence in Social Networks

Part A: Measuring Influence (5 points, due Tuesday 19 April 2011, 12 noon)

Learning Goals:
· Learn about the concept of influence in social graphs
· Design different ways to capture influence, and understand their properties
· Understand how to evaluate an influence model

Assignment:
Start by understanding some of the issues in modeling influence. Besides Chapter 4 of “Mining the Social Web”, relevant authors include
· Duncan Watts, and
· Sinan Aral.
Follow some references in papers by these authors.

Here are some of the questions to keep in mind when learning the concepts:
· What are some desired properties of influence in a social network? Examples include:
o What does influence really mean, e.g., does it mean being able to get someone to do something (e.g., to follow a link or to retweet a tweet)?
o What does it mean for person A to have a larger influence than person B?
o Presumably, it should be possible to order the nodes in the graph by their influence. Does a numeric influence score make sense? And if so, what do you want a factor of 2 to imply?
o Is there only a single dimension of influence, or are there different types of influence like influence related to recognised expertise?
· What are the differences (if any) between influence
o in a graph of friends (that are mutually confirmed, such as on Facebook), vs.
o in a graph of followers (that need no confirmation, such as on Twitter)?

Deliverable for Part A:
You need to submit answers to the following three questions on http://bit.ly/SDR2011HW2a

i. Model Inputs (2 points)

Describe two thoughtful variables that can serve as inputs into an influence model, and give the reasons for your choice. (Go beyond simply taking the number of followers.)
Think of the example given in Class 5 for the different problem of search relevance: we discussed the number of links going into a page, which goes beyond what the author says on the page by capturing the importance others attach to it.

ii. Model Evaluation (2 points)

Assume someone gives you two “boxes” that compute influence. How do you compare the quality of their outputs, i.e., which box does the better job? Give two ingredients that enter the evaluation function.
(Again illustrating this with search results ranking, we distinguished “long clicks” from “short clicks” as having a positive vs a negative impact on the quality of the algorithm.)
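One possible way to make such a comparison concrete, offered only as a sketch and not as the expected answer: treat some observed outcome as ground truth and check how many of each box's top-k users are also in the top-k by that outcome. The outcome used here (average retweets per tweet), the function name `top_k_overlap`, and all numbers are hypothetical.

```python
def top_k_overlap(scores, outcomes, k):
    """Fraction of the k highest-scored users that are also among the
    k users with the highest observed outcome."""
    top_scored = sorted(scores, key=scores.get, reverse=True)[:k]
    top_observed = sorted(outcomes, key=outcomes.get, reverse=True)[:k]
    return len(set(top_scored) & set(top_observed)) / float(k)

# Hypothetical observed outcome: average retweets per tweet for four users.
observed_retweets = {"a": 12.0, "b": 3.5, "c": 8.0, "d": 0.5}

# Hypothetical influence scores produced by the two "boxes".
box_1 = {"a": 90, "b": 40, "c": 70, "d": 10}
box_2 = {"a": 20, "b": 95, "c": 30, "d": 80}

overlap_1 = top_k_overlap(box_1, observed_retweets, k=2)
overlap_2 = top_k_overlap(box_2, observed_retweets, k=2)
print(overlap_1, overlap_2)
```

In this made-up data, the first box agrees with the observed outcome on both of its top two users and the second box on neither. The choice of outcome and of k are themselves ingredients of the evaluation function.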

iii. Implementation Plan (1 point)

Briefly outline what you need to implement your influence algorithm, and what you need to evaluate it, including what you will compare its outputs to.



Part B: Implementation (10 points, due Thursday 28 April 2011 (#ref email Monday 25th), 12 noon)

PDF Version: http://dl.dropbox.com/u/5223068/mse237_hw2b.pdf

Learning Goals:
· Implement the different measures of influence in social graphs
· Learn how to evaluate such measures of influence
· Learn to use the Twitter-Python API from HW1 on a much deeper level
· Learn about OAuth 1.0

Resources:
· Mining the Social Web – Chapter 4
  • Specifically, pay close attention to sections:
    • “RESTful and OAuth-Cladded APIs”
    • “A Lean, Mean Data-Collecting Machine”
    • “Measuring Influence”
· HW2 Starter Code - http://dl.dropbox.com/u/5223068/hw2_starter_code.py
· Twitter User Baseline Ranks - http://dl.dropbox.com/u/5223068/twitter_rank.xlsx

Submission Details:
In a DOC/PDF, please include the following:
  • The 20 “influential” users given, ranked from most influential to least influential according to the trivial algorithm (described below)
  • The 20 “influential” users given, ranked from most influential to least influential according to the “non-trivial” algorithm that you chose to implement
  • A discussion section where you describe your non-trivial algorithm as well as your findings, your evaluation of the two algorithms, etc
  • Your Python source code (as an Appendix)
  • Email the document to mse237@gmail.com with the subject line “HW2 – [YOUR SUNet ID]”

Assignment Details:
As discussed in the first part of the assignment, within social networks there is an interesting notion of social influence. For this second part of the assignment we will actually go ahead, get our hands dirty with some code, and implement some algorithms that will compute the social influence metrics of a pre-defined list of users!
Step 1 – Figuring out OAuth
Before we can go out and start tapping into the wealth of data that is your social graph, we first need to authenticate with Twitter. OAuth itself can get quite complicated and nasty, so I’ve done my best to make it as painless as possible in this assignment. Simply go to:
https://dev.twitter.com/apps
Log in to your Twitter account if you haven’t already, and create a Twitter App (this is the only way to retrieve a consumer key, consumer secret key, OAuth Token, etc.; for screenshots see the Appendix). Save your application (make sure you set it to “Client” access instead of “Browser”), and then click on “Application Detail” in the side navigator bar to retrieve your application’s consumer key and consumer secret key, which you will then substitute into the starter code given for this assignment. Similarly, for the OAuth Token and OAuth Secret Token, click on “My Access Token” and substitute those two strings into the starter code. At this point, you should have all the necessary keys and tokens for OAuth to successfully authenticate you!
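Since your Python source is submitted as an appendix, you may prefer not to leave real keys in the file. A minimal sketch of one alternative, assuming you export the four values as environment variables first; the variable names and the `load_credentials` helper are invented for this example and are not part of the starter code:

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_OAUTH_TOKEN",
    "TWITTER_OAUTH_TOKEN_SECRET",
)

def load_credentials(environ=os.environ):
    """Return the four OAuth strings as a dict, or raise if any is unset."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_VARS}
```

The returned values can then be passed to wherever the starter code expects the consumer key, consumer secret key, OAuth Token and OAuth Secret Token.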
Step 2 – Diving Deeper into the Python API
Once you are successfully authenticated, it’s time to practice using more of the Twitter-Python API to retrieve a user’s followers, their ids, screen names, etc. All the methods that you’ll likely need are provided as examples in the starter code and/or textbook.
We encourage you to play with the commands yourself and see what the responses give you! If you’re unclear about what some of the parameters mean, please check out the Twitter API documentation online for a quick primer. As I said, use the starter code as a foundation, but here you have the freedom to explore the API and see what cool stuff you can dig up.
Step 3 – List of Twitter Users
Just for uniformity amongst students, the teaching staff has put together a list of 20 Twitter users that we think, collectively, cover a broad range of businesses, occupations, interests, etc. Because we have to adhere to Twitter’s rate limit of 350 API calls per hour, we’ve broken down the 20 users into two groups: those with a “small” number of followers, and those with a “large” number of followers (10,000 was arbitrarily chosen as the cut-off point).
For those with a “small” number of followers, we would like you to directly use the Twitter-Python API to make GET requests and implement your algorithms. Because the numbers of followers they have are relatively small, you should not run into too many rate limit troubles when querying for details.
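To get a feel for what the 350 calls per hour limit means in practice, here is a back-of-the-envelope sketch. Only the 350 figure comes from the assignment. The page size of 5,000 follower ids per call and the worst-case "one call per follower" lookup are assumptions to check against the Twitter API documentation.

```python
import math

RATE_LIMIT_PER_HOUR = 350   # from the assignment text
IDS_PER_CALL = 5000         # assumption: follower ids returned per paged call

def calls_to_list_followers(num_followers, ids_per_call=IDS_PER_CALL):
    """Number of paged calls needed to list all follower ids (at least one)."""
    return max(1, int(math.ceil(num_followers / float(ids_per_call))))

def min_seconds_between_calls(limit_per_hour=RATE_LIMIT_PER_HOUR):
    """Spacing between calls that keeps a steady stream under the limit."""
    return 3600.0 / limit_per_hour

def hours_for_per_follower_lookups(num_followers,
                                   limit_per_hour=RATE_LIMIT_PER_HOUR):
    """Hours needed if every follower costs one extra call (worst case)."""
    return num_followers / float(limit_per_hour)

print(calls_to_list_followers(10000))
print(round(min_seconds_between_calls(), 2))
print(round(hours_for_per_follower_lookups(10000), 1))
```

Under these assumptions, looking up details for each follower of a user at the 10,000 cut-off one call at a time would take more than a day, which is why per-follower queries are only practical for the “small” group.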
For those with a “large” number of followers, we have already fetched, for each “large” user, all of their followers as well as how many followers each of those followers has, and cached the results in files that you can read from instead of making API calls. This should be enough information for you to compute any potential influence measures, and should save you the headache of waiting unnecessarily for hours because of the rate limit.
With that said, here’s the SMALL user list:
1. Andreas Weigend (@aweigend)
2. Enrique Allen (@enriqueallen)
3. Marc Smith (@marc_smith)
4. Matthew Russell (@ptwobrussell)
5. Ming Yeow Ng (@mingyeow)
6. Peter Hirshberg (@hirshberg)
7. SF New Tech (@sfnewtech)
8. Shopkick (@shopkick)
9. Tom Glocer (@tglocer)
10. ThinkAndroid (@thinkandroid)
And here's the LARGE user list (link to the data - http://dl.dropbox.com/u/5223068/twitter_data.zip):
11. Allstate (@allstate)
12. Bradley Horowitz (@elatable)
13. Esther Dyson (@edyson)
14. Andrew Mason (@andrewmason)
15. Joshua Schachter (@joshu)
16. Max Levchin (@mlevchin)
17. Quora (@quora)
18. Bill Gross (@Bill_Gross)
19. Danah Boyd (@zephoria)
20. Fred Wilson (@fredwilson)
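Reading the cached data for a “large” user might look roughly like the sketch below. The layout of the files in twitter_data.zip is not described here, so the format assumed in this example (one `follower_id,follower_count` pair per line) is purely hypothetical. Inspect the actual files and adjust the parsing accordingly.

```python
import io

def parse_follower_cache(lines):
    """Parse lines of 'follower_id,follower_count' (an assumed format)
    into a dict mapping follower id -> that follower's follower count."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        follower_id, follower_count = line.split(",")
        counts[int(follower_id)] = int(follower_count)
    return counts

# Stand-in for an open cache file; the numbers are made up.
sample = io.StringIO(u"101,250\n102,12\n\n103,4800\n")
cache = parse_follower_cache(sample)
print(len(cache), sum(cache.values()))
```

With a real file, pass the open file object in place of `sample`.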
Step 4 – The Trivial Algorithm (2 Points)
Now that you’ve had time to play with the Twitter-Python API some more, it’s time to get down to business. As suggested in “Measuring Influence” (pg. 103) in the text, the most trivial way of thinking about social influence is simply this: the more followers I have, the more popular I am, and therefore the more social influence I have in this social graph.
Using the list of 20 users given to you (again, remember to read from cached data on the “large” users, instead of accessing the API directly), please run the trivial algorithm and rank them from most influential to least influential (as defined by this metric).
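The ranking step itself is just a sort by follower count. A minimal sketch, with placeholder screen names and made-up counts standing in for the numbers you collect from the API or the cached files:

```python
def rank_by_followers(follower_counts):
    """Return screen names ordered from most to fewest followers."""
    return sorted(follower_counts, key=follower_counts.get, reverse=True)

# Placeholder data, not real follower counts.
follower_counts = {"user_a": 1200, "user_b": 54000, "user_c": 300}

trivial_ranking = rank_by_followers(follower_counts)
for position, name in enumerate(trivial_ranking, 1):
    print(position, name, follower_counts[name])
```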
Step 5 – Your “Non-Trivial” Algorithm (8 Points)
And finally, for the last part of the assignment, it’s your turn to think and reflect on everything that was mentioned in lecture, during Matthew Russell’s talk with us, etc. Think about what inputs/variables you’d like to include in your model, and finally think about a way to use these numerical inputs/variables to create a final score for each user (think Klout – a black box algorithm that takes inputs and outputs an influence score).
For this part of the assignment, your job is to then implement this "non-trivial" algorithm, which will go ahead and compute each Twitter user's social influence score using your model. Then, rank these 20 users from most influential to least influential. Evaluate your rankings and adjust your algorithms until the influence rankings make sense.
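As an illustration of the general shape such a model can take (inputs in, one score per user out), here is one hypothetical scoring rule: sum log(1 + n) over each follower's own follower count n, so that followers who themselves have an audience count for more, with diminishing returns. This is only an example of a score that differs from the trivial one, not a recommended or expected model. The data is made up.

```python
import math

def reach_score(followers_of_followers):
    """Sum of log(1 + n) over each follower's own follower count n."""
    return sum(math.log1p(n) for n in followers_of_followers)

# Placeholder data: for each user, the follower counts of their followers.
data = {
    "user_a": [10, 10, 10, 10],   # four followers with small audiences
    "user_b": [5000, 20000],      # two followers with large audiences
    "user_c": [0, 1],
}

scores = {name: reach_score(counts) for name, counts in data.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data, user_a has the most followers and would come first under the trivial algorithm, while the sketch above ranks user_b first. Differences of this kind are what the discussion section should examine.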

In your report, please explain:
1. Which metrics your algorithm used

2. How your influence rankings, built from the various Twitter user metrics, compare against some kind of baseline ranking, and what the differences are (hint: Klout Score)

3. Highlight insights about quality of your ranking algorithms compared to the simple metrics

4. Briefly discuss what you learned; what surprised you, what you think could have gone better, etc.
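One common way to quantify how closely a ranking agrees with a baseline ranking is Spearman's rank correlation. The sketch below assumes both rankings contain exactly the same users with no ties. The user names are placeholders, and using this particular measure is a suggestion rather than a requirement.

```python
def spearman(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items.

    Assumes no ties and at least two items. 1.0 means identical order,
    -1.0 means exactly reversed order.
    """
    n = len(ranking_a)
    position_b = {item: i for i, item in enumerate(ranking_b)}
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

# Placeholder rankings, most influential first.
baseline = ["u1", "u2", "u3", "u4"]   # e.g. ordered by a baseline score
mine = ["u2", "u1", "u3", "u4"]       # top two swapped

print(spearman(mine, baseline))
print(spearman(baseline[::-1], baseline))
```

A single number hides which users moved, so it helps to list the largest rank differences alongside it in your discussion.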