Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki

Class_16: Visualizing Data / Social Data and Organized Crime

Date: May 19, 2011
Audio: weigend_stanford2011.16_2011.05.19.mp3
Initial authors: [Pao Jirakulpattanal], [Rhampapacht Vorapatchaiyanont], [Sebastiaan Boer]

Key Points

  • Visualizing Data
  • Geolocation
  • Organized Crime
    • Know what data is out there and how it can be used against you
    • Social data will get more and more convoluted
    • If you build interesting datasets, they will be stolen.
  • "Just because it appears on your screen doesn't mean it's real"



Visualizing Data

Visualizing data can surface insights that are hard to get from the raw numbers. The following examples show how a geospatial visualization can reveal a great deal about what is going on in the world around us.

Historical Example

external image 643px-Snow-cholera-map-1.jpg
One of the first geospatial visualizations of data was made when cholera struck London in 1854. A large number of people in the same area of London fell ill with cholera over a short period of time, and over 127 people died in just 3 days. A physician named John Snow hypothesized that the outbreak was caused by polluted water sources. To gather data, he went from household to household, talked to the relatives of the sick and the deceased, and asked which well they used for their drinking water. It struck Snow that nobody got sick in a few areas, while other areas were suffering heavily. After he plotted the data he had collected on a map, it became clear that the cases clustered around a single well. Once the appropriate actions were taken, the outbreak stopped. This incident is one of the earliest simple efforts to visualize data in order to test a hypothesis.
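The idea behind Snow's map can be sketched in a few lines: assign each recorded death to its nearest well and count. This is only an illustration, not Snow's actual method or data. The pump names, the coordinates, and the helper names `nearest_pump` and `deaths_per_pump` are all made up, and straight-line distance is a simplifying assumption.

```python
from collections import Counter
from math import hypot


def nearest_pump(point, pumps):
    """Return the name of the pump closest to an (x, y) point."""
    return min(
        pumps,
        key=lambda name: hypot(point[0] - pumps[name][0],
                               point[1] - pumps[name][1]),
    )


def deaths_per_pump(deaths, pumps):
    """Count how many recorded deaths fall nearest to each pump."""
    return Counter(nearest_pump(d, pumps) for d in deaths)


# Made-up coordinates on an arbitrary grid, for illustration only.
pumps = {"Pump A": (0.0, 0.0), "Pump B": (5.0, 5.0), "Pump C": (10.0, 0.0)}
deaths = [(4.5, 5.2), (5.1, 4.8), (6.0, 5.5), (4.0, 4.0), (9.5, 0.5)]

counts = deaths_per_pump(deaths, pumps)
print(counts.most_common())
```

A pump whose count dwarfs the others is the one worth investigating first, which is the same conclusion the map makes visible at a glance.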

Recent Examples


A more recent example of geospatial visualization is the image to the right (by Eric Fischer).
The map shows the locations of pictures that were uploaded to Flickr. A JPEG file can carry EXIF metadata that records the camera used to take the picture, the exposure time, a time stamp, and sometimes the location where the picture was taken. The blue dots on the map are pictures taken by individuals identified as 'locals', the red dots are pictures taken by tourists, and the yellow dots are pictures whose photographer could not be classified. Obvious tourist hotspots such as Alcatraz and the Golden Gate Bridge are therefore colored red.
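EXIF stores a GPS position as degrees, minutes, and seconds plus a hemisphere letter (N/S or E/W), so a mapping tool first has to convert it to signed decimal degrees. The sketch below shows only that conversion step. The function name `dms_to_decimal` and the sample values are assumptions for illustration, and reading the tags out of a real JPEG file is left out.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter
    ("N", "S", "E", "W") to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative by convention.
    return -value if ref in ("S", "W") else value


# Hypothetical tag values, roughly near the Golden Gate Bridge.
lat = dms_to_decimal(37, 49, 11.0, "N")
lon = dms_to_decimal(122, 28, 43.0, "W")
print(round(lat, 4), round(lon, 4))
```

Once every photo is reduced to a (latitude, longitude) pair like this, plotting the pairs as dots gives a map of the kind described above.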

Cabspotting traces San Francisco's taxi cabs as they travel throughout the Bay Area. The patterns traced by each cab create a living and always-changing map of city life. This map hints at economic, social, and cultural trends that are otherwise invisible. The core of this project is the Cab Tracker. The Tracker averages the last four hours of cab routes into a ghostly image, and then draws the routes of ten in-progress cab rides over it. Also, this platform has prompted some behavior change in the taxi drivers.
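The "last four hours averaged into one image" step can be approximated by binning recent GPS fixes into a grid and counting fixes per cell. This is a rough sketch and not Cabspotting's actual implementation. The function name `density_grid`, the cell size, and the sample fixes are assumptions.

```python
from collections import Counter
from math import floor


def density_grid(points, now, window_s=4 * 3600, cell=0.01):
    """Count GPS fixes per grid cell, keeping only fixes recorded
    within the last window_s seconds before `now`.

    points: iterable of (unix_time, latitude, longitude) tuples.
    cell:   cell size in degrees (an arbitrary choice here).
    """
    grid = Counter()
    for t, lat, lon in points:
        if 0 <= now - t <= window_s:
            grid[(floor(lat / cell), floor(lon / cell))] += 1
    return grid


# Made-up fixes: two recent ones in the same cell, one stale, one elsewhere.
now = 100_000
points = [
    (99_000, 37.7751, -122.4194),
    (98_000, 37.7755, -122.4196),
    (50_000, 37.7751, -122.4194),  # older than four hours, ignored
    (99_500, 37.8044, -122.2712),
]
grid = density_grid(points, now)
print(grid.most_common(1))
```

Drawing each cell with a brightness proportional to its count produces the kind of ghostly background image the Tracker overlays live rides on.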
This Tract currently uses U.S. 2000 census data and allows you to view every census tract. It shows numerous important social dynamics, such as racial mix, socioeconomic distribution, gender, age, level of education, and household size. Interesting fact: the most diverse spot is San Quentin prison.

Since online data contains huge volumes of information, data visualizations can help you to see the concepts that you're learning about in a more interesting and useful manner.

Video visualization of Downtown Los Angeles city data (skip to 2:22):

Downtown Los Angeles from tam thien tran on Vimeo.

Internet Visualizations

Mapping the Blogosphere is a collection of maps of the blogosphere, including hyperbolic maps, as shown here.
external image mappingtheblogosphere.jpg

The Hierarchical Structure of the Internet is a study of how the Internet is organized, in terms of both structure and connectivity. It shows that the central core of the Internet is made up of about 80 core nodes, but that even if those nodes failed, 70% of the other nodes would still function via peer-to-peer connections.
external image hierarchicalstructureofthei.jpg

One Week of The Guardian is a visualization of the stories from The Guardian newspaper. It focuses on the relationships between headlines, authors, pages, and categories.
external image oneweekoftheguardian.jpg

Looks is a collection of different Delicious bookmark visualizations. They’re created with a Python-based graphics library and layout engine.
external image looksdelicious.jpg

One of the most comprehensive visualization tools I have ever seen is the World Bank Data Visualizer. It is a great tool if you want to create your own visualizations for different countries of the world. You can select different economic, health, population and ... variables and view how the world has changed during the past century.

Organized Crime (with Vladimir)

Today's guest goes by the pseudonym Vladimir. Vladimir has worked in a governmental agency focusing on counterterrorism and international organized crime. He speaks six languages (including German, English and probably Russian) and has worked in over 100 different countries.

Organized crime is a gigantic operation that is responsible for up to 15% of GDP. Examples of organized crime include money laundering, narcotics, piracy, and intellectual property theft. These organizations are run in a similar fashion to large corporations. They can have complex organizational structures that include marketing departments and groups of malware writers, email writers, server hackers and such. Some organizations specialize in certain tasks, such as stealing credit card data, parsing the data, producing 'fullz' (a credit card number in combination with the full name, address, and zip code), or importing and exporting stolen goods. These activities can be described as CaaS (Crime as a Service) or Crimesourcing.

According to Vlad, 70% of all Android flashlight apps are malware!

Social Data and Organized Crime

Most data breaches are tied to organized crime
external image Verizon2010Report_540x206.png
Misuse of access privileges, hacking, and malware were the most commonly used methods for stealing data. (Credit: Verizon)

According to the 2010 Verizon Data Breach Investigations Report, organized criminals were responsible for 85% of all stolen data last year. Among the unauthorized access incidents, 38% of the data breaches took advantage of stolen login credentials.

While external agents were behind 70 percent of the breaches, nearly 50 percent were caused by insiders and only 11 percent were attributed to business partners.

The study combined data from investigations and statistics worldwide compiled by Verizon and the U.S. Secret Service. It analyzed 141 cases involving more than 143 million compromised data records, compared with the more than 360 million records compromised in 2008.
Most of the externally originated breaches came from Eastern Europe, North America, and East Asia, the data shows.
Nearly 50 percent of breaches involved misuse of user privileges, while 40 percent resulted from hacking, 38 percent used malware, 28 percent used social engineering tactics, and about 15 percent were physical attacks.

Simple Examples of Criminal Usage – the website that tries to raise awareness of over-sharing location information through location-aware applications such as Foursquare, Brightkite, and Google Buzz. See also: The Dark Side of Geo Data (CNET)
external image example.jpg

Counterfeit Social Data

Everything that appears on your computer screen can be faked.
Example: the nuclear power plant hack (Stuxnet)

Social Data Convolution

As more information becomes available, it gets easier to track people across multiple information sources.
  • Tweets, Facebook updates, and information from phones can all be geocoded
  • Data about a person's location can tell much more than just where they are:
    • e.g. if a person pays a visit to the hospital every Wednesday, we may be able to infer something about that person's health condition
    • e.g. if a man who is not a teacher and does not have kids checks in at a school very frequently, his location history might become major evidence against him in a kidnapping case
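The hospital example above amounts to spotting a recurring (place, weekday) pattern in a check-in history. A minimal sketch, assuming check-ins arrive as (datetime, place) pairs; the function name `recurring_visits`, the three-week threshold, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def recurring_visits(checkins, min_weeks=3):
    """Return (place, weekday) pairs seen in at least min_weeks
    distinct ISO calendar weeks."""
    weeks = defaultdict(set)
    for when, place in checkins:
        iso_year, iso_week, _ = when.isocalendar()
        weeks[(place, WEEKDAYS[when.weekday()])].add((iso_year, iso_week))
    return sorted(key for key, seen in weeks.items() if len(seen) >= min_weeks)


# Made-up check-ins: three Wednesday hospital visits and one cinema trip.
checkins = [
    (datetime(2011, 5, 4, 9, 0), "hospital"),
    (datetime(2011, 5, 11, 9, 5), "hospital"),
    (datetime(2011, 5, 18, 8, 55), "hospital"),
    (datetime(2011, 5, 7, 20, 0), "cinema"),
]
print(recurring_visits(checkins))
```

The point of the sketch is how little is needed: no health record is involved, yet the location trail alone suggests a standing weekly appointment.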

Protecting yourself
The only thing you can do to protect yourself from the dangers of social data is to not participate in online sharing. For example, you can choose not to have an account on Facebook. The downside, however, is that you will frequently be left out of the loop about events. Remember that the two most important proxies from which your identity can be deduced are: 1. Cell phone 2. Credit card.

SEC – Insider Trading Case
For example, data can be gathered and analyzed to detect insider trading. Since mobile phones record geolocation, a trader who has made a lot of money on a deal involving Merck and has repeatedly visited Merck headquarters would look highly suspicious.

Terrorists and Technology

external image 359_30_2008-IT-year-in-pictures-Terrorist-attacks-in-Mumbai.jpg.png

The 2008 Mumbai attacks (often referred to as November 26 or 26/11): Multiple sites in the city of Mumbai were attacked with bombs and gunfire in a coordinated terror attack that began on November 26, 2008, and lasted for three days. The attacks killed 179 people and injured over 300. The operation was facilitated by real-time data analysis and several months of careful planning that drew on available social and geolocation data and technology, such as GPS, satellite data, and YouTube footage. The preparation for the bomb attacks included Google analysis of all relevant sites. The taxi bomb attacks around the city were also assisted by real-time monitoring of traffic flow. This article from The Guardian discusses the extent to which social data was used to report on events during the attacks, and reports that the Indian government asked for a temporary block to the site to halt the dissemination of information.

Today's organized crime is very technology-driven and relies heavily on social media and data.
  • Terrorists hold a gun in one hand and a BlackBerry in the other.

President Obama's Visit to Iraq

When President Obama visited Iraq following Osama Bin Laden's death and stayed at a secret military base, a military photographer took a photo of him and posted it on Picasa. Since the JPEG file contained geolocation information, the location of the secret base was revealed.

On the Brighter Side

  • The FBI and other government agencies use Facebook to learn more about potential criminals through the use of 'sockpuppets'. The agency sets up fake accounts that are likely to get a friend connection with the target on Facebook (by using the picture of an attractive woman as the profile picture, for example). After the target has accepted the friend request, the agency has access to most of his or her Facebook data in a completely legal way. Here's an example.

  • Social media has helped empower local citizens to track down criminals and petty thieves. This can help reduce the load on local police forces. Here's an article about Facebook being used to solve crime.
  • Social media updates help instill a sense of safety on college campuses.
  • 40-year record-low crime rate: Although the cause is unknown, I thought it would be interesting to point out that according to the NYTimes article: The number of violent crimes in the United States dropped significantly last year, to what appeared to be the lowest rate in nearly 40 years, a development that was considered puzzling partly because it ran counter to the prevailing expectation that crime would increase during a recession. In all regions, the country appears to be safer. The odds of being murdered or robbed are now less than half of what they were in the early 1990s, when violent crime peaked in the United States. Small towns, especially, are seeing far fewer murders: In cities with populations under 10,000, the number plunged by more than 25 percent last year.
  • Stuxnet was a project undertaken by an unknown governmental agency to destroy the PLC components of Iran's nuclear reactor systems. Here's a video showing how it works: