DataRep & NatureofCode Final – Process Breakdown

Stages of Data Representation

Collecting
Cleaning
Analyzing
Visualizing

JAN 2013 – UNDERSTANDING

Even though the subject matter [SAY NO to 6.9 MILLION - A PROTEST AT HONG LIM PARK, SINGAPORE] was close to my heart, I was trying to make sense of what was happening at home through a medium I was not familiar with: Twitter. I am not a frequent Twitter user, and Twitter is less prevalent in Singapore than in the USA or even Indonesia. Nevertheless, given Facebook's privacy constraints and the fact that the protest at Hong Lim Park on 16 Feb 2013 was a streaming event, I chose tweets as my data set. At the time they were the best tool to capture an event that I could not attend in person.

JAN to FEB 2013 – COLLECTION & CLEANING

Resident Help
Craig & Ali
Craig (ITP Resident) helped me tremendously throughout. He showed me how to use a CURL script to capture tweets in real time from my browser. Ali (from ITP 2013) was kind enough to give me a step-by-step walkthrough of setting up an Amazon EC2 server so that I could run the CURL script to collect data remotely and continuously, even when my computer was shut down. He even ran through basic Terminal commands to help me fetch the file from the EC2 server.

More details on collection in technical section.

Cleaning Part 1
With the tweets collected from the EC2 server, I thought I could bring them straight into Processing and start visualizing them. I was wrong. I had to clean them up first, since I was importing them into Processing as a JSONObject. The cleaning stage was done in TextWrangler with help from Craig.

The first step was getting rid of empty line breaks in my .txt file.
The data came in intermittently, so the file had empty lines wherever no tweets were streaming in.

Second step: for the file to be recognizable as a JSONArray, each tweet needs to end with a ‘,’ comma. Hence we added a ‘,’ at the end of each line by replacing \r with ,\r

Finally, the file has to be readable as a JSONObject.

Begin the file with

{"Tweets":[

end file with

]}

Craig was very patient in helping me get through this part. Even though it wasn’t complex, it was very foreign to someone who had never worked with JSON.
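
The three cleaning steps above can also be sketched in code. This is a minimal plain-Java sketch, not the TextWrangler workflow we actually used; the class and method names (TweetFileWrapper, wrap) are made up for illustration. One difference worth noting: appending a comma to every line leaves a comma after the last tweet, which some JSON parsers reject, so this sketch puts commas only between tweets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TweetFileWrapper {
    // Drops blank lines, joins the remaining tweets with commas,
    // and wraps the result in {"Tweets":[ ... ]}.
    static String wrap(List<String> rawLines) {
        List<String> tweets = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                tweets.add(trimmed);
            }
        }
        // String.join places a comma between tweets, not after the last one.
        return "{\"Tweets\":[\n" + String.join(",\n", tweets) + "\n]}";
    }

    public static void main(String[] args) {
        // Stand-in lines; real tweets are much longer JSON objects.
        List<String> raw = Arrays.asList(
            "{\"text\":\"first\"}", "", "{\"text\":\"second\"}", "   ");
        System.out.println(wrap(raw));
    }
}
```

In a real run the input lines would come from the raw .txt file fetched from the server, one tweet per line.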

Cleaning Part 2
The interesting part: because of the way I passed the keyword search #hong lim park to the CURL script, I ended up with tweets containing ‘hong kong’ and ‘park’ (which could have been a Korean pop star). Jer wrote a simple Processing sketch to read my raw tweet file and write a new cleanTweet file with the tweets containing irrelevant words removed. [Enter Processing sketch here]
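
Until the original sketch is posted, here is a rough idea of what this kind of filter can look like. It is not Jer's sketch: it is a plain-Java illustration with made-up names (TweetFilter, isRelevant, clean) and a hypothetical one-phrase block list, and it works on tweet text held in memory rather than on files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TweetFilter {
    // True when the text contains none of the blocked phrases (case-insensitive).
    static boolean isRelevant(String text, List<String> blockedPhrases) {
        String lower = text.toLowerCase();
        for (String phrase : blockedPhrases) {
            if (lower.contains(phrase.toLowerCase())) {
                return false;
            }
        }
        return true;
    }

    // Returns a new list holding only the relevant tweets, in their original order.
    static List<String> clean(List<String> tweets, List<String> blockedPhrases) {
        List<String> kept = new ArrayList<>();
        for (String tweet : tweets) {
            if (isRelevant(tweet, blockedPhrases)) {
                kept.add(tweet);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> blocked = Arrays.asList("hong kong"); // hypothetical block list
        List<String> tweets = Arrays.asList(
            "Crowd growing at Hong Lim Park",
            "Shopping in Hong Kong today",
            "Rain at #honglimpark but people are staying");
        for (String t : clean(tweets, blocked)) {
            System.out.println(t);
        }
    }
}
```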

MARCH 2013 – ANALYZING

Then Craig showed me an example of importing a JSON file into Processing. Simply printing tweets as text on screen made me so happy.
[Show URL here]

APRIL 2013 – VISUALIZING

This was my very first visualization of my tweets after two months of learning to collect, clean and parse data. Displaying them was harder than I expected.

Firstly, the timestamps on my tweets were in Greenwich Mean Time (GMT), and I needed to convert them to Singapore time. Secondly, I did not know how to plot time labels based on the timestamp of my first tweet.

After many failed attempts, Jer wrote a simple formula to make this conversion happen.
t.sgDate = new java.util.Date(t.date.getTime() + (1000 * 60 * 60 * 13));
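
For comparison, the conversion can also be done by naming the time zone instead of adding a fixed number of hours. Singapore is UTC+8 all year, so a 13-hour shift only lands on Singapore time if the dates are being displayed in a zone 5 hours behind UTC, such as US Eastern Standard Time in February. That is my guess about why the formula works, not something I have confirmed. The sketch below is plain Java with made-up names (SingaporeTime, toSingapore), and it assumes the tweet timestamp arrives as a string shaped like "Sat Feb 16 09:30:00 +0000 2013".

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class SingaporeTime {
    // Parses a timestamp such as "Sat Feb 16 09:30:00 +0000 2013" (assumed format)
    // and formats the same instant as Singapore local time.
    static String toSingapore(String createdAt) {
        SimpleDateFormat in =
            new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);
        SimpleDateFormat out =
            new SimpleDateFormat("dd MMM yyyy HH:mm", Locale.ENGLISH);
        out.setTimeZone(TimeZone.getTimeZone("Asia/Singapore"));
        try {
            Date instant = in.parse(createdAt); // the offset in the string fixes the instant
            return out.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected date format: " + createdAt, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSingapore("Sat Feb 16 09:30:00 +0000 2013"));
    }
}
```

Because the output formatter carries the zone, the result does not depend on where the computer running the sketch is located.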

This plots the hourly time labels as advised by Jer.
myCalendar.roll(Calendar.HOUR_OF_DAY, true);
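
One thing to keep in mind with roll(): it moves the hour forward without touching the date, so 23:00 rolls over to 00:00 on the same day. That is fine for hour-only labels, but it matters if the date is shown too. The small plain-Java sketch below compares roll() with add(); the class and helper names (HourLabels, at, nextHour) are made up for this illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class HourLabels {
    // Builds a calendar set to the given day of February 2013 and hour, Singapore time.
    static Calendar at(int day, int hour) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("Asia/Singapore"));
        c.clear();
        c.set(2013, Calendar.FEBRUARY, day, hour, 0, 0);
        return c;
    }

    // Moves one hour forward on a copy and returns {dayOfMonth, hourOfDay}.
    static int[] nextHour(Calendar c, boolean useRoll) {
        Calendar copy = (Calendar) c.clone();
        if (useRoll) {
            copy.roll(Calendar.HOUR_OF_DAY, true); // hour wraps around, date stays the same
        } else {
            copy.add(Calendar.HOUR_OF_DAY, 1);     // hour wraps around, date moves forward
        }
        return new int[] { copy.get(Calendar.DAY_OF_MONTH), copy.get(Calendar.HOUR_OF_DAY) };
    }

    public static void main(String[] args) {
        int[] rolled = nextHour(at(16, 23), true);
        int[] added = nextHour(at(16, 23), false);
        System.out.println("roll: day " + rolled[0] + ", hour " + rolled[1]);
        System.out.println("add:  day " + added[0] + ", hour " + added[1]);
    }
}
```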

Counting Tweets
Getting an array to count tweets by the hour was challenging. Youjin helped by writing a simple loop that stores the tweets in hourly arrays.

1 hour : 31 Tweets
******************************
2 hour : 225 Tweets
******************************
3 hour : 233 Tweets
******************************
4 hour : 182 Tweets
******************************
5 hour : 154 Tweets
******************************
6 hour : 89 Tweets
******************************
7 hour : 78 Tweets
******************************
8 hour : 106 Tweets
******************************
9 hour : 72 Tweets
******************************
10 hour : 57 Tweets
******************************
11 hour : 34 Tweets
******************************
12 hour : 9 Tweets
******************************
13 hour : 11 Tweets
******************************
14 hour : 1 Tweets
******************************
15 hour : 4 Tweets
******************************
16 hour : 11 Tweets
******************************
17 hour : 11 Tweets
******************************
18 hour : 21 Tweets
******************************
19 hour : 43 Tweets
******************************
20 hour : 34 Tweets
******************************
21 hour : 41 Tweets
******************************
22 hour : 18 Tweets
******************************
23 hour : 36 Tweets
******************************
23
Total :1501
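
The counting idea can be sketched as follows. This is not Youjin's loop: it is a plain-Java illustration with made-up names (HourlyCounter, countPerHour), and it assumes each tweet time is already available as milliseconds, for example from Date.getTime(). Each tweet goes into the bucket for the number of whole hours elapsed since the start time.

```java
class HourlyCounter {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // counts[i] is the number of tweets that fall in the i-th hour after startMs.
    // Tweets before the start or beyond the last hour are ignored.
    static int[] countPerHour(long[] tweetTimesMs, long startMs, int hours) {
        int[] counts = new int[hours];
        for (long t : tweetTimesMs) {
            long elapsed = t - startMs;
            if (elapsed < 0) {
                continue;
            }
            long bucket = elapsed / HOUR_MS; // whole hours since the start
            if (bucket < hours) {
                counts[(int) bucket]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long start = 0L;
        long[] times = { 0L, 30 * 60 * 1000L, HOUR_MS, 2 * HOUR_MS + 5000L };
        int[] counts = countPerHour(times, start, 3);
        for (int i = 0; i < counts.length; i++) {
            System.out.println((i + 1) + " hour : " + counts[i] + " Tweets");
        }
    }
}
```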

Finally, visualizing the tweets as a Scattered Plot & Bar Chart.

MAY 2013 – REPRESENTING

Major design overhaul from last visualization to better represent a protest event. Change in choice of font (Haettenschweiler) and color. See post on user test feedback.

New features included after user test feedback.
Search feature! Users can type in a keyword to filter tweets. This was built from Shiffman’s intHash example, with his help on the contains(searchWord) function.
Sound recording of the protest included as part of the experience.

Further iterations after second user test
Background story leading up to the protest.
Background texture added to lessen harshness of the red canvas for easier reading.
Start button added to allow more control over pace of reading.
Highlighted the first three hours, 4pm – 7pm, to bring emphasis to that time frame.



REFLECTIONS

Translating
Translating data for an audience is the most challenging part of the assignment.
The visualization gives me a sense of the atmosphere at the protest, and I hope it engages the audience in a similar manner.
How can we get a sense of an event without being physically present?

Future Enhancement
Display the data in a way that the audience can comprehend, empathize with and follow even if they have no prior knowledge.
Integrate, compare and contrast this data set with the sequel protest on May Day 2013. At the same time, form a holistic overview of the changing emotive landscape using population and migratory flow data sets as well as ethnic demographics in Singapore.
Discover whether xenophobic sentiments are present in Singapore and, if so, where, how and why they exist.

DataRep & NatureofCode – Final Edition

Final Version

DataRep & NatureofCode Final – version 2.0

Version 2.0 – Two ways of visualizing tweets: Scattered Plot & Bar Chart

Version 2.1 – Showed during Nature of Code play test.

Version 2.2 – Major overhaul after feedback from play test.


New features added:

  1. Search feature to find occurring keywords
  2. Sound recording from the protest included
  3. Introductory text to provide background on the event

DataRep & NatureofCode – Play Test Feedback Summary

FEEDBACK FROM USER TEST 01 May 2013

background story

  • include title of project ***
  • include a logo for the protest
  • need an explanation of what this protest is for or about
  • provide background info about the event **

experience design

  • integrate sound feedback
  • create ambient sound scape that make us feel as if we are in the protest
  • include pictures of protest
  • include voices from the protest
  • it feels like a scattered story about feelings and reaction of people rather than reason behind the protest

visual design

  • people are lazy to read
  • hierarchy of information : visual -> title -> text
  • display as 24-hour clock
  • to show a timeline of events unfolding
  • pattern lines instead of bars. show patterns not numbers
  • show the arc of events unfolding
  • illustrated through news headlines, key event markers

Interaction

  • is this interactive?
  • most users clicked on the time label instead of the tweet dots*****
  • clicked on keywords
  • when i clicked on the time label it sent me random tweets from that time frame
  • highlight / pop-up / enlarge one tweet
  • display tweet as a speech bubble

visual feedback

  • underline time label if the piece is within time frame
  • individual points are hard to click
  • space out dots
  • change cursor to make it more clickable looking

Animation

  • slow down animation cos it feels like something happening in real time – do i want the sense of it happening in realtime
  • feels like static on tv
  • feels like chatter, active discussion

Content

  • is it media blackout?
  • they don’t get a sense of what is happening but they get the emotions
  • they had to figure out what is happening
  • link to transitioning.org website as footnote
  • include ticker tape at bottom of the screen

Changes

  • playable timeline
  • options to click on time label and on tweets

* = number of times a suggestion was mentioned / common behavior

DataRep & NatureofCode Final – Displaying & Counting Tweets Part 1

Plotting tweets over time

The following shows the number of tweets collected over a period of 55 hours from the beginning of the protest, based on the hashtags #honglimpark and #hong lim park. “Hong Lim Park” was the venue where the protest took place. Several other hashtags, e.g. #sevenmillioninsingapore, #whitepaper, #occupysg and #singapore, were tested, but #hong lim park yielded the most useful frequency over time. The current sketch shows the event start time as 530pm; this will be amended to 4pm (the actual start time of the event) once tweets from the earlier period are included.

Important Dates:
Start of Protest: 4pm, 16 Feb 2013
Start of Collection: 530pm, 16 Feb 2013
End of Collection: 230am, 18 Feb 2013

Version 1.0


Discoveries:
Total Number of Tweets: 1758
Total Hours Passed: 55.087223


Version 1.1



Discoveries over 24 hours:

Number of Tweets between (530pm-630pm): 248
Number of Tweets between (630pm-730pm): 248
Number of Tweets between (730pm-830pm): 162
Number of Tweets between (830pm-930pm): 111
Number of Tweets between (930pm-1030pm): 90
Number of Tweets between (1030pm-1130pm): 81
Number of Tweets between (1130pm-1230am): 100
Number of Tweets between (1230am-0130am): 58
Number of Tweets between (0130am-0230am): 45
Number of Tweets between (0230am-0330am): 16
Number of Tweets between (0330am-0430am): 15
Number of Tweets between (0430am-0530am): 3
Number of Tweets between (0530am-0630am): 2
Number of Tweets between (0630am-0730am): 3
Number of Tweets between (0730am-0830am): 8
Number of Tweets between (0830am-0930am): 7
Number of Tweets between (0930am-1030am): 18
Number of Tweets between (1030am-1130am): 23
Number of Tweets between (1130am-1230pm): 44
Number of Tweets between (1230pm-0130pm): 36
Number of Tweets between (0130pm-0230pm): 36
Number of Tweets between (0230pm-0330pm): 22
Number of Tweets between (0430pm-0530pm): 30
Number of Tweets between (0530pm-0630pm): 25

Total Number of Tweets over 24 hours: 1431
Total Number of Tweets over 12 hours: 1177

DataRep Final : Where is 1.5 – Part 1

On 29 January 2013, the Singapore government released its Population White Paper. It predicted that the population would grow by 30% to 6.9 million by 2030, with immigrants making up the majority of that figure. On 16 February 2013, a protest against the white paper, organized by Transitioning.org, was held at the Speakers’ Corner in Hong Lim Park. It drew over 5,000 attendees, making it one of the largest demonstrations in Singapore's history.

Online social media became the main platform for Singaporeans to voice their opposition and grievances over the surge in foreigners in recent years. The majority felt that this sudden influx would put a strain on existing infrastructure (especially transport) and on property prices, as well as weaken the sense of national identity.

Image from http://www.sgag.sg

What is the data?
Data collected from Twitter using the hashtags #sevenmillioninsingapore, #honglimpark and #occupysg

What is the medium?

  1. Stage One – A visualization of the 24 hours from the protest at Hong Lim Park on 16 Feb 2013.
  2. Stage Two – A visualization of one month before and after the second protest at Hong Lim Park on 1 May 2013.
  3. Stage Three – A data performance using tweets collected from these events to form the script for a play.

What is the question?
How many sides of the story can I reveal from the data set? What are some recurring keywords, and why? When do these sentiments appear? How do mainstream and online media cultivate these sentiments? How do sentiments manifest over time?


Inspirations

Live Improvised Performance with Real Time Twitter Feeds
Live twitter updates direct improvised performances. Characters, themes and plots are chosen by searching hashtags; performers then improvise around these updates. http://twittertheatre.co.uk/

2011 Egyptian Revolution
Mohamed Nanabhay and the team at Al Jazeera used Chartbeat to visualize concurrent traffic during the 18-day revolution.

Ghost Counties
An interactive portrait of America that visualizes 2010 census data, by Jan Willem Tulp, a freelance information visualizer based in the Netherlands. Ghost Counties is a visualization developed in Processing that analyzes the number of homes and vacant homes in proportion to the population of every county in the United States of America.

natureofcode – midterm


download source code here

bird fish
Building on the week 3 homework: if the user fires into the sky, bird particles are created; if the user fires towards the water, fish particles are created. An attempt at learning inheritance, particle systems and oscillation.


//Birds fly in air
//Fishes swim in water
//Fire to create creatures

ArrayList<Creature> creatures = new ArrayList<Creature>();

Water water;
Wave wave;

PVector myMouse;

PVector start;
PVector end;
PVector accl;
PVector dir;
float dist;

float scale;

boolean drag = false;

void setup() {
  size(1280, 720);
  smooth();

  //Initialize
  start = new PVector();
  end = new PVector();
  accl = new PVector();
  scale = 0.25;

  //Create water object
  water = new Water(0, height*0.6, width, height*0.4, 2);

  //Create wave
  wave = new Wave(new PVector(0, height*0.6), width+50, 12, 500);
}

void draw() {
  background(255);

  //Constrain mouse within upper half of screen
  constrainInAir();

  //Draw water
  water.display();

  //Draw wave
  wave.calculate();
  wave.display();

  //START - For all the creatures in the ArrayList update & display
  //Loop backwards so creatures can be removed safely while iterating
  for (int i = creatures.size()-1; i >= 0; i--) {
    Creature c = creatures.get(i);

    //Enhanced for Loop
    //for (Creature c:creatures) {

    if (water.contains(c)) {

      //Water slows down falling of particles
      PVector dragForce = water.drag(c);
      c.applyForce(dragForce);

      //Remove BirdParticleSystem when hit water
      c.birdsystem.removeBird();

      //Add FishParticleSystem when hit water
      //Make many many fishes here
    }

    //Add Gravity here
    PVector gravity = new PVector(0, 0.01*c.mass);
    c.applyForce(gravity);

    //Run Creatures object
    c.run();
    c.checkEdges();

    if (c.isDead()) {
      creatures.remove(i);
    }
  }
  //END - For all the creatures in the ArrayList update & display

  //Checking if mouseReleased
  //Drawing creature on-the-fly
  if (drag) {
    fill(235, 28, 167, 100);
    stroke(100, 0, 0, 100);
    line(start.x, start.y, myMouse.x, myMouse.y);
    float d = dist(start.x, start.y, myMouse.x, myMouse.y);
    ellipse(start.x, start.y, d*scale, d*scale);
  }
}

void mousePressed() {
  start = new PVector(myMouse.x, myMouse.y);
  drag = true;
}

void mouseReleased() {
  end = new PVector(myMouse.x, myMouse.y);
  drag = false;

  dir = PVector.sub(end, start);
  dist = dir.mag();
  dir.normalize();
  dir.mult(-dist*scale);

  //Add new Creature instance to ArrayList
  creatures.add(new Creature(start.get(), dir.get(), accl.get(), dist, dist));
}

//Constrain mouse within upper half of screen
void constrainInAir() {
  myMouse = new PVector(mouseX, mouseY);
  myMouse.x = constrain(myMouse.x, 0, width);
  myMouse.y = constrain(myMouse.y, 0, height/2 -10);
}

video sculpture: mid-term

Final Interaction Setup

Interaction Flow-Chart Version 1.0

DataRep – assignment 2: 500,000 Hotels

Using the hotelsbase.csv file which contains 500,000 hotels:

  1. Find out what is the northernmost hotel in the data set
  2. Find out what is the most remote hotel in the data set
  3. Create a single image which shows the distribution of 1, 2, 3, 4 & 5 star hotels, each plotted on a separate small map

  • My initial process narrowed the hotels down to those within lat < -90° && lat > -70°, working through the hotels array and finding the min() among them. In the second iteration I compared lat > mostNorthern and set mostNorthern = lat whenever the comparison was TRUE.

  • Most Remote
    I had a hard time finding the most remote hotel. I know that I should be calculating the distance between hotels and checking which is farthest, dist(lat.x, lat.y, hotel.x, hotel.y), but I wasn’t sure where to implement the calculation.
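
One possible way to place the calculation is sketched below in plain Java. It is not a finished solution to the assignment, and the names (HotelFinder, haversineKm, northernmost, mostRemote) are made up. It assumes each hotel is a {latitude, longitude} pair in degrees, and it assumes "most remote" means the hotel whose nearest neighbour is farthest away. Distances use the haversine great-circle formula rather than dist() on raw coordinates. The nested loop compares every pair of hotels, which would be very slow for 500,000 rows, so a real version would probably need a subset of the data or some kind of spatial grid.

```java
class HotelFinder {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres between two lat/lon points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Index of the hotel with the largest latitude, or -1 for an empty array.
    static int northernmost(double[][] hotels) {
        int best = -1;
        for (int i = 0; i < hotels.length; i++) {
            if (best == -1 || hotels[i][0] > hotels[best][0]) {
                best = i;
            }
        }
        return best;
    }

    // Index of the hotel whose nearest neighbour is farthest away,
    // or -1 when there are fewer than two hotels.
    static int mostRemote(double[][] hotels) {
        int best = -1;
        double bestNearest = -1;
        for (int i = 0; i < hotels.length; i++) {
            double nearest = Double.MAX_VALUE;
            for (int j = 0; j < hotels.length; j++) {
                if (i == j) continue;
                double d = haversineKm(hotels[i][0], hotels[i][1], hotels[j][0], hotels[j][1]);
                nearest = Math.min(nearest, d);
            }
            if (hotels.length > 1 && nearest > bestNearest) {
                bestNearest = nearest;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up coordinates for illustration, not rows from hotelsbase.csv.
        double[][] hotels = { {1.29, 103.85}, {1.30, 103.86}, {78.22, 15.65}, {1.28, 103.84} };
        System.out.println("northernmost index: " + northernmost(hotels));
        System.out.println("most remote index: " + mostRemote(hotels));
    }
}
```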

DataRep – assignment 1: Guardian Data Blog

Twitter 100 Most Followed User
Taking data from the Guardian Data Blog to discover different ways to visualize CSV data. It is interesting to see that the most-followed user, Lady Gaga, does not have the highest number of tweets. Instead, the highest number of tweets belongs to the New York Times, which happens to be on the lower end of the followed-user spectrum.

version 1

download source code here

version 2

download source code here

The second variation reveals the industry of each user. In this case, musicians have more presence on Twitter than the other industries.
