Stages of Data Representation
Collecting
Cleaning
Analyzing
Visualizing
JAN 2013 – UNDERSTANDING
Even though the subject matter [SAY NO to 6.9 MILLION - A PROTEST AT HONG LIM PARK, SINGAPORE] is close to my heart, I was trying to make sense of what was happening at home through a medium I am not familiar with: Twitter. I am not a frequent Twitter user, and Twitter is less prevalent in Singapore than in the USA or even Indonesia. Nevertheless, given Facebook's privacy constraints and the fact that the protest at Hong Lim Park on 16 Feb 2013 was a live, streaming event, I chose tweets as my data set. At the time they were the best way to capture an event I could not attend in person.
JAN to FEB 2013 – COLLECTION & CLEANING
Resident Help
Craig & Ali
Craig (ITP Resident) helped me tremendously throughout. He showed me how to use a cURL script to capture tweets in real time from my browser. Ali (from ITP 2013) kindly gave me a step-by-step walkthrough of setting up an Amazon EC2 server, so that I could run the cURL script remotely and collect data continuously even when my computer was shut down. He also ran through basic Terminal commands to help me fetch the file from the EC2 server.
More details on collection are in the technical section.
Cleaning Part 1
With the tweets collected from the EC2 server, I thought I could bring them straight into Processing and start visualizing them. I was wrong. I had to clean them up first, since I was importing them into Processing as a JSONObject. The cleaning was done in TextWrangler with help from Craig.
The first step was getting rid of the empty lines in my .txt file.
The data came in intermittently, so there were empty lines wherever no tweets were streaming in.

The second step: to be recognizable as a JSONArray, each tweet needs to end with a ‘,’ comma. So we added a ‘,’ at the end of each line by replacing \r with ,\r

Finally, the file has to be readable as a JSONObject.
Begin the file with
{"Tweets":[
and end the file with
]}
Craig was so patient in helping me get through this part. Even though it wasn’t complex, it was very foreign to someone who had never worked with JSON.
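The three cleaning steps above were done by hand in TextWrangler, but the same idea can be sketched in plain Java. This is only an illustration under assumptions: the class name TweetFileWrapper and the method wrap are made up, and each non-empty input line is assumed to hold exactly one tweet as a JSON object. Unlike the find-and-replace approach, it puts commas between tweets only, so there is no trailing comma before the closing bracket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project.
class TweetFileWrapper {

  // Turns one-tweet-per-line text into a single JSON object: {"Tweets":[ ... ]}
  static String wrap(List<String> rawLines) {
    List<String> tweets = new ArrayList<>();
    for (String line : rawLines) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) { // step 1: drop the empty lines
        tweets.add(trimmed);
      }
    }
    // step 2: commas between tweets; step 3: opening and closing wrapper
    return "{\"Tweets\":[\n" + String.join(",\n", tweets) + "\n]}";
  }

  public static void main(String[] args) {
    // Made-up sample lines standing in for the raw collected file.
    List<String> raw = Arrays.asList(
        "{\"text\":\"first tweet\"}", "", "{\"text\":\"second tweet\"}", "   ");
    System.out.println(wrap(raw));
  }
}
```

The result can then be saved as a new file and loaded as a single JSON object whose "Tweets" key holds the array.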
Cleaning Part 2
The interesting part: because of the way I passed the keyword search #hong lim park to the CURL script, I also got tweets containing ‘hong kong’ and ‘park’ (which could have referred to a Korean pop star). Jer wrote a simple cleaning sketch in Processing that reads my raw tweet file and writes a new cleanTweet file, leaving out tweets that contain irrelevant words. [Enter Processing sketch here]
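This is not Jer's sketch, just a minimal illustration of the same filtering idea in plain Java. The class name TweetFilter, the method removeIrrelevant, the blocked phrases and the sample tweets are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the original cleaning sketch.
class TweetFilter {

  // Keeps only tweets that contain none of the blocked phrases (case-insensitive).
  static List<String> removeIrrelevant(List<String> tweets, List<String> blockedPhrases) {
    List<String> kept = new ArrayList<>();
    for (String tweet : tweets) {
      String lower = tweet.toLowerCase(Locale.ROOT);
      boolean blocked = false;
      for (String phrase : blockedPhrases) {
        if (lower.contains(phrase.toLowerCase(Locale.ROOT))) {
          blocked = true;
          break;
        }
      }
      if (!blocked) {
        kept.add(tweet);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Made-up sample tweets; the real input would be the raw tweet file.
    List<String> raw = Arrays.asList(
        "Crowd is growing at Hong Lim Park",
        "Shopping in Hong Kong this weekend",
        "On my way to the park now");
    List<String> clean = removeIrrelevant(raw, Arrays.asList("hong kong"));
    for (String tweet : clean) {
      System.out.println(tweet);
    }
  }
}
```

A phrase list like this is blunt: it removes any tweet that mentions a blocked phrase, even a relevant one, so the list is worth keeping short and checking against the output.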

MARCH 2013 – ANALYZING
Then Craig showed me an example of importing a JSON file into Processing. Simply printing the tweets as text on screen made me so happy.
[Show URL here]

APRIL 2013 – VISUALIZING
This was my very first visualization of my tweets after two months of learning to collect, clean & parse data. Displaying them was harder than I expected.
Firstly, the timestamps on my tweets were in Greenwich Time, and I needed to convert them to the Singapore time zone. Secondly, I did not know how to plot time labels based on the timestamp of my first tweet.
After many failed attempts, Jer wrote a simple formula to make this conversion happen.
t.sgDate = new java.util.Date(t.date.getTime() + (1000 * 60 * 60 * 13));
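An alternative to adding a fixed number of milliseconds is to let Java do the zone conversion. The sketch below is a hedged illustration, not the project's code: the class TweetTime and method toSingapore are made up, and the timestamp layout ("Sat Feb 16 08:00:00 +0000 2013") is an assumption about how the tweet dates were stored. Singapore is 8 hours ahead of UTC, so the 13-hour offset in the line above presumably reflects dates that had already been read in a zone 5 hours behind UTC, such as US Eastern Standard Time; that is a guess.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of the original sketch.
class TweetTime {

  // Assumed timestamp layout, e.g. "Sat Feb 16 08:00:00 +0000 2013"
  static final DateTimeFormatter CREATED_AT =
      DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

  // Parses the timestamp with its own offset, then shifts it to Singapore time.
  static ZonedDateTime toSingapore(String createdAt) {
    OffsetDateTime original = OffsetDateTime.parse(createdAt, CREATED_AT);
    return original.atZoneSameInstant(ZoneId.of("Asia/Singapore"));
  }

  public static void main(String[] args) {
    ZonedDateTime sg = toSingapore("Sat Feb 16 08:00:00 +0000 2013");
    System.out.println(sg.getHour() + ":00 on day " + sg.getDayOfMonth());
  }
}
```

Because the offset is read from the timestamp itself, the result does not depend on the time zone of the computer running the sketch.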
The hourly time labels are plotted by stepping a calendar forward one hour at a time, as advised by Jer.
myCalendar.roll(Calendar.HOUR_OF_DAY, true);
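To show how hourly labels can be generated from the first tweet's timestamp, here is a minimal sketch. The class HourLabels, the method hourlyLabels and the label format are assumptions for illustration. One detail worth knowing: roll() changes only the hour field and wraps from 23 back to 0 without advancing the date, whereas add() carries over into the next day. The sketch uses add() so that labels past midnight show the right date.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the original sketch.
class HourLabels {

  // Builds one label per hour, starting at the hour of the first tweet.
  static List<String> hourlyLabels(Date firstTweet, int hours) {
    TimeZone sg = TimeZone.getTimeZone("Asia/Singapore");
    Calendar cal = Calendar.getInstance(sg);
    cal.setTime(firstTweet);
    // Snap to the start of the hour so labels read 16:00, 17:00, ...
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);

    SimpleDateFormat fmt = new SimpleDateFormat("dd MMM HH:mm", Locale.ENGLISH);
    fmt.setTimeZone(sg);

    List<String> labels = new ArrayList<>();
    for (int i = 0; i < hours; i++) {
      labels.add(fmt.format(cal.getTime()));
      cal.add(Calendar.HOUR_OF_DAY, 1); // add() carries into the next day
    }
    return labels;
  }

  public static void main(String[] args) {
    // 1361001600000L is 16 Feb 2013, 08:00 UTC, which is 16:00 in Singapore.
    Date first = new Date(1361001600000L);
    for (String label : hourlyLabels(first, 10)) {
      System.out.println(label);
    }
  }
}
```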

Counting Tweets
Getting an array to count tweets by the hour was challenging. Youjin helped by writing a simple loop that stores the tweets into hourly arrays.
1 hour : 31 Tweets
******************************
2 hour : 225 Tweets
******************************
3 hour : 233 Tweets
******************************
4 hour : 182 Tweets
******************************
5 hour : 154 Tweets
******************************
6 hour : 89 Tweets
******************************
7 hour : 78 Tweets
******************************
8 hour : 106 Tweets
******************************
9 hour : 72 Tweets
******************************
10 hour : 57 Tweets
******************************
11 hour : 34 Tweets
******************************
12 hour : 9 Tweets
******************************
13 hour : 11 Tweets
******************************
14 hour : 1 Tweets
******************************
15 hour : 4 Tweets
******************************
16 hour : 11 Tweets
******************************
17 hour : 11 Tweets
******************************
18 hour : 21 Tweets
******************************
19 hour : 43 Tweets
******************************
20 hour : 34 Tweets
******************************
21 hour : 41 Tweets
******************************
22 hour : 18 Tweets
******************************
23 hour : 36 Tweets
******************************
23
Total :1501
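Hourly counts like the ones above can be produced by dividing each tweet's distance from the first tweet by the length of an hour. This is a sketch under assumptions, not Youjin's loop: the class HourlyCounter and method countByHour are made up, timestamps are taken as milliseconds, and the sketch only counts tweets rather than storing them (keeping a list per hour would work the same way).

```java
import java.util.Arrays;

// Hypothetical helper, not the original loop.
class HourlyCounter {

  static final long HOUR_MS = 1000L * 60 * 60;

  // Counts tweets per hour, where hour 0 starts at the earliest timestamp.
  static int[] countByHour(long[] tweetTimesMs, int hours) {
    int[] counts = new int[hours];
    if (tweetTimesMs.length == 0) {
      return counts;
    }
    long first = tweetTimesMs[0];
    for (long t : tweetTimesMs) {
      first = Math.min(first, t);
    }
    for (long t : tweetTimesMs) {
      int bucket = (int) ((t - first) / HOUR_MS);
      if (bucket < hours) { // ignore tweets that fall after the window
        counts[bucket]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long start = 1361001600000L; // made-up start time for the sample data
    long minute = 60 * 1000L;
    long[] times = {start, start + 10 * minute, start + 59 * minute,
                    start + 60 * minute, start + 150 * minute};
    System.out.println(Arrays.toString(countByHour(times, 3)));
  }
}
```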
Finally, visualizing the tweets: a scatter plot and a bar chart.


MAY 2013 – REPRESENTING
A major design overhaul from the last visualization to better represent a protest event, including a change of font (Haettenschweiler) and color. See the post on user test feedback.

New features included after user test feedback.
Search feature! Users can type in a keyword to filter tweets. This builds on Shiffman’s intHash example and his help with the contains(searchWord) function.
Sound recording of the protest included as part of the experience.
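The keyword filter behind the search feature can be sketched with String.contains. This is a plain list scan for illustration, not Shiffman's example or the project's actual code; the class TweetSearch, the method search, the case-insensitive matching and the sample tweets are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the original search code.
class TweetSearch {

  // Returns the tweets whose text contains searchWord, ignoring case.
  static List<String> search(List<String> tweets, String searchWord) {
    String needle = searchWord.trim().toLowerCase(Locale.ROOT);
    if (needle.isEmpty()) {
      return new ArrayList<>(tweets); // an empty search shows everything
    }
    List<String> matches = new ArrayList<>();
    for (String tweet : tweets) {
      if (tweet.toLowerCase(Locale.ROOT).contains(needle)) {
        matches.add(tweet);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Made-up sample tweets.
    List<String> tweets = Arrays.asList(
        "Rain is not stopping the crowd",
        "Speakers are starting now",
        "Huge CROWD at the park");
    for (String match : search(tweets, "crowd")) {
      System.out.println(match);
    }
  }
}
```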

Further iterations after second user test
Background story leading up to the protest.
Background texture added to lessen the harshness of the red canvas for easier reading.
Start button added to allow more control over the pace of reading.
Highlighted the first three hours (4pm – 7pm) to bring emphasis to that time frame.




REFLECTIONS
Translating
Translating data for an audience is the most challenging part of the assignment.
The visualization gives me a sense of the atmosphere at the protest, and I hope it engages the audience in a similar way.
How can we get a sense of an event without being physically present?
Future Enhancement
Display the data in a way that the audience can comprehend, empathize with and follow, even if they have no prior knowledge.
Integrate, compare and contrast this data set with the sequel protest on May Day 2013. At the same time, form a holistic overview of the changing emotive landscape using population and migratory flow data sets as well as ethnic demographics in Singapore.
Discover whether xenophobic sentiments are present in Singapore and, if so, where, how and why they exist.










