Monday, October 5, 2015

Social Science in the Age of Big Data

While big data provide us with new opportunities to make sense of the world, they do not "speak for themselves" and do not override the need for theoretical models. Contrary to Chris Anderson's argument, theory still matters because it "can help discriminate noise from signal, and provide the right context for the interpretation" (Gonzalez-Bailon, 2013, p. 9). For example, we can use big data to identify key actors and flows of information in digital networks, but the same findings can support very different stories. To make sense of them, we need theoretical lenses.
It was also interesting to read that even though there are different types of actors in networked spaces, public opinion is still overwhelmingly shaped by a minority of them. For example, as Gonzalez-Bailon mentions, Wu et al. (2011) found that about 0.05 percent of Twitter users account for almost half of public opinion. However, I suspect this concentration may vary across contexts, so further research is needed.


Readings: Big Data

I read the Wired article first and got really depressed until I got to the Gonzalez-Bailon paper. I tend to agree with Gonzalez-Bailon more on big data: yes, data tracking supersedes some of our previous methods, but it is still important to have human researchers who can provide context. In particular:
"Only when the data are assembled in the right way, by focusing on the signal and disregarding the noise, can we build a story that makes substantive sense."
Anderson disputes the need for theory and models to help explain data, but I still side with Gonzalez-Bailon that we need a subjective approach to truly understand the data, especially in the social sciences. Even in the hard sciences, I don't understand Anderson's example of a researcher "discovering" new species he knows nothing about. What is the point of having "a statistical blip - a unique sequence that, being unlike any other sequence in the database, must represent a new species"? I would imagine one would need to know the details rather than be satisfied with a vague statistic saying that something is probably there. Did we send a rover to Mars to look for water, or were we satisfied with the statistical probability that there would be water?

It may not be completely related, but last week I saw an interesting article about the use of "small data," mainly in content-based apps like Netflix and Tinder. The author argues that we can't handle large deluges of data and instead respond better to "card-based" apps that present us with a series of simple pieces of information.

Big Data and Audience Research


I disagree with Chris Anderson's oversimplified argument that big data has ushered in the end of the scientific method. Though I imagine his position may have evolved in the seven years since his post, his article assumes that big data is simply at our disposal. However, "big data," especially data related to audiences, is not easily in the hands of researchers. It sits behind the gated server walls of Facebook, it needs to be munged and reorganized, and a relatively high level of human capital goes into analyzing it. Another thing to note about audience data is that natural language processing of people's conversations, social media postings, etc., is still relatively crude. Machine learning and NLP have come a long way, but getting high-level results is not a cakewalk.

I think having access to large datasets is excellent, but as Jen Schradie of UC Berkeley wrote in a blog post, "I am not suggesting fishing expeditions in lieu of hypothesis testing nor any Anderson-esque junking of the scientific method. The numbers do not speak for themselves. Instead, it is our job as social scientists to understand the difference between the data, whatever its size, and the method, whatever that may be." I agree with Schradie. We can't simply start poking at the data. Understanding a group of people, a place, or a set of ideas, and having some expertise in it, is essential; otherwise we lack the cultural capital to really engage with the "big data" we've collected.

Audience measurement with big data; Google Trends


I don't think the availability of big data will put an end to the need for theory. The two actually go hand in hand, since big data is meaningless without interpretation. As Gonzalez-Bailon (2013) notes in her article, "disentangling signal from noise is still a subjective matter, as is providing the context that will help identify meaningful correlations and discard those that are unsubstantial." An abundance of data is good, but an abundance of meaningless data is not. Therefore, the WIRED article by Chris Anderson claiming that Google conquered advertising simply with mathematics is disputable. It may be true that Google's analytics offer a tool to analyze big data, but to assume that this was done without any knowledge of the "culture and convention of advertising" is completely wrong. As we learned in class, we have to know which keywords to look for to build an effective campaign. Also, looking only at SEO disregards integrated marketing strategies and would do little more than generate exposure. Furthermore, without knowledge of consumers and what they want, how can we arrive at a keyword or campaign that succeeds? I think Google has provided a better way to measure how campaigns perform, not a way to actually run them.

Sunday, October 4, 2015

Social media's impact on television watching

Hi all -

Here's an article by Farhad Manjoo in the NYT about social media and TV that I thought was relevant to this class:

http://www.nytimes.com/2015/10/05/business/media/social-media-takes-television-back-in-time.html

-Krishnan

Saturday, October 3, 2015

Is data the new oil in the information era?

The article by Anderson extols Google's analytical tools and how successful they have been in making profits for the company. I find this approach of big data/petabyte analysis, such as Netflix's Cinematch, quite useful for audience research. But if a system keeps recommending more of what people already like, wouldn't it polarize the audience? We could argue indefinitely about whether this mathematical approach, which defies the traditional scientific method, is right or wrong, because it is an epistemological matter.
As Anderson describes it, Google's founding philosophy is that it does not care about human beings and their causal relationships, cultures, contexts, behaviors, motivations, etc.; all it cares about is correlation. If the statistics of incoming links say this page is better than that page, then that's good enough. This might be useful for business people or economists. The epistemological problem is that studying society, human beings, and communication doesn't work like that.
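To make the "statistics of incoming links" idea concrete, here is a minimal sketch in Python. It assumes a small, made-up link graph (the page names are hypothetical) and simply counts incoming links per page. This is a deliberate simplification, not Google's actual ranking algorithm; PageRank, for instance, also weights each link by the importance of the page it comes from.

```python
from collections import Counter

# Hypothetical toy link graph: each page maps to the pages it links to.
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": ["page_a"],
    "page_d": ["page_c", "page_b"],
}

def rank_by_inlinks(graph):
    """Rank pages by their number of incoming links (a crude signal, not PageRank)."""
    counts = Counter()
    for page in graph:
        counts[page] += 0  # make sure pages with no incoming links still appear
    for targets in graph.values():
        counts.update(targets)  # each outgoing link adds one to its target's count
    # Most-linked pages first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(rank_by_inlinks(links))
```

Nothing in this count asks why one page links to another: the ranking rests entirely on the pattern of links, which is the kind of correlation-only reasoning questioned above.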
Understanding human beings and their driving motivations, cultures, and contexts matters because accuracy and outliers matter. For example, Google Translate can be anything but useful. It relies on an enormous amount of data and textual references to translate one language into another, but when two languages differ radically in grammar, structure, and nuance, it becomes so inaccurate that it is just not useful at all. Also, in the realms of society and communication, it is sometimes the outlier that really matters.
The article contends that we don't need to know why people do what they do as long as they do it. It seems to criticize traditional inductive scientific reasoning, but it neglects to mention the possibility of deductive reasoning. I'm not sure whether the author is equating statistical algorithms with the interpretations qualitative researchers make when they analyze their corpus of research data.

Thursday, October 1, 2015

Most millennials are willing to pay for content, but not so much for news

http://www.niemanlab.org/2015/09/most-millennials-are-willing-to-pay-for-content-but-not-so-much-for-news/
More millennials say they’ve paid for print magazines (21 percent) and newspapers (15 percent) than digital magazines (11 percent) and online newspaper content (10 percent).