Sunday, 11 July 2010

Social Data Browsing: Dumpster

12 February 2006
Lev Manovich
Consider the following paradox. The same few decades of the nineteenth century that gave us the most detailed artistic representations of human emotions and inner feelings, including romantic love, also saw the rise of statistical and sociological imagination. While Flaubert and Tolstoy were putting the emotions of their heroines under the artistic microscope of their prose, a different paradigm was emerging in which the individuals were nothing but dots contributing to a social law, a pattern, or a distribution. In 1838 Auguste Comte coined the term ‘sociology’ for the new discipline that was to study the laws governing the life of society. (He also proposed the term ‘social physics’.) According to another founder of the discipline, Émile Durkheim, sociology is the science concerned with ‘social facts’ – phenomena that have an independent and objective existence separate from the actions of the individuals. In his major work Suicide (1897) Durkheim set out to demonstrate how such seemingly individual acts as suicides in fact follow general statistical patterns and can be explained in terms of structural forces that operate in society at large. Compare this to Anna Karenina (1877) where Tolstoy meticulously follows the last hours and minutes of Anna’s life with a kind of anti-sociological gaze – looking at her not from the outside as a social scientist, but on the contrary, depicting how the outside world appears as seen by her.
In general, representational art has depicted individuals rather than social groups, classes, and institutions. Even in the case of modern realist literature and painting, including socialist realism, which consciously aimed to represent social types and classes, what the writers and painters actually show us are individual human beings. In other words, regardless of whether a painting or a sculpture is named ‘worker’, ‘farmer’, ‘miner’ etc., it shows a single concrete individual. And when artists have tried visually to represent really big groups, the typical result has been a crowd in which individual differences are hard to read. The same relationship between the zoom function and the level of detail holds today – consider the individual figures in Matthew Barney’s The Cremaster Cycle versus the groups of veiled women in the films by Shirin Neshat, or the panoramic views of Andreas Gursky which reduce individuals to swirling dots.
It appears that we may be dealing with some essential characteristic of art. Or maybe this limitation is simply a general characteristic of all images – their inability to represent abstract concepts and logical relationships. After all, if in the course of evolution the human species developed two different representational systems – one linguistic and one image-based – it would make sense that they should complement each other, and that images would not do what language does best.
But what if this limitation is simply a result of the representational techniques that artists had at their disposal? Consider, for instance, how the techniques of film invented in the first two decades of the twentieth century – editing and different types of shots – have allowed film directors to alternate between close-ups showing individuals and long shots showing the groups to which these individuals belong. Given this example, what can we expect from computers? Can computer media be used to create artistic representations that link the individual and the social without subsuming one in the other, i.e. the particular in the general? If we consider the range of computer techniques available for organising and viewing data, things look quite encouraging. We can switch between multiple views of the same data, traverse the data at different scales, and move between multiple media linked together. And we can do this in real time or close to it. We can also instruct software to search through and mine very large amounts of data – such as the data produced by the millions of real people who engage in online chat, write blogs, send emails, upload their photos to Flickr and so on. What types of representation can be created if we combine these computer techniques and new ways of gathering data as well as of structuring and displaying it?
Although The Dumpster by Golan Levin (working with Kamal Nigam and Jonathan Feinberg) can be related to traditional genres such as portraiture or documentary, as well as established new-media genres such as visualisation and database art, it is something new and different. I would like to call it a ‘social data browser’. It allows you to navigate between the intimate details of people’s experiences and the larger social groupings. The particular and the general are presented simultaneously, without one being sacrificed to the other.
The Dumpster application window shows a large ‘crowd’ of circles at the same time. While in a typical painting individual differences would be lost at this scale, here you can click on any circle and read the corresponding blog fragment. And this is just a beginning. Consider the way in which Levin structures the navigation. In typical hypermedia you move horizontally between pages or scenes connected by links. In typical information visualisation you ‘move upward’, so to speak – from the level of individual data to larger patterns that become visible when the numerous data points are turned into a single image or a shape. But in Levin’s group portrait, you are encouraged to navigate horizontally, vertically, and diagonally between the particular and the general. You can, for example, simply click on different circles, jumping from one breakup case to another and randomly exploring the overall data space. Or you can explore the circles that are similar in colour – which means that the corresponding postings are similar in some ways. Or you can explore the circles that have an opposite colour and thus belong to a different grouping. In short, the seemingly incompatible points of view of Tolstoy and Durkheim – the subjective experience and the social facts – are brought together via the particular information architecture and navigation design of The Dumpster.
But if we simply limit ourselves to describing the work as it appears visually, we will miss the crucial characteristics of the social data browser constructed by Levin. We need to consider how the data presented in The Dumpster was obtained and processed before it was presented to us. Using a variety of methods, Levin and his collaborators have filtered the huge data space of online blogs, isolating the postings from 2005 where teenagers narrated their breakups. The result was 20,000 postings describing ‘confirmed’ breakups. These postings were subjected to further analysis in order to derive various metadata about them: reasons for the breakup, who broke up with whom, the age and sex of the author, as well as their emotional state. Most of this metadata was not explicitly contained in the postings but was inferred with a high degree of probability by the project’s authors.
The result is a group portrait appropriate for the age of data mining, large databases, and global surveillance programs such as Echelon. The group ‘painted’ by The Dumpster did not commission this portrait itself; rather, the portrait was created by the artist by searching through the digital traces that people leave online. The ordering of individual members within this very large group of 20,000 people is the result of mathematical analysis. As a result, each individual breakup experience becomes a point in a multi-dimensional space that we are invited to explore. In short, we are invited to mine the data prepared by the project’s authors who used sophisticated computer methods.
More than two decades ago, William Gibson accurately predicted the cyberculture of the 1990s with its idea of virtual navigation through data. By naming his recent novel Pattern Recognition, Gibson points to the new period we are living in now. It is a period when more prosaic but ultimately more consequential ways of exploring data have come to the forefront, including search engines available to the masses and data mining as used by companies and government agencies. The Dumpster uses industrial-strength data gathering and data analysis strategies that normally are not easily accessible to individuals to show how they result in new kinds of social representations.
Manovich, L. (2006) Social Data Browsing. [Online] [11/07/2010]

Where did the breakup data come from?
The Dumpster visualizes a fixed collection of 20,000 romantic breakups that occurred during 2005. These breakups were obtained from web logs ("blogs") posted by people on the Internet. At least half of the authors of these breakups were American teenagers between the ages of 13 and 19. Approximately seventy percent of the breakup authors were identified as female, while roughly fifteen percent were identified as male.
The breakup data for the Dumpster was kindly provided by Intelliseek, the company behind BlogPulse. Blog posts were collected by issuing queries to BlogPulse's search engine using words and phrases indicative of breakups. For example, posts containing phrases such as "broke up" or "dumped me" were considered likely initial candidates. The resulting several hundred thousand posts were scored by a machine learning classifier trained to recognize posts about specifically romantic breakups, in an effort to eliminate (for example) posts about rock bands breaking up. From the remainder, the twenty thousand posts with the highest classification scores were selected for inclusion in the interactive visualization.
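The three stages described above (a phrase query, a classifier score, and a cut-off at the highest-scoring posts) can be sketched in a few lines of Python. This is only an illustration of the shape of such a pipeline, not the project's actual code: the function names are hypothetical, the phrase list contains only the two examples quoted above, and `toy_score` is a crude keyword count standing in for the trained machine learning classifier, whose details are not given here.

```python
import heapq

# The two example query phrases quoted in the text; the real query set is not given.
BREAKUP_PHRASES = ("broke up", "dumped me")

# Hypothetical cue words for the stand-in scorer below (assumptions, not from the project).
ROMANTIC_CUES = ("boyfriend", "girlfriend", "my ex", "relationship")
BAND_CUES = ("band", "album", "tour")


def is_candidate(post):
    """Stage 1: keep posts containing a phrase indicative of a breakup."""
    text = post.lower()
    return any(phrase in text for phrase in BREAKUP_PHRASES)


def toy_score(post):
    """Stage 2 stand-in: a keyword count, NOT the trained classifier.

    Romantic cues raise the score; cues suggesting a band breakup lower it.
    """
    text = post.lower()
    romantic = sum(cue in text for cue in ROMANTIC_CUES)
    band = sum(cue in text for cue in BAND_CUES)
    return romantic - band


def select_top(posts, n):
    """Stage 3: keep the n candidate posts with the highest scores."""
    candidates = [post for post in posts if is_candidate(post)]
    return heapq.nlargest(n, candidates, key=toy_score)


posts = [
    "My boyfriend dumped me last night.",
    "My favourite band broke up after the tour.",
    "We went to the beach.",
    "I broke up with my girlfriend; the relationship was over.",
]
selected = select_top(posts, 2)
```

With these four sample posts, the beach post never becomes a candidate and the band post is ranked last, so it is dropped when only two posts are kept. In the project itself the cut-off was twenty thousand posts out of several hundred thousand candidates.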
Using custom language-analysis software, the text of each post was computationally evaluated in order to determine many different characteristics of the breakup and the just-ended relationship. These included factual characteristics (e.g. was someone in the relationship cheating? Did the author instigate the breakup, or did the author's partner?), emotional characteristics (e.g. does the author appear to be angry, depressed, or relieved?), and other common features of romantic breakups (e.g. was this a "repeat breakup"? Have the former partners decided to remain friends?). Where possible, the age and gender of the author of the post were extracted and/or determined. All of these characteristics are then used as a means for computing and indicating the "similarity" of breakups within the interactive interface.
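How the "similarity" of two breakups is actually computed is not specified here. As a minimal sketch, assume each post has been reduced to a set of yes/no characteristics, with `None` marking a characteristic that could not be determined, and that similarity is the share of jointly known characteristics on which two posts agree (a simple matching coefficient). The feature names and the measure itself are assumptions chosen for illustration.

```python
def similarity(a, b):
    """Fraction of jointly known characteristics on which two breakups agree.

    `a` and `b` map hypothetical feature names to True, False, or None
    (None meaning the characteristic could not be determined from the post).
    Returns 0.0 when the two posts have no known characteristic in common.
    """
    shared = [
        key for key in a
        if key in b and a[key] is not None and b[key] is not None
    ]
    if not shared:
        return 0.0
    agreements = sum(a[key] == b[key] for key in shared)
    return agreements / len(shared)


# Two illustrative posts; the feature names are invented for this example.
post_a = {"cheating": True, "author_instigated": False, "angry": True, "remain_friends": None}
post_b = {"cheating": True, "author_instigated": True, "angry": True, "remain_friends": False}

score = similarity(post_a, post_b)
```

Here the two posts agree on two of the three characteristics known for both, so they would count as fairly similar. A measure of this kind is one plausible basis for giving similar breakups similar colours in the interface, but the interface's real measure may well differ.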
The Dumpster: A Portrait of Romantic Breakups Collected from Blogs in 2005. [11/07/2010]