New Frontiers for (Digital) Sociology?

In 2009 Savage and Burrows challenged sociology to modernise its methodological repertoires or face obsolescence. Digital Sociology is gathering momentum and developing into a promising riposte to this challenge.

Since Digital Sociology is a new area of exploration, interested sociologists are setting about defining its content and scope. This blog post is intended to make a small contribution to this discussion by proposing that Digital Sociology should be concerned with emerging semantic web technologies.

Web 1.0 was made possible by Sir Tim Berners-Lee’s creation of HTTP, the protocol that enables us to retrieve a copy of a document by accessing its address on a network. Web 2.0 was made by us, the content providers.

To realise the semantic web, or Web 3.0, he and his colleagues envision everything (documents, data, inanimate objects, even us) having an address on a network, and artificial intelligence having the ability to understand and ‘learn’ relationships between these addressed entities. So Web 3.0 will be created by machines as we become data sources on the network. Incidentally, this is one of the reasons actor-network theory (ANT) offers useful tools in this field.

Understandably, owing to the scale of its ambition, the semantic web is yet to be realised. However, many of the technologies that make the semantic web at least technically feasible have powerful implications for the way we, and the computers we programme, collect, store, use, and share data.

These include linked data and ontologies. Those of you familiar with relational databases will know about pieces of data, well… having relations. For example, if you and your partner are registered on the DVLA’s database, then even if you live at different addresses it will be easy for the database to establish that you have a relationship with your partner, because your records share your car’s registration number.
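For the more technically minded, here is a minimal sketch of that kind of relation using Python's built-in sqlite3 module. The table, its columns and every row are invented for illustration; nothing here reflects how the DVLA actually stores its records.

```python
import sqlite3

# In-memory database standing in for a vehicle-licensing register.
# Table name, column names and all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (name TEXT, address TEXT, registration TEXT)")
conn.executemany(
    "INSERT INTO drivers VALUES (?, ?, ?)",
    [
        ("Alex", "1 High Street", "AB12 CDE"),
        ("Sam", "9 Low Road", "AB12 CDE"),
        ("Jo", "4 Mill Lane", "XY34 ZZZ"),
    ],
)

# Self-join: pair up different people whose records share a registration.
# The a.name < b.name condition stops each pair being listed twice.
rows = conn.execute(
    """
    SELECT a.name, b.name, a.registration
    FROM drivers AS a
    JOIN drivers AS b
      ON a.registration = b.registration AND a.name < b.name
    """
).fetchall()
conn.close()

print(rows)
```

The two people at different addresses come back as a pair purely because their rows share a registration number; the database itself has no idea what that shared value means.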

The aim of linked data, via data structuring models such as RDF, is to connect and consolidate records that exist in different databases, even data in different formats (if linked data were fully realised, the Web would become one huge distributed database). The query language SPARQL enables us to query across databases by mining linked data. For example, if the DVLA and your university stored their data as linked data, it would be at least technically possible, via related records like your car’s registration, to find out which department you work in. You may park your car at the university, which stores your car registration along with your employment records. Theoretically, a search engine could pull all the details about your car and employment in one data grab. Now imagine all the possible connections that could be made using all the digital data stored about you on all the databases out there.
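To make the idea a little more concrete, here is a rough sketch in plain Python rather than real RDF or SPARQL. Each source publishes (subject, predicate, object) triples, which is the basic shape of RDF data, and a query simply follows the links from one source into the other. All identifiers, predicates and records below are hypothetical placeholders, and a real deployment would use proper URIs and a SPARQL endpoint instead of these helper functions.

```python
# Each "database" publishes its records as (subject, predicate, object) triples.
# The identifiers are made-up placeholders, not real DVLA or university URIs.
dvla = {
    ("person:alex", "registeredKeeperOf", "car:AB12CDE"),
    ("car:AB12CDE", "model", "Roadster"),
}
university = {
    ("permit:77", "issuedFor", "car:AB12CDE"),
    ("permit:77", "heldBy", "staff:0042"),
    ("staff:0042", "worksIn", "dept:sociology"),
}

# Because both sources name the car the same way, merging is just set union.
graph = dvla | university


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def departments_for(person):
    """Follow person -> car -> parking permit -> staff record -> department."""
    found = set()
    for car in objects(person, "registeredKeeperOf"):
        for permit in subjects("issuedFor", car):
            for staff in objects(permit, "heldBy"):
                found |= objects(staff, "worksIn")
    return found


print(departments_for("person:alex"))
```

The point of the sketch is that neither source on its own connects a named person to a department; the connection only appears once the two sets of triples are merged around the shared car identifier.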

Ontologies introduce another layer of functionality to this scenario. They are not ontologies as we understand them in sociology; in computer science an ontology is a framework for organising data in a way that gives it meaning. Ontologies allow computers to ‘learn’ the semantics of relationships between data. For the more technically minded, here’s how the BBC are using an ontology to characterise news stories.

All that can be known in the previous example is that you and your partner share a registration number; computers can’t know or think that this means you both drive that car. However, an ontology introduces other variables to the data relations. It could define you as a driver, and your partner as a driver, and then state that you both own only one car. The computer would then ‘know’ you both drive the same car. As more variables are added, the computer builds a richer picture until it’s able to infer relationships between data. Given that you and your partner are living at separate addresses and the car is a sports car, add to that your age and a recent trip to Ibiza, and the computer could conclude you are having a mid-life crisis!
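As a hedged illustration of that step from shared data to ‘knowledge’, the sketch below applies one hand-written rule to a handful of triples until nothing new can be derived. The predicates (`linkedTo`, `drives`, `sharesCarWith`) and the rule itself are assumptions made up for this example; real ontology languages and reasoners are far richer than this.

```python
# Facts as (subject, predicate, object) triples; all names are invented.
facts = {
    ("you", "type", "Driver"),
    ("partner", "type", "Driver"),
    ("you", "linkedTo", "car:AB12CDE"),
    ("partner", "linkedTo", "car:AB12CDE"),
}


def infer(triples):
    """Apply one hand-written rule repeatedly until no new facts appear.

    Rule (an assumption for this sketch): a Driver linked to a car drives it,
    and two different Drivers who drive the same car share that car.
    """
    known = set(triples)
    while True:
        new = set()
        drivers = {s for s, p, o in known if p == "type" and o == "Driver"}
        for s, p, o in known:
            if p == "linkedTo" and s in drivers:
                new.add((s, "drives", o))
        for a, p1, car1 in known:
            for b, p2, car2 in known:
                if p1 == p2 == "drives" and car1 == car2 and a != b:
                    new.add((a, "sharesCarWith", b))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


inferred = infer(facts) - facts
print(sorted(inferred))
```

Nothing in the original facts says anyone drives anything; the `drives` and `sharesCarWith` triples exist only because the rule supplied the meaning that the raw data lacked.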

Obviously this only works on open, correctly formatted data, and a strict legal framework protects us from such sharing. However, the use of this technology has important implications for the way data can be exploited.

Take, for example, Tesco Clubcard data collected by dunnhumby. If a Tesco customer buys the Sun, razors, a ready meal for one, a Katie Price fitness video, some baby clothes, cigarettes, and multi-pack lager when England games are on, the reasoning software can, as it gains knowledge, produce a sophisticated profile of this customer: his social class, relationship status, lifestyle etc. Again, dunnhumby follows strict guidelines for anonymising this data, but the more semantic capabilities we add, the more we are able, via machine processing, to reverse engineer aspects of identity. There’s no doubt that we should be worried about the personal data we give away to Facebook et al. for free, but this technology could effectively mean the end of anonymous data. The better semantic A.I. technology becomes, the better it becomes at de-anonymising data.
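A toy sketch of why added knowledge erodes anonymity: the records below carry no names, yet each extra attribute we can match against them shrinks the pool of shoppers a given person could be. Every record, field name and value is fabricated for illustration, and this says nothing about how dunnhumby or Tesco actually hold or process their data.

```python
# "Anonymised" shopper records: no names, only attributes.
# Every record and attribute here is fabricated for illustration.
records = [
    {"id": 1, "postcode_area": "SO17", "buys_lager": True, "buys_baby_clothes": True},
    {"id": 2, "postcode_area": "SO17", "buys_lager": True, "buys_baby_clothes": False},
    {"id": 3, "postcode_area": "SO17", "buys_lager": False, "buys_baby_clothes": True},
    {"id": 4, "postcode_area": "SO15", "buys_lager": True, "buys_baby_clothes": True},
]


def candidates(known):
    """Return ids of records consistent with everything in `known`."""
    return [r["id"] for r in records
            if all(r[key] == value for key, value in known.items())]


# Each extra piece of outside knowledge shrinks the set of possible matches.
clues = [("postcode_area", "SO17"), ("buys_lager", True), ("buys_baby_clothes", True)]
known = {}
sizes = []
for key, value in clues:
    known[key] = value
    sizes.append(len(candidates(known)))

print(sizes)
print(candidates(known))
```

With one clue three records fit, with two clues two fit, and with all three only a single record remains, at which point the ‘anonymous’ row effectively points at one person.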

Returning to Savage and Burrows’ challenge, these technologies offer sociology new methodological tools: more ‘intelligent’ data collection and processing, but also a new, fertile territory for digital sociologists to analyse and critique.

P.S. This is only a fairly frivolous primer; for a more technical academic guide to these technologies, please see:

Susan Halford, Catherine Pope and Mark Weal, ‘Digital Futures? Sociological Challenges and Opportunities in the Emergent Semantic Web’, Sociology 2013 47: 173. DOI: 10.1177/0038038512453798

Huw Davies is a second-year interdisciplinary PhD student at the University of Southampton, attempting to synthesize the best of sociology and computer science under the banner of Web Science. More info on his Twitter profile @huwcdavies



