The big data brain drain

This provocative post by Jake VanderPlas argues that “the skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry” and that important implications follow from this for the future of higher education. There is a “new breed of scientist” that the academic world increasingly struggles to retain:

The increasing data-centeredness of science, however, is already leading to new approaches to problems: in the era of the LHC and LSST, the most exciting research is being driven by those who have the expertise to apply high-performance data-parallel statistical algorithms to ask interesting questions of huge, community-generated datasets. It is driven by the application of new statistical approaches, of new machine learning algorithms, and of new and faster codes to repeat classic analyses at a previously unattainable scale. In short, the new breed of scientist must be a broadly-trained expert in statistics, in computing, in algorithm-building, in software design, and (perhaps as an afterthought) in domain knowledge as well. From particle physics to genomics to biochemistry to neuroscience to oceanography to atmospheric physics and everywhere in-between, research is increasingly data-driven, and the pace of data collection shows no sign of abating.

The development of specialised software is becoming ever more integral to the cumulative progress of scientific inquiry. So too is the openness of that software, with privatised development processes intensifying a broader crisis of reproducibility that afflicts contemporary science. The problem is that higher education not only fails to incentivise researchers to spend time and energy on these increasingly important pursuits, it actively disincentivises them from doing so:

This brings us to Academia’s core problem: despite the centrality of well-documented, well-written software to the current paradigm of scientific research, academia has been singularly successful at discouraging these very practices that would contribute to its success. In the “publish-or-perish” model which dominates most research universities, any time spent building and documenting software tools is time spent not writing research papers, which are the primary currency of the academic reward structure. As a result, except in certain exceptional circumstances, those who focus on reproducible and open software are less likely to build the resume required for promotion within the academic system. And those poor souls whose gifts lie in scientific software development rather than the writing of research papers will mostly find themselves on the margins of the academic community.

To an extent, disconnects like this have always existed. The academic system has always rewarded some skills at the expense of others: teaching is a classic example of an essential skill which is perennially marginalized. But there are two main differences that make the current discussion more worrying:

  1. As I’ve mentioned, the skills now slipping through the cracks of the academic reward structure are the very skills required for the success of modern research.

  2. With virtually the entire world utilizing the tools of data-intensive discovery, the same skills academia now ignores and devalues are precisely the skills which are most valued and rewarded within industry.

The result of this perfect storm is that skilled researchers feel an insidious gradient out of research and into industry jobs. While software-focused jobs do exist within academia, they tend to be lower-paid positions without the prestige and opportunity for advancement found in the tenure track. Industry is highly attractive: it is addressing interesting and pressing problems; it offers good pay and benefits; it offers a path out of the migratory rat-wheel of temporary postdoctoral positions, and often even encourages research and publication in fundamental topics. Most importantly, perhaps, industry offers positions with a real possibility for prestige and career advancement. It’s really a wonder that any of us stay in the academy at all.

What do you think of this argument? We’d love to hear responses, particularly from the perspective of the more qualitatively orientated social sciences. Get in touch if you’d be interested in writing something.

1 reply »

  1. The question which seems to be absent, or implicitly answered in the positive, is what the real benefits of big data are or, more important still, who big data serves. This seems to be glossed over partly because the author encounters big data in astrophysics, a field whose reasons for being interested in big data naturally differ from those of corporations. For example, big data is an increasingly utilized tool in supermarkets, having both the effect of ensuring that consumers’ demands are satisfied (e.g. keeping track of floor stock and ensuring it is efficiently replaced) and the more pernicious effect of CREATING arguably unhealthy demand in consumers (e.g. strategic positioning/supply of items in relation to weather conditions). One must then ask where consumers’ conscious decision-making ends and the unconscious process begins. Or, in other words, when is big data serving to make its customers happy by meeting their premeditated demands and when is it manipulating its customers? Of course, marketing is all about manipulation, so this may be considered a moot point.

    The point I wish to raise is really: is big data a status-quo-perpetuating wolf posing as the innovative, making-life-easier-for-all sheep?

    More data in certain fields may allow more effective decision-making, yet it also has the effect of making us complacent and overly reliant on numbers to give us the illusion of certainty. We may thus be making signal out of noise, whilst overlooking other, more important factors. It is here that the more qualitatively oriented modes of research and inquiry seem to be at risk of being marginalised, as big data serves to put quantitative research on an even higher pedestal, making qualitative research the progressively uglier stepchild.

    On a less related note, I think another interesting sociological dimension of the big data argument (or reflex, for it seems the concept has its own momentum) is how it can essentially be read as a shift from humans creating inert, passive forms of technology to serve our needs and make life easier and more productive (beginning in the stone age) to a time when technology is used, if not to create our needs, at least to predict them and in turn allow us to be more productive in other, unrelated areas (or so the argument goes). But is it not closer to the truth that certain applications of big data make us unnecessarily more consumptive and less in control of our lives? On the one hand, it could be said that the control is handed over to large corporations and marketers. More interesting, however, if for no other reason than its conspiratorial connotations, is what the potential power of big data says about our relationship to technology: the power ascribed to big data hints at the idea of singularity, the time when technology attains greater-than-human intelligence. With so much data on ourselves and our behaviour, which is in turn used to predict, modify and inform subsequent behaviour, we do in a very real sense become slaves to technology, albeit in a more benign way. When everything runs so smoothly that we have little more to do than enjoy the ride and little need to think, what does that make us and what does that make the ‘machines’?
