Saturday, April 21, 2012

Feynman on Curiosity

This video is one of the most effective advertisements I've seen for the value of gathering and systematizing empirical knowledge, by none other than the late Richard Feynman:

Also, since you are probably wondering: the music is Primavera by Ludovico Einaudi.

Tuesday, April 17, 2012

The Future of the Academy in 2032

For a few years before he died, I helped the great sociologist Dan Bell with his computer, and as a result I got to know him very well. One thing I learned from him (besides the distinction between "criticism" and "critique") is the usefulness of prediction as an endeavor in itself (as opposed to explanation). In this spirit, I offer five predictions about the future of the academy in 2032:
  1. First, despite opposition from many established institutions, there will be an enormous increase in open-source education. Classes on any topic will be available online for free, with lecture notes, videos, presentations, and chat services (with other students) available to anyone with a computer. Exemplars of this trend include MIT OpenCourseWare, Khan Academy, and
  2. Second, academic publishing will be increasingly online, with peer review a continuous process. Rather than books and articles published at one time in paper form after a process of peer review, academic projects will be ongoing, process-oriented, available online, and subjected to a continual process of peer review. In essence, everything that academics produce will be works-in-progress, and updated when errors are noted. Early indications of this trend include the NBER archive and
  3. Third, due to technological changes and increased monitoring of people's activity, academics will have to be adept at managing and analyzing big data. Common statistical methods will often be difficult to use on such large data sets, straining the computational capacities of computers. While not common in the academy yet, big data is one of the top buzzwords of 2012, and I expect this to spread to academic work relatively soon. An exemplar of this kind of academic work is the Google ngrams project. (One danger, however, is that private corporations might be hostile to information-sharing, and the values of profit-making may severely inhibit the availability of big data to academics.)
  4. Fourth, big ideas will actually be in greater demand in the future. Precisely because there will increasingly be an excess of information, grand theories and master narratives will be increasingly desired to help guide attention, avoid fragmentation of different research traditions, and unify otherwise disparate theories. For example, Josh Tenenbaum's efforts at unifying artificial intelligence (which suffers from disciplinary fragmentation) with probabilistic graphical models are a promising endeavor.
  5. Finally, the skills in demand will be increasingly modular rather than topical. For example, as part of the Cold War in the 1960s, the United States government funded various "area studies" programs to educate Americans on the traditions, customs, and practices of various geographic regions around the world. In the future, there will be less emphasis on this kind of topical knowledge, and greater emphasis on modular skills such as critical analysis of any kind of texts or arguments, understanding the basic structures of any set of languages, and gathering and analyzing various kinds of qualitative and quantitative data.
To the extent any of these predictions are correct, sociology is particularly well-suited to take advantage of these trends. Sociologists are generally supportive of the democratic, inclusive principles of open-source education and online publishing, and sociology has an unparalleled tradition of big ideas. Moreover, modularity is ingrained in the discipline; in fact, sociology is almost by definition a modular discipline, inasmuch as sociology is an approach to a particular subject matter rather than a particular subject matter per se.

Wednesday, April 11, 2012

The Quantified Self

This site on the quantified self shows a small but growing revolution: using quantitative data for self-improvement. I can only expect this to grow in importance. Despite their popularity, means, modes, and medians (in their conditional variants as well) simply capture central tendencies, and there is nearly always substantial heterogeneity within and across populations. Accordingly, basic proscriptions and prescriptions, such as "Take an aspirin a day," may not apply to all individuals, and thus individual tracking is potentially extremely useful. For example, see Seth Roberts's blog post on how eating butter might improve cognitive functioning (for him, at the very least).
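To make the point about central tendencies concrete, here is a minimal Python sketch. The people, scores, and the `individual_effects` helper are all hypothetical and made up for illustration; nothing here comes from the quantified-self site or from any real study. It shows how an average effect of zero can coexist with large individual effects in opposite directions.

```python
from statistics import mean

# Hypothetical daily scores (arbitrary units) for three people,
# before and after some intervention. All numbers are invented.
baseline = {"A": [10, 11, 9, 10], "B": [10, 10, 11, 9], "C": [10, 9, 10, 11]}
treated = {"A": [14, 15, 13, 14], "B": [10, 11, 10, 9], "C": [6, 5, 7, 6]}


def individual_effects(before, after):
    """Return each person's mean change (after minus before)."""
    return {person: mean(after[person]) - mean(before[person]) for person in before}


effects = individual_effects(baseline, treated)

# The population-level summary averages the individual changes together.
average_effect = mean(list(effects.values()))

print("individual effects:", effects)
print("average effect:", average_effect)
```

In this toy data the intervention helps A, does nothing for B, and hurts C, yet the average effect is zero, which is the kind of heterogeneity that individual tracking could reveal.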

Misc. Lectures Online

I highly recommend the following lectures for anyone interested in social science research using quantitative methods:
  • The late Sam Roweis (a brilliant educator who died unexpectedly several years ago) gives a superb introduction to machine learning and probabilistic graphical models here, complete with lecture slides. In case you aren't aware, probabilistic graphical models are in effect a unifying approach to a wide range of statistical models, from hidden Markov models to hierarchical Bayesian models.
  • Salman Khan, the MIT graduate who started the eponymous Khan Academy, offers a superb series of lectures on probability, available here. Probability is actually the foundation for quantitative research in the social sciences, since much of the goal of inference is to quantify uncertainty through the use of probability distributions such as the Gaussian, Poisson, Gamma, and so forth.
  • Although aimed at Python programmers, the computer scientist Allen Downey's lecture gives a thorough, intuitive, and entertaining overview of Bayesian analysis, which you can view in its entirety here.

Tuesday, April 10, 2012

Biplots in Stata

I've been examining qualitative data using biplots, which are readily available in Stata using Ulrich Kohler's excellent package. For example, here is a biplot of a rich data set of poor white men on variables such as drug use and other risk factors:
There are several useful features of biplots: first, they concisely summarize a wealth of information in one graph, including relationships among both cases and variables; second, in line with Tufte's dictum, biplots have a high data-ink ratio; third, since cases are not directly modeled, biplots help with integrating qualitative and quantitative data (i.e., cases are not "hidden" by a hyperplane, as in a classical linear regression model); finally, there are absolutely no frequentist statistics to deceive the analyst.

Wednesday, April 04, 2012

Top 5 Unsolved Sociological Questions

Physicists and other natural scientists often spend time specifying and focusing attention on unsolved questions, such as how particles obtain mass, the origins of dark matter, and how time is related to entropy. In general, I think it's a good practice for any field of endeavor, including sociology, to revisit the questions that remain stubbornly and perplexingly unsolved. Thus, in this spirit of refining our ignorance (and clarifying our sociological "known unknowns"), here is my list of the top unsolved sociological questions of the early 21st century:
  • What is causing the unprecedented, nearly-monotonic drop in crime rates across the developed world over the last several decades? As the NYT mentions, this question has been perplexing criminologists and sociologists, and everything from changing demographics to the legalization of abortion has been cited (although the latter cause is most probably incorrect, pace Steven Levitt).
  • Why are various forms of inequality increasing across the developed world, from Sweden to the United States, since the early 1970s? Although many sociologists and economists have focused on technological change, immigration rates, and de-unionization, deeper causes (such as those related to political institutions or social structures) remain largely unexplored.
  • Why do so many cultural and social phenomena (such as the frequency of words in the English language, size of cities across the globe, and amount of wealth across individuals) follow power-law distributions when plotted by size (or frequency) and rank? Explanations have focused on preferential attachment (popularly articulated by Herbert Simon) and information efficiency costs (as outlined by Benoit Mandelbrot), but thus far we have no conclusive evidence for favoring any particular mechanism over others.
  • How does culture (defined as values, norms, attitudes, and beliefs) result in different economic and political outcomes across groups? Since the time of Max Weber, the causal effect of culture on human behavior has baffled sociologists and other social scientists, in part because of the apparent intractability of measuring culture and clearly linking it to economic and political outcomes. As a result, answering this question is an open, fertile area of empirical and theoretical exploration.
  • Why is the United States unusually politically conservative and religious compared to other developed countries? At least since Tocqueville, sociologists, including the late Seymour Martin Lipset, have puzzled over why the United States has exhibited a kind of cultural "exceptionalism" (in the non-normative sense), with relatively high levels of religiosity and political conservatism. Although many explanations have been offered, a satisfactory account has remained stubbornly elusive.
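For readers unfamiliar with what "power-law when plotted by size and rank" means in the third question above, here is a minimal Python sketch. The data are artificial: the "city sizes" are constructed to follow an exact rank-size rule and are not real figures, and the `loglog_slope` helper is a hypothetical name introduced for illustration. Under a power law, the logarithm of size falls roughly linearly with the logarithm of rank, so a least-squares line on the log-log scale gives a rough estimate of the exponent.

```python
import math


def loglog_slope(sizes):
    """Least-squares slope of log(size) against log(rank), largest size first."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical "city sizes" built so that size is proportional to 1 / rank.
# These are not real data; they only illustrate the ideal case.
ideal_sizes = [1_000_000 / rank for rank in range(1, 101)]

slope = loglog_slope(ideal_sizes)
print("log-log slope:", slope)  # close to -1 for this constructed data
```

A roughly straight log-log line is only a first diagnostic, and it says nothing about which mechanism (preferential attachment, efficiency costs, or something else) produced the pattern, which is exactly why the question remains open.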

Tuesday, April 03, 2012

Making Books

This video makes me wonder how, although technology has innumerable benefits, some aspects of culture will be lost if we don't retain at least some working knowledge of older technologies:

Monday, April 02, 2012

The Limits of Formal Theory in Sociology

Sociologists and economists often disagree about the role of so-called "formal" theory in understanding social behavior. For the most part, sociologists are much more skeptical that mathematical models (with little reference to data) can clearly and accurately describe, explain, and predict how humans act, think, and feel. I take a middle-of-the-road position: such models of human behavior can be helpful for illuminating arguments, but often they are such crude approximations of reality that they can obscure what is actually going on. I'm reminded of Max Tegmark's brilliant article on the mathematical universe hypothesis, in which he claims that the universe is a giant mathematical structure. In fact, the disciplines can be understood in reference to derivations from known mathematical laws, as shown in this diagram:
The problem, as Tegmark suggests in this diagram, is that until we understand how to mathematically reconcile general relativity and quantum field theory, as well as how this reconciled theory is related to other areas of physics and related fields, mathematizing sociology will at best yield a set of (possibly crude) approximations of reality.