Monday, September 22, 2008
Who Should Choose Your Cultural Identity?
While people claim that they abhor prejudice, they seem to find nothing wrong with imposing cultural behaviors and norms on their children. Implicit in this is the belief that there is something special about their culture which is to be preferred in the education of their children. Then they complain when others stereotype them.
Why should a child have to adopt the rituals and conventions of their parents? Instead of being required to practice these rites, why shouldn't they be free to choose their own? Why should I learn the music of my country if I prefer that of another time and place?
I understand that one has to teach children something. I also understand the motivation which makes the older generation pass on its norms. If your children don't keep the flame burning, then your life was for naught and the fact of your existence vanishes from the view of future generations. No one wants to be forgotten.
Nevertheless there are many reasons to oppose such teaching. It breeds separatism, dislike of others and narrow-mindedness. It also preserves old feuds between groups that have no bearing on those now living. What is the point of commemorating some distant victory if not to stick it to the other group that lost? Many cultures contain myths and falsehoods which have been passed down from less well-informed generations. These obsolete beliefs stifle progress by making questioning a social taboo. In some cultures such questioning can be severely punished, even by death.
Being forced to adopt the cultural norms of the group you were born into also encourages discrimination. If your parents brought you up to be an Irish-American, then that is what others will tend to identify you as. But suppose you prefer Spanish or French culture? Why should an accident of birth brand you involuntarily?
Some societies try to teach "multi-culturalism" believing that this will lessen prejudice. But the students still look at it from within the framework of their own background. It's like observing the strange natives on some anthropological expedition - curious, but not for me.
There have been some shifts in the US. For example, few people nowadays have the same attitudes towards Asian-Americans as existed 100 years ago. Many Asian-Americans carry little of the cultural baggage of their ancestors' home countries. They have become "white" Americans. I think a similar thing may be happening in the EU. Young people who travel from one home country to another tend to become more cosmopolitan and less provincial. Their ability to speak several languages also helps.
I realize that putting changes into practice is a near-impossible task, but I think it is more a case of changing attitudes than of actual steps. Children are always going to learn the culture and language that their parents speak (although second-generation immigrants tend to do this less, and by the third generation many can't speak their grandparents' original language). Still, cultural practices are passed down as part of the "heritage".
In the schools, multi-culturalism needs to be decoupled from ethnic distinctions and replaced with the teaching of more universal characteristics.
Many people grow up and explicitly reject their parents' background, but the prejudice of society still tries to force them into these categories. I claim that you are what your enemies call you. The harm can be most easily seen from the extreme example: the most acculturated and secular "Jewish" Germans still ended up in the ovens.
It is a sorry commentary on human nature if the only way people can define themselves is by pushing their prejudices onto their children.
Global Warming as an Economic Issue
As part of the ongoing debate about global warming, there have been various studies which have tried to cast this as a problem in economics. The most comprehensive of these was created by the British economist Nicholas Stern. The full report is available here along with various summaries.
For our purposes this is all that is required:
Using the results from formal economic models, the Review estimates that if we don’t act, the overall costs and risks of climate change will be equivalent to losing at least 5% of global GDP each year, now and forever. If a wider range of risks and impacts is taken into account, the estimates of damage could rise to 20% of GDP or more.
The way these numbers were arrived at was by doing a present value calculation. This is standard in economic discussions: how much needs to be saved now to provide the desired amount in the future.
One of the sharpest critics of Stern is the economist William Nordhaus, who has made a mini-industry out of commenting on the report. One of his fundamental complaints is that Stern uses an incorrect rate for discounting future costs and benefits. Here's one version of Nordhaus' explanation:
Based on historical studies and projections, the inflation-corrected return on investment has been in the range of 3 to 6 percent per year depending upon time period and risk. In my modeling, I have used a 4 percent discount rate. Applying this discount rate to the trust would lead you to propose a present payment of x = $39,204. Over two hundred years, as the interest on that sum is paid and compounded, the value of the trust would reach $100 million.
This is in reply to critics who disagreed with a book review in the New York Review of Books by physicist Freeman Dyson. Here's the review: The Question of Global Warming
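As a check on the arithmetic in Nordhaus's example above, here is a minimal sketch of the present-value calculation, assuming a constant 4 percent return with annual compounding; the dollar figures are the ones from the quoted passage.

```python
# Present value of a $100 million obligation due in 200 years,
# discounted at a constant 4% annual rate (Nordhaus's example).
future_cost = 100_000_000   # dollars needed 200 years from now
rate = 0.04                 # assumed constant real annual return
years = 200

present_value = future_cost / (1 + rate) ** years
print(f"deposit needed today: ${present_value:,.0f}")       # roughly $39,200

# The other direction: compound that deposit forward for 200 years.
future_value = present_value * (1 + rate) ** years
print(f"value after {years} years: ${future_value:,.0f}")   # back to $100,000,000
```

The entire argument rests on that single exponential term, which is why the choice of rate matters so much.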
There have been other approaches to this problem; one of the most novel is by economist Martin Weitzman: On Modeling and Interpreting the Economics of Catastrophic Climate Change (PDF), where he argues that the economic impact of a catastrophe is so great that, even if the chances of it are extremely remote, steps should be taken to avoid it.
Nordhaus has managed to shift the debate to one over some parameters in an economic model. I don't think such calculations (which even assume that compound interest will be the dominant financial growth mechanism 200 years hence) are a reliable measure for such long-range projections. There are some basic assumptions embedded in both papers. The idea that economies grow at an average rate has only been true in developed countries since the rise of the Industrial Revolution. It still isn't true in many parts of the undeveloped world, where societies tend to be relatively static (or were, until modern medicine started dropping the death rate faster than the birth rate declined). There is no reason to expect that a trend that has affected 25% of the world's population for only 300 years will continue to be the norm for the next 200 years.
Second, the idea of compound interest, which is the basis of calculating present value, only makes sense in a capitalist economy. In this type of system money is invested and interest is required in return. Some societies still prohibit the charging of interest. In addition, compounding implies that interest received periodically can be reinvested on the same terms as the original capital. This assumes continual growth, something which is not a foregone conclusion as we enter an era of resource shortages.
Without assumed continual growth, both arguments fall apart. We can predict nothing about how money set aside now will be used in 200 years, or even whether it is possible to preserve capital over such a long period of time. The wealth of the French aristocracy didn't last. The British taxed away the wealth of the landed gentry during the 20th century. Things considered of high value in one era have become valueless in another. Deciding where to put the money that is supposed to fund the amelioration of climate change in the distant future is not a simple task.
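To see how much rides on the parameter choices discussed above, here is a small sketch comparing the present value of the same hypothetical future cost under a few different constant discount rates. The rates and the $1 trillion figure are illustrative assumptions, not numbers taken from either report.

```python
# Present value of an assumed $1 trillion climate-related cost 200 years out,
# under several assumed constant discount rates. The point is the spread:
# the answers differ by several orders of magnitude.
future_cost = 1_000_000_000_000  # $1 trillion, illustrative
years = 200

for rate in (0.014, 0.03, 0.04, 0.06):
    pv = future_cost / (1 + rate) ** years
    print(f"discount rate {rate:.1%}: present value ${pv:,.0f}")
```

A shift of a couple of percentage points in the assumed rate changes the "rational" amount to spend today by factors of hundreds or more.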
A series of recent events has shown that there is value in taking steps now to address the risks we already face. The effects of the recent hurricanes, earthquakes and other natural disasters could all have been minimized by adequate civil engineering projects. Nordhaus thinks that combating the eventual effects of climate change now will only consist of reducing economic activity:
The current international approach in the Kyoto Protocol will be economically costly and have virtually no impact on climate change. In my view, the best approach is also one that is relatively simple - internationally harmonized carbon taxes.
In other words, a simple economic fix will provide an incentive towards efficiency. In fact, he opposes large-scale efforts altogether:
We should avoid thinking that we need a climate Manhattan Project to develop the key technology. It seems likely that new climate-friendly technologies will be the cumulative outcome of a multitude of inventions, many coming from small inventors, and originating in unrelated fields.
The best way to encourage the process of radical invention is to ensure an economic environment that is supportive of innovation and entrepreneurship.
This is not planning; it is wishful thinking. Large enterprises undertake focused R&D all the time; that's why they have research labs. When the task is too large, government funding is necessary. The idea that big ideas will occur spontaneously in someone's garage is hopelessly out of date. Societies have to decide on goals and then put the resources into achieving them. This can mean levees along the Gulf Coast, earthquake-resistant buildings, or looking for a cure for cancer. The libertarians in the US have promoted a free-market ideology that disfavors centralized planning. As a consequence, the only government central planning that takes place in the US is in the military sector. Corporations, of course, do central planning (that's why they exist), but their motive is to make a profit, not to save the world.
Even if it were possible to agree on a discount rate that would hold over the next 50 or 200 years, that still avoids facing the moral issues. Depending upon "entrepreneurship" to address social problems doesn't work; that's why we have destroyed cities and millions living in poverty. The moral course is to work to alleviate suffering now, not to hope that something will come along in the future.
Why can't libertarians understand that today's suffering can't wait?
Thursday, September 18, 2008
Too big to fail
As the "capitalists" in the GOP proceed with the largest nationalization in US history, it seems a good time to talk about market dominance.
In the past six months a half dozen financial firms have been effectively nationalized. To disguise this, a variety of means have been used, such as lending money to the acquiring public firm or, as with AIG yesterday, taking an ownership position via warrants.
In the UK, which has a tradition of nationalizing (and then re-privatizing), a takeover is done straightforwardly. That's what happened with the Northern Rock bank. But in the US we have to maintain the fiction that private enterprise does everything better than government, and therefore subsidies and bailouts have to be called something else.
The reason given for these actions is that these firms are "too big to fail". I claim that any firm that is too big to fail is just too big, period.
Since Reagan there has not been any semblance of anti-trust enforcement, and firms have been allowed to merge and buy each other up willy-nilly. For a while firms justified their actions by claiming that this would lead to improved efficiency, but the stock market usually sells off the stock of the acquiring firm. Investors know that the results will lead to lower profits. So why do they do it? I claim that CEOs do it as a way to keep score (as they do with their salaries). "I'm running a $XXX billion firm" is one-upmanship for the parasite class these days.
If a firm is too big to fail, then it is too big to exist.
There are two simple reforms (well, simple to state). First, firms cannot own other firms. The parents are just conglomerates and provide no added benefits. In fact top management can't even follow the details of all their subsidiaries.
Second, when a firm gets over a certain size it has to be split up. Look at the benefits from breaking up AT&T. It's true it was a monopoly, but once it was split new businesses emerged and we got everything from WiFi to cell phones. Now that the telecom business has been allowed to reconsolidate, the US has fallen behind in this area.
Not only are big firms inefficient, they wield too much political power and they control too big a segment of the economic pie.
The nationalization of failing firms has just made explicit what has been the reality for several decades now. The US is not a capitalist economy, it is a corporatist-syndicalist one similar to what Mussolini tried to set up. The parallels with the rise of an internal secret police function should give us pause as well.
Walmart may not have a traditional monopoly position in the retail market, but its size allows it to set the pace and distort the entire consumer space. It's not just that they force ethical firms to compete with an unethical one; it's that they accustom everyone to the idea that unethical business practices are acceptable and a path to financial success.
This is the "Christian" lesson that the Walton's are teaching the good people of middle America. We have become a nation of cynics as a result. Shame on them.
Econometrics as Signal Processing
The purpose of this essay is to explore how the techniques used in economic analysis compare to those used in other disciplines. As in other social sciences, data in economics is observational, rather than being obtained from controlled experiments. Analyzing such data can be cast as an information processing problem.
In this essay, I will draw a parallel between the two disciplines. The fundamental assumption is that any finite data set can be considered a "message" or "signal", embedded in noise. The noise in this case is additional information which is not relevant to the hypothesis under consideration. Extracting information from this "noise" then becomes analogous to techniques used in signal processing.
Shannon [1948] established the perspective that within a given message there is a fixed amount of (unknown) information. Extracting it requires not only finding it but recognizing it once found. Since no recovery method will be perfect, and since the noise will always corrupt the message in some fashion, there also needs to be a way to determine how close the information is to the source signal and how much residual distortion remains.
In conventional signal processing, such as that used for telephony or storing music or video, there are physiological and psychological criteria that have been developed experimentally which determine acceptability. Many modern compression schemes deliberately throw away data, but the message is still found satisfactory due to the relatively low requirements of people when receiving these sorts of messages. For example, audio compression is based around the limits of human hearing; information which is determined to be aesthetically unimportant is discarded to save space.
In the case of scientific observational data, there are well-established rules that define acceptability when the "message" is extracted. A common example is curve fitting, where goodness of fit is determined by least squares or a similar test. At the same time, measures of uncertainty such as the variance are generated. The use to which the message will be put determines whether a given level of confidence is acceptable or not. In controlled experiments, differences between control and experimental data are the message, and depending on confidence in the strength of the signal (as measured by a variety of standard statistical tests), the hypothesis may or may not be confirmed at the acceptable level of confidence.
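To make the curve-fitting case concrete, here is a minimal sketch: a straight line is fitted to noisy data by least squares and a goodness-of-fit figure is reported. The data are synthetic and purely for illustration.

```python
import numpy as np

# Synthetic observational data: a linear trend buried in noise.
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 2.0 * x + 5.0 + rng.normal(scale=10.0, size=x.size)

# Least-squares fit of a straight line (first-degree polynomial).
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

# Goodness of fit: the fraction of the variance explained (R^2).
residuals = y - fitted
r_squared = 1 - residuals.var() / y.var()
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

Whether an R^2 of, say, 0.9 is good enough depends entirely on what the fitted curve will be used for.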
There are many methods used to extract the message. I'll just mention a few.
1. Filtering. If the noise is known to be in one area while the signal is in another, filtering can be used to separate them. There are three popular types of filters: low-pass, high-pass and band-pass. Low-pass filters are used in telephony to eliminate high-frequency hiss above the speech range. High-pass filters are used to eliminate low-frequency hum from poorly shielded public address systems. Band-pass filters are used to capture a single radio station along the dial. Digital signal processing has made the implementation of much more complex filters feasible; filters can also adapt to a changing signal in real time. Such systems are used to eliminate feedback at live concerts.
2. Signal averaging. This is useful when the message is repeated. By gathering multiple copies of the message, the signal/noise ratio is increased. Astronomical observation uses this, as does radar processing. (A minimal numerical sketch of this method follows the list.)
3. Subtracting out noise. If the characteristics of noise can be fairly well determined the noise can be subtracted from the message. Modern digital cameras take a blank picture (which should only contain noise from the camera's sensors) and subtract it from the desired image.
4. Predictive reconstruction. Many messages tend to vary slowly, so a loss of part of the signal can be reconstructed from adjacent information, whether adjacent in space, time, frequency, or some other dimension. Digital TV sets have frame buffers which compare one frame to the next. If there is a loss of signal, prior frames are used to estimate the missing information. Since the criteria for acceptability of a moving image are low, this works well as long as the interruptions are relatively short. Humans use this when processing speech. Much speech is highly redundant, and missing a word or two can usually be compensated for by comparing the message to expectations of what the words should have been.
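Here is the promised sketch of signal averaging: the same "message" is observed many times in heavy noise, and averaging the copies recovers it. Everything here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# The underlying "message": one cycle of a sine wave.
t = np.linspace(0, 1, 200)
message = np.sin(2 * np.pi * t)

# 100 noisy observations of the same message; the noise is as large as the signal.
observations = message + rng.normal(scale=1.0, size=(100, t.size))

single_error = np.abs(observations[0] - message).mean()
averaged_error = np.abs(observations.mean(axis=0) - message).mean()

# Averaging N independent copies shrinks the noise by roughly sqrt(N).
print(f"mean error, single copy:      {single_error:.3f}")
print(f"mean error, 100-copy average: {averaged_error:.3f}")
```

The averaged error comes out roughly ten times smaller than the single-copy error, as the square-root rule predicts.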
There is a fundamental difference between the first method and the others. In the first case, the technique is to remove information from the data set. It is hoped that more irrelevant information (noise) is removed than message information. For example, when filtering a particular radio station out of the electro-magnetic spectrum, the result has less information than before, but filtering has made the signal more observable.
In the other cases, external information is fed into the system to make the data set larger. It is hoped that the extra information is relevant to the message. However, because of this addition, it is important to account for potential bias in the result.
Examining statistical methods through the lens of information theory
Let's look at some common statistical methods, especially as treated by econometrics. I can't cover all the techniques that have been developed, but once I have shown the pattern it should be possible to make the proper analogies between the two disciplines.
All techniques which subtract selected data are type 1, employing filtering to enhance the signal/noise ratio. A common approach is to use, say, a lagged three-month average when looking at employment data or the like. This is a form of low-pass filtering: short-term fluctuations are filtered out, leaving the desired long-wave signal - the longer trend.
Making seasonal adjustments to data is also a form of filtering, in this case closer to a notch (band-stop) filter. The repetitive shifts throughout the year are like hum: periodic (or quasi-periodic) enough to be removed at their known frequency.
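A minimal sketch of both operations on a synthetic monthly series (an assumed linear trend plus an annual seasonal swing plus noise): a trailing three-month average suppresses the short-term fluctuations, and subtracting each calendar month's average level removes the seasonal component.

```python
import numpy as np

rng = np.random.default_rng(2)

months = np.arange(120)                          # ten years of monthly data
trend = 100 + 0.5 * months                       # slow underlying trend
seasonal = 10 * np.sin(2 * np.pi * months / 12)  # annual cycle
series = trend + seasonal + rng.normal(scale=3.0, size=months.size)

# Trailing three-month moving average: a simple low-pass filter.
smoothed = np.convolve(series, np.ones(3) / 3, mode="valid")

# Crude seasonal adjustment: remove each calendar month's average deviation.
month_of_year = months % 12
monthly_means = np.array([series[month_of_year == m].mean() for m in range(12)])
adjusted = series - (monthly_means[month_of_year] - series.mean())

print(f"month-to-month variability, raw:      {np.diff(series).std():.2f}")
print(f"month-to-month variability, smoothed: {np.diff(smoothed).std():.2f}")
print(f"month-to-month variability, adjusted: {np.diff(adjusted).std():.2f}")
```

In both cases information is being discarded; the hope is that what is removed is mostly noise and seasonality rather than the trend of interest.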
Sliding windows and sub-sampling are type 2, signal averaging. A new data set is collected which is similar to the prior one in the important respects and various averaging techniques are used to remove the differences leaving the desired information. Whether old samples are dropped off the end or the sample size is increased should depend upon some knowledge of how much the message is changing. Out-of-sample forecasting is also a type of signal averaging. The new samples can be considered a new instance of the message.
Techniques using dummy variables are type 3, analogous to subtracting out noise. There is "information" present which is known to be irrelevant and the characteristics are also reasonably well understood so that it can be well described. The dummy variables can be used to "subtract" this from the calculations.
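A minimal sketch of the dummy-variable case: a known but irrelevant one-off level shift in a synthetic series is absorbed by an indicator regressor so that it does not contaminate the estimate of the trend. The shift and its timing are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 100
t = np.arange(n, dtype=float)
# Synthetic data: a trend of 0.5 per period, a known one-off level shift
# of 20 in the second half, and noise.
y = 0.5 * t + 20.0 * (t >= 50) + rng.normal(scale=2.0, size=n)

# Regression without the dummy: the level shift leaks into the trend estimate.
X_naive = np.column_stack([np.ones(n), t])
beta_naive, *_ = np.linalg.lstsq(X_naive, y, rcond=None)

# Regression with a dummy variable marking the shifted regime.
dummy = (t >= 50).astype(float)
X_dummy = np.column_stack([np.ones(n), t, dummy])
beta_dummy, *_ = np.linalg.lstsq(X_dummy, y, rcond=None)

print(f"trend estimate without dummy: {beta_naive[1]:.3f}")
print(f"trend estimate with dummy:    {beta_dummy[1]:.3f}   (true value 0.5)")
```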
Bayesian techniques are type 4, a variety of predictive reconstruction. The idea is that there is some information external to the data set which is known independently about the environment. Adding this in improves the signal/noise ratio. An event that occurs in the data may be strongly correlated with an event that is not; for example, a data set tracking the relation between river flooding and rainfall may not include riverbed construction events, but the correlation between the two makes it appropriate to introduce this additional information. Adding in the 'missing' information adds only a small amount of noise (the inverse of the correlation), and may improve the resulting signal. Similarly, using an independently derived model, whether hypothetical or based upon previous cases, adds information to the system. If the extracted message conforms closely to the model, then it increases the probability that it is the "right" message. However, since the difference between the predictive model and reality can be difficult to estimate, expectations may distort the results.
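A minimal sketch of the Bayesian case, with made-up numbers: knowledge external to the data set enters as a prior, and the observed data update it. The Beta prior and the counts below are illustrative assumptions only.

```python
# Bayesian updating of an unknown event rate with a conjugate Beta prior.
# External knowledge (not in the data set): the rate is believed to be
# around 30%, encoded here as an assumed Beta(3, 7) prior.
prior_a, prior_b = 3, 7

# The data set itself: 4 occurrences observed in 10 trials.
occurrences, trials = 4, 10

# The posterior combines the external information with the data.
post_a = prior_a + occurrences
post_b = prior_b + (trials - occurrences)
posterior_mean = post_a / (post_a + post_b)

print(f"estimate from the data alone:  {occurrences / trials:.2f}")
print(f"posterior mean (data + prior): {posterior_mean:.2f}")
```

If the prior is well founded, the combined estimate is better behaved than the raw data; if it is not, the prior simply injects the analyst's expectations into the result, which is exactly the risk noted above.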
As with signal processing, there is a limit to how much information can be extracted from any "message": the entropy of the system. Another analogy will be useful. Many people have seen the crime shows on TV where the detective turns to the technician and asks, about an image on the screen, "Can you enhance that?" Poof! The evildoer is revealed. In the real world there is a simple law of optics, identical in formulation to Shannon's measurement of information complexity, which determines the resolution of a given image. Attempting to enhance images beyond this Nyquist limit produces no additional information. Blowing the image up or sharpening it may make it easier to view, but it doesn't add any new information. In this case the absolute limit is determined by the diameter of the lens (assuming it has no other aberrations) and the wavelength of light. To get more detail you need to change one or the other. That's why electron microscopes don't use light to resolve fine detail, instead using electrons (which have a much smaller wavelength).
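A small numeric sketch of that optical limit, using the standard Rayleigh criterion for a circular aperture; the lens diameter and wavelength are arbitrary example values.

```python
import math

wavelength = 550e-9   # green light, in metres (example value)
diameter = 0.05       # 50 mm lens aperture, in metres (example value)

# Rayleigh criterion: the smallest resolvable angular separation, in radians.
theta = 1.22 * wavelength / diameter

# Sampling the image much more finely than this (beyond roughly two samples
# per resolvable element, the Nyquist criterion) adds no new information.
print(f"diffraction-limited resolution: {theta:.2e} radians")
print(f"which is about {math.degrees(theta) * 3600:.2f} arc-seconds")
```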
The same thing is true with finite data sets. There is only so much information available within the data set. It is tempting to try to find patterns that conform to desired expectations, but this is done at the expense of certainty. A common case of this is when a small number of current data points are combined with model data to extrapolate future results, as is routinely done in, e.g., population predictions. Both of these methods (introducing a model and extrapolating) introduce uncertainty and the possibility of bias.
However, in many cases the claims made are not subjected to a rigorous enough error analysis, and low-confidence results are presented as truth. Is it adequate to say that the results may be so and so with only a 75% confidence level? This is an important consideration, and matters when defining policy, but is outside the scope of information theory.
One of the issues about a data set is whether it is time-based or not. While there is often a distinction made between them, time-based and non-time-based data sets are equally amenable to information analysis. From an information point of view, time is just another dimension. For example, one could be trying to make a prediction on the effect of change in consumption of some commodity over time compared with some behavioral characteristic - say chocolate consumption vs weight gain. Using time-based analysis one could use a sliding window to create a series of data sets which incorporate out-of-sample material. This analysis is conceptually no different from gathering a data set from a specific geographic region, and then considering the out-of-sample data as coming from a different region.
Curve fitting is a popular technique in the social sciences, for both interpolation and extrapolation of data. It can also be used as a form of band-pass filtering. Any finite signal can be decomposed into a sum of orthogonal functions. The most common sets are the polynomials and the trigonometric functions (sine or cosine). For example, a popular method of finding out the acoustical properties of a performance space is to capture a sharp sound, as from a gunshot, and then decompose it into a harmonic series using a Fast Fourier Transform (FFT). This produces a frequency response curve and a reverberation curve for the space depending on the method used.
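A minimal sketch of the acoustic case: a synthetic impulse response (an exponentially decaying burst standing in for the recorded gunshot) is decomposed with an FFT to give a frequency response curve. The sample rate, decay constant and resonance frequencies are arbitrary example values.

```python
import numpy as np

sample_rate = 8000                       # Hz, example value
t = np.arange(0, 1.0, 1 / sample_rate)   # one second of "recording"

# Stand-in impulse response: a sharp attack with exponential decay,
# coloured by two room-like resonances at 200 Hz and 800 Hz.
impulse = np.exp(-3 * t) * (np.sin(2 * np.pi * 200 * t)
                            + 0.5 * np.sin(2 * np.pi * 800 * t))

# FFT decomposition: the magnitude spectrum is the frequency response
# of the simulated "room".
spectrum = np.abs(np.fft.rfft(impulse))
freqs = np.fft.rfftfreq(impulse.size, d=1 / sample_rate)

print(f"strongest resonance near {freqs[np.argmax(spectrum)]:.0f} Hz")
```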
This technique has been extended to the spatial domain, allowing for analysis of complex optical paths in lenses. Instead of the painstaking prior methods of establishing image quality, which required imaging a test chart of increasingly finer lines, a single slanted knife edge is used and the resulting image is scanned digitally. The knife edge is equivalent to the acoustic spike; both are step functions. The resulting image is decomposed using an FFT into a series of spatial frequencies which translate into resolving power and contrast at each frequency. I'm not aware of an econometric equivalent of impulse or step signal testing.
Recognizing the signal (or lack thereof)
Another issue is that the desired message may not be in the data set, or may not be complete. This is not the same as a simple data loss or a sampling error, but is like looking for the needle in the wrong haystack. To go back to my chocolate example, perhaps chocolate is a factor, but it is peanut butter which most influences weight. We have measured the wrong thing, and while we may get a correlation it may not be the most important one. Recognizing the message is as vital as finding it.
I think this problem is much more common in the social sciences than is appreciated. With so many factors present in the real world, one has to make assumptions about what to measure or even what one can measure. This involves a bit of assuming the answer, and Bayesian statistics won't help if the possible choices are all bad ones.
Drug testing suffers from the needle-in-the-haystack problem. Many new drugs only affect a small percentage of people with a given condition. In order to observe the effect the sample size must be large enough; this is related to the number needed to treat (NNT). Suppose a given statin helps prevent heart attacks in three out of 100 people who take it; then there is a reasonably large chance that a sample of only 100 will show no positive effect. The inverse case is even worse. Suppose, at the same time, the drug adversely affects one in 1000. The positive effect will be found in a sample of 1000, with a good degree of confidence, but not the side effects. Incorrect choice of the population size can introduce difficult-to-detect bias, such as by selecting a population size unlikely to exhibit rare but serious undesired effects. However, if the 'cost' of these rare events is sufficiently high, the results of the experiment may not lead to an acceptable real-world policy. This has led to serious problems, such as the recall of the drug Vioxx due to rare but potentially lethal side effects. In social sciences limited to observational data, the experimenter does not get to select the population size, and so this effect may occur involuntarily; the analyst may not even consider it a source of bias. The SETI project, which is looking for signs of extra-terrestrial life, uses advanced signal processing techniques but suffers from the same problem.
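A small calculation in the spirit of that example, treating the quoted rates as simple independent per-person probabilities (an assumption made only for illustration): it shows how often a rare adverse effect would simply never appear at all in a trial of a given size.

```python
# Probability that a 1-in-1000 adverse effect is never observed in a trial
# of n people, assuming an independent per-person risk of 1/1000.
adverse_rate = 1 / 1000

for n in (100, 1000, 10000):
    p_never_seen = (1 - adverse_rate) ** n
    print(f"n = {n:>5}: chance the side effect never shows up = {p_never_seen:.1%}")
```

Even a 1000-person trial has better than a one-in-three chance of never seeing the side effect at all, and a single observed case would prove nothing on its own.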
Summary
The problem of extracting a coherent "message" from social science data is not fundamentally different from the same task in other disciplines. As with many fields of study, insularity and the development of a unique terminology have made knowledge transference less efficient than it might be.
These techniques can be used to reveal the possibility of hidden bias in findings, in the form of signal added by the analyst. This hidden bias may be an inadvertent product of analysis, or may have been introduced consciously or unconsciously as a result of a specific agenda. Analysts in some sciences, such as physics, are held to a high standard of impartiality because of the paradoxical situation where those closest to the data are the most likely to possess an unconscious bias toward the outcome. This makes it very difficult for data analysts to see or believe that they may be introducing bias. Practices such as clarity of methods, sharing of data, and independent verification are crucial to producing truly scientific results.
Identifying and eliminating bias in the social sciences is crucial. Because the results of analysis in the social sciences, especially economics and sociology, are used to understand civil and fiscal issues, and construct public policy, errors or bias in social science findings can affect the lives and fortunes of millions of people. However, the difficulty of identifying bias leads to abuses, as those with a predefined agenda misuse the techniques to obtain results which support their position. The most common misuses seem to involve biased selection of data, either in the type selected or in the range used for analysis. This is a type of filtering, and can completely distort or eliminate the original signal; but because of the nature of the data tampering, it is invisible unless observers have access to the original, unfiltered data set.
Some professions are diligent in monitoring this type of scientific misconduct, but this diligence is less visible, and perhaps even more important, in the social sciences. Because of the difficulty of running independent, confirmatory experiments, social science results must be scrutinized even more carefully than those in sciences more amenable to experimental exploration.
There have also been disputes over methodology, but these usually center on the choices one makes for various parameters. The fact that some ideologues claim more certainty than is warranted doesn't help either, and by association blackens the reputation of those who are more scrupulous. It may be that many of the ideologues are employed by organizations which profess the same viewpoints, making attempts at censure difficult to enforce, but this only highlights the importance of accurate, unbiased results in the social sciences.
Thus we have seen that econometrics and the statistical analysis used in the social sciences are really equivalent to other information processing techniques. Advanced modeling is simply a way to add in external information believed to be relevant, and the type and complexity of the model do not determine how much this additional signal improves the reliability of the analysis.