Macro- and Microdata Analyses and Their Integration, Nancy D. Ruggles and Richard Ruggles
Foreword by Edward N. Wolff
Edward Elgar Publishing, 1999

In this book, Nancy and Richard Ruggles demonstrate their unique grasp of the measurement and analysis of macro- and microdata and elucidate ways of integrating the two data sets.

Their analysis of macrodata is used to examine the economic growth of the United States from the 1920s to the present day. They focus particularly on recession and recovery between 1920 and 1974 and the measurement of short-run economic growth. They also examine the measurement of saving, investment, and capital formation in the United States. On a microeconomic level, they analyze economic intelligence in World War II, offer a study of fertility in the United States in the pre-war era, and analyze longitudinal establishment data. Finally, they integrate the two approaches to provide a method of obtaining a more complete picture of social and economic performance.

Nancy D. Ruggles was formerly a Senior Research Economist with the Institute for Economic and Social Policy Studies at Yale University, and Richard Ruggles is the Stanley Resor Professor of Economics at Yale University.

"Richard Ruggles, often assisted by Nancy Ruggles has been a major contributor to national income accounting and to the empirical study of microeconomics and macroeconomics using that and other data. He has focused on the quantitative analysis of actual economic systems in a discipline increasingly preoccupied with abstract pure conceptual models. Like the work of Simon Kuznets and others, Ruggles's analyses encompass an unusually wide range of variables."
--Warren J. Samuels, Michigan State University, USA

Foreword
Edward N. Wolff

The essays collected in this volume represent pioneering work by Nancy and Richard Ruggles on both the integration of micro- and macroaccounting data and the development of microdata. Four principal themes emerge in this volume.

The first is the reconciliation of macrodata with microdata. The development of national accounts was based on double-entry bookkeeping such that any product entry is matched with a corresponding income entry. The principal aim is to maintain consistency between the product and income accounts. Thus, for example, the definition of gross domestic product (GDP) on the product side consists of five components: household consumption, investment, government expenditures, exports, and imports (treated as a negative entry). However, GDP can also be defined on the income side as the sum of three components: employee compensation, corporate gross profits and indirect business taxes. One of the principal goals in constructing the national income and product accounts is to maintain equality between the two ways of defining GDP.
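The two definitions just described can be written compactly. The symbols below are labels chosen here for illustration, not notation from the volume, and the income side follows the simplified three-component listing given above:

```latex
\begin{align}
\text{GDP}_{\text{product}} &= C + I + G + X - M \\
\text{GDP}_{\text{income}}  &= W + \Pi + T \\
\text{GDP}_{\text{product}} &= \text{GDP}_{\text{income}}
\end{align}
```

Here $C$ is household consumption, $I$ investment, $G$ government expenditures, $X$ exports, $M$ imports, $W$ employee compensation, $\Pi$ corporate gross profits, and $T$ indirect business taxes. The third line is the consistency condition that the accounts are constructed to maintain.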

The national accounting system was developed during the 1930s and 1940s through the work of Simon Kuznets, Richard Stone, and Nancy and Richard Ruggles, among others. The 1950s and 1960s saw the development of major microdata sources first in the US and later in other OECD countries. Most of these were based on household surveys. In the case of the United States, this process began with the creation of samples from the US decennial censuses and the production of monthly samples from the Current Population Survey (CPS). Later, samples (and censuses) of enterprises in the economy were developed in the US, as well as censuses of governmental units.

In principle, the data contained in these new microdata sources should be consistent with the macrodata in the national accounts. For example, the sum of income entries in the March supplement of the CPS should match the corresponding entries of personal income in the national accounts; the sum of incomes and sales in the enterprise microdata should match the enterprise data in the national accounts; and the total for government expenditure in the government microdata should equal the corresponding entries in the national accounts. In practice, however, this was rarely the case and the two sources often produced rather disparate estimates. This is particularly true for the household sector for which estimates of total interest, dividends, transfer payments, and even wages and salaries derived from the two sources often differ.
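The consistency check described above can be sketched as follows. This is a minimal illustration only: the record layout, the `weight` field (the number of households each sample record is assumed to represent), and all figures are hypothetical and are not taken from the CPS or the national accounts.

```python
def micro_aggregate(records, item):
    """Weighted sum of one income item across sample records."""
    return sum(r["weight"] * r[item] for r in records)


def discrepancy(records, item, macro_total):
    """Return the micro aggregate and its relative gap from the macro total."""
    total = micro_aggregate(records, item)
    return total, (total - macro_total) / macro_total


# Hypothetical survey records; each stands for `weight` households.
sample = [
    {"weight": 1000, "wages": 30000, "interest": 200},
    {"weight": 2000, "wages": 50000, "interest": 0},
]

# Hypothetical national-accounts control total for wages and salaries.
wage_total, wage_gap = discrepancy(sample, "wages", 140_000_000)
print(wage_total, round(wage_gap, 4))
```

A negative gap means the survey aggregate falls short of the national-accounts figure for that item. The check itself says nothing about which of the two sources is closer to the truth.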

Three papers in this volume address this issue in great detail: ‘The Role of Microdata in National Economic Accounts’; ‘The Development of Integrated Data Bases for Social, Economic and Demographic Statistics’; and, particularly, ‘The Integration of Macro- and Microdata for the Household Sector’. Several requirements are put forward by Nancy and Richard Ruggles to fully integrate the two sources. First, the definition of sectors should be the same in the macrodata and microdata. For example, while in the household microdata only households are included, the macro ‘household’ accounts often include nonprofit institutions. Second, definitions and imputations should be consistent between the two sources. For example, while employer-financed pension contributions are recorded as part of employee compensation in the national accounts, this component does not normally appear in household microdata as part of compensation. Another example is that while the national accounts data include imputed rent to owner-occupied housing, this element is rarely present in household microdata. Third, alignment of macro- and microdata should be a two-way street. Though national accounting data are internally consistent (between the product and income sides), this does not imply that macrodata are necessarily superior to the corresponding microdata. For example, in national accounting data, the interest received by the household sector as personal income is computed as a residual from that of other sectors, whereas in microdata the household provides a direct estimate of interest received.

The second theme is the synthesis of microdata from several sources. The problem addressed by Nancy and Richard Ruggles is that individual data sources, particularly household surveys, can ask the respondent only a limited number of questions. This is due both to practical limitations on the length of interviews and to the cost of processing additional questions. As a result, different household surveys have concentrated on different kinds of household behavior. The CPS and the decennial census sources focus mainly on demographic characteristics of households and income receipts. The Consumer Expenditure Survey is very strong on consumption expenditure data but relatively weak on income and demographic details. The Federal Reserve Board Survey of Consumer Finances concentrates primarily on household asset and liability data but offers very little information on consumption expenditures.

Another problem is that different microdata sources may focus on different parts of the distribution. This is particularly the case with household data on income. For example, both the decennial census and the CPS focus mainly on the broad middle classes but are relatively weak on the lower tail (the bottom 10 percent) and the upper tail (the top 5 percent) of the income distribution. In contrast, the Internal Revenue Service Tax Model, a sample of tax returns stratified by income, contains detailed income data on the upper tail of the distribution. However, it contains very limited information on the bottom tail of the income distribution, since most of these families do not file tax returns. It also has very sparse information on the demographic characteristics of the individuals in its sample.

The solution proposed by Nancy and Richard Ruggles is a statistical match of microdata sources. The idea of statistical matching is to combine microdata files which are complementary in terms of the variables they contain or the parts of the distribution that they sample. This approach is developed in two articles in this volume. The first, ‘A Strategy for Merging and Matching Microdata Sets’, lays out one method for statistical matching. The basic procedure is to select common or overlapping variables found in the two data sets to be matched. The most important of these variables, called cohort variables, are chosen to be matched on an exact basis between the two data sets. These might include the type of household (married versus single), gender (in the case of singles), age, and race. The other overlapping variables, called ‘X variables’, are matched on the basis of pre-assigned intervals rather than on exact values. For example, a typical X variable is total family income, which could be matched within hundred-dollar or thousand-dollar intervals. The two files are first sorted on the basis of the cohort variables and then, within cohorts, on the basis of the X variables. The two files are then merged on the basis of their closest match.
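The sort-and-merge idea can be illustrated with a short sketch. It is a simplified stand-in rather than the procedure developed in the article: it uses a single X variable, allows a record in the second file to be matched more than once, and drops records whose cohort has no counterpart. All function names, field names, and values are hypothetical.

```python
from collections import defaultdict


def statistical_match(file_a, file_b, cohort_keys, x_key, interval):
    """Attach to each file_a record the closest file_b record in its cohort."""

    def cohort(rec):
        return tuple(rec[k] for k in cohort_keys)

    # Group file_b by exact cohort values, sorted on the X variable within each.
    groups = defaultdict(list)
    for rec in file_b:
        groups[cohort(rec)].append(rec)
    for recs in groups.values():
        recs.sort(key=lambda r: r[x_key])

    matched = []
    # Process file_a in cohort order, then by the X variable within cohorts.
    for rec in sorted(file_a, key=lambda r: (cohort(r), r[x_key])):
        candidates = groups.get(cohort(rec), [])
        if not candidates:
            continue  # no exact cohort match available (a simplification)
        bin_a = rec[x_key] // interval
        # Closest pre-assigned interval first; raw distance breaks ties.
        best = min(
            candidates,
            key=lambda r: (abs(r[x_key] // interval - bin_a),
                           abs(r[x_key] - rec[x_key])),
        )
        # Values from file_a win where the two records share a field.
        matched.append({**best, **rec})
    return matched


# Hypothetical records standing in for a census file and a tax file.
census = [
    {"id": "c1", "household": "married", "age": 40, "income": 12300, "children": 2},
    {"id": "c2", "household": "single", "age": 30, "income": 8100, "children": 0},
]
tax = [
    {"tax_id": "t1", "household": "married", "age": 40, "income": 12950, "dividends": 50},
    {"tax_id": "t2", "household": "married", "age": 40, "income": 20000, "dividends": 900},
    {"tax_id": "t3", "household": "single", "age": 30, "income": 8000, "dividends": 0},
]

matched = statistical_match(census, tax, ["household", "age"], "income", 1000)
for row in matched:
    print(row["id"], row["tax_id"], row["dividends"])
```

Each synthetic record carries the census household's demographic fields together with the tax-file detail of its nearest counterpart, which is the sense in which the two files complement each other.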

The second, ‘Merging Microdata: Rationale, Practice and Testing’, describes one such successful match carried out by the Ruggles between the 1970 Census of Population 1-in-1,000 Public Use Sample (PUS) and the 1969 Internal Revenue Service Tax Model (IRS). One problem was that the two files had different units of observation: the former was based on households and the latter on tax units. This problem was solved by assuming that all married couples filed joint returns and individuals, including unmarried adults in a household, filed single returns. The cohort variables (which were matched exactly) used in the statistical match were: (1) type of tax return; (2) sex of respondent (in the case of single returns); (3) race of head of household; (4) age of head of household; (5) number of children; and (6) owner-occupied home versus rental unit. The matching (or X) variables, for which statistical matching was done within intervals, were: (1) wage and salary income, (2) business earnings, (3) farm income and (4) total income. Results of the match are documented in considerable detail in the article. Statistical tests were also applied which indicated that reliable synthetic data sets could be constructed from the sort–merge matching procedure developed by the Ruggles.

The third theme is the creation of new longitudinal microdata sets. For the household sector there do exist ‘panel’ data sets which re-interview the same households on a year-by-year basis. A cross-sectional sample of households is selected in the base year of the survey. This sample is interviewed in successive waves over time to create a longitudinal data file. One such example is the Panel Study of Income Dynamics (PSID), which was developed and is maintained by the Survey Research Center of the University of Michigan. This data source has led to a rich body of literature, on such topics as income mobility, poverty spells, and income variability over time.

There also exist administrative records kept by the federal government which could form the basis of the creation of longitudinal data sets. Though they are not specifically designed as panel data, they do contain identification numbers that would allow the records to be matched on a year-by-year basis.

Nancy and Richard Ruggles, in their paper, ‘The Analysis of Longitudinal Establishment Data’, demonstrate how this can be done in the case of establishment records kept for the US Census of Manufactures. Every five years, the US Bureau of the Census conducts a census of all manufacturing plants in the US. It records data such as the number of employees, total sales, total profits, and total capital stock owned in the plant. It keeps records of these plants on the basis of an ID number. In noncensus years, a sample of establishments is re-interviewed (in the Annual Survey of Manufactures), and the same questions are asked. The combination of these two data sources can be used to develop a longitudinal data set of manufacturing establishments.
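Linking yearly records on a plant ID can be sketched as below. This is only an illustration of the linkage idea under stated assumptions: the field names, years, and figures are hypothetical, and the sketch ignores ownership changes and industry reclassification.

```python
from collections import defaultdict


def build_panel(yearly_files):
    """Link records across years: plant_id -> {year: record}."""
    panel = defaultdict(dict)
    for year, records in yearly_files.items():
        for rec in records:
            panel[rec["plant_id"]][year] = rec
    return dict(panel)


def entry_exit_candidates(panel, years):
    """Plants first observed after the first year or last observed before the last.

    These are only candidates: in noncensus years a plant may be absent
    simply because it was not drawn into that year's sample.
    """
    first, last = min(years), max(years)
    entries = sorted(pid for pid, hist in panel.items() if min(hist) > first)
    exits = sorted(pid for pid, hist in panel.items() if max(hist) < last)
    return entries, exits


# Hypothetical yearly establishment files keyed by year.
files = {
    1977: [{"plant_id": "A", "employees": 120}, {"plant_id": "B", "employees": 40}],
    1978: [{"plant_id": "A", "employees": 125}, {"plant_id": "C", "employees": 15}],
    1979: [{"plant_id": "A", "employees": 130}, {"plant_id": "C", "employees": 18}],
}

panel = build_panel(files)
entries, exits = entry_exit_candidates(panel, files.keys())
print(entries, exits)
```

Keying the panel on the plant ID keeps each establishment's history together, so year-to-year changes in employment or sales can be read off one plant at a time.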

The development of such a longitudinal data file is not as straightforward as it might appear. First, there are both births (new entries) and deaths (plant closures) which complicate the process of keeping track of such plants over time. Second, some firms may merge while others may divest themselves of one or more plants by selling them to another firm. Since it is important to keep track of firm ownership information for purposes of statistical analysis, this problem further complicates the development of a longitudinal data set. Third, a plant may shift in terms of the products it produces, thus resulting in a change in industry classification.

The pioneering work of Nancy and Richard Ruggles in this field produced a file which they called the LED (Longitudinal Establishment Data) file. The period covered was from 1974 to 1981. This file has now evolved into the Longitudinal Research Database (or LRD), which is maintained and updated annually by the Center for Economic Studies at the US Bureau of the Census. The data base is now available to a wide range of researchers and has already given rise to numerous studies. One example is the path-breaking book, Job Creation and Destruction, by Steven J. Davis, John C. Haltiwanger, and Scott Schuh (MIT Press, 1996).

The technique developed by the Ruggles can today be applied to other governmental data sources. One example is the social security records on individual workers maintained by the Social Security Administration. Annual reports of wage and salary earnings, as well as social security contributions, are easily identified by social security number and could be linked over time to create a longitudinal data set (this was done for a short period of time to create the so-called LEEDS file). Other possibilities are the personal income tax records and the corporate income tax returns of the Internal Revenue Service (IRS). Currently, the IRS provides researchers with a sample of annual tax returns for individuals. These could also be linked over time (again on the basis of the social security number) to create a longitudinal data set.

The fourth theme in this volume is the importance of institutional sectoring for the analysis of economic behavior. This problem is addressed in three papers, ‘Household and Enterprise Saving and Capital Formation in the United States, 1947–91: Market Transactions View’; ‘Accounting for Saving and Capital Formation in the United States, 1947–91’; and ‘The Integration of Macro- and Microdata for the Household Sector’. All three articles focus on the measurement of savings. Though most theories of savings, such as the permanent income hypothesis or the life cycle model, implicitly assume that all savings is done by households, Nancy and Richard Ruggles argue that savings is done by different institutions within the US. In particular, besides households, the enterprise sector itself (as well as governments) engages directly in savings behavior.

In their accounting scheme, they develop separate current and capital accounts for the household sector, the enterprise sector, and the government sector. Their most provocative finding is that the household sector and the enterprise sector are each self-financing. In other words, almost all the financial savings done by households is used to pay for household capital formation — particularly housing and consumer durables. On net, the household sector channels almost no financial savings to the enterprise sector. Conversely, almost all the capital formation done by enterprises is financed through enterprise savings — particularly undistributed gross profits. These sets of results have wide-ranging implications for modern theories of savings and investment.