Data Resources: What is Data?

Information on how to find, use, and cite numeric data resources or data sets.

What is Data?


Data are "facts or information used usually to calculate, analyze, or plan something." Source: Merriam-Webster Dictionary.

Data sets (also written "datasets") or data resources are raw data files. Data sets usually come with related files, typically a codebook and setup files. A codebook is a guide for making sense of the raw data; it contains the questionnaire and the values for the responses to each question. Setup files allow users to read the raw text files into statistical software packages.
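
As a rough illustration of how these files fit together, the sketch below splits lines of a fixed-width raw data file using column positions of the kind a setup file would supply, then labels the coded responses using a codebook. The variable names, column positions, and response codes are all invented for this example and do not come from any real data set.

```python
# Hypothetical raw data: one respondent per line, in fixed-width columns.
RAW_LINES = [
    "0001342",
    "0002271",
    "0003459",
]

# Hypothetical setup information: variable name -> (start, end) positions.
SETUP = {
    "respondent_id": (0, 4),
    "age": (4, 6),
    "employment": (6, 7),
}

# Hypothetical codebook: what each response code means.
CODEBOOK = {
    "employment": {"1": "Employed", "2": "Unemployed", "9": "No answer"},
}


def read_record(line):
    """Split one raw line into named variables and label coded values."""
    record = {}
    for name, (start, end) in SETUP.items():
        value = line[start:end]
        # Replace the code with its label when the codebook defines one.
        labels = CODEBOOK.get(name)
        record[name] = labels.get(value, value) if labels else value
    return record


records = [read_record(line) for line in RAW_LINES]
for record in records:
    print(record)
```

In practice, statistical software packages do this work when a setup file is run; the point here is only that the raw file is unreadable without the positions and the codes.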

Because many data sets are openly available, different researchers can use the same data set for different purposes, analyzing the data statistically to show relationships among variables.


Micro data vs. Macro data

Micro and macro data are often used to differentiate data used in social science research.

Micro data is individual-level data. It is often collected from each individual through a survey or interview. In a micro data set, each row represents an individual person and each column an attribute such as age, gender, or job type.

'Macro data' is a term used mainly to describe two subtypes of data: aggregated data and system-level data. Aggregated data is constructed by combining information about lower-level units or individuals. Examples of aggregated data include summaries of the properties of individuals, unemployment statistics, demographics, and GDP. System-level data provides information about properties of the state or the political system. This type of data forms political indicators, such as institutional variables and regime indices. System-level macro data is not based on summaries of the properties of lower-level units; instead, it measures characteristics of the higher-level units themselves. Source: The MacroData Guide.
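
The step from micro data to aggregated data can be sketched in a few lines. The example below starts from a small, made-up micro data set (one row per person) and combines the rows into one summary per region. The people, regions, and values are hypothetical, and the "share not employed" figure is a simplification rather than an official unemployment rate.

```python
from collections import defaultdict

# Hypothetical micro data: one row per person.
micro_data = [
    {"person_id": 1, "region": "North", "age": 34, "employed": True},
    {"person_id": 2, "region": "North", "age": 51, "employed": False},
    {"person_id": 3, "region": "South", "age": 27, "employed": True},
    {"person_id": 4, "region": "South", "age": 45, "employed": True},
    {"person_id": 5, "region": "North", "age": 39, "employed": True},
]

# Group the individual rows by region.
by_region = defaultdict(list)
for person in micro_data:
    by_region[person["region"]].append(person)

# Summarize each group: the result has one row per region, not per person.
aggregated = {}
for region, people in by_region.items():
    not_employed = sum(1 for p in people if not p["employed"])
    aggregated[region] = {
        "n_people": len(people),
        "share_not_employed": not_employed / len(people),
        "mean_age": sum(p["age"] for p in people) / len(people),
    }

for region, summary in sorted(aggregated.items()):
    print(region, summary)
```

System-level data could not be produced this way, since it describes the higher-level unit directly rather than summarizing the individuals within it.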



Cross-sectional data is collected only once, at a single point in time.


Time Series

Time series data tracks the same variable over time. The National Health Interview Survey is an example of time series data because the questions generally remain the same over time, even though the individual respondents vary.


Longitudinal Studies

Longitudinal studies are surveys conducted repeatedly in which the same group of respondents is surveyed each time. This allows researchers to examine changes over the life course. The Project on Human Development in Chicago Neighborhoods (PHDCN) Series contains a longitudinal component that tracks changes in the lives of individuals over time through interviews. Source: What is Data?
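
One way to see the difference between a repeated survey with new respondents each time and a longitudinal study is to check whether the same respondent IDs recur across survey waves. The sketch below does this for two made-up studies. The years and ID numbers are hypothetical, and the rule used (at least one respondent present in every wave) is a deliberately simple assumption; real longitudinal studies lose and sometimes add respondents over time.

```python
# Hypothetical survey waves: each wave lists the respondent IDs interviewed.
repeated_survey = {
    2019: {101, 102, 103},
    2020: {201, 202, 203},
    2021: {301, 302, 303},
}
panel_study = {
    2019: {101, 102, 103},
    2020: {101, 102, 103},
    2021: {101, 102, 103},
}


def followed_respondents(waves):
    """Return the IDs that appear in every wave of the study."""
    id_sets = list(waves.values())
    return set.intersection(*id_sets)


def is_longitudinal(waves):
    """Treat a study as longitudinal if some respondents appear in all waves."""
    return len(followed_respondents(waves)) > 0


print(is_longitudinal(repeated_survey))
print(is_longitudinal(panel_study))
```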

Image source:  Fleshas.  Data servers.  GNU Free.  Wikimedia Commons.