Description: Clean data related to the housing crisis
Public Clone URL: git://github.com/hadley/data-housing-crisis.git
git clone git://github.com/hadley/data-housing-crisis.git
name | age | message
---|---|---
.gitignore | Tue Jul 21 14:18:38 -0700 2009 | Wrote readme summaries of most of my findings [garrettgman]
communication/ | Thu Jul 23 12:54:08 -0700 2009 | Minor tweaks to connexions modules [hadley]
data/ | Mon Jul 20 12:15:58 -0700 2009 | moved all of the export (or most of them). FIL... [bigbear]
exploration/ | Thu Jul 23 14:23:56 -0700 2009 | Wrote read me for usps vacancy folder. fixed li... [garrettgman]
readme.md | Thu Jul 23 11:11:41 -0700 2009 | exploration [DexReid]
Project Overview
This research group is a collaboration between Rice University undergraduates, graduate students, and a professor from the Statistics Department. It is funded by the NSF's VIGRE (Vertically Integrated Grants for Research and Education in Mathematical Sciences) program. We hope to foster research on the US housing crisis by creating an easily accessible repository of data and findings. We also hope to pioneer efficient ways to display information collected from data, and to inspire others to contribute data and analysis ideas.
Data related to the housing crisis exists in large, independent, and often messy data sets. So far, we have worked with subsets as large as 10 GB. The variety and size of the data create an obstacle to effective analysis. Our first task after locating a new data source is to make it consistent with our existing data structures. We must also screen it for correctness, completeness, and conciseness. To facilitate sharing data, we have conducted both data cleaning and analysis with the open source statistical software R, which is available free of charge at www.r-project.org. We've made both the data and the programming code available to the public through www.github.com. We hope that by keeping the code transparent and reproducible, we make it easy for others to build on our work.
We would like to develop a website that will let users easily access the data they are interested in, which is otherwise a daunting task with a data set of this size. Because our analysis and findings also involve large amounts of information (such as construction price time series for each US metropolitan area), we are exploring interactive graphical methods for displaying this information.
The US housing crisis has undermined the world economy in wide-reaching and poorly understood ways. Although there is a lot of speculation about the causes and effects of the housing crisis, most of these ideas come from opinionated blogs or news articles that do not list their sources. This lack of data becomes perilous as the US government invests trillions of dollars based on untested hypotheses concerning the crisis. We hope to promote well-informed policy and discussion by making it easy to collect useful information about the housing crisis.
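The screening step described above can be illustrated with a minimal sketch. The project's own cleaning code is written in R; the Python below is only an illustration, and the function name `screen_rows` and the column names (`fips`, `year`, `units`) are hypothetical rather than taken from the repository. It flags rows with missing required fields (completeness) and exact duplicate rows (conciseness).

```python
def screen_rows(rows, required):
    """Return indices of rows with missing required fields and of exact duplicates."""
    missing, duplicates, seen = [], [], set()
    for i, row in enumerate(rows):
        # Completeness: a required field is absent, None, or an empty string.
        if any(row.get(key) in (None, "") for key in required):
            missing.append(i)
        # Conciseness: the same key/value pairs have already been seen.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicates.append(i)
        seen.add(fingerprint)
    return {"missing": missing, "duplicates": duplicates}


# Hypothetical records, for illustration only.
rows = [
    {"fips": "48201", "year": 2006, "units": 1500},
    {"fips": "48201", "year": 2006, "units": 1500},
    {"fips": "01001", "year": 2006, "units": None},
]
report = screen_rows(rows, required=["fips", "year", "units"])
print(report)
```

Checks for correctness (for example, plausible value ranges) depend on the individual data source and are not sketched here.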
Data Set Overview
American Community Survey (ACS): The ACS is a 1% sample of all houses in the USA. It is collected yearly and contains 1,293,393 housing records covering 2,946,342 people. We extracted the data related to migration and second-home information.
Case-Shiller Housing Price Index: The Case-Shiller HPI measures the average change in prices for single-family homes and is calculated monthly. This data covers 20 Metropolitan Statistical Areas (MSAs). See below for the difference between the Case-Shiller and Federal Housing Finance Agency house price indices.
Census Housing Units: This data set is from the U.S. Census. It records the number of housing units by state and county. The data covers years 2000 through 2007. The number of housing units in 2000 is from the 2000 census. For every year after, the estimate is based on July 1st of that year. For a definition of housing unit, please see "Terms" below.
Census Population: This data set is also from the U.S. Census. The cleaned data set contains population information for U.S. Core Based Statistical Areas (CBSAs), including population estimates, deaths, births, and migration. The data covers years 2001 through 2008. For a definition of CBSA, please see "Terms" below.
Census QuickFacts: This data set contains basic demographic and geographical information. It includes information for all states and counties, and for cities with more than 25,000 people.
City Location: The clean data set contains information for all major cities in every state. It includes longitude and latitude, and the multiple location codes used throughout the data set. Please see “Terms” for the list and definition of the different types of locations used.
Construction Housing Units: This information was taken from the U.S. Census. It records new residential construction in Metropolitan Statistical Areas. The data is collected monthly and covers years 2000 through 2009.
County Labor Force: This data set is from the U.S. Bureau of Labor Statistics. It contains information on labor force and unemployment for every state and county. It covers years 2000 through 2008.
Federal Housing Finance Agency (FHFA) House Price Index: This index measures the movement of single-family house prices. It is based on repeat mortgage transactions on properties whose mortgages were purchased or securitized by Fannie Mae or Freddie Mac. This data is collected quarterly and covers years 2000 through 2009.
Metropolitan Gross Domestic Product (GDP): This data set is from the U.S. Bureau of Economic Analysis. It records how much money has been made from specific industries such as agriculture and manufacturing. The data covers Metropolitan Statistical Areas and is collected yearly (2001-2006).
Market Rents: This data is from the U.S. Department of Housing and Urban Development. It records the median monthly rent for Metropolitan Statistical Areas and counties. This data covers years 2003 through 2009.
Texas MSA Sales: This data was downloaded from the Texas A&M University Real Estate Center. It includes housing variables such as total number of houses sold, average house price, and average number of houses listed. It covers only Texas metropolitan areas as defined by the Real Estate Center. We have matched these to census Metropolitan Statistical Areas as closely as possible. This data was collected monthly from January 1990 to April 2009.
United States Postal Service (USPS) Vacancy: This data was downloaded from the U.S. Department of Housing and Urban Development. It includes the total number of addresses and the number of residential and business vacancies. This data was collected quarterly and covers all states and counties for years 2006 through 2009.
Terms
Locations:
- MSA (Metropolitan Statistical Area): The Census Bureau defines an MSA as one or more counties with a population of at least 50,000 people, plus adjacent territory that has a high degree of social and economic similarity.
- CBSA (Core Based Statistical Area): CBSAs include both MSAs and the newly created micropolitan areas.
- Micropolitan Area: The Census Bureau defines micropolitan areas as urban clusters of at least 10,000 and fewer than 50,000 people.
- PUMA (Public Use Microdata Area): PUMA consists of 5% of the population.
House Price Index (HPI): A scale representing the average value of specified prices relative to a reference figure. In our data set, the reference figure is usually the first recorded HPI value, so the index is computed as (current HPI / HPI at the index date) * 100.
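As a worked illustration of the rebasing formula in the House Price Index definition, here is a minimal sketch. It is written in Python for illustration (the project's own code is in R), the function name `rebase_index` is hypothetical, and the input values are made up rather than taken from any of the data sets.

```python
def rebase_index(values, base_position=0):
    """Rescale a price series so the value at base_position becomes 100."""
    base = values[base_position]
    if base == 0:
        raise ValueError("the reference value must be non-zero")
    # (current value / value at the index date) * 100
    return [100.0 * value / base for value in values]


# Made-up index values, with the first record used as the reference figure.
raw = [150.0, 165.0, 180.0, 135.0]
rebased = rebase_index(raw)
print(rebased)  # [100.0, 110.0, 120.0, 90.0]
```

A rebased value of 110 means prices are 10% above the reference date, and 90 means they are 10% below it.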
Gross Domestic Product: Amount of money a city's economy generates from a certain industry.
Housing Unit: A place where a single family lives (includes houses, apartments, condos etc.).
FIPS Code: "Federal Information Processing Standard" code. Every state, county, and region has a FIPS code.
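County FIPS codes combine a two-digit state code with a three-digit county code, and the leading zeros are easily lost when a file is read as numbers. The sketch below shows one way to build and split such codes. It is an illustration in Python (the project's own code is in R), and the helper names `county_fips` and `split_fips` are hypothetical.

```python
def county_fips(state_code, county_code):
    """Combine state and county codes into a zero-padded 5-character string."""
    return f"{int(state_code):02d}{int(county_code):03d}"


def split_fips(fips):
    """Split a county FIPS code into (state, county), restoring leading zeros."""
    text = str(fips).zfill(5)
    if len(text) != 5 or not text.isdigit():
        raise ValueError(f"not a 5-digit county FIPS code: {fips!r}")
    return text[:2], text[2:]


# Harris County, Texas: state 48, county 201.
print(county_fips(48, 201))   # 48201
# Autauga County, Alabama, after the leading zero was dropped on import.
print(split_fips(1001))       # ('01', '001')
```

Storing FIPS codes as strings rather than integers avoids the padding problem when joining data sets by location.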
How does the HPI differ from the S&P/Case-Shiller® Home Price indexes?
Although both indexes employ the same fundamental repeat-valuations approach, there are a number of data and methodology differences. Among the dissimilarities:
a. The S&P/Case-Shiller indexes only use purchase prices in index calibration, while the all-transactions HPI also includes refinance appraisals. FHFA’s purchase-only series is restricted to purchase prices, as are the S&P/Case-Shiller indexes.
b. FHFA’s valuation data are derived from conforming, conventional mortgages provided by Fannie Mae and Freddie Mac. The S&P/Case-Shiller indexes use information obtained from county assessor and recorder offices.
c. The S&P/Case-Shiller indexes are value-weighted, meaning that price trends for more expensive homes have greater influence on estimated price changes than other homes. FHFA’s index weights price trends equally for all properties.
d. The geographic coverage of the indexes differs. The S&P/Case-Shiller National Home Price Index, for example, does not have valuation data from 13 states. FHFA’s U.S. index is calculated using data from all states.
For details concerning these and other differences, consult the HPI Technical Description and the S&P/Case-Shiller methodology materials.
-Information above is taken directly from FHFA Questions
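The difference between value weighting and equal weighting (item c above) can be illustrated with a toy calculation. This is a simplified Python sketch and not the actual methodology of either index, both of which rely on repeat-valuation models. The function names and the two sale pairs are hypothetical.

```python
def equal_weighted_change(pairs):
    """Average the percentage change of each home, counting every home equally."""
    changes = [(new - old) / old for old, new in pairs]
    return sum(changes) / len(changes)


def value_weighted_change(pairs):
    """Weight each home's percentage change by its earlier price."""
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return (total_new - total_old) / total_old


# Hypothetical (earlier price, later price) pairs: a cheaper home that rose 10%
# and an expensive home whose price did not change.
sales = [(100_000, 110_000), (1_000_000, 1_000_000)]
print(f"equal-weighted: {equal_weighted_change(sales):.4f}")  # 0.0500
print(f"value-weighted: {value_weighted_change(sales):.4f}")  # 0.0091
```

With equal weights the two homes average to a 5% rise, while value weighting lets the flat, expensive home dominate and yields under 1%.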