26 Free Dataset Listings for Predictive Analytics


For those interested in honing their analytical skills, finding new research subjects, and/or testing the performance of their apps and models, this is a list of websites with links to (mostly) free datasets:

100+ interesting data sets for statistics: A list and summary of datasets thoroughly and sometimes humorously described.

Amazon Web Services Public Datasets: A centralized repository that can be integrated into AWS cloud-based applications. Although it hosts the public data sets at no charge, users pay for the compute and storage they use for their own applications. Includes data on climate, microbiomes, songs, books, and much more.

American FactFinder: From the U.S. Census Bureau, this includes data from a range of public surveys, such as the American Housing Survey, Annual Economic Surveys, and Equal Employment Opportunity Tabulation.

Bank of England: The Bank's statistics publication, informally known as Bankstats, contains data on UK money and lending, monetary financial institutions’ balance sheets, public sector debt, and more.

BigDataMadeSimple: Lists over 70 websites that offer large data repositories.

CERN Open Data Portal: Access point to a growing range of data produced through research at CERN, the European Organization for Nuclear Research. It disseminates the preserved output from various research activities, including the accompanying software and documentation needed to understand and analyse the data being shared.

Data and Story Library: Contains data files that can be browsed by method, topic, and subject, or found with the search function. An abstract describes each dataset.

Data.gov: The home of the U.S. government’s open data. Contains data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more.

Datasets for classroom use: From Stetson University’s Dr. John Rasp’s statistics website.

Economic News Network: UK network that aims to enhance the quality of economics learning and teaching. Includes links to global information sites as well as datasets such as IMF statistics.

Eurostat: The statistical office of the European Union, Eurostat publishes data on a wide range of subjects such as population, business, economics, agriculture, and health. Also see Economic and Financial Affairs data.

GitHub: The Git repository hosting service contains this list of public data sources separated into topics such as Biology, Education, Physics, Sports and Transportation.

Houghton Mifflin: Dropdown menu with datasets for specific types of statistical analyses.

James Madison University Libraries: Contains various statistics and data sources, though some are only available to students at James Madison.

Journal of Statistical Education: From the University of Florida, this includes datasets geared toward multiple statistical tests, such as various types of regressions and ANOVAs.

Kaggle: Founded as a platform for predictive modelling and analytics competitions, Kaggle hosts a variety of datasets, from deaths and battles in the Game of Thrones books to world university rankings.

KDnuggets: A list of data repositories from a popular site covering business analytics, big data, data mining, and data science.

Keel-dataset: A listing of hundreds of datasets along with experimental studies that have used those datasets. Keel stands for Knowledge Extraction based on Evolutionary Learning.

Microsoft Azure Marketplace: Azure, a cloud computing platform and infrastructure, offers links to many free datasets, though other datasets are for sale.

Office of Open Government Select Datasets: From the U.S. Social Security Administration, these datasets are about people, including their wages, employers, and more. For a complete listing of their data assets, go to www.ssa.gov/data.

OECD: From the Organisation for Economic Co-operation and Development, these datasets can be browsed by country or by topic, including agriculture, economy, education, environment, and others.

RDataMining: A list of free datasets from a website devoted to data mining in R, an open-source statistical programming language.

r-dir: A list of free datasets from a website devoted to reference materials for the open source software package R.

StatCrunch: Datasets shared by members of StatCrunch, a web-based statistical software package.

Statsci.org: A portal for statistical science, this website contains not only datasets but also sources of raw data and a list of textbooks. It also offers datasets organized by specific types of statistical tests.

The World Bank: Provides free and open access to data about development in countries around the globe. Data can be searched or browsed by country, topic, and indicator.




