

The PREDICT Repository

Welcome to PREDICT, the Protected Repository for the Defense of Infrastructure Against Cyber Threats. PREDICT can quickly and easily provide qualified developers and evaluators with regularly updated network operations data they can use in their cyber security research.

PREDICT is supported by the Department of Homeland Security, Science & Technology Directorate.

Our distributed repository of data hosts and providers, located at major universities and other venues, helps researchers generate repeatable results and saves them time and money by removing the need to invest in their own data collection and storage capacity.

Learn more about the repository by exploring the tabs above and browse our data catalog to see what kinds of data are available.


PREDICT Dataset Highlight

GT Malware Passive DNS Data Daily Feed
By Adam Allred
Georgia Tech

Each day, the Georgia Tech Information Security Center (GTISC) processes over 100,000 previously unseen, suspect Windows executable files. To derive network-level information that can help identify which of these files are malicious, each executable is run for a short period of time in a sterile, isolated environment with limited access to the Internet.

During processing, each executable’s use of the Domain Name System (DNS) is recorded in both raw (packet capture, or PCAP) and plaintext formats. The plaintext format, which contains a subset of information present in the PCAP files, is represented as a series of CSV files named according to the date on which a given set of executables was processed. Each file comprises a series of 3-tuples that provide the executable's MD5 hash, the qname (domain name) of the DNS query, and (if the query was of type A) a resolution IP address for the domain name. 
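As an illustration of the plaintext format described above, the following minimal Python sketch reads the 3-tuples from one daily file. It assumes comma-delimited rows with no header line, columns in the order given above (MD5 hash, qname, resolution IP), and an empty or missing third column when no resolution was recorded. These layout details, the DnsRecord and parse_records names, and the sample rows are illustrative assumptions, not part of the published dataset specification.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Optional


class DnsRecord(NamedTuple):
    """One row of the plaintext feed (hypothetical name, assumed layout)."""
    md5: str
    qname: str
    ip: Optional[str]  # None when no resolution IP is present in the row


def parse_records(lines: Iterable[str]) -> Iterator[DnsRecord]:
    """Yield one DnsRecord per non-empty CSV row."""
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        # Pad short rows so a missing third column becomes an empty string.
        md5, qname, ip = (row + ["", "", ""])[:3]
        yield DnsRecord(
            md5.strip().lower(),
            qname.strip().lower().rstrip("."),  # normalize case and trailing dot
            ip.strip() or None,
        )


# Made-up sample rows using documentation-reserved names and addresses.
sample = io.StringIO(
    "0123456789abcdef0123456789abcdef,example.com,192.0.2.10\n"
    "0123456789abcdef0123456789abcdef,mail.example.net,\n"
)
records = list(parse_records(sample))
for record in records:
    print(record)
```

To read a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="")`, where `path` points at one of the date-named CSV files.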

As of July 2015, a daily feed comprising both raw and plaintext versions of the data is available through DHS PREDICT. In aggregate, this information represents a special kind of passive DNS database for suspect and known malicious software, which GTISC believes will be useful for a variety of research and operational purposes.
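One way to picture the aggregate view is as an index keyed by domain name. The sketch below is only an illustration under stated assumptions: it takes (MD5 hash, qname, IP) tuples that have already been read from the plaintext files and groups them by domain. The build_index function and the sample rows are hypothetical and are not part of the feed or of any GTISC tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Row = Tuple[str, str, Optional[str]]  # (md5, qname, ip or None)


def build_index(rows: Iterable[Row]) -> Dict[str, Dict[str, Set[str]]]:
    """Group rows by domain, collecting the hashes and IPs seen for each."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"hashes": set(), "ips": set()}
    )
    for md5, qname, ip in rows:
        entry = index[qname]
        entry["hashes"].add(md5)
        if ip:  # rows without a resolution IP contribute only the hash
            entry["ips"].add(ip)
    return index


# Made-up rows using documentation-reserved names and addresses.
rows = [
    ("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "example.com", "192.0.2.10"),
    ("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "example.com", "192.0.2.11"),
    ("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "example.net", None),
]
index = build_index(rows)
for domain in sorted(index):
    print(domain, len(index[domain]["hashes"]), sorted(index[domain]["ips"]))
```

An index of this shape answers questions such as which suspect executables queried a given domain, or which addresses a domain resolved to across samples.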

If you have an account and would like to request this dataset, follow these directions:
1) Log in and click the Browse Catalog button.
2) Click the Quasi-restricted radio button and wait for the page to refresh.
3) Enter "GT Malware Passive DNS Data Daily Feed" in the Search Text field and click the Submit Search button.
4) Click the check box next to the resulting dataset, then click the Requested Selected Dataset(s) button at the bottom of the page.

News and Events

Daily Feed Available for GT Malware Passive DNS Data...

New classes of data have been added: Commercial and Non-Commercial...

Welcome, United Kingdom...

Netalyzr Data Now Available...

Internet Atlas Now Available Through PREDICT...

Welcome, Australia...


New This Week

  • Updates to the Terms of Use Agreement to reflect new data classes
  • Enhancements made to the dataset class filtering, sorting and browsing tools 

Frequently Asked Questions 

Account requests
Dataset requests
Data Providers

Browser Note:

Chrome is the preferred browser for best portal performance.

Privacy Statement   |   Portal Terms of Use   |   Adobe Reader Plug-In   |   Copyright © 2005-2014, RTI International   |   v7.4.1