GT Malware Passive DNS Data Daily Feed
By Adam Allred
Each day, the Georgia Tech Information Security Center (GTISC) processes over 100,000 previously unseen, suspect Windows executable files. To derive network-level information that can help identify which of these files are malicious, each executable is run in a sterile, isolated environment for a short period of time, with limited access to the Internet.
During processing, each executable’s use of the Domain Name System (DNS) is recorded in both raw (packet capture, or PCAP) and plaintext formats. The plaintext format, which contains a subset of information present in the PCAP files, is represented as a series of CSV files named according to the date on which a given set of executables was processed. Each file comprises a series of 3-tuples that provide the executable's MD5 hash, the qname (domain name) of the DNS query, and (if the query was of type A) a resolution IP address for the domain name.
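As a rough illustration of reading the plaintext format, the sketch below parses rows into 3-tuples. It assumes details the description above does not confirm: that the columns appear in the order MD5, qname, IP, that there is no header row, and that the IP field is left empty when a query was not of type A. The names DnsRecord and parse_feed are hypothetical, and the sample rows are invented placeholders, not real feed data.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Optional


class DnsRecord(NamedTuple):
    md5: str            # MD5 hash of the executable
    qname: str          # domain name of the DNS query
    ip: Optional[str]   # resolution IP address, or None if absent


def parse_feed(lines: Iterable[str]) -> Iterator[DnsRecord]:
    """Yield one DnsRecord per CSV row.

    Assumes column order md5, qname, ip and no header row.
    """
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        md5 = row[0].strip().lower()
        qname = row[1].strip().lower().rstrip(".")
        ip = row[2].strip() if len(row) > 2 and row[2].strip() else None
        yield DnsRecord(md5, qname, ip)


# Invented sample rows using reserved example domains and addresses.
sample = io.StringIO(
    "d41d8cd98f00b204e9800998ecf8427e,example.com,192.0.2.10\n"
    "d41d8cd98f00b204e9800998ecf8427e,mail.example.net,\n"
)
records = list(parse_feed(sample))
```

For a real daily file, the same function could be given an open file handle in place of the in-memory sample.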
As of July 2015, a daily feed comprising both raw and plaintext versions of the data is available through DHS PREDICT. In aggregate, this information represents a special kind of passive DNS database for suspect and known malicious software, which GTISC believes will be useful for a variety of research and operational purposes.
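One way to picture the aggregate view is as a lookup from each domain name to the executables that queried it and the addresses it resolved to. The sketch below builds such an index from (MD5, qname, IP) tuples. It is only an illustration: build_index is a hypothetical helper, the dictionary layout is an assumption rather than anything GTISC publishes, and the tuples are invented placeholders.

```python
from collections import defaultdict


def build_index(records):
    """Map each domain to the hashes that queried it and the IPs it resolved to."""
    index = defaultdict(lambda: {"hashes": set(), "ips": set()})
    for md5, qname, ip in records:
        entry = index[qname]
        entry["hashes"].add(md5)
        if ip:  # the IP is only present for type A queries
            entry["ips"].add(ip)
    return dict(index)


# Invented placeholder tuples, not real feed data.
example_records = [
    ("00000000000000000000000000000001", "example.com", "192.0.2.10"),
    ("00000000000000000000000000000002", "example.com", "192.0.2.11"),
    ("00000000000000000000000000000002", "example.org", None),
]
index = build_index(example_records)
shared = sorted(index["example.com"]["hashes"])  # executables sharing a domain
```

An index like this makes it straightforward to ask which suspect executables contacted the same domain, or which domains shared a resolution address on a given day.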
If you have a PREDICT account and would like to request this dataset, follow these directions:
1) Log in and click the Browse Catalog button
2) Click the Quasi-restricted radio button and wait for the page to refresh
3) Enter "GT Malware Passive DNS Data Daily Feed" in the Search Text field and click the Submit Search button
4) Click the check box next to the resulting dataset, then click the Requested Selected Dataset(s) button at the bottom of the page