UpGuard security researcher Chris Vickery has discovered three misconfigured AWS S3 buckets that were publicly accessible on the internet. The archives contain several terabytes of data, including social media posts and similar pages from across the globe. The data is the property of the US military, and it is alleged that the information was collected to create profiles of persons of interest. There is evidence that the software used to build the databases was created and operated by VendorX, a defunct government contractor. The incident shows how risky it can be to involve third-party vendors in confidential matters.
Vickery discovered the unprotected archives while performing routine scans of open data silos hosted by Amazon. They were rather easy to find because appropriate security measures were absent. The archives were titled Pacom-archive, Centcom-backup, and Centcom-archive. CENTCOM is the abbreviation for US Central Command, which is responsible for the majority of military operations in Central Asia, the Middle East, and North Africa, while PACOM refers to US Pacific Command, which covers regions including China, Southern Asia, and Australasia.
Speaking with The Register, Vickery explained that he discovered the databases by accident when he entered the term COM while scanning publicly available S3 buckets. Refining the search slightly brought up the CENTCOM archive. Initially, he believed that the archive belonged to the China-based multinational firm Tencent, but he soon realized that it was the property of the US military.
Vickery downloaded around 400GB of data as a sample, and he claims that “dozens of terabytes” of data were open to public access, most of it in compressed text files. One of the databases contains 1.8 billion social media posts collected over the past eight years, mainly from Central Asia. Vickery also noted that some of the material was taken from comments made by US citizens.
The documents Vickery examined indicate that the US military collected the information in connection with the Outpost program launched by the US government. Outpost is a social media monitoring and manipulation campaign that targets young people overseas to prevent them from joining terrorist groups.
Vickery was able to identify Outpost development configuration files in the databases, along with Apache Lucene keyword indexes built for Elasticsearch, an open-source search engine. He also identified another file titled Coral, which probably refers to the US military's well-known, award-winning Coral Reef data-mining program.
Back in 2012, Mark Kitz, technical director of the Army Distributed Common Ground System, stated that Coral Reef was developed to analyze a data source so that an analyst can mine a significant amount of data and assess the associations between individuals to build a social network.
“Previously, we would mine through those intelligence reports or whatever data would be available, and that would be very manual-intensive,” Kitz said at the time.
Vickery revealed that he identified the openly accessible archives on 6th September 2017, well before Amazon released its new security controls to protect S3 buckets, so it remains to be seen how effective the new measures are. He also informed the US military that the databases were exposed. They were locked down soon afterwards, and the military thanked Vickery for notifying them about the issue.
We do know why the US military developed Coral Reef, and monitoring or even collecting social media posts is not in itself cause for alarm. What is disturbing is how easily such material could be found because of poor configuration. Open archives of this kind are a treasure trove of data for intelligence agencies as well as cybercriminals and enemies of the state. As UpGuard noted in its blog post:
“While a cursory examination of the data reveals loose correlations of some of the scraped data to regional US security concerns… the apparently benign nature of the vast number of captured global posts, as well as the origination of many of them from within the US, raises serious concerns about the extent and legality of known Pentagon surveillance against US citizens.”
“In addition, it remains unclear why and for what reasons the data was accumulated, presenting the overwhelming likelihood that the majority of posts captured originate from law-abiding civilians across the world.”