TL;DR
I created a Twitter bot which monitors multiple paste sites for different types of content (account/database dumps, network device configuration files, etc.). You can find it on Twitter and on Github.
Introduction
Paste sites such as Pastebin, Pastie, Slexy, and many others offer users (often anonymously) the ability to upload raw text of their choice. This is helpful in many scenarios, such as sending a crash report to someone or pasting temporary code. However, in addition to some people not being careful with what they upload (leaving passwords and other sensitive data in the text), attackers have started using these sites to share post-compromise data, including user account data, database dumps, URLs of compromised sites, and more.
Since there are so many users uploading text to these sites, it's often difficult to find the interesting pastes manually. While techniques such as Google Alerts can be applied, the results are often a day or two old and the pastes are sometimes already deleted. This prompted me to create a tool which monitors these sites in "real-time" (less than a minute of delay for the slowest sites) for specific expressions, and then automatically ranks, aggregates, and posts the results to Twitter for further analysis. I call this tool DumpMon.
Similar Tools
There are a couple of similar tools available which do essentially the same thing as dumpmon - with just a few key differences:
- @PastebinLeaks - With its last tweet on December 16, 2011, PastebinLeaks no longer appears to provide Pastebin monitoring. However, I really like how it integrated quite a few different expressions, such as ones for HTTP passwords, Cisco and Juniper configuration files, etc. Unfortunately, as far as I can tell, PastebinLeaks is closed-source.
- @PastebinDorks - This bot (intentionally closed-source, still in "alpha") is still active and posts a few tweets per day. It appears to be primarily concerned with account credential dumps. I think assigning a numerical rank to each tweet could help convey how useful a paste is, but on its own the rank leaves it unclear what data was actually found.
My goal with dumpmon is to create the "next step" of paste site monitoring with the following key features:
- Open-source. I'm always open to contributions via Github. I'm working on the documentation - it should be up soon.
- Monitors more than just Pastebin (full site listing in the Appendix)
- Supports multiple paste types (e.g. Cisco configuration files and honeypot logs) - a quick detection sketch follows this list
- For large account dumps, simply gives you the raw information (Emails: x, Hashes: y) directly in the tweet
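To give a feel for how a paste type can be recognized, here's a minimal sketch using a few illustrative signature expressions. These patterns (and the `classify` helper) are hypothetical stand-ins, not dumpmon's actual regular expressions - those are on Github:

```python
import re

# Illustrative signature patterns only -- dumpmon's real expressions are more involved.
SIGNATURES = {
    "db_dump": re.compile(r"(?i)insert into|available dbs|database:\s"),
    "cisco_config": re.compile(r"(?i)enable (secret|password) |interface \S+ethernet"),
    "honeypot_log": re.compile(r"(?i)kippo|login attempt \["),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}\b"),
}

def classify(paste_text):
    """Return the paste types whose signature appears in the raw text."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(paste_text)]
```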
In the future, I would like to look into implementing the following features:
- Automatically run found hashes through large wordlists and post the results (see the rough sketch after this list)
- Allow users to tweet a regular expression they want monitored to the bot. The bot will then tweet them the paste once it finds a match
- Search for interesting details from other sources of information (such as popular forums, etc.) instead of just paste sites
- Allow caching of "most interesting" results to prevent deletion
- Create daily/monthly reports showing the amount of detected data, to aid in password research
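As a rough idea of how the wordlist feature could work (again, this isn't implemented yet), unsalted MD5 hashes pulled from a paste could simply be compared against hashed wordlist candidates. The `crack_md5` helper and its arguments are purely illustrative:

```python
import hashlib

def crack_md5(found_hashes, wordlist_path):
    """Try each wordlist candidate against a set of (assumed unsalted) MD5 hashes."""
    remaining = set(h.lower() for h in found_hashes)
    cracked = {}
    with open(wordlist_path) as wordlist:
        for candidate in wordlist:
            candidate = candidate.rstrip("\r\n")
            digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
            if digest in remaining:
                cracked[digest] = candidate
                remaining.discard(digest)
            if not remaining:
                break
    return cracked
```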
With those features outlined, let me quickly show you how I built the bot. Don't care? Just go straight to the bot here.
Bot Architecture
Here is the general architecture of the bot that's currently running: each paste site is handled by its own thread, which monitors for new pastes, downloads each one, and matches it against a series of regular expressions. Then, if it finds a match, it builds and posts a tweet summarizing the paste.
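To make that concrete, here's a simplified sketch of what the per-paste analysis might look like. This is a stand-in rather than the actual dumpmon code - the real expressions and tweet format live on Github:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MD5_RE = re.compile(r"\b[0-9a-fA-F]{32}\b")   # unsalted MD5-style hashes

def analyze(paste_url, paste_text):
    """Run the expressions over one downloaded paste and, if it looks
    interesting, return the text of the tweet to post."""
    emails = set(EMAIL_RE.findall(paste_text))
    hashes = set(MD5_RE.findall(paste_text))
    if not emails and not hashes:
        return None
    return "Possible dump: {0} Emails: {1} Hashes: {2}".format(
        paste_url, len(emails), len(hashes))
```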
If hashes are found, the tweet will also include the number of hashes as well as the ratio of emails to hashes. The "Keywords" attribute gives an approximate ratio of "positive keywords" found out of a given list (such as "Target: ", "available dbs", "member_id", "hacked by", "database: ", etc.), subtracting value for each expression matched from a blacklist. It's just another metric to help determine whether a paste is "interesting." It should also be noted that the emails found are unique.
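The score itself can be computed roughly like this - the keyword lists here are shortened, illustrative versions of the real ones:

```python
import re

# Shortened, illustrative keyword lists -- the real ones are longer.
POSITIVE = ["Target: ", "available dbs", "member_id", "hacked by", "database: "]
BLACKLIST = [r"<\?php", r"#include\s*<"]   # e.g. signs of ordinary source code

def keyword_score(paste_text):
    """Approximate "Keywords" metric: the fraction of positive keywords present,
    reduced for every blacklist expression that also matches."""
    lowered = paste_text.lower()
    hits = sum(1 for kw in POSITIVE if kw.lower() in lowered)
    penalties = sum(1 for pattern in BLACKLIST if re.search(pattern, paste_text, re.I))
    return float(hits - penalties) / len(POSITIVE)
```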
Don't Bite the Hand that Feeds
It's commonly known that the most time-expensive part of web scraping is actually fetching the content. While I could speed up this process by using an event-driven framework such as Gevent, Twisted, or others, I wanted to do my best to respect the sites hosting the content. Also, I didn't want the tool to get temporarily blocked... for a third time (my bad, Pastebin). With this in mind, the bot only fetches pastes it hasn't seen before and spaces its requests out with polite time constraints.
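The actual scheduling code is on Github, but the general idea for each site's monitoring thread looks something like the loop below. The `site` helpers (`list_recent`, `raw_url`, and the delay attributes) are hypothetical placeholders for each site's scraping logic:

```python
import time

import requests

def monitor(site, process_paste):
    """Poll a site's "recent pastes" listing, download only the pastes we
    haven't seen yet, and back off when nothing new shows up."""
    seen = set()
    delay = site.min_delay                 # e.g. 30 seconds between listing checks
    while True:
        new_ids = [pid for pid in site.list_recent() if pid not in seen]
        for pid in new_ids:
            raw = requests.get(site.raw_url(pid)).text
            process_paste(pid, raw)
            seen.add(pid)
            time.sleep(site.fetch_delay)   # pause between individual downloads
        # Nothing new? Wait longer before asking again (up to a maximum).
        delay = site.min_delay if new_ids else min(delay * 2, site.max_delay)
        time.sleep(delay)
```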
Appendix
Currently, dumpmon monitors Pastebin, Pastie, and Slexy, and supports the following paste types:
- Account/Database dumps
- Google API Keys
- Cisco Configuration Files (Juniper to be added soon)
- Honeypot Log Dumps
If you can think of any other paste sites or paste types you want added, let me know!
Follow @dumpmon
- Jordan