The next tool on BackTrack 4 I am going to review is theHarvester, written by the guys over at Edge-Security. theHarvester is a tool for gathering e-mail accounts, user names and hostnames/subdomains from different public sources. It's a really simple tool, but a very effective one.
The supported sources are:
- Google: emails, subdomains/hostnames
- Bing search: emails, subdomains/hostnames
- PGP key servers: emails, subdomains/hostnames
- LinkedIn: user names
Below I will go through a few examples of data mining some common search engines for user names, email addresses and subdomains. The information gathered during passive reconnaissance can be an invaluable resource for a penetration tester.
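At its core, this kind of harvesting is pattern matching over the text of search results. The sketch below is not theHarvester's own code. It is a minimal illustration that assumes you already have the raw result text in a string, and it uses regular expressions to pull out addresses and hostnames under a target domain. The function names `extract_emails` and `extract_hosts` are hypothetical.

```python
import re


def extract_emails(text: str, domain: str) -> set[str]:
    """Return addresses whose domain part is exactly `domain`, lowercased."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@" + re.escape(domain) + r"\b", re.IGNORECASE
    )
    return {m.group(0).lower() for m in pattern.finditer(text)}


def extract_hosts(text: str, domain: str) -> set[str]:
    """Return hostnames with at least one label in front of `domain`, lowercased."""
    pattern = re.compile(
        r"\b(?:[A-Za-z0-9-]+\.)+" + re.escape(domain) + r"\b", re.IGNORECASE
    )
    return {m.group(0).lower() for m in pattern.finditer(text)}


if __name__ == "__main__":
    # Illustrative text only, not real search-engine output.
    sample = "Mail josh.paskewicz@53.com or visit https://www.53.com/ and Express.53.com"
    print(sorted(extract_emails(sample, "53.com")))
    print(sorted(extract_hosts(sample, "53.com")))
```

Anchoring on `@` plus the escaped domain means an address such as info@tapioles53.com is not counted as a 53.com address.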
Let's take a look at the available options:
```
root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py
*************************************
*TheHarvester Ver. 1.6 *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
Usage: theharvester options
-d: domain to search or company name
-b: data source (google,bing,pgp,linkedin)
-s: start in result number X (default 0)
-v: verify host name via dns resolution
-l: limit the number of results to work with(bing goes from 50 to 50 results,
google 100 to 100, and pgp does'nt use this option)
Examples:./theharvester.py -d microsoft.com -l 500 -b google
./theharvester.py -d microsoft.com -b pgp
./theharvester.py -d microsoft -l 200 -b linkedin
```
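The `-l` help text says Bing is paged 50 results at a time and Google 100 at a time. The sketch below shows how a limit turns into a series of requests. It assumes the loop keeps requesting pages while the offset is below the limit, which matches the "Searching results: N" lines in the transcripts. `page_offsets` is a hypothetical helper, not part of theHarvester.

```python
# Page sizes taken from the -l help text (assumption: they describe
# theHarvester 1.6 only; other versions may page differently).
PAGE_SIZE = {"bing": 50, "google": 100}


def page_offsets(source: str, limit: int) -> list[int]:
    """Return the result offsets a paging loop would request for one source.

    Hypothetical helper: it mirrors the 'Searching results: N' progress lines,
    not theHarvester's actual implementation.
    """
    if source not in PAGE_SIZE:
        # pgp ignores -l per the help text; linkedin's page size is not documented there.
        raise ValueError(f"no known page size for source {source!r}")
    return list(range(0, limit, PAGE_SIZE[source]))


if __name__ == "__main__":
    for offset in page_offsets("bing", 500):
        print(f"Searching results: {offset}")
```

Run as a script, this prints the ten offsets from 0 to 450 that appear in the cnn.com run. With `"google"` and the same limit, it gives the five offsets from 0 to 400 seen in the 53.com run.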
Let's use cnn.com as an example:
```
root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py -d cnn.com -l 500 -b bing
*************************************
*TheHarvester Ver. 1.6 *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
Searching for cnn.com in bing :
======================================
Limit: 500
Searching results: 0
Searching results: 50
Searching results: 100
Searching results: 150
Searching results: 200
Searching results: 250
Searching results: 300
Searching results: 350
Searching results: 400
Searching results: 450
Accounts found:
====================
@cnn.com
cnnfutures@cnn.com
====================
Total results: 2
Hosts found:
====================
www.cnn.com
edition.cnn.com
money.cnn.com
sportsillustrated.cnn.com
amfix.blogs.cnn.com
live.cnn.com
news.blogs.cnn.com
politicalticker.blogs.cnn.com
marquee.blogs.cnn.com
weather.cnn.com
m.cnn.com
transcripts.cnn.com
www.cnnstudentnews.cnn.com
ac360.blogs.cnn.com
campbellbrown.blogs.cnn.com
newsource.cnn.com
cgi.cnn.com
joybehar.blogs.cnn.com
topics.edition.cnn.com
internationaldesk.blogs.cnn.com
us.cnn.com
larrykinglive.blogs.cnn.com
topics.cnn.com
weather.edition.cnn.com
cnnwire.blogs.cnn.com
scitech.blogs.cnn.com
on.cnn.com
ricksanchez.blogs.cnn.com
archives.cnn.com
community.cnn.com
sports.si.cnn.com
arabic.cnn.com
quiz.cnn.com
newsroom.blogs.cnn.com
cgi.money.cnn.com
partners.cnn.com
pagingdrgupta.blogs.cnn.com
features.blogs.fortune.cnn.com
tech.fortune.cnn.com
insession.blogs.cnn.com
business.blogs.cnn.com
behindthescenes.blogs.cnn.com
olympics.blogs.cnn.com
afghanistan.blogs.cnn.com
gdyn.cnn.com
premium.cnn.com
inthefield.blogs.cnn.com
ypwr.blogs.cnn.com
premium.edition.cnn.com
edition1.cnn.com
drgupta.cnn.com
edition2.cnn.com
wallstreet.blogs.fortune.cnn.com
tips.blogs.cnn.com
mxp.blogs.cnn.com
```
As you can see, this search turned up a lot of possible subdomains but very few email addresses (really only one, cnnfutures@cnn.com). This is one reason it's important to run your query against every available data source.
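One way to act on this is to run the tool once per data source and merge the results. The sketch below assumes each run's findings have already been collected into a Python set per source; the helper `merge_sources` is hypothetical. Hostnames are case-insensitive, so items are lowercased before comparison. Lowercasing email local parts as well is a simplification.

```python
from collections import defaultdict


def merge_sources(results: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each normalized (stripped, lowercased) item to the sources that reported it."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source, items in results.items():
        for item in items:
            merged[item.strip().lower()].add(source)
    return dict(merged)


if __name__ == "__main__":
    # Illustrative input only: the "google" set is made up, not real output.
    findings = {
        "bing": {"www.cnn.com", "money.cnn.com"},
        "google": {"WWW.cnn.com", "transcripts.cnn.com"},
    }
    for item, sources in sorted(merge_sources(findings).items()):
        print(item, sorted(sources))
```

Tracking which sources reported each item also shows how much each source adds beyond the others.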
Here is an example that turns up more email addresses, this time querying Google for 53.com:
```
root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py -d 53.com -l 500 -b google
*************************************
*TheHarvester Ver. 1.6 *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
Searching for 53.com in google :
======================================
Limit: 500
Searching results: 0
Searching results: 100
Searching results: 200
Searching results: 300
Searching results: 400
Accounts found:
====================
josh.paskewicz@53.com
@53.com
info@tapioles53.com
@.53.com
rachael.smith@53.com
nan.horton@53.com
aler...@53.com
alertingservice@53.com
j.brinkman@53.com
Jerome.Gilbert@53.com
Gilbert@53.com
michelle.weddington@53.com
====================
Total results: 12
Hosts found:
====================
www.53.com
reo.53.com
direct.53.com
premierissue.53.com
retire.53.com
ir.53.com
tdsc.53.com
secure.53.com
ra.53.com
2Fwww.53.com
Www.53.com
252Fwww.53.com
espanol.53.com
employee.53.com
bnjhz.php?...53.com
express.53.com
www.ra.53.com
Ra.53.com
3Dreo.53.com
wwww.53.com
Retire.53.com
@.53.com
www.express.53.com
mxism.php?...53.com
pngyo.php?...53.com
```
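The host list above contains some noise. Entries such as 2Fwww.53.com, 252Fwww.53.com and 3Dreo.53.com look like leftovers of URL-encoded characters (%2F is "/", %252F is a double-encoded "/", %3D is "="). Some names differ only in case, and strings like @.53.com or bnjhz.php?...53.com are not valid hostnames at all. The sketch below tidies such a list before you act on it. The prefix stripping is a heuristic assumption: it would also damage a real host whose first label starts with "2f" or "3d". `clean_hosts` is a hypothetical helper.

```python
import re

# Heuristic: strip leftovers of %252F, %2F and %3D from the front of a name.
# A genuine label starting with "2f"/"3d" would be mangled by this.
ENCODED_PREFIX = re.compile(r"^(?:252f|2f|3d)", re.IGNORECASE)
# One DNS label: letters, digits and inner hyphens, 1 to 63 characters.
LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def clean_hosts(raw: list[str], domain: str) -> set[str]:
    """Lowercase, strip encoding debris and keep only syntactically valid subdomains."""
    suffix = "." + domain.lower()
    cleaned = set()
    for host in raw:
        host = ENCODED_PREFIX.sub("", host.strip().lower())
        if not host.endswith(suffix):
            continue
        if all(LABEL.match(label) for label in host.split(".")):
            cleaned.add(host)
    return cleaned


if __name__ == "__main__":
    raw = ["www.53.com", "2Fwww.53.com", "Www.53.com", "3Dreo.53.com", "@.53.com"]
    print(sorted(clean_hosts(raw, "53.com")))
```

Applied to the full 25-entry list above, this leaves 15 distinct hostnames.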
This example produced a lot more results. Most of the named addresses follow the convention firstname.lastname@53.com, so that is most likely the standard format. This is a very useful piece of knowledge: for anyone at Fifth Third Bank whose first and last name we know, we can make a well-educated guess at their email address.
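The reasoning above can be made explicit by classifying the local part of each harvested address and counting which format dominates. The sketch below assumes just two hypothetical conventions, "first.last" and "f.last"; anything else is "other". The helper names are hypothetical, and a guessed address still needs to be verified before it is relied on.

```python
import re
from collections import Counter

# Hypothetical local-part patterns; add others (e.g. "flast") as needed.
CONVENTIONS = [
    ("first.last", re.compile(r"^[a-z]{2,}\.[a-z]{2,}$")),
    ("f.last", re.compile(r"^[a-z]\.[a-z]{2,}$")),
]


def classify(local_part: str) -> str:
    for name, pattern in CONVENTIONS:
        if pattern.match(local_part):
            return name
    return "other"


def dominant_convention(emails: list[str], domain: str):
    """Return (convention, count) for the most common format at `domain`, or None."""
    suffix = "@" + domain.lower()
    local_parts = [e.lower()[: -len(suffix)] for e in emails if e.lower().endswith(suffix)]
    counts = Counter(classify(lp) for lp in local_parts if lp)  # skip bare "@domain"
    return counts.most_common(1)[0] if counts else None


def guess_address(first: str, last: str, domain: str, convention: str = "first.last") -> str:
    first, last = first.strip().lower(), last.strip().lower()
    if convention == "first.last":
        return f"{first}.{last}@{domain}"
    if convention == "f.last":
        return f"{first[0]}.{last}@{domain}"
    raise ValueError(f"unsupported convention {convention!r}")
```

Fed the twelve accounts from the 53.com run, `dominant_convention` reports "first.last" for five of them. That is the pattern the paragraph above infers.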
This is just one of the many tools that can aid a penetration tester in the passive reconnaissance process.