Week 9 Lab – Open Web Information Gathering
Students
Notes
· This seminar can be performed without a virtual machine, but it is written for Kali Linux (set up in previous weeks). If you are running it without Kali, search online for an equivalent command for your OS.
· You can perform this exercise in groups over your online meeting tool of choice or individually if you would prefer.
· If you are performing the exercise as a group, please make sure you've joined the same group in the group selection tool on iLearn.
Background
Cyber criminals and hackers spend a lot of time browsing the web for background information about their target organisation. They want answers to questions such as: What does the target organisation or individual do? How do they interact with the world? Do they have a sales department? Are they hiring? Criminals will browse the organisation’s website looking for general information such as contact details, phone and fax numbers, email addresses and company structure. They will also look for sites that link to the target site, and for company email addresses floating around the web.
A lot of the time, the smallest details give an attacker the most information. For example, how well designed is the target’s website? How clean is its HTML code? Details like these hint at the organisation’s web development budget, which may in turn reflect its security budget.
Google is a hacker’s best friend, especially when it comes to information gathering.
Enumerating with Google
Google supports various search operators that let a user narrow down and pinpoint search results. For example, the ‘site’ operator limits results to a single domain. Say we want to gauge the approximate web presence of an organisation: the query ‘site:microsoft.com’ shows only results from the microsoft.com domain. Figure 1 below shows that on 22nd March 2017, Google had indexed around 34.5 million pages from the microsoft.com domain. These targeted queries are referred to as “Google dorks”.
Figure 1: The Google ‘site’ operator in action
Activity 1: Practice with the ‘site’ operator
Use the ‘site’ operator to find how many pages Google has indexed for 3 companies of your choice; small or medium-sized organisations work best for this exercise. Record in the box below the companies you selected and the number of pages Google indexed for each.
Company 1: No of pages:
Company 2: No of pages:
Company 3: No of pages:
In the Microsoft example shown in Figure 1, you will notice that most of the results originate from the www.microsoft.com subdomain. Now let’s filter those out to see what other subdomains exist at microsoft.com. We can do this using the following query:
site:microsoft.com -site:www.microsoft.com
These two simple queries reveal quite a lot of background information about the microsoft.com domain, such as the scale of its Internet presence and a list of its web-accessible subdomains.
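If you want to check that a subdomain you spot in the results actually resolves, the host command (preinstalled on Kali) performs a quick DNS lookup. A minimal sketch; answers.microsoft.com is just an illustrative subdomain, substitute one from your own results:

host answers.microsoft.com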
Run this query on your 3 selected companies and record the number of results returned and three subdomains for each in the box below:
Company 1: No of pages:
Subdomains:
Company 2: No of pages:
Subdomains:
Company 3: No of pages:
Subdomains:
Activity 2: Research
Perform some research and provide 3 Google dorks that can be used to find sensitive information.
Dork 1:
Purpose:
Dork 2:
Purpose:
Dork 3:
Purpose:
Activity 3: DNS lookups
We’re going to look up the DNS records for a domain. Dedicated tools can do this, but some websites offer the same functionality; we’re going to use https://www.ultratools.com/tools/dnsLookup. Perform a lookup on the 3 domains you chose above.
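If you prefer the command line, the dig tool (preinstalled on Kali) retrieves the same records. A minimal sketch, using the zoom.us domain from the example below:

dig zoom.us MX +short    # mail servers
dig zoom.us NS +short    # name servers
dig zoom.us A +short     # web server IP address(es)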
Example: zoom.us
Mail server: Google
Name server: AWS
Web server IP: XXXXXXXXXX
Domain 1:
Mail server:
Name server:
Web server IP:
Domain 2:
Mail server:
Name server:
Web server IP:
Domain 3:
Mail server:
Name server:
Web server IP:
Activity 4: Robots.txt
Robots.txt is a publicly available file found in the root directory of a website. It gives instructions to web robots (search engine crawlers) about what is and is not to be crawled, using the Robots Exclusion Protocol. A ‘Disallow:’ statement tells a robot not to visit a given path, so the Disallow entries can give an attacker intelligence about what a target hopes not to disclose to the public.
Go to your web browser and type in the following address: http://www.facebook.com/robots.txt. Your search should return something like Figure 3 below.
Figure 3: Results of a robots.txt request to Facebook
Robots can ignore the disallow directives in /robots.txt. Malware robots that scan the web for security vulnerabilities, and the email address harvesters used by spammers, will typically pay no attention to them. Because the file is public, anyone can see which sections of the server the organisation doesn’t want robots to crawl, and a disallowed path often points to something the company wants to keep private. In other words, the file hands a potential malicious actor useful intelligence about the structure of the website and, therefore, about potential targets.
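You can also fetch the file from the command line; a quick sketch using curl (preinstalled on Kali):

curl -s https://www.facebook.com/robots.txt | head -n 20    # show the first 20 lines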
Enter the address of a popular website into your browser’s address bar and append /robots.txt to the end (as above). Record, in the box below, web pages or folders that the organisation doesn’t want crawlers to see.
Domain 1
[Insert robots file here]
Domain 2
[Insert robots file here]
Domain 3
[Insert robots file here]
Activity 5: Email harvesting
Email harvesting is an effective way of finding emails, and possibly usernames, belonging to an organisation. These emails are useful in many ways, such as providing a potential list for client-side attacks (such as phishing), revealing the naming convention used in the organisation, or mapping out users in the organisation.
Open Kali Linux and navigate to the theHarvester tool. You can do this by clicking on:
1. Applications > Kali Linux > Information Gathering > OSINT Analysis > The Harvester
2. Next, enter the following syntax into theharvester command line:
theharvester -d microsoft -l 200 -b linkedin
3. Record the first five lines of what is returned in the box below:
4. Now try a different company and a different search engine using the following syntax: theharvester -d sixthstartech.com -l 300 -b google
5. Record what is returned in the box below:
6. Using the following syntax, enumerate email addresses belonging to one or more of the organisations you chose in Activity 1:
theharvester -d [organisation] -l 300 -b [search engine name]
-d [organisation] specifies the domain or organisation you want to fetch information about
-l limits the search to the specified number of results
-b specifies the search engine or data source to use (for example google, yahoo, bing)
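As a concrete illustration (example.com and bing here are placeholders; substitute your own target and preferred source), you can redirect the output to a file so the results are easy to paste into your answer:

theharvester -d example.com -l 300 -b bing > harvest-results.txt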
Record in the box below the information that you have been able to find about your chosen organisation(s). You can experiment with different search engines and different result limits.
Activity 6: Research
Look into other open-source intelligence (OSINT) techniques and describe 3 of them below.
Technique 1
[Describe here]
Technique 2
[Describe here]
Technique 3
[Describe here]
Activity 7: Research
Perform some research into how the information gathered in this lab could be used maliciously, and describe your findings below.