Week 9 Lab – Open Web Information Gathering
Students
Notes
· This seminar can be performed without a virtual machine, but it is written for Kali Linux (set up in previous weeks). If you are running it without Kali, search online for an equivalent command for your OS.
· You can perform this exercise in groups over your online meeting tool of choice or individually if you would prefer.
· If you are performing the exercise as a group, please make sure you've joined the same group in the group selection tool on iLearn.
Background
Cyber criminals and hackers spend a lot of time browsing the web for background information about their target organisation. They want answers to questions such as: What does the target organisation or individual do? How do they interact with the world? Do they have a sales department? Are they hiring? Criminals will browse the organisation’s website looking for general information such as contact details, phone and fax numbers, email addresses and company structure. They will also look for sites that link to the target site, and for company email addresses floating around the web.
A lot of the time, the smallest details give an attacker the most information. For example, how well designed is the target’s website? How clean is its HTML code? Details like these hint at the organisation’s web development budget, which may in turn reflect its security budget.
Google is a hacker’s best friend, especially when it comes to information gathering.
Enumerating with Google
Google supports various search operators that let a user narrow down and pinpoint search results. For example, the ‘site’ operator limits results to a single domain. Say we want to gauge the approximate web presence of an organisation: the query ‘site:microsoft.com’ shows only results from the microsoft.com domain. Figure 1 below shows that on 22nd March 2017, Google had indexed around 34.5 million pages from the microsoft.com domain. These targeted queries are referred to as “Google dorks”.
Figure 1: The Google ‘site’ operator in action
Activity 1: Practice with the ‘site’ operator
Use the ‘site’ operator to find how many pages Google has indexed for 3 companies of your choice; small or medium-sized organisations work best for this exercise. Record in the box below the companies you selected and the number of pages Google indexed for each.
Company 1: No of pages:
Company 2: No of pages:
Company 3: No of pages:
In the Microsoft example shown in Figure 1, you will notice that most of the results originate from the www.microsoft.com subdomain. Now let’s filter those out to see what other subdomains exist at microsoft.com. We can do this using the following query:
site:microsoft.com -site:www.microsoft.com
These two simple queries reveal quite a lot of background information about the microsoft.com domain, such as the scale of its Internet presence and a list of its web-accessible subdomains.
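If you want to check that a subdomain you spot in the results actually resolves, the host command (preinstalled on Kali) performs a quick DNS lookup. A minimal sketch; answers.microsoft.com is just an illustrative subdomain, substitute one from your own results:

host answers.microsoft.com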
Run this query on your 3 selected companies and record the number of results returned and three subdomains for each in the box below:
Company 1: No of pages:
Subdomains:
Company 2: No of pages:
Subdomains:
Company 3: No of pages:
Subdomains:
Activity 2: Research
Perform some research and provide 3 Google dorks that can be used to find sensitive information.
Dork 1:
Purpose:
Dork 2:
Purpose:
Dork 3:
Purpose:
Activity 3: DNS lookups
We’re going to look up the DNS records for a domain. Dedicated tools can do this, but some websites offer the same functionality; we’re going to use https://www.ultratools.com/tools/dnsLookup. Perform a lookup on the 3 domains you chose above.
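If you prefer the command line, the dig tool (preinstalled on Kali) retrieves the same records. A minimal sketch, using the zoom.us domain from the example below:

dig zoom.us MX +short    # mail servers
dig zoom.us NS +short    # name servers
dig zoom.us A +short     # web server IP address(es)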
Example: zoom.us
Mail server: Google
Name server: AWS
Web server IP: XXXXXXXXXX
Domain 1:
Mail server:
Name server:
Web server IP:
Domain 2:
Mail server:
Name server:
Web server IP:
Domain 3:
Mail server:
Name server:
Web server IP:
Activity 4: Robots.txt
Robots.txt is a publicly available file found in the root directory of a website. It gives instructions to web robots (search engine crawlers) about what is and is not to be crawled, using the Robots Exclusion Protocol. A ‘Disallow:’ statement tells a robot not to visit a given path, so the Disallow entries can give an attacker intelligence about what a target hopes not to disclose to the public.
Go to your web browser and type in the following address: http://www.facebook.com/robots.txt. Your search should return something like Figure 3 below.
Figure 3: Results of a robots.txt request to Facebook
Robots can ignore the disallow directives in /robots.txt. Malware robots that scan the web for security vulnerabilities, and the email address harvesters used by spammers, will typically pay no attention to them. Because the file is public, anyone can see which sections of the server the organisation doesn’t want robots to crawl, and a disallowed path often points to something the company wants to keep private. In other words, the file hands a potential malicious actor useful intelligence about the structure of the website and, therefore, about potential targets.
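You can also fetch the file from the command line; a quick sketch using curl (preinstalled on Kali):

curl -s https://www.facebook.com/robots.txt | head -n 20    # show the first 20 lines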
Enter the address of a popular website into your browser’s address bar and append /robots.txt to the end (as above). Record, in the box below, web pages or folders that the organisation doesn’t want crawlers to see.
Domain 1
[Insert robots file here]
Domain 2
[Insert robots file here]
Domain 3
[Insert robots file here]
Activity 5: Email harvesting
Email harvesting is an effective way of finding emails, and possibly usernames, belonging to an organisation. These emails are useful in many ways, such as providing a potential list for client-side attacks (such as phishing), revealing the naming convention used in the organisation, or mapping out users in the organisation.
Open Kali Linux and navigate to the theHarvester tool. You can do this by clicking on:
1. Applications > Kali Linux > Information Gathering > OSINT Analysis > The Harvester
2. Next, enter the following syntax into theharvester command line:
theharvester -d microsoft -l 200 -b linkedin
3. Record the first five lines of what is returned in the box below:
4. Now try a different company and a different search engine using the following syntax: theharvester -d sixthstartech.com -l 300 -b google
5. Record what is returned in the box below:
6. Using the following syntax, enumerate email addresses belonging to one or more of the organisations you chose in Activity 1:
theharvester -d [organisation] -l 300 -b [search engine name]
-d [organisation] specifies the domain or organisation you want to fetch information about
-l limits the search to the specified number of results
-b specifies the search engine or data source to use (for example google, yahoo, bing)
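As a concrete illustration (example.com and bing here are placeholders; substitute your own target and preferred source), you can redirect the output to a file so the results are easy to paste into your answer:

theharvester -d example.com -l 300 -b bing > harvest-results.txt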
Record in the box below the information that you have been able to find about your chosen organisation(s). You can experiment with different search engines and different result limits.
Activity 6: Research
Look into other open-source intelligence (OSINT) techniques and describe 3 of them below.
Technique 1
[Describe here]
Technique 2
[Describe here]
Technique 3
[Describe here]
Activity 7: Research
Perform some research into how the information gathered in this lab could be used maliciously, and describe your findings below.