Resource enumeration is an essential part of both pentest preparation and reconnaissance during the pentest itself. Our team often finds customers' internal resources exposed to the public, or vulnerable services that are not reflected in the original scope. Keep in mind that the customer may own a resource or service they are not even aware of. This article lists the main methods of gathering information about a target and its associated resources. It does not cover every data-gathering method and focuses only on the ones our team has found most effective. In the context of this article, "resources" include, but are not limited to, IP addresses, servers, services, domain names, and autonomous systems (AS).
Resource Discovery Methods
Passive Data Collection from Public Services
Passive information gathering excludes active interaction with target resources and consists of processing information from public services.
Searching for IP addresses by domain history
Domain A-record history search services:
Data from different services do not always match, but by intersecting the results of several services you can build a reasonable hypothesis about the link between an IP address and the target domain. Dnshistory.org and Viewdns.info support searching only by the main domain. Virustotal.com and Securitytrails.com store and provide access to history for both the domain and its subdomains.
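The cross-checking described above can be sketched as a small helper: an IP address reported by two or more independent services is a stronger lead than one seen in a single source. The service names and IPs below are illustrative.

```python
# Cross-check A-record history collected from several services.
# An IP corroborated by multiple independent sources is a stronger lead.
from collections import Counter


def corroborated_ips(histories: dict[str, set[str]], min_sources: int = 2) -> set[str]:
    """Return IPs reported by at least `min_sources` independent services."""
    counts = Counter(ip for ips in histories.values() for ip in ips)
    return {ip for ip, n in counts.items() if n >= min_sources}


# Hypothetical per-service results (RFC 5737 documentation addresses):
histories = {
    "viewdns": {"203.0.113.10", "203.0.113.11"},
    "securitytrails": {"203.0.113.10"},
    "virustotal": {"203.0.113.10", "198.51.100.5"},
}
print(corroborated_ips(histories))
```

IPs seen only once are not discarded outright; they simply go into the "needs manual rechecking" pile.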
Search for domains and subdomains
To search for domains and subdomains, you can use several services that collect data using different mechanisms:
- https://crt.sh/?q=linkedin.com – domains from the SSL-certificate history;
- https://www.shodan.io/domain/linkedin.com – subdomains and IP addresses;
- https://www.virustotal.com/gui/domain/linkedin.com/relations – subdomains and IP;
- https://securitytrails.com/list/apex_domain/nl.linkedin.com – subdomains and IP;
- https://searchdns.netcraft.com/?restriction=site+ends+with&host=.linkedin.com – subdomains;
- http://web.archive.org/cdx/search/cdx?matchType=domain&fl=original&url=linkedin.com – search for domain and subdomain URLs via the web.archive.org CDX API. Search-parameter reference: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md. It is best to download the response with wget/curl and study it locally, since for a large project the service can return several hundred megabytes. Because the service returns every URL across the entire history of website snapshots, it can also be used to find deleted files or pages and access them through the main site http://web.archive.org/;
- The well-known "Google dorking" via search operators, for example "site:.linkedin.com".
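For the CDX API entry above, a short sketch that builds the query URL and reduces the (potentially huge) plaintext response to a set of unique hostnames may be useful. The parameters follow the CDX server README linked above; the sample response is illustrative.

```python
# Sketch: build a Wayback CDX query and extract unique hostnames from the
# default space/newline-separated plaintext response (fl=original gives
# one archived URL per line).
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query returning one original URL per line."""
    params = {"url": domain, "matchType": "domain", "fl": "original"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def unique_hosts(cdx_response: str) -> set[str]:
    """Collapse a CDX 'original' column dump to unique hostnames."""
    hosts = set()
    for line in cdx_response.splitlines():
        line = line.strip()
        if line:
            hosts.add(urlsplit(line).hostname or "")
    hosts.discard("")
    return hosts


# Illustrative response fragment; in practice, download with wget/curl.
sample = "http://a.example.com/x\nhttps://b.example.com/y\nhttp://a.example.com/z\n"
print(unique_hosts(sample))
```

Processing the dump locally this way avoids re-querying the service while you experiment with filters.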
Searching domains by IP address
It is important to check for other domains hosted on the same resource (IP address) as the target domain: you may discover technical domains, region-specific domains, or domains of other projects, and by attacking them compromise the target server. At the same time, a large number of domains associated with one IP address may indicate shared website hosting or a proxy/cloud solution. To obtain a list of domains associated with an IP address, you can use the following services:
- https://www.shodan.io/host/22.214.171.124 – domains of the IP address, open ports, and other useful information;
- https://www.virustotal.com/gui/ip-address/126.96.36.199/relations – domains associated with the IP address over the entire history of domain tracking by the service;
- https://securitytrails.com/list/ip/188.8.131.52 – search for domains by IP (current records);
- https://bgp.he.net/ip/184.108.40.206#dns – domains pointing to the IP address, plus WHOIS and ASN data. The same service stores DNS record information for subnets and autonomous systems. Having determined the operator, subnet, and ASN, you can study the pool of domain records and find related resources in case the owner publishes them on the same subnets/ASN.
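The subnet angle mentioned in the last item can be made concrete: grouping discovered (IP, domain) pairs by network quickly shows which subnets host several target-related domains and deserve a closer look. A minimal sketch with illustrative data:

```python
# Sketch: group discovered (IP, domain) pairs by /24 network to spot
# subnets that concentrate target-related domains.
import ipaddress
from collections import defaultdict


def group_by_subnet(pairs, prefix: int = 24) -> dict[str, set[str]]:
    """Map each /prefix network to the set of domains observed inside it."""
    nets: dict[str, set[str]] = defaultdict(set)
    for ip, domain in pairs:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        nets[str(net)].add(domain)
    return dict(nets)


# Illustrative findings (RFC 5737 documentation addresses):
pairs = [
    ("198.51.100.10", "a.example.com"),
    ("198.51.100.20", "b.example.com"),
    ("203.0.113.5", "c.example.com"),
]
groups = group_by_subnet(pairs)
print(groups)
```

A subnet with several hits is a good candidate for a full record review via bgp.he.net or a reverse-DNS sweep.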
Active Data Collection through Resource Interaction
Active data collection involves interacting with target resources and usually does not require prior approval from the customer. However, the activity can be detected by a firewall and its source blocked, which means losing access to previously discovered resources as well as failing to detect others. To avoid this, try to agree with the resource owner on adding your IP addresses to the exclusion list. As with passive collection, it is better to use several tools, since each may rely on different discovery mechanisms; after studying the results, settle on the set that works best for you.
Tools for searching IP addresses, domains, and subdomains
To discover IP addresses, request the domains' A records. Additional domains can be found through CNAME, NS, and MX record queries. These requests can be made with your OS's native utilities, such as "nslookup" and "dig". And if you invested all your experience points in luck, you may even manage to copy the entire DNS zone via a zone transfer (AXFR).
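When a zone transfer does succeed (e.g. `dig axfr example.com @ns1.example.com`), the dump is worth parsing rather than reading by eye. A sketch that assumes the usual zone-file field layout (name, TTL, class, type, rdata):

```python
# Sketch: extract hostnames and A records from zone-file-style output,
# such as a successful `dig axfr` produces. Assumed field layout:
#   name  TTL  class  type  rdata
def parse_zone(text: str) -> dict[str, str]:
    """Return {fqdn: ip} for every A record found in the dump."""
    records = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[3] == "A":
            records[parts[0].rstrip(".")] = parts[4]
        # CNAME/NS/MX rdata also name hosts worth adding to the scope
    return records


# Illustrative dump (RFC 2606/5737 example names and addresses):
dump = (
    "www.example.com. 300 IN A 192.0.2.10\n"
    "mail.example.com. 300 IN CNAME www.example.com.\n"
    "api.example.com. 300 IN A 192.0.2.11\n"
)
print(parse_zone(dump))
```

Extending the filter to CNAME, NS, and MX rdata yields further hostnames to feed back into the enumeration loop.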
To search for subdomains by brute force, ready-made tools will work well:
Ready dictionaries can be found in tool repositories and on GitHub: https://github.com/danielmiessler/SecLists/tree/master/Discovery/DNS
Do not forget that a domain can have wildcard records, causing these utilities to report false positives, so it is very important to recheck the tools' results.
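One common recheck is the wildcard probe: resolve a random, almost certainly nonexistent label, and discard brute-force hits that resolve to the same address. In this sketch the resolver is injected (a real run would pass something like `socket.gethostbyname`), so the logic is testable offline:

```python
# Sketch: filter subdomain brute-force hits against a wildcard record.
# `resolve` is a stand-in for a real resolver (e.g. socket.gethostbyname),
# injected here so the filtering logic can be tested offline.
import secrets


def filter_wildcard(domain: str, hits: dict[str, str], resolve) -> dict[str, str]:
    """Drop subdomains whose IP matches what a random label resolves to."""
    probe = f"{secrets.token_hex(8)}.{domain}"
    try:
        wildcard_ip = resolve(probe)
    except Exception:
        return hits  # random label did not resolve: no wildcard, keep all
    return {name: ip for name, ip in hits.items() if ip != wildcard_ip}


# Offline stand-in: unknown names fall through to the "wildcard" address.
def fake_resolve(name: str) -> str:
    table = {"www.example.com": "192.0.2.10", "dev.example.com": "203.0.113.7"}
    return table.get(name, "203.0.113.7")


hits = {"www.example.com": "192.0.2.10", "dev.example.com": "203.0.113.7"}
print(filter_wildcard("example.com", hits, fake_resolve))
```

Note the trade-off: a real subdomain that happens to share the wildcard IP is also dropped, so borderline cases still deserve a manual look.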
Manual resource search
When searching for resources manually, check:
- server response headers, for example, Content-Security-Policy (contains a list of trusted sites);
- the Subject Alternative Name parameter in SSL-certificates may contain a list of domains for which this certificate is issued;
- the contents of the .well-known/security.txt and crossdomain.xml files (instead of manual enumeration, you can use ready-made tools, for example, https://github.com/maurosoria/dirsearch);
- the domains with which the target site interacts through JS-code or content loading;
- connecting to the server over TLS/SSL without specifying a hostname (no SNI, and no Host header in the request) may return a technical SSL certificate or one issued for another resource of the target company.
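The first item in the list above (mining the Content-Security-Policy header) lends itself to a small helper. The header value below is illustrative; the parsing keeps host-like source entries and discards keywords such as 'self' and scheme-only sources:

```python
# Sketch: harvest hostnames from a Content-Security-Policy header value.
# Keywords ('self', 'unsafe-inline', ...) and scheme-only sources
# (data:, https:) are discarded; wildcard hosts are kept as-is.
import re


def csp_hosts(header: str) -> set[str]:
    """Return hostnames mentioned anywhere in a CSP header string."""
    hosts = set()
    for directive in header.split(";"):
        for token in directive.split()[1:]:  # skip the directive name
            token = token.strip().lower()
            token = re.sub(r"^[a-z+.-]+://", "", token)   # drop scheme
            token = token.split("/")[0].split(":")[0]     # drop path/port
            if "." in token and not token.startswith("'"):
                hosts.add(token)
    return hosts


# Illustrative header:
header = (
    "default-src 'self' https://static.example.com; "
    "img-src *.cdn.example.com data:; "
    "script-src 'unsafe-inline' api.example.com:443"
)
print(csp_hosts(header))
```

The same extraction applies to related headers such as Access-Control-Allow-Origin, which can also leak trusted origins.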
Mobile App Analysis
If the customer has a mobile application, analyze its traffic to obtain the list of domains the application interacts with, using static and/or dynamic analysis. Detailed instructions for mobile app analysis are outside the scope of this article; only the general approach to discovering the requested domains is described.
The simplest way to extract domains and URLs from the application as part of static analysis is to use the MobSF framework (https://mobsf.github.io/docs/#/docker).
For dynamic analysis, it is enough to configure the proxy and DNS settings on the phone or emulator so that requests go through a DNS server and proxy server under your control. The domains the application interacts with will then appear in the DNS server's query history and the proxy server's request log.
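Once the controlled DNS server has collected some traffic, the queried domains can be pulled out of its log. This sketch assumes dnsmasq-style lines (`... query[A] host.example.com from ...`); the pattern is an assumption to adjust to whatever DNS server you actually run:

```python
# Sketch: extract the set of queried domains from a dnsmasq-style log.
# The line format is an assumption; adapt the regex to your DNS server.
import re

QUERY_RE = re.compile(r"query\[[A-Z]+\]\s+(\S+)\s+from")


def queried_domains(log: str) -> set[str]:
    """Return unique domain names seen in query log lines."""
    return {m.group(1).lower() for m in QUERY_RE.finditer(log)}


# Illustrative log fragment:
log = (
    "Jan 1 12:00:00 dnsmasq[42]: query[A] api.example.com from 10.0.0.5\n"
    "Jan 1 12:00:01 dnsmasq[42]: query[AAAA] cdn.example.net from 10.0.0.5\n"
    "Jan 1 12:00:02 dnsmasq[42]: reply api.example.com is 192.0.2.1\n"
)
print(queried_domains(log))
```

Each extracted domain then becomes new input for the passive and active methods described earlier.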
Iterations and Resource Relationships
Resource discovery is usually a repeated iteration of similar actions and analysis of the resulting data, where, ideally, each iteration grows the set of collected resources. For example, from a domain we find subdomains, from subdomains we find IP addresses, and from IP addresses we find new domains and subnets. After each iteration, it is important to assess the collected information correctly, since it serves as input for subsequent iterations and/or should confirm previously gathered data. It is also essential to recheck and compare data extracted from external services, for example by intersecting several independent sources and verifying manually. Record every discovered resource along with its source and verification status to avoid chaos: the more you document at the initial stage, the easier it will be to work with the data later.
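The iteration loop described above can be sketched as a breadth-first worklist: seed resources are expanded by pluggable discovery functions, and every finding is recorded with the source it came from. The `expanders` here are stand-ins for the real lookups (subdomain search, reverse DNS, and so on):

```python
# Sketch of the enumeration loop: expand a frontier of resources with
# pluggable discovery functions, recording where each finding came from.
def enumerate_resources(seeds, expanders, max_rounds: int = 5) -> dict[str, str]:
    """Breadth-first expansion; returns {resource: source_it_came_from}."""
    found = {s: "seed" for s in seeds}
    frontier = set(seeds)
    for _ in range(max_rounds):
        new = {}
        for res in frontier:
            for name, expand in expanders.items():
                for hit in expand(res):
                    if hit not in found and hit not in new:
                        new[hit] = f"{name}({res})"
        if not new:
            break  # nothing new discovered this round
        found.update(new)
        frontier = set(new)
    return found


# Stand-in discovery graph: domain -> subdomain -> IP -> new domain.
graph = {
    "example.com": ["www.example.com"],
    "www.example.com": ["192.0.2.10"],
    "192.0.2.10": ["other.example.net"],
}
expanders = {"lookup": lambda r: graph.get(r, [])}
found = enumerate_resources(["example.com"], expanders)
print(found)
```

Keeping the provenance string for every resource gives you exactly the "source and status" record the paragraph above recommends.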