Open Source Intelligence (OSINT)

Posted Oct 3, 2025

By Hyoeun Choi

14 min read

Technical Deep Dive into OSINT and Dual-Perspective Strategies

OSINT (Open Source Intelligence) is the process of collecting data from publicly available sources and analyzing it to produce actionable intelligence. It differs from simple information searching because it looks for correlations: the relationships between collected data points are analyzed to derive insights. In cybersecurity, OSINT is an essential methodology for identifying an organization's attack surface and detecting potential threats early.

Core Mechanisms of OSINT

OSINT is the core of passive reconnaissance. It gathers information from public and third-party sources instead of sending packets directly to the target's systems. As a result, it can map the attack surface without leaving entries in the target's IDS/IPS or firewall logs.

Data collection occurs primarily across the following layers:

Public Facing Assets: Domains, subdomains, IP ranges, SSL/TLS certificates
Social Media & Professional Networks: Understanding organizational structure and technology stacks via LinkedIn, Twitter, etc.
Code Repositories: Credentials or internal code accidentally uploaded to GitHub, GitLab, etc.
Technical Data: Port, service, and banner information via Shodan, Censys, etc.

Utilization from an Attacker’s (Red Team) Perspective

For Red Teams and penetration testers, OSINT deserves the largest share of time during the reconnaissance phase of the Kill Chain.

1. Infrastructure Mapping and Subdomain Enumeration

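Attackers look for development servers, staging servers, or forgotten legacy systems that are less well protected than the main website. Certificate Transparency (CT) logs are an effective source for finding them. Publicly trusted CAs must log the TLS certificates they issue, because major browsers require it. The hostnames on those certificates, such as dev.target.com, therefore become publicly searchable shortly after issuance. Some hosts stay out of sight this way: those covered only by a wildcard certificate (*.target.com) and those using certificates from a private CA.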
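Once a list of hostnames has been pulled from CT logs, a simple keyword pass can surface the candidates described above: development, staging, or legacy hosts. The sketch below assumes such a list already exists. The keyword list and the `triage_subdomains` name are hypothetical choices for illustration and are not part of any tool.

```python
import re

# Hypothetical keyword list; tune it to the naming conventions actually observed.
INTERESTING_LABELS = ("dev", "stage", "staging", "test", "qa", "uat", "old", "legacy", "beta", "admin")


def triage_subdomains(names, keywords=INTERESTING_LABELS):
    """Map each hostname whose labels contain a keyword to the sorted list of matched keywords."""
    flagged = {}
    for name in names:
        # Split on dots and hyphens so "dev-api.example.com" yields the token "dev".
        tokens = set(re.split(r"[.\-]", name.lower()))
        hits = sorted(k for k in keywords if k in tokens)
        if hits:
            flagged[name] = hits
    return flagged
```

Matching whole tokens keeps `devops.example.com` from being flagged as `dev`. Substring matching would catch more hosts but would also produce more noise.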

2. Technology Stack Identification

By analyzing job postings or developers’ LinkedIn profiles, attackers identify the specific technology stack the target company uses (e.g., React, the Django version in use, or a particular WAF). This makes their later choice of exploits more accurate.
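Job postings are free text, so a first pass at stack identification can be a set of regular expressions. Each one captures a technology name and an optional version number. The watch list and the `extract_stack` function below are hypothetical. A real list would be much longer and would need care with ambiguous names.

```python
import re

# Hypothetical watch list: display name -> regex for the name plus an optional version.
TECH_PATTERNS = {
    "Django": re.compile(r"\bdjango\b\s*v?(\d+(?:\.\d+)*)?", re.IGNORECASE),
    "React": re.compile(r"\breact(?:\.js)?\b\s*v?(\d+(?:\.\d+)*)?", re.IGNORECASE),
    "Cloudflare": re.compile(r"\bcloudflare\b", re.IGNORECASE),
}


def extract_stack(text):
    """Map each technology mentioned in text to the set of versions seen (possibly empty)."""
    found = {}
    for tech, pattern in TECH_PATTERNS.items():
        for match in pattern.finditer(text):
            versions = found.setdefault(tech, set())
            # Only patterns with a capture group can report a version.
            if pattern.groups and match.group(1):
                versions.add(match.group(1))
    return found
```

Consider the posting "We use Django 4.2 and React 18 behind Cloudflare." It yields Django `{'4.2'}`, React `{'18'}`, and Cloudflare with an empty version set.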

3. Credential Harvesting

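Attackers search data from past breaches for email addresses and password hashes to gauge how feasible a credential stuffing attack would be. Hashes become useful only once cracked, because credential stuffing replays plaintext username/password pairs.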
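Breach exposure can also be checked defensively without handling plaintext dumps. The sketch below uses the k-anonymity range endpoint of Have I Been Pwned's Pwned Passwords service. Only the first five hex characters of the password's SHA-1 hash are sent. The returned `SUFFIX:COUNT` lines are then matched locally. The helper names and the User-Agent string are illustrative assumptions, and the service's current usage terms should be checked before automating it.

```python
import hashlib
import urllib.request

# Pwned Passwords k-anonymity range endpoint; only a 5-character hash prefix is sent.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_sha1(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest: 5 and 35 characters."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(body, suffix):
    """Parse 'SUFFIX:COUNT' lines and return the count for suffix, or 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def times_breached(password, timeout=10):
    """Fetch every suffix sharing our prefix, then look for our suffix locally."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "osint-breach-check-example"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return count_in_range_response(response.read().decode("utf-8"), suffix)
```

The full hash never leaves the machine, so the service cannot learn which password was checked.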

Utilization from a Defender’s (Blue Team) Perspective

For Blue Teams and security engineers, OSINT is a means of asset management and threat detection.

1. Shadow IT Identification and Asset Management

Identify cloud instances created without organizational approval and internal management tools exposed to the internet. These must be found before attackers discover them, then either placed behind access controls or decommissioned.
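A minimal way to surface shadow IT is to compare what external sources reveal against what the organization believes it owns. The sketch assumes two inputs: `discovered` comes from a source such as CT logs, and `inventory` comes from an internal asset list (e.g., a CMDB export). Both inputs and the function names are hypothetical.

```python
def normalize_hostname(name):
    """Lower-case and strip whitespace and the trailing root dot so spellings compare equal."""
    return name.strip().lower().rstrip(".")


def find_unregistered(discovered, inventory):
    """Return externally discovered hostnames that are absent from the asset inventory."""
    known = {normalize_hostname(name) for name in inventory}
    return sorted({normalize_hostname(name) for name in discovered} - known)
```

For example, `find_unregistered(["www.example.com", "Jenkins.example.com."], ["www.example.com"])` returns `["jenkins.example.com"]`: a host nobody registered, and a candidate for access control or decommissioning.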

2. Early Detection of Data Leaks

Periodically check code-hosting platforms such as GitHub for AWS keys or API tokens that have been accidentally committed. Monitoring public repositories with automated scanners is essential.
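As a narrow sketch of the pattern-matching side of such a scanner, the code below flags strings shaped like AWS access key IDs in a cloned repository. Such an ID is a four-character prefix such as `AKIA` (long-term keys) or `ASIA` (temporary credentials) followed by 16 uppercase letters or digits. It is not a replacement for a dedicated secret scanner. The paired secret access key has no comparable fixed prefix, so this regex will not find it. The function names are hypothetical.

```python
import re
from pathlib import Path

# Access key ID shape: AKIA/ASIA prefix + 16 uppercase alphanumerics (20 characters total).
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def scan_text(text):
    """Yield (line_number, match) for every candidate AWS access key ID in text."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            yield lineno, match.group(0)


def scan_tree(root):
    """Scan every readable file under root (e.g., a cloned repository)."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        findings.extend((str(path), lineno, key) for lineno, key in scan_text(text))
    return findings
```

`AKIAIOSFODNN7EXAMPLE`, the placeholder key used in AWS documentation, matches the pattern, which makes it convenient for testing the scanner.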

3. Threat Intelligence and Attacker Profiling

Collect external threat information to assess whether currently prevalent attack techniques could affect the organization’s assets. For example, when a vulnerability in a specific open-source library is disclosed, OSINT helps trace which of the organization's assets use that library.
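Tracing a disclosed vulnerability back to affected assets is, at its core, a version-range query over a software inventory. The sketch below makes two assumptions. First, a hypothetical inventory maps each asset to its installed package versions. Second, the affected range is purely numeric and half-open, `[introduced, fixed)`. Real advisories often include pre-release tags (e.g., `-beta9`) that this parser rejects, so a proper version library such as `packaging.version` would be needed in practice.

```python
def parse_version(version):
    """Turn '2.14.1' into (2, 14, 1); trailing zeros are dropped so '1.5' == '1.5.0'."""
    parts = [int(part) for part in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def affected_assets(inventory, package, introduced, fixed):
    """Return (asset, version) pairs whose installed version v satisfies introduced <= v < fixed."""
    lo, hi = parse_version(introduced), parse_version(fixed)
    hits = []
    for asset, packages in inventory.items():
        installed = packages.get(package)
        if installed is not None and lo <= parse_version(installed) < hi:
            hits.append((asset, installed))
    return sorted(hits)
```

Dropping trailing zeros matters because Python compares tuples element by element. Without it, `(1, 5)` would sort before `(1, 5, 0)`, and a host on 1.5.0 could be misclassified.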

Technical Implementation: CT Log Monitoring with Python

OSINT tools such as theHarvester, Recon-ng, and Maltego exist, but engineers should also be able to write their own automation scripts and integrate them into their pipelines.

Below is a Python example that queries crt.sh, requests its JSON output, and extracts a de-duplicated set of subdomains for a given domain.

```python
import sys

import requests

CRT_SH_URL = "https://crt.sh/"


def fetch_subdomains_from_crtsh(target_domain, timeout=60):
    """
    Queries crt.sh Certificate Transparency data to extract subdomains.

    Args:
        target_domain (str): Target domain to query (e.g., example.com)
        timeout (int): Seconds to wait for a response; crt.sh can be slow

    Returns:
        set: A set of unique, lower-cased subdomains (empty on error)
    """
    target_domain = target_domain.strip().lower().rstrip(".")
    # crt.sh treats % as a wildcard; passing the query via params lets
    # requests URL-encode it correctly (as %25).
    params = {"q": f"%.{target_domain}", "output": "json"}

    try:
        response = requests.get(CRT_SH_URL, params=params, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"[Error] API request failed: {e}", file=sys.stderr)
        return set()

    try:
        data = response.json()
    except ValueError:
        # requests raises a ValueError subclass (requests.exceptions.JSONDecodeError
        # in 2.27+) when the body is not valid JSON, e.g. an HTML error page.
        print("[Error] JSON response parsing failed", file=sys.stderr)
        return set()

    subdomains = set()
    for entry in data:
        # name_value holds the certificate names; multi-domain certificates
        # separate them with newlines.
        name_value = entry.get("name_value") or ""
        for domain in name_value.split("\n"):
            domain = domain.strip().lower()
            # Skip blanks and wildcards (*.example.com); handle those separately if needed.
            if not domain or "*" in domain:
                continue
            # Keep only names that actually sit under the target domain.
            if domain.endswith("." + target_domain):
                subdomains.add(domain)

    return subdomains


if __name__ == "__main__":
    # Domain from the command line; very large domains may time out on crt.sh.
    target = sys.argv[1] if len(sys.argv) > 1 else "example.com"
    print(f"[*] Analyzing subdomains for: {target}")

    results = fetch_subdomains_from_crtsh(target)

    if results:
        print(f"[*] Found {len(results)} unique subdomains:")
        for sub in sorted(results):
            print(f" - {sub}")
    else:
        print("[!] No subdomains found or error occurred.")
```

Code Analysis and Utilization Tips

API Utilization: The code above requests crt.sh's JSON output instead of scraping HTML, so it receives structured data. This is the preferred approach for automation tools, because HTML layouts can change without notice.
Data Refinement: A set removes duplicate hostnames. A name on a certificate does not guarantee that the host still exists. In practice, a follow-up step should check whether each extracted name still resolves, for example with socket.getaddrinfo from Python's socket library.
Expansion: Run this script on a schedule (e.g., a cron job) and send a Slack or email notification whenever a new subdomain appears. This turns it into an effective asset-monitoring tool for defenders.
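The periodic-alerting idea can be sketched with a small state file. Each run stores every hostname seen so far and reports only the names that are new. The JSON state file and the function names are assumptions, and so is the print-based `notify`, a hypothetical stand-in for a Slack or email hook. The `resolves` helper implements the DNS check described above with `socket.getaddrinfo`. Unlike the CT query, it triggers DNS lookups that can reach the target's name servers, so it is not strictly passive.

```python
import json
import socket
from pathlib import Path


def load_seen(state_file):
    """Return every hostname recorded by earlier runs (empty set on the first run)."""
    path = Path(state_file)
    return set(json.loads(path.read_text())) if path.exists() else set()


def record_and_diff(current, state_file):
    """Store the union of old and current names; return only names never seen before."""
    seen = load_seen(state_file)
    new = set(current) - seen
    Path(state_file).write_text(json.dumps(sorted(seen | set(current))))
    return sorted(new)


def resolves(hostname):
    """True if the name resolves now. This issues DNS lookups, so it is not passive."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def notify(new_names):
    """Hypothetical stand-in for a Slack or email hook; it only prints."""
    for name in new_names:
        status = "resolves" if resolves(name) else "does not resolve"
        print(f"[+] New subdomain: {name} ({status})")
```

A scheduled job could call `fetch_subdomains_from_crtsh`, pass the result to `record_and_diff`, and hand the returned list to `notify`. The state keeps the union of all names, so a host that briefly drops out of the query results is not reported again when it reappears.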

OSINT is not just searching; it is the discipline of connecting fragmented data into meaningful context. Attackers use it to find the weakest link, and defenders must use it to eliminate their own weaknesses proactively.


Cybersecurity, Threat Intelligence

OSINT

This post is licensed under CC BY 4.0 by the author.