Machine learning detections

The Investigator analytics engine provides several machine learning models that analyze Corelight logs and generate alerts when they detect anomalous patterns and potential threats.

_images/ml-overview.png

The machine learning engine in Investigator ingests the log input, normalizes and organizes the data, extracts and analyzes the features, and applies the machine learning models. The result is a focused security event with a severity score.

_images/ml-engine.png

Machine learning alerts complement notices, Suricata alerts, and Falcon LogScale queries.

Machine learning models

This section describes the machine learning model detections available in Investigator.

You can find more information for each machine learning model in Investigator by clicking the alert name in the Alert dashboard. The alert details provide a summary and suggest next steps for each detection to guide your investigation and troubleshooting.

ASCII homograph

An adversary registers a domain containing one or more ASCII homoglyphs, making it visually similar to a trusted domain. This is possible because the ASCII character set contains characters that look alike, for example the uppercase letter “O” and the digit “0”. Adversaries often use homograph domains to trick Internet users into visiting phishing sites.

  • Detections are based on information from the HTTP log.

  • Default severity: 5

  • This model is silent by default and can be tuned per account.
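As an illustration of the homoglyph idea only (this is not the model Investigator uses), the sketch below collapses a few confusable ASCII sequences and compares the result against a list of trusted domains. The `CONFUSABLES` table and both function names are hypothetical.

```python
# Hypothetical table of visually confusable ASCII sequences (assumption, not the product's list).
CONFUSABLES = {"0": "o", "1": "l", "rn": "m", "vv": "w"}


def skeleton(domain: str) -> str:
    """Collapse confusable ASCII sequences so look-alike names compare equal."""
    s = domain.lower()
    for fake, real in CONFUSABLES.items():
        s = s.replace(fake, real)
    return s


def is_ascii_homograph(domain: str, trusted: set[str]) -> bool:
    """True if the domain is not trusted itself but collapses to a trusted name."""
    d = domain.lower()
    if d in trusted:
        return False
    return skeleton(d) in {skeleton(t) for t in trusted}
```

For example, `is_ascii_homograph("g00gle.com", {"google.com"})` flags the look-alike, while the trusted name itself is not flagged.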

Attempted connection to a DGA domain

One or more hosts attempted to connect to a domain that appears to have been generated by a Domain Generation Algorithm (DGA). DGA domains exhibit random-looking patterns of letters, numbers, or words, and are generally long to minimize collisions with existing domains.

  • Detections are based on information from the DNS log.

  • Default severity: 5

  • Enabled and set to alert by default.
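The "long and random-looking" property described above can be approximated with character entropy. The following is a minimal sketch under assumed thresholds (`min_len` and `min_entropy` are illustrative values), not the trained model that Investigator applies.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())


def looks_generated(domain: str, min_len: int = 12, min_entropy: float = 3.5) -> bool:
    """Flag domains whose longest non-TLD label is both long and high-entropy."""
    labels = domain.lower().rstrip(".").split(".")
    label = max(labels[:-1] or labels, key=len)
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy
```

Entropy alone misclassifies word-based DGAs and some legitimate long names, which is one reason a learned model over several features is preferable to a single threshold.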

DGA malware

A Domain Generation Algorithm (DGA) is a technique that attackers use to generate large numbers of new domain names for malware command and control servers.

Malware often relies on DGAs to obtain C2 (Command and Control) rendezvous locations. DGA activity results in an elevated number of DNS requests from the same internal IP, where the requests attempt, and mostly fail, to resolve random domains (NXDOMAIN response code).

This attack is considered stealthy because organizations often allow web traffic by default and new AGDs (algorithmically generated domains) bypass blacklists.

  • Detections are based on information from the DNS log.

  • Default severity: 8

  • Enabled and set to alert by default.
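The behavior described above (one internal IP issuing many failing lookups for different domains) can be sketched as a simple aggregation. The record keys `id.orig_h`, `query`, and `rcode_name` are assumed to follow Zeek-style DNS log fields, and the thresholds are illustrative, not product defaults.

```python
from collections import defaultdict


def nxdomain_heavy_sources(dns_records, min_failed_domains=20, min_ratio=0.8):
    """Return source IPs whose lookups mostly fail across many distinct domains."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    failed_domains = defaultdict(set)
    for rec in dns_records:
        src = rec["id.orig_h"]
        totals[src] += 1
        if rec.get("rcode_name") == "NXDOMAIN":
            failures[src] += 1
            failed_domains[src].add(rec["query"].lower())
    return sorted(
        src for src in totals
        if len(failed_domains[src]) >= min_failed_domains
        and failures[src] / totals[src] >= min_ratio
    )
```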

Discovery via network service scanning

Adversaries perform port scans to discover the ports and services available in a network that could be used to establish connections with targeted machines. Port scans result in an elevated number of connection attempts from the same source IP, targeting a wide range of destination ports on one or more destination IPs.

  • Detections are based on information from the conn log.

  • Default severity: 6

  • Enabled and set to alert by default.
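A rough way to picture the scan signature is to count distinct destination IP and port pairs per source. The sketch below assumes Zeek-style conn log keys (`id.orig_h`, `id.resp_h`, `id.resp_p`) and an arbitrary `min_targets` threshold; it is not the detection logic used by Investigator.

```python
from collections import defaultdict


def find_port_scanners(conn_records, min_targets=100):
    """Return source IPs that contacted many distinct (destination IP, port) pairs."""
    targets = defaultdict(set)
    for rec in conn_records:
        targets[rec["id.orig_h"]].add((rec["id.resp_h"], rec["id.resp_p"]))
    return sorted(src for src, seen in targets.items() if len(seen) >= min_targets)
```

A host that opens hundreds of connections to the same service is not flagged, because repeated connections to one pair count once.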

DNS reconnaissance

A DNS reconnaissance attack attempts to gather information about an organization's network infrastructure, in particular its DNS servers and their records. The two main DNS reconnaissance techniques are zone transfer attacks and brute force subdomain enumeration.

This attack is considered stealthy because most organizations do not monitor DNS traffic.

  • Detections are based on information from the DNS log.

  • Default severity: 3

  • Enabled and set to alert by default.
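The two techniques named above leave different traces in DNS data: zone transfers appear as AXFR or IXFR query types, and subdomain enumeration appears as many distinct names under one parent domain. The sketch below assumes Zeek-style keys (`id.orig_h`, `query`, `qtype_name`), treats the last two labels as the parent domain (a simplification that ignores multi-label suffixes such as `co.uk`), and uses an illustrative threshold.

```python
from collections import defaultdict


def recon_signals(dns_records, min_subdomains=50):
    """Return (zone transfer attempts, enumeration suspects) as sets of (source, domain)."""
    zone_transfers = set()
    names = defaultdict(set)
    for rec in dns_records:
        src = rec["id.orig_h"]
        query = rec["query"].lower().rstrip(".")
        if rec.get("qtype_name") in {"AXFR", "IXFR"}:
            zone_transfers.add((src, query))
        labels = query.split(".")
        if len(labels) > 2:
            # Naive parent domain: last two labels of the query name.
            names[(src, ".".join(labels[-2:]))].add(query)
    enumerations = {key for key, seen in names.items() if len(seen) >= min_subdomains}
    return zone_transfers, enumerations
```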

Domain combosquatting

This detection identifies domain names that imitate legitimate domains by combining a trusted name with additional words or phrases.

For example, a cybercriminal might create a domain name like “facebooksocialnetwork.com” to trick users into thinking it is the official Facebook website. This technique is effective because it can be difficult for users to differentiate between legitimate domain names and similar, but fake ones.

To detect domain combosquatting, machine learning algorithms are trained on large datasets of domain names to learn patterns of legitimate and abusive behavior. The algorithms can then identify domain names that are likely to be used for combosquatting based on features such as the similarity of the name to a legitimate domain, the use of common keywords, and the registration date of the domain. By identifying these malicious domain names, machine learning can help prevent users from falling victim to phishing attacks and other forms of online fraud.

  • Detections are based on information from the HTTP log.

  • Default severity: 5

  • This model is silent by default and can be tuned per account.
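The name-similarity feature mentioned above can be pictured with a plain substring check: a protected brand appears inside the registered label together with extra text. This is a deliberately naive sketch (the `brands` set is a hypothetical input, and the second-to-last label is assumed to be the registered name); the product's model also weighs other features.

```python
def is_combosquat(domain: str, brands: set[str]) -> bool:
    """True if the registered label contains a brand plus additional characters."""
    labels = domain.lower().rstrip(".").split(".")
    # Simplification: treat the second-to-last label as the registered name.
    registered = labels[-2] if len(labels) >= 2 else labels[0]
    return any(brand in registered and registered != brand for brand in brands)
```

With `brands = {"facebook"}`, the example domain `facebooksocialnetwork.com` is flagged, while `www.facebook.com` is not.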

Domain typosquatting

Adversaries rely on errors made by users when typing a website address to deliver malware, to redirect to a malicious site, to commit fraud, or to phish credentials.

  • Detections are based on information from the HTTP log.

  • Default severity: 5

  • This model is silent by default and can be tuned per account.
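Typing errors usually leave a domain within a small edit distance of the intended one. The sketch below uses the classic Levenshtein distance with an assumed cutoff of one edit; it illustrates the concept only and is not the model Investigator applies.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def is_typosquat(domain: str, trusted: set[str], max_distance: int = 1) -> bool:
    """True if the domain is not trusted but is within max_distance of a trusted name."""
    d = domain.lower()
    return d not in trusted and any(edit_distance(d, t) <= max_distance for t in trusted)
```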

Exfiltration via DNS

Adversaries can exfiltrate data by encoding data in the subdomains of DNS queries. This technique results in a high volume of DNS queries generated by the same source IP, where the queries aim at resolving different random (and often long) subdomains of a domain owned by an adversary. This activity can be difficult to detect since continuous monitoring is rarely applied to DNS traffic.

  • Detections are based on information from the DNS log.

  • Default severity: 9

  • This model is silent by default and can be tuned per account.
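The pattern described above (one source resolving many different, often long, subdomains of a single domain) can be sketched as follows. The record keys follow Zeek-style DNS log fields, the parent domain is naively taken as the last two labels, and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict


def exfiltration_candidates(dns_records, min_unique=100, min_avg_len=30):
    """Return (source IP, parent domain) pairs with many long, distinct subdomains."""
    subdomains = defaultdict(set)
    for rec in dns_records:
        labels = rec["query"].lower().rstrip(".").split(".")
        if len(labels) < 3:
            continue
        parent = ".".join(labels[-2:])
        subdomains[(rec["id.orig_h"], parent)].add(".".join(labels[:-2]))
    hits = []
    for key, subs in subdomains.items():
        avg_len = sum(len(s) for s in subs) / len(subs)
        if len(subs) >= min_unique and avg_len >= min_avg_len:
            hits.append(key)
    return sorted(hits)
```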

C2 HTTP Frameworks

This detection identifies Command and Control (C2) frameworks that use HTTP (Hypertext Transfer Protocol) for communication between compromised systems (infected devices or computers) and a central command server controlled by an attacker.

This detection considers multiple aspects of the HTTP connections between the compromised system and the server, such as beaconing activity, payload, and header characteristics.

C2 frameworks might differ in implementation but can share similar characteristics such as connection duration and frequency, URI pattern, and payload consistency.

  • Detections are based on information from the HTTP log.

  • Default severity: 9

  • Enabled and set to alert by default.
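Beaconing, one of the aspects listed above, shows up as requests at nearly constant intervals. A minimal sketch of a regularity score follows; it uses the coefficient of variation of the gaps between request timestamps (in seconds). The function name and scoring formula are illustrative assumptions, and the actual detection combines this kind of signal with payload and header features.

```python
import statistics


def beacon_score(timestamps) -> float:
    """Score in [0, 1]; values near 1 mean highly regular request intervals."""
    if len(timestamps) < 4:
        return 0.0
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0
    # Coefficient of variation: spread of the gaps relative to their mean.
    cv = statistics.pstdev(gaps) / mean_gap
    return max(0.0, 1.0 - cv)
```

In practice the timestamps would be grouped per client and server pair before scoring.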

HTTP C2 Infrastructure Using Multiple Domains

Adversaries use redundancy to make their infrastructure more robust, and decoy domains to hide C2 servers and evade blacklist-based detections. Repeated HTTP paths or URIs across several rare domains or dotted-quad (raw IPv4 address) hosts are strong malware indicators.

  • Detections are based on information from the HTTP log.

  • Default severity: 8

  • Enabled and set to alert by default.
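The indicator described above (the same URI requested on several uncommon hosts) can be approximated by grouping hosts per URI. In this sketch the `host` and `uri` keys follow Zeek-style HTTP log fields, and `known_hosts` is a hypothetical allowlist standing in for a real measure of domain rarity.

```python
from collections import defaultdict


def shared_uri_hosts(http_records, known_hosts=frozenset(), min_hosts=3):
    """Map each URI to the sorted list of unfamiliar hosts that served it."""
    hosts_by_uri = defaultdict(set)
    for rec in http_records:
        host = rec["host"].lower()
        if host not in known_hosts:
            hosts_by_uri[rec["uri"]].add(host)
    return {
        uri: sorted(hosts)
        for uri, hosts in hosts_by_uri.items()
        if len(hosts) >= min_hosts
    }
```

Generic paths such as `/` would need to be excluded before a result like this is meaningful.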

IDN homograph

An adversary registers a domain containing one or more homoglyphs, making it visually similar to a trusted domain. This is possible because different international alphabets contain letters that look the same but are encoded as different Unicode characters, for example the Latin character “a” (U+0061) and the Cyrillic character “а” (U+0430). Adversaries often use homograph domains to trick Internet users into visiting phishing sites.

  • Detections are based on information from the HTTP log.

  • Default severity: 5

  • This model is silent by default and can be tuned per account.
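One visible symptom of an IDN homograph is a label that mixes scripts, such as Latin letters with a single Cyrillic one. The sketch below derives a coarse script name from the first word of each character's Unicode name; that shortcut is an assumption made for brevity, and it does not catch names written entirely in one look-alike script.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Coarse script names (first word of the Unicode name) of the letters in a label."""
    return {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(domain: str) -> bool:
    """True if any label of the (already decoded) domain mixes more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))
```

Domains logged in Punycode form (`xn--` labels) would need to be decoded to Unicode first.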

Malicious file download

An adversary might rely on a user opening a malicious file to gain execution. Users may be subjected to social engineering to get them to open a file that will lead to code execution. Adversaries use several types of files that require a user to execute them, including .doc, .pdf, .xls, .rtf, .scr, .exe, .lnk, .pif, and .cpl.

  • Detections are based on information from the HTTP log.

  • Default severity: 7

  • This model is silent by default and can be tuned per account.
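As a small illustration of the file types listed above, the sketch below extracts the extension from a request URI and checks it against that list. The extension check is only a pre-filter assumed here for clarity; on its own it says nothing about whether a file is malicious.

```python
import posixpath
from urllib.parse import urlsplit

# File types named in the description above.
RISKY_EXTENSIONS = {".doc", ".pdf", ".xls", ".rtf", ".scr", ".exe", ".lnk", ".pif", ".cpl"}


def risky_extension(uri: str):
    """Return the lowercased extension if it is on the risky list, otherwise None."""
    path = urlsplit(uri).path
    ext = posixpath.splitext(path)[1].lower()
    return ext if ext in RISKY_EXTENSIONS else None
```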

Malicious SSL certificate

Malicious SSL certificates are digital certificates that have been obtained or issued fraudulently or through an unauthorized channel. Attackers can use these certificates to conduct malicious activities, including man-in-the-middle (MITM) attacks, phishing, and malware distribution.

This detection indicates that one or more hosts attempted to connect to a domain with a potentially malicious SSL certificate. The certificate associated with the domain shares characteristics with certificates used by malware and adversaries. These certificates often contain random, unpopular domain names in the certificate subject common name (CN), use free Certificate Authorities, and do not populate optional subject and issuer information such as the CountryName (C), Locality (L), Organization (O), OrganizationalUnit (OU), or the StateOrProvinceName (ST).

  • Detections are based on information from the x509 log.

  • Default severity: 5

  • Enabled and set to alert by default.
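One of the characteristics above, unpopulated optional subject fields, is easy to illustrate. The sketch below assumes the subject is available as a comma-separated distinguished name string such as `CN=www.example.com,O=Example Inc,C=US`, and it splits on commas naively (escaped commas inside values are not handled).

```python
OPTIONAL_FIELDS = ("C", "L", "O", "OU", "ST")


def parse_dn(dn: str) -> dict:
    """Parse a simple 'KEY=value,KEY=value' distinguished name into a dict."""
    fields = {}
    for part in dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key.upper()] = value
    return fields


def missing_optional_fields(subject: str) -> list:
    """List the optional attributes that are absent or empty in the subject."""
    fields = parse_dn(subject)
    return [name for name in OPTIONAL_FIELDS if not fields.get(name)]
```

A certificate whose subject contains only a CN returns all five names, which would be one input among several rather than a verdict by itself.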

NXDOMAIN beaconing

Attackers and malware can rely on hard-coded domains for C2, and these domains are often unavailable. Malware and adversaries repeatedly attempt to connect to the predefined domains (beaconing) until a successful connection is established or a limit of attempts is reached. When the same IP periodically tries to connect to a domain that does not resolve, each attempt returns an NXDOMAIN response code.

  • Detections are based on information from the DNS log.

  • Default severity: 4

  • This model is silent by default and can be tuned per account.
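The periodic failing lookups described above can be sketched by grouping NXDOMAIN responses per source and domain, then checking whether the gaps between attempts stay close to their median. The keys `ts`, `id.orig_h`, `query`, and `rcode_name` are assumed Zeek-style fields, and `min_attempts` and `tolerance` are illustrative values.

```python
import statistics
from collections import defaultdict


def nxdomain_beacons(dns_records, min_attempts=5, tolerance=0.1):
    """Return (source IP, domain) pairs with regularly spaced NXDOMAIN lookups."""
    times = defaultdict(list)
    for rec in dns_records:
        if rec.get("rcode_name") == "NXDOMAIN":
            times[(rec["id.orig_h"], rec["query"].lower())].append(rec["ts"])
    hits = []
    for key, ts in times.items():
        if len(ts) < min_attempts:
            continue
        ts.sort()
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        median = statistics.median(gaps)
        # Every gap must fall within the tolerance band around the median gap.
        if median > 0 and all(abs(g - median) <= tolerance * median for g in gaps):
            hits.append(key)
    return sorted(hits)
```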

Social engineering domains

Social engineering domains trick Internet users into visiting malicious sites under the false pretext of prizes, free software updates, fake antivirus alerts, and similar lures. These sites attempt to phish credentials, install malware, or redirect users to other malicious sites.

  • Detections are based on information from the HTTP log.

  • Default severity: 5

  • Enabled and set to alert by default.

Tor connections

Adversaries, malware, and insider threats use Tor to bypass blacklist-based detection and prevention. Tor can be detected by analyzing the subject names of SSL requests. Tor connections often populate the subject name with random-looking domains that do not actually exist, where the domains are preceded by www., have a .com or .net top-level domain, range in size from 8 to 20 characters, and are base-32 encoded. Next-generation firewalls (NGFWs) often misclassify Tor traffic due to encryption.

  • Detections are based on information from the SSL log.

  • Default severity: 7

  • Enabled and set to alert by default.
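The naming pattern described above translates directly into a regular expression: a `www.` prefix, a label of 8 to 20 characters from the base-32 alphabet (a to z and 2 to 7), and a `.com` or `.net` top-level domain. The sketch below shows only that pattern check; the function name is hypothetical.

```python
import re

# www. + 8-20 base-32 characters (a-z, 2-7) + .com or .net
TOR_NAME_PATTERN = re.compile(r"www\.[a-z2-7]{8,20}\.(?:com|net)")


def matches_tor_name_pattern(name: str) -> bool:
    """True if the whole name fits the Tor-style naming pattern."""
    return TOR_NAME_PATTERN.fullmatch(name.lower()) is not None
```

Ordinary names made only of letters can also fit this pattern, so a match is a candidate signal that needs further evidence (for example, whether the domain actually exists) rather than a detection in itself.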

Machine learning analysis

The Alerts tab in the Detailed View for machine learning detections shows details about the ML analysis, including the features that contributed to the calculation of the severity score.

_images/ml-overview-new.png