Regtech SteelEye Shares Insights on Building a Lexicon-Based Communications Surveillance System

Communications surveillance plays a critical role in ensuring firms detect and prevent market abuse, insider trading, and other forms of misconduct. A recent update from Regtech SteelEye, a compliance technology and data analytics firm, delves into the fundamentals of building an effective lexicon-based communications surveillance system.

Lexicon-based communications surveillance involves using predefined lists of keywords and phrases to flag potentially risky or prohibited behaviors, such as market manipulation, insider dealing, or non-financial misconduct.

These systems are designed to detect specific, predictable risks, like references to sanctioned countries or offensive language, making them a cornerstone of compliance programs in financial institutions.

However, as SteelEye’s blog post points out, many firms struggle with high false positive rates and operational burdens due to poorly designed lexicons.

A well-calibrated lexicon is essential for meeting regulations such as the Market Abuse Regulation (MAR) in the UK and EU, US rules from FINRA and the SEC, and requirements from regulators in Australia (ASIC), Singapore (MAS), and Hong Kong (HKMA).

Historically, lexicons have been the standard for communications surveillance, but they come with significant challenges.

Many solutions offer no suggested terms, leaving compliance analysts to build lexicons from scratch. Others provide pre-loaded packages that are rigid and context-insensitive, which leads to excessive false positives.

For example, a phrase like “I just heard that” could trigger an alert for insider trading but might simply refer to a colleague’s departure.

This lack of context results in keyword fatigue, where compliance teams are overwhelmed by alerts, leading to cursory reviews, bulk closures, or missed risks.

SteelEye emphasizes that the fundamental issue is the narrow scope of lexicon-based searches, which capture only a fragment of intent and miss the broader context.

Lexicons excel at detecting high-certainty scenarios, such as explicit phrases like “push this rate higher,” and are easy to extend to new use cases, such as monitoring a new department.

However, they fall short in handling nuance, coded language, sarcasm, or evolving communication styles.

They also struggle with multi-language complexity and require constant updates to stay relevant.

Firms must balance the need for comprehensive coverage with the risk of generating unmanageable alert volumes.

SteelEye’s update stresses the importance of understanding these limitations to set realistic expectations and optimize surveillance outcomes.

The foundation of a strong lexicon lies in mapping terms to specific risk scenarios.

SteelEye categorizes risks into areas like market abuse (e.g., “big order coming” for market manipulation), conduct risk (e.g., “you owe me a” for general misconduct), and operational risk (e.g., terms related to unauthorized disclosures).
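To make the term-to-scenario mapping concrete, here is a minimal Python sketch. The `LexiconEntry` structure and `tag_message` helper are hypothetical and are not SteelEye’s data model. The first two terms come from the examples above, while “keep this between us” is an invented stand-in for a disclosure-related term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    term: str       # phrase to search for (lowercase)
    category: str   # broad risk area, e.g. "market abuse"
    scenario: str   # the specific risk the term is meant to surface


# Hypothetical lexicon; "keep this between us" is an invented example term.
LEXICON = [
    LexiconEntry("big order coming", "market abuse", "market manipulation"),
    LexiconEntry("you owe me a", "conduct risk", "general misconduct"),
    LexiconEntry("keep this between us", "operational risk", "unauthorized disclosure"),
]


def tag_message(text: str, lexicon: list[LexiconEntry] = LEXICON) -> list[tuple[str, str]]:
    """Return (category, scenario) pairs for every lexicon term found in the text."""
    lowered = text.lower()
    return [(e.category, e.scenario) for e in lexicon if e.term in lowered]


print(tag_message("Heads up, big order coming at the open"))
# [('market abuse', 'market manipulation')]
```

Keeping the scenario next to each term means every alert arrives already labeled with the risk it is meant to detect, which makes it easier to review and to tune term by term.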

Multinational firms must also account for linguistic diversity, as cultural nuances and regional slang can alter meanings.

For instance, a phrase innocent in one language may be problematic in another, requiring tailored rule sets for each language.
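A minimal sketch of per-language rule sets follows. It assumes each message already carries a language code, supplied by metadata or an upstream language detector (neither is shown). The `RULES_BY_LANGUAGE` table, its phrases, and the `flag` helper are illustrative assumptions. Returning `None` for an unsupported language surfaces the coverage gap instead of silently treating the message as clean.

```python
import re
from typing import Optional

# Hypothetical per-language rule sets keyed by ISO 639-1 code. The phrases are
# illustrative only; real rules should be written and reviewed by native speakers.
RULES_BY_LANGUAGE = {
    "en": [re.compile(r"\bkeep (?:it|this) quiet\b", re.IGNORECASE)],
    "fr": [re.compile(r"\bgardez? (?:ça|ceci) pour (?:toi|vous)\b", re.IGNORECASE)],
    "de": [re.compile(r"\bbehalte das für dich\b", re.IGNORECASE)],
}


def flag(text: str, language: str) -> Optional[bool]:
    """Apply only the rule set for the message's language (assumed known upstream)."""
    rules = RULES_BY_LANGUAGE.get(language)
    if rules is None:
        return None  # no rule set: report the coverage gap instead of passing silently
    return any(rule.search(text) for rule in rules)


print(flag("Please keep this quiet", "en"))  # True
print(flag("Please keep this quiet", "fr"))  # False: English rules are not applied to French traffic
print(flag("Guárdalo para ti", "es"))        # None: no Spanish rule set exists
```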

SteelEye outlines four levels of lexicon sophistication to help firms tailor their surveillance systems:

Basic Keyword Matching: Searches for exact phrases like “fixing the price” but risks missing variations or generating false positives (e.g., “fixing the coffee machine”).
Advanced Matching: Uses permutations, regular expressions, stemming, and fuzzy matching to capture variations like “don’t/dnt disclose” while excluding irrelevant contexts.
Applied Filters and Machine Learning: Incorporates metadata filters (e.g., communications from specific departments) and machine learning to reduce false positives by analyzing context, such as distinguishing newsletters from suspicious emails.
Intelligent Scoring with AI: AI assigns each alert a risk score, which could allow likely false positives to be closed automatically, improving efficiency.
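The difference between the first two levels can be sketched with Python’s standard library. This is an illustration, not SteelEye’s engine. The patterns are hypothetical, hand-written inflection alternatives stand in for a real stemmer, and `difflib.SequenceMatcher` with an assumed 0.8 threshold stands in for whatever fuzzy-matching method a production system uses.

```python
import re
from difflib import SequenceMatcher


# Level 1: basic keyword matching, a case-insensitive substring test.
def basic_match(text: str, keyword: str) -> bool:
    return keyword.lower() in text.lower()


# Level 2: hypothetical hand-written patterns. Inflection alternatives stand in
# for a stemmer; the apostrophe class accepts both ' and the typographic ’.
PRICE_FIXING = re.compile(r"\bfix(?:ing|ed|es)?\s+(?:the\s+)?prices?\b", re.IGNORECASE)
DONT_DISCLOSE = re.compile(r"\b(?:do\s+not|don['’]?t|dnt)\s+disclose\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:not|don['’]?t|dnt)\b", re.IGNORECASE)


def fuzzy_contains(text: str, word: str, threshold: float = 0.8) -> bool:
    """True if any token's similarity ratio to `word` meets the (assumed) threshold."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    return any(SequenceMatcher(None, tok, word).ratio() >= threshold for tok in tokens)


def advanced_match(text: str) -> bool:
    if PRICE_FIXING.search(text) or DONT_DISCLOSE.search(text):
        return True
    # Typo tolerance: a negation anywhere plus a near-miss spelling of "disclose".
    return bool(NEGATION.search(text)) and fuzzy_contains(text, "disclose")


messages = [
    "We are fixing the price tomorrow",
    "They fixed prices last quarter",
    "Someone is fixing the coffee machine",
    "dnt disclose this to anyone",
    "pls don't disclsoe the numbers",
]
for msg in messages:
    print(basic_match(msg, "fixing"), advanced_match(msg), msg)
```

Run as-is, the basic keyword “fixing” misses “fixed prices” and fires on the coffee-machine message, while the advanced patterns catch the variants and skip the irrelevant one. The negation check deliberately looks anywhere in the message, a loose choice that could itself produce false positives.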

By combining a range of search terms with AI-driven context analysis, firms can cast a broader surveillance net while reducing inefficiencies.
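Levels three and four of the framework can be approximated, very roughly, by a metadata filter followed by a score and an auto-close threshold. In this sketch the score is a hand-tuned heuristic standing in for a trained model, not an AI system. The `Alert` fields, the in-scope department set, the 0.2 bulk-mail discount, and the 0.3 auto-close threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    term: str
    sender_department: str
    is_bulk_mail: bool   # e.g. newsletters; assumed to be detected upstream
    term_weight: float   # hypothetical per-term severity in [0, 1]


IN_SCOPE_DEPARTMENTS = {"trading", "sales"}  # hypothetical metadata filter
AUTO_CLOSE_BELOW = 0.3                       # hypothetical auto-close threshold


def score(alert: Alert) -> float:
    """Toy risk score in [0, 1]; a heuristic stand-in for a trained model."""
    if alert.sender_department not in IN_SCOPE_DEPARTMENTS:
        return 0.0  # filtered out by metadata before any scoring
    s = alert.term_weight
    if alert.is_bulk_mail:
        s *= 0.2  # assumed discount: bulk mail rarely carries individual intent
    return s


def triage(alerts: list[Alert]) -> tuple[list[Alert], list[Alert]]:
    """Split alerts into a review queue (highest score first) and auto-closed ones."""
    scored = sorted(((score(a), a) for a in alerts), key=lambda p: p[0], reverse=True)
    review = [a for s, a in scored if s >= AUTO_CLOSE_BELOW]
    closed = [a for s, a in scored if s < AUTO_CLOSE_BELOW]
    return review, closed


alerts = [
    Alert("big order coming", "trading", is_bulk_mail=False, term_weight=0.9),
    Alert("big order coming", "trading", is_bulk_mail=True, term_weight=0.9),  # newsletter
    Alert("you owe me a", "facilities", is_bulk_mail=False, term_weight=0.6),  # out of scope
    Alert("you owe me a", "sales", is_bulk_mail=False, term_weight=0.6),
]
review, closed = triage(alerts)
print([(a.term, a.sender_department) for a in review])
# [('big order coming', 'trading'), ('you owe me a', 'sales')]
```

Sorting the review queue by score lets analysts see the riskiest alerts first. Where the auto-close threshold sits determines the trade-off between analyst workload and the chance of closing a genuine risk.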

This approach not only supports compliance with global regulations but can also sharpen a firm’s competitive edge by unlocking insights from communications data.

As regulatory scrutiny intensifies, mastering lexicon-based surveillance is a critical step for financial firms to protect market integrity and avoid costly oversights.