VeritySignals: A Multi-Factor Framework for Extracting Actionable Insider Trading Signals
Methodology, signal classification, and performance observations from a systematic analysis of SEC Form 4 filings.
Vasileios Koutsokostas · VeritySignals · veritysignals.com
Not financial advice. All data sourced from publicly available SEC EDGAR filings.
Abstract
This paper describes the methodology underlying VeritySignals, a platform designed to extract high-conviction investment signals from U.S. Securities and Exchange Commission (SEC) Form 4 insider trading disclosures. Corporate insiders (executives, directors, and major shareholders) are legally required to disclose open-market transactions within two business days, producing a continuous, public stream of data on how company principals allocate personal capital.
The challenge is not data availability but signal quality. The majority of Form 4 filings represent non-discretionary events (option grants, compensatory shares, pre-scheduled 10b5-1 plan sales) with no informational value. VeritySignals applies a multi-stage pipeline to isolate discretionary, open-market transactions and scores each surviving signal across more than fifteen independent factors, including transaction size, insider seniority, cluster activity, and the insider's historical predictive accuracy. Signals are classified into four strength tiers: VERY_STRONG, STRONG, MODERATE, and WEAK.
Internal observations suggest that top-tier signals from insiders with strong historical track records substantially outperform a passive SPY benchmark over three- and six-month horizons. This paper documents the framework, data sources, scoring methodology, architectural design, and key limitations of the system.
1. Introduction
The efficient market hypothesis in its semi-strong form holds that all publicly available information is already reflected in asset prices. Insider trading data presents an interesting empirical challenge to this claim. While Form 4 filings are unambiguously public (they are filed electronically with the SEC and immediately available on the EDGAR database), the information they contain is not uniformly incorporated into market prices in a timely fashion.
Three structural factors explain this lag. First, the volume of Form 4 filings is enormous: tens of thousands are filed each month, the vast majority of which carry no informational content. The signal-to-noise ratio is low enough that most market participants rationally ignore the feed entirely. Second, extracting the signal requires cross-referencing transaction metadata (role, history, prior purchases in the same security) that is not available in any single filing. Third, academic research suggests the market underreacts to genuine insider purchase signals in particular, especially among smaller-capitalization companies with lower institutional coverage.
VeritySignals was designed to address precisely this gap: to ingest the complete Form 4 stream, eliminate the noise, and surface the transactions most likely to represent genuine informational advantages held by company insiders.
This whitepaper documents the platform's complete analytical framework, covering data ingestion through signal classification, with sufficient detail to evaluate its methodology while protecting proprietary implementation specifics.
2. Background Research on Insider Trading
2.1 The Academic Foundation
The academic literature on legal insider trading is well-established and largely consistent in its conclusions. Corporate insiders do, on average, earn excess returns on their open-market stock purchases. The seminal studies include:
- Jaffe (1974): One of the first systematic studies demonstrating insider trading profitability, finding positive abnormal returns following intensive insider buying.
- Seyhun (1986): Established that the market does not fully and immediately incorporate insider trading disclosures, leaving a window for informed investors to act.
- Lakonishok & Lee (2001): Demonstrated that insider purchases predict future stock returns significantly better than insider sales, and that smaller companies show stronger effects.
- Cohen, Malloy & Pomorski (2012): Introduced the critical distinction between "routine" and "opportunistic" insider trades. Routine trades (part of a recurring annual pattern) contain no incremental information; opportunistic trades (deviating from the insider's own historical pattern) generate strong abnormal returns.
- Jeng, Metrick & Zeckhauser (2003): Estimated insider purchase portfolios earn approximately 6% per year in abnormal returns, with the effect concentrated in small-cap stocks.
2.2 What the Research Tells Us About Signal Quality
The literature converges on several actionable insights that inform VeritySignals' filtering approach:
- Purchases matter far more than sales. Insiders sell for many reasons unrelated to their view on the stock (diversification, tax planning, estate planning, personal liquidity). Purchases, by contrast, have fewer innocent explanations. An executive spending personal capital on their own company's stock is a meaningful signal.
- Role matters. C-suite executives (CEO, CFO, President) possess broader informational advantages than outside directors or 10% beneficial owners. Purchases by high-seniority insiders carry greater predictive weight.
- Cluster activity amplifies the signal. When multiple independent insiders buy the same stock within a short window, the probability of coincidental timing decreases and the informational content increases. Several insiders independently concluding that their company is undervalued is more informative than a single insider acting.
- Trade size relative to wealth matters. A $50,000 purchase by an executive holding $2M in company stock is a more meaningful commitment than the same purchase by an executive holding $50M.
- Historical track records are predictive. Insiders who have historically been correct, measured by actual price returns following their prior purchases, are more likely to be correct again. This is the most powerful single predictor in VeritySignals' scoring system.
2.3 The Noise Problem
Despite the academic evidence, most retail investors find insider data difficult to use in practice. The primary obstacle is noise. Of the approximately 20,000–30,000 Form 4 filings per month, the majority fall into categories with no informational value:
- Option grants and exercises (compensatory, not discretionary)
- Restricted stock vesting and release events
- 10b5-1 plan transactions (pre-scheduled, non-discretionary sales)
- Derivative instrument transactions
- Obligatory disclosure of ownership changes due to corporate actions (mergers, splits)
VeritySignals addresses this by filtering to open-market purchases and sales only (transaction codes P and S in Form 4 nomenclature), and applying additional logic to identify and exclude 10b5-1 plan disclosures where indicated.
3. Data Sources and Collection
3.1 SEC EDGAR Form 4 Filings
The primary data source for VeritySignals is the U.S. Securities and Exchange Commission's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. Under Section 16(a) of the Securities Exchange Act of 1934, corporate insiders, defined as officers, directors, and beneficial owners of more than 10% of a class of registered equity security, are required to file Form 4 reports within two business days of each transaction.
EDGAR provides two access mechanisms that VeritySignals uses in combination: a real-time RSS feed for new filings and a full-text search API for historical data access. The platform's data ingestion component continuously monitors the RSS feed, downloading new Form 4 filings as they are published and queuing them for parsing.
3.2 Data Ingestion Pipeline
Each Form 4 filing is an XML document containing structured transaction data. The ingestion pipeline processes these documents through the following stages:
- Deduplication: SHA-256 content hashing prevents duplicate processing of the same filing, which can occur when filings are amended or re-submitted.
- XML Parsing: The parser extracts all structured fields from the XML schema including transaction type, date, shares, price, post-transaction ownership, and filer metadata.
- Role Extraction: Filing metadata includes the reporting owner's title and boolean flags indicating whether they are an officer, director, or 10% owner. These fields are preserved for downstream scoring.
- Storage: Parsed data is stored in a dedicated scraper database, isolated from the application database to prevent write contention.
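The deduplication and parsing stages can be illustrated with a minimal sketch. It is not the platform's implementation: the sample document is a heavily simplified stand-in whose element names are modeled on the SEC ownership XML format and should be treated as assumptions, the `seen_hashes` set is a hypothetical in-memory substitute for a persistent hash store, and `is_new_filing` and `parse_form4` are illustrative names.

```python
import hashlib
import xml.etree.ElementTree as ET

# Simplified stand-in for a Form 4 XML body (illustrative values only).
SAMPLE_FORM4 = """<ownershipDocument>
  <reportingOwner>
    <reportingOwnerRelationship>
      <isDirector>0</isDirector>
      <isOfficer>1</isOfficer>
      <isTenPercentOwner>0</isTenPercentOwner>
      <officerTitle>Chief Financial Officer</officerTitle>
    </reportingOwnerRelationship>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionDate><value>2024-03-01</value></transactionDate>
      <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>1000</value></transactionShares>
        <transactionPricePerShare><value>12.50</value></transactionPricePerShare>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""

# Hypothetical in-memory hash store; a real pipeline would persist this.
seen_hashes: set[str] = set()


def is_new_filing(raw_xml: str) -> bool:
    """Return True only the first time an identical filing body is seen."""
    digest = hashlib.sha256(raw_xml.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True


def parse_form4(raw_xml: str) -> list[dict]:
    """Extract non-derivative transactions plus the filer's role flags."""
    root = ET.fromstring(raw_xml)
    rel = root.find("reportingOwner/reportingOwnerRelationship")

    def flag(tag: str) -> bool:
        # Role flags are assumed to be encoded as "1"/"0" or "true"/"false".
        return rel.findtext(tag, default="0").strip().lower() in ("1", "true")

    rows = []
    for tx in root.findall("nonDerivativeTable/nonDerivativeTransaction"):
        rows.append({
            "date": tx.findtext("transactionDate/value"),
            "code": tx.findtext("transactionCoding/transactionCode"),
            "shares": float(tx.findtext("transactionAmounts/transactionShares/value")),
            "price": float(tx.findtext("transactionAmounts/transactionPricePerShare/value")),
            "title": rel.findtext("officerTitle", default=""),
            "is_officer": flag("isOfficer"),
            "is_director": flag("isDirector"),
            "is_ten_percent_owner": flag("isTenPercentOwner"),
        })
    return rows
```

Hashing the raw body means a byte-identical re-download is skipped, while any filing whose content differs is treated as new and parsed.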
As of this writing, the platform has processed over four million individual insider trades spanning multiple years of historical EDGAR data.
3.3 Market Data
Signal scoring and performance measurement require accurate historical price data. VeritySignals maintains a local cache of end-of-day stock prices covering over 3,400 tickers from 1990 to present, comprising more than eleven million price records. This local cache eliminates repeated external API calls during signal generation and backtesting and ensures consistent pricing across all calculations.
Fundamental financial data, including market capitalization, P/E ratio, price-to-book ratio, and free cash flow yield, is sourced from a third-party financial data provider and refreshed periodically. Earnings calendar data (historical EPS estimates vs. actuals) is used to compute earnings beat rates for qualifying companies.
4. Signal Generation Framework
4.1 Transaction Filtering
The first stage of signal generation is aggressive filtering. The raw Form 4 stream is reduced to open-market transactions only, defined as:
- Transaction code P (open-market purchase) or S (open-market sale)
- Non-derivative securities only (common stock, not options or warrants)
- Excluding transactions flagged as part of a 10b5-1 pre-arranged trading plan, where disclosure is available
After filtering, each retained transaction represents a discretionary decision by a corporate insider to buy or sell their own company's stock at prevailing market prices.
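The three filter conditions above can be expressed as a single predicate. The sketch below is illustrative only: the `Transaction` record and its field names are hypothetical, and the 10b5-1 flag is modeled as optional because the paper notes the disclosure is not always available.

```python
from dataclasses import dataclass
from typing import Optional

OPEN_MARKET_CODES = {"P", "S"}  # open-market purchase / open-market sale


@dataclass(frozen=True)
class Transaction:
    code: str                   # Form 4 transaction code
    is_derivative: bool         # True for options, warrants, and similar
    is_10b5_1: Optional[bool]   # None when the filing does not say


def is_open_market(tx: Transaction) -> bool:
    """Keep discretionary, non-derivative P/S trades not flagged as plan trades."""
    return (
        tx.code in OPEN_MARKET_CODES
        and not tx.is_derivative
        and tx.is_10b5_1 is not True  # undisclosed (None) is kept, by assumption
    )


raw = [
    Transaction("P", False, None),   # discretionary purchase: kept
    Transaction("S", False, True),   # pre-scheduled plan sale: dropped
    Transaction("A", False, None),   # grant or award: dropped
    Transaction("M", True, None),    # derivative exercise: dropped
]
kept = [tx for tx in raw if is_open_market(tx)]
```

Treating an undisclosed plan status as "keep" is one possible policy; a stricter system could instead drop sales whose plan status is unknown.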
4.2 Cluster Construction
A key design principle of VeritySignals is that signals are generated at the cluster level rather than the individual trade level. A cluster is defined as the set of all qualifying transactions in the same security and direction (purchase or sale) made by any insider within a rolling 30-day window.
This approach reflects the academic finding that coordinated insider activity is more informative than isolated trades. A cluster aggregates the following attributes:
- All individual transactions contributing to the cluster
- Total dollar value (sum of shares × price per share across all trades)
- Count of distinct insiders participating
- The highest-seniority insider in the cluster (designated the "primary" insider)
- For the primary insider: trade size relative to existing holding, ownership structure (direct vs. indirect), and historical performance record
The rolling 30-day window (rather than calendar-month bucketing) ensures that trades spanning month boundaries (for example, trades on December 30th and January 2nd) are correctly recognized as part of the same cluster.
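A minimal sketch of cluster construction follows. The paper does not specify how the rolling window is anchored, so the sketch assumes a cluster stays open for 30 days from its first trade; the `Trade` and `Cluster` types and the `build_clusters` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass(frozen=True)
class Trade:
    ticker: str
    direction: str   # "P" or "S"
    insider: str
    trade_date: date
    shares: float
    price: float


@dataclass
class Cluster:
    ticker: str
    direction: str
    trades: list = field(default_factory=list)

    @property
    def total_value(self) -> float:
        return sum(t.shares * t.price for t in self.trades)

    @property
    def insider_count(self) -> int:
        return len({t.insider for t in self.trades})


def build_clusters(trades, window_days: int = 30):
    """Group trades by (ticker, direction); start a new cluster once a trade
    falls more than window_days after the current cluster's first trade."""
    clusters = []
    open_cluster = {}  # (ticker, direction) -> most recent Cluster
    for t in sorted(trades, key=lambda t: t.trade_date):
        key = (t.ticker, t.direction)
        current = open_cluster.get(key)
        if current is None or (
            t.trade_date - current.trades[0].trade_date > timedelta(days=window_days)
        ):
            current = Cluster(t.ticker, t.direction)
            clusters.append(current)
            open_cluster[key] = current
        current.trades.append(t)
    return clusters


trades = [
    Trade("ACME", "P", "ceo", date(2023, 12, 30), 1000, 10.0),
    Trade("ACME", "P", "cfo", date(2024, 1, 2), 500, 10.0),
    Trade("ACME", "P", "ceo", date(2024, 3, 15), 100, 12.0),
]
clusters = build_clusters(trades)
```

In this example the December 30th and January 2nd trades land in one cluster despite the month boundary, while the March trade opens a second cluster.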
4.3 Enrichment
Before scoring, each cluster is enriched with contextual data assembled from multiple sources:
- Insider performance history: The platform computes, for each insider, their historical win rate (fraction of prior purchases followed by positive 3-month returns) and alpha (excess return over the S&P 500 index over the same period).
- 52-week price context: The stock's proximity to its 52-week low at the time of the trade, as a proxy for relative valuation.
- Fundamental data: Where available, P/E ratio, P/B ratio, free cash flow yield, and sector classification.
- Earnings history: The company's historical EPS beat rate and the proximity of the trade to the next earnings announcement.
- Prior purchase history: Whether the primary insider has purchased this security in the prior 12 months (to flag new vs. repeat positions).
- Consecutive purchase pattern: Whether the insider has made repeated purchases in the same security over the recent period.
All enrichment queries use an as-of date parameter to prevent look-ahead bias in historical analysis. When generating signals for a given date, only information that would have been available on that date is used in scoring.
5. Signal Classification Methodology
5.1 Multi-Factor Scoring
Each enriched cluster is scored using a registry of independent scoring factors. Each factor evaluates a specific attribute of the cluster and returns an incremental score contribution or null (no contribution). The final signal score is the sum of all contributing factors.
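The registry pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's scoring code: the two factors, their weights, and the cluster field names (`insider_count`, `primary_role`) are all hypothetical.

```python
from typing import Callable, Optional

Factor = Callable[[dict], Optional[float]]
FACTORS: dict[str, Factor] = {}  # name -> scoring function


def factor(name: str):
    """Decorator that registers a scoring function under a name."""
    def register(fn: Factor) -> Factor:
        FACTORS[name] = fn
        return fn
    return register


@factor("cluster_activity")
def cluster_activity(cluster: dict) -> Optional[float]:
    n = cluster.get("insider_count", 1)
    if n >= 3:
        return 2.0   # hypothetical weight
    if n == 2:
        return 1.0   # hypothetical weight
    return None      # single insider: no contribution


@factor("seniority")
def seniority(cluster: dict) -> Optional[float]:
    # Hypothetical weights; roles missing from the map contribute nothing.
    weights = {"CEO": 3.0, "CFO": 2.5, "Director": 1.0}
    return weights.get(cluster.get("primary_role"))


def score(cluster: dict):
    """Sum every non-null contribution and keep the per-factor breakdown."""
    contributions = {}
    for name, fn in FACTORS.items():
        value = fn(cluster)
        if value is not None:
            contributions[name] = value
    return sum(contributions.values()), contributions


total, breakdown = score({"insider_count": 3, "primary_role": "CEO"})
```

Because each factor is an independent function, it can be unit-tested in isolation and added or removed by changing only the registry.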
The scoring architecture is designed to be modular and independently testable. Factors can be added, removed, or reweighted without modifying the core pipeline. The current framework applies more than fifteen factors, organized into the following categories:
| Category | Factors | Signal Type |
|---|---|---|
| Transaction Scale | Dollar value of the cluster; % of insider's existing holdings traded | Both |
| Cluster Activity | Number of distinct insiders participating in the cluster | Both |
| Insider Seniority | Primary insider role (CEO, CFO, President, COO, Officer, Director) | Both |
| Executive Concentration | Count of C-suite executives (CEO, CFO, President, COO) in the same cluster | Both |
| Track Record | Primary insider's historical win rate and alpha on prior purchases | Purchase |
| Price Context | Stock proximity to 52-week low at time of filing | Purchase |
| Ownership Structure | Direct account vs. indirect (trust, LLC, family account) | Both |
| Filing Timeliness | Days between transaction and SEC filing | Both |
| Company Size | Market capitalization tier (micro, small, mid, large, mega) | Both |
| Fundamental Value | P/E ratio, P/B ratio, free cash flow yield relative to thresholds | Purchase |
| Behavioral Pattern | Consecutive purchases; first position vs. repeat buyer | Purchase |
| Earnings Context | Proximity to earnings announcement; historical EPS beat rate | Both |
5.2 Strength Tier Classification
The aggregate score is mapped to one of four discrete strength tiers. The tier boundaries are calibrated to produce a distribution in which VERY_STRONG signals represent genuine outliers: transactions where multiple independent factors align simultaneously to indicate exceptional insider conviction.
| Tier | Description | Typical Characteristics |
|---|---|---|
| VERY_STRONG | Top-tier signal with multiple factors aligned | High dollar value, C-suite buyer, clustering, near 52w low, strong track record |
| STRONG | Above-average signal with several factors present | Significant value, senior insider, or moderate clustering |
| MODERATE | Some positive indicators, below high-conviction threshold | Meaningful trade, standard role, limited corroboration |
| WEAK | Minimal score contribution | Small trade, low-seniority insider, no corroborating factors |
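
Mapping an aggregate score to a tier reduces to a threshold lookup. The paper does not publish its tier boundaries, so the cutoffs in this sketch are placeholders chosen purely for illustration.

```python
# Hypothetical score floors, checked from highest to lowest.
TIER_FLOORS = [
    (8.0, "VERY_STRONG"),
    (5.0, "STRONG"),
    (2.5, "MODERATE"),
]


def classify(score: float) -> str:
    """Return the first tier whose floor the score meets, else WEAK."""
    for floor, tier in TIER_FLOORS:
        if score >= floor:
            return tier
    return "WEAK"
```

Keeping the boundaries in a single table makes recalibration a data change rather than a code change.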
5.3 Track Record Scoring in Detail
Internal backtesting consistently identifies insider track record as the single most predictive factor. The platform maintains a running computation of each insider's historical win rate, defined as the fraction of their open-market purchases followed by positive stock returns over a three-month horizon, and their alpha, the excess return over the S&P 500 index over the same period.
Insiders who demonstrate a sustained pattern of predictively accurate purchases (subject to a minimum number of prior trades to ensure statistical validity) receive a substantial incremental score boost. Internal observations suggest that filtering signals to only those from insiders with an elite track record (meeting both a minimum win rate threshold and a positive alpha criterion) increases backtest win rates from a baseline of approximately 58% to over 80% at the three-month horizon.
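The win rate and alpha definitions can be made concrete with a short sketch. The minimum trade count and the win rate threshold below are hypothetical values, since the paper does not disclose them, and `PriorPurchase`, `track_record`, and `is_elite` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriorPurchase:
    stock_return_3m: float   # stock return over the 3 months after the purchase
    index_return_3m: float   # S&P 500 return over the same 3 months


@dataclass(frozen=True)
class TrackRecord:
    trades: int
    win_rate: float
    avg_alpha: float


def track_record(purchases, min_trades: int = 5) -> Optional[TrackRecord]:
    """Return None when there are too few prior trades to judge the insider."""
    if len(purchases) < min_trades:
        return None
    wins = sum(1 for p in purchases if p.stock_return_3m > 0)
    alpha = sum(p.stock_return_3m - p.index_return_3m for p in purchases)
    return TrackRecord(len(purchases), wins / len(purchases), alpha / len(purchases))


def is_elite(record: Optional[TrackRecord], min_win_rate: float = 0.7) -> bool:
    """Elite requires both a minimum win rate and positive average alpha."""
    return (
        record is not None
        and record.win_rate >= min_win_rate
        and record.avg_alpha > 0
    )


history = [
    PriorPurchase(0.10, 0.02),
    PriorPurchase(0.05, 0.02),
    PriorPurchase(-0.02, 0.02),
    PriorPurchase(0.08, 0.02),
    PriorPurchase(0.04, 0.02),
]
record = track_record(history)
```

In this example four of five purchases were followed by a positive return, giving a win rate of 0.8 and an average alpha of roughly three percentage points.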
5.4 Look-Ahead Bias Controls
Financial signal systems are prone to look-ahead bias: the inadvertent use of future information when evaluating historical signals. VeritySignals mitigates this through a strict as-of date parameter that propagates through all enrichment queries. When generating or evaluating a signal for date D:
- Insider performance statistics are computed using only trades filed on or before date D
- 52-week low calculations use only prices available on or before date D
- Fundamental data uses the most recent available snapshot before date D
- No forward-looking information (post-D returns, post-D filings) is accessible to any scoring factor
This architecture enables robust historical backtesting without the look-ahead contamination that invalidates many quantitative signal systems.
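As one illustration of the as-of principle, the sketch below computes proximity to the 52-week low using only prices dated on or before the as-of date. The function name, the dictionary price representation, and the 365-calendar-day window are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Optional


def proximity_to_52w_low(prices: dict, as_of: date) -> Optional[float]:
    """Fraction by which the latest visible close sits above the 52-week low.

    prices maps date -> closing price. Anything dated after as_of is ignored,
    so a later, lower price cannot leak into a historical signal.
    """
    window_start = as_of - timedelta(days=365)
    visible = {d: p for d, p in prices.items() if window_start <= d <= as_of}
    if not visible:
        return None
    latest_close = visible[max(visible)]
    low = min(visible.values())
    return latest_close / low - 1.0


prices = {
    date(2024, 1, 2): 20.0,
    date(2024, 3, 1): 10.0,
    date(2024, 6, 3): 12.0,
    date(2024, 9, 3): 6.0,   # after the as-of date below, so never seen
}
result = proximity_to_52w_low(prices, as_of=date(2024, 6, 30))
```

Evaluated as of June 30th, the stock is about 20% above its low of 10.0; the September price of 6.0 plays no part in the calculation.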
6. System Architecture
6.1 Component Overview
VeritySignals is composed of two independently deployed systems that communicate through a shared database boundary:
- Data Acquisition Layer (Python): Continuously fetches, parses, and stores Form 4 filings from SEC EDGAR into an isolated scraper database. This component runs independently of the application layer and is designed for reliability under intermittent network conditions.
- Application Layer (Node.js / TypeScript): Consumes raw trade data from the scraper database, runs the signal generation pipeline, serves the web application, and manages user accounts and subscriptions.
The dual-database architecture provides write isolation: the high-volume scraper writes do not contend with application reads, and this allows each component to be scaled, deployed, or modified independently.
6.2 Signal Generation Pipeline
The signal generation pipeline runs on a scheduled cadence and produces signals through the following stages:
- Trade Query: Raw trades from the scraper database are retrieved for the target date window.
- Same-Day Aggregation: Multiple trades by the same insider on the same day are combined into a single record to reduce noise.
- Cluster Construction: Trades are grouped into clusters by security and direction using the rolling window algorithm described in Section 4.
- Parallel Enrichment: Cluster enrichment (performance history, price data, fundamentals, earnings, behavioral patterns) is performed in parallel batches to minimize latency.
- Scoring: Each cluster is evaluated against all registered scoring factors.
- Upsert: Scored signals are written to the application database using idempotent upsert operations, ensuring that re-running the pipeline for the same date window produces consistent results.
- Notification: Newly generated high-conviction signals are checked against user watchlists, triggering in-app and external notifications as appropriate.
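The same-day aggregation stage can be sketched as a group-and-sum. The dictionary field names are hypothetical, and grouping by ticker and direction in addition to insider and date is an assumption; the merged price is the share-weighted average.

```python
from collections import defaultdict


def aggregate_same_day(trades):
    """Merge trades sharing insider, ticker, direction, and date into one record."""
    buckets = defaultdict(list)
    for t in trades:
        key = (t["insider"], t["ticker"], t["direction"], t["date"])
        buckets[key].append(t)

    merged = []
    for (insider, ticker, direction, day), group in buckets.items():
        shares = sum(t["shares"] for t in group)
        value = sum(t["shares"] * t["price"] for t in group)
        merged.append({
            "insider": insider,
            "ticker": ticker,
            "direction": direction,
            "date": day,
            "shares": shares,
            "value": value,
            # Share-weighted average price; guard against zero-share records.
            "price": value / shares if shares else 0.0,
        })
    return merged


fills = [
    {"insider": "ceo", "ticker": "ACME", "direction": "P",
     "date": "2024-03-01", "shares": 100, "price": 10.0},
    {"insider": "ceo", "ticker": "ACME", "direction": "P",
     "date": "2024-03-01", "shares": 300, "price": 12.0},
    {"insider": "cfo", "ticker": "ACME", "direction": "P",
     "date": "2024-03-01", "shares": 50, "price": 11.0},
]
daily = aggregate_same_day(fills)
```

Here the two purchases by the first insider collapse into a single 400-share record, while the second insider's trade remains separate.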
6.3 Performance Tracking
Signal performance is computed asynchronously against actual market prices. For each signal, the system calculates returns at one-day, one-week, one-month, three-month, six-month, and one-year horizons from the signal date, and computes the S&P 500 return over the identical period; the difference between the two is reported as alpha. These figures are updated as price data becomes available and are displayed on signal and actor detail pages.
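A minimal sketch of the horizon calculation follows. The calendar-day lengths assigned to each horizon and the rule of using the first close on or after the target date are assumptions, as are the function names; horizons whose end date has no price yet are left as None.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Calendar-day approximations of each horizon (assumed for illustration).
HORIZONS = {"1d": 1, "1w": 7, "1m": 30, "3m": 91, "6m": 182, "1y": 365}


def price_on_or_after(series, target: date):
    """series is a date-sorted list of (date, close); None if no such price."""
    dates = [d for d, _ in series]
    i = bisect_left(dates, target)
    return series[i][1] if i < len(series) else None


def horizon_alpha(stock, index, signal_date: date) -> dict:
    """Stock return, benchmark return, and alpha for every horizon."""
    s0 = price_on_or_after(stock, signal_date)
    i0 = price_on_or_after(index, signal_date)
    out = {}
    for label, days in HORIZONS.items():
        target = signal_date + timedelta(days=days)
        s1 = price_on_or_after(stock, target)
        i1 = price_on_or_after(index, target)
        if any(v is None for v in (s0, i0, s1, i1)):
            out[label] = None  # price data not yet available
            continue
        stock_ret = s1 / s0 - 1.0
        index_ret = i1 / i0 - 1.0
        out[label] = {
            "return": stock_ret,
            "benchmark": index_ret,
            "alpha": stock_ret - index_ret,
        }
    return out


stock = [(date(2024, 1, 2), 100.0), (date(2024, 1, 3), 101.0), (date(2024, 4, 2), 120.0)]
index = [(date(2024, 1, 2), 400.0), (date(2024, 1, 3), 400.0), (date(2024, 4, 2), 420.0)]
perf = horizon_alpha(stock, index, date(2024, 1, 2))
```

With daily price series the on-or-after lookup simply skips weekends and holidays; the sparse series here is only for demonstration.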
Individual insider performance statistics, including win rate, average return, and average alpha, are aggregated from historical signal performance into a materialized database view that is refreshed periodically. This view is consumed by the track record scorer during signal generation, creating a closed feedback loop between observed historical performance and future signal scoring.
6.4 Backtest Engine
The platform includes a strategy backtesting engine that allows users to define custom insider filtering criteria and simulate portfolio performance over historical periods. The backtesting system supports:
- Custom entry criteria (signal strength, insider roles, cluster size, dollar thresholds, ownership percentages)
- Configurable exit strategies (fixed hold period, stop-loss, take-profit, tiered partial exit)
- Portfolio simulation (equal-weight, fixed-dollar, or percentage-based position sizing)
- Transaction cost modeling (commission and slippage baked into the simulation)
- Liquidity constraints (maximum position size as a percentage of average daily volume)
The backtest engine uses the same look-ahead-bias-free data as the live signal engine: historical signals are only scored using information that was available at the time they were generated. Performance metrics include total return, CAGR, Sharpe ratio, Sortino ratio, Calmar ratio, maximum drawdown, win rate, and average hold period.
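Three of the listed metrics can be computed from an equity curve as sketched below. The conventions used (252 periods per year, sample standard deviation, a zero risk-free rate by default) are common defaults assumed for illustration and may differ from the platform's own.

```python
import math


def cagr(equity, periods_per_year: int = 252) -> float:
    """Compound annual growth rate of an equity curve."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity) -> float:
    """Largest peak-to-trough decline, as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def sharpe(equity, periods_per_year: int = 252, risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio from per-period returns."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    excess = [r - risk_free / periods_per_year for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


equity = [100.0, 110.0, 99.0, 120.0]
drawdown = max_drawdown(equity)             # decline from 110 to 99
growth = cagr(equity, periods_per_year=3)   # three periods treated as one year
ratio = sharpe(equity)
```

The Sortino and Calmar ratios follow the same pattern, substituting downside deviation and maximum drawdown respectively for the volatility term.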
7. Historical Signal Observations
The following observations are drawn from internal analysis of signals generated by the platform over historical data. They are presented for informational purposes. Past performance is not indicative of future results. All figures are based on internal backtesting and have not been independently audited.
7.1 Signal Volume and Distribution
Of the total Form 4 filings processed, fewer than 5% survive the initial filtering pass to become scored signals. This figure is consistent with academic estimates of the fraction of insider transactions that represent genuine discretionary activity. The surviving signals are heavily skewed across strength tiers: WEAK signals make up the large majority, and VERY_STRONG signals comprise a small fraction of the total.
7.2 Win Rate by Signal Strength
Purchase signals classified as VERY_STRONG demonstrate substantially higher three-month win rates than weaker signals in historical data. The gap between top-tier and bottom-tier signal win rates is statistically meaningful and persistent across different time windows and market regimes. Sale signals, while included in the platform, show lower predictive consistency, consistent with the academic literature on insider sales.
7.3 Impact of Track Record Filter
The single most impactful filtering decision in the scoring system is the requirement for an elite track record. Restricting signals to those originating from insiders who have historically demonstrated strong predictive accuracy (as measured by both win rate and alpha thresholds, computed as of the signal date) increases backtested win rates dramatically compared to unfiltered baselines. This observation is broadly in line with the Cohen, Malloy & Pomorski (2012) finding that the informational value of insider trades varies sharply by insider behavior, with opportunistic trades (those deviating from the insider's historical pattern) carrying outsized informational value.
7.4 Cluster Amplification
Signals involving multiple insiders purchasing within the same 30-day window outperform single-insider signals on both win rate and average return metrics in historical analysis. The effect is roughly monotonic with cluster size: larger clusters (three or more insiders) show stronger performance than two-insider clusters. This is consistent with the interpretation that coordinated buying reflects independent informational assessments converging on the same conclusion.
7.5 Market Capitalization Effects
Consistent with the academic literature (Lakonishok & Lee 2001; Jeng, Metrick & Zeckhauser 2003), insider signals in smaller-capitalization companies show larger average returns in historical data, albeit with higher volatility. Signals in large-capitalization companies show more modest average returns but lower dispersion. The platform's scoring system incorporates a market cap adjustment factor to account for this size effect.
8. Limitations and Risk Considerations
8.1 This Is Not Financial Advice
VeritySignals provides analytical tools and data-driven signals for informational purposes only. Nothing on the platform constitutes investment advice, a recommendation to buy or sell any security, or a solicitation of any investment. All investment decisions involve risk, including the potential loss of principal.
8.2 Signal Decay and Market Adaptation
Financial signals that become widely known tend to weaken over time as market participants arbitrage away the excess return. If insider trading signals are widely acted upon by a large number of investors simultaneously, the informational edge may compress. The platform's signals are based on publicly available data and are, in principle, replicable by any market participant with sufficient data infrastructure.
8.3 Data Quality and Completeness
SEC Form 4 data, while systematically collected, contains errors and inconsistencies. Insiders occasionally file amendments correcting prior disclosures. Titles and role designations are self-reported and inconsistently formatted across filers. The platform applies normalization logic to handle common variations, but edge cases and misclassifications occur.
Fundamental data coverage is currently incomplete. A majority of signals involve companies for which the platform has not yet loaded fundamental metrics, limiting the applicability of value-based scoring factors to a subset of signals.
8.4 Look-Ahead Bias Residual Risk
While the platform implements systematic look-ahead bias controls, residual bias risk exists in any system that uses historical data for model calibration. The scoring factor thresholds and tier boundaries were developed and refined using historical data, introducing a form of indirect look-ahead bias. Results observed in in-sample historical periods should be interpreted with appropriate skepticism.
8.5 Execution and Liquidity Risk
Backtested signals assume execution at next-day open prices with configurable slippage estimates. In practice, actual execution prices may differ from modeled prices, particularly for smaller-capitalization stocks with limited liquidity. Backtested results do not account for the market impact of position sizes large relative to average daily volume.
8.6 Regulatory and Legal Considerations
Trading on the basis of material non-public information is illegal. VeritySignals operates exclusively on publicly disclosed Form 4 filings. However, users bear sole responsibility for ensuring that their use of the platform and its signals complies with all applicable securities laws and regulations in their jurisdiction.
8.7 Black Swan and Regime Risk
Insider signals, like all quantitative signals, are calibrated on historical data from specific market regimes. Unusual market conditions, including financial crises, pandemics, regulatory changes, or structural breaks in market microstructure, may invalidate historical relationships. No signal system provides protection against extreme outlier events.
9. Conclusion
VeritySignals represents a systematic attempt to extract actionable investment signals from one of the most consistently documented sources of market alpha in the empirical literature: corporate insider open-market purchases. The platform's methodology is grounded in decades of academic research identifying the conditions under which insider trades are most likely to be informative.
The core insight is simple: most insider data is noise. The market is awash in Form 4 filings representing compensation events, pre-arranged sales, and routine disclosures with no informational content. The signal extraction problem is therefore primarily a filtering problem: removing the noise to surface the small fraction of transactions that represent genuine insider conviction.
VeritySignals addresses this through a multi-stage pipeline: aggressive transaction filtering, cluster construction to capture coordinated activity, multi-factor conviction scoring, and a feedback loop that weights historical track records. The result is a signal classification system calibrated to identify transactions where multiple independent indicators simultaneously suggest that a company's management believes the stock is undervalued.
Internal observations suggest the approach produces meaningfully differentiated signal quality across strength tiers, with top-tier signals demonstrating substantially higher historical win rates than the unfiltered baseline. The track record factor, in particular, shows strong predictive power consistent with academic findings on opportunistic insider trading.
The platform is a living system: scoring factors are continuously refined based on observed performance, new data sources are incorporated as they become available, and the backtesting infrastructure allows systematic evaluation of methodology changes before deployment. The goal is not a static model but an iterative analytical framework that improves with data.
Insider data has always been public. The problem was never access. It was quality. VeritySignals exists to solve the quality problem at scale.
References
- Cohen, L., Malloy, C., & Pomorski, L. (2012). Decoding Inside Information. The Journal of Finance, 67(3), 1009–1043.
- Jaffe, J.F. (1974). Special Information and Insider Trading. The Journal of Business, 47(3), 410–428.
- Jeng, L.A., Metrick, A., & Zeckhauser, R. (2003). Estimating the Returns to Insider Trading: A Performance-Evaluation Perspective. Review of Economics and Statistics, 85(2), 453–471.
- Lakonishok, J., & Lee, I. (2001). Are Insider Trades Informative? The Review of Financial Studies, 14(1), 79–111.
- Seyhun, H.N. (1986). Insiders' Profits, Costs of Trading, and Market Efficiency. Journal of Financial Economics, 16(2), 189–212.
- U.S. Securities and Exchange Commission. Form 4: Statement of Changes in Beneficial Ownership. EDGAR, sec.gov/cgi-bin/browse-edgar.
© 2026 VeritySignals · All rights reserved · Not financial advice · All data sourced from public SEC EDGAR filings · Past performance of insider signals does not guarantee future results.