Tuesday, December 23, 2025

A Rule-Based Multi-Indicator Trading Strategy Built for Machine Learning

This post presents a rule-based forex trading strategy using Stochastic Oscillator, RSI, MACD, and EMA-200.
On its own, the strategy delivers a win rate of around 50% (sometimes slightly less). That is intentional.

The real objective is not immediate profitability, but to collect large volumes of structured, labeled trading data that can be used to train machine learning (ML) and AI models capable of identifying higher-probability winning trades.


1. Strategy Philosophy: Rules as a Data Generator

Most trading systems are judged solely by win rate.
This one is judged by data quality.

The strategy is designed to:

  • Systematically trigger trades under clear, repeatable conditions

  • Capture both winning and losing outcomes

  • Cover as many market scenarios as possible

  • Produce balanced, unbiased datasets

A ~50% win rate is acceptable and even desirable at this stage, because:

  • It avoids skewed labels

  • It forces the ML model to learn real distinctions

  • It reduces overfitting risk

In ML-driven trading, coverage and consistency matter more than raw performance.


2. Why a 50% Strategy Is Valuable for ML

A rule-based strategy that wins half the time creates:

  • Clean decision boundaries

  • Equal exposure to success and failure

  • Honest representations of market behavior

This allows an ML/AI model to learn:

When does this setup work — and when should it be ignored?

With sufficient data and proper training, the model can learn to filter out low-quality trades and identify conditions with a higher probability of success.


3. Indicators Used

The strategy combines four core indicators, each modeling a different market dimension:

  • Stochastic Oscillator – entry timing

  • RSI – momentum bias

  • MACD – trend confirmation

  • EMA-200 – long-term trend filter

Each indicator is encoded as a discrete state, making the system deterministic, explainable, and ML-friendly.


4. Indicator Rules and State Encoding

A. Stochastic Oscillator (x)

SELL (x = 1):

  • %K > 80

  • %D > 80

  • %K crosses below %D

  • EMA-200 > Close Price

BUY (x = 2):

  • %K < 20

  • %D < 20

  • %K crosses above %D

  • EMA-200 < Close Price

Stochastic provides timing, not trend prediction.


B. RSI (y)

  • SELL bias (y = 1): RSI < 50

  • BUY bias (y = 2): RSI > 50

RSI defines momentum alignment.


C. MACD (z)

  • SELL (z = 1): MACD line crosses below Signal line

  • BUY (z = 2): MACD line crosses above Signal line

MACD confirms momentum transition.
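
The rules in this section can be sketched in Pandas roughly as follows. This is a minimal illustration, not the exact production code. It assumes the column names from the indicator post (RSI, MACD_Line, Signal_Line, Stoch_K, Stoch_D), uses 0 for "no signal" (a state this post does not define), and treats a cross as happening on the candle where the two lines swap sides.

```python
import pandas as pd

def encode_states(df: pd.DataFrame) -> pd.DataFrame:
    """Add discrete x, y, z columns (0 = no signal, 1 = sell, 2 = buy)."""
    out = df.copy()
    # EMA-200 of the close, used as the trend filter in the Stochastic rule.
    ema200 = out["close"].ewm(span=200, adjust=False).mean()

    # Stochastic (x): overbought/oversold zone + %K/%D cross + EMA-200 filter.
    k, d = out["Stoch_K"], out["Stoch_D"]
    cross_down = (k.shift(1) >= d.shift(1)) & (k < d)
    cross_up = (k.shift(1) <= d.shift(1)) & (k > d)
    out["x"] = 0
    out.loc[(k > 80) & (d > 80) & cross_down & (ema200 > out["close"]), "x"] = 1
    out.loc[(k < 20) & (d < 20) & cross_up & (ema200 < out["close"]), "x"] = 2

    # RSI (y): momentum bias around the 50 level.
    out["y"] = 0
    out.loc[out["RSI"] < 50, "y"] = 1
    out.loc[out["RSI"] > 50, "y"] = 2

    # MACD (z): MACD line crossing the Signal line.
    macd, sig = out["MACD_Line"], out["Signal_Line"]
    macd_down = (macd.shift(1) >= sig.shift(1)) & (macd < sig)
    macd_up = (macd.shift(1) <= sig.shift(1)) & (macd > sig)
    out["z"] = 0
    out.loc[macd_down, "z"] = 1
    out.loc[macd_up, "z"] = 2
    return out
```

Keeping the three states as plain integer columns means the same DataFrame can feed both the rule engine and an ML feature matrix.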


5. Entry Logic: Controlled and Repeatable

Trades are entered only when all indicators align.

BUY Entry

x = 2 AND y = 2 AND z = 2

SELL Entry

x = 1 AND y = 1 AND z = 1

This strict confluence ensures clear trade intent and produces clean training samples.
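
As a rough sketch, the confluence check could look like this. It assumes a DataFrame that already holds the x, y, and z state columns, and it requires all three states to agree on the same candle; whether the crosses may instead fall within a short window of candles is not specified here, so that part is an assumption.

```python
import pandas as pd

def entry_signal(states: pd.DataFrame) -> pd.Series:
    """Return 2 for a BUY entry, 1 for a SELL entry, 0 for no trade."""
    buy = (states["x"] == 2) & (states["y"] == 2) & (states["z"] == 2)
    sell = (states["x"] == 1) & (states["y"] == 1) & (states["z"] == 1)
    # Named "a" to match the trade-direction variable used in the exit logic.
    signal = pd.Series(0, index=states.index, name="a")
    signal.loc[buy] = 2
    signal.loc[sell] = 1
    return signal
```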


6. Exit Logic: Outcome Labeling Over Optimization

Exits are not optimized for maximum profit.
They are designed for consistent, unambiguous outcome labeling.

Definitions

  • b → number of 5-minute candles after entry

  • a → trade direction (1 = sell, 2 = buy)

  • c / e → current profit or loss

  • k, j → counters for consecutive losses

Exits are triggered by:

  • Time in trade

  • Profit/loss behavior

  • Drawdown and loss-streak protection

This ensures:

  • Trades close within predictable windows

  • Outcomes are well defined

  • Labels remain reliable for ML training
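
To make the labeling idea concrete, here is a deliberately simplified sketch that uses only a fixed time exit. The function name, the max_b parameter, and the "profit > 0 means win" rule are assumptions for illustration; the profit/loss and loss-streak exits listed above are not modeled.

```python
import pandas as pd

def label_trade(close: pd.Series, entry_idx: int, a: int, max_b: int = 12) -> dict:
    """Label one trade with a fixed time exit (simplified, hypothetical rule).

    a is the trade direction (1 = sell, 2 = buy); max_b is the maximum
    number of 5-minute candles the trade stays open.
    """
    exit_idx = min(entry_idx + max_b, len(close) - 1)
    entry_price = close.iloc[entry_idx]
    exit_price = close.iloc[exit_idx]
    # Profit is positive when price moved in the trade direction.
    profit = exit_price - entry_price if a == 2 else entry_price - exit_price
    return {"b": exit_idx - entry_idx, "profit": profit, "win": int(profit > 0)}
```

Because every trade closes after at most max_b candles, each entry gets exactly one label.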


7. The Role of ML / AI in This System

The rule-based layer:

  • Generates structure

  • Captures intent

  • Labels reality honestly

The ML/AI layer:

  • Learns patterns the rules cannot express

  • Identifies market contexts where the strategy performs better

  • Filters low-probability trades

  • Estimates win probability instead of blindly executing rules

When done properly — with enough data, correct labeling, and strict validation — an ML/AI model can learn to predict which trade setups are more likely to win, even if the underlying rule set itself has only a ~50% win rate.

The edge does not come from the rules alone.
It comes from the model’s ability to discriminate.
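
One part of "strict validation" that is easy to get wrong is the train/test split: shuffling time-series trades leaks future information into training. A minimal sketch of a chronological split, assuming a labeled trades DataFrame with a time column (the function name and the 80/20 default are illustrative choices):

```python
import pandas as pd

def chronological_split(trades: pd.DataFrame, train_frac: float = 0.8):
    """Split labeled trades into train/test sets by time, without shuffling."""
    ordered = trades.sort_values("time").reset_index(drop=True)
    cut = int(len(ordered) * train_frac)
    # Everything before the cut is training data; everything after is held out.
    return ordered.iloc[:cut], ordered.iloc[cut:]
```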


8. Final Thoughts

This strategy is intentionally imperfect.

Its purpose is to:

  • Be consistent, not clever

  • Capture every meaningful scenario

  • Produce massive, diverse datasets

  • Serve as a foundation for AI-driven decision making

A rule-based system that wins 50% of the time but records everything correctly is far more valuable than an over-optimized strategy that collapses outside backtests.

The rules collect the data.
The AI finds the edge.

Sunday, December 14, 2025

Generating RSI, MACD, and Stochastic Indicators in Python Using Pandas

I know I’ve covered parts of this before, but this time I decided to start fresh with a new project. What I’ve built so far is a data pipeline designed specifically for a machine learning–based trading system. The main reason for creating this pipeline is a limitation in MT5: it can only provide up to 80,000 rows of historical data, which is nowhere near enough for training a reliable machine learning model. For my use case, I need at least 2 million historical records to properly train and validate the model.

In this post, I’ll walk you through how I compute three essential technical indicators—RSI, MACD, and the Stochastic Oscillator—which form the foundation of my training dataset. In upcoming posts, I’ll dive into the trading strategy itself. I’ve already implemented and backtested the system, and the results are very promising. That said, I still plan to add more features and refinements so readers can learn from each step. Who knows—maybe following this blog might even help someone build a path toward becoming a billionaire.

Technical indicators are the backbone of most trading strategies. The code below computes all three using only Python and Pandas, starting from raw OHLC price data.

This approach is lightweight, transparent, and ideal for backtesting, signal generation, or machine learning feature engineering.


📊 Prerequisites

Your input data must contain the following columns:

  • time

  • open

  • high

  • low

  • close

The data is loaded from a CSV file (init_data.csv) and processed into a new dataset (training_data.csv) with the computed indicators.


🔹 Relative Strength Index (RSI)

RSI measures momentum by comparing recent gains and losses. It oscillates between 0 and 100 and is commonly used to identify overbought and oversold conditions.

Formula logic:

  • Compute price differences

  • Separate gains and losses

  • Calculate rolling averages

  • Convert to RSI scale

def calculate_rsi(data, period=14):
    delta = data['close'].diff()
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    avg_gain = gain.rolling(window=period).mean()
    avg_loss = loss.rolling(window=period).mean()
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
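
Note that the function above averages gains and losses with a simple rolling mean. Wilder's original RSI uses exponential smoothing with alpha = 1/period, which is what most charting platforms display, so values may differ slightly. If you want to compare against a chart, a variant along these lines may be closer (the function name is my own, and this is a sketch rather than a drop-in replacement):

```python
import pandas as pd

def calculate_rsi_wilder(data: pd.DataFrame, period: int = 14) -> pd.Series:
    """RSI using Wilder-style exponential smoothing (alpha = 1 / period)."""
    delta = data["close"].diff()
    gain = delta.clip(lower=0)        # positive moves, zero otherwise
    loss = (-delta).clip(lower=0)     # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))
```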

🔹 Moving Average Convergence Divergence (MACD)

MACD is a trend-following momentum indicator based on Exponential Moving Averages (EMAs).

  • MACD Line = EMA(12) − EMA(26)

  • Signal Line = EMA(9) of MACD

def calculate_macd(data, short_period=12, long_period=26, signal_period=9):
    short_ema = data['close'].ewm(span=short_period, adjust=False).mean()
    long_ema = data['close'].ewm(span=long_period, adjust=False).mean()
    macd_line = short_ema - long_ema
    signal_line = macd_line.ewm(span=signal_period, adjust=False).mean()
    return macd_line, signal_line

MACD crossovers are commonly used to detect trend reversals and momentum shifts.


🔹 Stochastic Oscillator

The Stochastic Oscillator compares the current close to the recent price range.

  • %K shows the current position within the range

  • %D is a moving average of %K

def calculate_stochastic(data, k_period=14, d_period=3):
    low_min = data['low'].rolling(window=k_period).min()
    high_max = data['high'].rolling(window=k_period).max()
    percent_k = ((data['close'] - low_min) / (high_max - low_min)) * 100
    percent_d = percent_k.rolling(window=d_period).mean()
    return percent_k, percent_d

Values above 80 typically indicate overbought conditions, while values below 20 suggest oversold levels.


🔄 Updating the Dataset with Indicators

All indicators are computed and appended to the dataset in a single function. Rolling calculations naturally produce NaN values, which are removed afterward.

def update_data(data):
    required_cols = ['time', 'open', 'high', 'low', 'close']
    for col in required_cols:
        if col not in data.columns:
            raise Exception(f"Missing required column: {col}")
    data['RSI'] = calculate_rsi(data)
    data['MACD_Line'], data['Signal_Line'] = calculate_macd(data)
    data['Stoch_K'], data['Stoch_D'] = calculate_stochastic(data)
    data.dropna(inplace=True)
    return data

🚀 Final Output

The main program:

  1. Loads init_data.csv

  2. Computes RSI, MACD, and Stochastic

  3. Saves the enriched dataset as training_data.csv
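
The main program itself is not listed in this post, so here is a minimal sketch of what those three steps might look like. The function name and the choice to pass update_data in as an argument are my own, made so the snippet stands alone; sorting by time before computing the indicators is also an assumption, since rolling windows only make sense on ordered data.

```python
import pandas as pd

def build_training_data(update_data, input_path="init_data.csv",
                        output_path="training_data.csv"):
    """Load raw OHLC data, add indicators via update_data, and save the result."""
    data = pd.read_csv(input_path, parse_dates=["time"])
    # Rolling and exponential windows assume rows are in chronological order.
    data = data.sort_values("time").reset_index(drop=True)
    enriched = update_data(data)
    enriched.to_csv(output_path, index=False)
    return enriched
```

Calling build_training_data(update_data) with the function from the previous section would then produce training_data.csv.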

This output can be used for:

  • Strategy backtesting

  • Signal detection

  • Machine learning model training

  • Trade analytics


🧠 Final Thoughts

Generating indicators manually gives you full control and transparency over your trading logic. It also helps avoid black-box dependencies and makes your system easier to debug and extend.

In future posts, I’ll build on this foundation by:

  • Combining indicators into trading signals

  • Adding backtesting logic

  • Preparing features for machine learning models

If you’re building your own trading system, this is a solid place to start.

Friday, December 12, 2025

The Myth of Guaranteed ML Profits in Forex Trading

 The idea is seductive: What if you could build a machine learning (ML) system that guarantees winning in forex trading? In theory, such a breakthrough would change not only personal wealth but the structure of global financial markets. In reality, however, markets are far more complex, adaptive, and unforgiving.

This article breaks down what would theoretically happen, why guarantees don’t exist, and what is actually achievable—and extremely valuable—when ML is used correctly in forex trading.





Theoretically: What Would Happen If a Guaranteed System Existed

If a forex ML system were truly, provably consistent, the consequences would be dramatic.

You Would Outperform Banks and Hedge Funds

Most large financial institutions already deploy advanced ML, AI, and quantitative models. A guaranteed system would outperform even these players, giving its owner an unprecedented edge over banks, hedge funds, and market makers.

Compounding Would Make You Extremely Wealthy

Even without spectacular win rates, compounding does the heavy lifting. A modest, consistent edge—say a 55–60% win rate with solid risk–reward (RR)—can grow capital exponentially over time.

Liquidity Becomes Your Enemy

Success creates its own problem. As position sizes grow, your trades start to move the market. Slippage increases, fills worsen, and the very edge that made you profitable begins to decay.

Brokers and Regulators Take Notice

Unusual consistency triggers attention. Accounts may be flagged, spreads may widen, execution quality may degrade, or regulatory scrutiny may increase. Markets do not reward anomalies for long.


Reality Check: Why “Guaranteed” Does Not Exist

Forex markets are structurally hostile to certainty.

The Nature of Forex Markets

  • Non-stationary – Patterns change over time

  • Reflexive – Traders influence the very markets they trade

  • Noise-dominated – Randomness overwhelms short-term signals

  • Exposed to black swans – Wars, central bank shocks, flash crashes

The Limits of Machine Learning

  • ML learns historical correlations, not future truths

  • Models overfit easily to past data

  • Performance collapses when market regimes change (rate cycles, crises)

Even the most advanced firms—Renaissance Technologies, Citadel, JPMorgan—do not possess guaranteed models. They operate on probabilistic edges, not certainty.


What Is Actually Achievable (And Extremely Valuable)

The real holy grail is not perfection—it is robust expectancy.

A small, repeatable statistical edge combined with strict risk control.

A Realistic Example

  • Win rate: 52–58%

  • Risk–Reward: 1:1.5 or higher

  • Risk per trade: 0.25–1%

  • Drawdown: Strictly capped

This approach alone already outperforms over 95% of retail traders.
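
To see why such modest numbers matter, it helps to work out the expectancy. With a 55% win rate and a 1:1.5 risk-reward, the average trade returns 0.55 × 1.5 − 0.45 = 0.375 times the amount risked, and the break-even win rate at that risk-reward is only 40%. The sketch below is a back-of-the-envelope calculation: it compounds the average result and ignores variance, spreads, slippage, and drawdowns, so treat the output as an optimistic illustration rather than a forecast. The starting equity and trade count are arbitrary.

```python
def expectancy_r(win_rate: float, reward_to_risk: float) -> float:
    """Average result per trade, in units of the amount risked (R)."""
    return win_rate * reward_to_risk - (1.0 - win_rate)

def compound(equity: float, risk_fraction: float, expectancy: float,
             n_trades: int) -> float:
    """Grow equity by the average per-trade return (ignores variance and costs)."""
    return equity * (1.0 + risk_fraction * expectancy) ** n_trades

e = expectancy_r(0.55, 1.5)               # about 0.375 R per trade
final = compound(10_000, 0.005, e, 200)   # 0.5% risk per trade, 200 trades
print(f"Expectancy: {e:.3f} R, equity after 200 trades: {final:,.0f}")
```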


ML’s Best Role in Forex (Where It Actually Works)

Machine learning excels as a support system, not a crystal ball.

1. Regime Detection

  • Trending vs. ranging markets

  • High vs. low volatility environments

  • News- and event-sensitive periods

2. Trade Filtering

  • Identifying when not to trade

  • Avoiding low-quality, low-probability setups

3. Position Sizing and Risk Control

  • Dynamic position sizing

  • Volatility-adjusted stop losses
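
As an illustration of these two ideas, position size can be derived from a fixed risk fraction and a stop distance that scales with volatility. This is a simplified sketch: it assumes the account currency equals the quote currency (for example a USD account trading EURUSD), takes an already-computed ATR value as input, and ignores spread and lot-size rounding. Both function names and the 2.0 multiplier are hypothetical.

```python
def volatility_stop(atr: float, multiplier: float = 2.0) -> float:
    """Stop distance in price units, scaled by recent volatility (ATR)."""
    return atr * multiplier

def position_size(equity: float, risk_fraction: float, stop_distance: float) -> float:
    """Units to trade so that hitting the stop loses about equity * risk_fraction."""
    if stop_distance <= 0:
        raise ValueError("stop_distance must be positive")
    return equity * risk_fraction / stop_distance

stop = volatility_stop(atr=0.0005)            # 0.0010, i.e. 10 pips on EURUSD
units = position_size(10_000, 0.005, stop)    # risk 0.5% of a 10,000 account
```

When volatility rises, the stop widens and the position shrinks, so the amount at risk per trade stays roughly constant.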

4. Ensemble Decision Systems

  • Combining ML with rules-based strategies

  • Using confidence scoring, not absolute buy/sell predictions


A Practical Path for Serious Traders

If you have a technical background, the correct path is disciplined and structured:

  1. Define a rule-based strategy first

  2. Use ML to:

    • Improve entries and exits

    • Filter poor trades

  3. Backtest rigorously across:

    • Multiple currency pairs

    • Multiple years

    • Multiple market regimes

  4. Forward test (paper trading → small capital → gradual scaling)

  5. Expect months of drawdowns, even with a valid edge

This is how professionals build systems that last.


The Brutal Truth

If someone genuinely possessed a guaranteed-winning forex ML system, they would:

  • Not sell it

  • Not advertise it

  • Not trade retail-sized accounts

  • Use it quietly, with strict capital limits

Sunday, December 7, 2025

📊 Building a Robust EURUSD Data Pipeline with Python and MetaTrader 5


Are you tired of manually downloading historical data for your Forex trading strategies? To build truly effective backtests and AI models, you need a clean, persistent, and automated data pipeline.

This post will guide you through creating a Python script that uses the MetaTrader 5 (MT5) terminal to automatically manage a data file for the EURUSD pair. Our pipeline will handle two critical tasks: an initial bulk download and seamless daily updates.


🛠️ Prerequisites

To follow this tutorial, you'll need:

  1. MetaTrader 5 Terminal: Installed and running (even in the background).

  2. Python: (3.8+ recommended).

  3. Required Libraries: Install them using pip:

    Bash
    pip install MetaTrader5 pandas
    

Step 1: Connecting to the Data Source

Our first step is establishing a secure, programmatic connection to your MT5 terminal using your demo account credentials. We'll use the initialize_mt5() function to handle this cleanly.

Note: We are using 5-minute bars (mt5.TIMEFRAME_M5) and targeting a MetaQuotes Demo server for this example.

Python
import MetaTrader5 as mt5
import pandas as pd
from datetime import datetime, timedelta
import os

# --- Configuration (Use your actual demo account details) ---
account = 99805772
password = "J!RbLq6h"
server = "MetaQuotes-Demo"

SYMBOL = "EURUSD"
TIMEFRAME = mt5.TIMEFRAME_M5      # 5-minute interval
INITIAL_DATA_COUNT = 80000       # 80,000 bars for the first run
OUTPUT_FILE = "init_data.csv"    # Persistent CSV file name

# ... [rest of the initialization and time functions] ...

Step 2: The Core Logic: Initial Fetch vs. Daily Update

The true power of this pipeline is its ability to switch modes. When the script runs, it first checks if the target file, init_data.csv, exists.

🚀 Mode A: Initial Data Load

If init_data.csv is not found, we assume this is the first run. We execute a bulk download of the last 80,000 bars (5-minute interval) using mt5.copy_rates_from_pos. This ensures you have a strong foundational dataset.

# If the file does NOT exist:
if not os.path.exists(OUTPUT_FILE):
    print("Performing initial fetch...")
    rates = mt5.copy_rates_from_pos(SYMBOL, TIMEFRAME, 0, INITIAL_DATA_COUNT)
    # ... process and save data using mode='w' (write, creates the file)
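
The "process and save" step is elided in the snippet above. One way it could look is sketched below; save_rates is a hypothetical helper name, and it relies only on the fact that the MT5 rate functions return a NumPy structured array with a time field in seconds since the epoch.

```python
import os
import numpy as np
import pandas as pd

def save_rates(rates: np.ndarray, path: str, append: bool = False) -> pd.DataFrame:
    """Convert an MT5-style structured array to a DataFrame and write it to CSV."""
    df = pd.DataFrame(rates)
    df["time"] = pd.to_datetime(df["time"], unit="s")
    # Only write the header when creating the file, not when appending to it.
    write_header = not (append and os.path.exists(path))
    df.to_csv(path, mode="a" if append else "w", header=write_header, index=False)
    return df
```

The same helper covers both modes: call it with append=False for the initial load and append=True for daily updates.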

🔄 Mode B: Seamless Daily Update

If init_data.csv is found, the script switches to update mode. Since you plan to run this after the market closes (or once per day), we only fetch data for the previous full trading day to avoid gaps and duplicates.

We use a helper function, get_last_trading_day_dates(), to determine the precise start and end times, and then use mt5.copy_rates_range() to pull the specific 24-hour block of data.

# If the file EXISTS:
else:
    print("Fetching data for the previous trading day...")
    start_date, end_date = get_last_trading_day_dates()
    rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_date, end_date)
    # ... process and save data using mode='a' (append, adds to existing file)
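
The helper get_last_trading_day_dates() is named above but not shown. A minimal sketch of what it might do follows. It assumes weekends are the only non-trading days and works in naive local time, whereas MT5 timestamps follow the broker's server time, so adjust accordingly. The optional now parameter is my addition to make the function testable.

```python
from datetime import datetime, timedelta

def get_last_trading_day_dates(now=None):
    """Return (start, end) datetimes covering the most recent full weekday."""
    now = now or datetime.now()
    day = now.date() - timedelta(days=1)
    # Step back over Saturday (5) and Sunday (6) to reach Friday.
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    start = datetime(day.year, day.month, day.day, 0, 0, 0)
    end = datetime(day.year, day.month, day.day, 23, 59, 59)
    return start, end
```

Run on a Monday, this returns the previous Friday, so the weekend gap does not produce an empty fetch.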

Step 3: The Complete Data Pipeline Script

Here is the complete, robust code. You can save this as a Python file (e.g., eurusd_pipeline.py) and set it up to run once daily via a cron job (Linux/macOS) or Task Scheduler (Windows).

You can find it at the end of this post. I made some changes along the way.

🚀 Conclusion

You have successfully built an automated, self-managing data pipeline for your Forex backtesting.

  • On the first run, it grabs 80,000 bars of history.

  • On subsequent runs, it intelligently pulls only the newest day's data and appends it to your file.

Your init_data.csv file now grows automatically, providing a clean, single source of truth for your algorithmic trading research.

Ready to start building your trading strategy on top of this reliable data source?

 

import MetaTrader5 as mt5
import pandas as pd
from datetime import datetime, timedelta
import os

# --- Configuration ---
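account = 12345678            # placeholder: replace with your MT5 account number
password = "J!RbLq6h"
server = "MetaQuotes-Demo"    # server name, as in the configuration shown earlier in this post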

SYMBOL = "EURUSD"
TIMEFRAME = mt5.TIMEFRAME_M5
DATA_COUNT = 80000
OUTPUT_FILE = "init_data.csv"
RUN_LOG = "run_date.csv"


# ----------------------------
# Function to get cutoff datetime
# ----------------------------
def get_cutoff_datetime():
    now = datetime.now()

    if now.weekday() == 0:   # Monday
        target_date = (now - timedelta(days=3)).replace(
            hour=23, minute=59, second=59, microsecond=0
        )
    else:
        target_date = (now - timedelta(days=1)).replace(
            hour=23, minute=59, second=59, microsecond=0
        )
    return target_date


# ----------------------------
# MT5 Initialization
# ----------------------------
if not mt5.initialize():
    print("MT5 Initialize failed:", mt5.last_error())
    quit()

authorized = mt5.login(account, password=password, server=server)
if not authorized:
    print("Failed to connect to account:", mt5.last_error())
    mt5.shutdown()
    quit()

print("MT5 connection successful.")


# ----------------------------
# Determine end date
# ----------------------------
end_dt = get_cutoff_datetime()
print("Computed end date:", end_dt)


# -----------------------------------------------------------------------------
# NEW LOGIC (keeps original program intact)
# -----------------------------------------------------------------------------

need_to_fetch = True
date_fetch = False

if os.path.exists(OUTPUT_FILE):
    print("init_data.csv exists; checking if end-date already recorded...")

    df_existing = pd.read_csv(OUTPUT_FILE)

    if "time" in df_existing.columns and not df_existing.empty:
        df_existing["time"] = pd.to_datetime(df_existing["time"], errors="coerce")
        df_existing = df_existing.dropna(subset=["time"])

        # --- Extract date only from end_dt ---
        end_date_only = end_dt.date()

        # --- Extract date only from existing data ---
        existing_dates = df_existing["time"].dt.date

        # --- Check if that date already exists ---
        if end_date_only in set(existing_dates):
            print(f"Date {end_date_only} already exists in init_data.csv")
            need_to_fetch = False
        else:
            print(f"Date {end_date_only} NOT FOUND. Will fetch ONLY this date.")
            need_to_fetch = False
            date_fetch = True
    else:
        print("init_data.csv has no valid 'time' column. Fetch normally.")
else:
    print("init_data.csv does NOT exist. Running full initialization.")

# -----------------------------------------------------------------------------
# Fetch historical data for the cutoff date only, if needed
# -----------------------------------------------------------------------------
if date_fetch:
    print("Fetching historical date:", end_dt)

    year  = end_dt.year
    month = end_dt.month
    day   = end_dt.day

    start_dt1 = datetime(year, month, day, 0, 0, 0)
    end_dt1   = datetime(year, month, day, 23, 59, 59)
    
    rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_dt1, end_dt1)
    if rates is None or len(rates) == 0:
        print("No data returned:", mt5.last_error())
        mt5.shutdown()
        quit()
    print(f"Fetched {len(rates)} bars.")

    df_new = pd.DataFrame(rates)
    df_new["time"] = pd.to_datetime(df_new["time"], unit="s")

    # If file exists → append new data
    if os.path.exists(OUTPUT_FILE):
        # Parse "time" so duplicates compare as timestamps, not strings
        df_existing = pd.read_csv(OUTPUT_FILE, parse_dates=["time"])
        df_existing = pd.concat([df_existing, df_new], ignore_index=True)
        df_existing.drop_duplicates(subset=["time"], inplace=True)
        df_existing.to_csv(OUTPUT_FILE, index=False)
        print("Appended new data to init_data.csv")
    else:
        df_new.to_csv(OUTPUT_FILE, index=False)
        print("Created init_data.csv")

    # Append run date to log file
    with open(RUN_LOG, "a") as f:
        f.write(str(datetime.now()) + "\n")

    print("Logged run date to run_date.csv")

        
# -----------------------------------------------------------------------------
# Fetch historical data, only if needed
# -----------------------------------------------------------------------------
if need_to_fetch:
    print("Fetching historical data until:", end_dt)

    rates = mt5.copy_rates_from(SYMBOL, TIMEFRAME, end_dt, DATA_COUNT)

    if rates is None or len(rates) == 0:
        print("No data returned:", mt5.last_error())
        mt5.shutdown()
        quit()

    print(f"Fetched {len(rates)} bars.")

    df_new = pd.DataFrame(rates)
    df_new["time"] = pd.to_datetime(df_new["time"], unit="s")

    # If file exists → append new data
    if os.path.exists(OUTPUT_FILE):
        # Parse "time" so duplicates compare as timestamps, not strings
        df_existing = pd.read_csv(OUTPUT_FILE, parse_dates=["time"])
        df_existing = pd.concat([df_existing, df_new], ignore_index=True)
        df_existing.drop_duplicates(subset=["time"], inplace=True)
        df_existing.to_csv(OUTPUT_FILE, index=False)
        print("Appended new data to init_data.csv")
    else:
        df_new.to_csv(OUTPUT_FILE, index=False)
        print("Created init_data.csv")

    # Append run date to log file
    with open(RUN_LOG, "a") as f:
        f.write(str(datetime.now()) + "\n")

    print("Logged run date to run_date.csv")

elif not date_fetch:
    print("Skipping MT5 download step because data already exists.")

# Shutdown MT5
mt5.shutdown()
