Sunday, December 7, 2025

📊 Building a Robust EURUSD Data Pipeline with Python and MetaTrader 5


Are you tired of manually downloading historical data for your Forex trading strategies? To build truly effective backtests and AI models, you need a clean, persistent, and automated data pipeline.

This post will guide you through creating a Python script that uses the MetaTrader 5 (MT5) terminal to automatically manage a data file for the EURUSD pair. Our pipeline will handle two critical tasks: an initial bulk download and seamless daily updates.


🛠️ Prerequisites

To follow this tutorial, you'll need:

  1. MetaTrader 5 Terminal: Installed and running (even in the background).

  2. Python: (3.8+ recommended).

  3. Required Libraries: Install them using pip:

    Bash
    pip install MetaTrader5 pandas
    

Step 1: Connecting to the Data Source

Our first step is establishing a secure, programmatic connection to your MT5 terminal using your demo account credentials. We'll use the initialize_mt5() function to handle this cleanly.

Note: We are using 5-minute bars (mt5.TIMEFRAME_M5) and targeting a MetaQuotes Demo server for this example.

Python
import MetaTrader5 as mt5
import pandas as pd
from datetime import datetime, timedelta
import os

# --- Configuration (Use your actual demo account details) ---
account = 99805772
password = "J!RbLq6h"
server = "MetaQuotes-Demo"

SYMBOL = "EURUSD"
TIMEFRAME = mt5.TIMEFRAME_M5      # 5-minute interval
INITIAL_DATA_COUNT = 80000       # 80,000 bars for the first run
OUTPUT_FILE = "init_data.csv"    # Persistent CSV file name

# ... [rest of the initialization and time functions] ...
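For reference, here is a minimal sketch of what the initialize_mt5() helper mentioned above could look like. It simply wraps the same mt5.initialize() and mt5.login() calls that the full script at the end of this post performs inline, so treat it as an illustration rather than a fixed API:

Python
def initialize_mt5(account, password, server):
    """Connect to the running MT5 terminal and log in to the demo account."""
    if not mt5.initialize():
        print("MT5 Initialize failed:", mt5.last_error())
        return False
    if not mt5.login(account, password=password, server=server):
        print("Failed to connect to account:", mt5.last_error())
        mt5.shutdown()
        return False
    print("MT5 connection successful.")
    return True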

Step 2: The Core Logic: Initial Fetch vs. Daily Update

The true power of this pipeline is its ability to switch modes. When the script runs, it first checks if the target file, init_data.csv, exists.

🚀 Mode A: Initial Data Load

If init_data.csv is not found, we assume this is the first run. We execute a bulk download of the last 80,000 bars (5-minute interval) using mt5.copy_rates_from_pos. This ensures you have a strong foundational dataset.

# If the file does NOT exist:
if not os.path.exists(OUTPUT_FILE):
    print("Performing initial fetch...")
    rates = mt5.copy_rates_from_pos(SYMBOL, TIMEFRAME, 0, INITIAL_DATA_COUNT)
    # ... process and save data using mode='w' (write, creates the file)
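    # A sketch of that step (the full script at the end follows the same pattern):
    df = pd.DataFrame(rates)                           # structured array -> DataFrame
    df["time"] = pd.to_datetime(df["time"], unit="s")  # MT5 returns epoch seconds
    df.to_csv(OUTPUT_FILE, index=False)                # default mode='w' creates the file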

🔄 Mode B: Seamless Daily Update

If init_data.csv is found, the script switches to update mode. Since you plan to run this after the market closes (or once per day), we only fetch data for the previous full trading day to avoid gaps and duplicates.

We use a helper function, get_last_trading_day_dates(), to determine the precise start and end times, and then use mt5.copy_rates_range() to pull the specific 24-hour block of data.

# If the file EXISTS:
else:
    print("Fetching data for the previous trading day...")
    start_date, end_date = get_last_trading_day_dates()
    rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_date, end_date)
    # ... process and save data using mode='a' (append, adds to existing file)
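The get_last_trading_day_dates() helper is not defined in the snippets above; the full script at the end of the post implements the same idea as get_cutoff_datetime(). Purely as an illustration, a version that returns an explicit (start, end) pair could look like this:

Python
def get_last_trading_day_dates():
    """Return (start, end) datetimes covering the previous full trading day."""
    now = datetime.now()
    # On Mondays the previous trading day is Friday (skip the weekend); otherwise yesterday.
    days_back = 3 if now.weekday() == 0 else 1
    target = now - timedelta(days=days_back)
    start = datetime(target.year, target.month, target.day, 0, 0, 0)
    end = datetime(target.year, target.month, target.day, 23, 59, 59)
    return start, end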

Step 3: The Complete Data Pipeline Script

Here is the complete, robust code. You can save this as a Python file (e.g., eurusd_pipeline.py) and set it up to run once daily via a cron job (Linux/macOS) or Task Scheduler (Windows).

Check it out at the end of this post; I've made a few changes to it there compared with the snippets above.

🚀 Conclusion

You have successfully built an automated, self-managing data pipeline for your Forex backtesting.

  • On the first run, it grabs 80,000 bars of history.

  • On subsequent runs, it intelligently pulls only the newest day's data and appends it to your file.

Your init_data.csv file now grows automatically, providing a clean, single source of truth for your algorithmic trading research.

Ready to start building your trading strategy on top of this reliable data source?

Below is the complete, updated script referenced earlier in the post:

import MetaTrader5 as mt5
import pandas as pd
from datetime import datetime, timedelta
import os

# --- Configuration ---
account = 99805772               # replace with your demo account number
password = "J!RbLq6h"            # replace with your demo account password
server = "MetaQuotes-Demo"

SYMBOL = "EURUSD"
TIMEFRAME = mt5.TIMEFRAME_M5
DATA_COUNT = 80000
OUTPUT_FILE = "init_data.csv"
RUN_LOG = "run_date.csv"


# ----------------------------
# Function to get cutoff datetime
# ----------------------------
def get_cutoff_datetime():
    now = datetime.now()

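    # If today is Monday, the previous trading day is Friday (skip the weekend);
    # otherwise it is simply yesterday. Weekend runs are not specifically handled.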
    if now.weekday() == 0:   # Monday
        target_date = (now - timedelta(days=3)).replace(
            hour=23, minute=59, second=59, microsecond=0
        )
    else:
        target_date = (now - timedelta(days=1)).replace(
            hour=23, minute=59, second=59, microsecond=0
        )
    return target_date


# ----------------------------
# MT5 Initialization
# ----------------------------
if not mt5.initialize():
    print("MT5 Initialize failed:", mt5.last_error())
    quit()

authorized = mt5.login(account, password=password, server=server)
if not authorized:
    print("Failed to connect to account:", mt5.last_error())
    mt5.shutdown()
    quit()

print("MT5 connection successful.")


# ----------------------------
# Determine end date
# ----------------------------
end_dt = get_cutoff_datetime()
print("Computed end date:", end_dt)


# -----------------------------------------------------------------------------
# NEW LOGIC (keeps original program intact)
# -----------------------------------------------------------------------------

need_to_fetch = True
date_fetch = False

if os.path.exists(OUTPUT_FILE):
    print("init_data.csv exists; checking if end-date already recorded...")

    df_existing = pd.read_csv(OUTPUT_FILE)

    if "time" in df_existing.columns and not df_existing.empty:
        df_existing["time"] = pd.to_datetime(df_existing["time"], errors="coerce")
        df_existing = df_existing.dropna(subset=["time"])

        # --- Extract date only from end_dt ---
        end_date_only = end_dt.date()

        # --- Extract date only from existing data ---
        existing_dates = df_existing["time"].dt.date

        # --- Check if that date already exists ---
        if end_date_only in set(existing_dates):
            print(f"Date {end_date_only} already exists in init_data.csv")
            need_to_fetch = False
        else:
            print(f"Date {end_date_only} NOT FOUND. Will fetch ONLY this date.")
            need_to_fetch = False
            date_fetch = True
    else:
        print("init_data.csv has no valid 'time' column. Fetch normally.")
else:
    print("init_data.csv does NOT exist. Running full initialization.")

# -----------------------------------------------------------------------------
# Fetch historical data for the previous trading day only, if needed
# -----------------------------------------------------------------------------
if date_fetch:
    print("Fetching historical date:", end_dt)

    year  = end_dt.year
    month = end_dt.month
    day   = end_dt.day

    start_dt1 = datetime(year, month, day, 0, 0, 0)
    end_dt1   = datetime(year, month, day, 23, 59, 59)
    
    rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_dt1, end_dt1)
    if rates is None or len(rates) == 0:
        print("No data returned:", mt5.last_error())
        mt5.shutdown()
        quit()
    print(f"Fetched {len(rates)} bars.")

    df_new = pd.DataFrame(rates)
    df_new["time"] = pd.to_datetime(df_new["time"], unit="s")

    # If file exists → append new data
    if os.path.exists(OUTPUT_FILE):
        df_existing = pd.read_csv(OUTPUT_FILE)
        # Parse existing timestamps so drop_duplicates compares like with like
        df_existing["time"] = pd.to_datetime(df_existing["time"], errors="coerce")
        df_existing = pd.concat([df_existing, df_new], ignore_index=True)
        df_existing.drop_duplicates(subset=["time"], inplace=True)
        df_existing.to_csv(OUTPUT_FILE, index=False)
        print("Appended new data to init_data.csv")
    else:
        df_new.to_csv(OUTPUT_FILE, index=False)
        print("Created init_data.csv")

    # Append run date to log file
    with open(RUN_LOG, "a") as f:
        f.write(str(datetime.now()) + "\n")

    print("Logged run date to run_date.csv")

        
# -----------------------------------------------------------------------------
# Fetch historical data, only if needed
# -----------------------------------------------------------------------------
if need_to_fetch:
    print("Fetching historical data until:", end_dt)

    rates = mt5.copy_rates_from(SYMBOL, TIMEFRAME, end_dt, DATA_COUNT)

    if rates is None or len(rates) == 0:
        print("No data returned:", mt5.last_error())
        mt5.shutdown()
        quit()

    print(f"Fetched {len(rates)} bars.")

    df_new = pd.DataFrame(rates)
    df_new["time"] = pd.to_datetime(df_new["time"], unit="s")

    # If file exists → append new data
    if os.path.exists(OUTPUT_FILE):
        df_existing = pd.read_csv(OUTPUT_FILE)
        # Parse existing timestamps so drop_duplicates compares like with like
        df_existing["time"] = pd.to_datetime(df_existing["time"], errors="coerce")
        df_existing = pd.concat([df_existing, df_new], ignore_index=True)
        df_existing.drop_duplicates(subset=["time"], inplace=True)
        df_existing.to_csv(OUTPUT_FILE, index=False)
        print("Appended new data to init_data.csv")
    else:
        df_new.to_csv(OUTPUT_FILE, index=False)
        print("Created init_data.csv")

    # Append run date to log file
    with open(RUN_LOG, "a") as f:
        f.write(str(datetime.now()) + "\n")

    print("Logged run date to run_date.csv")

else:
    print("Skipping MT5 download step because data already exists.")

# Shutdown MT5
mt5.shutdown()
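Once the pipeline has run, pulling the data back into pandas for backtesting or feature engineering could look like this (a minimal sketch that only assumes the init_data.csv file produced above):

Python
import pandas as pd

# Load the pipeline's output and index it by timestamp
df = pd.read_csv("init_data.csv", parse_dates=["time"])
df = df.sort_values("time").set_index("time")

# The MT5 rate columns: open, high, low, close, tick_volume, spread, real_volume
print(df[["open", "high", "low", "close", "tick_volume"]].tail())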
