Are you tired of manually downloading historical data for your Forex trading strategies? To build truly effective backtests and AI models, you need a clean, persistent, and automated data pipeline.
This post will guide you through creating a Python script that uses the MetaTrader 5 (MT5) terminal to automatically manage a data file for the EURUSD pair. Our pipeline will handle two critical tasks: an initial bulk download and seamless daily updates.
🛠️ Prerequisites
To follow this tutorial, you'll need:
MetaTrader 5 Terminal: Installed and running (even in the background).
Python: (3.8+ recommended).
Required Libraries: Install them using pip:
pip install MetaTrader5 pandas
Step 1: Connecting to the Data Source
Our first step is establishing a secure, programmatic connection to your MT5 terminal using your demo account credentials. We'll use the initialize_mt5() function to handle this cleanly.
Note: We are using 5-minute bars (mt5.TIMEFRAME_M5) and targeting a MetaQuotes Demo server for this example.
Step 2: The Core Logic: Initial Fetch vs. Daily Update
The true power of this pipeline is its ability to switch modes. When the script runs, it first checks if the target file, init_data.csv, exists.
🚀 Mode A: Initial Data Load
If init_data.csv is not found, we assume this is the first run. We execute a bulk download of the last 80,000 bars (5-minute interval) using mt5.copy_rates_from_pos. This ensures you have a strong foundational dataset.
🔄 Mode B: Seamless Daily Update
If init_data.csv is found, the script switches to update mode. Since you plan to run this after the market closes (or once per day), we only fetch data for the previous full trading day to avoid gaps and duplicates.
We use a helper function, get_last_trading_day_dates(), to determine the precise start and end times, and then use mt5.copy_rates_range() to pull the specific 24-hour block of data.
Step 3: The Complete Data Pipeline Script
Here is the complete, robust code. You can save this as a Python file (e.g., eurusd_pipeline.py) and set it up to run once daily via a cron job (Linux/macOS) or Task Scheduler (Windows).
📊 Building a Robust EURUSD Data Pipeline with Python and MetaTrader 5
Are you tired of manually downloading historical data for your Forex trading strategies? To build truly effective backtests and AI models, you need a clean, persistent, and automated data pipeline.
This post will guide you through creating a Python script that uses the MetaTrader 5 (MT5) terminal to automatically manage a data file for the EURUSD pair. Our pipeline will handle two critical tasks: an initial bulk download and seamless daily updates.
🛠️ Prerequisites
To follow this tutorial, you'll need:
MetaTrader 5 Terminal: Installed and running (even in the background).
Python: (3.8+ recommended).
Required Libraries: Install them using pip:
Step 1: Connecting to the Data Source
Our first step is establishing a secure, programmatic connection to your MT5 terminal using your demo account credentials. We'll use the initialize_mt5() function to handle this cleanly.
Note: We are using 5-minute bars (mt5.TIMEFRAME_M5) and targeting a MetaQuotes Demo server for this example.
Step 2: The Core Logic: Initial Fetch vs. Daily Update
The true power of this pipeline is its ability to switch modes. When the script runs, it first checks if the target file, init_data.csv, exists.
🚀 Mode A: Initial Data Load
If init_data.csv is not found, we assume this is the first run. We execute a bulk download of the last 80,000 bars (5-minute interval) using mt5.copy_rates_from_pos. This ensures you have a strong foundational dataset.
🔄 Mode B: Seamless Daily Update
If init_data.csv is found, the script switches to update mode. Since you plan to run this after the market closes (or once per day), we only fetch data for the previous full trading day to avoid gaps and duplicates.
We use a helper function, get_last_trading_day_dates(), to determine the precise start and end times, and then use mt5.copy_rates_range() to pull the specific 24-hour block of data.
Step 3: The Complete Data Pipeline Script
Here is the complete, robust code. You can save this as a Python file (e.g., eurusd_pipeline.py) and set it up to run once daily via a cron job (Linux/macOS) or Task Scheduler (Windows).
Check it out the end of this post. I made some changes.
🚀 Conclusion
You have successfully built an automated, self-managing data pipeline for your Forex backtesting.
On the first run, it grabs 80,000 bars of history.
On subsequent runs, it intelligently pulls only the newest day's data and appends it to your file.
Your init_data.csv file now grows automatically, providing a clean, single source of truth for your algorithmic trading research.
Ready to start building your trading strategy on top of this reliable data source?
import MetaTrader5 as mt5
import pandas as pd
from datetime import datetime, timedelta
import os
# --- Configuration ---
account = nnn
password = "J!RbLq6h"
server = "password"
SYMBOL = "EURUSD"
TIMEFRAME = mt5.TIMEFRAME_M5
DATA_COUNT = 80000
OUTPUT_FILE = "init_data.csv"
RUN_LOG = "run_date.csv"
# ----------------------------
# Function to get cutoff datetime
# ----------------------------
def get_cutoff_datetime():
now = datetime.now()
if now.weekday() == 0: # Monday
target_date = (now - timedelta(days=3)).replace(
hour=23, minute=59, second=59, microsecond=0
)
else:
target_date = (now - timedelta(days=1)).replace(
hour=23, minute=59, second=59, microsecond=0
)
return target_date
# ----------------------------
# MT5 Initialization
# ----------------------------
if not mt5.initialize():
print("MT5 Initialize failed:", mt5.last_error())
quit()
authorized = mt5.login(account, password=password, server=server)
if not authorized:
print("Failed to connect to account:", mt5.last_error())
mt5.shutdown()
quit()
print("MT5 connection successful.")
# ----------------------------
# Determine end date
# ----------------------------
end_dt = get_cutoff_datetime()
print("Computed end date:", end_dt)
# -----------------------------------------------------------------------------
# NEW LOGIC (keeps original program intact)
# -----------------------------------------------------------------------------
need_to_fetch = True
date_fetch = False
if os.path.exists(OUTPUT_FILE):
print("init_data.csv exists; checking if end-date already recorded...")
df_existing = pd.read_csv(OUTPUT_FILE)
if "time" in df_existing.columns and not df_existing.empty:
df_existing["time"] = pd.to_datetime(df_existing["time"], errors="coerce")
df_existing = df_existing.dropna(subset=["time"])
# --- Extract date only from end_dt ---
end_date_only = end_dt.date()
# --- Extract date only from existing data ---
existing_dates = df_existing["time"].dt.date
# --- Check if that date already exists ---
if end_date_only in set(existing_dates):
print(f"Date {end_date_only} already exists in init_data.csv")
need_to_fetch = False
else:
print(f"Date {end_date_only} NOT FOUND. Will fetch ONLY this date.")
need_to_fetch = False
date_fetch = True
else:
print("init_data.csv has no valid 'time' column. Fetch normally.")
else:
print("init_data.csv does NOT exist. Running full initialization.")
# -----------------------------------------------------------------------------
# Fetch historical datafor yesterday's, only if needed
# -----------------------------------------------------------------------------
if date_fetch:
print("Fetching historical date:", end_dt)
year = end_dt.year
month = end_dt.month
day = end_dt.day
start_dt1 = datetime(year, month, day, 0, 0, 0)
end_dt1 = datetime(year, month, day, 23, 59, 59)
rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_dt1, end_dt1)
if rates is None or len(rates) == 0:
print("No data returned:", mt5.last_error())
mt5.shutdown()
quit()
print(f"Fetched {len(rates)} bars.")
df_new = pd.DataFrame(rates)
df_new["time"] = pd.to_datetime(df_new["time"], unit="s")
# If file exists → append new data
if os.path.exists(OUTPUT_FILE):
df_existing = pd.read_csv(OUTPUT_FILE)
df_existing = pd.concat([df_existing, df_new], ignore_index=True)
df_existing.drop_duplicates(subset=["time"], inplace=True)
df_existing.to_csv(OUTPUT_FILE, index=False)
print("Appended new data to init_data.csv")
else:
df_new.to_csv(OUTPUT_FILE, index=False)
print("Created init_data.csv")
# Append run date to log file
with open(RUN_LOG, "a") as f:
f.write(str(datetime.now()) + "\n")
print("Logged run date to run_date.csv")
# -----------------------------------------------------------------------------
# Fetch historical data, only if needed
# -----------------------------------------------------------------------------
if need_to_fetch:
print("Fetching historical data until:", end_dt)
rates = mt5.copy_rates_from(SYMBOL, TIMEFRAME, end_dt, DATA_COUNT)
if rates is None or len(rates) == 0:
print("No data returned:", mt5.last_error())
mt5.shutdown()
quit()
print(f"Fetched {len(rates)} bars.")
df_new = pd.DataFrame(rates)
df_new["time"] = pd.to_datetime(df_new["time"], unit="s")
# If file exists → append new data
if os.path.exists(OUTPUT_FILE):
df_existing = pd.read_csv(OUTPUT_FILE)
df_existing = pd.concat([df_existing, df_new], ignore_index=True)
df_existing.drop_duplicates(subset=["time"], inplace=True)
df_existing.to_csv(OUTPUT_FILE, index=False)
print("Appended new data to init_data.csv")
else:
df_new.to_csv(OUTPUT_FILE, index=False)
print("Created init_data.csv")
# Append run date to log file
with open(RUN_LOG, "a") as f:
f.write(str(datetime.now()) + "\n")
print("Logged run date to run_date.csv")
else:
print("Skipping MT5 download step because data already exists.")
# Shutdown MT5
mt5.shutdown()