Are you tired of manually downloading historical data for your Forex trading strategies? To build truly effective backtests and AI models, you need a clean, persistent, and automated data pipeline.
This post will guide you through creating a Python script that uses the MetaTrader 5 (MT5) terminal to automatically manage a data file for the EURUSD pair. Our pipeline will handle two critical tasks: an initial bulk download and seamless daily updates.
π ️ Prerequisites
To follow this tutorial, you'll need:
MetaTrader 5 Terminal: Installed and running (even in the background).
Python: (3.8+ recommended).
Required Libraries: Install them using pip:
pip install MetaTrader5 pandas
Step 1: Connecting to the Data Source
Our first step is establishing a secure, programmatic connection to your MT5 terminal using your demo account credentials. We'll use the initialize_mt5() function to handle this cleanly.
Note: We are using 5-minute bars (mt5.TIMEFRAME_M5) and targeting a MetaQuotes Demo server for this example.
Step 2: The Core Logic: Initial Fetch vs. Daily Update
The true power of this pipeline is its ability to switch modes. When the script runs, it first checks if the target file, init_data.csv, exists.
π Mode A: Initial Data Load
If init_data.csv is not found, we assume this is the first run. We execute a bulk download of the last 80,000 bars (5-minute interval) using mt5.copy_rates_from_pos. This ensures you have a strong foundational dataset.
π Mode B: Seamless Daily Update
If init_data.csv is found, the script switches to update mode. Since you plan to run this after the market closes (or once per day), we only fetch data for the previous full trading day to avoid gaps and duplicates.
We use a helper function, get_last_trading_day_dates(), to determine the precise start and end times, and then use mt5.copy_rates_range() to pull the specific 24-hour block of data.
Step 3: The Complete Data Pipeline Script
Here is the complete, robust code. You can save this as a Python file (e.g., eurusd_pipeline.py) and set it up to run once daily via a cron job (Linux/macOS) or Task Scheduler (Windows).
π Building a Robust EURUSD Data Pipeline with Python and MetaTrader 5
Are you tired of manually downloading historical data for your Forex trading strategies? To build truly effective backtests and AI models, you need a clean, persistent, and automated data pipeline.
This post will guide you through creating a Python script that uses the MetaTrader 5 (MT5) terminal to automatically manage a data file for the EURUSD pair. Our pipeline will handle two critical tasks: an initial bulk download and seamless daily updates.
π ️ Prerequisites
To follow this tutorial, you'll need:
MetaTrader 5 Terminal: Installed and running (even in the background).
Python: (3.8+ recommended).
Required Libraries: Install them using pip:
Bashpip install MetaTrader5 pandas
Step 1: Connecting to the Data Source
Our first step is establishing a secure, programmatic connection to your MT5 terminal using your demo account credentials. We'll use the initialize_mt5() function to handle this cleanly.
Note: We are using 5-minute bars (
mt5.TIMEFRAME_M5) and targeting a MetaQuotes Demo server for this example.
import MetaTrader5 as mt5
import pandas as pd
from datetime import datetime, timedelta
import os
# --- Configuration (Use your actual demo account details) ---
account = 99805772
password = "J!RbLq6h"
server = "MetaQuotes-Demo"
SYMBOL = "EURUSD"
TIMEFRAME = mt5.TIMEFRAME_M5 # 5-minute interval
INITIAL_DATA_COUNT = 80000 # 80,000 bars for the first run
OUTPUT_FILE = "init_data.csv" # Persistent CSV file name
# ... [rest of the initialization and time functions] ...
Step 2: The Core Logic: Initial Fetch vs. Daily Update
The true power of this pipeline is its ability to switch modes. When the script runs, it first checks if the target file, init_data.csv, exists.
π Mode A: Initial Data Load
If init_data.csv is not found, we assume this is the first run. We execute a bulk download of the last 80,000 bars (5-minute interval) using mt5.copy_rates_from_pos. This ensures you have a strong foundational dataset.
if not os.path.exists(OUTPUT_FILE):
print("Performing initial fetch...")
rates = mt5.copy_rates_from_pos(SYMBOL, TIMEFRAME, 0, INITIAL_DATA_COUNT)
# ... process and save data using mode='w' (write, creates the file)
π Mode B: Seamless Daily Update
If init_data.csv is found, the script switches to update mode. Since you plan to run this after the market closes (or once per day), we only fetch data for the previous full trading day to avoid gaps and duplicates.
We use a helper function, get_last_trading_day_dates(), to determine the precise start and end times, and then use mt5.copy_rates_range() to pull the specific 24-hour block of data.
else:
print("Fetching data for the previous trading day...")
start_date, end_date = get_last_trading_day_dates()
rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_date, end_date)
# ... process and save data using mode='a' (append, adds to existing file)
Step 3: The Complete Data Pipeline Script
Here is the complete, robust code. You can save this as a Python file (e.g., eurusd_pipeline.py) and set it up to run once daily via a cron job (Linux/macOS) or Task Scheduler (Windows).
π Conclusion
You have successfully built an automated, self-managing data pipeline for your Forex backtesting.
On the first run, it grabs 80,000 bars of history.
On subsequent runs, it intelligently pulls only the newest day's data and appends it to your file.
Your init_data.csv file now grows automatically, providing a clean, single source of truth for your algorithmic trading research.
Ready to start building your trading strategy on top of this reliable data source?
import MetaTrader5 as mt5 import pandas as pd from datetime import datetime, timedelta import os # --- Configuration --- account = nnn password = "J!RbLq6h" server = "password" SYMBOL = "EURUSD" TIMEFRAME = mt5.TIMEFRAME_M5 DATA_COUNT = 80000 OUTPUT_FILE = "init_data.csv" RUN_LOG = "run_date.csv" # ---------------------------- # Function to get cutoff datetime # ---------------------------- def get_cutoff_datetime(): now = datetime.now() if now.weekday() == 0: # Monday target_date = (now - timedelta(days=3)).replace( hour=23, minute=59, second=59, microsecond=0 ) else: target_date = (now - timedelta(days=1)).replace( hour=23, minute=59, second=59, microsecond=0 ) return target_date # ---------------------------- # MT5 Initialization # ---------------------------- if not mt5.initialize(): print("MT5 Initialize failed:", mt5.last_error()) quit() authorized = mt5.login(account, password=password, server=server) if not authorized: print("Failed to connect to account:", mt5.last_error()) mt5.shutdown() quit() print("MT5 connection successful.") # ---------------------------- # Determine end date # ---------------------------- end_dt = get_cutoff_datetime() print("Computed end date:", end_dt) # ----------------------------------------------------------------------------- # NEW LOGIC (keeps original program intact) # ----------------------------------------------------------------------------- need_to_fetch = True date_fetch = False if os.path.exists(OUTPUT_FILE): print("init_data.csv exists; checking if end-date already recorded...") df_existing = pd.read_csv(OUTPUT_FILE) if "time" in df_existing.columns and not df_existing.empty: df_existing["time"] = pd.to_datetime(df_existing["time"], errors="coerce") df_existing = df_existing.dropna(subset=["time"]) # --- Extract date only from end_dt --- end_date_only = end_dt.date() # --- Extract date only from existing data --- existing_dates = df_existing["time"].dt.date # --- Check if that date already exists --- if end_date_only in set(existing_dates): print(f"Date {end_date_only} already exists in init_data.csv") need_to_fetch = False else: print(f"Date {end_date_only} NOT FOUND. Will fetch ONLY this date.") need_to_fetch = False date_fetch = True else: print("init_data.csv has no valid 'time' column. Fetch normally.") else: print("init_data.csv does NOT exist. Running full initialization.") # ----------------------------------------------------------------------------- # Fetch historical datafor yesterday's, only if needed # ----------------------------------------------------------------------------- if date_fetch: print("Fetching historical date:", end_dt) year = end_dt.year month = end_dt.month day = end_dt.day start_dt1 = datetime(year, month, day, 0, 0, 0) end_dt1 = datetime(year, month, day, 23, 59, 59) rates = mt5.copy_rates_range(SYMBOL, TIMEFRAME, start_dt1, end_dt1) if rates is None or len(rates) == 0: print("No data returned:", mt5.last_error()) mt5.shutdown() quit() print(f"Fetched {len(rates)} bars.") df_new = pd.DataFrame(rates) df_new["time"] = pd.to_datetime(df_new["time"], unit="s") # If file exists → append new data if os.path.exists(OUTPUT_FILE): df_existing = pd.read_csv(OUTPUT_FILE) df_existing = pd.concat([df_existing, df_new], ignore_index=True) df_existing.drop_duplicates(subset=["time"], inplace=True) df_existing.to_csv(OUTPUT_FILE, index=False) print("Appended new data to init_data.csv") else: df_new.to_csv(OUTPUT_FILE, index=False) print("Created init_data.csv") # Append run date to log file with open(RUN_LOG, "a") as f: f.write(str(datetime.now()) + "\n") print("Logged run date to run_date.csv") # ----------------------------------------------------------------------------- # Fetch historical data, only if needed # ----------------------------------------------------------------------------- if need_to_fetch: print("Fetching historical data until:", end_dt) rates = mt5.copy_rates_from(SYMBOL, TIMEFRAME, end_dt, DATA_COUNT) if rates is None or len(rates) == 0: print("No data returned:", mt5.last_error()) mt5.shutdown() quit() print(f"Fetched {len(rates)} bars.") df_new = pd.DataFrame(rates) df_new["time"] = pd.to_datetime(df_new["time"], unit="s") # If file exists → append new data if os.path.exists(OUTPUT_FILE): df_existing = pd.read_csv(OUTPUT_FILE) df_existing = pd.concat([df_existing, df_new], ignore_index=True) df_existing.drop_duplicates(subset=["time"], inplace=True) df_existing.to_csv(OUTPUT_FILE, index=False) print("Appended new data to init_data.csv") else: df_new.to_csv(OUTPUT_FILE, index=False) print("Created init_data.csv") # Append run date to log file with open(RUN_LOG, "a") as f: f.write(str(datetime.now()) + "\n") print("Logged run date to run_date.csv") else: print("Skipping MT5 download step because data already exists.") # Shutdown MT5 mt5.shutdown()

No comments:
Post a Comment