Richie R. Ma

The OFOR and I are committed to developing user-friendly functions and packages in both R and Python. These tools aim to reduce the workload of cleaning market data and to improve data quality for research in electronic trading and market microstructure.

R

cme.mdp: Clean and Analyze Chicago Mercantile Exchange Market Data in R (with Brian G. Peterson)

  • The goal of cme.mdp is to make it easier to clean Chicago Mercantile Exchange (CME) market data in FIX protocol format within the R environment, including but not limited to trade summaries, quote updates, and limit order book reconstruction. The package is not restricted to agricultural futures markets; it can also be used for energy, metal, Treasury, FX, and stock index futures.
  • Market microstructure researchers can use this package to obtain high-quality, ready-to-use market data as inputs for their research. Users only specify where the data are stored, and the results are returned in a clean, easy-to-manipulate format. The package applies to any data product under the FIX protocol, including Market by Price (MBP) and Market by Order (MBO). No strong prior knowledge of the CME datasets is needed.
  • Presentations: 2025 Open Source Quantitative Finance (osQF) Conference Slides
# install.packages("remotes") # if not installed
remotes::install_github("richie-ma/cme.mdp")
library(cme.mdp)
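
To give a sense of the raw input such cleaning starts from, the sketch below splits one tag=value FIX message into fields and groups its market data entries. It is written in Python purely for illustration and does not use or reproduce the cme.mdp API: `parse_fix` and `md_entries` are hypothetical helpers, the message is synthetic, and the code assumes every repeating entry opens with tag 279 (MDUpdateAction) and that no trailer fields follow the last entry.

```python
SOH = "\x01"  # standard field delimiter in tag=value FIX messages


def parse_fix(message: str) -> list[tuple[int, str]]:
    """Split one tag=value FIX message into ordered (tag, value) pairs."""
    fields = []
    for field in message.strip(SOH).split(SOH):
        tag, _, value = field.partition("=")
        fields.append((int(tag), value))
    return fields


def md_entries(fields):
    """Group repeating market data entries (hypothetical helper).

    Assumption: each entry starts with tag 279 (MDUpdateAction), and
    nothing after the last entry belongs to the message trailer.
    """
    entries, current = [], None
    for tag, value in fields:
        if tag == 279:  # a new entry begins here
            current = {}
            entries.append(current)
        if current is not None:
            current[tag] = value
    return entries


# Synthetic incremental refresh (35=X) with two entries; values are made up.
raw = SOH.join([
    "35=X", "75=20250420", "268=2",
    "279=0", "269=0", "270=1050.25", "271=12", "1023=1",  # new bid, level 1
    "279=1", "269=1", "270=1050.50", "271=7", "1023=1",   # changed offer, level 1
]) + SOH

entries = md_entries(parse_fix(raw))
for entry in entries:
    # Tag 269 (MDEntryType): 0 = bid, 1 = offer, 2 = trade
    side = {"0": "bid", "1": "offer", "2": "trade"}.get(entry[269], "other")
    print(side, float(entry[270]), int(entry[271]))
```

Real files contain many message types and far more fields per entry, which is the bookkeeping the package is meant to handle for the user.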

pricediscovery: Price discovery analysis under the “one-security-many-markets” setting

  • The goal of pricediscovery is to make Hasbrouck's (1995)[1] price discovery analysis easy to conduct across \(N\) markets. The package currently calculates component shares[2], information shares[3], and information leadership shares[4].
  • Price discovery is the process by which new information is incorporated into market prices in a timely and efficient manner. Modern price discovery analysis typically considers two dimensions: timeliness and efficiency (informativeness). When one asset trades in multiple venues, the prices share an implicit common efficient price. The price series are therefore cointegrated, and they should not exhibit persistent price deviations or arbitrage opportunities.
  • An example could be E-mini S&P 500 futures and SPY (or other S&P 500 index ETFs).
#install.packages("remotes")
remotes::install_github("richie-ma/pricediscovery")
library(pricediscovery)
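
For readers who want to see how the three measures relate, here is a minimal Python sketch for the two-market case. It is not the pricediscovery API; all function names are hypothetical. It assumes a VECM has already been estimated with cointegrating vector (1, -1), error-correction coefficients `alpha`, and residual covariance `omega`. It uses the vector orthogonal to `alpha`, which is proportional to the long-run impact vector in this setting, and it averages the information shares over the two Cholesky orderings, which is one common convention rather than the only one. The numbers are made up.

```python
import numpy as np


def _alpha_perp(alpha):
    """Vector orthogonal to alpha for two markets: a1*a2 + a2*(-a1) = 0."""
    a1, a2 = alpha
    return np.array([a2, -a1], dtype=float)


def component_shares(alpha):
    """Component shares: normalized orthogonal complement of alpha."""
    gamma = _alpha_perp(alpha)
    return gamma / gamma.sum()


def information_shares(alpha, omega):
    """Information shares, averaged over both market orderings."""
    gamma = _alpha_perp(alpha)
    omega = np.asarray(omega, dtype=float)
    results = []
    for order in ([0, 1], [1, 0]):
        g = gamma[order]
        # Lower-triangular F with F @ F.T equal to the reordered covariance
        chol = np.linalg.cholesky(omega[np.ix_(order, order)])
        contrib = (g @ chol) ** 2          # contrib.sum() == g @ omega @ g
        shares = np.empty(2)
        shares[order] = contrib / contrib.sum()  # map back to original order
        results.append(shares)
    return np.mean(results, axis=0)


def information_leadership_shares(cs, info):
    """Information leadership shares built from IS and CS ratios."""
    il1 = abs((info[0] / info[1]) * (cs[1] / cs[0]))
    il2 = abs((info[1] / info[0]) * (cs[0] / cs[1]))
    return np.array([il1, il2]) / (il1 + il2)


# Hypothetical estimates: market 1 adjusts less to the pricing error.
alpha = [-0.1, 0.3]
omega = [[1.0, 0.5], [0.5, 1.0]]

cs = component_shares(alpha)
info = information_shares(alpha, omega)
ils = information_leadership_shares(cs, info)
print("CS :", cs)
print("IS :", info)
print("ILS:", ils)
```

Each measure sums to one across markets. When the residuals are correlated, the information shares depend on the ordering used in the Cholesky factorization, which is why the sketch reports the average of the two orderings.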

Python

cmemdp: Clean and Analyze Chicago Mercantile Exchange Market Data in Python

  • The Python package cmemdp is inspired by the R package cme.mdp. It covers almost all features of that package and adds other important functions, e.g., CME Packet Capture (PCAP) data cleaning.
  • Market microstructure researchers can rely on the PCAP data parser to obtain a large amount of market data, including both MBP and MBO data, for more than a single futures market. This is a cost-efficient way to acquire more data to support possible cross-market analyses, such as for the soybean complex. No strong prior knowledge of PCAP data is needed.
from cmemdp.cme_parser import cme_parser_datamine
example = cme_parser_datamine(
    path="R:/_RawData/PCAP/20250420-PCAP_318_0_0_0_e",
    max_read_packets=None,
    cme_header=True,
    save_file_path="R:/_RawData/PCAP/",
    disable_progress_bar=False,
    chunk_size=5000,
)
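
Since both MBP and MBO data come out of the parser, it may help to see how the two views relate. The sketch below aggregates a handful of resting orders (the MBO view) into price levels carrying total quantity and order count (the MBP view). It is an illustration only: `mbo_to_mbp` is a hypothetical helper, not a cmemdp function, and the tuple layout and the orders are invented for the example.

```python
from collections import defaultdict


def mbo_to_mbp(orders, depth=10):
    """Aggregate resting orders into price levels (hypothetical helper).

    orders: iterable of (order_id, side, price, qty), side "B" or "S".
    Returns (bids, asks), each a list of (price, total_qty, n_orders),
    best price first, truncated to `depth` levels.
    """
    levels = {"B": defaultdict(lambda: [0, 0]),
              "S": defaultdict(lambda: [0, 0])}
    for _order_id, side, price, qty in orders:
        level = levels[side][price]
        level[0] += qty  # total resting quantity at this price
        level[1] += 1    # number of orders at this price

    def by_price(item):
        return item[0]

    # Best bid is the highest price; best ask is the lowest price.
    bids = sorted(levels["B"].items(), key=by_price, reverse=True)[:depth]
    asks = sorted(levels["S"].items(), key=by_price)[:depth]

    def flatten(book):
        return [(price, qty, count) for price, (qty, count) in book]

    return flatten(bids), flatten(asks)


# Invented resting orders for illustration.
orders = [
    (1, "B", 100.00, 5),
    (2, "B", 100.00, 3),
    (3, "B", 99.75, 10),
    (4, "S", 100.25, 4),
    (5, "S", 100.50, 6),
    (6, "S", 100.25, 1),
]

bids, asks = mbo_to_mbp(orders)
print("bids:", bids)
print("asks:", asks)
```

Going the other way is not possible: an MBP level does not reveal the individual orders behind it, which is why MBO data are the richer input for queue-level analyses.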

Footnotes

  1. Hasbrouck, J. 1995. “One Security, Many Markets: Determining the Contributions to Price Discovery.” Journal of Finance 50: 1175–99.

  2. Gonzalo, J., and C. Granger. 1995. “Estimation of Common Long-Memory Components in Cointegrated Systems.” Journal of Business and Economic Statistics 13: 27–35.

  3. Hasbrouck, J. 1995. “One Security, Many Markets: Determining the Contributions to Price Discovery.” Journal of Finance 50: 1175–99.

  4. Putniņš, T. J. 2013. “What Do Price Discovery Metrics Really Measure?” Journal of Empirical Finance 23: 68–83.