Streaming market data simulator for your projects

This is the first post in a series I have written about a data analytics pipeline spanning multiple languages, databases, and environments. You can find more about the pipeline in my final post here.

Over the last few years, I have worked on several side projects in the market data domain, and the most challenging part was getting market (pricing) data for securities. I used to use a Python wrapper someone wrote on top of Yahoo! Finance, but it wasn’t an official API, so when Yahoo! made changes, the wrapper stopped working. Consequently, I moved on to IEX and wrote my own Python and q/kdb+ API to get data from them. Soon, however, they changed their API model as well, so I wrote another API to get data from IEX Cloud (github). Finally, no matter which source I used for market data, there was always a limit on how many requests you could make in the free tier.

So, while there are cases where you need accurate free market data, in many use-cases, you just need some simulated market data which would allow you to build higher-level applications on top of it. Furthermore, instead of polling data and processing batches, I wanted streaming pricing data to build real-time applications.

With that in mind, I decided to build a simple application which would provide me with streaming Level 1 (top of the book) quotes and trades data for different securities. The application is aptly named market-data-simulator, no points for creativity here. You can find the code on github.

So what does it do?

market-data-simulator publishes streaming market data for securities in JSON format to a JMS compliant event broker, such as Solace’s PubSub+ broker. Here are two published messages for AAPL and IBM:

{
  "symbol":"AAPL",
  "askPrice":250.3121,
  "bidSize":630,
  "tradeSize":180,
  "exchange":"NASDAQ",
  "currency":"USD",
  "tradePrice":249.9996,
  "askSize":140,
  "bidPrice":249.6871,
  "timestamp":"2020-03-23T09:32:10.610764-04:00"
}

{
  "symbol":"IBM",
  "askPrice":101.0025,
  "bidSize":720,
  "tradeSize":490,
  "exchange":"NYSE",
  "currency":"USD",
  "tradePrice":100.5,
  "askSize":340,
  "bidPrice":99.9975,
  "timestamp":"2020-03-23T09:32:09.609035-04:00"
}

The data is published to a topic of this structure: <assetClass>/marketData/v1/<country>/<exchange>/<name>

In the example above, data would be published to: EQ/marketData/v1/US/NASDAQ/AAPL
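To make the topic structure concrete, here is a minimal sketch of how such a topic string could be assembled. TopicBuilder and its build method are hypothetical names used for illustration; they are not taken from the market-data-simulator code.

```java
// Hypothetical helper, not part of the market-data-simulator repo.
final class TopicBuilder {
    private TopicBuilder() {}

    // Builds <assetClass>/marketData/v1/<country>/<exchange>/<name>.
    static String build(String assetClass, String country, String exchange, String name) {
        for (String level : new String[] {assetClass, country, exchange, name}) {
            // A level that is empty or contains '/' would change the depth of the hierarchy.
            if (level == null || level.isEmpty() || level.contains("/")) {
                throw new IllegalArgumentException("Invalid topic level: " + level);
            }
        }
        return String.join("/", assetClass, "marketData", "v1", country, exchange, name);
    }

    public static void main(String[] args) {
        // Prints the topic used for the AAPL message shown earlier.
        System.out.println(build("EQ", "US", "NASDAQ", "AAPL"));
    }
}
```

Keeping the level order fixed (asset class first, security name last) is what lets subscribers filter on any prefix of the hierarchy.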

This specific topic hierarchy is used to take full advantage of Solace PubSub+’s rich hierarchical topics, which provide strong wildcard support and advanced filtering logic. You can read more about that here.

Additionally, publishing to topics, instead of queues, means that we can implement a pub-sub model where we can have multiple subscribers consuming this data.

Generating random data

I mostly focused on randomizing three values: askPrice, bidPrice, and tradePrice. Instead of publishing quotes data separately from trade data, I chose to combine the two for the sake of simplicity. Each trade corresponds to the quotes available at that time, so it is much easier to keep track of quotes and trades if they are generated together.

The random prices generated by market-data-simulator are constrained by: bidPrice < tradePrice < askPrice. This logic makes sure that we don’t encounter crossed markets where bidPrice > askPrice.

All new prices are simply slightly modified versions of the previous prices to make sure we don’t get random prices that don’t make sense. For example, AAPL’s stock should not be worth $100 today and $500 tomorrow (because that would be crazy!).
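The two rules above (small moves from the previous price, and bidPrice < tradePrice < askPrice) can be sketched as follows. This is an illustration under my own assumptions, not the logic in generatePriceAndSize: the PriceTick class, the 0.5% maximum move, and the spread range are all made up for the example.

```java
import java.util.Random;

// Illustrative only: NOT the generatePriceAndSize logic from Stock.java.
final class PriceTick {
    final double bidPrice;
    final double tradePrice;
    final double askPrice;

    PriceTick(double bidPrice, double tradePrice, double askPrice) {
        this.bidPrice = bidPrice;
        this.tradePrice = tradePrice;
        this.askPrice = askPrice;
    }

    // Derives the next tick from the previous trade price.
    static PriceTick next(double lastTradePrice, Random rng) {
        // Assumed step size: move the trade price by at most +/-0.5%.
        double move = (rng.nextDouble() - 0.5) * 0.01;
        double trade = lastTradePrice * (1.0 + move);
        // Assumed half-spread: 0.05% to 0.15% of the trade price, always positive,
        // so bid < trade < ask and the market is never crossed.
        double halfSpread = trade * (0.0005 + rng.nextDouble() * 0.001);
        return new PriceTick(trade - halfSpread, trade, trade + halfSpread);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double last = 249.9996; // AAPL trade price from the sample message
        for (int i = 0; i < 5; i++) {
            PriceTick t = next(last, rng);
            System.out.printf("bid=%.4f trade=%.4f ask=%.4f%n", t.bidPrice, t.tradePrice, t.askPrice);
            last = t.tradePrice;
        }
    }
}
```

Because each tick is a small multiplicative change to the previous one, prices drift gradually and stay positive instead of jumping between unrelated values.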

I haven’t put many constraints on sizes (askSize, bidSize, and tradeSize). If that’s a requirement for your use-case, feel free to make changes.

You can find the exact logic I used to generate these random values in the generatePriceAndSize method of the Stock class (Stock.java).

Configurations

There are two main configuration files:

  • src/main/resources/broker.yaml
  • src/main/resources/securities.yaml

broker.yaml contains connection properties for your JMS compliant event broker. You need to populate the necessary fields with your specific broker properties. Those fields are: host, vpn, user, and password.

In my case, I am using the free service on Solace Cloud, which lets me quickly spin up Solace’s PubSub+ broker on AWS. Here are step-by-step instructions on how to create your own service and find connection details. This is what my sample broker.yaml file looks like:

host: <unique_host_name>.messaging.solace.cloud:55555  
vpn: <vpn_name>  
user: <username> 
pass: <password>

The securities.yaml file contains useful information about the securities for which you would like to generate sample market data. Usually, in companies, you would have a separate team which stores and maintains all this data for all the securities that your company is interested in. However, in our case, we need to provide this ourselves. For now, you need to provide: name, exchange, assetClass, currency, lastTradePrice, lastAskPrice, and lastBidPrice.

exchange, assetClass, and currency are used to provide some context about the security. Which exchange does this security primarily trade on? Which asset class (EQ, FX etc.) does it belong to? Which currency are the prices quoted in? Exchange information is used to link securities to exchanges and their corresponding market hours so that we only publish data when the markets are open. For example, US equities data will only be published from 09:30am to 04:00pm. In the future, I would like to add support for other asset classes as well, which is why I added the assetClass property.
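A market-hours check of this kind can be sketched with java.time. The MarketHours class and isOpen method below are hypothetical and not the simulator’s actual code; the sketch also ignores weekends and holidays, which a real check would need to handle.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

// Hypothetical helper; the simulator's own market-hours logic may differ.
final class MarketHours {
    private MarketHours() {}

    // True if 'now', converted to the exchange's time zone, falls in [open, close).
    // Weekends and holidays are deliberately not handled in this sketch.
    static boolean isOpen(Instant now, ZoneId zone, LocalTime open, LocalTime close) {
        LocalTime local = now.atZone(zone).toLocalTime();
        return !local.isBefore(open) && local.isBefore(close);
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        // Timestamp taken from the sample AAPL message.
        Instant sample = OffsetDateTime.parse("2020-03-23T09:32:10.610764-04:00").toInstant();
        System.out.println(isOpen(sample, newYork, LocalTime.of(9, 30), LocalTime.of(16, 0)));
    }
}
```

Converting to the exchange’s own time zone before comparing means the check behaves the same no matter where the simulator itself is running.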

If you would like to add support for a new exchange besides the ones that are there by default (NYSE, NASDAQ, LSE, and SGX), you need to create a new class for each exchange (e.g. TSX.java) that extends the Exchange class (Exchange.java) and calls its constructor with the necessary information: name, country, timezone, openTime, and closeTime.
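As a rough sketch of what such a subclass could look like: ExchangeSketch below is a hypothetical stand-in for Exchange.java, and its constructor signature and field types are my assumptions, so check the real class in the repo before copying this. The TSX hours and time zone used here are the exchange’s regular trading session.

```java
import java.time.LocalTime;
import java.time.ZoneId;

// Hypothetical stand-in for the repo's Exchange.java; the real constructor may differ.
abstract class ExchangeSketch {
    final String name;
    final String country;
    final ZoneId timezone;
    final LocalTime openTime;
    final LocalTime closeTime;

    protected ExchangeSketch(String name, String country, ZoneId timezone,
                             LocalTime openTime, LocalTime closeTime) {
        this.name = name;
        this.country = country;
        this.timezone = timezone;
        this.openTime = openTime;
        this.closeTime = closeTime;
    }
}

// A new exchange only has to pass its details up to the superclass.
final class TsxSketch extends ExchangeSketch {
    TsxSketch() {
        super("TSX", "CA", ZoneId.of("America/Toronto"),
              LocalTime.of(9, 30), LocalTime.of(16, 0));
    }
}
```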

lastTradePrice, lastAskPrice, and lastBidPrice are used as the baseline for the random prices that will be generated. You can enter whatever values you like here, but to be a bit realistic, I recommend using the most recent actual values for these fields. The code will generate a random askPrice and bidPrice, and a tradePrice that falls between those two values (no crossed markets here).

Receiving data

The purpose of market-data-simulator is only to publish market data. However, for testing purposes, I have included code for a simple subscriber in SampleConsumer.java, which subscribes to a given topic and prints out any messages published to that topic. Keep in mind that your consumer application doesn’t have to be written in Java. You can use whichever APIs your broker supports to consume the data.

To set the topic, modify this line: 

final String TOPIC_NAME = "*/US/>";

Note: You can use Solace’s powerful wildcards when specifying the topic, which allow you to abstract away multiple levels and apply filtering logic. Here are some examples:

  • EQ/> – to subscribe to all equities
  • */*/*/*/NYSE/> – to subscribe to all NYSE securities
  • EQ/marketData/v1/US/> – to subscribe to all US equities

You can learn more about Solace’s wildcards here.
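To show how the subscriptions above line up with published topics, here is a simplified matcher. It is only an illustration of the two wildcard rules as I understand them (* matches within a single level, and > as the last level matches one or more remaining levels), not the broker’s implementation; TopicMatcher is a hypothetical name.

```java
// Simplified illustration of Solace-style topic wildcards; not the broker's own matcher.
final class TopicMatcher {
    private TopicMatcher() {}

    static boolean matches(String subscription, String topic) {
        String[] sub = subscription.split("/");
        String[] top = topic.split("/");
        for (int i = 0; i < sub.length; i++) {
            String level = sub[i];
            // '>' as the final level matches one or more remaining topic levels.
            if (level.equals(">") && i == sub.length - 1) {
                return top.length > i;
            }
            if (i >= top.length) {
                return false; // subscription is deeper than the topic
            }
            if (level.endsWith("*")) {
                // "*" matches any single level; "abc*" matches a level starting with "abc".
                String prefix = level.substring(0, level.length() - 1);
                if (!top[i].startsWith(prefix)) {
                    return false;
                }
            } else if (!level.equals(top[i])) {
                return false;
            }
        }
        // Without a trailing '>', both must have the same number of levels.
        return sub.length == top.length;
    }

    public static void main(String[] args) {
        String aapl = "EQ/marketData/v1/US/NASDAQ/AAPL";
        System.out.println(matches("EQ/>", aapl));
        System.out.println(matches("*/*/*/*/NYSE/>", aapl));
        System.out.println(matches("EQ/marketData/v1/US/>", aapl));
    }
}
```

With the AAPL topic from earlier, the first and third subscriptions match, while the NYSE one does not because the exchange level is NASDAQ.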

Getting Started

So how do you get started with this code? Follow these simple steps:

  1. Clone the repo locally
  2. Spin up an instance of PubSub+ broker
  3. Update broker.yaml with your connection settings
  4. [Optional] Update securities.yaml with the securities you want to publish sample data for and their corresponding last (trade/ask/bid) prices.
  5. Run Main.java and watch data flow. Note that data will only be published during market hours.

Hopefully, this application will make it easier for you to build high-level applications such as PNL and Portfolio Analysis apps.
