Universal DB

Question: Working with historical databases (HDB) is something most people are familiar with. Data is located on disk, and different types of data are scattered across different locations on disk. For example, an equities HDB may be stored in an 'equities' directory, while an options HDB may be stored in an 'options' directory. Operating within the context of a single database is usually straightfoward and simple. All relevant tables are loaded in at the same time and all data is enumerated across the same domain, so you can go in and start writing queries immediately. Operating within the context of multiple databses, however, can be quite challenging if you do not have the proper environment set up. The reason for this is because q can only load one database at a time. This eliminates the possibility for loading different tables across different databases simultaneously out of the box. Another issue is that most HDBs are enumerated across the same enum domain called 'sym', which prevents the usage of different tables across different databases simultaneously unless the symbols are denumerated. Denumeration is an expensive operation so doing this is not feasible when working with many different tables. So, loading in different databases and writing queries that operate across different tables is not something you can do easily out of the box. We will solve these problems presented by doing the following:
 - Creating a single universal database location on disk that has access to other databases
 - Enumerating each database against a separate symbol domain (using .Q.ens)
By having a universal db (UDB, a new acronym I've coined. Please let me know if there's an existing acronym) you can load in a single database which gives access to all other databases, and lets you write queries across all database tables out of the box. The below example creates an 'equities' and 'options' HDB and populates data for a single date. Each HDB is enumerated across a different enum domain. After creating the HDBs, use filesystem commands to manually create a new HDB directory called 'udb' which aggregates the tables from 'equities' and 'options' HDBs along with their enum domains. The table names should be prefixed with the HDB name and underscore, otherwise a naming collision will happen (both tables are named 'trade'). In a real production environment, this manual process of UDB creation would be scripted and run on some reoccuring interval to keep the UDB up to date with all database tables.

Example

                                
                                q).z.zd:17 2 6
q)`:equities/2020.06.04/trade/ set .Q.ens[`:equities;;`equities]([]time:16:00:00.389756102 16:00:00.584880525;exch:`Q`Q;sym:`AAPL`MSFT;price:322.32 182.92)
`:equities/2020.06.04/trade/
q)`:options/2020.06.04/trade/ set .Q.ens[`:options;;`options]([]time:15:59:51.375 15:59:15.450n;sym:`AAPL`MSFT;expr:2020.06.05 2020.06.05;strike:325 185f;opttype:`p`p;size:10 1;price:3.4 2.52)
`:options/2020.06.04/trade/
q)\\

/ Your commands (Unix only)

q)\l udb
q)tables[]
`s#`equities_trade`options_trade
q)aj[`sym`time;select from equities_trade where date=2020.06.04;select sym,time,expr,strike,opttype,opt_size:size,opt_price:price from options_trade where date=2020.06.04]
date       time                 exch sym  price  expr       strike opttype opt_size opt_price
---------------------------------------------------------------------------------------------
2020.06.04 0D16:00:00.389756102 Q    AAPL 322.32 2020.06.05 325    p       10       3.4
2020.06.04 0D16:00:00.584880525 Q    MSFT 182.92 2020.06.05 185    p       1        2.52
                                
                            

Solution

Tags:
architecture disk tables
Searchable Tags
algorithms api architecture asynchronous c csv data structures dictionaries disk feedhandler finance functions ingestion ipc iterators machine learning math multithreading optimizations realtime shared library sql statistics streaming strings tables temporal utility websockets