Stratified Sampling

Question: Stratified sampling is a sampling method that divides a population into smaller subgroups known as strata. The size of each stratum in the sample should be proportional to the size of the corresponding stratum in the population. Define a function 'sfsmpl' that takes four arguments: a list or table; a column name identifying the column to base the strata on (for table data; the example passes a null symbol for a list); a bin interval for grouping numeric strata values (the example passes a null symbol when the raw values are used as strata); and a sample size. The function returns a sample of the data whose strata sizes are proportional to those of the population. Strata values can be numeric or symbols.

More Information:

https://en.wikipedia.org/wiki/Stratified_sampling

Example

q)d1:([]gender:1000000?`M`F;age:1+1000000?65;score:(1500+200000?901),800000?1500) 
 
// table, strata by gender values 
q)select count each age from 0!`gender xgroup sfsmpl[d1;`gender;`;1000] 
age 
--- 
500 
500 
 
// table, strata by score values in bins of 500
q)sfsmpl[d1;`score;500;1000] 
gender age score 
---------------- 
M      44  1623 
F      38  1703 
F      23  1578 
M      13  1561 
F      52  1910 
F      5   1814 
F      33  1645 
F      25  1719 
M      19  1501 
F      29  1853 
M      16  1958 
F      51  1698 
F      21  1895 
M      53  1525 
M      8   1761 
.. 
 
q)select count score by 500 xbar score from sfsmpl[d1;`score;500;1000] 
score| score 
-----| ----- 
0    | 267 
500  | 266 
1000 | 267 
1500 | 111 
2000 | 89 
 
// list, strata in bins of 500
q)count each group 500 xbar sfsmpl[d1`score;`;500;1000] 
1500| 111 
2000| 89 
500 | 266 
1000| 267 
0   | 267
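
The bin counts in the score examples can be checked by hand. The following is a sketch of that arithmetic, assuming only the definition of d1 above (the name p is illustrative and not part of the challenge). 800000 scores are drawn uniformly from 0 to 1499 and 200000 from 1500 to 2400, so the bins starting at 1500 and 2000 cover 500 and 401 of the 901 possible values.

```q
/ expected population in each 500-wide score bin: 0 500 1000 1500 2000
p:(3#800000%3),200000*500 401%901
/ proportional share of a 1000-row sample, before rounding
1000*p%sum p
```

The shares come to roughly 266.7 for each of the first three bins, then 111.0 and 89.0. Rounding all three 266.7 values up would give 1001 rows, so one of those bins has to be rounded down, which is consistent with the 267, 266, 267 split shown.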

Solution
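
One possible approach, offered as a minimal sketch rather than the challenge's reference solution: group row indices by stratum, allocate the sample across strata in proportion to stratum size, then draw without replacement within each stratum. Two points are assumptions: a null symbol in the interval argument is taken to mean "use the raw values as strata" (inferred from the example calls), and largest-remainder rounding is used so the per-stratum counts sum to exactly the requested sample size. The local names v, k, g, m, t, f and a are illustrative. The function is written for a script file, since a multi-line definition cannot be typed at the q) prompt.

```q
/ sketch only: d data (list or table), c column name, i bin interval, n sample size
sfsmpl:{[d;c;i;n]
  v:$[98h=type d;d c;d];       / values that define the strata
  k:$[null i;v;i xbar v];      / stratum key per row: raw value, or bin floor
  g:group k;                   / stratum key -> row indices
  m:count each value g;        / population size of each stratum
  t:n*m;                       / integer numerators of the proportional quotas
  f:t div count k;             / quotas rounded down
  a:@[f;(n-sum f)#idesc t mod count k;+;1];  / largest remainders get one more
  d raze (neg a)?'value g }    / deal indices without replacement per stratum
```

With d1 as defined in the example, count sfsmpl[d1;`score;500;1000] should return 1000, because the rounded-down quotas plus the redistributed remainder always add up to the requested size. The sketch assumes the sample size does not exceed the number of rows.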

Tags: functions, machine learning, statistics