Jump to content

How to get a random sample in SQL

  odewahn1's Photo
Posted Oct 15 2009 06:36 AM

It can be a real pain to develop and test queries against really big tables. So, rather than query the entire table, you can use a simple trick to select a random sample (a smaller subset). Here's how:

select * from foo order by rand() limit sample_size

The "order by rand()" clause will (obviously) return the results in random order, while the "limit" clause will (obviously) limit the results sets you get back. If you want to get all scientific about it, you should pick a sample size that is related to the overall size of your table. There are several resources you can use, like this table of sample sizes.

Once you're query is working and you feel comfortable with the data you're getting back, then execute remove the order by and rand clauses.

0 Subscribe

1 Reply

  gavin.carothers's Photo
Posted Oct 27 2009 12:47 PM

Don't use order by RAND() on even moderately sized tables RAND will be far too expensive. Remember it's calling RAND for EVERY row in the table, not in your result set. It also has to sort ALL the random numbers generated. For other options please see http://www.titov.net...ows-from-table/