Generate Random Rows¶
Let's say you have a database of customers and you initiated a contest wherein you determine the winners by randomly selecting 5 customers from your database. How would you go about selecting 5 random records from your customers table?
NEWID() System Function To The Rescue¶
In SQL Server, it is quite easy to do this thanks to the NEWID() system function. The NEWID() system function creates a unique value of type uniqueidentifier. There's no need to add a new column to your table just to have the ability of randomly selecting records from your table. All that needs to be done is include the NEWID() system function in the ORDER BY clause when doing your SELECT statement.
Let's use the Customers table in the Northwind database. To randomly SELECT 5 customers from the table, we simply have to add the ORDER BY NEWID() in the SELECT statement. Since we are only concerned with 5 records, we also have to include the TOP 5 clause in the statement. The SELECT statement will look as follows:
SELECT TOP 5 [CustomerID], [CompanyName], [ContactName]
FROM [dbo].[Customers]
ORDER BY NEWID()
Running this SELECT statement generated the following 5 random Customer records (please note that the results shown here will not match your results even if the same query has been executed):
CustomerID CompanyName ContactName
---------- ---------------------------------------- ------------------------------
WANDK Die Wandernde Kuh Rita Müller
BERGS Berglunds snabbköp Christina Berglund
ROMEY Romero y tomillo Alejandra Camino
OTTIK Ottilies Käseladen Henriette Pfalzheim
AROUT Around the Horn Thomas Hardy
Running the same query again now yields me the following output, keeping in mind that the results shown here will again not be the same results you will get due to the randomness of the NEWID() system function.
CustomerID CompanyName ContactName
---------- ---------------------------------------- ------------------------------
SIMOB Simons bistro Jytte Petersen
QUEDE Que Delícia Bernardo Batista
RICAR Ricardo Adocicados Janete Limeira
VINET Vins et alcools Chevalier Paul Henriot
LAMAI La maison d'Asie Annette Roulet
RAND() System Function - Will It Work?¶
The randomness of the result set generated by the previous SELECT statement was due to the NEWID() system function. There's another system function that generates a random value when executed, which is the RAND() system function. If we substitute the NEWID() with the RAND() system function, will it generate random records similar to the above?
Let's put that question to the test by modifying the SELECT statement above and use RAND() instead of NEWID().
SELECT TOP 5 [CustomerID], [CompanyName], [ContactName]
FROM [dbo].[Customers]
ORDER BY RAND()
Running the above SELECT statement generated the following result set:
CustomerID CompanyName ContactName
---------- ---------------------------------------- ------------------------------
ALFKI Alfreds Futterkiste Maria Anders
ANATR Ana Trujillo Emparedados y helados Ana Trujillo
ANTON Antonio Moreno Taquería Antonio Moreno
AROUT Around the Horn Thomas Hardy
BERGS Berglunds snabbköp Christina Berglund
Running it over and over again generated the same result set. If the RAND() system function is supposed to generate random numbers when executed, how come the result set generated are all the same? To investigate this further, let's include the result of the RAND() system function as part of the result set.
SELECT TOP 5 RAND() AS [Random], [CustomerID], [CompanyName], [ContactName]
FROM [dbo].[Customers]
ORDER BY RAND()
Executing this SELECT statement generated the following result set (please note again that the results shown here will be different from the results you will get, particularly to the first column).
Random CustomerID CompanyName ContactName
------------------- ---------- ---------------------------------- ------------------
0.73381230186299884 ALFKI Alfreds Futterkiste Maria Anders
0.73381230186299884 ANATR Ana Trujillo Emparedados y helados Ana Trujillo
0.73381230186299884 ANTON Antonio Moreno Taquería Antonio Moreno
0.73381230186299884 AROUT Around the Horn Thomas Hardy
0.73381230186299884 BERGS Berglunds snabbköp Christina Berglund
As can be seen from the output, the output generated by the RAND() system function are all the same. If the RAND() system function is supposed to generate random values, how come it's generating the same value when executed in a SELECT statement?
The answer lies in the SQL Server's Books Online. According to Books Online under the RAND functions topic:
"Repetitive invocations of RAND() in a single query will produce the same value."
This is the reason why the same value is returned in the SELECT statement above for the RAND() column.
Conclusion¶
In conclusion, the only way to generate a random number of records from any table is to combine the TOP clause and the ORDER BY NEWID()
in the SELECT statement.