Use of NumPy random seed and random_state in the train_test_split function
- neovijayk
- Jul 6, 2020
- 2 min read
In this article we will take a look at the following points:
- What is the NumPy random seed function and how does it affect results?
- How to get the same train and test data after every execution of the code
Both points are important and frequently needed in Machine Learning, Deep Learning, and other computational work. Let’s take a look at them one by one:
numpy random seed function
What is the NumPy random seed function?
NumPy random seed is simply a function that sets the random seed of the NumPy pseudo-random number generator.
It provides an essential input that enables NumPy to generate pseudo-random numbers for random processes.
What is a pseudo-random number?
A pseudo-random number is a number that appears to be random but is not truly random.
Pseudo-random numbers are computer-generated: they look random, but they are actually predetermined by the algorithm and the seed that produced them.
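To make “predetermined” concrete, here is a minimal sketch (assuming NumPy is imported as np and using an arbitrary seed value of 42, which is not from the original article): seeding the generator twice with the same value reproduces exactly the same “random” sequence.
>>> import numpy as np
>>> np.random.seed(42); first = np.random.rand(4)
>>> np.random.seed(42); second = np.random.rand(4)
>>> np.array_equal(first, second)
True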
Execution of code and results by setting seed
for seed = 0
Example 1: if we execute the following line of code repeatedly, we get a different set of numbers in a NumPy array each time:
>>> import numpy as np
>>> np.random.rand(4)
[0.4236548 0.64589411 0.43758721 0.891773 ]
>>> np.random.rand(4)
[0.96366276 0.38344152 0.79172504 0.52889492]
>>> np.random.rand(4)
[0.56804456 0.92559664 0.07103606 0.0871293 ]
Now we want these numbers to be the same fixed set after each execution, so we will set the seed to 0.
np.random.seed(0) makes the random numbers predictable.
With the seed reset before every call, the same set of numbers will appear every time.
If the random seed is not reset, different numbers appear with every invocation.
>>> np.random.seed(0); np.random.rand(4)
[0.5488135 0.71518937 0.60276338 0.54488318]
>>> np.random.seed(0); np.random.rand(4)
[0.5488135 0.71518937 0.60276338 0.54488318]
Observation:
- For seed = 0 we get the same set of values in the same order each time.
- Even after restarting the notebook kernel, the same set of numbers is generated in the same order.
for seed = 1, 2, 3 and 1000
>>> np.random.seed(1); np.random.rand(4)
[4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01]
>>> np.random.seed(2); np.random.rand(4)
[0.4359949 0.02592623 0.54966248 0.43532239]
>>> np.random.seed(3); np.random.rand(4)
[0.5507979 0.70814782 0.29090474 0.51082761]
>>> np.random.seed(1000); np.random.rand(4)
[0.65358959 0.11500694 0.95028286 0.4821914 ]
As we can see, each seed value gives a different combination of numbers.
random_state in train_test_split
As we saw above, setting the random seed generates the same set of values in the same order on every run. The random_state parameter in sklearn’s train_test_split function works the same way. Let’s take a look at the results of the code implementation:

Before random_state

Application of random_state: generating same output after every execution
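The screenshots above come from the full script in the repository linked below. As a rough sketch of the same idea (the toy arrays X and y here are made up for illustration and are not from the original code), passing a fixed random_state to train_test_split reproduces the same split on every run, while omitting it gives a different split each time:
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (made up for this sketch)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Without random_state: the split differs on every execution
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print(X_test)

# With a fixed random_state: the same rows land in the test set every time,
# just like setting np.random.seed before drawing numbers
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(X_test)
Any fixed integer works for random_state; what matters is reusing the same value across runs.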
For the complete code, please refer to my GitHub repository.