
Use of NumPy random seed and random_state in the train_test_split function

  • neovijayk
  • Jul 6, 2020
  • 2 min read

In this article we will take a look at the following points:

  1. What is the NumPy random seed function and what results does it give?

  2. How to get the same train and test data after every execution of the code

Both points are important and frequently needed in Machine Learning, Deep Learning, and other computational work. Let’s take a look at them one by one:

The NumPy random seed function

What is the NumPy random seed function?

  1. NumPy random seed is simply a function that sets the random seed of the NumPy pseudo-random number generator.

  2. It provides an essential input that enables NumPy to generate pseudo-random numbers for random processes.

What is a pseudo-random number?

  1. A pseudo-random number is a number that appears to be random, but is not truly random.

  2. Pseudo-random numbers are computer-generated: they look random, but they are actually predetermined by the generator’s seed (as the sketch below shows).
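Because the sequence is fully determined by the seed, two independent generators started from the same seed march through exactly the same “random” numbers. A minimal sketch, assuming an arbitrary seed value of 42:

>>> import numpy as np
>>> rs1 = np.random.RandomState(42)   # two separate generators...
>>> rs2 = np.random.RandomState(42)   # ...seeded with the same value
>>> np.array_equal(rs1.rand(4), rs2.rand(4))   # they produce identical "random" numbers
True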

Code execution and results when setting the seed

for seed = 0

Example 1: if we execute the following line of code repeatedly, we get a different set of numbers in a NumPy array each time



>>> import numpy as np
>>> np.random.rand(4)
[0.4236548 0.64589411 0.43758721 0.891773 ]
>>> np.random.rand(4)
[0.96366276 0.38344152 0.79172504 0.52889492]
>>> np.random.rand(4)
[0.56804456 0.92559664 0.07103606 0.0871293 ]

Now we want a fixed set of numbers after each execution, so we will set the seed to 0.

  1. np.random.seed(0) makes the random numbers predictable

  2. With the seed reset (every time), the same set of numbers will appear every time.

  3. If the random seed is not reset, different numbers appear with every invocation



>>> np.random.seed(0) ; np.random.rand(4)
[0.5488135 0.71518937 0.60276338 0.54488318]
>>> np.random.seed(0) ; np.random.rand(4)
[0.5488135 0.71518937 0.60276338 0.54488318]

Observation:

  1. For seed = 0 we get the same set of values, in the same order, each time

  2. Even after restarting the notebook kernel, the same set of numbers is generated in the same order
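To make points 2 and 3 above concrete, here is a minimal console sketch: after seeding once, successive calls keep moving through the sequence, and re-seeding restarts it from the top (the outputs are the same values shown earlier).

>>> import numpy as np
>>> np.random.seed(0)
>>> np.random.rand(4)   # first draw after seeding
[0.5488135 0.71518937 0.60276338 0.54488318]
>>> np.random.rand(4)   # second draw without re-seeding: the sequence continues
[0.4236548 0.64589411 0.43758721 0.891773 ]
>>> np.random.seed(0)   # re-seed to restart the sequence
>>> np.random.rand(4)
[0.5488135 0.71518937 0.60276338 0.54488318]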

for seed = 1, 2, 3 and 1000



>>> np.random.seed(1) ; np.random.rand(4)
[4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01]
>>> np.random.seed(2) ; np.random.rand(4)
[0.4359949 0.02592623 0.54966248 0.43532239]
>>> np.random.seed(3) ; np.random.rand(4)
[0.5507979 0.70814782 0.29090474 0.51082761]
>>> np.random.seed(1000) ; np.random.rand(4)
[0.65358959 0.11500694 0.95028286 0.4821914 ]

As we can see, each seed value produces its own fixed set of numbers, reproduced identically on every execution.

random_state in train_test_split

As we saw above, setting the random seed generates the same set of values in the same order. The random_state parameter in sklearn’s train_test_split function serves the same purpose: it fixes the shuffling so the same split is produced on every execution. Let’s take a look at the results of the code implementation:

Before random_state
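The original post shows a screenshot here; the following is a minimal equivalent sketch with toy data and hypothetical variable names. Without random_state, each call shuffles with a fresh seed, so repeated runs generally select different rows for train and test:

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X = np.arange(20).reshape(10, 2)   # toy feature matrix: 10 samples, 2 features
>>> y = np.arange(10)                  # toy labels
>>> X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.3)
>>> X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.3)
>>> np.array_equal(X_train1, X_train2)   # almost always False: the two splits differ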


Application of random_state: generating same output after every execution
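Again as a hedged sketch with the same toy data (42 is an arbitrary choice of seed): passing a fixed random_state seeds the shuffling, so the split is identical on every call and every execution:

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X = np.arange(20).reshape(10, 2)   # same toy data as above
>>> y = np.arange(10)
>>> X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.3, random_state=42)
>>> X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.3, random_state=42)
>>> np.array_equal(X_train1, X_train2)   # identical train sets
True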


For the complete code, please refer to my GitHub repository.
