[tests, minhashing] Properly fix the random seed from minhashing tests
author Simon Chabot <simon.chabot@logilab.fr>
Tue, 01 Oct 2019 10:09:47 +0200
changeset 567 d5f2a4fdb810
parent 566 f6ba33f1779a
child 568 e34db1c9be44
The seed must be fixed in the module where the random function is called. To do that, we have to patch the randint function. Fixing the seed in the test module is not enough, as the randint function in the minhashing module is *another* instance of the function.
test/test_minhashing.py
--- a/test/test_minhashing.py	Tue Oct 01 10:04:38 2019 +0200
+++ b/test/test_minhashing.py	Tue Oct 01 10:09:47 2019 +0200
@@ -24,7 +24,7 @@
 from functools import partial
 from os import path
 import random
-random.seed(6)  # Make sure tests are repeatable
+from unittest.mock import patch
 
 from nazca.utils.normalize import simplify
 from nazca.utils.minhashing import Minlsh, count_vectorizer_func
@@ -62,8 +62,21 @@
                      u"Je les ai vus ensemble à plusieurs occasions.",
                      ]
         minlsh = Minlsh()
-        # XXX Should works independantly of the seed. Unstability due to the bands number ?
-        minlsh.train((simplify(s, FRENCH_LEMMAS, remove_stopwords=True) for s in sentences), 1, 200)
+
+        # The minhashing function is based on a "hash" function, which is
+        # itself built from two integers randomly chosen in a given
+        # interval. Different runs will therefore produce different
+        # results because the hash function differs. This is the expected
+        # behaviour, except in the tests, where the result must be
+        # reproducible. We therefore patch the `randint` name used by
+        # `nazca.utils.minhashing` with a fixed-seed version.
+        myrandom = random.Random("spam eggs bacon")
+        with patch("nazca.utils.minhashing.randint", myrandom.randint):
+            minlsh.train(
+                (simplify(s, FRENCH_LEMMAS, remove_stopwords=True) for s in sentences),
+                1,
+                200,
+            )
         self.assertEqual(set([(0, 1), (2, 3), (5, 6)]), minlsh.predict(0.4))
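
The patching technique used in this test can be sketched in isolation. The example below is a minimal illustration, not the nazca code: `fake_minhashing` and `draw_coefficients` are hypothetical stand-ins for a module that does `from random import randint` and then draws its hash coefficients. Only `unittest.mock.patch` and `random.Random` come from the standard library. The point it shows is that `patch` targets the name in the module where `randint` is looked up, and that a privately seeded `random.Random` makes the draws repeatable.

```python
import random
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for a module that imports randint by name,
# as the patch target "nazca.utils.minhashing.randint" suggests.
fake_module = types.ModuleType("fake_minhashing")
exec(
    "from random import randint\n"
    "def draw_coefficients(n, upper=1000):\n"
    "    # Looks up `randint` in this module's namespace at call time.\n"
    "    return [randint(1, upper) for _ in range(n)]\n",
    fake_module.__dict__,
)
# Register the module so that patch() can resolve it by dotted name.
sys.modules["fake_minhashing"] = fake_module


def seeded_draw(seed, n=5):
    """Draw n coefficients with the module's randint replaced by a seeded one."""
    rng = random.Random(seed)  # private generator, independent of global state
    with patch("fake_minhashing.randint", rng.randint):
        return fake_module.draw_coefficients(n)


first = seeded_draw("spam eggs bacon")
second = seeded_draw("spam eggs bacon")
assert first == second  # same seed, same sequence of draws
```

Once the `with` block exits, `patch` restores the original `randint` in the patched module, so other tests are unaffected by the fixed seed.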