[aligner] add findneighbours docstring
author Simon Chabot <simon.chabot@logilab.fr>
Fri, 09 Nov 2012 11:15:41 +0100
changeset 106 4b06851fa2c1
parent 105 aec512170eac
child 107 5de6850d5183
TODO
aligner.py
--- a/TODO	Fri Nov 09 11:08:24 2012 +0100
+++ b/TODO	Fri Nov 09 11:15:41 2012 +0100
@@ -1,2 +1,1 @@
 Écrire des test pour aligner.py
-Add missing docstrings (findneighbours…)
--- a/aligner.py	Fri Nov 09 11:08:24 2012 +0100
+++ b/aligner.py	Fri Nov 09 11:15:41 2012 +0100
@@ -38,6 +38,37 @@
 
 def findneighbours(alignset, targetset, indexes=(1, 1), mode='kdtree',
                    threshold=0.1, n_clusters=None, kwordsgram=1, siglen=200):
+    """ This function helps to find neighbours from items of alignset and
+        targetset. “Neighbours” are items that are close to each other, i.e.
+        that have similar labels, are located in the same area, etc.
+
+        This function handles two types of neighbouring: text and numeric.
+        For text values, use “minhashing”; for numeric values, choose among
+        “kdtree”, “kmeans” and “minibatch”.
+
+        The arguments are:
+            - `alignset` and `targetset` are the sets in which neighbours
+              have to be found.
+            - `indexes` are the locations of the items to compare
+            - `mode` is the search type to use
+            - `threshold` is the `mode` threshold
+
+            - `n_clusters` is used for "kmeans" and "minibatch" methods, and it
+              is the number of clusters to use.
+
+            - `kwordsgram` and `siglen` are used for "minhashing". `kwordsgram`
+              is the length of wordsgrams to use, and `siglen` is the length of
+              the minhashing signature matrix.
+
+        Return a list of lists, built as follows:
+            [
+                [[indexes_of_alignset_0], [indexes_of_targetset_0]],
+                [[indexes_of_alignset_1], [indexes_of_targetset_1]],
+                [[indexes_of_alignset_2], [indexes_of_targetset_2]],
+                [[indexes_of_alignset_3], [indexes_of_targetset_3]],
+                ...
+            ]
+    """
 
     SEARCHERS = set(['kdtree', 'minhashing', 'kmeans', 'minibatch'])
     mode = mode.lower()
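
The return structure described in the docstring can be hard to picture from the placeholder names alone. The sketch below shows one possible way a caller might consume such a list, expanding each `[[alignset indexes], [targetset indexes]]` group into individual candidate pairs. The helper name `candidate_pairs` and the sample data are hypothetical and not part of aligner.py; the sample is hand-written rather than actual output of `findneighbours`.

```python
from itertools import product


def candidate_pairs(neighbours):
    """Yield (alignset_index, targetset_index) pairs.

    `neighbours` is assumed to follow the shape given in the docstring:
    a list of [[alignset indexes], [targetset indexes]] groups.
    This helper is hypothetical and does not exist in aligner.py.
    """
    for align_indexes, target_indexes in neighbours:
        # Only items that belong to the same group are paired together.
        for pair in product(align_indexes, target_indexes):
            yield pair


# Hand-written sample data, not real output of findneighbours.
sample = [
    [[0, 1], [2]],
    [[3], [0, 4]],
]
pairs = list(candidate_pairs(sample))
print(pairs)  # [(0, 2), (1, 2), (3, 0), (3, 4)]
```

Pairing only within a group is the point of a neighbour search: the two items of each pair are already known to be close, so a more expensive comparison can be restricted to these candidates instead of the full cross product of both sets.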