Mon, 29 Oct 2012 10:17:24 +0100 [Minhashing] Trained data can be saved and loaded
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 10:17:24 +0100] rev 57
[Minhashing] Trained data can be saved and loaded
Mon, 29 Oct 2012 09:52:16 +0100 [distances] The planet radius for the geographical distance can be given
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 09:52:16 +0100] rev 56
[distances] The planet radius for the geographical distance can be given
Mon, 29 Oct 2012 09:46:26 +0100 [distances] Don't concatenate two strings to check whether one of them contains a space; make two tests instead
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 09:46:26 +0100] rev 55
[distances] Don't concatenate two strings to check whether one of them contains a space; make two tests instead
Fri, 26 Oct 2012 11:44:50 +0200 [typo] alignement --> alignment
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:44:50 +0200] rev 54
[typo] alignement --> alignment
Fri, 26 Oct 2012 11:43:57 +0200 [Matrix] Maximum distance computation delegated to max() method of numpy.matrix
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:43:57 +0200] rev 53
[Matrix] Maximum distance computation delegated to max() method of numpy.matrix
Fri, 26 Oct 2012 11:41:27 +0200 [Minhash] Tests written
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:41:27 +0200] rev 52
[Minhash] Tests written
Fri, 26 Oct 2012 09:49:07 +0200 [Matrix] Improvement of the matched() method
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 09:49:07 +0200] rev 51
[Matrix] Improvement of the matched() method
Don't iterate over all indices; ask scipy directly for the wanted ones.
Fri, 26 Oct 2012 09:54:16 +0200 [Matrix] Change lil_matrix for dense matrix
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 09:54:16 +0200] rev 50
[Matrix] Change lil_matrix for dense matrix
Some experiments showed the matrix wasn't sparse at all, so it's better to use the appropriate data structure.
***
[Matrix] Why use todense() method if the matrix is… already dense
Fri, 26 Oct 2012 13:43:08 +0200 [Distance] Add geographical distance
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 13:43:08 +0200] rev 49
[Distance] Add geographical distance (Equirectangular projection)
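The equirectangular approximation named in this changeset can be sketched as follows. This is a minimal sketch and not the code from the changeset: the function name `geographical`, the (latitude, longitude) argument order in degrees, the `planet_radius` keyword and its default value are all assumptions made for illustration.

```python
from math import cos, radians, sqrt

# Mean Earth radius in kilometres; an assumed default for this sketch.
EARTH_RADIUS_KM = 6371.0


def geographical(point_a, point_b, planet_radius=EARTH_RADIUS_KM):
    """Approximate distance between two (latitude, longitude) points in degrees.

    Uses the equirectangular approximation: the longitude difference is
    scaled by the cosine of the mean latitude, then Pythagoras is applied.
    The result is expressed in the same unit as planet_radius.
    """
    lat_a, lon_a = (radians(v) for v in point_a)
    lat_b, lon_b = (radians(v) for v in point_b)
    x = (lon_b - lon_a) * cos((lat_a + lat_b) / 2.0)
    y = lat_b - lat_a
    return planet_radius * sqrt(x * x + y * y)
```

The approximation is reasonable for points that are close together and degrades over long distances or near the poles. Passing another radius simply rescales the result.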
Thu, 25 Oct 2012 16:43:50 +0200 [testing] make the alignment testing a little bit cleaner
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:43:50 +0200] rev 48
[testing] make the alignment testing a little bit cleaner
Now uses rdflib instead of (bad) homemade code.
Thu, 25 Oct 2012 16:41:53 +0200 [Matrix] Adapt the globalalignmentmatrix computation to the previous changeset
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:41:53 +0200] rev 47
[Matrix] Adapt the globalalignmentmatrix computation to the previous changeset (4c2f7553490b)
Thu, 25 Oct 2012 16:40:42 +0200 [Matrix] Give weighting and normalization at the construction of the matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:40:42 +0200] rev 46
[Matrix] Give weighting and normalization at the construction of the matrix
It makes the computation faster.
Wed, 24 Oct 2012 19:10:54 +0200 wip
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 19:10:54 +0200] rev 45
wip
Wed, 24 Oct 2012 15:39:38 +0200 [Minhashing] Really, really, really faster training
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 15:39:38 +0200] rev 44
[Minhashing] Really, really, really faster training
Wed, 24 Oct 2012 14:51:09 +0200 [Matrix] Compute the global alignment matrix
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 14:51:09 +0200] rev 43
[Matrix] Compute the global alignment matrix
Wed, 24 Oct 2012 14:50:23 +0200 [Matrix] Can pass extra arguments to distance functions
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 14:50:23 +0200] rev 42
[Matrix] Can pass extra arguments to distance functions
Wed, 24 Oct 2012 12:37:56 +0200 [Matrix] Computation handles unknown values
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 12:37:56 +0200] rev 41
[Matrix] Computation handles unknown values
Wed, 24 Oct 2012 11:51:57 +0200 [Test] Updated for the change to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:51:57 +0200] rev 40
[Test] Updated for the change to simplify()
Wed, 24 Oct 2012 11:43:11 +0200 [Normalize] Add docstring to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:43:11 +0200] rev 39
[Normalize] Add docstring to simplify()
Wed, 24 Oct 2012 11:52:22 +0200 [normalize] Add a stopword-removal option to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:52:22 +0200] rev 38
[normalize] Add a stopword-removal option to simplify()
Wed, 24 Oct 2012 11:29:08 +0200 [Minhashing] Functional API
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:29:08 +0200] rev 37
[Minhashing] Functional API
Mon, 22 Oct 2012 18:17:17 +0200 [minhashing] First try of minhashing (related to #129000)
Simon Chabot <simon.chabot@logilab.fr> [Mon, 22 Oct 2012 18:17:17 +0200] rev 36
[minhashing] First try of minhashing (related to #129000)
Fri, 19 Oct 2012 18:22:14 +0200 [Matrix] Add basic operations such as add, mul, sub, etc.
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 18:22:14 +0200] rev 35
[Matrix] Add basic operations such as add, mul, sub, etc.
Fri, 19 Oct 2012 17:27:40 +0200 [Matrix] Add __repr__() method
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 17:27:40 +0200] rev 34
[Matrix] Add __repr__() method
Fri, 19 Oct 2012 16:57:40 +0200 [Matrix] Don't store inputs inside DistanceMatrix object
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 16:57:40 +0200] rev 33
[Matrix] Don't store inputs inside DistanceMatrix object
Storing them made no sense, was useless, and got in the way of future summation of matrices.
Fri, 19 Oct 2012 14:44:26 +0200 [Distance] Spaces are correctly supported by distance functions
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 14:44:26 +0200] rev 32
[Distance] Spaces are correctly supported by distance functions
In fact, the problem was that 'Victor Hugo' and 'Hugo Victor' had a large distance, whereas in this case we would like a small one (even zero!).
So the approach followed was to construct a distance matrix:

       | Victor | Hugo
Victor |   0    |  5
Hugo   |   5    |  0

and return the minimum of the minimum of each row.
In fact, we return the maximum of the row minima of the previous matrix and of its transpose, to handle the following case:

       | Victor | Hugo | Jean |
Victor |   0    |  5   |  6   | --> min of each row: 0
Hugo   |   5    |  0   |  4   |

       | Victor | Hugo |
Victor |   0    |  5   |
Hugo   |   5    |  0   | --> min of each row: 4
Jean   |   6    |  4   |

Return the max, i.e. 4.
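The token-matrix approach described in this changeset can be sketched as follows. This is a minimal sketch under stated assumptions, not the changeset's code: the names `levenshtein` and `token_distance` are chosen for illustration, tokens are obtained by splitting on whitespace, and Levenshtein is used as the per-token distance because it reproduces the values in the example.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def token_distance(text_a, text_b, distance=levenshtein):
    """Order-insensitive distance between two multi-word strings.

    Builds the matrix of distances between every token of text_a and
    every token of text_b, then returns the largest of the row minima
    and column minima (the column minima are the row minima of the
    transpose).
    """
    tokens_a, tokens_b = text_a.split(), text_b.split()
    if not tokens_a or not tokens_b:
        return distance(text_a, text_b)
    matrix = [[distance(a, b) for b in tokens_b] for a in tokens_a]
    row_minima = [min(row) for row in matrix]
    column_minima = [min(column) for column in zip(*matrix)]
    return max(row_minima + column_minima)


print(token_distance('Victor Hugo', 'Hugo Victor'))       # 0
print(token_distance('Victor Hugo', 'Victor Jean Hugo'))  # 4
```

Taking the column minima as well is what makes the unmatched token 'Jean' count: looking at rows alone, every token of 'Victor Hugo' finds an exact partner and the distance would be 0.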
Fri, 19 Oct 2012 11:38:54 +0200 [Matrix] Cannot use zip if it is not a square matrix!
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 11:38:54 +0200] rev 31
[Matrix] Cannot use zip if it is not a square matrix!
Fri, 19 Oct 2012 11:21:44 +0200 [Matrix] Matched() returns index and value, as tuples
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 11:21:44 +0200] rev 30
[Matrix] Matched() returns index and value, as tuples
Fri, 19 Oct 2012 10:46:14 +0200 [Matrix] Matched() has a lower complexity
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 10:46:14 +0200] rev 29
[Matrix] Matched() has a lower complexity
Instead of reading the whole matrix and getting the indexes where the value is under the cutoff (O(N²)), the idea is:
- Get all indexes where the value is not null (i.e. not an exact match): O(N)
- Append all indexes that are not in the previous list (exact matches): O(N)
- If cutoff > 0, for all indexes where the value is not null, test whether value < cutoff, and add it or not: O(N)
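The three steps above can be sketched as follows. This is only an illustration of the steps, with no claim about the resulting complexity: the dict-of-keys storage (`nonzero_entries` mapping `(row, column)` to a non-null distance), the explicit `shape` argument and the function signature are all assumptions, not the DistanceMatrix API.

```python
def matched(nonzero_entries, shape, cutoff=0):
    """Return ((row, column), value) tuples for entries counted as matches.

    nonzero_entries maps (row, column) to a non-null distance; every
    index missing from it is assumed to hold 0, i.e. an exact match.
    """
    rows, columns = shape
    # Step 1: indexes where the value is not null (not an exact match).
    nonzero = set(nonzero_entries)
    # Step 2: every index absent from that set is an exact match.
    result = [((i, j), 0)
              for i in range(rows)
              for j in range(columns)
              if (i, j) not in nonzero]
    # Step 3: with a positive cutoff, also keep non-null values under it.
    if cutoff > 0:
        result.extend((index, value)
                      for index, value in sorted(nonzero_entries.items())
                      if value < cutoff)
    return result


entries = {(0, 1): 3, (1, 0): 7}
print(matched(entries, (2, 2)))            # [((0, 0), 0), ((1, 1), 0)]
print(matched(entries, (2, 2), cutoff=5))  # [((0, 0), 0), ((1, 1), 0), ((0, 1), 3)]
```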
Fri, 19 Oct 2012 16:58:27 +0200 [Matrix] Matrices are not symmetric. Correct it
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 16:58:27 +0200] rev 28
[Matrix] Matrices are not symmetric. Correct it
Fri, 19 Oct 2012 10:06:50 +0200 [Test] Ensure distances are symmetric
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 10:06:50 +0200] rev 27
[Test] Ensure distances are symmetric
Thu, 18 Oct 2012 19:01:27 +0200 [Test] Add tests for matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 19:01:27 +0200] rev 26
[Test] Add tests for matrix
Thu, 18 Oct 2012 18:18:40 +0200 [Test] Tests no longer inherit from CWTestCase but directly from unittest2 (to be independent)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 18:18:40 +0200] rev 25
[Test] Tests no longer inherit from CWTestCase but directly from unittest2 (to be independent)
Thu, 18 Oct 2012 18:15:25 +0200 [Normalizer] Lemmatizer returns a string (better for comparison)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 18:15:25 +0200] rev 24
[Normalizer] Lemmatizer returns a string (better for comparison)
Thu, 18 Oct 2012 17:58:53 +0200 [Matrix] Enables normalization
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:58:53 +0200] rev 23
[Matrix] Enables normalization
Thu, 18 Oct 2012 17:58:28 +0200 [Test] Distance, test euclidean distance on strings too
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:58:28 +0200] rev 22
[Test] Distance, test euclidean distance on strings too
Thu, 18 Oct 2012 17:16:44 +0200 [normalize] Using a class was a bad idea, I removed it
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:16:44 +0200] rev 21
[normalize] Using a class was a bad idea, I removed it
Thu, 18 Oct 2012 16:35:18 +0200 [Matrix] Compute a distance matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 16:35:18 +0200] rev 20
[Matrix] Compute a distance matrix
Thu, 18 Oct 2012 15:01:47 +0200 [Distance] Exchange 1 and 0 in the soundex distance, because we try to minimize the distance
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 15:01:47 +0200] rev 19
[Distance] Exchange 1 and 0 in the soundex distance, because we try to minimize the distance
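The convention after this changeset (0 when two words share a Soundex code, 1 otherwise) can be sketched as follows. This is a minimal sketch under stated assumptions: `soundexcode` here implements the standard American Soundex rules, which may differ from the project's own implementation, and both function bodies are written for illustration only.

```python
# Digit assigned to each consonant group in standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundexcode(word):
    """Return the four-character Soundex code of a word."""
    letters = [char for char in word.upper() if char.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = CODES.get(letter, "")
        # Adjacent consonants with the same digit are coded once.
        if digit and digit != previous:
            code += digit
        # H and W do not separate consonants; vowels do, so a repeated
        # digit after a vowel is coded a second time.
        if letter not in "HW":
            previous = digit
    return (code + "000")[:4]


def soundex(word_a, word_b):
    """0 if both words share a Soundex code, 1 otherwise (lower is closer)."""
    return 0 if soundexcode(word_a) == soundexcode(word_b) else 1


print(soundexcode("Robert"), soundexcode("Rupert"))  # R163 R163
print(soundex("Robert", "Rupert"))                   # 0
```

Returning 0 for a match keeps the function consistent with the other distances, where a smaller value means a closer pair.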
Thu, 18 Oct 2012 13:55:25 +0200 [Distance] Temporal distance supports ambiguity and fuzziness
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 13:55:25 +0200] rev 18
[Distance] Temporal distance supports ambiguity and fuzziness
- You can specify whether the day or the year comes first (day/month/year, year/month/day or month/day/year format). By default, it assumes the commonly used French format, i.e. day/month/year.
- You can give fuzzy sentences and compare dates: temporal('Jean est né le 1er octobre 1958', 'Le 01-10-1958, Jean est né') yields 0!
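The effect of the day-first and year-first options can be sketched with the standard library alone. This is a simplified sketch, not the changeset's implementation: it only finds numeric dates embedded in a sentence (it would not understand a spelled-out date such as '1er octobre 1958'), and the `dayfirst` and `yearfirst` keywords, the helper `extract_date` and the result in days are assumptions made for illustration.

```python
import re
from datetime import date

# Three numeric fields separated by '-' or '/', anywhere in the text.
_NUMERIC_DATE = re.compile(r"(\d{1,4})[-/](\d{1,2})[-/](\d{1,4})")


def extract_date(text, dayfirst=True, yearfirst=False):
    """Find the first numeric date in text and interpret its field order."""
    match = _NUMERIC_DATE.search(text)
    if match is None:
        raise ValueError("no numeric date found in %r" % text)
    first, second, third = (int(group) for group in match.groups())
    if yearfirst:
        year, month, day = first, second, third
    elif dayfirst:
        day, month, year = first, second, third
    else:
        month, day, year = first, second, third
    return date(year, month, day)


def temporal(text_a, text_b, dayfirst=True, yearfirst=False):
    """Absolute difference, in days, between the dates found in two texts."""
    date_a = extract_date(text_a, dayfirst, yearfirst)
    date_b = extract_date(text_b, dayfirst, yearfirst)
    return abs((date_a - date_b).days)


print(temporal('Le 01-10-1958, Jean est né', 'Jean est né le 1/10/1958'))  # 0
print(temporal('01/01/2012', '03/01/2012'))                  # 2
print(temporal('01/01/2012', '03/01/2012', dayfirst=False))  # 60
```

The last two calls show why the field order matters: the same pair of strings is two days apart when read as day/month/year and sixty days apart when read as month/day/year.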
Thu, 18 Oct 2012 12:22:30 +0200 [Normalizer] Add the format method (related to #128998)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 12:22:30 +0200] rev 17
[Normalizer] Add the format method (related to #128998)
Thu, 18 Oct 2012 11:41:59 +0200 Add LGPL to distance.py and normalize.py
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 11:41:59 +0200] rev 16
Add LGPL to distance.py and normalize.py
Thu, 18 Oct 2012 10:19:15 +0200 [Normalizer] Add a normalizer (related to #128998)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 10:19:15 +0200] rev 15
[Normalizer] Add a normalizer (related to #128998)
- unormalize
- tokenize
- lemmatize
- round
Wed, 17 Oct 2012 16:47:07 +0200 [distances] Add a Euclidean distance function between two numbers (closes #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:47:07 +0200] rev 14
[distances] Add a Euclidean distance function between two numbers (closes #128982)
Wed, 17 Oct 2012 15:31:11 +0200 [distance] Add a new distance between dates (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 15:31:11 +0200] rev 13
[distance] Add a new distance between dates (related #128982)
Wed, 17 Oct 2012 12:34:28 +0200 [distances] Add jaccard distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:34:28 +0200] rev 12
[distances] Add jaccard distance (related #128982)
Wed, 17 Oct 2012 12:05:02 +0200 [distances] Add the soundex distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:05:02 +0200] rev 11
[distances] Add the soundex distance (related #128982)
Wed, 17 Oct 2012 12:04:41 +0200 [distance] move soundex to soundexcode (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:04:41 +0200] rev 10
[distance] move soundex to soundexcode (related #128982)
In fact, soundexcode is the function returning the soundex code of a word, and soundex will be the 1/0 distance between two words (1 meaning both have the same code, 0 otherwise).
Wed, 17 Oct 2012 11:56:22 +0200 [distances] Remove some trailing spaces (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:56:22 +0200] rev 9
[distances] Remove some trailing spaces (related #128982)
Wed, 17 Oct 2012 11:56:04 +0200 [distance] Start the iteration at 1, not 0 because we don't care about the first
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:56:04 +0200] rev 8
[distance] Start the iteration at 1, not 0 because we don't care about the first letter (related #128982)
Wed, 17 Oct 2012 11:55:23 +0200 [distances] Correct an IndexError in the soundex code (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:55:23 +0200] rev 7
[distances] Correct an IndexError in the soundex code (related #128982)
Since we don't know whether word[i + 2] is a consonant, we use get, because it can be a vowel (and crash…)
Wed, 17 Oct 2012 11:53:17 +0200 [distance] Soundex: if there is a vowel between two identically numbered
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:53:17 +0200] rev 6
[distance] Soundex: if there is a vowel between two identically numbered consonants, count those consonants twice. (related #128982)
Wed, 17 Oct 2012 11:51:42 +0200 [test] Add some other tests to soundex, and some explanations too
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:51:42 +0200] rev 5
[test] Add some other tests to soundex, and some explanations too (related #128982)
Wed, 17 Oct 2012 11:50:55 +0200 [test] The test for soundex was wrong; I corrected it (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:50:55 +0200] rev 4
[test] The test for soundex was wrong; I corrected it (related #128982)
Wed, 17 Oct 2012 16:43:42 +0200 [tests] Add tests for soundex and levenshtein (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:43:42 +0200] rev 3
[tests] Add tests for soundex and levenshtein (related #128982)
Wed, 17 Oct 2012 16:42:48 +0200 [distances] Add soundex code (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:42:48 +0200] rev 2
[distances] Add soundex code (related #128982)
Wed, 17 Oct 2012 16:40:51 +0200 [distance] add Levenshtein distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:40:51 +0200] rev 1
[distance] add Levenshtein distance (related #128982)
Fri, 12 Oct 2012 10:23:58 +0200 Initial commit
Simon Chabot <simon.chabot@logilab.fr> [Fri, 12 Oct 2012 10:23:58 +0200] rev 0
Initial commit