Tue, 23 Apr 2013 16:46:43 +0200 [setup] include every python module under the nazca package (closes #134570)
Adrien Di Mascio <Adrien.DiMascio@logilab.fr> [Tue, 23 Apr 2013 16:46:43 +0200] rev 251
[setup] include every python module under the nazca package (closes #134570)
Thu, 04 Apr 2013 18:24:49 +0200 Added tag nazca-debian-version-0.2.0-1 for changeset 5ac1afdeaf6b
Vincent Michel <vincent.michel@logilab.fr> [Thu, 04 Apr 2013 18:24:49 +0200] rev 250
Added tag nazca-debian-version-0.2.0-1 for changeset 5ac1afdeaf6b
Thu, 04 Apr 2013 18:24:41 +0200 Added tag nazca-version-0.2.0 for changeset 59a15b188628 nazca-debian-version-0.2.0-1
Vincent Michel <vincent.michel@logilab.fr> [Thu, 04 Apr 2013 18:24:41 +0200] rev 249
Added tag nazca-version-0.2.0 for changeset 59a15b188628
Thu, 04 Apr 2013 18:23:49 +0200 preparing 0.2.0 nazca-version-0.2.0
Vincent Michel <vincent.michel@logilab.fr> [Thu, 04 Apr 2013 18:23:49 +0200] rev 248
preparing 0.2.0
Thu, 04 Apr 2013 18:17:35 +0200 [doc] Use sphinx roles and update the sample code (closes #119623)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 04 Apr 2013 18:17:35 +0200] rev 247
[doc] Use sphinx roles and update the sample code (closes #119623)
Fri, 25 Jan 2013 12:41:53 +0100 [Aligner] `normalize_set` handles tuples. (closes #117136)
Simon Chabot <simon.chabot@logilab.fr> [Fri, 25 Jan 2013 12:41:53 +0100] rev 246
[Aligner] `normalize_set` handles tuples. (closes #117136)
Wed, 30 Jan 2013 15:14:23 +0100 [doc] Little explanation on alignall_iterative() (closes #116943)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 30 Jan 2013 15:14:23 +0100] rev 245
[doc] Little explanation on alignall_iterative() (closes #116943)
Fri, 15 Feb 2013 11:03:40 +0100 [aligner] Speed up the alignset reduction (closes #116942)
Simon Chabot <simon.chabot@logilab.fr> [Fri, 15 Feb 2013 11:03:40 +0100] rev 244
[aligner] Speed up the alignset reduction (closes #116942)
Wed, 30 Jan 2013 14:43:24 +0100 [aligner] Enable the user to customize the equality_threshold (closes #116940)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 30 Jan 2013 14:43:24 +0100] rev 243
[aligner] Enable the user to customize the equality_threshold (closes #116940)
Thu, 04 Apr 2013 18:10:10 +0200 [aligner] Enable the user to reuse the cache returned by alignall_iterative() (closes #116938)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 04 Apr 2013 18:10:10 +0200] rev 242
[aligner] Enable the user to reuse the cache returned by alignall_iterative() (closes #116938) The returned cache can be passed back to alignall_iterative() to perform another alignment with different parameters, or simply inspected by the user to be sure everything has been correctly caught.
Thu, 14 Feb 2013 16:00:55 +0100 [aligner] Add the alignall_iterative() function (closes #116932)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 14 Feb 2013 16:00:55 +0100] rev 231
[aligner] Add the alignall_iterative() function (closes #116932) This function splits the files to align into smaller ones, and runs the alignment using a cache. *** [aligner] Better display of progression
Fri, 15 Feb 2013 10:37:39 +0100 [dataio] Implements split_file() (closes #116931)
Simon Chabot <simon.chabot@logilab.fr> [Fri, 15 Feb 2013 10:37:39 +0100] rev 230
[dataio] Implements split_file() (closes #116931) To speed up the alignment and use a cache system, we need a function to split huge files into smaller ones. This function does the job.
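The idea behind split_file() can be sketched as follows. This is only an illustration, assuming line-oriented text input; the name `split_file_sketch`, its parameters and the `part_NNNN` file naming are hypothetical and are not taken from the nazca source.

```python
import os


def split_file_sketch(path, outdir, nblines=1000):
    """Split the text file at `path` into files of at most `nblines` lines.

    Hypothetical helper: returns the list of chunk paths, in order.
    """
    written = []
    chunk = None
    with open(path, encoding="utf-8") as src:
        for index, line in enumerate(src):
            # Start a new chunk every `nblines` lines.
            if index % nblines == 0:
                if chunk is not None:
                    chunk.close()
                name = os.path.join(outdir, "part_%04d" % (index // nblines))
                chunk = open(name, "w", encoding="utf-8")
                written.append(name)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return written
```

Each chunk can then be aligned separately, which is what makes a per-chunk cache possible.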
Wed, 23 Jan 2013 12:06:55 +0100 [doc] Typo + a little text about the online demo. (closes #116939)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 23 Jan 2013 12:06:55 +0100] rev 221
[doc] Typo + a little text about the online demo. (closes #116939)
Wed, 23 Jan 2013 12:13:36 +0100 [aligner] Enables the user to give formatting options (closes #116930)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 23 Jan 2013 12:13:36 +0100] rev 220
[aligner] Enables the user to give formatting options (closes #116930)
Fri, 15 Feb 2013 10:13:36 +0100 [aligner] Deal with possible singleton value for KDTree
Vincent Michel <vincent.michel@logilab.fr> [Fri, 15 Feb 2013 10:13:36 +0100] rev 219
[aligner] Deal with possible singleton value for KDTree
Fri, 07 Dec 2012 13:52:20 +0100 [normalize] Don't simplify items if they aren't basestrings
Simon Chabot <simon.chabot@logilab.fr> [Fri, 07 Dec 2012 13:52:20 +0100] rev 182
[normalize] Don't simplify items if they aren't basestrings
Tue, 18 Dec 2012 14:47:23 +0100 Added tag nazca-debian-version-0.1.0-1 for changeset 3506af368209
Vincent Michel <vincent.michel@logilab.fr> [Tue, 18 Dec 2012 14:47:23 +0100] rev 181
Added tag nazca-debian-version-0.1.0-1 for changeset 3506af368209
Tue, 18 Dec 2012 14:47:22 +0100 Added tag nazca-version-0.1.0 for changeset fe3556bda079 nazca-debian-version-0.1.0-1
Vincent Michel <vincent.michel@logilab.fr> [Tue, 18 Dec 2012 14:47:22 +0100] rev 180
Added tag nazca-version-0.1.0 for changeset fe3556bda079
Tue, 18 Dec 2012 13:05:22 +0100 Remove unused files nazca-version-0.1.0
Vincent Michel <vincent.michel@logilab.fr> [Tue, 18 Dec 2012 13:05:22 +0100] rev 179
Remove unused files
Tue, 18 Dec 2012 13:06:20 +0100 [setup] Add setup and __pkginfo__
Vincent Michel <vincent.michel@logilab.fr> [Tue, 18 Dec 2012 13:06:20 +0100] rev 178
[setup] Add setup and __pkginfo__
Tue, 18 Dec 2012 08:58:52 +0100 [debian] Change package name
Vincent Michel <vincent.michel@logilab.fr> [Tue, 18 Dec 2012 08:58:52 +0100] rev 177
[debian] Change package name
Mon, 17 Dec 2012 13:37:33 +0100 [debian] Fix changelog bis
Vincent Michel <vincent.michel@logilab.fr> [Mon, 17 Dec 2012 13:37:33 +0100] rev 176
[debian] Fix changelog bis
Mon, 17 Dec 2012 12:28:00 +0100 [debian] Fix changelog
Vincent Michel <vincent.michel@logilab.fr> [Mon, 17 Dec 2012 12:28:00 +0100] rev 175
[debian] Fix changelog
Thu, 29 Nov 2012 17:15:32 +0100 Added tag python-nazca-0.1.0 for changeset 4e2afdca3f0c
Vincent Michel <vincent.michel@logilab.fr> [Thu, 29 Nov 2012 17:15:32 +0100] rev 174
Added tag python-nazca-0.1.0 for changeset 4e2afdca3f0c
Thu, 29 Nov 2012 15:11:54 +0100 [aligner] the neighbouring function in ``alignall`` is optional. python-nazca-0.1.0
Simon Chabot <simon.chabot@logilab.fr> [Thu, 29 Nov 2012 15:11:54 +0100] rev 173
[aligner] the neighbouring function in ``alignall`` is optional. To make the function more generic, the ``mode`` argument of ``alignall`` is now optional. It is therefore possible to use this function with or without neighbour finding, which was not the case before.
Thu, 29 Nov 2012 14:09:00 +0100 [dataio] Add a ``rqlquery()`` function
Simon Chabot <simon.chabot@logilab.fr> [Thu, 29 Nov 2012 14:09:00 +0100] rev 172
[dataio] Add a ``rqlquery()`` function Execute an RQL query on a given cubicweb host, then parse and format the results into the data structure used by the nazca alignment process. The cubicweb host is expected to return a csv file (thanks to the csvexport view) *** [dataio] ``rqlquery()`` handles indexes formatting. The output stream obtained from urllib in ``rqlquery`` is forwarded to the ``parsefile`` function, which already supports indexes formatting.
Thu, 29 Nov 2012 14:06:18 +0100 [dataio] ``parsefile()`` handles string or stream as ``filename``
Simon Chabot <simon.chabot@logilab.fr> [Thu, 29 Nov 2012 14:06:18 +0100] rev 171
[dataio] ``parsefile()`` handles string or stream as ``filename`` This patch lets the user give either the path of the file she wants to open or an already opened file. This is useful because the function can now handle files opened by urllib or anything else.
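A minimal sketch of the "path or already opened stream" pattern described above. The function name `read_rows`, its `delimiter` default and the use of the `csv` module are assumptions made for illustration; the real ``parsefile()`` signature is not shown in this log.

```python
import csv
import io


def read_rows(source, delimiter="\t", encoding="utf-8"):
    """Read delimited rows from `source`, a file path or an open text stream.

    Hypothetical helper: only streams opened here are closed here.
    """
    if isinstance(source, str):
        stream, owned = open(source, encoding=encoding, newline=""), True
    else:
        stream, owned = source, False
    try:
        return [row for row in csv.reader(stream, delimiter=delimiter)]
    finally:
        if owned:
            stream.close()


# An in-memory stream is handled the same way as a file on disk.
rows = read_rows(io.StringIO("a\tb\n1\t2\n"))
```

Leaving caller-provided streams open keeps ownership clear: whoever opened the stream (urllib, for instance) is responsible for closing it.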
Tue, 27 Nov 2012 20:17:49 +0100 [dataio] Avoid many imports of the same module (SPARQLWrapper and JSON)
Simon Chabot <simon.chabot@logilab.fr> [Tue, 27 Nov 2012 20:17:49 +0100] rev 170
[dataio] Avoid many imports of the same module (SPARQLWrapper and JSON) The ImportError is still raised in case of failure, but only when the ``sparqlquery`` function is called, and the modules are no longer imported each time the function is called.
Thu, 29 Nov 2012 14:11:20 +0100 [dataio] Don't assume by default the input of ``autocasted`` is a string.
Simon Chabot <simon.chabot@logilab.fr> [Thu, 29 Nov 2012 14:11:20 +0100] rev 169
[dataio] Don't assume by default the input of ``autocasted`` is a string. The argument of ``autocasted`` may be an int or a float or whatever and those types don't have a ``strip`` method. So the idea is to call this method only if it's neither a float nor an int.
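The guard described above might look like the sketch below. The name `autocast_sketch` and the exact casting order (int, then float, then the stripped string) are assumptions for illustration, not the actual ``autocasted`` implementation.

```python
def autocast_sketch(data):
    """Cast `data` to int or float when possible, else return it stripped.

    Hypothetical helper: numbers are returned untouched because int and
    float have no strip() method.
    """
    if isinstance(data, (int, float)):
        return data
    data = data.strip()
    for cast in (int, float):
        try:
            return cast(data)
        except ValueError:
            continue
    return data
```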
Mon, 26 Nov 2012 11:52:28 +0100 [test] Try to protect some tests from a deprecated version of scikit learn
Vincent Michel <vincent.michel@logilab.fr> [Mon, 26 Nov 2012 11:52:28 +0100] rev 168
[test] Try to protect some tests from a deprecated version of scikit learn
Mon, 26 Nov 2012 11:35:47 +0100 [test] Remove unused pytestconf.py
Vincent Michel <vincent.michel@logilab.fr> [Mon, 26 Nov 2012 11:35:47 +0100] rev 167
[test] Remove unused pytestconf.py
Mon, 26 Nov 2012 11:11:29 +0100 Rename package into Nazca and change imports
Vincent Michel <vincent.michel@logilab.fr> [Mon, 26 Nov 2012 11:11:29 +0100] rev 166
Rename package into Nazca and change imports
Mon, 26 Nov 2012 11:00:59 +0100 [doc] Enhancements in the doc
Vincent Michel <vincent.michel@logilab.fr> [Mon, 26 Nov 2012 11:00:59 +0100] rev 165
[doc] Enhancements in the doc
Mon, 26 Nov 2012 10:42:17 +0100 [test] Add the french lemmas file for tests
Vincent Michel <vincent.michel@logilab.fr> [Mon, 26 Nov 2012 10:42:17 +0100] rev 164
[test] Add the french lemmas file for tests
Mon, 26 Nov 2012 10:37:43 +0100 [minhashing] Remove deprecated __main__
Vincent Michel <vincent.michel@logilab.fr> [Mon, 26 Nov 2012 10:37:43 +0100] rev 163
[minhashing] Remove deprecated __main__
Wed, 21 Nov 2012 10:17:48 +0100 [aligner] Correct the bug raised in 4d53757fbadf
Simon Chabot <simon.chabot@logilab.fr> [Wed, 21 Nov 2012 10:17:48 +0100] rev 162
[aligner] Correct the bug raised in 4d53757fbadf As a reminder: the treatments must be applied before the findneighbours() function is called, otherwise they are of little use. But the treatments are also applied by the align() function, which is redundant. We should find a way to apply them only once, whichever function is called (because align() can also be called on its own…). The temporary solution was to call normalize_set twice, but it had to be just a temporary solution! Now, there is a boolean variable for that purpose.
Tue, 20 Nov 2012 18:03:37 +0100 [demo] Use the new formatting parsefile options
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 18:03:37 +0100] rev 161
[demo] Use the new formatting parsefile options
Tue, 20 Nov 2012 18:14:45 +0100 [dataio] Add a formatting option
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 18:14:45 +0100] rev 160
[dataio] Add a formatting option
Tue, 20 Nov 2012 15:44:42 +0100 [distance] Don't read letters uselessly to compute the soundexcode
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 15:44:42 +0100] rev 159
[distance] Don't read letters uselessly to compute the soundexcode
Tue, 20 Nov 2012 15:17:38 +0100 [normalize] Make the loadlemmas() function more readable
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 15:17:38 +0100] rev 158
[normalize] Make the loadlemmas() function more readable
Tue, 20 Nov 2012 15:13:57 +0100 Remove useless imports
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 15:13:57 +0100] rev 157
Remove useless imports
Tue, 20 Nov 2012 15:13:29 +0100 Remove useless variables
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 15:13:29 +0100] rev 156
Remove useless variables
Tue, 20 Nov 2012 15:07:41 +0100 [align] Assert all inputs of all items have the same length.
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 15:07:41 +0100] rev 155
[align] Assert all inputs of all items have the same length.
Tue, 20 Nov 2012 15:06:33 +0100 [matrix] Add comment on normalization
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 15:06:33 +0100] rev 154
[matrix] Add comment on normalization
Tue, 20 Nov 2012 10:59:05 +0100 [distance] Move out the custom parser info.
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 10:59:05 +0100] rev 153
[distance] Move out the custom parser info. It enables users to give their own implementation of the parser info, thus it can support other languages…
Tue, 20 Nov 2012 10:33:39 +0100 [doc] Some other corrections
Simon Chabot <simon.chabot@logilab.fr> [Tue, 20 Nov 2012 10:33:39 +0100] rev 152
[doc] Some other corrections
Mon, 19 Nov 2012 17:13:57 +0100 [doc] Some corrections
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 17:13:57 +0100] rev 151
[doc] Some corrections
Mon, 19 Nov 2012 16:53:08 +0100 [doc] First version
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 16:53:08 +0100] rev 150
[doc] First version
Mon, 19 Nov 2012 11:07:18 +0100 [demo] Rename imports
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 11:07:18 +0100] rev 149
[demo] Rename imports Import names (`n` for instance) were confusing, in particular with `numpy`
Mon, 19 Nov 2012 11:01:31 +0100 [aligner] Normalize the sets before calling the findneighbours() function
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 11:01:31 +0100] rev 148
[aligner] Normalize the sets before calling the findneighbours() function The treatments must be applied before the findneighbours() function is called, otherwise they are of little use. But the treatments are also applied by the align() function, which is redundant. We should find a way to apply them only once, whichever function is called (because align() can also be called on its own…). The *temporary* solution is to call normalize_set twice, but it has to be just a *temporary* solution!
Mon, 19 Nov 2012 10:51:01 +0100 [demo] Add some prints to show the alignment progress
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 10:51:01 +0100] rev 147
[demo] Add some prints to show the alignment progress
Mon, 19 Nov 2012 11:30:02 +0100 [demo] Add some prints to show the alignment progress
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 11:30:02 +0100] rev 146
[demo] Add some prints to show the alignment progress
Mon, 19 Nov 2012 10:43:11 +0100 [demo] Use a global function to compute paths
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 10:43:11 +0100] rev 145
[demo] Use a global function to compute paths
Mon, 19 Nov 2012 10:37:47 +0100 [normalize] In simplify() replace punctuation by a space
Simon Chabot <simon.chabot@logilab.fr> [Mon, 19 Nov 2012 10:37:47 +0100] rev 144
[normalize] In simplify() replace punctuation by a space Punctuation used to be removed; now it is replaced by a space. This lets minhashing split the string and send "ivry seine" and "ivry sur seine" to the same bucket, whereas "ivrysurseine" and "ivryseine" were not.
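The effect of this change can be illustrated with the two strategies side by side. Both helper names are hypothetical, and `string.punctuation` is an assumption about which characters count as punctuation; the actual simplify() may use a different set.

```python
import string

# Translation tables built once: one deletes punctuation, one maps it to spaces.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def remove_punctuation(text):
    """Old behaviour (sketch): punctuation is dropped, gluing the words together."""
    return text.translate(_DELETE_PUNCT)


def punctuation_to_space(text):
    """New behaviour (sketch): punctuation becomes a space, so words stay separable."""
    return text.translate(_PUNCT_TO_SPACE)


glued = remove_punctuation("ivry-sur-seine")
spaced = punctuation_to_space("ivry-sur-seine")
```

With the spaced form, a token-based comparison sees the shared words "ivry" and "seine"; with the glued form it only sees one opaque word.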
Thu, 15 Nov 2012 16:48:36 +0100 Add some XXX on Adrien's comments
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 16:48:36 +0100] rev 143
Add some XXX on Adrien's comments
Thu, 15 Nov 2012 14:38:15 +0100 [minhashing] Add a verbose mode
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:38:15 +0100] rev 142
[minhashing] Add a verbose mode
Thu, 15 Nov 2012 14:17:58 +0100 [aligner] Remove a useless import
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:17:58 +0100] rev 141
[aligner] Remove a useless import
Thu, 15 Nov 2012 14:17:39 +0100 Remove a useless file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:17:39 +0100] rev 140
Remove a useless file
Thu, 15 Nov 2012 14:42:05 +0100 Respect (or try to respect) pep8
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 14:42:05 +0100] rev 139
Respect (or try to respect) pep8
Thu, 15 Nov 2012 09:38:52 +0100 [minhashing] consume less memory
Simon Chabot <simon.chabot@logilab.fr> [Thu, 15 Nov 2012 09:38:52 +0100] rev 138
[minhashing] consume less memory Storing the whole document matrix was useless because it is a boolean, sparse matrix. The only stored elements are equal to one, so it is useless to store the data list: only the positions matter. During the signature step, lines that have already been read are no longer used, so they are deleted to save memory.
Wed, 14 Nov 2012 17:45:49 +0100 [test] Remove a useless line
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 17:45:49 +0100] rev 137
[test] Remove a useless line
Wed, 14 Nov 2012 12:17:02 +0100 [minhashing] Compute the union step by step
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 12:17:02 +0100] rev 136
[minhashing] Compute the union step by step
Wed, 14 Nov 2012 11:23:10 +0100 [minhashing] Buckets are row-dependent
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 11:23:10 +0100] rev 135
[minhashing] Buckets are row-dependent “We can use the same hash function for all the bands, but we use a separate bucket array for each band, so columns with the same vector in different bands will not hash to the same bucket.”
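The quoted rule can be sketched as follows: each band gets its own bucket table, so identical vectors only collide when they occur in the same band. The function name `candidate_pairs` and the representation of signatures as tuples are assumptions for illustration, not nazca's actual data structures.

```python
from collections import defaultdict


def candidate_pairs(signatures, bandsize):
    """Return pairs of document indexes sharing at least one identical band.

    `signatures` is a list of equal-length tuples, one per document.
    """
    pairs = set()
    nrows = len(signatures[0])
    for start in range(0, nrows, bandsize):
        # A fresh bucket table per band: vectors from different bands
        # can never land in the same bucket.
        buckets = defaultdict(list)
        for doc, signature in enumerate(signatures):
            buckets[signature[start:start + bandsize]].append(doc)
        for docs in buckets.values():
            for i in range(len(docs)):
                for j in range(i + 1, len(docs)):
                    pairs.add((docs[i], docs[j]))
    return pairs


# Doc 2 holds (3, 4) and (1, 2), but in the "wrong" bands, so it matches nothing.
pairs = candidate_pairs([(1, 2, 3, 4), (1, 2, 9, 9), (3, 4, 1, 2)], bandsize=2)
```

With a single shared bucket table, document 2 would have been paired with documents 0 and 1 even though none of its bands agree with theirs.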
Wed, 14 Nov 2012 11:23:49 +0100 [dataio] Add a forgotten encoding attribute
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 11:23:49 +0100] rev 134
[dataio] Add a forgotten encoding attribute
Wed, 14 Nov 2012 10:50:10 +0100 [test] write the tests for the `alignall()` function
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:50:10 +0100] rev 133
[test] write the tests for the `alignall()` function
Wed, 14 Nov 2012 10:30:09 +0100 [aligner] Align the code on 80 characters
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:30:09 +0100] rev 132
[aligner] Align the code on 80 characters
Wed, 14 Nov 2012 10:50:31 +0100 [aligner] Make the alignall() function
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:50:31 +0100] rev 131
[aligner] Make the alignall() function
Wed, 14 Nov 2012 10:27:57 +0100 [aligner] Let the user decide whether to return the global alignment matrix or not
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 10:27:57 +0100] rev 130
[aligner] Let the user decide whether to return the global alignment matrix or not
Wed, 14 Nov 2012 09:40:30 +0100 [minhashing] Export the demo of minhashing to the new api
Simon Chabot <simon.chabot@logilab.fr> [Wed, 14 Nov 2012 09:40:30 +0100] rev 129
[minhashing] Export the demo of minhashing to the new api
Tue, 13 Nov 2012 16:46:16 +0100 [aligner] Remove useless arguments from findneighbours_clustering()
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:46:16 +0100] rev 128
[aligner] Remove useless arguments from findneighbours_clustering()
Mon, 26 Nov 2012 10:13:34 +0100 [aligner] clustering: Don't crash if sets are small.
Simon Chabot <simon.chabot@logilab.fr> [Mon, 26 Nov 2012 10:13:34 +0100] rev 127
[aligner] clustering: Don't crash if sets are small.
Tue, 13 Nov 2012 16:42:57 +0100 [aligner,dataio] Export the results writing to an independent function
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:42:57 +0100] rev 126
[aligner,dataio] Export the results writing to an independent function
Tue, 13 Nov 2012 16:41:31 +0100 [test] Write the test for test_findneighbours_clustering
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:41:31 +0100] rev 125
[test] Write the test for test_findneighbours_clustering
Tue, 13 Nov 2012 16:04:17 +0100 [dataio] Export some function to the dataio.py file
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:04:17 +0100] rev 124
[dataio] Export some function to the dataio.py file
Tue, 13 Nov 2012 15:39:13 +0100 [test] add more tests
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:39:13 +0100] rev 123
[test] add more tests
Tue, 13 Nov 2012 16:04:38 +0100 [align] Update API
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 16:04:38 +0100] rev 122
[align] Update API *** [demo] update demo to the new API
Tue, 13 Nov 2012 15:38:52 +0100 [matrix] Make API closer to scipy.spatial and add metrics handling
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:38:52 +0100] rev 121
[matrix] Make API closer to scipy.spatial and add metrics handling
Tue, 13 Nov 2012 15:38:19 +0100 Cosmit
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:38:19 +0100] rev 120
Cosmit
Tue, 13 Nov 2012 15:37:59 +0100 [minhashing] Modify API + change threshold handling
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:37:59 +0100] rev 119
[minhashing] Modify API + change threshold handling
Tue, 13 Nov 2012 15:37:25 +0100 [normalize] Better tokenizer for unicode + stopwords
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:37:25 +0100] rev 118
[normalize] Better tokenizer for unicode + stopwords
Tue, 13 Nov 2012 15:36:30 +0100 Few corrections and add tests
Vincent Michel <vincent.michel@logilab.fr> [Tue, 13 Nov 2012 15:36:30 +0100] rev 117
Few corrections and add tests
Tue, 13 Nov 2012 10:46:31 +0100 [minhashing] Compute complexity on a huge file
Simon Chabot <simon.chabot@logilab.fr> [Tue, 13 Nov 2012 10:46:31 +0100] rev 116
[minhashing] Compute complexity on a huge file
Mon, 12 Nov 2012 19:14:52 +0100 [minhashing] plot complexity
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 19:14:52 +0100] rev 115
[minhashing] plot complexity
Mon, 12 Nov 2012 18:30:00 +0100 [matrix] Make the distance matrix API look like scipy's
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 18:30:00 +0100] rev 114
[matrix] Make the distance matrix API look like scipy's
Mon, 12 Nov 2012 16:45:44 +0100 [minhashing] Don't copy uselessly the signature matrix
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 16:45:44 +0100] rev 113
[minhashing] Don't copy uselessly the signature matrix
Mon, 12 Nov 2012 17:22:18 +0100 [minhashing] Faster signaturing using numpy
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 17:22:18 +0100] rev 112
[minhashing] Faster signaturing using numpy
Mon, 12 Nov 2012 16:46:46 +0100 [minhashing] rewrite main for testing purposes
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 16:46:46 +0100] rev 111
[minhashing] rewrite main for testing purposes
Mon, 12 Nov 2012 11:10:24 +0100 [minhashing] Typo
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 11:10:24 +0100] rev 110
[minhashing] Typo
Fri, 09 Nov 2012 14:27:22 +0100 Try to optimize buckets computation in min hashing
Vincent Michel <vincent.michel@logilab.fr> [Fri, 09 Nov 2012 14:27:22 +0100] rev 109
Try to optimize buckets computation in min hashing
Fri, 09 Nov 2012 13:26:12 +0100 [refactoring] First round of refactoring/review
Vincent Michel <vincent.michel@logilab.fr> [Fri, 09 Nov 2012 13:26:12 +0100] rev 108
[refactoring] First round of refactoring/review
Mon, 12 Nov 2012 09:34:36 +0100 Add french_lemmas file
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 09:34:36 +0100] rev 107
Add french_lemmas file
Fri, 09 Nov 2012 11:15:41 +0100 [aligner] add findneighbours docstring
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 11:15:41 +0100] rev 106
[aligner] add findneighbours docstring
Fri, 09 Nov 2012 11:08:24 +0100 Add and delete some XXX
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 11:08:24 +0100] rev 105
Add and delete some XXX
Fri, 09 Nov 2012 10:03:02 +0100 [aligner] Don't append and pop. Check before.
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 10:03:02 +0100] rev 104
[aligner] Don't append and pop. Check before.
Fri, 09 Nov 2012 10:02:25 +0100 [aligner] Enable the user to give the signature matrix for minhashing
Simon Chabot <simon.chabot@logilab.fr> [Fri, 09 Nov 2012 10:02:25 +0100] rev 103
[aligner] Enable the user to give the signature matrix for minhashing
Thu, 08 Nov 2012 16:36:50 +0100 [demo] minibatch is faster
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 16:36:50 +0100] rev 102
[demo] minibatch is faster
Thu, 08 Nov 2012 16:36:04 +0100 [aligner] Change the default value of k.
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 16:36:04 +0100] rev 101
[aligner] Change the default value of k. k=1 is a more commonly used value
Thu, 08 Nov 2012 16:30:16 +0100 [Todo] Add a TODO file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 16:30:16 +0100] rev 100
[Todo] Add a TODO file
Thu, 08 Nov 2012 15:53:27 +0100 [aligner] Handle any dimension for clustering and kdtree
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 15:53:27 +0100] rev 99
[aligner] Handle any dimension for clustering and kdtree
Thu, 08 Nov 2012 15:38:05 +0100 [aligner] Continue to use 1xM matrices for KDTree
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 15:38:05 +0100] rev 98
[aligner] Continue to use 1xM matrices for KDTree
Thu, 08 Nov 2012 14:44:08 +0100 [demo] Display how much time the run took
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 14:44:08 +0100] rev 97
[demo] Display how much time the run took
Thu, 08 Nov 2012 14:43:32 +0100 [aligner] Instead of returning 1xN matrices, return MxN ones
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 14:43:32 +0100] rev 96
[aligner] Instead of returning 1xN matrices, return MxN ones
Thu, 08 Nov 2012 13:25:20 +0100 [demo] Use kmeans instead of kdtree (for testing)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 13:25:20 +0100] rev 95
[demo] Use kmeans instead of kdtree (for testing)
Thu, 08 Nov 2012 13:24:41 +0100 [aligner] Use lazy import for minhashing and kdtree
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 13:24:41 +0100] rev 94
[aligner] Use lazy import for minhashing and kdtree
Thu, 08 Nov 2012 13:24:13 +0100 [aligner] Add kmeans to the available searchers list
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 13:24:13 +0100] rev 93
[aligner] Add kmeans to the available searchers list
Thu, 08 Nov 2012 11:41:01 +0100 [demo] For demo2, don't read the whole file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 11:41:01 +0100] rev 92
[demo] For demo2, don't read the whole file
Mon, 12 Nov 2012 18:31:44 +0100 [aligner] Define autocasted as a global function
Simon Chabot <simon.chabot@logilab.fr> [Mon, 12 Nov 2012 18:31:44 +0100] rev 91
[aligner] Define autocasted as a global function
Thu, 08 Nov 2012 10:07:56 +0100 [demo] For testing purpose, run only the given demo
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:07:56 +0100] rev 90
[demo] For testing purpose, run only the given demo
Thu, 08 Nov 2012 10:07:21 +0100 [demo] Use the sparql queries handling instead of csv file
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:07:21 +0100] rev 89
[demo] Use the sparql queries handling instead of csv file
Thu, 08 Nov 2012 10:06:22 +0100 [aligner] Handle sparql queries
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:06:22 +0100] rev 88
[aligner] Handle sparql queries
Thu, 08 Nov 2012 10:08:57 +0100 [aligner] Make parsefile work with unicode
Simon Chabot <simon.chabot@logilab.fr> [Thu, 08 Nov 2012 10:08:57 +0100] rev 87
[aligner] Make parsefile work with unicode
Wed, 07 Nov 2012 17:46:11 +0100 Let's make the tests and demo paths independent
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 17:46:11 +0100] rev 86
Let's make the tests and demo paths independent
Wed, 07 Nov 2012 17:24:42 +0100 [demo] Use the new implementation of findneighbours()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 17:24:42 +0100] rev 85
[demo] Use the new implementation of findneighbours()
Wed, 07 Nov 2012 17:09:20 +0100 [aligner] higher level implementation of KDTree and Minhashing
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 17:09:20 +0100] rev 84
[aligner] higher level implementation of KDTree and Minhashing
Wed, 07 Nov 2012 15:58:46 +0100 [demo] Add a new demo (with a kdtree)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 15:58:46 +0100] rev 83
[demo] Add a new demo (with a kdtree)
Wed, 07 Nov 2012 12:18:00 +0100 [demo] add example on custom normalization function usage
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 12:18:00 +0100] rev 82
[demo] add example on custom normalization function usage
Wed, 07 Nov 2012 11:43:26 +0100 [demo] Add a new demo of alignment usage
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 11:43:26 +0100] rev 81
[demo] Add a new demo of alignment usage
Wed, 07 Nov 2012 10:47:41 +0100 [normalize] Make nltk optional
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:47:41 +0100] rev 80
[normalize] Make nltk optional
Wed, 07 Nov 2012 10:39:13 +0100 [minhashing] Spelling mistake
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:39:13 +0100] rev 79
[minhashing] Spelling mistake
Wed, 07 Nov 2012 10:29:40 +0100 [matrix] spelling mistakes
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:29:40 +0100] rev 78
[matrix] spelling mistakes
Wed, 07 Nov 2012 10:21:46 +0100 [distances] spelling mistakes
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:21:46 +0100] rev 77
[distances] spelling mistakes
Wed, 07 Nov 2012 10:08:44 +0100 [aligner] Spelling mistakes
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:08:44 +0100] rev 76
[aligner] Spelling mistakes
Wed, 07 Nov 2012 10:06:17 +0100 [aligner] Remove useless dependencies
Simon Chabot <simon.chabot@logilab.fr> [Wed, 07 Nov 2012 10:06:17 +0100] rev 75
[aligner] Remove useless dependencies
Tue, 06 Nov 2012 18:03:14 +0100 [demo] Add comments
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 18:03:14 +0100] rev 74
[demo] Add comments
Tue, 06 Nov 2012 16:27:12 +0100 [aligner] Set the parsefile function into aligner module
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 16:27:12 +0100] rev 73
[aligner] Set the parsefile function into aligner module
Tue, 06 Nov 2012 14:05:31 +0100 Add the demo file
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 14:05:31 +0100] rev 72
Add the demo file
Tue, 06 Nov 2012 17:37:36 +0100 Extract the alignment process from the cube to be independent
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 17:37:36 +0100] rev 71
Extract the alignment process from the cube to be independent *** amends 7d398efa1ab38937d1a6aae63ec13fcbfcad1d3f
Tue, 06 Nov 2012 10:51:43 +0100 [matrix] Cancel the 188 changeset. Multiplying matrices was, in fact, a bad idea
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:51:43 +0100] rev 70
[matrix] Cancel the 188 changeset. Multiplying matrices was, in fact, a bad idea It was a bad idea because if two values were identical, their distance was 0, and so was the product, even though all the other values could be different
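The problem can be reproduced with two tiny distance matrices. The data and the helper name `combine_elementwise` are made up for illustration and use plain nested lists rather than nazca's matrix classes.

```python
import operator
from functools import reduce


def combine_elementwise(matrices, op):
    """Combine same-shaped nested-list matrices cell by cell with `op`."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [reduce(op, (m[i][j] for m in matrices)) for j in range(cols)]
        for i in range(rows)
    ]


# One reference item compared with two candidates (hypothetical values).
name_dist = [[0.0, 0.6]]  # candidate 0 has exactly the same name
geo_dist = [[0.9, 0.1]]   # ...but is located far away

summed = combine_elementwise([name_dist, geo_dist], operator.add)
product = combine_elementwise([name_dist, geo_dist], operator.mul)
```

In the summed matrix candidate 0 keeps a large global distance, while in the product a single zero wipes out the geographical disagreement and makes it look like a perfect match.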
Tue, 06 Nov 2012 10:49:10 +0100 Correct some spelling
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:49:10 +0100] rev 69
Correct some spelling
Tue, 06 Nov 2012 10:46:30 +0100 [minlsh] Use an iterator to compute the result set
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:46:30 +0100] rev 68
[minlsh] Use an iterator to compute the result set Don't load all the results at once…
Tue, 06 Nov 2012 10:45:10 +0100 [minlsh] Remove the useless rows in the signature matrix while searching
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:45:10 +0100] rev 67
[minlsh] Remove the useless rows in the signature matrix while searching It's done to save memory…
Tue, 06 Nov 2012 10:48:06 +0100 [distances] For the jaccard distance, consider the set of tokens
Simon Chabot <simon.chabot@logilab.fr> [Tue, 06 Nov 2012 10:48:06 +0100] rev 66
[distances] For the jaccard distance, consider the set of tokens Considering the set of tokens instead of letters is much more accurate. Eg: before this changeset, the jaccard distance between “silence” and “license” was zero. *** [Test] Jaccard implementation changed, so the tests are changed too.
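The "silence" / "license" case is easy to check by hand, since both words use exactly the same letters. The sketch below compares the two variants; the function names are hypothetical and whitespace splitting is an assumed tokenizer, not necessarily the one nazca uses.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, compared as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def jaccard_letters(s1, s2):
    """Old behaviour (sketch): compare the sets of characters."""
    return jaccard_distance(s1, s2)


def jaccard_tokens(s1, s2):
    """New behaviour (sketch): compare the sets of whitespace-separated tokens."""
    return jaccard_distance(s1.split(), s2.split())


letters = jaccard_letters("silence", "license")
tokens = jaccard_tokens("silence", "license")
```

Here `letters` is 0.0 because the two letter sets are identical, whereas `tokens` is 1.0 because the words differ.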
Tue, 30 Oct 2012 16:41:53 +0100 [Matrix] Remove the unused defaultvalue
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:41:53 +0100] rev 65
[Matrix] Remove the unused defaultvalue
Tue, 30 Oct 2012 16:28:05 +0100 [Align] don't use queries by lists, directly
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:28:05 +0100] rev 64
[Align] don't use queries by lists, directly
Tue, 30 Oct 2012 16:25:01 +0100 [Matrix] Multiplying instead of adding
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:25:01 +0100] rev 63
[Matrix] Multiplying instead of adding. The global matrix used to be computed by adding the submatrices. Unknown values were set to the maximum to avoid false positives, but that was too penalizing. Now, unknown values are set to 1 and all matrices are multiplied. Thus, an unknown value is not penalizing, whereas a known value still counts.
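A small sketch of the two combination schemes for one pair of records, assuming the per-attribute distances are available as a plain list and that `None` marks an unknown value (both are assumptions made for illustration, as are the function names).

```python
from functools import reduce
from operator import mul

UNKNOWN = None  # hypothetical marker for a missing attribute value


def combine_sum(distances, penalty):
    """Additive scheme: an unknown value is replaced by a large penalty."""
    return sum(penalty if d is UNKNOWN else d for d in distances)


def combine_product(distances):
    """Multiplicative scheme: an unknown value is replaced by the neutral 1."""
    return reduce(mul, (1.0 if d is UNKNOWN else d for d in distances), 1.0)


pair = [2.0, UNKNOWN, 3.0]
print(combine_sum(pair, penalty=10.0))  # 15.0, dominated by the penalty
print(combine_product(pair))            # 6.0, the unknown value is ignored
```

Note that with the product a single distance of 0 makes the whole result 0, whatever the other attributes are: `combine_product([0.0, 7.0, 9.0])` is `0.0`.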
Tue, 30 Oct 2012 16:16:40 +0100 [Distance] The output unit of geographical distance can be specified
Simon Chabot <simon.chabot@logilab.fr> [Tue, 30 Oct 2012 16:16:40 +0100] rev 62
[Distance] The output unit of geographical distance can be specified
Mon, 29 Oct 2012 15:19:09 +0100 [API] Write results according to csv format
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 15:19:09 +0100] rev 61
[API] Write results according to csv format
Mon, 29 Oct 2012 15:07:22 +0100 [API] Start the implementation of the alignment API
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 15:07:22 +0100] rev 60
[API] Start the implementation of the alignment API
Mon, 29 Oct 2012 11:44:41 +0100 [Minhashing] Give a threshold instead of an abstract "bandsize"
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 11:44:41 +0100] rev 59
[Minhashing] Give a threshold instead of an abstract "bandsize". The threshold and the bandsize are related, and one can be computed knowing the other. Let the user give the more explicit one.
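One common way to relate the two quantities, sketched here as an assumption rather than as the repository's actual computation: in the usual LSH banding scheme with b bands of r rows each, the similarity threshold is approximately (1/b)^(1/r). The sketch assumes that "bandsize" means the number of rows per band; `approx_threshold` and `bandsize_for` are hypothetical names.

```python
def approx_threshold(bandsize, nb_rows):
    """Approximate similarity threshold for bands of `bandsize` rows
    in a signature matrix of `nb_rows` rows: (1 / b) ** (1 / r)."""
    nb_bands = nb_rows / bandsize
    return (1.0 / nb_bands) ** (1.0 / bandsize)


def bandsize_for(threshold, nb_rows):
    """Band size whose approximate threshold is closest to `threshold`."""
    return min(range(1, nb_rows + 1),
               key=lambda r: abs(approx_threshold(r, nb_rows) - threshold))


# Round trip: the threshold implied by 5 rows per band maps back to 5.
t = approx_threshold(5, 100)
print(bandsize_for(t, 100))
```

The approximate threshold grows with the band size, so a simple search over the possible sizes is enough for a sketch like this one.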
Mon, 29 Oct 2012 10:21:14 +0100 Order imports
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 10:21:14 +0100] rev 58
Order imports
Mon, 29 Oct 2012 10:17:24 +0100 [Minhashing] Trained data can be saved and loaded
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 10:17:24 +0100] rev 57
[Minhashing] Trained data can be saved and loaded
Mon, 29 Oct 2012 09:52:16 +0100 [distances] The planet radius for the geographical distance can be given
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 09:52:16 +0100] rev 56
[distances] The planet radius for the geographical distance can be given
Mon, 29 Oct 2012 09:46:26 +0100 [distances] Don't concat two strings to know if there is a space in one of them. Make two tests instead
Simon Chabot <simon.chabot@logilab.fr> [Mon, 29 Oct 2012 09:46:26 +0100] rev 55
[distances] Don't concat two strings to know if there is a space in one of them. Make two tests instead
Fri, 26 Oct 2012 11:44:50 +0200 [typo] alignement --> alignment
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:44:50 +0200] rev 54
[typo] alignement --> alignment
Fri, 26 Oct 2012 11:43:57 +0200 [Matrix] Maximum distance computation delegated to max() method of numpy.matrix
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:43:57 +0200] rev 53
[Matrix] Maximum distance computation delegated to max() method of numpy.matrix
Fri, 26 Oct 2012 11:41:27 +0200 [Minhash] Tests written
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 11:41:27 +0200] rev 52
[Minhash] Tests written
Fri, 26 Oct 2012 09:49:07 +0200 [Matrix] Improvement of the matched() method
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 09:49:07 +0200] rev 51
[Matrix] Improvement of the matched() method. Don't run over all indices, but ask scipy directly for the wanted ones.
Fri, 26 Oct 2012 09:54:16 +0200 [Matrix] Change lil_matrix for dense matrix
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 09:54:16 +0200] rev 50
[Matrix] Change lil_matrix for dense matrix. Some experiments showed the matrix wasn't sparse at all, so it's better to use the appropriate data structure. *** [Matrix] Why use the todense() method if the matrix is… already dense?
Fri, 26 Oct 2012 13:43:08 +0200 [Distance] Add geographical distance
Simon Chabot <simon.chabot@logilab.fr> [Fri, 26 Oct 2012 13:43:08 +0200] rev 49
[Distance] Add geographical distance (Equirectangular projection)
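For reference, a minimal sketch of a distance based on the equirectangular approximation. The signature, the (latitude, longitude) point order and the default radius are assumptions made for illustration, not the repository's actual code.

```python
from math import cos, radians, sqrt

EARTH_RADIUS_M = 6371009.0  # mean Earth radius in metres (assumed default)


def geographical(point_a, point_b, in_radians=False,
                 planet_radius=EARTH_RADIUS_M):
    """Approximate distance between two (latitude, longitude) points.

    Uses the equirectangular approximation:
        x = (lon_b - lon_a) * cos(mean latitude)
        y = lat_b - lat_a
        d = R * sqrt(x**2 + y**2)
    """
    lat_a, lon_a = point_a
    lat_b, lon_b = point_b
    if not in_radians:
        lat_a, lon_a, lat_b, lon_b = map(radians,
                                         (lat_a, lon_a, lat_b, lon_b))
    x = (lon_b - lon_a) * cos((lat_a + lat_b) / 2.0)
    y = lat_b - lat_a
    return planet_radius * sqrt(x * x + y * y)


# One degree of latitude is roughly 111 km on Earth.
print(geographical((0.0, 0.0), (1.0, 0.0)))
```

The approximation is cheap and reasonable for nearby points; it degrades over long distances and near the poles.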
Thu, 25 Oct 2012 16:43:50 +0200 [testing] make the alignment testing a little bit cleaner
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:43:50 +0200] rev 48
[testing] make the alignment testing a little bit cleaner. Now use rdflib instead of (bad) homemade code.
Thu, 25 Oct 2012 16:41:53 +0200 [Matrix] Adapt the globalalignmentmatrix computation to the previous changeset
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:41:53 +0200] rev 47
[Matrix] Adapt the globalalignmentmatrix computation to the previous changeset (4c2f7553490b)
Thu, 25 Oct 2012 16:40:42 +0200 [Matrix] Give weighting and normalization at the construction of the matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 25 Oct 2012 16:40:42 +0200] rev 46
[Matrix] Give weighting and normalization at the construction of the matrix. It makes the computation faster.
Wed, 24 Oct 2012 19:10:54 +0200 wip
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 19:10:54 +0200] rev 45
wip
Wed, 24 Oct 2012 15:39:38 +0200 [Minhashing] : Really, really, really faster training
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 15:39:38 +0200] rev 44
[Minhashing] : Really, really, really faster training
Wed, 24 Oct 2012 14:51:09 +0200 [Matrix] Compute the global alignment matrix
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 14:51:09 +0200] rev 43
[Matrix] Compute the global alignment matrix
Wed, 24 Oct 2012 14:50:23 +0200 [Matrix] Can pass extra arguments to distance functions
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 14:50:23 +0200] rev 42
[Matrix] Can pass extra arguments to distance functions
Wed, 24 Oct 2012 12:37:56 +0200 [Matrix] Computation handles unknown values
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 12:37:56 +0200] rev 41
[Matrix] Computation handles unknown values
Wed, 24 Oct 2012 11:51:57 +0200 [Test] Updated to the change of simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:51:57 +0200] rev 40
[Test] Updated to the change of simplify()
Wed, 24 Oct 2012 11:43:11 +0200 [Normalize] Add docstring to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:43:11 +0200] rev 39
[Normalize] Add docstring to simplify()
Wed, 24 Oct 2012 11:52:22 +0200 [normalize] Add a stopword removing option to simplify()
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:52:22 +0200] rev 38
[normalize] Add a stopword removing option to simplify()
Wed, 24 Oct 2012 11:29:08 +0200 [Minhashing] Working API
Simon Chabot <simon.chabot@logilab.fr> [Wed, 24 Oct 2012 11:29:08 +0200] rev 37
[Minhashing] Working API
Mon, 22 Oct 2012 18:17:17 +0200 [minhashing] First try of minhashing (related to #129000)
Simon Chabot <simon.chabot@logilab.fr> [Mon, 22 Oct 2012 18:17:17 +0200] rev 36
[minhashing] First try of minhashing (related to #129000)
Fri, 19 Oct 2012 18:22:14 +0200 [Matrix] Add basic operations such as add, mul sub, etc
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 18:22:14 +0200] rev 35
[Matrix] Add basic operations such as add, mul sub, etc
Fri, 19 Oct 2012 17:27:40 +0200 [Matrix] Add __repr__ method()
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 17:27:40 +0200] rev 34
[Matrix] Add __repr__ method()
Fri, 19 Oct 2012 16:57:40 +0200 [Matrix] Don't store inputs inside DistanceMatrix object
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 16:57:40 +0200] rev 33
[Matrix] Don't store inputs inside DistanceMatrix object. Storing them made no sense, was useless, and would have been annoying for the future summation of matrices.
Fri, 19 Oct 2012 14:44:26 +0200 [Distance] Spaces are correctly supported by distance functions
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 14:44:26 +0200] rev 32
[Distance] Spaces are correctly supported by distance functions. The problem was that 'Victor Hugo' and 'Hugo Victor' had a large distance, whereas in this case we would like a small one (even zero!). So the approach followed is to construct a distance matrix:
       | Victor | Hugo
Victor |   0    |  5
Hugo   |   5    |  0
and return the minimum of the minimum of each row. In fact, we return the maximum of the minimum of the previous matrix and of its transpose, to handle the following case:
       | Victor | Hugo | Jean
Victor |   0    |  5   |  6
Hugo   |   5    |  0   |  4
--> min of each row: 0
       | Victor | Hugo
Victor |   0    |  5
Hugo   |   5    |  0
Jean   |   6    |  4
--> min of each row: 4
Return the max, i.e. 4.
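The approach described in this changeset can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the committed code: `token_distance` is a hypothetical name, tokens are split on whitespace, and a plain Levenshtein distance is used as the per-token distance.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,       # deletion
                               current[j - 1] + 1,    # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def token_distance(s1, s2, distance=levenshtein):
    """Max of the row minimums of the token matrix and of its transpose."""
    tokens1, tokens2 = s1.split(), s2.split()
    if not tokens1 or not tokens2:
        return distance(s1, s2)
    matrix = [[distance(t1, t2) for t2 in tokens2] for t1 in tokens1]
    row_mins = [min(row) for row in matrix]
    col_mins = [min(col) for col in zip(*matrix)]  # rows of the transpose
    return max(max(row_mins), max(col_mins))


print(token_distance('Victor Hugo', 'Hugo Victor'))       # 0
print(token_distance('Victor Hugo', 'Victor Hugo Jean'))  # 4
```

Taking the transpose into account is what makes the extra token 'Jean' count: its best match is 'Hugo', at distance 4.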
Fri, 19 Oct 2012 11:38:54 +0200 [Matrix] Cannot use zip if it is not a square matrix!
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 11:38:54 +0200] rev 31
[Matrix] Cannot use zip if it is not a square matrix!
Fri, 19 Oct 2012 11:21:44 +0200 [Matrix] Matched() returns index and value, as tuples
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 11:21:44 +0200] rev 30
[Matrix] Matched() returns index and value, as tuples
Fri, 19 Oct 2012 10:46:14 +0200 [Matrix] Matched() has a lower complexity
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 10:46:14 +0200] rev 29
[Matrix] Matched() has a lower complexity. Instead of reading the whole matrix and getting the indexes where the value is under the cutoff (O(N²)), the idea is:
- Get all indexes where the value is not null (i.e. not an exact match): O(N)
- Append all indexes that are not in the previous list (exact matches): O(N)
- If cutoff > 0, for all indexes where the value is not null, test whether value < cutoff, and add the index or not: O(N)
Fri, 19 Oct 2012 16:58:27 +0200 [Matrix] Matrices are not symmetric. Correct it
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 16:58:27 +0200] rev 28
[Matrix] Matrices are not symmetric. Correct it
Fri, 19 Oct 2012 10:06:50 +0200 [Test] Ensure distances are symmetric
Simon Chabot <simon.chabot@logilab.fr> [Fri, 19 Oct 2012 10:06:50 +0200] rev 27
[Test] Ensure distances are symmetric
Thu, 18 Oct 2012 19:01:27 +0200 [Test] Add tests for matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 19:01:27 +0200] rev 26
[Test] Add tests for matrix
Thu, 18 Oct 2012 18:18:40 +0200 [Test] Tests no longer inherit from CWTestCase but directly from unittest2 (to be independent)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 18:18:40 +0200] rev 25
[Test] Tests no longer inherit from CWTestCase but directly from unittest2 (to be independent)
Thu, 18 Oct 2012 18:15:25 +0200 [Normalizer] Lemmatizer returns a string (better for comparison)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 18:15:25 +0200] rev 24
[Normalizer] Lemmatizer returns a string (better for comparison)
Thu, 18 Oct 2012 17:58:53 +0200 [Matrix] Enables normalization
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:58:53 +0200] rev 23
[Matrix] Enables normalization
Thu, 18 Oct 2012 17:58:28 +0200 [Test] Distance, test euclidean distance on strings too
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:58:28 +0200] rev 22
[Test] Distance, test euclidean distance on strings too
Thu, 18 Oct 2012 17:16:44 +0200 [normalize] Using a class was a bad idea, I removed it
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 17:16:44 +0200] rev 21
[normalize] Using a class was a bad idea, I removed it
Thu, 18 Oct 2012 16:35:18 +0200 [Matrix] Compute a distance matrix
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 16:35:18 +0200] rev 20
[Matrix] Compute a distance matrix
Thu, 18 Oct 2012 15:01:47 +0200 [Distance] Exchange 1 and 0 in the soundex distance, because we try to minimize the distance
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 15:01:47 +0200] rev 19
[Distance] Exchange 1 and 0 in the soundex distance, because we try to minimize the distance
Thu, 18 Oct 2012 13:55:25 +0200 [Distance] Temporal distance supports ambiguity and fuzziness
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 13:55:25 +0200] rev 18
[Distance] Temporal distance supports ambiguity and fuzziness
- You can specify whether the day or the year comes first (day/month/year, year/month/day or month/day/year format). By default, it assumes the format is the one commonly used in French, i.e. day/month/year
- You can give fuzzy sentences and compare dates: temporal('Jean est né le 1er octobre 1958', 'Le 01-10-1958, Jean est né') yields 0!
Thu, 18 Oct 2012 12:22:30 +0200 [Normalizer] Add the format method (related to #128998)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 12:22:30 +0200] rev 17
[Normalizer] Add the format method (related to #128998)
Thu, 18 Oct 2012 11:41:59 +0200 Add LGPL to distance.py and normalize.py
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 11:41:59 +0200] rev 16
Add LGPL to distance.py and normalize.py
Thu, 18 Oct 2012 10:19:15 +0200 [Normalizer] Add a normaliser (related to #128998)
Simon Chabot <simon.chabot@logilab.fr> [Thu, 18 Oct 2012 10:19:15 +0200] rev 15
[Normalizer] Add a normaliser (related to #128998)
- unormalize
- tokenize
- lemmatize
- round
Wed, 17 Oct 2012 16:47:07 +0200 [distances] Add a Euclidean distance function between two numbers (closes #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:47:07 +0200] rev 14
[distances] Add a Euclidean distance function between two numbers (closes #128982)
Wed, 17 Oct 2012 15:31:11 +0200 [distance] Add a new distance between dates (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 15:31:11 +0200] rev 13
[distance] Add a new distance between dates (related #128982)
Wed, 17 Oct 2012 12:34:28 +0200 [distances] Add jaccard distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:34:28 +0200] rev 12
[distances] Add jaccard distance (related #128982)
Wed, 17 Oct 2012 12:05:02 +0200 [distances] Add the soundex distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:05:02 +0200] rev 11
[distances] Add the soundex distance (related #128982)
Wed, 17 Oct 2012 12:04:41 +0200 [distance] move soundex to soundexcode (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 12:04:41 +0200] rev 10
[distance] move soundex to soundexcode (related #128982) In fact, soundexcode is the function returning the soundex code of a word, and soundex will be the 1/0 distance between two words. (1 meaning both have the same code, 0 otherwise)
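The split between the two functions can be illustrated with a short sketch. It follows the classic American Soundex rules, which may differ from the variant implemented in the repository; the function bodies are assumptions, and only the names `soundexcode` and `soundex` and the 1/0 convention come from this changeset.

```python
# Letter groups of the classic American Soundex (assumed variant).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundexcode(word):
    """Return the 4-character Soundex code of a word."""
    word = "".join(c for c in word.upper() if c.isalpha())
    if not word:
        return ""
    code = word[0]
    previous = CODES.get(word[0], "")
    for letter in word[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "HW":
            # A vowel resets `previous`, so identical codes separated by a
            # vowel are counted twice; H and W do not separate them.
            previous = digit
    return (code + "000")[:4]


def soundex(word1, word2):
    """1 if both words share the same Soundex code, 0 otherwise."""
    return 1 if soundexcode(word1) == soundexcode(word2) else 0


print(soundexcode("Robert"), soundexcode("Rupert"))  # R163 R163
print(soundex("Robert", "Rupert"))                   # 1
```

The 1/0 convention shown here is the one stated in this changeset; rev 19 later exchanges the two values so that a smaller result means a closer match.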
Wed, 17 Oct 2012 11:56:22 +0200 [distances] Remove some trailing spaces (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:56:22 +0200] rev 9
[distances] Remove some trailing spaces (related #128982)
Wed, 17 Oct 2012 11:56:04 +0200 [distance] Start the iteration at 1, not 0 because we don't care about the first
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:56:04 +0200] rev 8
[distance] Start the iteration at 1, not 0 because we don't care about the first letter (related #128982)
Wed, 17 Oct 2012 11:55:23 +0200 [distances] Correct an IndexError in the soundex code (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:55:23 +0200] rev 7
[distances] Correct an IndexError in the soundex code (related #128982). Since we don't know whether word[i + 2] is a consonant, we use get, because it can be a vowel (and crash…)
Wed, 17 Oct 2012 11:53:17 +0200 [distance] Soundex: if there is a vowel between two identical numbered
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:53:17 +0200] rev 6
[distance] Soundex: if there is a vowel between two identical numbered consonants, count those consonants twice. (related #128982)
Wed, 17 Oct 2012 11:51:42 +0200 [test] Add some other tests to soundex, and some explanations too
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:51:42 +0200] rev 5
[test] Add some other tests to soundex, and some explanations too (related #128982)
Wed, 17 Oct 2012 11:50:55 +0200 [test] The test for soundex was wrong, I corrected it (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 11:50:55 +0200] rev 4
[test] The test for soundex was wrong, I corrected it (related #128982)
Wed, 17 Oct 2012 16:43:42 +0200 [tests] Add tests for soundex and levenshtein (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:43:42 +0200] rev 3
[tests] Add tests for soundex and levenshtein (related #128982)
Wed, 17 Oct 2012 16:42:48 +0200 [distances] Add soundex code (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:42:48 +0200] rev 2
[distances] Add soundex code (related #128982)
Wed, 17 Oct 2012 16:40:51 +0200 [distance] add Levenshtein distance (related #128982)
Simon Chabot <simon.chabot@logilab.fr> [Wed, 17 Oct 2012 16:40:51 +0200] rev 1
[distance] add Levenshtein distance (related #128982)
Fri, 12 Oct 2012 10:23:58 +0200 Initial commit
Simon Chabot <simon.chabot@logilab.fr> [Fri, 12 Oct 2012 10:23:58 +0200] rev 0
Initial commit