Difference between revisions of «Usuario discusión:Peter Bowman/Archivo 3»

 
--[[Usuario:Kvdrgeus|Kvdrgeus]] ([[Usuario discusión:Kvdrgeus|discusión]]) 09:29 29 mar 2018 (UTC)
 
Hi Peter,
 
For many years now I have been running an off-site analysis of all the page contents of a wiki.
I started with the Spanish dictionary many years ago and corrected many errors.
I also did a lot of Spanish verb conjugations.
(I learned Spanish that way, which was necessary because I have lived on the Canary Islands since my retirement.)
Then I switched to the Dutch dictionary, since Dutch is my native language and I know much more about it.
 
I have always worked by the slow method of fetching words one by one: requesting a word, analyzing the web page, and extracting the relevant information.
I started with all the words from https://es.wiktionary.org/wiki/Especial:Todas
(which in the beginning were not that many).
The updates, of course, came from https://es.wiktionary.org/wiki/Especial:CambiosRecientes
This way of working is not what is causing me a problem at the moment, although I am certainly interested in using the backups that you mentioned.
 
The problem is that I know exactly what I want to change in the words, but applying those changes takes a great deal of time (I use AutoHotkey for this purpose), especially when more than 100,000 words have to be changed.
 
NB: To see whether I could use the backups you mentioned to synchronize my system, I downloaded the file eswiktionary-20180320-pages-meta-current.xml.bz2 (68.5 MB) and extracted it with WinZip to a text file. This gave me a file of about 864 MB.
I have already written a program that extracts the contents (it was not difficult).
It extracted 28,684,456 records from the file and found 901,654 items (‘words’).
Building the whole database took 16 minutes.
(This is much slower than you mentioned, but in my case it also had to build the B-tree where I keep my info.)
I was surprised by the order of the items in the backup (complete disorder).
The backup is very handy for starting with a fresh database! But I will still use the old method to keep the system synchronized in the time between backups.
Thank you very much for the suggestions!
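
For anyone who wants to try the same extraction step, here is a minimal sketch in Python. It is not the program described above, just one possible approach. It assumes the dump is standard MediaWiki export XML with `<page>`, `<title>` and `<text>` elements, and it reads the .bz2 file directly instead of unpacking it first. The helper names `local_name`, `iter_pages` and `iter_dump` are made up for this illustration.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for every <page> element read from a binary stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                # An empty <text/> element has no text; treat it as an empty string.
                text = child.text or ""
        yield title, text
        # Release the children of the page that was just handled.
        elem.clear()


def iter_dump(path):
    """Stream pages straight from a .bz2 dump without extracting it to disk."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

Usage would look like `for title, text in iter_dump("eswiktionary-20180320-pages-meta-current.xml.bz2"): ...`, with each pair passed on to whatever builds the database. Because the pages are yielded one at a time, the unordered layout of the dump does not matter, as long as the database index (the B-tree in this case) does the sorting.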
 
 
Regards,
 
--[[Usuario:Kvdrgeus|Kvdrgeus]] ([[Usuario discusión:Kvdrgeus|discusión]]) 07:59 31 mar 2018 (UTC)