with @ezs/analytics@1.19.1 [expand] is not necessary #14

Closed thouveni wants to merge 1 commit into tdm:master from tdm:more-easy-mapping
@thouveni thouveni commented on 5 Sep 2022

To be validated before merging.

It works well when the same request is run several times: the first run includes the time needed to load the table, and every later run stays under 300 ms.

Edit: in VSCode.

Since the largest table is the halAuthorId/idRef one, I tested it through the examples.http file. The first request took 9 s. The following ones took between 100 ms and 220 ms.

But when I try a request with values that have not been seen yet, it goes back to taking more than 11 s:

$ time curl -X 'POST' \
  'http://localhost:31976/v1/halAuthorId/idRef/json' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "id": 0,
    "value": "https://data.archives-ouvertes.fr/author/11209187"
  },
  { "id": 1, "value": 619030 }
]'
[{
    "id": 0,
    "value": "http://www.idref.fr/254974716/id"
},
{
    "id": 1,
    "value": "http://www.idref.fr/254015476/id"
}]
curl -X 'POST' 'http://localhost:31976/v1/halAuthorId/idRef/json' -H  -H  -d   0,00s user 0,01s system 0% cpu 11,128 total

That is 11.128 seconds.

More surprisingly, when I rerun the same request a few minutes later, it still takes 10.7 s (I expected it to be around 200 ms).

$ time curl -X 'POST' \
  'http://localhost:31976/v1/halAuthorId/idRef/json' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "id": 0,
    "value": "https://data.archives-ouvertes.fr/author/11209187"
  },
  { "id": 1, "value": 619030 }
]'
[{
    "id": 0,
    "value": "http://www.idref.fr/254974716/id"
},
{
    "id": 1,
    "value": "http://www.idref.fr/254015476/id"
}]
curl -X 'POST' 'http://localhost:31976/v1/halAuthorId/idRef/json' -H  -H  -d   0,01s user 0,00s system 0% cpu 10,702 total

On the server side, it is indeed the combine statement that takes the time:

  ezs New connection 1662559553699-579604 +0ms
  ezs Create middleware 'knownPipeline' for POST /v1/halAuthorId/idRef/json +0ms
  ezs PID 14786 will execute /v1/halAuthorId/idRef/json commands with 0B of global parameters +0ms
  ezs ezs will not uncompress stream. +0ms
  ezs ezs will not compress stream. +0ms
  ezs These statements are registered: BUFObject,OBJCount,OBJNamespaces,OBJStandardize,OBJFlatten,TXTParse,TXTObject,TXTConcat,XMLParse,XMLString,XMLConvert,CSVParse,CSVObject,CSVString,JSONParse,JSONString,URLFetch,URLPagination,URLRequest,URLParse,URLString,URLStream,URLConnect,TXTZip,ZIPExtract,INIString,FILESave,bufferify,standardize,split,segmenter +0ms
  ezs These statements are registered: count,distinct,graph,pair,pluck,keys,minimizing,maximizing,merging,reducing,exploding,summing,groupingByEquality,groupingByModulo,groupingByLevenshtein,groupingByHamming,tune,value,sort,segment,output,slice,distribute,greater,less,drop,filter,multiply,distance,aggregate,statistics,combine,expand,files,bufferize,buffers,upload,throttle +0ms
  ezs [combine] with sub pipeline. +0ms
  ezs DB from /tmp/store/combine/95d48e42-5872-411e-b1a4-109e9d87df21/14786 was created +0ms
  ezs 2 chunks have been delegated +0ms
  ezs 0.0043s cumulative 0.0006s elapsed for [transit] +0ms
  ezs 0.0111s cumulative 0.0060s elapsed for [transit] +0ms
  ezs 0.0100s cumulative 0.0057s elapsed for [JSONParse] +0ms
  ezs 0.0095s cumulative 0.0064s elapsed for [assign] +0ms
  ezs 9.0275s cumulative 9.0247s elapsed for [files] +0ms
  ezs 10.0403s cumulative 0.0181s elapsed for [CSVParse] +0ms
  ezs 10.0457s cumulative 5.0026s elapsed for [CSVObject] +0ms
  ezs 10.0453s cumulative 7.0269s elapsed for [replace] +0ms
  ezs 10.0450s cumulative 10.0348s elapsed for [saveIn] +0ms
  ezs DB from /tmp/store/combine/95d48e42-5872-411e-b1a4-109e9d87df21/14786 is closing +0ms
  ezs DB from /tmp/store/combine/95d48e42-5872-411e-b1a4-109e9d87df21/14786 is clearing +0ms
  ezs 10.0524s cumulative 10.0496s elapsed for [combine] +0ms
  ezs 10.0518s cumulative 0.0003s elapsed for [assign] +0ms
  ezs 10.0512s cumulative 0.0003s elapsed for [JSONString] +0ms
  ezs 10.0955s cumulative 0.0465s elapsed for [delegate] +0ms
  ezs 10.0943s cumulative 0.0004s elapsed for [unamed] +0ms
  ezs Connection closed 1662559553699-579604 +0ms
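To read a trace like the one above more quickly, the timing lines can be ranked by elapsed time. The following is a minimal sketch, not part of ezs: the function name `slowest_statements` and the regular expression are assumptions based only on the line format visible in this trace (`<n>s cumulative <n>s elapsed for [<statement>]`).

```python
import re

# Assumed line format, taken from the trace above, e.g.:
#   ezs 9.0275s cumulative 9.0247s elapsed for [files] +0ms
TIMING = re.compile(
    r"(?P<cumulative>\d+\.\d+)s cumulative\s+"
    r"(?P<elapsed>\d+\.\d+)s elapsed for \[(?P<name>[^\]]+)\]"
)


def slowest_statements(trace, top=3):
    """Return the `top` (statement, elapsed_seconds) pairs, slowest first."""
    timings = []
    for line in trace.splitlines():
        match = TIMING.search(line)
        if match:
            timings.append((match.group("name"), float(match.group("elapsed"))))
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]


# A few lines copied from the trace above.
sample = """\
  ezs 0.0095s cumulative 0.0064s elapsed for [assign] +0ms
  ezs 9.0275s cumulative 9.0247s elapsed for [files] +0ms
  ezs 10.0450s cumulative 10.0348s elapsed for [saveIn] +0ms
  ezs 10.0524s cumulative 10.0496s elapsed for [combine] +0ms
  ezs 10.0512s cumulative 0.0003s elapsed for [JSONString] +0ms
"""

for name, elapsed in slowest_statements(sample):
    print(f"{name}: {elapsed:.4f}s")
```

On the sample lines this ranks combine first, followed by saveIn and files, which matches the reading that the time is spent inside the [combine] sub-pipeline.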

The trace shows that combine still uses the /tmp directory, which suggests that the very latest version is not the one being used.

But in any case, you are right, it will remain slow: [combine] runs the sub-pipeline only once per dataset, but it does so again every time. To avoid that, [expand] has to be used, and therefore the original version... QED.
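The cost difference described here can be illustrated with a toy model. This is only a sketch under stated assumptions, not ezs code: `build_table` is a hypothetical stand-in for the sub-pipeline that reads the mapping file, `lookup_rebuilding` models a table rebuilt for every request (the behaviour attributed to [combine] above), and `lookup_cached` models a table built once and then reused. How ezs achieves the second behaviour internally is not shown.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) table build runs.
calls = {"build": 0}


def build_table():
    """Stand-in for the expensive sub-pipeline; values come from the requests above."""
    calls["build"] += 1
    return {
        "https://data.archives-ouvertes.fr/author/11209187": "http://www.idref.fr/254974716/id",
        619030: "http://www.idref.fr/254015476/id",
    }


def lookup_rebuilding(values):
    """Rebuild the whole table on every request, then look the values up."""
    table = build_table()
    return [table.get(value) for value in values]


@lru_cache(maxsize=1)
def cached_table():
    """Build the table on first use only; later calls reuse the same object."""
    return build_table()


def lookup_cached(values):
    """Look the values up in a table that is built once and kept in memory."""
    table = cached_table()
    return [table.get(value) for value in values]


# Three requests with each strategy.
for _ in range(3):
    lookup_rebuilding([619030])
rebuilds = calls["build"]

calls["build"] = 0
for _ in range(3):
    lookup_cached([619030])

print(rebuilds, calls["build"])  # prints "3 1"
```

In this model the rebuilding variant pays the build cost on each of the three requests, while the cached variant pays it once, which mirrors the roughly 10 s per request observed with [combine] versus a single slow first request.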

@thouveni thouveni referenced the pull request on 20 Sep 2022

so for now expand is necessary

@thouveni thouveni closed this pull request on 20 Sep 2022
2 participants
@thouveni @parmentf