Sisyphe-go is a Go command-line application for the recursive analysis of the directories and files of a scientific publishing corpus.


Sisyphe-GO

Sisyphe-go is a generic terminal application, written in Go, that analyses folders recursively.

Requirements

Tested with Go 1.18.

Works on Linux, OS X and Windows.

Create and set the following environment variables on the host machine:

  • WORK
  • CORPUS_RESOURCES
  • SISYPHE_OUT
  • ELASTIC_URL
  • ELASTIC_PORT
  • KIBANA_PORT
  • UID=$(id -u)
  • GID=$(id -g)

Execution

Generic analysis

docker-compose up -d
docker exec -t sisyphe-go_go_1 go run . -n corpusName -p corpusPath -o outputPath

Detailed analysis

docker exec -t sisyphe-go_go_1 go run . -n corpusName -c corpusResourcesPath -p corpusPath -o outputPath

Example:

docker exec -t sisyphe-go_go_1 go run . -n karger-ebooks-2022-08-08-detaillee -c /corpus-resources -p /work/sample/karger_2020_11_06

By default, the program writes its results to SISYPHE_OUT.

Install it locally

  1. Download the latest Sisyphe-go version.
  2. In the project folder, run: go build .
  3. That's it.

Help

go run . --help prints the usage.

Options

--help      Output usage
-c          Configuration (corpus resources) folder path
-n          Corpus name (default "test")
-o          Output directory where results are written
-p          Corpus path
-w          Count words in PDF files
-noanalyze  Disable analysis
-noindex    Disable indexation
-noxpath    Disable xpaths.csv file generation
-noattval   Generate xpaths.csv without attribute values

How does it work?

Start Sisyphe-go on a folder containing any kind of files, either for a generic analysis or, with -c, for a detailed one:

go run . -p ~/Documents/customfolder/corpus -n corpusname -o outputpath

go run . -p ~/Documents/customfolder/corpus -n corpusname -c ~/Documents/customfolder/corpusResources -o outputpath

Sisyphe-go now runs in the background using all of your computer's threads. Grab a coffee and wait; it will let you know when it is done :)

Sisyphe-go writes its results (errors, info, duration, ...) to outputpath/{timestamp}-corpusName/.

Test

Run go test.

For coverage: go test -cover

Modules

  • PDF: relies on the pdftotext and pdfinfo command-line tools
  • XML: relies on the xmlstarlet and xmllint command-line tools
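Calling these tools from Go goes through os/exec. The sketch below shows one plausible way to do it; Sisyphe-go's actual calls and options may differ, and all function names are hypothetical. It relies on documented tool behaviour: pdfinfo prints "Key: value" lines, pdftotext writes text to stdout when the output file is "-", and xmllint --noout exits with a non-zero status on malformed XML.

```go
package main

import (
	"bufio"
	"os/exec"
	"strings"
)

// parseKeyValues parses "Key: value" lines, the layout pdfinfo prints
// (e.g. "Pages:          12"). Only the first colon splits key from value.
func parseKeyValues(out string) map[string]string {
	info := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		info[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return info
}

// pdfInfo runs `pdfinfo file.pdf` and returns its fields.
func pdfInfo(path string) (map[string]string, error) {
	out, err := exec.Command("pdfinfo", path).Output()
	if err != nil {
		return nil, err
	}
	return parseKeyValues(string(out)), nil
}

// pdfWordCount runs `pdftotext file.pdf -` and counts whitespace-separated words.
func pdfWordCount(path string) (int, error) {
	out, err := exec.Command("pdftotext", path, "-").Output()
	if err != nil {
		return 0, err
	}
	return len(strings.Fields(string(out))), nil
}

// xmlWellFormed runs `xmllint --noout file.xml`; a non-zero exit status
// (malformed XML or unreadable file) is returned as an error.
func xmlWellFormed(path string) error {
	return exec.Command("xmllint", "--noout", path).Run()
}
```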