Sisyphe-go is a Go command-line application for the recursive analysis of the directories and files of a scientific publishing corpus.

Authored by Nacim on 7 Feb 2022.


Sisyphe-GO

Sisyphe is a generic recursive folder analyser for the terminal, written in Go.


Requirements

Tested with Golang 1.17

Works on Linux/OSX/Windows

With Docker, mount a corpus folder, then run:

docker-compose up -d
docker exec -it sisyphe_go_go_1 go run . -p corpuspath -n corpusname -o outputpath

Install locally

  1. Download the latest Sisyphe-go version.
  2. Build it: go build .
  3. That's it.

Help

Run go run . --help to print the usage and the list of options.

Options

--help  Output usage
-c      Configuration folder path
-n      Corpus name (default "test")
-o      Output directory where results are written
-p      Corpus path
-w      Count words in PDF files
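The "(default "test")" wording above matches the usage output of Go's standard flag package, so the options are probably declared with it, though that is an assumption. A minimal sketch under that assumption: the options struct and parseOptions function are hypothetical, and only the flag names, descriptions and the "test" default come from this README. One practical consequence: flag stops parsing at the first non-flag argument, so every flag must come before any positional path.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options mirrors the flags listed above (hypothetical struct name).
type options struct {
	ConfigPath string // -c
	CorpusName string // -n
	OutputPath string // -o
	CorpusPath string // -p
	CountWords bool   // -w
}

// parseOptions parses args (without the program name) with the standard
// flag package. -h / -help / --help make Parse return flag.ErrHelp after
// printing the usage.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("sisyphe", flag.ContinueOnError)
	fs.StringVar(&o.ConfigPath, "c", "", "Configuration folder path")
	fs.StringVar(&o.CorpusName, "n", "test", "Corpus name")
	fs.StringVar(&o.OutputPath, "o", "", "Output directory where results are written")
	fs.StringVar(&o.CorpusPath, "p", "", "Corpus path")
	fs.BoolVar(&o.CountWords, "w", false, "Count words in PDF files")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if errors.Is(err, flag.ErrHelp) {
		return // usage has already been printed
	}
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

For example, parsing "-p /data/corpus -n demo -w" fills CorpusPath, CorpusName and CountWords, whereas "/data/corpus -n demo" stops at the path and leaves the corpus name at its "test" default.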

How it works

Start Sisyphe on a folder containing any kind of file:

go run . -p ~/Documents/customfolder/corpus -n corpusname -o outputpath

To use a configuration folder, add -c:

go run . -p ~/Documents/customfolder/corpus -n corpusname -c ~/Documents/customfolder/corpusResources -o outputpath

Sisyphe now works in the background using all of your computer's threads. Grab a coffee and wait; it will let you know when it's done :)

Results are written to sisyphe/out/{timestamp}-corpusname/ (errors, info, duration, ...).


Modules

Sisyphe ships with a set of default modules, focused on XML and PDF.

Note: the module links need to be updated once the branch is merged.

  • FILETYPE: detects the MIME type and extension, flags corrupted files, ...
  • PDF: extracts information from PDF files (version, author, metadata, ...)
  • XML: checks that files are well-formed and valid against their DTD, and extracts element content, ...
  • XPATH: generates the complete list of XPaths found in the submitted folder
  • OUT: exports the data to a JSON file and an Elasticsearch database