IRC3
===============

**IRC3** (_**I**ndexation par **R**echerche et **C**omparaison de **C**haînes de **C**aractères_ = indexing by search and comparison of character strings) is a simple and robust programme that searches a corpus of text files for fixed expressions belonging to a finite list (chemical names, scientific names of animals or plants, author names, etc.) and extracts them.

**N.B.**: the list of terms and the texts must be encoded in **UTF-8** (without [BOM](https://fr.wikipedia.org/wiki/Indicateur_d%27ordre_des_octets)).

### Usage

```
IRC3.pl -t table -r directory [ -e extension ]* [ -s output_file ] [ -l log ] [ -cq ]
IRC3.pl -t table -f input_file [ -s output_file ] [ -l log ] [ -cq ]
IRC3.pl -t table [ -e extension ]* [ -s output_file ] [ -l log ] [ -cq ]
IRC3.pl -h
```

### Options

```
-c    takes the letter case (uppercase/lowercase) of the searched terms into account
-e    indicates the extension (e.g. “.txt”) of the text files to process
      (you can give several extensions by repeating this option)
-f    indicates the name of the input file to process
-h    displays this help
-l    indicates the name of the log file in which the number of found terms
      and occurrences is recorded
-q    suppresses the display of the work progress (especially useful in a
      shell script)
-r    indicates the directory containing the files to process
-s    indicates the name of the output file
-t    indicates the name of the file containing the resource, i.e. the list
      of searched terms
```

### Resource

The resource file contains one line per term. You can indicate the preferential form of a term by adding it at the end of the line, after one or more tab characters. Empty lines and lines starting with the “#” character are ignored. The resource file may also be compressed with `gzip` or `bzip2`.

### Result

The output file contains one line per found occurrence.
Each line consists of 4 tab-separated fields, which are, respectively:

* the name of the processed file (“STDIN” for the standard input),
* the term as it appears in the resource,
* the term as it appears in the analysed text,
* the preferential form, in the case of a synonym.
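To make the resource and output formats more concrete, here is a minimal Python sketch of the same pipeline. It is not IRC3's implementation (IRC3 is a Perl script whose internals are not described here), and several details are assumptions: the function names `open_resource`, `load_resource` and `find_occurrences` are hypothetical, compression is detected from the file's magic bytes, a term only matches when it is not glued to other letters or digits, and an empty fourth field stands for "no preferential form". Each term is also searched independently, so a term nested inside a longer one is reported as well; IRC3 may treat such overlaps differently.

```python
import bz2
import gzip
import re


def open_resource(path):
    """Open a resource file as UTF-8 text, handling gzip or bzip2 transparently.

    Assumption: compression is detected from the magic bytes, not the file name.
    Like the README requires, the text is expected to be UTF-8 without a BOM
    (a BOM would not be stripped and would end up in the first term).
    """
    with open(path, "rb") as raw:
        magic = raw.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip signature
        return gzip.open(path, "rt", encoding="utf-8")
    if magic == b"BZh":  # bzip2 signature
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def load_resource(path):
    """Return a list of (term, preferential_form) pairs; the form may be ''."""
    entries = []
    with open_resource(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Empty (or whitespace-only) lines and "#" lines are ignored.
            if not line.strip() or line.startswith("#"):
                continue
            # The preferential form follows one or more tab characters.
            fields = [field for field in line.split("\t") if field]
            term = fields[0]
            preferred = fields[-1] if len(fields) > 1 else ""
            entries.append((term, preferred))
    return entries


def find_occurrences(name, text, entries, case_sensitive=False):
    """Yield output lines: file name, resource term, text form, preferential form."""
    flags = 0 if case_sensitive else re.IGNORECASE  # case_sensitive mimics -c
    hits = []
    for term, preferred in entries:
        # Hypothetical boundary rule: no letter/digit right before or after.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", flags)
        for match in pattern.finditer(text):
            hits.append((match.start(), term, match.group(0), preferred))
    # Report occurrences in text order (the sort is stable for equal offsets).
    hits.sort(key=lambda hit: hit[0])
    for _, term, found, preferred in hits:
        yield "\t".join((name, term, found, preferred))
```

For instance, `find_occurrences("STDIN", sys.stdin.read(), load_resource("table.txt"))` would mimic processing a single text read from the standard input, and passing `case_sensitive=True` plays the role of the `-c` option.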