Headline

Tokenize 101repo files and compute some basic metrics

Description

This module looks up the GeSHi code in the metadata for a file and applies GeSHi and MegaLib to extract a token sequence from the file contents. What GeSHi considers a token for a given language is quite likely not exactly what a proper lexer for that language would consider a token, because GeSHi does not perform proper lexical analysis. In addition to the token sequence, this module also computes basic metrics such as "lines of code".
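To illustrate the difference between highlighter-style tokenization and proper lexical analysis, here is a minimal Python sketch. It does not use GeSHi or MegaLib; the token classes, the function names `tokenize` and `basic_metrics`, and the metric keys `lines` and `nonBlankLines` are all assumptions made for illustration and may not match what the module actually emits.

```python
import re

# Hypothetical, highlighter-style token classes (not GeSHi's real ones).
# Each alternative is tried in order at every position in the text.
TOKEN_PATTERN = re.compile(r'''
    (?P<comment>//[^\n]*)            # line comment up to end of line
  | (?P<string>"(?:\\.|[^"\\])*")    # double-quoted string with escapes
  | (?P<number>\d+(?:\.\d+)?)        # integer or simple decimal
  | (?P<word>[A-Za-z_]\w*)           # keyword or identifier (not distinguished)
  | (?P<symbol>[^\s\w])              # any other single non-space character
''', re.VERBOSE)


def tokenize(text):
    """Return a flat token sequence; whitespace is skipped, nothing is validated."""
    return [{"class": m.lastgroup, "text": m.group()}
            for m in TOKEN_PATTERN.finditer(text)]


def basic_metrics(text):
    """Compute two simple line-based metrics (key names are assumptions)."""
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "nonBlankLines": sum(1 for line in lines if line.strip()),
    }


if __name__ == "__main__":
    sample = 'int x = 42; // answer\n\nreturn "hi";\n'
    for token in tokenize(sample):
        print(token)
    print(basic_metrics(sample))
```

Note that this sketch, like a syntax highlighter, puts keywords and identifiers into one class and never rejects malformed input, which is the kind of imprecision the paragraph above warns about.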

Input

http://data.101companies.org/dumps/matches.json as prepared by Module matches101meta

Output

  • One *.tokens.json file per 101repo file whose metadata provides a GeSHi code
  • One *.metrics.json file per 101repo file whose metadata provides a GeSHi code
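The output layout can be sketched as follows in Python. This is only an illustration: it assumes that the derived file name is the source file name with `.tokens.json` or `.metrics.json` appended, and that derived files live under a separate target directory mirroring the repository layout. The helper names `derived_paths` and `write_derived` and the example paths are hypothetical.

```python
import json
from pathlib import Path


def derived_paths(source_file, target_root):
    """Map a repo-relative source path to its two derived JSON paths.

    Assumption: the suffix is appended to the full file name,
    e.g. Main.java -> Main.java.tokens.json.
    """
    target = Path(target_root) / source_file
    return {
        kind: target.with_name(f"{target.name}.{kind}.json")
        for kind in ("tokens", "metrics")
    }


def write_derived(source_file, target_root, tokens, metrics):
    """Write the token sequence and the metrics as two separate JSON files."""
    paths = derived_paths(source_file, target_root)
    payloads = {"tokens": tokens, "metrics": metrics}
    for kind, path in paths.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payloads[kind], indent=2), encoding="utf-8")
    return paths
```

A file without a GeSHi code in its metadata would simply be skipped before `write_derived` is called, so no derived files are produced for it.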

Metadata


Marcel Heinz edited this article at Thu, 09 Nov 2017 13:10:23 +0100