FOSSology  3.2.0rc1
Open Source License Compliance by Open Source Software
doctorBuffer_utils.c File Reference

Doctor buffer utilities for debugging.

#include "doctorBuffer_utils.h"
#include "nomos.h"
#include "list.h"
#include "util.h"
#include "nomos_regex.h"

Macros

#define INVISIBLE   (int) '\377'
 

Functions

int compressDoctoredBuffer (char *textBuffer)
 Garbage-collect the buffer: eliminate all INVISIBLE characters.
 
void removeHtmlComments (char *buf)
 Remove HTML comments from buffer without removing comment text.
 
void removeLineComments (char *buf)
 Remove comments that start at the beginning of a line.
 
void cleanUpPostscript (char *buf)
 Remove newlines from buffer.
 
void removeBackslashesAndGTroffIndicators (char *buf)
 Remove groff/troff font-size indicators, the literal string backslash-n, and all backslashes.
 
void convertWhitespaceToSpaceAndRemoveSpecialChars (char *buf, int isCR)
 Convert white-space to real spaces, and remove unnecessary punctuation.
 
void dehyphen (char *buf)
 
void removePunctuation (char *buf)
 Clean up miscellaneous punctuation.
 
void ignoreFunctionCalls (char *buf)
 Ignore function calls to print routines.
 
void convertSpaceToInvisible (char *buf)
 
void doctorBuffer (char *buf, int isML, int isPS, int isCR)
 Convert a buffer of mixed content to text only, with words separated by spaces.
 

Detailed Description

Doctor buffer utilities for debugging.

Definition in file doctorBuffer_utils.c.

Function Documentation

void cleanUpPostscript ( char *  buf)

Remove newlines from buffer.

Parameters
[in,out] buf

Definition at line 231 of file doctorBuffer_utils.c.

int compressDoctoredBuffer ( char *  textBuffer)

Garbage-collect the buffer: eliminate all INVISIBLE characters.

Parameters
[in,out] textBuffer  Buffer to compress
Returns
Size difference between the original and the compressed buffer

Definition at line 36 of file doctorBuffer_utils.c.
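
The compaction described above can be illustrated with a minimal sketch. The function name `compress_sketch` is hypothetical, and the code is only one plausible reading of the description (copy every non-INVISIBLE byte down, return the number of bytes dropped), not the FOSSology implementation.

```c
#include <assert.h>
#include <string.h>

#define INVISIBLE (int) '\377'  /* marker value as documented on this page */

/* Hypothetical stand-in for compressDoctoredBuffer(): drops every INVISIBLE
 * byte in place and returns how many bytes were removed. */
int compress_sketch(char *textBuffer)
{
    char *src, *dst;

    assert(textBuffer != NULL);
    for (src = dst = textBuffer; *src != '\0'; src++) {
        if (*src != INVISIBLE) {
            *dst++ = *src;      /* keep visible bytes, closing the gaps */
        }
    }
    *dst = '\0';
    return (int) (src - dst);   /* old length minus new length */
}
```

Because earlier stages can mark bytes as INVISIBLE without moving data, a single pass like this at the end is enough to shrink the buffer.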

void convertSpaceToInvisible ( char *  buf)

Convert the regex ' [X ]+' (where X is really the character #defined as INVISIBLE) to a single space (and a string of INVISIBLE characters).

Parameters
[in,out] buf

Definition at line 541 of file doctorBuffer_utils.c.
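
A minimal sketch of the ' [X ]+' rewrite described above, under the assumption that the first space of each run is kept and every following space or INVISIBLE byte in the run is overwritten with INVISIBLE. The name `space_to_invisible_sketch` is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define INVISIBLE (int) '\377'  /* marker value as documented on this page */

/* Hypothetical stand-in for convertSpaceToInvisible(). */
void space_to_invisible_sketch(char *buf)
{
    char *p;

    assert(buf != NULL);
    for (p = buf; *p != '\0'; p++) {
        if (*p == ' ') {
            /* Keep this space; blank the run of spaces/INVISIBLEs after it. */
            while (p[1] == ' ' || p[1] == INVISIBLE) {
                *++p = (char) INVISIBLE;
            }
        }
    }
}
```

The buffer length is unchanged; the extra bytes only disappear once a compaction pass removes the INVISIBLE markers.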

void convertWhitespaceToSpaceAndRemoveSpecialChars ( char *  buf,
int  isCR 
)

Convert white-space to real spaces, and remove unnecessary punctuation.

`tr -d '*=+#$|%.,:;!?()\][\140\047\042' | tr '\011\012\015' ' '`

Parameters
[in,out] buf
Note
We purposely do NOT process backspace-characters here. Perhaps there's an improvement in the wings for this?

Definition at line 296 of file doctorBuffer_utils.c.
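
The `tr` pipeline quoted above can be rendered literally in C as follows. This is a sketch of the pipeline only, not of the function's actual code: the name `tr_sketch` is hypothetical, the delete set assumes that `\]` in the quoted command stands for a plain `]` (so a backslash is not deleted), and the real function may mark bytes as INVISIBLE rather than shifting data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical literal rendering of:
 *   tr -d '*=+#$|%.,:;!?()\][\140\047\042' | tr '\011\012\015' ' '     */
void tr_sketch(char *buf)
{
    /* Delete set; \140 \047 \042 are backquote, single and double quote. */
    static const char drop[] = "*=+#$|%.,:;!?()][`'\"";
    char *src, *dst;

    assert(buf != NULL);
    for (src = dst = buf; *src != '\0'; src++) {
        if (strchr(drop, *src) != NULL) {
            continue;                       /* first tr: delete */
        }
        if (*src == '\t' || *src == '\n' || *src == '\r') {
            *dst++ = ' ';                   /* second tr: map to space */
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}
```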

void dehyphen ( char *  buf)

Look for hyphenations of words, to compress both halves into a single (sic) word. Regex == "[a-z]- [a-z]".

Parameters
[in,out] buf
Note
Not sure this will work based on the way we strip punctuation out of the buffer above – work on this later.

Definition at line 437 of file doctorBuffer_utils.c.
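
To make the "[a-z]- [a-z]" pattern concrete, here is a minimal sketch. It assumes the hyphen and the space are overwritten with INVISIBLE so that a later compaction joins the two halves; the name `dehyphen_sketch` and that joining strategy are assumptions, not taken from the source.

```c
#include <assert.h>
#include <string.h>

#define INVISIBLE (int) '\377'  /* marker value as documented on this page */

static int is_lower_ascii(char c)
{
    return c >= 'a' && c <= 'z';
}

/* Hypothetical stand-in for dehyphen(): blanks the "- " between two
 * lowercase letters. */
void dehyphen_sketch(char *buf)
{
    char *p;

    assert(buf != NULL);
    /* Stop as soon as fewer than four bytes remain. */
    for (p = buf; p[0] && p[1] && p[2] && p[3]; p++) {
        if (is_lower_ascii(p[0]) && p[1] == '-' &&
            p[2] == ' ' && is_lower_ascii(p[3])) {
            p[1] = (char) INVISIBLE;
            p[2] = (char) INVISIBLE;
        }
    }
}
```

As the note above warns, a pattern like this only fires if earlier stages have left the hyphen in the buffer.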

void doctorBuffer ( char *  buf,
int  isML,
int  isPS,
int  isCR 
)

Convert a buffer of mixed content to text only, with words separated by spaces.

The steps followed in this function are:

  1. Filter HTML/XML comments using removeHtmlComments()
  2. Filter code comments using removeLineComments()
  3. Filter PostScript using cleanUpPostscript()
  4. Filter groff/troff using removeBackslashesAndGTroffIndicators()
  5. Filter spaces and special characters using convertWhitespaceToSpaceAndRemoveSpecialChars()
  6. Filter hyphen strings using dehyphen()
  7. Filter punctuation using removePunctuation()
  8. Ignore print routines using ignoreFunctionCalls()
  9. Filter spaces using convertSpaceToInvisible()
  10. Compress the buffer using compressDoctoredBuffer()

Parameters
[in,out] buf  Buffer to filter
[in] isML  Buffer contains HTML/XML data
[in] isPS  Buffer contains PostScript data
[in] isCR

Definition at line 586 of file doctorBuffer_utils.c.
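
The ordering of the ten steps can be sketched with placeholder stages that only record that they ran. Everything here is hypothetical: the `stub_*` functions, the `trace` log, and in particular the assumption that steps 1 and 3 are skipped unless isML and isPS are set. The page does not say how the flags are used, nor which stage consumes isCR beyond the signature shown above.

```c
#include <assert.h>
#include <string.h>

static char trace[16];  /* one tag per stage that ran, in call order */

/* Append a stage tag to the trace (illustration only). */
static void note(char tag)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace) {
        trace[n] = tag;
        trace[n + 1] = '\0';
    }
}

/* Hypothetical placeholders for the ten documented stages. */
static void stub_html(char *b)       { (void) b; note('H'); }
static void stub_line(char *b)       { (void) b; note('L'); }
static void stub_postscript(char *b) { (void) b; note('O'); }
static void stub_troff(char *b)      { (void) b; note('T'); }
static void stub_whitespace(char *b, int isCR)
{
    (void) b; (void) isCR; note('W');
}
static void stub_dehyphen(char *b)   { (void) b; note('D'); }
static void stub_punct(char *b)      { (void) b; note('P'); }
static void stub_calls(char *b)      { (void) b; note('F'); }
static void stub_spaces(char *b)     { (void) b; note('S'); }
static int  stub_compress(char *b)   { (void) b; note('C'); return 0; }

void doctor_sketch(char *buf, int isML, int isPS, int isCR)
{
    assert(buf != NULL);
    trace[0] = '\0';
    if (isML) stub_html(buf);        /* assumption: gated on isML */
    stub_line(buf);
    if (isPS) stub_postscript(buf);  /* assumption: gated on isPS */
    stub_troff(buf);
    stub_whitespace(buf, isCR);
    stub_dehyphen(buf);
    stub_punct(buf);
    stub_calls(buf);
    stub_spaces(buf);
    (void) stub_compress(buf);       /* compaction runs last */
}
```

Compaction comes last because the earlier stages can mark bytes for removal without shifting the buffer.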

void ignoreFunctionCalls ( char *  buf)

Ignore function calls to print routines.

Only concentrate on what's being printed (sometimes programs do print licensing information) – but don't ignore real words that END in 'print', like footprint and fingerprint.

Here, we take a risk and just look for a 't' (in "footprint"), or for an 'r' (in "fingerprint"). If someone has ever coded a print routine that is named 'rprint' or 'tprint', we're spoofed.

Parameters
[in,out] buf

Definition at line 516 of file doctorBuffer_utils.c.

void removeBackslashesAndGTroffIndicators ( char *  buf)

Remove groff/troff font-size indicators, the literal string backslash-n, and all backslashes, as in:

`perl -pe 's,\s[+-][0-9]*,,g;s,\s[0-9]*,,g;s/\n//g;' | f`

Parameters
[in,out] buf

Definition at line 257 of file doctorBuffer_utils.c.

void removeHtmlComments ( char *  buf)

Remove HTML comments from buffer without removing comment text.

Parameters
[in,out] buf

Definition at line 54 of file doctorBuffer_utils.c.
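
One way to remove comment delimiters while keeping the comment text is sketched below. It assumes only the `<!--` and `-->` markers are blanked with INVISIBLE and that an unterminated comment loses just its opener. The name `html_sketch` is hypothetical and the real function may handle more cases.

```c
#include <assert.h>
#include <string.h>

#define INVISIBLE (int) '\377'  /* marker value as documented on this page */

/* Hypothetical stand-in for removeHtmlComments(): blanks the comment
 * delimiters and leaves the text between them untouched. */
void html_sketch(char *buf)
{
    char *p = buf;

    assert(buf != NULL);
    while ((p = strstr(p, "<!--")) != NULL) {
        char *end = strstr(p + 4, "-->");

        memset(p, INVISIBLE, 4);        /* blank the opener */
        if (end == NULL) {
            break;                      /* unterminated comment */
        }
        memset(end, INVISIBLE, 3);      /* blank the closer */
        p = end + 3;
    }
}
```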

void removeLineComments ( char *  buf)

Remove comments that start at the beginning of a line.

Handles comment markers such as *, ^dnl, ^xcomm, ^comment, and //, preserving the comment text.

Parameters
[in,out] buf

When MODULE_LICENSE("GPL") is commented out, that line is not removed.

Definition at line 131 of file doctorBuffer_utils.c.

void removePunctuation ( char *  buf)

Clean up miscellaneous punctuation.

`perl -pe 's,[-_/]+ , ,g;s/print[_a-zA-Z]* //g;s/ / /g;'`

Parameters
[in,out] buf

Definition at line 478 of file doctorBuffer_utils.c.
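
The first substitution in the perl expression above (`s,[-_/]+ , ,g`) can be sketched in C as follows. Only that first substitution is covered. The name `punct_sketch` is hypothetical, and the choice to overwrite the punctuation run with INVISIBLE while keeping the trailing space is an assumption about how an in-place version would preserve buffer length.

```c
#include <assert.h>
#include <string.h>

#define INVISIBLE (int) '\377'  /* marker value as documented on this page */

/* Hypothetical sketch of s,[-_/]+ , ,g: a run of '-', '_' or '/' that is
 * immediately followed by a space is blanked; the space itself stays. */
void punct_sketch(char *buf)
{
    char *p, *q;

    assert(buf != NULL);
    for (p = buf; *p != '\0'; p = q) {
        q = p + 1;
        if (strchr("-_/", *p) == NULL) {
            continue;                   /* not the start of a run */
        }
        while (*q != '\0' && strchr("-_/", *q) != NULL) {
            q++;                        /* extend to the end of the run */
        }
        if (*q == ' ') {
            memset(p, INVISIBLE, (size_t) (q - p));
        }
    }
}
```

A run that is followed by a letter, as in "a-b", is left alone, which matches the regex requiring a space after the run.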