MOAB: Mesh Oriented datABase  (version 5.4.1)
MBMesquite::FileTokenizer Class Reference

Parse a file as space-separated tokens.

#include <FileTokenizer.hpp>

Public Member Functions

 FileTokenizer (std::FILE *file_ptr)
 constructor
 ~FileTokenizer ()
 destructor : closes file.
const char * get_string (MsqError &err)
 get next token
bool get_newline (MsqError &err)
 check for newline
bool get_doubles (size_t count, double *array, MsqError &err)
 Parse a sequence of double values.
bool get_floats (size_t count, float *array, MsqError &err)
 Parse a sequence of float values.
bool get_integers (size_t count, int *array, MsqError &err)
 Parse a sequence of integer values.
bool get_long_ints (size_t count, long *array, MsqError &err)
 Parse a sequence of integer values.
bool get_short_ints (size_t count, short *array, MsqError &err)
 Parse a sequence of integer values.
bool get_bytes (size_t count, unsigned char *array, MsqError &err)
 Parse a sequence of integer values.
bool get_booleans (size_t count, bool *array, MsqError &err)
 Parse a sequence of bit or boolean values.
bool eof () const
int line_number () const
void unget_token ()
bool match_token (const char *string, MsqError &err)
int match_token (const char *const *string_list, MsqError &err)

Private Member Functions

bool get_double_internal (double &result, MsqError &err)
bool get_long_int_internal (long &result, MsqError &err)
bool get_boolean_internal (bool &result, MsqError &err)
bool get_float_internal (float &result, MsqError &err)
bool get_integer_internal (int &result, MsqError &err)
bool get_short_int_internal (short &result, MsqError &err)
bool get_byte_internal (unsigned char &result, MsqError &err)

Private Attributes

std::FILE * filePtr
char buffer [512]
char * nextToken
char * bufferEnd
int lineNumber
char lastChar

Detailed Description

Parse a file as space-separated tokens.

Author:
Jason Kraftcheck
Date:
30 Sept 2004

Reads a file and splits it into whitespace-separated tokens. This class is provided in place of the standard C or C++ file parsing routines because it counts lines, which is useful for error reporting. It also provides utility methods for parsing VTK files, which is the intended use of this implementation.

Uses raw reads with internal buffering. A token may not exceed the buffer size.
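To illustrate why line counting matters for error reporting, here is a minimal, self-contained sketch of whitespace tokenization with line tracking. It is not the FileTokenizer implementation: tokenize_with_lines is a hypothetical helper that works on an in-memory string rather than a std::FILE*, and it has no buffer-size limit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not part of MOAB.
// Splits text on whitespace and records the line each token starts on.
std::vector<std::pair<std::string, int>> tokenize_with_lines( const std::string& text )
{
    std::vector<std::pair<std::string, int>> tokens;
    int line = 1;
    std::size_t i = 0;
    while( i < text.size() )
    {
        const unsigned char c = static_cast<unsigned char>( text[i] );
        if( std::isspace( c ) )
        {
            if( c == '\n' ) ++line;  // count lines while skipping whitespace
            ++i;
            continue;
        }
        const std::size_t start = i;
        while( i < text.size() && !std::isspace( static_cast<unsigned char>( text[i] ) ) )
            ++i;
        tokens.emplace_back( text.substr( start, i - start ), line );
    }
    return tokens;
}

// Small self-check: each token carries the line it was found on.
void tokenize_example()
{
    const auto tokens = tokenize_with_lines( "POINTS 2 float\n0.0 1.0\n" );
    assert( tokens.size() == 5 );
    assert( tokens[0].first == "POINTS" && tokens[0].second == 1 );
    assert( tokens[3].first == "0.0" && tokens[3].second == 2 );
}
```

With the line number attached to each token, a parser can report "Syntax error at line N" without rescanning the input.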

Definition at line 56 of file mesquite/Misc/FileTokenizer.hpp.


Constructor & Destructor Documentation

MBMesquite::FileTokenizer::FileTokenizer ( std::FILE *  file_ptr)

constructor

Parameters:
file_ptr  The file to read from.

Definition at line 36 of file mesquite/Misc/FileTokenizer.cpp.

    : filePtr( file_ptr ), nextToken( buffer ), bufferEnd( buffer ), lineNumber( 1 ), lastChar( '\0' )
{
}

MBMesquite::FileTokenizer::~FileTokenizer ( )

destructor : closes file.

The destructor closes the file handle passed to the constructor. This is done as a convenience feature. If the caller creates an instance of this object on the stack, the file is closed automatically when the object goes out of scope.

Definition at line 41 of file mesquite/Misc/FileTokenizer.cpp.

References filePtr.

{
    fclose( filePtr );
}
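Because the destructor closes the handle, the caller must not call fclose on the same pointer again. A minimal sketch of the same ownership pattern follows, using a hypothetical FileOwner class (not part of MOAB) that forbids copying so the handle is closed exactly once.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>

// Hypothetical owner type for illustration; takes over a FILE* and closes it.
class FileOwner
{
  public:
    explicit FileOwner( std::FILE* f ) : file_( f )
    {
        assert( f != nullptr );
    }
    ~FileOwner()
    {
        std::fclose( file_ );
    }
    // Copying would lead to two fclose calls on one handle, so forbid it.
    FileOwner( const FileOwner& )            = delete;
    FileOwner& operator=( const FileOwner& ) = delete;

    std::FILE* get() const
    {
        return file_;
    }

  private:
    std::FILE* file_;
};

static_assert( !std::is_copy_constructible<FileOwner>::value, "FileOwner must not be copyable" );

// Returns false only if no temporary file could be created.
bool owner_example()
{
    std::FILE* f = std::tmpfile();
    if( !f ) return false;
    FileOwner owner( f );     // ownership passes to owner
    return owner.get() == f;  // f is closed when owner goes out of scope
}
```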

Member Function Documentation

bool MBMesquite::FileTokenizer::eof ( ) const

Check for end-of-file condition.

Definition at line 46 of file mesquite/Misc/FileTokenizer.cpp.

References bufferEnd, filePtr, and nextToken.

Referenced by get_newline(), MBMesquite::MeshImpl::read_vtk(), FileTokenizerTest::token_test(), and MBMesquite::MeshImpl::vtk_read_dataset().

{
    return nextToken == bufferEnd && feof( filePtr );
}
bool MBMesquite::FileTokenizer::get_boolean_internal ( bool &  result,
MsqError &  err 
) [private]

Internal implementation of get_booleans()

Definition at line 254 of file mesquite/Misc/FileTokenizer.cpp.

References get_string(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_booleans().

{
    // Get a token
    const char* token = get_string( err );
    if( MSQ_CHKERR( err ) ) return false;

    if( token[1] || ( token[0] != '0' && token[0] != '1' ) )
    {
        MSQ_SETERR( err )
        ( MsqError::PARSE_ERROR, "Syntax error at line %d: expected 0 or 1, got \"%s\"", line_number(), token );
        return false;
    }

    result = token[0] == '1';
    return true;
}
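The check above accepts only the one-character tokens "0" and "1". A standalone sketch of that rule follows; parse_bit is a hypothetical helper, and the empty-token guard is an addition of this sketch rather than something taken from the listing.

```cpp
#include <cassert>

// Hypothetical helper mirroring the check above: a token is valid only if
// it is exactly the string "0" or "1".
bool parse_bit( const char* token, bool& result )
{
    assert( token != nullptr );
    if( token[0] == '\0' ) return false;  // guard added in this sketch
    if( token[1] != '\0' || ( token[0] != '0' && token[0] != '1' ) ) return false;
    result = ( token[0] == '1' );
    return true;
}
```

Under this rule, spellings such as "true" or "01" are rejected rather than interpreted.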
bool MBMesquite::FileTokenizer::get_booleans ( size_t  count,
bool *  array,
MsqError &  err 
)

Parse a sequence of bit or boolean values.

Read the specified number of space-delimited values. Each value must be the single character 0 or 1.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 336 of file mesquite/Misc/FileTokenizer.cpp.

References get_boolean_internal(), and MSQ_CHKERR.

Referenced by FileTokenizerTest::boolean_test(), and MBMesquite::MeshImpl::vtk_read_typed_data().

{
    for( size_t i = 0; i < count; ++i )
    {
        get_boolean_internal( *array, err );
        if( MSQ_CHKERR( err ) ) return false;
        ++array;
    }
    return true;
}
bool MBMesquite::FileTokenizer::get_byte_internal ( unsigned char &  result,
MsqError &  err 
) [private]

Internal implementation of get_bytes()

Definition at line 206 of file mesquite/Misc/FileTokenizer.cpp.

References get_long_int_internal(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_bytes().

{
    long i = 0;
    get_long_int_internal( i, err );
    if( MSQ_CHKERR( err ) ) return false;

    result = (unsigned char)i;
    if( i != (long)result )
    {
        MSQ_SETERR( err )( MsqError::PARSE_ERROR, "Numberic overflow at line %d.", line_number() );
        return false;
    }

    return true;
}
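The overflow test above is a round-trip cast: narrow the value, widen it back, and compare with the original. A generic sketch of the same idea follows; narrow_checked is a hypothetical name, not a MOAB function.

```cpp
#include <cassert>

// Hypothetical generic form of the round-trip test used above.
// Returns true only if value survives the conversion to T unchanged.
template < typename T >
bool narrow_checked( long value, T& result )
{
    result = static_cast< T >( value );
    return static_cast< long >( result ) == value;
}

void narrow_example()
{
    unsigned char byte = 0;
    assert( narrow_checked( 255L, byte ) && byte == 255 );
    assert( !narrow_checked( 256L, byte ) );  // wraps to 0, so the comparison fails
    assert( !narrow_checked( -1L, byte ) );   // wraps to 255, so the comparison fails
}
```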
bool MBMesquite::FileTokenizer::get_bytes ( size_t  count,
unsigned char *  array,
MsqError &  err 
)

Parse a sequence of integer values.

Read the specified number of space-delimited integers, each of which must fit in an unsigned char.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 292 of file mesquite/Misc/FileTokenizer.cpp.

References get_byte_internal(), and MSQ_CHKERR.

{
    for( size_t i = 0; i < count; ++i )
    {
        get_byte_internal( *array, err );
        if( MSQ_CHKERR( err ) ) return false;
        ++array;
    }
    return true;
}
bool MBMesquite::FileTokenizer::get_double_internal ( double &  result,
MsqError &  err 
) [private]

Internal implementation of get_doubles()

Definition at line 136 of file mesquite/Misc/FileTokenizer.cpp.

References get_string(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_doubles(), and get_float_internal().

{
    // Get a token
    const char *token_end, *token = get_string( err );
    if( MSQ_CHKERR( err ) ) return false;

    // Check for hex value -- on some platforms (e.g. Linux), strtod
    // will accept hex values, on others (e.g. Sun) it will not.  Force
    // failure on hex numbers for consistency.
    if( token[0] && token[1] && token[0] == '0' && toupper( token[1] ) == 'X' )
    {
        MSQ_SETERR( err )
        ( MsqError::PARSE_ERROR, "Syntax error at line %d: expected number, got \"%s\"", line_number(), token );
        return false;
    }

    // Parse token as double
    result = strtod( token, (char**)&token_end );

    // If the one past the last char read by strtod is
    // not the NULL character terminating the string,
    // then parse failed.
    if( *token_end )
    {
        MSQ_SETERR( err )
        ( MsqError::PARSE_ERROR, "Syntax error at line %d: expected number, got \"%s\"", line_number(), token );
        return false;
    }

    return true;
}
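The two checks above (reject a leading "0x", then require strtod to consume the whole token) can be tried in isolation. The sketch below assumes a hypothetical helper named parse_double_token; the extra end != token test for an empty token is an addition of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical helper: parse a whole token as a double, refusing hex input.
bool parse_double_token( const char* token, double& result )
{
    assert( token != nullptr );
    // Refuse "0x..." / "0X..." so behaviour does not depend on the platform's strtod.
    if( token[0] == '0' && std::toupper( static_cast< unsigned char >( token[1] ) ) == 'X' ) return false;

    char* end = nullptr;
    result    = std::strtod( token, &end );
    // Success only if at least one character was parsed and nothing is left over.
    return end != token && *end == '\0';
}
```

For example, "1.5e3" parses to 1500.0, while "0x10" and "1.5abc" are both rejected.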
bool MBMesquite::FileTokenizer::get_doubles ( size_t  count,
double *  array,
MsqError &  err 
)

Parse a sequence of double values.

Read the specified number of space-delimited doubles.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 281 of file mesquite/Misc/FileTokenizer.cpp.

References get_double_internal(), and MSQ_CHKERR.

Referenced by FileTokenizerTest::double_test(), MBMesquite::MeshImpl::vtk_read_polydata(), MBMesquite::MeshImpl::vtk_read_rectilinear_grid(), MBMesquite::MeshImpl::vtk_read_scalar_attrib(), MBMesquite::MeshImpl::vtk_read_structured_grid(), MBMesquite::MeshImpl::vtk_read_structured_points(), MBMesquite::MeshImpl::vtk_read_typed_data(), and MBMesquite::MeshImpl::vtk_read_unstructured_grid().

{
    for( size_t i = 0; i < count; ++i )
    {
        get_double_internal( *array, err );
        if( MSQ_CHKERR( err ) ) return false;
        ++array;
    }
    return true;
}
bool MBMesquite::FileTokenizer::get_float_internal ( float &  result,
MsqError &  err 
) [private]

Internal implementation of get_floats()

Definition at line 168 of file mesquite/Misc/FileTokenizer.cpp.

References get_double_internal(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_floats().

{
    double d = 0.0;
    get_double_internal( d, err );
    if( MSQ_CHKERR( err ) ) return false;

    result = (float)d;
    if( d != (double)result )
    {
        MSQ_SETERR( err )( MsqError::PARSE_ERROR, "Numberic overflow at line %d.", line_number() );
        return false;
    }

    return true;
}
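Reading the listing above as written, the comparison d != (double)result rejects any double that does not convert to float and back unchanged, not only values that are too large. The sketch below isolates that comparison in a hypothetical helper, fits_in_float, and assumes IEEE 754 float and double, where 0.5 is exactly representable as a float and 0.1 is not.

```cpp
#include <cassert>

// Hypothetical helper isolating the round-trip comparison used above.
bool fits_in_float( double d, float& result )
{
    result = static_cast< float >( d );
    return static_cast< double >( result ) == d;
}

void float_example()
{
    float f = 0.0f;
    assert( fits_in_float( 0.5, f ) && f == 0.5f );  // exactly representable
    assert( !fits_in_float( 0.1, f ) );              // rounded when narrowed to float
}
```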
bool MBMesquite::FileTokenizer::get_floats ( size_t  count,
float *  array,
MsqError &  err 
)

Parse a sequence of float values.

Read the specified number of space-delimited floats.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 271 of file mesquite/Misc/FileTokenizer.cpp.

References get_float_internal().

Referenced by MBMesquite::MeshImpl::vtk_read_color_attrib(), and MBMesquite::MeshImpl::vtk_read_scalar_attrib().

{
    for( size_t i = 0; i < count; ++i )
    {
        if( !get_float_internal( *array, err ) ) return false;
        ++array;
    }
    return true;
}
bool MBMesquite::FileTokenizer::get_integer_internal ( int &  result,
MsqError &  err 
) [private]

Internal implementation of get_integers()

Definition at line 238 of file mesquite/Misc/FileTokenizer.cpp.

References get_long_int_internal(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_integers().

{
    long i = 0;
    get_long_int_internal( i, err );
    if( MSQ_CHKERR( err ) ) return false;

    result = (int)i;
    if( i != (long)result )
    {
        MSQ_SETERR( err )( MsqError::PARSE_ERROR, "Numberic overflow at line %d.", line_number() );
        return false;
    }

    return true;
}
bool MBMesquite::FileTokenizer::get_integers ( size_t  count,
int *  array,
MsqError &  err 
)

Parse a sequence of integer values.

Read the specified number of space-delimited ints.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 314 of file mesquite/Misc/FileTokenizer.cpp.

References get_integer_internal(), and MSQ_CHKERR.

Referenced by MBMesquite::MeshImpl::vtk_read_cell_data(), MBMesquite::MeshImpl::vtk_read_field(), MBMesquite::MeshImpl::vtk_read_point_data(), MBMesquite::MeshImpl::vtk_read_texture_attrib(), and MBMesquite::MeshImpl::vtk_read_typed_data().

{
    for( size_t i = 0; i < count; ++i )
    {
        get_integer_internal( *array, err );
        if( MSQ_CHKERR( err ) ) return false;
        ++array;
    }
    return true;
}
bool MBMesquite::FileTokenizer::get_long_int_internal ( long &  result,
MsqError &  err 
) [private]

Internal implementation of get_long_ints()

Definition at line 184 of file mesquite/Misc/FileTokenizer.cpp.

References get_string(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_byte_internal(), get_integer_internal(), get_long_ints(), and get_short_int_internal().

{
    // Get a token
    const char *token_end, *token = get_string( err );
    if( MSQ_CHKERR( err ) ) return false;

    // Parse token as long
    result = strtol( token, (char**)&token_end, 0 );

    // If the one past the last char read by strtol is
    // not the NULL character terminating the string,
    // then parse failed.
    if( *token_end )
    {
        MSQ_SETERR( err )
        ( MsqError::PARSE_ERROR, "Syntax error at line %d: expected integer, got \"%s\"", line_number(), token );
        return false;
    }

    return true;
}
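Because strtol is called with base 0, the standard C prefix rules apply: a leading "0x" selects hexadecimal and a leading "0" selects octal. This differs from the double path, which refuses hex input. The sketch below shows the effect with a hypothetical helper named parse_long_token; the end != token test is an addition of this sketch.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parse a whole token as a long using base 0,
// so the prefix decides between decimal, octal and hexadecimal.
bool parse_long_token( const char* token, long& result )
{
    assert( token != nullptr );
    char* end = nullptr;
    result    = std::strtol( token, &end, 0 );
    // Success only if at least one character was parsed and nothing is left over.
    return end != token && *end == '\0';
}

void long_example()
{
    long v = 0;
    assert( parse_long_token( "42", v ) && v == 42 );
    assert( parse_long_token( "0x10", v ) && v == 16 );  // hexadecimal
    assert( parse_long_token( "010", v ) && v == 8 );    // octal
    assert( !parse_long_token( "12abc", v ) );           // trailing characters
}
```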
bool MBMesquite::FileTokenizer::get_long_ints ( size_t  count,
long *  array,
MsqError &  err 
)

Parse a sequence of integer values.

Read the specified number of space-delimited long ints.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 325 of file mesquite/Misc/FileTokenizer.cpp.

References get_long_int_internal(), and MSQ_CHKERR.

Referenced by FileTokenizerTest::long_test(), MBMesquite::MeshImpl::vtk_read_color_attrib(), MBMesquite::MeshImpl::vtk_read_dataset(), MBMesquite::MeshImpl::vtk_read_field_data(), MBMesquite::MeshImpl::vtk_read_polydata(), MBMesquite::MeshImpl::vtk_read_polygons(), MBMesquite::MeshImpl::vtk_read_rectilinear_grid(), MBMesquite::MeshImpl::vtk_read_scalar_attrib(), MBMesquite::MeshImpl::vtk_read_structured_grid(), MBMesquite::MeshImpl::vtk_read_structured_points(), MBMesquite::MeshImpl::vtk_read_typed_data(), and MBMesquite::MeshImpl::vtk_read_unstructured_grid().

{
    for( size_t i = 0; i < count; ++i )
    {
        get_long_int_internal( *array, err );
        if( MSQ_CHKERR( err ) ) return false;
        ++array;
    }
    return true;
}

bool MBMesquite::FileTokenizer::get_newline ( MsqError &  err )

check for newline

Consume whitespace up to and including the next newline. If a non-space character is found before a newline, the function stops, sets the error message, and returns false.

Returns:
True if a newline was found before any non-space character. False otherwise.

Definition at line 406 of file mesquite/Misc/FileTokenizer.cpp.

References buffer, bufferEnd, eof(), filePtr, MBMesquite::MsqError::IO_ERROR, lastChar, line_number(), lineNumber, MSQ_SETERR, nextToken, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by FileTokenizerTest::newline_test(), MBMesquite::MeshImpl::vtk_read_polydata(), MBMesquite::MeshImpl::vtk_read_polygons(), MBMesquite::MeshImpl::vtk_read_rectilinear_grid(), MBMesquite::MeshImpl::vtk_read_structured_grid(), MBMesquite::MeshImpl::vtk_read_structured_points(), and MBMesquite::MeshImpl::vtk_read_unstructured_grid().

{
    if( lastChar == '\n' )
    {
        lastChar = ' ';
        ++lineNumber;
        return true;
    }

    // Loop until either we a) find a newline, b) find a non-whitespace
    // character or c) reach the end of the file.
    for( ;; )
    {
        // If the buffer is empty, read more.
        if( nextToken == bufferEnd )
        {
            size_t count = fread( buffer, 1, sizeof( buffer ), filePtr );
            if( !count )
            {
                if( eof() )
                    MSQ_SETERR( err )( "File truncated.", MsqError::PARSE_ERROR );
                else
                    MSQ_SETERR( err )( MsqError::IO_ERROR );
                return false;
            }

            nextToken = buffer;
            bufferEnd = buffer + count;
        }

        // If the current character is not a space, then we've failed.
        if( !isspace( *nextToken ) )
        {
            MSQ_SETERR( err )
            ( MsqError::PARSE_ERROR, "Expected newline at line %d.", line_number() );
            return false;
        }

        // If the current space character is a newline,
        // increment the line number count.
        if( *nextToken == '\n' )
        {
            ++lineNumber;
            ++nextToken;
            lastChar = ' ';
            return true;
        }
        ++nextToken;
    }

    // should never reach this
    return false;
}
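The scanning loop above can be sketched on an in-memory string. consume_newline is a hypothetical helper: it leaves out the buffer refill and the lastChar shortcut of the listing and keeps only the rule that nothing but whitespace may precede the newline.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: advance pos past the next newline if only whitespace
// precedes it. On a non-space character, pos is left pointing at it.
bool consume_newline( const std::string& text, std::size_t& pos, int& line )
{
    while( pos < text.size() )
    {
        const unsigned char c = static_cast< unsigned char >( text[pos] );
        if( !std::isspace( c ) ) return false;  // found a token before the newline
        ++pos;
        if( c == '\n' )
        {
            ++line;
            return true;
        }
    }
    return false;  // ran out of input without seeing a newline
}

void newline_example()
{
    const std::string text = "  \nfoo";
    std::size_t pos        = 0;
    int line               = 1;
    assert( consume_newline( text, pos, line ) );  // only spaces before '\n'
    assert( pos == 3 && line == 2 );
    assert( !consume_newline( text, pos, line ) );  // 'f' is not whitespace
    assert( pos == 3 );
}
```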
bool MBMesquite::FileTokenizer::get_short_int_internal ( short &  result,
MsqError &  err 
) [private]

Internal implementation of get_short_ints()

Definition at line 222 of file mesquite/Misc/FileTokenizer.cpp.

References get_long_int_internal(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_short_ints().

{
    long i = 0;
    get_long_int_internal( i, err );
    if( MSQ_CHKERR( err ) ) return false;

    result = (short)i;
    if( i != (long)result )
    {
        MSQ_SETERR( err )( MsqError::PARSE_ERROR, "Numberic overflow at line %d.", line_number() );
        return false;
    }

    return true;
}
bool MBMesquite::FileTokenizer::get_short_ints ( size_t  count,
short *  array,
MsqError &  err 
)

Parse a sequence of integer values.

Read the specified number of space-delimited short ints.

Parameters:
count  The number of values to read.
array  The memory at which to store the values.
Returns:
true if successful, false otherwise.

Definition at line 303 of file mesquite/Misc/FileTokenizer.cpp.

References get_short_int_internal(), and MSQ_CHKERR.

{
    for( size_t i = 0; i < count; ++i )
    {
        get_short_int_internal( *array, err );
        if( MSQ_CHKERR( err ) ) return false;
        ++array;
    }
    return true;
}

const char * MBMesquite::FileTokenizer::get_string ( MsqError &  err )

get next token

Get the next whitespace-delimited token from the file. NOTE: The returned string is only valid until the next call to any of the functions in this class that read from the file.

Returns:
A pointer to the buffer space containing the string, or NULL if an error occurred.

Definition at line 51 of file mesquite/Misc/FileTokenizer.cpp.

References buffer, bufferEnd, filePtr, MBMesquite::MsqError::IO_ERROR, lastChar, lineNumber, MSQ_SETERR, nextToken, and MBMesquite::MsqError::PARSE_ERROR.

Referenced by get_boolean_internal(), get_double_internal(), get_long_int_internal(), FileTokenizerTest::line_number_test(), match_token(), FileTokenizerTest::newline_test(), FileTokenizerTest::token_test(), FileTokenizerTest::unget_test(), MBMesquite::MeshImpl::vtk_read_attrib_data(), MBMesquite::MeshImpl::vtk_read_field(), MBMesquite::MeshImpl::vtk_read_field_data(), and MBMesquite::MeshImpl::vtk_read_scalar_attrib().

{
    // If the whitespace character marking the end of the
    // last token was a newline, increment the line count.
    if( lastChar == '\n' ) ++lineNumber;

    // Loop until either found the start of a token to return or have
    // reached the end of the file.
    for( ;; )
    {
        // If the buffer is empty, read more.
        if( nextToken == bufferEnd )
        {
            size_t count = fread( buffer, 1, sizeof( buffer ) - 1, filePtr );
            if( !count )
            {
                if( feof( filePtr ) )
                    MSQ_SETERR( err )( "File truncated.\n", MsqError::PARSE_ERROR );
                else
                    MSQ_SETERR( err )( MsqError::IO_ERROR );
                return NULL;
            }

            nextToken = buffer;
            bufferEnd = buffer + count;
        }

        // If the current character is not a space, we've found a token.
        if( !isspace( *nextToken ) ) break;

        // If the current space character is a newline,
        // increment the line number count.
        if( *nextToken == '\n' ) ++lineNumber;
        ++nextToken;
    }

    // Store the start of the token in "result" and
    // advance "nextToken" to one past the end of the
    // token.
    char* result = nextToken;
    while( nextToken != bufferEnd && !isspace( *nextToken ) )
        ++nextToken;

    // If we have reached the end of the buffer without finding
    // a whitespace character terminating the token, we need to
    // read more from the file.  Only try once.  If the token is
    // too large to fit in the buffer, give up.
    if( nextToken == bufferEnd )
    {
        // Shift the (possibly) partial token to the start of the buffer.
        size_t remaining = bufferEnd - result;
        memmove( buffer, result, remaining );
        result    = buffer;
        nextToken = result + remaining;

        // Fill the remainder of the buffer after the token.
        size_t count = fread( nextToken, 1, sizeof( buffer ) - remaining - 1, filePtr );
        if( !count && !feof( filePtr ) )
        {
            MSQ_SETERR( err )( "I/O error.\n", MsqError::IO_ERROR );
            return NULL;
        }
        bufferEnd = nextToken + count;

        // Continue to advance nextToken until we find the space
        // terminating the token.
        while( nextToken != bufferEnd && !isspace( *nextToken ) )
            ++nextToken;

        if( nextToken == bufferEnd )  // EOF
        {
            *bufferEnd = '\0';
            ++bufferEnd;
        }
    }

    // Save terminating whitespace character (or NULL char if EOF).
    lastChar = *nextToken;
    // Put null in buffer to mark end of current token.
    *nextToken = '\0';
    // Advance nextToken to the next character to search next time.
    ++nextToken;
    return result;
}
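When a token is cut off at the end of the buffer, the listing above moves the partial token to the front of the buffer and then refills the space after it. That shift is sketched below with a hypothetical helper, shift_partial_token. memmove is used rather than memcpy because the source and destination ranges may overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: move the partial token [token, end) to the start of
// buf and return its length, so new data can be read in right after it.
std::size_t shift_partial_token( char* buf, const char* token, const char* end )
{
    assert( buf <= token && token <= end );
    const std::size_t remaining = static_cast< std::size_t >( end - token );
    std::memmove( buf, token, remaining );  // ranges may overlap
    return remaining;
}

void shift_example()
{
    char buf[8]         = "ab xyz";  // "xyz" is the partial token at the buffer end
    const std::size_t n = shift_partial_token( buf, buf + 3, buf + 6 );
    assert( n == 3 );
    assert( std::memcmp( buf, "xyz", 3 ) == 0 );
}
```

This also shows why the returned token pointer is short-lived: the next read may move or overwrite the bytes it points to.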
int MBMesquite::FileTokenizer::match_token ( const char *const *  string_list,
MsqError &  err 
)

Match the current token to one of an array of strings. Sets the error message if the current token doesn't match any of the input strings.

Parameters:
string_list  A NULL-terminated array of strings.
Returns:
One greater than the index of the matched string, or zero if no match.

Definition at line 377 of file mesquite/Misc/FileTokenizer.cpp.

References get_string(), line_number(), MSQ_CHKERR, MSQ_SETERR, and MBMesquite::MsqError::PARSE_ERROR.

{
    // Get a token
    const char* token = get_string( err );
    if( MSQ_CHKERR( err ) ) return false;

    // Check if it matches any input string
    const char* const* ptr;
    for( ptr = list; *ptr; ++ptr )
        if( 0 == strcmp( token, *ptr ) ) return ptr - list + 1;

    // No match, construct error message
    std::string message( "Parsing error at line " );
    char lineno[16];
    sprintf( lineno, "%d", line_number() );
    message += lineno;
    message += ": expected one of {";
    for( ptr = list; *ptr; ++ptr )
    {
        message += " ";
        message += *ptr;
    }
    message += " } got \"";
    message += token;
    message += "\"";
    MSQ_SETERR( err )( message, MsqError::PARSE_ERROR );
    return false;
}
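The return convention (one greater than the index of the match, zero for no match) lets a caller test the result directly in a condition. A standalone sketch of the lookup follows; match_in_list is a hypothetical helper that leaves out the token reading and the error message.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: search a NULL-terminated list of strings.
// Returns the 1-based position of the match, or 0 if there is none.
int match_in_list( const char* token, const char* const* list )
{
    assert( token != nullptr && list != nullptr );
    for( const char* const* ptr = list; *ptr; ++ptr )
        if( 0 == std::strcmp( token, *ptr ) ) return static_cast< int >( ptr - list ) + 1;
    return 0;
}

void match_example()
{
    const char* const kinds[] = { "ASCII", "BINARY", nullptr };
    assert( match_in_list( "ASCII", kinds ) == 1 );
    assert( match_in_list( "BINARY", kinds ) == 2 );
    assert( match_in_list( "XML", kinds ) == 0 );  // no match
}
```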

Member Data Documentation

char MBMesquite::FileTokenizer::buffer[512] [private]

Input buffer

Definition at line 225 of file mesquite/Misc/FileTokenizer.hpp.

Referenced by get_newline(), get_string(), and unget_token().

char* MBMesquite::FileTokenizer::bufferEnd [private]

One past the last used byte of the buffer

Definition at line 230 of file mesquite/Misc/FileTokenizer.hpp.

Referenced by eof(), get_newline(), and get_string().

std::FILE* MBMesquite::FileTokenizer::filePtr [private]

Pointer to standard C FILE struct

Definition at line 222 of file mesquite/Misc/FileTokenizer.hpp.

Referenced by eof(), get_newline(), get_string(), and ~FileTokenizer().

char MBMesquite::FileTokenizer::lastChar [private]

The whitespace character marking the end of the last returned token. Saved here because if it is a newline, the line count will need to be incremented when the next token is returned.

Definition at line 240 of file mesquite/Misc/FileTokenizer.hpp.

Referenced by get_newline(), get_string(), and unget_token().

int MBMesquite::FileTokenizer::lineNumber [private]

Line number of last returned token

Definition at line 233 of file mesquite/Misc/FileTokenizer.hpp.

Referenced by get_newline(), get_string(), and line_number().

char* MBMesquite::FileTokenizer::nextToken [private]

One past the end of the last token returned

Definition at line 228 of file mesquite/Misc/FileTokenizer.hpp.

Referenced by eof(), get_newline(), get_string(), and unget_token().



The documentation for this class was generated from the following files:
 mesquite/Misc/FileTokenizer.hpp
 mesquite/Misc/FileTokenizer.cpp