
C character classification

Article Id: WHEBN0000652327

Title: C character classification  
Author: World Heritage Encyclopedia
Language: English
Subject: C++ Standard Library, C dynamic memory allocation, C mathematical functions, C string handling, Stdarg.h
Collection: C Standard Library
Publisher: World Heritage Encyclopedia

C character classification is an operation provided by a group of functions in the ANSI C Standard Library for the C programming language. These functions test whether a character belongs to a particular class, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.[1]

Contents

  • History
  • Implementation
  • Overview of functions
  • References
  • External links

History

Early toolsmiths writing in C under Unix began developing idioms at a rapid rate to classify characters into different types. For example, in the ASCII character set, the following test identifies a letter:

if ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z')

However, this idiom does not necessarily work for other character sets such as EBCDIC, where the letters do not occupy a single contiguous range of codes.

Pretty soon, programs became thick with tests such as the one above, or worse, tests almost like the one above. A programmer can write the same idiom several different ways, which slows comprehension and increases the chance for errors.

Before long, the idioms were replaced by the functions in <ctype.h>.
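
As a minimal sketch of what the replacement looks like in practice, the hand-written range test can be swapped for a call to isalpha. The helper name count_letters is hypothetical and used only for illustration; the cast to unsigned char is there because passing a negative value other than EOF to these functions is undefined.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the alphabetic characters in a NUL-terminated string.
   count_letters is a hypothetical helper, not part of the C library. */
size_t count_letters(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* Cast to unsigned char: a plain char may be negative, and
           isalpha only accepts unsigned char values or EOF. */
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

In the default "C" locale, count_letters("Hi, 42!") returns 2, and the same source works unchanged on a non-ASCII character set.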

Implementation

Unlike the above example, the character classification routines are usually not written as comparison tests. In most C libraries, they are implemented as macros or functions that perform a lookup in a static table.

For example, an array of 256 eight-bit integers, arranged as bitfields, is created, where each bit corresponds to a particular property of the character, e.g., isdigit, isalpha. If the lowest-order bit of the integers corresponds to the isdigit property, the code could be written thus:

#define isdigit(x) (TABLE[(x)] & 1)
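
The table-based approach can be sketched more fully as follows. This is an illustration under stated assumptions, not the code of any particular C library: the names my_table, my_table_init, my_isdigit, my_isalpha, MY_DIGIT and MY_ALPHA are all hypothetical, and the table is filled at run time for brevity, whereas a real library would typically ship a precomputed static table.

```c
#include <limits.h>

/* One bit per character property (hypothetical names). */
enum { MY_DIGIT = 1 << 0, MY_ALPHA = 1 << 1 };

/* One entry per possible unsigned char value, zero-initialized. */
static unsigned char my_table[UCHAR_MAX + 1];

/* Fill the table once before use. Real libraries usually provide a
   precomputed table instead of building it at run time. */
void my_table_init(void)
{
    static const char letters[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;
    int c;

    /* The C standard guarantees that '0'..'9' are contiguous. */
    for (c = '0'; c <= '9'; ++c)
        my_table[c] |= MY_DIGIT;

    /* Letters are listed explicitly, since they need not be contiguous. */
    for (p = letters; *p != '\0'; ++p)
        my_table[(unsigned char)*p] |= MY_ALPHA;
}

/* Each macro evaluates its argument exactly once. */
#define my_isdigit(x) (my_table[(unsigned char)(x)] & MY_DIGIT)
#define my_isalpha(x) (my_table[(unsigned char)(x)] & MY_ALPHA)
```

After calling my_table_init(), each test is a single array index and bit mask, regardless of how many properties the table encodes.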

Early versions of Linux used a potentially faulty method similar to the first code sample:

#define isdigit(x) ((x) >= '0' && (x) <= '9')

This can cause problems if x has a side effect, for instance if one calls isdigit(x++) or isdigit(run_some_program()). It would not be immediately evident that the argument to isdigit is evaluated twice. For this reason, the table-based approach is generally used.
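
The double evaluation can be made visible with a small sketch. NAIVE_ISDIGIT below reproduces the comparison-based macro under a hypothetical name, and the two helper functions (also hypothetical) report how far a pointer advanced after a single classification call.

```c
#include <ctype.h>

/* Comparison-based macro: the argument x appears twice. */
#define NAIVE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Returns how many positions the pointer moved after one use of the
   naive macro with a post-increment argument. */
int advance_naive(const char *s)
{
    const char *p = s;

    /* *p++ is expanded twice; the second copy runs only if the first
       comparison is true, because && short-circuits. */
    (void)NAIVE_ISDIGIT(*p++);
    return (int)(p - s);
}

/* Same measurement using the standard isdigit, which evaluates its
   argument exactly once even when implemented as a macro. */
int advance_standard(const char *s)
{
    const char *p = s;

    (void)isdigit((unsigned char)*p++);
    return (int)(p - s);
}
```

For the input "12", advance_naive returns 2 while advance_standard returns 1. For an input starting with a space, the first comparison fails and the naive version advances only once, so the bug also depends on the data.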

The difference between these two methods became a point of interest during the SCO v. IBM case.

Overview of functions

The functions that operate on single-byte characters are declared in the ctype.h header (cctype in C++). The functions that operate on wide characters are declared in the wctype.h header (cwctype in C++).

The classification is done according to the current locale.

Byte character   Wide character   Description
isalnum          iswalnum         checks whether a character is alphanumeric
isalpha          iswalpha         checks whether a character is alphabetic
islower          iswlower         checks whether a character is lowercase
isupper          iswupper         checks whether a character is uppercase
isdigit          iswdigit         checks whether a character is a digit
isxdigit         iswxdigit        checks whether a character is a hexadecimal digit
iscntrl          iswcntrl         checks whether a character is a control character
isgraph          iswgraph         checks whether a character is a graphical character
isspace          iswspace         checks whether a character is a whitespace character
isblank          iswblank         checks whether a character is a blank character (C99/C++11)
isprint          iswprint         checks whether a character is a printing character
ispunct          iswpunct         checks whether a character is a punctuation character
tolower          towlower         converts a character to lowercase
toupper          towupper         converts a character to uppercase
N/A              iswctype         checks whether a wchar_t falls into a specific class
N/A              towctrans        converts a wchar_t using a specific mapping
N/A              wctype           returns a wide character class to be used with iswctype
N/A              wctrans          returns a transformation mapping to be used with towctrans
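
The last four entries work in pairs: wctype and wctrans look up a class or mapping by name, and iswctype and towctrans apply it. A minimal sketch, using the standard property names "alpha" and "toupper" and two hypothetical wrapper functions:

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical wrapper: equivalent in effect to iswalpha(wc). */
int is_wide_alpha(wint_t wc)
{
    /* wctype("alpha") returns a handle for the alphabetic class. */
    return iswctype(wc, wctype("alpha")) != 0;
}

/* Hypothetical wrapper: equivalent in effect to towupper(wc). */
wint_t to_wide_upper(wint_t wc)
{
    /* wctrans("toupper") returns a handle for the uppercase mapping. */
    return towctrans(wc, wctrans("toupper"));
}
```

Because the class is selected by a string, this form is useful when the property to test is only known at run time, for example when it is read from a configuration file.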

References

  1. ISO/IEC 9899:1999 specification (PDF). p. 193, § 7.4.

External links
