Category: 02. Lexical Elements

  • Identifiers

    An identifier in C is a name that identifies a variable, constant, function, or other element of a C program. Because C is a high-level language, you can refer to a memory location by name instead of by its binary or hexadecimal address.

    C Identifiers

    Identifiers are user-defined names that make it easy to refer to memory. They also name other program elements, such as functions, user-defined types, and labels. Identifiers thus let the programmer refer to program elements conveniently.

    When a variable or a function is defined with an identifier, the C compiler allocates memory for it and associates that memory location with the identifier. As a result, whenever the identifier is used in an instruction, the compiler can access the associated memory location. For example, when we declare a variable age and assign it a value, the compiler assigns a memory location to it.

    Although the programmer is free to choose the identifier that names a variable, a function, and so on, certain rules must be followed to form a valid identifier.

    Naming Rules of C Identifiers

    The rules for forming an identifier are as follows −

    • Keywords can’t be used as identifiers because they are reserved.
    • Out of the character set that C uses, only letters (upper and lowercase), digits, and the underscore symbol (_) are allowed in an identifier. It implies that you can’t use characters such as punctuation symbols as a part of the identifier.
    • The identifier must start with either a letter (upper or lowercase) or an underscore. In other words, a digit cannot be the first character of the identifier.
    • The subsequent characters may be letters, digits, or underscores.
    • The same identifier can’t name two entities in one scope. For example, two variables in the same scope cannot share a name.

    As per the above rules, some examples of the valid and invalid identifiers are as follows −

    Valid C Identifiers

    age, Age, AGE, average_age, __temp, address1, phone_no_personal, _my_name
    

    Invalid C Identifiers

    Average-age, my name, $age, #phone, 1mg, phy+maths
    

    Examples of C Identifiers

    The following program shows an error −

    #include <stdio.h>

    int main() {

       /* variable definition: */
       int marks = 50;
       float marks = 65.50;
       printf("%d %f", marks, marks);
       return 0;
    }

    Error

    main.c: In function 'main':
    main.c:7:10: error: conflicting types for 'marks'; have 'float'
    
    7 |    float marks = 65.50;
      |          ^~~~~
    main.c:6:8: note: previous definition of 'marks' with type 'int'
    6 |    int marks = 50;
      |        ^~~~~
    main.c:8:13: warning: format '%d' expects argument of type 'int', but argument 2 has type 'double' [-Wformat=]
    8 |    printf("%d %f", marks, marks);
      |            ~^      ~~~~~
      |             |      |
      |             int    double
      |            %f  
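
    Two more points about identifiers are worth noting:

    • Identifiers are case-sensitive; as a result, age is not the same as AGE.
    • The ANSI C (C89) standard guarantees that at least the first 31 characters of an internal identifier are significant; C99 raised this minimum to 63. You can choose a longer name, but characters beyond the limit your compiler supports may be ignored. The limit is generous enough to form meaningful and descriptive identifiers.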

    Scope of C Identifiers

    In C, the scope of an identifier is the region of the program in which the identifier is declared and can be used. This section distinguishes two kinds of identifiers by scope:

    Global Identifiers

    If an identifier is declared outside any function, it is called a global (external) identifier.

    Example

    #include <stdio.h>

    int marks = 100; // external identifier

    int main() {
       printf("The value of marks is %d\n", marks);
    }

    Output

    The value of marks is 100
    

    This is because marks is defined outside of any blocks, so it is an external identifier.

    Local Identifiers

    On the other hand, an identifier declared inside a function is a local (internal) identifier.

    Example

    #include <stdio.h>

    int main() {
       int marks = 100; // internal identifier
       printf("The value of marks is %d\n", marks);
    }

    Output

    The value of marks is 100
    

    This is because marks is defined inside the main function, so it is an internal identifier.

    Examples of Different Types of C Identifiers

    Identifiers can also appear in a forward declaration of a function. However, the declaration signature of a function should match with the definition.

    Example of Variable Identifier

    int marks1 = 50, marks2 = 60;
    float avg = (float)(marks1 + marks2) / 2;

    Example of Function Identifier

    /* returns float so that the fractional part of the average is kept */
    float average(int marks1, int marks2) {
       return (float)(marks1 + marks2) / 2;
    }

    Example of User-defined Type Identifier

    struct student {
       int rollno;
       char *name;
       int m1, m2, m3;
       float percent;
    };
    struct student s1 = {1, "Raju", 50, 60, 70, 60.00};

    Example of Typedef Identifier

    struct student {
       int rollno;
       char *name;
       int m1, m2, m3;
       float percent;
    };
    typedef struct student STUDENT;
    STUDENT s1 = {1, "Raju", 50, 60, 70, 60.00};

    Example of Label Identifier

    #include <stdio.h>

    int main() {
       int x = 0;

       begin:
       x++;
       if (x >= 10)
          goto end;
       printf("%d\n", x);
       goto begin;

       end:
       return 0;
    }

    Output

    1
    2
    3
    4
    5
    6
    7
    8
    9
    

    Example of Enum Identifier

    #include <stdio.h>

    enum week {Mon=10, Tue, Wed, Thur, Fri=10, Sat=16, Sun};

    int main() {
       printf("The value of enum week: %d\n", Mon);
       return 0;
    }

    Output

    The value of enum week: 10
    

    Thus, identifiers are found everywhere in a C program. Choosing the right identifier for a program element, such as a variable or a function, makes the program easier to read, debug, and document.

  • Keywords

    Keywords are predefined words that have a special meaning to the compiler and cannot be used for any other purpose. The ANSI C (C89) standard defines 32 keywords; later standards add more. Keywords cannot be used as identifiers.

    The following table lists the 32 keywords (reserved words) of ANSI C:

    auto        double      int         struct
    break       else        long        switch
    case        enum        register    typedef
    char        extern      return      union
    continue    for         signed      void
    do          if          static      while
    default     goto        sizeof      volatile
    const       float       short       unsigned

    The original 32 keywords consist of lowercase letters only, although some keywords added in later standards also contain uppercase letters. C is a case-sensitive language. Hence, int is a keyword, but INT and Int are not. Most of the keywords introduced in C99 and C11 start with an underscore character. The compiler checks that each keyword is used according to its syntax and then translates the source code into machine code.

    Example of C Keywords

    The following program uses a keyword as an identifier, namely as the name of a user-defined function, which causes a compilation error.

    #include <stdio.h>

    void register(int, int);

    int main() {
       /* variable definition: */
       int a = 5, b = 7;
       register(a,b);
       return 0;
    }

    void register(int a, int b)
    {
       printf("%d", a+b);
    }

    Errors

    main.c:3:15: error: expected identifier or '(' before 'int'
    3 | void register(int, int);
      |               ^~~
    main.c: In function 'main':
    main.c:8:14: error: expected ')' before ',' token
    8 |    register(a,b);
      |              ^
      |              )
    main.c: At top level:
    main.c:12:15: error: expected identifier or '(' before 'int'
    12 | void register(int a, int b)
       |               ^

    The reason for the errors is that we are using a keyword register as the name of a user-defined function, which is not allowed.

    The ANSI C version has 32 keywords. These keywords are the basic elements of the program logic. They can be broadly classified into the following types −

    • Primary Data types
    • User defined types
    • Storage types
    • Conditionals
    • Loops and loop controls
    • Others

    Let us discuss the keywords in each category.

    Primary Types C Keywords

    These keywords are used for variable declaration. C is a statically typed language, so every variable must be declared before it is used. Variables in C are declared with the following keywords:

    int         Declares an integer variable
    long        Declares a long integer variable
    short       Declares a short integer variable
    signed      Declares a signed variable
    double      Declares a double-precision variable
    char        Declares a character variable
    float       Declares a floating-point variable
    unsigned    Declares an unsigned variable
    void        Specifies a void return type

    User-defined Types C Keywords

    The C language allows you to define new data types as required. A user-defined type is built from one or more elements of primary types or of other user-defined types.

    The following keywords are provided for user-defined data types −

    struct      Declares a structure type
    typedef     Creates a new name (alias) for an existing type
    union       Declares a union type
    enum        Declares an enumeration type

    Storage Types C Keywords

    The following set of keywords are called storage class specifiers. They indicate how long a variable exists and from where it is visible. The default storage class of a local variable is auto, although you can ask the compiler to create a variable with specific storage properties.

    auto        Specifies automatic storage class
    extern      Declares a variable or function that is defined elsewhere
    static      Specifies static storage class
    register    Specifies register storage class

    Conditionals C Keywords

    The following set of keywords help you to put conditional logic in the program. The conditional logic expressed with the if and else keywords provides two alternative actions for a condition. For multi-way branching, use the switch-case construct. The goto keyword performs an unconditional jump, similar to the jump instruction in assembly language.

    goto        Jumps to a labeled statement
    if          Starts an if statement
    else        Executes when the if condition is false
    case        Labels a statement within a switch
    switch      Starts a switch statement
    default     Specifies the default statement in a switch

    Loops and Loop Control C Keywords

    Repetition or iteration is an essential aspect of an algorithm. C provides different alternatives for forming a loop, and keywords for controlling the behaviour of the loop. Each of these keywords lets you form a loop with different characteristics and usage.

    for         Starts a for loop
    do          Starts a do-while loop
    while       Starts a while loop
    continue    Skips the rest of the current iteration of a loop
    break       Terminates a loop or switch statement

    Other C Keywords

    The following miscellaneous keywords are also important:

    const       Specifies a constant (read-only) value
    sizeof      Determines the size of a data type or object
    volatile    Tells the compiler that the value of the variable may change at any time

    In the C99 version, five more keywords were added −

    • _Bool
    • _Complex
    • _Imaginary
    • inline
    • restrict

    In C11, seven more keywords were added −

    • _Alignas
    • _Alignof
    • _Atomic
    • _Generic
    • _Noreturn
    • _Static_assert
    • _Thread_local

    The C23 standard introduces further keywords −

    • alignas
    • alignof
    • bool
    • constexpr
    • false
    • nullptr
    • static_assert
    • thread_local
    • true
    • typeof
    • typeof_unqual
    • _BitInt
    • _Decimal32
    • _Decimal64
    • _Decimal128

    Most of the keywords added in C99 and C11 begin with an underscore followed by a capital letter. Identifiers of that form were already reserved, so existing program source code should not have been using them.

    Following points must be kept in mind when using the keywords:

    • Keywords are reserved by the programming language and have a predefined meaning. They cannot be used as the name of a variable or function.
    • Each keyword has to be used as per the syntax stipulated for its use. If the syntax is violated, the compiler reports compilation errors.
    • C is one of the smallest programming languages, with only 32 keywords in its ANSI C version, although more keywords have been added in later standards.

  • Tokens in C

    A token is the smallest meaningful unit in the source code of a computer language such as C. The term token is borrowed from linguistics. A piece of text in a natural language (like English) is made up of words (collections of letters), digits, and punctuation symbols; in the same way, a C program is made up of tokens. A compiler breaks a C program into tokens and then proceeds to the next stages of the compilation process.

    The first stage in the compilation process is the tokenizer. The tokenizer divides the source code into individual tokens, identifies the type of each token, and passes the tokens one at a time to the next stage of the compiler.

    The parser is the next stage in the compilation. It understands the language’s grammar, identifies syntax errors, and hands an error-free program on to the later stages that translate it into machine language.

    C source code comprises tokens of different types. The tokens in C are of the following types −

    • Character set
    • Keyword tokens
    • Literal tokens
    • Identifier tokens
    • Operator tokens
    • Special symbol tokens

    Let us discuss each of these token types.

    C Character set

    The C language identifies a character set that comprises English alphabets upper and lowercase (A to Z, as well as a to z), digits 0 to 9, and certain other symbols with a special meaning attached to them. In C, certain combinations of characters also have a special meaning attached to them. For example, \n is known as a newline character. Such combinations are called escape sequences.

    Here is the character set of C language −

    • Uppercase: A to Z
    • Lowercase: a to z
    • Digits: 0 to 9
    • Special characters: ! " # $ % & ' ( ) * + - . : , ; ? ` ~ = < > { } [ ] ^ _ | \ /

    A sequence of any of these characters inside a pair of double quote symbols (" and ") represents a string literal. Digits are used to represent numeric literals. Square brackets are used for declaring and indexing arrays. Curly brackets are used to mark code blocks. The backslash is the escape character. Most of the other characters are defined as operators.

    C Keywords

    In C, a predefined, reserved word is called a keyword. Compared to human languages, programming languages have very few keywords. C started with 32 keywords, and a few more were added in subsequent revisions of the C standard. The original keywords are all in lowercase. Each keyword has rules of usage (in programming, this is called syntax) attached to it.

    The C compiler checks whether a keyword has been used according to the syntax, and translates the source code into the object code.

    C Literals

    In computer programming terminology, the term literal refers to a textual representation of a value to be assigned to a variable, directly hard-coded in the source code.

    A numeric literal contains digits and, optionally, a decimal point and/or the exponent character E or e.

    A string literal is made up of any sequence of characters put inside a pair of double quotation marks. A character literal is a single character inside a pair of single quotes.

    An array can be initialized with a comma-separated sequence of literals between curly brackets.

    Escape sequences can appear in character and string literals. An escape sequence consists of two or more characters, the first being a backslash (\), and it represents a single character. Each escape sequence has a predefined meaning attached to it.

    C Identifiers

    In contrast to keywords, identifiers are the user-defined elements in a program. You define various program elements, such as variables, constants, labels, user-defined types, and functions, by giving each an appropriate name.

    There are certain rules prescribed in C, to form an identifier. One of the important restrictions is that a reserved keyword cannot be used as an identifier. For example, for is a keyword in C, and hence it cannot be used as an identifier, i.e., name of a variable, function, etc.

    C Operators

    C programs consist largely of expressions that perform arithmetic and comparison operations. The special symbols from the character set of C are mostly defined as operators. For example, the well-known symbols +, -, * and / are the arithmetic operators in C. Similarly, < and > are used as comparison operators.

    C Special symbols

    Apart from the symbols defined as operators, the other symbols include punctuation symbols like commas, semicolons, and colons. In C, you find them used differently in different contexts.

    Similarly, the parentheses ( and ) are used in arithmetic expressions as well as in function definitions. The curly brackets are employed to mark the scope of functions, code blocks in conditional and looping statements, etc.