How to write a good C main function

Learn how to structure a C file and write a C main function that handles command line arguments like a champ.

I know, Python and JavaScript are what the kids are writing all their crazy "apps" with these days. But don't be so quick to dismiss C—it's a capable and concise language that has a lot to offer. If you need speed, writing in C could be your answer. If you are looking for job security and the opportunity to learn how to hunt down null pointer dereferences, C could also be your answer! In this article, I'll explain how to structure a C file and write a C main function that handles command line arguments like a champ.

Me: a crusty Unix system programmer.

You: someone with an editor, a C compiler, and some time to kill.

Let's do this.

A boring but correct C program

A C program starts with a main() function, usually kept in a file named main.c.

/* main.c */
int main(int argc, char *argv[]) {

}

This program compiles but doesn't do anything.

$ gcc main.c
$ ./a.out -o foo -vv 
$

Correct and boring.

Main functions are unique

The main() function is the first function in your program to run, but it's not the first function executed when the process starts. That honor goes to _start(), which is typically provided by the C runtime library and linked in automatically when your program is compiled. The details are highly dependent on the operating system and compiler toolchain, so I'm going to pretend I didn't mention it.

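The main() function has two arguments that traditionally are called argc and argv, and it returns a signed integer. Most Unix environments expect programs to return 0 (zero) on success and a nonzero value on failure. Only the low eight bits of the value reach the parent process, so returning -1 (negative one) shows up as 255 in the shell.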

Argument  Name             Description
argc      Argument count   Length of the argument vector
argv      Argument vector  Array of character pointers
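To make the return value of main() a little more concrete, here is a minimal sketch of how a parent process sees a child's exit status on a POSIX system. The helper name status_seen_by_parent() is made up for this illustration, and the code assumes fork() and waitpid() are available.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that exits with `code` and report
 * the status the parent actually observes, or -1 if something failed. */
int status_seen_by_parent(int code) {
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0)
        _exit(code);               /* child: exit immediately with the code */

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* WEXITSTATUS keeps only the low eight bits of the child's value */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child that exits with 0 is seen as 0, while a child that exits with -1 is seen as 255, which is why shells treat any nonzero status as a failure.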

The argument vector, argv, is a tokenized representation of the command line that invoked your program. In the example above, argv would be a list of the following strings:

argv = [ "./a.out", "-o", "foo", "-vv" ];

The argument vector conventionally has at least one string in the first index, argv[0], which is the name the program was invoked with (here ./a.out, which is not necessarily a full path). The last entry, argv[argc], is always a NULL pointer.
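A short sketch of walking the argument vector may help here. Both helper names, dump_args() and count_switches(), are invented for this example; they only rely on argc and argv as described above.

```c
#include <stdio.h>

/* Hypothetical helper: print every entry of argv with its index. */
void dump_args(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
}

/* Hypothetical helper: count the entries that look like switches,
 * meaning a '-' followed by at least one more character.
 * argv[0] is the program name, so the loop starts at index 1. */
int count_switches(int argc, char *argv[]) {
    int count = 0;

    for (int i = 1; i < argc; i++) {
        if (argv[i][0] == '-' && argv[i][1] != '\0')
            count++;
    }
    return count;
}
```

Calling dump_args(argc, argv) as the first line of main() is a quick way to see exactly what the shell handed to your program.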

Anatomy of a main.c file

When I write a main.c from scratch, it's usually structured like this:

/* main.c */
/* 0 copyright/licensing */
/* 1 includes */
/* 2 defines */
/* 3 external declarations */
/* 4 typedefs */
/* 5 global variable declarations */
/* 6 function prototypes */

int main(int argc, char *argv[]) {
/* 7 command-line parsing */
}

/* 8 function declarations */

I'll talk about each of these numbered sections, except for zero, below. If you have to put copyright or licensing text in your source, put it there.

Another thing I won't talk about adding to your program is comments.

"Comments lie."
- A cynical but smart and good looking programmer.

Instead of comments, use meaningful function and variable names.

Appealing to the inherent laziness of programmers, once you add comments, you've doubled your maintenance load. If you change or refactor the code, you need to update or expand the comments. Over time, the code mutates away from anything resembling what the comments describe.

If you have to write comments, do not write about what the code is doing. Instead, write about why the code is doing what it's doing. Write comments that you would want to read five years from now when you've forgotten everything about this code. And the fate of the world is depending on you. No pressure.

1. Includes

The first things I add to a main.c file are includes to make a multitude of standard C library functions and variables available to my program. The standard C library does lots of things; explore header files in /usr/include to find out what it can do for you.

The #include string is a C preprocessor (cpp) directive that causes the inclusion of the referenced file, in its entirety, in the current file. Header files in C are usually named with a .h extension and should not contain any executable code; only macros, defines, typedefs, and external variable and function prototypes. The string <header.h> tells cpp to look for a file called header.h in the system-defined header path, usually /usr/include.

/* main.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libgen.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <stdint.h>

This is the minimum set of global includes that I'll include by default for the following stuff:

#include File  Stuff It Provides
stdio          Supplies FILE, stdin, stdout, stderr, and the fprintf() family of functions
stdlib         Supplies malloc(), calloc(), realloc(), exit(), EXIT_FAILURE, and EXIT_SUCCESS
unistd         Supplies the POSIX API declarations, including getopt()
libgen         Supplies the basename() function
errno          Defines the external errno variable and all the values it can take on
string         Supplies memcpy(), memset(), and the strlen() family of functions
getopt         Supplies external optarg, opterr, optind, and getopt() function
stdint         Typedef shortcuts like uint32_t and uint64_t

2. Defines

/* main.c */
<...>

#define OPTSTR "vi:o:f:h"
#define USAGE_FMT  "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]\n"
#define ERR_FOPEN_INPUT  "fopen(input, r)"
#define ERR_FOPEN_OUTPUT "fopen(output, w)"
#define ERR_DO_THE_NEEDFUL "do_the_needful blew up"
#define DEFAULT_PROGNAME "george"

This doesn't make a lot of sense right now, but the OPTSTR define is where I will state what command line switches the program will recognize. Consult the getopt(3) man page to learn how OPTSTR will affect getopt()'s behavior.

The USAGE_FMT define is a printf()-style format string that is referenced in the usage() function.

I also like to gather string constants as #defines in this part of the file. Collecting them makes it easier to fix spelling, reuse messages, and internationalize messages, if required.

Finally, use all capital letters when naming a #define to distinguish it from variable and function names. You can run the words together if you want or separate words with an underscore; just make sure they're all upper case.

3. External declarations

/* main.c */
<...>

extern int errno;
extern char *optarg;
extern int opterr, optind;

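An extern declaration brings that name into the namespace of the current compilation unit (aka "file") and allows the program to access that variable. Here we've brought in the declarations for three integer variables and a character pointer. The opt prefaced variables are used by the getopt() function, and errno is used as an out-of-band communication channel by the standard C library to communicate why a function might have failed. Strictly speaking, the errno.h and unistd.h headers already declare these names (and on modern systems errno is usually a macro rather than a plain int), so these lines are redundant and can be dropped without changing the program.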
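Here is a minimal sketch of errno acting as that out-of-band channel. The helper try_open() is hypothetical, and the example assumes a POSIX system, where fopen() sets errno when it fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open `path` for reading.
 * Returns 0 on success, or the errno value that explains the failure. */
int try_open(const char *path) {
    FILE *fp;

    errno = 0;                       /* clear any stale value first */
    fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;           /* save it before another call overwrites it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        return saved;
    }

    fclose(fp);
    return 0;
}
```

The return value of fopen() only says that something went wrong; errno says what, and strerror() turns the number into text a human can read.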

4. Typedefs

/* main.c */
<...>

typedef struct {
  int           verbose;
  uint32_t      flags;
  FILE         *input;
  FILE         *output;
} options_t;

After external declarations, I like to declare typedefs for structures, unions, and enumerations. Naming a typedef is a religion all to itself; I strongly prefer a _t suffix to indicate that the name is a type. In this example, I've declared options_t as a struct with four members. C is a whitespace-neutral programming language, so I use whitespace to line up field names in the same column. I just like the way it looks. For the pointer declarations, I prepend the asterisk to the name to make it clear that it's a pointer.
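As a small illustration of why the typedef is handy, the sketch below fills in an options_t using C99 designated initializers, which name each member instead of relying on their order. The helper default_options() is an assumption made for this example and is not part of the program in this article.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
  int           verbose;
  uint32_t      flags;
  FILE         *input;
  FILE         *output;
} options_t;

/* Hypothetical helper: build an options_t holding the default values.
 * Designated initializers keep working if members are reordered. */
options_t default_options(void) {
    options_t options = {
        .verbose = 0,
        .flags   = 0x0,
        .input   = stdin,
        .output  = stdout,
    };
    return options;
}
```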

5. Global variable declarations

/* main.c */
<...>

int dumb_global_variable = -11;

Global variables are a bad idea and you should never use them. But if you have to use a global variable, declare it here and be sure to give it a default value. Seriously, don't use global variables.

6. Function prototypes

/* main.c */
<...>

void usage(char *progname, int opt);
int  do_the_needful(options_t *options);

As you write functions, adding them after the main() function and not before, include the function prototypes here. C compilers read a file from top to bottom, so every symbol (variable or function name) you use in your program has to be declared before you use it. Older compilers would guess at the signature of an undeclared function (an implicit declaration returning int); C99 removed that rule, and current compilers warn about or reject such calls. You also sometimes don't get to choose what compiler is used on your code, so write the function prototypes and drive on.
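A tiny sketch of the idea, using made-up function names: the prototype for square() lets sum_of_squares() call it even though the definition appears later in the file.

```c
/* Prototype: tells the compiler the signature before the first call. */
int square(int x);

/* This function can call square() because the prototype is in scope. */
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

/* The definition can now live anywhere later in the file. */
int square(int x) {
    return x * x;
}
```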

As a matter of course, I always include a usage() function that main() calls when it doesn't understand something you passed in from the command line.

7. Command line parsing

/* main.c */
<...>

int main(int argc, char *argv[]) {
    int opt;
    options_t options = { 0, 0x0, stdin, stdout };

    opterr = 0;

    while ((opt = getopt(argc, argv, OPTSTR)) != EOF) 
       switch(opt) {
           case 'i':
              if (!(options.input = fopen(optarg, "r")) ){
                 perror(ERR_FOPEN_INPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }
              break;

           case 'o':
              if (!(options.output = fopen(optarg, "w")) ){
                 perror(ERR_FOPEN_OUTPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }    
              break;
              
           case 'f':
              options.flags = (uint32_t )strtoul(optarg, NULL, 16);
              break;

           case 'v':
              options.verbose += 1;
              break;

           case 'h':
           default:
              usage(basename(argv[0]), opt);
              /* NOTREACHED */
              break;
       }

    if (do_the_needful(&options) != EXIT_SUCCESS) {
       perror(ERR_DO_THE_NEEDFUL);
       exit(EXIT_FAILURE);
       /* NOTREACHED */
    }

    return EXIT_SUCCESS;
}

OK, that's a lot. The purpose of the main() function is to collect the arguments that the user provides, perform minimal input validation, and then pass the collected arguments to functions that will use them. This example declares an options variable initialized with default values and parses the command line, updating options as necessary.

The guts of this main() function is a while loop that uses getopt() to step through argv looking for command line options and their arguments (if any). The OPTSTR #define earlier in the file is the template that drives getopt()'s behavior. The opt variable takes on the character value of any command line options found by getopt(), and the program's response to the detection of the command line option happens in the switch statement.

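Those of you paying attention will now be questioning why opt is declared as an int when it is expected to take on a char value. It turns out that getopt() returns an int that is -1 when it gets to the end of the options, which I check against EOF (the End of File marker, which is -1 on most systems; POSIX documents the return value as -1, so comparing against -1 directly is the more portable spelling). Whether a plain char is signed or unsigned is up to the implementation, so a char might not be able to hold that -1, and I like matching variables to their function return values anyway.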

When a known command line option is detected, option-specific behavior happens. Some options have an argument, specified in OPTSTR with a trailing colon. When an option has an argument, the next string in argv is available to the program via the externally defined variable optarg. I use optarg to open files for reading and writing or to convert a command line argument from a string to an integer value.

There are a couple of points for style here:

  • Initialize opterr to 0, which stops getopt() from printing its own error messages (it still returns ? for an unrecognized option).
  • Use exit(EXIT_FAILURE); or exit(EXIT_SUCCESS); in the middle of main().
  • /* NOTREACHED */ is a lint directive that I like.
  • Use return EXIT_SUCCESS; at the end of functions that return int.
  • Explicitly cast implicit type conversions.

The command line signature for this program, if it were compiled, would look something like this:

$ ./a.out -h
a.out [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]

In fact, that's what usage() will emit to stderr once compiled.

8. Function declarations

/* main.c */
<...>

void usage(char *progname, int opt) {
   fprintf(stderr, USAGE_FMT, progname?progname:DEFAULT_PROGNAME);
   exit(EXIT_FAILURE);
   /* NOTREACHED */
}

int do_the_needful(options_t *options) {

   if (!options) {
     errno = EINVAL;
     return EXIT_FAILURE;
   }

   if (!options->input || !options->output) {
     errno = ENOENT;
     return EXIT_FAILURE;
   }

   /* XXX do needful stuff */

   return EXIT_SUCCESS;
}

Finally, I write functions that aren't boilerplate. In this example, function do_the_needful() accepts a pointer to an options_t structure. I validate that the options pointer is not NULL and then go on to validate the input and output structure members. The function returns EXIT_FAILURE if either test fails and, by setting the external global variable errno to a conventional error code, I signal to the caller a general reason. The convenience function perror() can be used by the caller to emit human-readable-ish error messages based on the value of errno.

Functions should almost always validate their input in some way. If full validation is expensive, try to do it once and treat the validated data as immutable. The usage() function validates the progname argument using a conditional expression (the ?: operator) in the fprintf() call. The usage() function is going to exit anyway, so I don't bother setting errno or making a big stink about using a correct program name.
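As one example of input validation, the sketch below checks a hexadecimal flag string before accepting it, instead of trusting strtoul() blindly. The function name parse_hex_flags() is hypothetical. Note that strtoul() itself tolerates leading whitespace and a sign, which this sketch does not try to reject.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a hex string to a uint32_t.
 * Returns 0 on success, or -1 with errno set to EINVAL or ERANGE. */
int parse_hex_flags(const char *text, uint32_t *out) {
    char *end = NULL;
    unsigned long value;

    if (!text || !out || *text == '\0') {
        errno = EINVAL;
        return -1;
    }

    errno = 0;
    value = strtoul(text, &end, 16);

    if (errno == ERANGE)
        return -1;                    /* too big for unsigned long */

    if (end == text || *end != '\0') {
        errno = EINVAL;               /* no digits, or trailing junk */
        return -1;
    }

    if (value > UINT32_MAX) {
        errno = ERANGE;               /* fits in long but not in 32 bits */
        return -1;
    }

    *out = (uint32_t)value;
    return 0;
}
```

The end pointer is what makes the check possible: after the call it points at the first character strtoul() could not convert.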

The big class of errors I am trying to avoid here is de-referencing a NULL pointer. This will cause the operating system to send a special signal to my process called SIGSEGV, which results in unavoidable death. The last thing users want to see is a crash due to SIGSEGV. It's much better to catch a NULL pointer in order to emit better error messages and shut down the program gracefully.

Some people complain about having multiple return statements in a function body. They make arguments about "continuity of control flow" and other stuff. Honestly, if something goes wrong in the middle of a function, it's a good time to return an error condition. Writing a ton of nested if statements to just have one return is never a "good idea."™
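Here is a small sketch of the early-return style. The function copy_bounded() is invented for this example; each check bails out as soon as it fails, so the successful path reads straight down the page without nesting.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `src` into `dst`, which holds `dst_size` bytes.
 * Returns 0 on success, or -1 with errno set on failure. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    size_t len;

    if (!dst || !src) {
        errno = EINVAL;
        return -1;                /* bad pointers: leave right away */
    }

    len = strlen(src);
    if (len >= dst_size) {
        errno = ERANGE;
        return -1;                /* no room for the string and its terminator */
    }

    memcpy(dst, src, len + 1);    /* copy the terminating '\0' as well */
    return 0;
}
```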

Finally, if you write a function that takes four or more arguments, consider bundling them in a structure and passing a pointer to the structure. This makes the function signatures simpler, making them easier to remember and not screw up when they're called later. It also makes calling the function slightly faster, since fewer things need to be copied into the function's stack frame. In practice, this will only become a consideration if the function is called millions or billions of times. Don't worry about it if that doesn't make sense.
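A quick sketch of the bundling idea, with made-up names: instead of passing x, y, width, and height as four loose arguments, the caller fills in a rect_t once and passes a pointer to it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical structure replacing four separate int arguments. */
typedef struct {
    int x;
    int y;
    int width;
    int height;
} rect_t;

/* Hypothetical helper: returns 1 if the point (px, py) is inside the
 * rectangle and 0 otherwise. A NULL rectangle sets errno and returns 0. */
int rect_contains(const rect_t *r, int px, int py) {
    if (!r) {
        errno = EINVAL;
        return 0;
    }

    return px >= r->x && px < r->x + r->width &&
           py >= r->y && py < r->y + r->height;
}
```

The call site also reads better: rect_contains(&window, 9, 4) is harder to get wrong than a call with six bare integers in a row.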

Wait, you said no comments!?!!

In the do_the_needful() function, I wrote a specific type of comment that is designed to be a placeholder rather than documenting the code:

/* XXX do needful stuff */

When you are in the zone, sometimes you don't want to stop and write some particularly gnarly bit of code. You'll come back and do it later, just not now. That's where I'll leave myself a little breadcrumb. I insert a comment with an XXX prefix and a short remark describing what needs to be done. Later on, when I have more time, I'll grep through the source looking for XXX. It doesn't matter what you use, just make sure it's not likely to show up in your codebase in another context, as a function name or variable, for instance.

Putting it all together

OK, this program still does almost nothing when you compile and run it. But now you have a solid skeleton to build your own command line parsing C programs.

/* main.c - the complete listing */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libgen.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <stdint.h>

#define OPTSTR "vi:o:f:h"
#define USAGE_FMT  "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]\n"
#define ERR_FOPEN_INPUT  "fopen(input, r)"
#define ERR_FOPEN_OUTPUT "fopen(output, w)"
#define ERR_DO_THE_NEEDFUL "do_the_needful blew up"
#define DEFAULT_PROGNAME "george"

extern int errno;
extern char *optarg;
extern int opterr, optind;

typedef struct {
  int           verbose;
  uint32_t      flags;
  FILE         *input;
  FILE         *output;
} options_t;

int dumb_global_variable = -11;

void usage(char *progname, int opt);
int  do_the_needful(options_t *options);

int main(int argc, char *argv[]) {
    int opt;
    options_t options = { 0, 0x0, stdin, stdout };

    opterr = 0;

    while ((opt = getopt(argc, argv, OPTSTR)) != EOF) 
       switch(opt) {
           case 'i':
              if (!(options.input = fopen(optarg, "r")) ){
                 perror(ERR_FOPEN_INPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }
              break;

           case 'o':
              if (!(options.output = fopen(optarg, "w")) ){
                 perror(ERR_FOPEN_OUTPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }    
              break;
              
           case 'f':
              options.flags = (uint32_t )strtoul(optarg, NULL, 16);
              break;

           case 'v':
              options.verbose += 1;
              break;

           case 'h':
           default:
              usage(basename(argv[0]), opt);
              /* NOTREACHED */
              break;
       }

    if (do_the_needful(&options) != EXIT_SUCCESS) {
       perror(ERR_DO_THE_NEEDFUL);
       exit(EXIT_FAILURE);
       /* NOTREACHED */
    }

    return EXIT_SUCCESS;
}

void usage(char *progname, int opt) {
   fprintf(stderr, USAGE_FMT, progname?progname:DEFAULT_PROGNAME);
   exit(EXIT_FAILURE);
   /* NOTREACHED */
}

int do_the_needful(options_t *options) {

   if (!options) {
     errno = EINVAL;
     return EXIT_FAILURE;
   }

   if (!options->input || !options->output) {
     errno = ENOENT;
     return EXIT_FAILURE;
   }

   /* XXX do needful stuff */

   return EXIT_SUCCESS;
}

Now you're ready to write C that will be easier to maintain. If you have any questions or feedback, please share them in the comments.

Erik O'Shaughnessy is an opinionated but friendly UNIX system programmer living the good life in Texas. Over the last twenty years (or more!) he has worked for IBM, Sun Microsystems, Oracle, and most recently Intel doing computer system performance related work.

38 Comments

This is an *excellent* article about writing good C code. Great job Erik!

Function prototypes and global variables should be placed in the `*.h` that corresponds to the `*.c` where they are defined. Always keep the API in close association to its code.

While I would normally agree with you, the scope of this article was how to write a main.c and not how to structure a multi-file C program. I guess I know what to write about next :)

In reply to by Shawn H Corey

It will definitely be a good next article! Here are a few other ideas... You mentioned the _start() function, tell us more! ;) Of course it depends on many parameters, but if it's limited to something well used and popular, like (x86_64, Linux, gcc), then it's realistic to dive into the topic within 1-3 articles. Another good topic would be debugging: a few examples of how you debug code when a program has dumped core or been killed by different signals. You also touched slightly on "errno is used as an out-of-band communication channel by the standard C library to communicate why a function might have failed". I think it deserves its own article, showing by example how a caller interacts with a program and what EINVAL and ENOENT are. Tell us more about how you handle errors.

In reply to by JnyJny

Agree with Jim, Great article!

I really like the style of the text and the content in particular. It is unique as it represents a distilled experience. There is no single right solution. C is still C, but knowing the language doesn't mean everyone will use it in the same way. Experience matters. I will wait for more articles on C/Unix/Low-level programming!

One little note regarding the code:
void usage(char *progname, int opt);

This function takes a second argument, but it is not used in the function's body.

Thank you for the praise and you are of course correct with respect to the opt argument being unused in the body of the function. I had intended to emit the offending option in the fprintf but must have forgotten it in the fugue of writing.

In reply to by Oleksii Tsvietnov

I couldn't compile the code because of two issues:
1. the full listing of a program has lost its #include
2. for some reason there is no uint32_t on my Linux system

$ uname -r
5.0.7-200.fc29.x86_64

$ grep int32_ /usr/include/sys/types.h
typedef unsigned int u_int32_t;

After I fixed these two issues, I managed to compile the code.
And one little note about the formatting of the usage() output. I think it might make sense to add \n at the end, otherwise it doesn't look right in a shell:

#define USAGE_FMT "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]\n"

In reply to by JnyJny

Thank you for this article. Whenever I start a new C program, I will read this again.

This is a good start, but there is one glaring omission. You need an option to redirect stderr to a log file and a matching FILE entry in the options_t structure. Once that is added, you might want to be able to set a log level as well, although the verbose counter would work if you use multiple 'v's.

One of the biggest problems I encountered in debugging new code is figuring out what led up to the crash. Single stepping through a loop that repeats thousands of times before it crashes is not fun. Plus the debugger changes the timing and it may not crash at all. A log file with a few simple breadcrumbs can work wonders in this case.

Thanks for the suggestion Bob, however I have to wonder how redirecting stderr to a file in a shell doesn't accomplish the same thing without code maintenance overhead? I'm not saying you're wrong to want log files, but I am always interested in writing only as much code as I need to get the job done and no more.

In reply to by Bob McConnell (not verified)

Can you guarantee that doing it in the shell won't add a layer of kernel calls for each write? If not, there may be a significant difference in the time to push each message out to the file. When I first ran into this one, it took me three days to figure out the redirect was slowing the process down enough that the crash no longer occurred. With the log file it still didn't happen often, but at least I was able to find the glitch. Timing is everything.

In reply to by JnyJny

Thanks for sharing your experience; fixing heisenbugs can be notoriously difficult to accomplish. I think we can agree that the function and file layout I present in this article can be adapted to fit debugging that class of problems, however my primary purpose writing this article is to demonstrate basic structure of a C program for those who are just beginning their journey.

In reply to by Bob McConnell (not verified)

If you declare main as int you should also return an int.

Are you objecting to returning EXIT_SUCCESS or EXIT_FAILURE instead of zero or one? While C is replete with instances of "magic constants", I find it's better to use a macro definition for constants whenever possible. And I especially like using constants that the standard library supplies since I know that I can count on them being defined in compliant runtime environments.

In reply to by oxagast (not verified)

Awesome, thanks! If have time please tell about how C runtime works and layout/structure of compiler and linker files.

This is very helpful, especially in interviews.

Nice article!

If I may complete the "usage" function (by using the 'opt' argument) so the message be more informative when encountering an error, I propose this:

...
#define OPTSTR ":vi:o:f:h"
...
void usage(char *progname, int opt) {
if (opt == '?') fprintf(stderr, "Unknown option '-%c'\n", optopt);
if (opt == ':') fprintf(stderr, "Missing argument for option '-%c'\n", optopt);
fprintf(stderr, USAGE_FMT, progname?progname:DEFAULT_PROGNAME);
exit(EXIT_FAILURE);
/* NOTREACHED */
}

All those are worthy additions and illustrate what can happen when you build on a solid foundation; you can begin to concentrate on those small usability touches that are easy but mean a great deal to the users of your tools.

I believe SYSSEGV should be SIGSEGV.

Ok, I'm not even going to try to lie. I went back and checked my source notes and sure enough, it's SYSSEGV there too. It just goes to show that editing is hard when there isn't a compiler keeping you honest. Thanks for the extra editing, your check is in the mail! ;)

In reply to by Rares Aioanei (not verified)

Copied/pasted the code to my favorite Linux box and bang ... compile errors. :) It failed with errors around the uint32 type on various RedHat flavors (CentOS-5 and Fedora-29). However everything worked fine on MacOS.

With RedHat, the real "uint" typedefs are done in the file, which is included in bits/types.h. However a more elegant approach is to add simply an "#include

My comment was supposed to be instructive and helpful for RedHat users. But when you use gt/lt characters in your message, everything gets screwed up, stuff gets deleted. In this case completely removing the essence of my message. :-)

here a new attempt, hoping memory serves me:

-----

Copied/pasted the code to my favorite Linux box and bang ... compile errors :). It failed with errors around the uint32 type on various RedHat flavors (CentOS-5 and Fedora-29). However everything worked fine on MacOS.

With RedHat, the real "uint" typedefs are done in the file /usr/include/bits/stdint-uintn.h, which is included in bits/types.h and then again in sys/types.h. However a much more elegant approach is to add simply an #include (lt) stdint.h (gt) line in the first section of main.c .

In reply to by WWWillem (not verified)

Less than/greater than ate your post and I replied to nothing, so I think we are both operating at a deficit :) Excellent description of where those typedefs are located, and it serves to illustrate that not all C environments are the same.

In reply to by WWWillem (not verified)

Welcome to the unbounded joy that is C! I considered briefly not using uint32_t (or a cognate) in my code, but this article was originally titled "How To Write a C Main Function Like Me" so I wrote it the way I would normally write it. When I was writing C regularly, it was nearly always in support of a particular software/hardware combination that was reasonably "static" and only changed for major/minor OS updates or forklift hardware upgrades. Today's programming environments are much more fluid and you have to decide how much you want to "forage" from the environment and how much you are willing to re-implement to avoid the problems like the one you encountered.

You mentioned somewhere Solaris. I worked for ten years for Sun Microsystems (2000-2009). My Sony laptop was 100% Solaris X64 and StarOffice.

At the time X86 and Linux were foul words at Sun, but I secretly still did it in my basement. :-)

When that changed and X64 and Linux were suddenly OK, I suddenly became the local multi-boot guru and RedHat expert.

WWWillem

In reply to by JnyJny

Hey there Sun alumni!

I worked in the Austin TX office from 2000 to 2017 doing performance work and x86 was definitely a dirty word for a long time at Sun. I spent the majority of my time working on SPARC hardware with some weird jaunts from time to time (AMD, firmware, service processors, architecture simulators, stuff like that).

In reply to by WWWillem (not verified)

I guess you should ditch sys/types.h in favour of stdint.h, which was introduced in C99 to standardise these across systems.

It's a nice article on how to write a good C program and I learnt a few new things related to C programming.

Thank you for the positive feedback! Stay tuned, I have more articles in the pipeline :)

If the Main function in C language is INT MAIN(), then you have to add RETURN VALUE of the function, here VALUE maybe 0 or 1.

Note: Here capital letter denoted a syntax or C programming functions Name, Keywords, etc.

This is excellent! I wish I had seen an example like this several years ago.

Since you mention that you also code in Python, is it too much to ask for essentially the same template using Python's standard libraries? (Pretty please with sugar on top?!)

I'm sorry I didn't write it earlier, but I'm glad you found it helpful today!

With respect to a python oriented article, I'll put it on the list. Stay tuned!

In reply to by SS (not verified)

Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.