Some Simple Tricks in C (and C++)

I have been in a fair number of programming courses in my life. Several of these have been in the hallowed C programming language. There was a time when C and C++ were the only languages I could call myself proficient in.

Over the years, I have noticed a constant pattern in all these courses: they teach you the basics you should learn, but little of what you will actually need. For instance, you are taught how to use structures, but not much about how to use them to make sense of the vile madness you will often encounter while coding in C. There is a neat trick I will show you which uses structures and can really simplify working with arrays. There is something easy you can do with macros to make dynamic allocations far more readable. And yes, there are amazing tools right inside stdio.h which can make string manipulation much easier. Together, these tricks can radically change the way you write C.

This is not an article for professional C programmers. This is for Python and Javascript programmers who would like to be able to really use C, and make it do useful things the way their favourite languages do.

Typedef the structs, unless they contain a pointer to themselves

Generally, when we work with the struct datatype, we define the structures something like this:

struct Position
{
  double latitude;
  double longitude;
  double altitude;
};

And then we declare the variables of this structure something like this:

struct Position mypos;

I have no problem with this - it is a classic method, something every good C programmer knows - but must I write the struct keyword twice here? Might I suggest a simple idiom for defining the structure which changes the whole feel of the code to something more modern and less archaic?

typedef struct
{
    double latitude;
    double longitude;
    double altitude;
} Position;

And that is all. We do not give the structure a name (a tag) of its own; instead, we use it inside a typedef that gives this structure type a new name. Now, this is what the declaration looks like:

Position mypos;

This is much more readable to anyone who is more familiar with OOP code than with C code.

Why would someone do this? Because even those of us who work only in higher-level languages like Python, Javascript and Java sometimes just have to write a small snippet of C. (Scratch that: there is no such thing as a small snippet of C - it is at least thirty lines ;) This is a simple tool to maintain your sanity if you are ever stuck in such a situation and need to write C, but really wish it had some OOP feel to it. Speaking of OOP feel,
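The exception in the heading deserves a quick illustration. A structure such as a linked-list node has to refer to its own type, and inside the braces of an anonymous typedef that type has no name yet. Giving the structure a tag as well solves this, and the typedef name can still be used everywhere else. A minimal sketch; Node, push_front, list_length and free_list are hypothetical names chosen for illustration:

```c
#include <stdlib.h>

/* The tag "Node" is required here: inside the braces the typedef name
   Node has not been declared yet, so the member must say "struct Node". */
typedef struct Node
{
    int value;
    struct Node *next;
} Node;

/* Hypothetical helper: push a value onto the front of a list.
   Returns the new head, or the old head unchanged if malloc fails. */
Node *push_front(Node *head, int value)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Count the nodes by following the next pointers. */
int list_length(const Node *head)
{
    int count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Free every node in the list. */
void free_list(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Outside the structure definition itself, the code still reads as plain Node, just like Position above.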

Place arrays inside a structure which also contains their length

Look at this function, which takes an array of integers in C and computes their sum:

int sum(int *Arr, int length)
{
    int sum = 0, i;

    for(i = 0; i < length; i++)
        sum = sum + Arr[i];

    return sum;
}

A simple, friendly, and irreconcilably ugly function. Why, oh why, why in the name of the Great Cthulhu, is there an additional argument for the length of the array? Literally NO other programming language - not even that ugly disaster Java - requires this. Only in C and C++ (and even then, not in practical C++, where people use sensible things like std::vector) must we put up with this ugliness. I have a simple solution to this mess, and it involves some primitive encapsulation in C:

typedef struct
{
    int *val;
    int length;
} Int;

Yes, I am making a structure, naming it Int with a capital I using a typedef, and making it contain a pointer for dynamic allocation and a humble integer to store the length of the dynamically allocated array. Not exactly rocket science, is it? Now all we have to do is write a function that initializes the structure, sort of like a constructor, if you will. Once this is done, the function I wrote above becomes much more beautiful:

int sum(Int Arr)
{
    int sum = 0, i;

    for(i = 0; i < Arr.length; i++)
        sum = sum + Arr.val[i];

    return sum;
}

There. Did that hurt? It would if we didn’t have an accurate value of the length variable, but if we set the length variable at the time when we allocate the memory, there should be nothing wrong here. The constructor is simply something like this:

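#include <stdlib.h>  /* malloc */

Int newInt(int length)
{
    Int A = {NULL, 0};
    A.val = (int *)malloc(sizeof(int) * length);
    if (A.val != NULL)       /* record the length only if malloc succeeded */
        A.length = length;
    return A;
}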

There is just one thing to note: although the Int structure, and therefore the pointer val, is passed by value, the array that val points to is not copied. Any change to the elements of Arr.val inside a function is visible to the caller, so change them at your own peril.

The only bit of ugly code in that function is the line with the malloc in it. And that is what I will take on next.
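Putting this section's pieces together, here is a minimal, self-contained sketch of the Int structure, its constructor, the sum function and a matching destructor. The names freeInt and setFirst are hypothetical (the article does not define them), but every malloc does need a matching free. setFirst demonstrates the caveat about val: the structure is copied, the array is not.

```c
#include <stdlib.h>

typedef struct
{
    int *val;    /* heap-allocated array */
    int length;  /* number of elements in val */
} Int;

/* Constructor: allocate an uninitialised array of `length` ints.
   On allocation failure, val is NULL and length stays 0. */
Int newInt(int length)
{
    Int A = {NULL, 0};
    A.val = malloc(sizeof(int) * length);
    if (A.val != NULL)
        A.length = length;
    return A;
}

/* Hypothetical destructor: release the array and reset the fields. */
void freeInt(Int *A)
{
    free(A->val);
    A->val = NULL;
    A->length = 0;
}

int sum(Int Arr)
{
    int total = 0, i;

    for (i = 0; i < Arr.length; i++)
        total += Arr.val[i];

    return total;
}

/* Hypothetical: Arr is a copy, but Arr.val points at the caller's array,
   so this write is visible to the caller. */
void setFirst(Int Arr, int x)
{
    if (Arr.length > 0)
        Arr.val[0] = x;
}
```

freeInt takes a pointer so it can also reset the caller's copy, which makes an accidental second free harmless (free(NULL) does nothing).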

Use a macro instead of the mallocs

Slap this on the top of your code:

#define new(dtype, length) ((dtype *)malloc(sizeof(dtype) * (length)))

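When you want to dynamically allocate an array of any type, use the new macro. It's really simple. (Since new is a keyword in C++, keep this macro to C code; in C++ it would clash with the real new operator.)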

// Integer array of length 5:
int *Arr = new(int, 5);

// Double array of length 2:
double *Trouble = new(double, 2);

// String of up to 25 characters, plus one byte for the null terminator:
char *name = new(char, 26);

Do I really need to explain this? All malloc calls are too similar to one another to merit being written out over and over again. Just use this single line of boilerplate to replace every malloc call with a decent new expression like you would use in C++ or Java. Unlike Java, though, the memory still has to be released with free() when you are done with it.
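One variation worth considering, sketched here as an alternative rather than as the article's macro, is to build new on calloc instead of malloc. calloc takes the count and the element size separately, zero-initialises the memory, and in mainstream implementations such as glibc returns NULL when count times size would overflow instead of silently allocating too little:

```c
#include <stdlib.h>

/* Allocate `length` zero-initialised objects of type `dtype`.
   calloc does the multiplication itself. The whole expansion is
   parenthesised so the macro behaves like a single expression. */
#define new(dtype, length) ((dtype *)calloc((length), sizeof(dtype)))
```

Usage is unchanged: int *Arr = new(int, 5); gives five ints, all zero, which are later released with free(Arr).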

Use the formatted String I/O functions

Note the last example. We have created a string. As everyone knows, strings in C are an incredible pain to work with. Even with the string.h library, they are still an epic pain to deal with. All the functions pass pointers around. Some of them modify the strings passed as arguments, something they can do only because we have to pass strings by pointer. Some of them don't, because the programmer in question was feeling kind at the moment. It's a real mess, and string.h has always, at least for me, meant a visit to the documentation. Even with all that, string.h gives you no easy way to parse a string into an integer or turn an integer back into a string, and woe betide you if you ever find yourself needing to.

There are two incredibly powerful functions in the good old stdio.h header of C which no one talks about. Teachers are suspiciously silent about them. If you have ever come across them in a class lecture, do let me know. Even good books on C do not spend much time extolling the virtues of these bad boys. I do not know why these two amazing functions must always languish in obscurity.

The unsung heroes I speak of are, of course, the sprintf and sscanf functions. These are incredibly useful. What do they do? Well, they do exactly what their cousins fprintf and fscanf do, but instead of writing to and reading from files, they write to and read from strings.

Yes. Instead of taking a FILE pointer, they take a pointer to a char array as their first argument. Then they simply write to or read from the char array itself!

Just imagine what this means. Here is a simple statement that concatenates two strings, s1 and s2, and places the result in a new string, target (which must be large enough to hold both, plus the null terminator):

sprintf(target, "%s%s", s1, s2);

Now, why would one not just use good old strcat? Well, there are three reasons:

  1. It can be faster. In my own tests with GCC 9.3 on Manjaro (Linux kernel 5.4), it was about 20% faster than strcat. This is not because of a system call (sprintf writes straight into memory and makes none); a plausible explanation is that strcat has to scan target from the beginning to find its end every time it appends, while a single sprintf call writes everything in one pass.
  2. What if you wanted to concatenate two strings with a space or a comma between them? For example, to build a full name out of a first name and a last name?
  3. What if you want to concatenate a string with a number?

Putting a space between the strings:

sprintf(fullname, "%s %s", firstname, lastname);

One. Line.
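One caveat: sprintf has no idea how large fullname is, so the caller must make sure it is big enough. A minimal sketch that sizes the buffer exactly from the two input lengths before writing; the function name make_fullname is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated "first last" string, or NULL if malloc fails.
   The caller must free() the result. */
char *make_fullname(const char *firstname, const char *lastname)
{
    /* first name + one space + last name + terminating '\0' */
    size_t size = strlen(firstname) + 1 + strlen(lastname) + 1;
    char *fullname = malloc(size);

    if (fullname != NULL)
        sprintf(fullname, "%s %s", firstname, lastname);
    return fullname;
}
```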

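The best part is that you can put literally anything between the two strings: a comma, a space, a tab, a newline, and so on. Want the last two characters of the first string to vanish before the second one is slapped on? A pair of \b between them will make it look that way on a terminal, and a carriage return lets you visually overwrite the first few characters with, say, "abcde". Be aware that this is only a display effect: the backspace and carriage-return characters are stored in the string itself, and none of the original characters are actually removed. Still, the possibilities are endless!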
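To change what is actually stored in the result, rather than how it looks on a terminal, the format string can do the work itself: a precision such as %.*s copies at most the given number of characters from its argument, and pointer arithmetic skips characters at the front. A minimal sketch; both function names are hypothetical, and both assume out is large enough and the inputs are at least as long as the characters being dropped:

```c
#include <stdio.h>
#include <string.h>

/* Write s1 without its last two characters, followed by s2.
   Assumes strlen(s1) >= 2. */
void drop2_and_append(char *out, const char *s1, const char *s2)
{
    int keep = (int)strlen(s1) - 2;   /* how many characters of s1 to keep */
    sprintf(out, "%.*s%s", keep, s1, s2);
}

/* Replace the first n characters of s with prefix.
   Assumes strlen(s) >= n. */
void replace_prefix(char *out, const char *s, size_t n, const char *prefix)
{
    sprintf(out, "%s%s", prefix, s + n);
}
```

For example, drop2_and_append(out, "hello!!", " world") stores "hello world", with no control characters left in out.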

Concatenating non-strings:

You have an integer. You have a string. You want to append the integer at the end of the string. Something that goes like this in Python:

str2 = str1 + str(n)

Or like this in Javascript:

let str2 = str1 + JSON.stringify(n);

You get the idea.

Doing this in C is insanely difficult. Unless you use sprintf. Then it becomes insanely easy. Look at this code:

sprintf(str2, "%s%d", str1, n);
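As with the earlier examples, sprintf will happily write past the end of str2 if it is too small. Its standard sibling snprintf takes the buffer size and returns the length the complete result would have had, which makes truncation easy to detect. A sketch with a hypothetical helper name, append_int:

```c
#include <stdio.h>

/* Write prefix followed by n into dst, which holds dst_size bytes.
   Returns 0 on success, -1 if the result was truncated or on error. */
int append_int(char *dst, size_t dst_size, const char *prefix, int n)
{
    /* snprintf never writes more than dst_size bytes (including '\0')
       and returns the length it wanted to write, not counting '\0'. */
    int needed = snprintf(dst, dst_size, "%s%d", prefix, n);

    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

On truncation, dst still holds a valid, null-terminated prefix of the result, so nothing is written out of bounds.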

Once again, note that the things being concatenated need not be strings at all! And that is what leads us to our next great trick:

Parsing an Integer or a Double:

Yes, parsing. Suppose an integer is written out in the string S the way we would write a number on paper, digit by digit. Your job is to put it in the integer variable n, which should hold the integer in binary format, the way integers are meant to exist in RAM.

As simple as parseInt in Javascript or int in Python. As simple as Integer.parseInt (well, okay, grotesque, but simple) in Java. As simple as int.Parse in C#. A nightmare in C and C++.

Enter sscanf. Here is how to use it. It's like fscanf, but it reads your variables off a string, not a file. Go:

sscanf(S, "%d", &n);

Done. Parsing complete.
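One refinement worth adding: sscanf returns the number of items it successfully converted, so checking that value tells you whether S actually contained a number. The same call handles a double with %lf. A sketch with hypothetical helper names:

```c
#include <stdio.h>

/* Returns 1 and stores the value in *out if s starts with an integer
   (leading whitespace is skipped), otherwise returns 0. */
int parse_int(const char *s, int *out)
{
    return sscanf(s, "%d", out) == 1;
}

/* Same for a double. In scanf-style functions a double * needs %lf;
   plain %f is for float *. */
int parse_double(const char *s, double *out)
{
    return sscanf(s, "%lf", out) == 1;
}
```

Note that sscanf stops at the first character that cannot be part of the number, so "12abc" is accepted as 12.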

It is equally easy to convert existing integer n into a string and store it in a string variable S. Simply print to S instead of scanning from it:

sprintf(S, "%d", n);

Now, one important note: you need stdio.h to use these two functions. If you use them inside a header file, make sure stdio.h is included before that header, or better still, include stdio.h in the header file itself.


That is all for now. In my next article, I will discuss how to implement split and join functions. I will be using C++ then, since the std::vector class is much better than plain arrays for storing a list of strings, which our split function will return and our join function will take as an argument.

However, with those last two functions, it is possible to build something far more powerful than mere tricks and idioms. I am talking, of course, about the var (Variant) datatype.

This will require some of the juicy abstraction C++ provides. I will write about implementing a dynamically typed Variant datatype and using it to make a vector of variants of myriad types in my next article.

Have a great day. And please do comment. I would like to know if I am wrong about anything, if these techniques have already been discovered by other programmers and named after them (I do not intend to plagiarize!), or if there are even greater simplifications available.


el jefe

Written like a high level language programmer. I couldn't find a single C code repository from you. This article pretty much confirms you've had little experience with the language.

A simple, friendly, and irreconcilably ugly function. Why, oh why, why in the name of the Great Cthulhu, is there an additional argument for the length of the array? Literally NO other programming language - not even that ugly disaster Java - requires this.

Because there is no way a function could deduce the length of the data behind the pointer. This is C we're talking about, not some high level language that you're trying to coerce C into. Also, literally no other language requires this? How about Assembly...?

There. Did that hurt?

Yes, actually. We've now spent time making the API less flexible, introduced another structure and a constructor function for it, and allocated dynamic memory (which must be freed) just so that we can pass one argument instead of two. Perhaps you didn't think about this beyond your little code example, but there's going to be a hell of a lot more stuff to do to call your function now. I think it takes less effort to rewrite the sum function than to use yours. Also, you typedef'd the struct into "Int". This is just confusing and crude. It is now implied that the type is a primitive integer type. You've purposefully hidden an important detail about the type you're working with, again, just to spite the API user.

Use a macro instead of the mallocs

That is a cute macro. However, when you're going to be allocating memory in real world code, you're probably going to get the size by referring to the variable name and not the type, since that's less prone to bugs when stuff changes. Now your clean little macro call is just going to look weird and confusing, e.g. struct foo bar = new(bar, 1); Is it so hard to use malloc/calloc like everyone else? Also, why use malloc in this scenario? Let calloc do the multiplication. It checks for overflow anyway.
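The idiom this comment describes takes the element size from the pointer itself (sizeof *bar), so the allocation stays correct if the pointed-to type changes later. A sketch with calloc doing the multiplication, as suggested; struct foo, its fields and make_foos are hypothetical:

```c
#include <stdlib.h>

struct foo
{
    int id;
    double weight;
};

/* Allocate an array of `count` zeroed struct foo objects.
   sizeof *bar is worked out from the pointer's type at compile time;
   bar is never actually dereferenced here. */
struct foo *make_foos(size_t count)
{
    struct foo *bar = calloc(count, sizeof *bar);
    return bar;   /* NULL if the allocation failed */
}
```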

Paul Humphreys

Won't your newInt function cause undefined behaviour, as Int A is an automatic variable, so it will be defined on the stack and have its value overwritten when another function is called that reuses the same stack space? I believe that for this function to work correctly, you'd instead need an Int* A variable, then allocate some memory with A = malloc(sizeof(Int));

Also, sprintf can be dangerous if you're not aware of the memory size needed to store your string. The common extension asprintf will fix this by always internally malloc'ing the right amount of space needed (which will need to be free'd afterwards).
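asprintf is not part of ISO C (it comes from GNU and the BSDs). Where it is unavailable, a similar helper can be built on the standard vsnprintf: call it once with a null buffer to measure the result, allocate exactly that much, then call it again to write. A minimal sketch; my_asprintf is a hypothetical name, and it returns the string instead of copying asprintf's exact signature:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format like printf into a newly allocated string.
   Returns NULL on error; the caller must free() the result. */
char *my_asprintf(const char *fmt, ...)
{
    va_list args;
    int len;
    char *buf;

    /* First pass: measure. With size 0, vsnprintf writes nothing. */
    va_start(args, fmt);
    len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);   /* +1 for the terminating '\0' */
    if (buf == NULL)
        return NULL;

    /* Second pass: write. A va_list cannot be reused, so restart it. */
    va_start(args, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, args);
    va_end(args);
    return buf;
}
```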

Finally, I believe there are some security issues associated with the scanf family, but I can't remember the details right now. I was taught to instead use the strtod function to obtain double precision floating point numbers from strings. To get strings from a file, I was taught to instead write a wrapper around calling fgets multiple times with a fixed-size buffer, whilst joining the buffer's intermediate values together with realloc in the process.
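The fgets-and-realloc approach described above can be sketched as follows. This is an illustration, not the commenter's own code; the name read_line and the 64-byte chunk size are arbitrary assumptions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64  /* size of the fixed buffer passed to fgets */

/* Read one line of any length from f, without the trailing newline.
   Returns a malloc'd string the caller must free(), or NULL at end of
   file (with nothing read) or on allocation failure. */
char *read_line(FILE *f)
{
    char chunk[CHUNK];
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, f) != NULL) {
        size_t n = strlen(chunk);
        /* realloc(NULL, ...) behaves like malloc on the first pass */
        char *grown = realloc(line, len + n + 1);
        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        memcpy(line + len, chunk, n + 1);   /* copy including '\0' */
        len += n;

        if (line[len - 1] == '\n') {
            line[len - 1] = '\0';           /* strip the newline and stop */
            break;
        }
    }
    return line;
}
```

A line longer than the buffer simply takes several fgets calls, each one appended to the growing string.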

Rajarshi Bandopadhyay

Thank you. These are valuable inputs. I have already tested the newInt method, and it did work. I can show you the code:

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int *val;
    int length;
} Int;

void print(Int Arr)
{
    int i;

    puts("Members of the array are: ");
    for(i = 0; i < Arr.length; i++)
        printf(" %d. %d\n", (i+1), Arr.val[i]);
}

Int newInt(int size)
{
    Int A = {NULL, 0};
    A.val = (int *)malloc( sizeof(int) * size );
    A.length = size;
    return A;
}

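int main()
{
    Int Ar = newInt(4);
    int i;

    for(i = 0; i < Ar.length; i++)   /* malloc'd memory starts uninitialised */
        Ar.val[i] = 0;
    Ar.val[2] = 3;
    print(Ar);
    free(Ar.val);                    /* release the array allocated by newInt */
    return 0;
}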

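Compiled with GCC 9.3 and LD 2.34 and run on Manjaro with Linux Kernel 5.4, it works like a charm. The Int variable is meant to be passed as a value, not by reference. Returning a structure by value is well defined in C: the caller receives a copy of A, so it does not matter that A itself lived on the stack. Only the array that val points to lives on the heap, and that is what the malloc is for.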

I must thank you for warning me about the sprintf being dangerous. It was something I did not know. Also, the asprintf function was a nice thing to learn about.

Same goes for the scanf family. I didn't know it was dangerous. Someone on Reddit told me to use atoi for parsing instead, and I think I shall stick to that for now. I wonder what the danger of scanf is: is it that the function requires addresses as arguments, and therefore addresses of local variables are placed on the call stack?
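One thing worth knowing about atoi: it has no way to report failure. atoi("abc") returns 0, exactly as atoi("0") does, and a value outside the range of int is undefined behaviour (the same is true of sscanf with %d). The standard strtol reports both problems, through its end pointer and errno. A minimal sketch; the helper name parse_long is hypothetical:

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the whole of s as a base-10 long.
   Returns 1 and stores the result in *out on success; returns 0 if s
   contains no digits, has trailing characters, or is out of range. */
int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                      /* strtol only sets errno on error */
    value = strtol(s, &end, 10);

    if (end == s)                   /* no digits at all */
        return 0;
    if (*end != '\0')               /* junk after the number, e.g. "12abc" */
        return 0;
    if (errno == ERANGE)            /* too large or too small for long */
        return 0;

    *out = value;
    return 1;
}
```

For doubles, strtod works the same way and is what the comment above recommends.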

Anyway, that was very informative to read. Can you tell me where you were taught to use fgets with realloc? Seems like a useful approach to input handling.

Rhett Trickett

Since learning and enjoying Python I've wanted to pick up a more performant language and so started investigating C++ and then C. Particularly because of their versatility and being used in some interesting applications like web browsers, networking, embedded devices, compilers and even the CPython implementation of the Python language itself. They seem to offer a path to all types of interesting projects! I also tried Java but couldn't stomach its verbosity, which perhaps is more suited to larger codebases.

Like you say, many C/C++ tutorials will teach you the semantics but not really impart much humane wisdom when using the languages. Not being able to simply access the length of an array (after using Python's built-in len()) was probably one of the first surprises I encountered, but I figured that this must come with the territory.

So this post was pleasantly surprising to read, to see that there are ways to construct helpful tools. These look like some clever tricks that don't seem to obscure anything in the code. Did you come up with these yourself? Have you seen similar techniques used in other codebases? I'm curious whether you think other professional C/C++ programmers would welcome or discourage such practices. They seem to circumvent aspects of the language in order to improve working with it. It reminds me of the line "C wears well as one’s experience with it grows" from the preface to The C Programming Language. Perhaps it's my beginner's naivety.

This was a great first post Rajarshi, thank you. I genuinely look forward to reading about your implementations of split and join. Welcome to Able!

Rajarshi Bandopadhyay

Thank you for your warm welcome. It's really lovely to see such appreciation on my first post here. My experiences were sort of the reverse of yours - I began my serious programming education with C and C++, and eventually moved to Python. It is perhaps why I have learnt to value the abstractions that Python provides.

Most of the methods I wrote about here are tricks I came up with myself. I may not have been the first to do so, of course - several other programmers may have already been using a macro for malloc, for instance - but apart from the first trick of using a typedef on the struct, these are all things I devised while writing C++ code after several years of Python had rusted my C/C++ skills.

I came across a programmer's guide somewhere on the internet which first mentioned the use of typedefs to improve the readability of structures. This was done, not by using the sane typedef implementation that any schoolkid would have thought of, but by doing this:

struct Position
{
    int latitude;
    int longitude;
    int altitude;
};
typedef struct Position POSITION;

Apparently, there is some Divine Commandment that a typedef must always be named exclusively using uppercase characters. It is the reason why the FILE structure is so named. This makes sense in the context of prevalent programming practices in the 70s and 80s. According to Trudeau, sexism has been deprecated since 2015. It's 2020 now; when are we going to deprecate this uppercase madness? You only need one peek at the WinAPI header files to see where this craze has led us. There are typedefs that make no sense: typedef unsigned int UINT; I mean, why in the name of Maxwell's Demon must we do that?

That said, while nobody in my university or any programming course I have been to has ever taught me about the string I/O functions, an Israeli friend of mine tells me that his university did, in fact, teach them. Perhaps our Indian Computer Science courses are simply potato. I honestly cannot tell. Do they teach String I/O in the US, Canada, Australia, Europe, etc.?