Dynamic arrays in C

by Michał Węglarz

Inspired by this viral video, I've decided to try to develop my own minimal example of a dynamic array in the C programming language. The goal of this exercise is to grow the basic understanding of fundamental data structures and build my own internal toolset of useful code snippets that I might revisit and reuse in the future.

The standard array

The regular array of five integers in C looks like this:

int arr[5] = {1, 2, 3, 4, 5};

It is a fixed-size collection of data items of the same type. The size of an array cannot be changed once it is declared. This limitation, which doesn't exist in higher-level languages, may be frustrating, but it's not difficult to overcome by implementing our own dynamic array.

Representing an array as a struct

The implementation starts with a basic struct, that wraps the pointer to the first item in the array, and specifies fields such as the current length (the actual number of meaningful items currently stored in the array) and the total capacity.

typedef struct {
	int *items;
	size_t length;
	size_t capacity;
} IntArray;

This struct is enough to represent the dynamic array consisting of a variable number of integers.

I can now initialize it, ensuring all members are zeroed by default:

IntArray numbers = {0};

Adding an item to the array

I'll create a new function responsible for adding a new item to the array and call it IntArray_push. It takes a pointer to the IntArray as a first parameter and the item to insert as a second one.

void IntArray_push(IntArray *arr, int item)
{
	arr->items[arr->length] = item;
	arr->length += 1;
}

The function inserts the item at the end of the array and adjusts the length appropriately.

It is missing some crucial parts, though. Before I can actually proceed with adding to the array, there's a series of necessary questions I have to answer first.

What if the array is already at its maximum capacity?

The array is full when its length is equal to the capacity.

if (arr->length == arr->capacity) { 

In such a case, it's necessary to expand the array.

This expression also evaluates to true when both length and capacity are 0, which occurs on the first push after initialization.

What to do when the array's capacity is zero?

I have to manually set the base capacity to some value.

if (arr->capacity == 0) arr->capacity = 256;

It doesn't really matter what value I choose here, but it's a good idea to think about the real-world datasets I'm going to work with and select the capacity based on that. As a general rule of thumb, the less frequently we get to resize it later on, the better for performance. I want to reduce the number of necessary resizing operations, which may happen to be more costly than I'd wish for.

In an ideal world, all of my data structures would have a static size known at the compile time.

How do I ensure the array has sufficient capacity for the new item?

If the base capacity has already been specified in the previous step, but I get to the point where the new item I want to insert won't fit into the array, I need to manually adjust the capacity. As it's easy to guess, the new capacity has to be sufficiently larger than the previous one. Here I decide to double it:

arr->capacity *= 2;

Similarly to the previous case, the value of two is somewhat arbitrary—other values could be fine too. The point is to find such a number that will fulfill the minimum requirements for the new value to fit in the array, without increasing the number of future resizing operations too much.

It's important to note that this only updates the capacity value—it doesn't yet resize the memory allocated for the array.

How do I actually resize the array memory?

I'm going to use realloc to request a new amount of memory, equal to the updated capacity multiplied by the size of each item.

int *new_ptr = realloc(arr->items, arr->capacity * sizeof(*arr->items));

realloc returns the pointer to the beginning of newly allocated memory.

I am going to make sure the allocation went successfully and terminate the program if the new pointer isn't a valid memory address.

if (new_ptr == NULL) {
	fprintf(stderr, "Memory reallocation failed\n");
	exit(EXIT_FAILURE);
}

Otherwise, if all goes well, I can safely update the array pointer:

arr->items = new_ptr;

It is worth noting here that realloc can also be used for initial memory allocation (when arr->items is NULL), since realloc(NULL, size) behaves like malloc(size).

Putting it together

Now that I've got the necessary pieces, the entire code required to safely push to the array and resize it, in case of insufficient capacity, looks like this:

void IntArray_push(IntArray *arr, int item)
{
	if (arr == NULL) {
		fprintf(stderr, "Array pointer is NULL\n");
		exit(EXIT_FAILURE);
	}

	if (arr->length == arr->capacity) {
		if (arr->capacity == 0)
			arr->capacity = 256;
		else
			arr->capacity *= 2;

		int *new_ptr = realloc(arr->items, arr->capacity * sizeof(*arr->items));
		if (new_ptr == NULL) {
			fprintf(stderr, "Memory reallocation failed\n");
			exit(EXIT_FAILURE);
		}
		arr->items = new_ptr;
	}
	arr->items[arr->length] = item;
	arr->length += 1;
}

I've thrown in a NULL check at the beginning, just to be safe. The base capacity initialization happens inside the if statement—this triggers on the first push when both length and capacity are 0, which is when realloc does the initial allocation.

Retrieving an item from the array

The IntArray_get helper function allows me to safely retrieve an item from the array based on its index by using pointer arithmetic. It includes a NULL check and verifies that the provided index is valid, preventing out-of-bounds access.

If the index is invalid, instead of dealing with undefined behavior, I want to fail gracefully, by returning a value that would clearly tell me that something went wrong (e.g. -1 as an error indicator), or even terminate the whole program.

int IntArray_get(IntArray *arr, int index)
{
	if (arr == NULL) {
		fprintf(stderr, "Array pointer is NULL\n");
		return -1; // or exit(EXIT_FAILURE)
	}

	if (index < 0 || index >= arr->length) {
		fprintf(stderr, "Invalid array access at index %d\n", index);
		return -1; // or e.g. exit(EXIT_FAILURE);
	}

	return *(arr->items + index);
}

Usage

int main()
{
	IntArray numbers = {0}; // Initialize the dynamic array
	for (int i = 0; i <= 267; i++) {
		IntArray_push(&numbers, i); // Fill the array with values
	}

	for (int i = 0; i < numbers.length; i++) {
		int item = IntArray_get(&numbers, i); // Retrieve values from the array
		printf("%d\n", item);
	}

	int item = IntArray_get(&numbers, 567); // Invalid array access at index 567
	printf("%d\n", item); // -1

	return 0;
}

Handling different data types

So far, I've implemented a dynamic array that can store elements of type int. But what if I want to use any other type? Since C doesn't have a built-in feature like TypeScript's generics or C++ templates, do I have to manually redefine all of the above whenever I want to store a different type of elements?

Manual approach

I can manually copy the code above, and replace only the appropriate pieces with the new type, e.g. replace int with float. It is not the "cleanest" solution, as I impose on myself a fair amount of manual labor of copying and pasting the same code and ultimately end up with some repetitive fragments. It is the most straightforward solution, though. It allows me to pragmatically focus only on the data I'm actually dealing with, rather that prematurely introduce complexity that I might not even make use of.

typedef struct {
	int *items;
	size_t length;
	size_t capacity;
} IntArray;

void IntArray_push(IntArray *arr, int item) { ... }
int IntArray_get(IntArray *arr, int index) { ... }

typedef struct {
	float *items;
	size_t length;
	size_t capacity;
} FloatArray;

void FloatArray_push(FloatArray *arr, float item) { ... }
float FloatArray_get(FloatArray *arr, int index) { ... }

Macros

Another option is to use C macros. They provide a way to work with different data types, although they don't provide type safety and can become difficult to write, understand, and debug. Additionally, since macros don't return values like functions, the most convenient (but a little awkward) approach is to pass an output variable as a parameter.

#define ARRAY_PUSH(arr, item)                                                                                                                        \
	do {                                                                                                                                             \
		if ((arr)->length == (arr)->capacity) {                                                                                                      \
			if ((arr)->capacity == 0)                                                                                                                \
				(arr)->capacity = 256;                                                                                                               \
			else                                                                                                                                     \
				(arr)->capacity *= 2;                                                                                                                \
			void *new_ptr = realloc((arr)->items, (arr)->capacity * sizeof(*(arr)->items));                                                          \
			if (new_ptr == NULL) {                                                                                                                   \
				fprintf(stderr, "Memory reallocation failed\n");                                                                                     \
				exit(EXIT_FAILURE);                                                                                                                  \
			}                                                                                                                                        \
			(arr)->items = new_ptr;                                                                                                                  \
		}                                                                                                                                            \
		(arr)->items[(arr)->length] = (item);                                                                                                        \
		(arr)->length += 1;                                                                                                                          \
	} while (0)

#define ARRAY_GET(item, arr, index)                                                                                                                  \
	do {                                                                                                                                             \
		if ((index) >= 0 && (index) < (arr)->length) {                                                                                               \
			item = *((arr)->items + (index));                                                                                                        \
		} else {                                                                                                                                     \
			fprintf(stderr, "Invalid array access at index %d\n", (index));                                                                          \
			item = (typeof(item))(-1);                                                                                                               \
		}                                                                                                                                            \
	} while (0)

#define DEFINE_ARRAY(type, name)                                                                                                                     \
	typedef struct {                                                                                                                                 \
		type *items;                                                                                                                                 \
		size_t length;                                                                                                                               \
		size_t capacity;                                                                                                                             \
	} name

int main()
{
	DEFINE_ARRAY(float, GenericFloatArray);
	GenericFloatArray numbers = {0};
	for (int i = 0; i <= 510; i++) {
		ARRAY_PUSH(&numbers, i + 0.5f);
	}
	for (int i = 0; i < numbers.length; i++) {
		float item;
		ARRAY_GET(item, &numbers, i);
		printf("item: %f\n", item);
	}
	float item;
	ARRAY_GET(item, &numbers, 921); // Invalid array access at index 921
	printf("item: %f\n", item);     // -1

	return 0;
}

Summary

While there are already many different solutions to the problem of dynamic arrays available online, with many solid and well-tested implementations, I still find it a good learning exercise to "reinvent the wheel" a little and go through the process of writing this code myself. It's certainly easier than one might think, and with this code in place, I'll be happy to reuse it in future projects.

The entire code from this blog post can be found here.

Related resources: