Linux Kernel & Driver Advanced Interview Questions

1. Process Management

How to manipulate the current process states?

Process states can be manipulated using various kernel functions and system calls:

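  • set_current_state(state) - Set current process state, with a memory barrier
  • __set_current_state(state) - Same, without the barrier
  • set_task_state(task, state) / __set_task_state(task, state) - Older helpers for an arbitrary task; removed from mainline in the 4.x series, so avoid them in new code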
  • TASK_RUNNING - Process ready to run
  • TASK_INTERRUPTIBLE - Sleeping, can be awakened by signal
  • TASK_UNINTERRUPTIBLE - Sleeping, cannot be awakened by signal
  • TASK_STOPPED - Process stopped (debugger/job control)
  • TASK_TRACED - Process traced by debugger
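
From user space, a task's state can be observed in /proc/<pid>/stat, where the field after the parenthesised command name is a one-letter code. The sketch below is a user-space illustration rather than kernel code; the helper names proc_state() and state_name() are invented for this example, and the letter mapping follows proc(5).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the state letter of a task, or '?' on failure.
 * `pid` is a decimal PID string or "self". */
char proc_state(const char *pid)
{
    char path[64], buf[512];
    FILE *f;
    size_t n;
    char *p;

    assert(pid != NULL);
    snprintf(path, sizeof(path), "/proc/%s/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return '?';
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Format is "pid (comm) S ..."; comm may contain ')', so search from the end. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Map the /proc letter back to the kernel state name used in the list above. */
const char *state_name(char c)
{
    switch (c) {
    case 'R': return "TASK_RUNNING";
    case 'S': return "TASK_INTERRUPTIBLE";
    case 'D': return "TASK_UNINTERRUPTIBLE";
    case 'T': return "TASK_STOPPED";
    case 't': return "TASK_TRACED";
    case 'Z': return "EXIT_ZOMBIE";
    default:  return "other";
    }
}
```

A process that reads its own entry with proc_state("self") normally sees 'R', because it is running at that moment.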

Example:

set_current_state(TASK_INTERRUPTIBLE);
schedule();  // Go to sleep

What are kernel threads and how are they different from user threads?

Kernel Threads: Threads created and managed by the kernel

  • No user space address space (task->mm is NULL)
  • Created with kthread_create(), or kthread_run() to create and wake in one step
  • Used for kernel background tasks (kswapd, kworker, etc.)
  • Cannot access user memory directly
  • Run in kernel mode only

User Threads: Threads created in user space

  • Have user space address space
  • May be scheduled by the kernel (1:1, as with NPTL pthreads) or by a user-space runtime (green threads)
  • Can access user memory
  • Can run in both user and kernel mode

Creating kernel thread:

struct task_struct *t = kthread_create(threadfunc, 
                                       arg, "mythread");
if (!IS_ERR(t))
    wake_up_process(t);

What is the difference between fork() and vfork()?

fork(): Creates complete copy of parent process

  • Duplicates the address space lazily: page tables are copied and pages are shared copy-on-write until either side writes
  • Parent continues after fork returns
  • Child gets new process ID
  • More expensive than vfork() (page tables must be copied)
  • Return value: child PID in parent, 0 in child, -1 on error

vfork(): Creates process with shared memory

  • Child shares memory with parent
  • Parent is blocked until child calls an exec function or _exit()
  • Very fast - no page-table copying
  • Used when child will immediately exec()
  • Dangerous if child modifies shared memory
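
The copy-on-write behaviour of fork() can be checked from user space: a write made by the child must not be visible to the parent. This is a minimal sketch; the function name parent_value_after_child_write() is made up for the illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the parent's view of `value` after the child
 * has overwritten its own copy, or -1 if fork()/waitpid() fails. */
int parent_value_after_child_write(int initial, int child_value)
{
    volatile int value = initial;
    pid_t pid;

    assert(initial != child_value);   /* otherwise the demo shows nothing */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = child_value;   /* child writes to its own private copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return value;              /* parent's copy was never modified */
}
```

With vfork() the same write would land in the parent's memory, which is why a vfork() child should do nothing except call an exec function or _exit().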

Code comparison:

// fork() - parent continues
pid_t pid = fork();
if (pid == 0) {
    // Child code
} else {
    // Parent continues
}

// vfork() - parent waits
pid_t pid = vfork();
if (pid == 0) {
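    // Child MUST call an exec function or _exit(), and must not return
    execv(...);
    _exit(127);  // Reached only if execv() fails
} else {
    // Parent resumes once the child has called exec or _exit()
}

What is task_struct and what information does it contain?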

The task_struct is the fundamental process descriptor in Linux kernel. It contains:

  • Process State: unsigned int __state (volatile long state before 5.14)
  • Process ID: pid_t pid, pid_t tgid (thread group ID)
  • Scheduling: int prio, int static_prio, struct sched_entity se (struct list_head run_list in the old O(1) scheduler)
  • Memory: struct mm_struct *mm, struct mm_struct *active_mm
  • Files: struct files_struct *files
  • Signal handling: struct signal_struct *signal
  • Kernel stack: void *stack
  • Virtual run time: u64 se.vruntime
  • Parent/children: struct task_struct *real_parent, struct list_head children
  • Flags: unsigned int flags (PF_KTHREAD, PF_EXITING, etc.)

Accessing current process:

// current is a macro, not a variable: it evaluates to the running task's task_struct
current->pid;  // Get PID
READ_ONCE(current->__state);  // Get state (current->state before 5.14)
current->comm;  // Get command name
current_thread_info();  // Get thread_info

What is process context and when does it occur?

Process Context: Execution state when kernel code runs on behalf of a process

When it occurs:

  • During system call execution
  • When exception/fault occurs
  • NOT during interrupt/softirq handling

Characteristics:

  • current macro points to current task
  • Can access process address space safely
  • Can sleep and block (process can be scheduled out)
  • Can use copy_from_user() and copy_to_user()
  • Not in interrupt context

Example:

// In syscall - process context
asmlinkage long sys_read(unsigned int fd, char __user *buf, 
                         size_t count) {
    // current points to the process making the call
    // can sleep: schedule();
    // can copy: copy_to_user(buf, kbuf, count);
}

What is a zombie process and how is it handled?

Zombie Process: A child process that has exited but whose parent has not yet reaped it with wait()

Why it exists:

  • Child process exits
  • Parent hasn't called wait()/waitpid() to collect exit status
  • Child process descriptor remains in kernel
  • Takes up process table entry

How kernel handles orphaned children:

  • If parent dies, child is reparented to init process
  • init reaps adopted children by calling wait() when they exit (typically in response to SIGCHLD)
  • Uses exit_notify() and forget_original_parent()
  • Walks child list and reparents to init or subreaper

Code example:

// Parent process
pid_t pid = fork();
if (pid == 0) {
    // Child
    exit(5);
} else {
    // Parent - without wait, child becomes zombie!
    sleep(10);  // Child is zombie during this time
    wait(NULL);  // Reaps zombie, collects exit status
}
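
The zombie window can also be observed deterministically instead of relying on sleep(). The sketch below assumes Linux with /proc mounted; observe_zombie() is a hypothetical name. It uses waitid() with WNOWAIT to wait until the child has exited without reaping it, reads the child's state letter from /proc, and only then collects the exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, waits until it is a zombie, stores
 * its /proc state letter in *state ('?' if /proc is unreadable), then reaps
 * it. Returns the collected exit status, or -1 on error. */
int observe_zombie(int code, char *state)
{
    char path[64], buf[512];
    siginfo_t info;
    int status;
    pid_t pid;
    FILE *f;

    assert(state != NULL && code >= 0 && code <= 255);
    *state = '?';

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    /* WNOWAIT: block until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f) {
        size_t n = fread(buf, 1, sizeof(buf) - 1, f);
        char *p;

        buf[n] = '\0';
        p = strrchr(buf, ')');   /* state letter follows "(comm) " */
        if (p && p[1] == ' ' && p[2] != '\0')
            *state = p[2];
        fclose(f);
    }

    /* Reaping releases the process-table entry. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Between the waitid() and waitpid() calls the child is a zombie, so the state letter read from /proc is expected to be 'Z'.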

2. Process Scheduling

What is the CFS (Completely Fair Scheduler) and how does it work?

CFS (Completely Fair Scheduler): Default scheduler for normal (SCHED_OTHER) tasks from 2.6.23 until 6.6, when it was replaced by EEVDF

Key concepts:

  • Fair time distribution: Each process gets CPU time proportional to its weight
  • Virtual runtime: vruntime tracks how much CPU time process has received
  • Red-Black tree: Runnable processes stored in RB-tree (O(log n) insertion/deletion)
  • Leftmost process: Process with smallest vruntime (most starving) runs next
  • No fixed time slices: A task's slice is computed dynamically from its weight and the number of runnable tasks

How it selects next process:

  1. Pick leftmost process in RB-tree (smallest vruntime)
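  2. Let it run for its slice (its weight-proportional share of the scheduling latency period), or until it blocks or is preempted
  3. Update its vruntime: vruntime += actual_runtime * NICE_0_LOAD / weight (NICE_0_LOAD = 1024, the weight of a nice-0 task)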
  4. Reinsert into RB-tree
  5. Repeat
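
The selection loop above can be sketched as a small user-space simulation. This is a toy model under stated assumptions, not the kernel's implementation: it uses a linear scan instead of a red-black tree, grants a fixed slice per decision, and the names sim_task, pick_next() and cfs_simulate() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NICE_0_LOAD 1024u   /* weight of a nice-0 task */

struct sim_task {
    uint32_t weight;      /* load weight (1024 for nice 0) */
    uint64_t vruntime;    /* weighted runtime, in ns */
    uint64_t runtime;     /* real CPU time received, in ns */
};

/* Pick the task with the smallest vruntime (lowest index wins ties).
 * The kernel keeps tasks in a red-black tree; a linear scan is enough here. */
static size_t pick_next(const struct sim_task *t, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (t[i].vruntime < t[best].vruntime)
            best = i;
    return best;
}

/* Make `ticks` scheduling decisions, each granting `slice_ns` of CPU time. */
void cfs_simulate(struct sim_task *t, size_t n, unsigned ticks, uint64_t slice_ns)
{
    assert(t != NULL && n > 0);
    for (unsigned k = 0; k < ticks; k++) {
        struct sim_task *cur = &t[pick_next(t, n)];

        assert(cur->weight > 0);
        cur->runtime  += slice_ns;
        /* Heavier tasks accumulate vruntime more slowly, so they run more often. */
        cur->vruntime += slice_ns * NICE_0_LOAD / cur->weight;
    }
}
```

With two tasks of weight 1024 and 2048, the heavier task ends up with twice the real runtime while both vruntimes stay close together, which is the fairness property CFS aims for.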

Advantages over previous O(1) scheduler:

  • Better fairness
  • Eliminates starvation
  • Better interactive response
  • Scales well to many CPUs

What is nice value and how does it affect scheduling?

Nice Value: Priority hint for process scheduling (range: -20 to +19)

Values:

  • -20 = highest priority (most CPU)
  • 0 = default
  • +19 = lowest priority (least CPU)

How it affects CFS:

  • Maps to process weight in kernel
  • Weight determines vruntime update rate
  • Lower nice = higher weight = less vruntime increase
  • Higher nice = lower weight = more vruntime increase

Linux commands:

nice -n 10 myapp       # Start with nice 10
renice -n 5 -p 1234   # Change PID 1234 to nice 5
ps -eo pid,ni,comm     # View NI (nice) column

What is context switching and how is it handled?

Context Switching: Saving one process state and loading another

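What gets saved or switched:

  • CPU registers
  • Program counter (PC)
  • Stack pointer
  • Address space: the page-table base register (e.g. CR3 on x86) is reloaded for the next task
  • TLB: not saved; stale entries are flushed, or kept apart by address-space tags (ASID/PCID)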

Process:

  1. Save current process registers (on its kernel stack and in the thread_struct inside task_struct)
  2. Call context_switch() (an inline function invoked from __schedule())
  3. Switch memory management: switch_mm()
  4. Switch register state: switch_to()
  5. Load new process registers
  6. Jump to new process PC

Code flow:

// In scheduler (simplified pseudocode; the real function lives in kernel/sched/core.c)
context_switch(rq, prev, next) {
    struct mm_struct *mm, *oldmm;
    
    mm = next->mm;
    oldmm = prev->active_mm;
    
    if (!mm)  // Kernel thread
        next->active_mm = oldmm;
    else
        switch_mm(oldmm, mm, next);
    
    switch_to(prev, next, prev);  // Register switch
}
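
The save-and-restore idea behind switch_to() has a user-space analogue in the ucontext API, where swapcontext() stores the current registers and stack pointer and loads another set. The sketch below is only an analogy, assuming glibc on Linux: it does not switch address spaces, and names such as run_switch_demo() and trace[] are invented for the example.

```c
#include <assert.h>
#include <ucontext.h>

#define TRACE_MAX 4

static ucontext_t main_ctx, task_ctx;
static int trace[TRACE_MAX];   /* order in which the steps executed */
static int trace_len;

static void record(int step)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = step;
}

/* Body of the second execution context. */
static void task_body(void)
{
    record(1);
    swapcontext(&task_ctx, &main_ctx);   /* save our state, resume main */
    record(3);
}   /* returning resumes uc_link, i.e. main_ctx */

/* Returns the number of recorded steps, or -1 on error. */
int run_switch_demo(void)
{
    static char stack[64 * 1024];        /* private stack for task_body */

    trace_len = 0;
    if (getcontext(&task_ctx) != 0)
        return -1;
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = sizeof(stack);
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    record(0);
    swapcontext(&main_ctx, &task_ctx);   /* run task until it yields */
    record(2);
    swapcontext(&main_ctx, &task_ctx);   /* resume task until it finishes */
    return trace_len;
}
```

Each swapcontext() call plays the role of switch_to(prev, next, prev): execution continues in the other context exactly where that context last stopped.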

3. Interrupt Handling

What are top halves and bottom halves in interrupt handling?

Top Half (ISR - Interrupt Service Routine): Immediate interrupt handler

  • Runs immediately when interrupt occurs
  • Runs with interrupts disabled on the local CPU
  • Must be very fast and short
  • Can't sleep or block
  • Typically just acknowledges hardware interrupt
  • Schedules bottom half for deferred work

Bottom Half: Deferred processing

  • Runs later when interrupts are re-enabled
  • Can perform heavy work
  • Can use tasklets, softirqs, or workqueues
  • Can sleep with workqueues
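
The same split can be illustrated in user space with a signal handler standing in for the top half and an ordinary function for the bottom half. This is an analogy, not kernel code; top_half(), bottom_half() and install_top_half() are hypothetical names. Like tasklet_schedule(), marking the work as pending twice before it runs results in a single run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t irq_seen;       /* events seen by the top half */
static volatile sig_atomic_t work_pending;   /* set when a bottom half is due */
static int bottom_half_runs;

/* "Top half": asynchronous context, so only note that work exists. */
static void top_half(int signo)
{
    (void)signo;
    irq_seen++;
    work_pending = 1;
}

/* "Bottom half": ordinary context, free to do slow work.
 * Returns 1 if deferred work was run, 0 if nothing was pending. */
int bottom_half(void)
{
    if (!work_pending)
        return 0;
    work_pending = 0;      /* clear first so a new event re-arms us */
    bottom_half_runs++;    /* stand-in for the real, slower processing */
    return 1;
}

int install_top_half(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = top_half;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

After install_top_half(SIGUSR1), each raise(SIGUSR1) runs the handler immediately, while the deferred work happens only when the main flow calls bottom_half().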

Example:

// Top half - very fast
irqreturn_t interrupt_handler(int irq, void *dev_id) {
    // Acknowledge hardware
    hardware_ack();
    
    // Schedule bottom half
    tasklet_schedule(&my_tasklet);
    
    return IRQ_HANDLED;
}

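// Bottom half - can do more work
// (kernels >= 5.9 set tasklets up with tasklet_setup() and pass struct tasklet_struct *)
void tasklet_handler(unsigned long data) {
    // Process received data
    // Runs in softirq context with interrupts enabled: it still must not sleep
}

What are tasklets, softirq, and workqueues? When to use which?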

Tasklets: Simple deferred execution mechanism

  • Built on top of softirqs
  • Runs in atomic context: can be interrupted by hard IRQs, but a given tasklet never runs on two CPUs at once
  • Soft interrupt context (can't sleep)
  • Use: tasklet_schedule(&tasklet)
  • Runs on same CPU it was scheduled
  • Good for: moderate work, no sleeping needed

Softirqs: More lightweight than tasklets

  • The same softirq can run concurrently on several CPUs, so handlers need their own locking or per-CPU data
  • Very fast
  • Fixed set of vectors defined at compile time (10 in current kernels)
  • Used by kernel for critical tasks
  • Examples: NET_RX_SOFTIRQ, TIMER_SOFTIRQ
  • Good for: high-priority deferred work

Workqueues: Heavy deferred work

  • Run in process context (not softirq)
  • Can sleep and block
  • Can use mutex, call blocking functions
  • Can be delayed or scheduled
  • Good for: I/O, long-running work

Comparison table:

Feature     Tasklet       Softirq    Workqueue
Context     Softirq       Softirq    Process
Can sleep   No            No         Yes
Can block   No            No         Yes
Speed       Fast          Fastest    Slower
Use case    Medium work   Critical   I/O, long work

4. Memory Management

What are memory zones and how is memory allocated?

Memory Zones: Different regions of physical memory

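  • ZONE_DMA: 0-16 MB on x86 (for legacy devices limited to 24-bit DMA addresses)
  • ZONE_DMA32: Below 4 GB on 64-bit systems (for devices limited to 32-bit DMA addresses)
  • ZONE_NORMAL: 16 MB - 896 MB on 32-bit x86 (directly mapped kernel memory); all remaining memory on 64-bit
  • ZONE_HIGHMEM: Above 896 MB on 32-bit x86 (not permanently mapped; absent on 64-bit)
  • ZONE_MOVABLE: Pages that can be migrated (helps memory hot-unplug and large contiguous allocations)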

Memory allocation functions:

  • alloc_pages(gfp_mask, order) - Allocate pages (return page struct)
  • __get_free_pages(gfp_mask, order) - Allocate pages (return address)
  • kmalloc(size, gfp_mask) - Allocate physically contiguous memory of arbitrary byte size (best suited to small objects)
  • vmalloc(size) - Allocate non-contiguous virtual memory
  • get_zeroed_page(gfp_mask) - Allocate zero-filled page

GFP (Get Free Pages) flags:

  • GFP_KERNEL - Normal allocation, can sleep (for process context)
  • GFP_ATOMIC - Atomic context, cannot sleep (for ISR/softirq)
  • GFP_DMA - Allocate from DMA zone
  • __GFP_HIGHMEM - Can use high memory (32-bit)
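
The order argument of alloc_pages() and __get_free_pages() is a power-of-two exponent: order n means 2^n contiguous pages. The user-space sketch below shows how a byte size maps to an order; size_to_order() is a hypothetical name that mirrors what the kernel's get_order() computes for the real PAGE_SIZE, and the page size is passed in explicitly as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size. */
unsigned int size_to_order(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    assert(size > 0 && page_size > 0);
    while (span < size) {
        span <<= 1;    /* each order doubles the number of pages */
        order++;
    }
    return order;
}
```

With 4 KB pages, a 16 KB request maps to order 2 (4 pages), and a 10000-byte request also needs order 2 because order 1 covers only 8 KB.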

Example:

// Allocate 4 pages (16 KB)
struct page *pages = alloc_pages(GFP_KERNEL, 2);

// Allocate memory for structure
struct mydata *data = kmalloc(sizeof(*data), GFP_KERNEL);

// In interrupt - must not sleep
unsigned long *buf = kmalloc(1024, GFP_ATOMIC);

// Allocate virtual memory (non-contiguous)
void *vbuf = vmalloc(10000);

What is kmalloc vs vmalloc and when to use each?

kmalloc: Physically and virtually contiguous memory

  • Allocates from slab allocator
  • Physically contiguous pages
  • Virtually contiguous (in the kernel's direct/linear mapping)
  • Limited by contiguous free memory
  • Fast allocation
  • Good for: DMA buffers, small allocations
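  • Max size: KMALLOC_MAX_SIZE, which depends on architecture and configuration (historically 128 KB, often a few MB today); large requests are more likely to fail under fragmentation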

vmalloc: Virtually contiguous, physically scattered

  • Allocates from page allocator
  • Can be non-contiguous physically
  • Virtually contiguous (via page tables)
  • Can allocate larger areas
  • Slower than kmalloc (page table setup)
  • Good for: large buffers that only the CPU touches (with CONFIG_VMAP_STACK, kernel stacks are also allocated this way)
  • Not suitable for DMA that needs one physically contiguous region

Comparison:

Feature               kmalloc                      vmalloc
Physical contiguity   Yes                          No
Speed                 Fast                         Slower
DMA safe              Yes                          No
Max size              Limited (KMALLOC_MAX_SIZE)   Larger
Use                   Small, DMA                   Large buffers

5. Kernel Synchronization

What is a spinlock and when should it be used?

Spinlock: Busy-waiting lock for short critical sections

  • Task spins in a loop checking lock
  • Does NOT sleep/block
  • Disables preemption
  • Can be used in ISR context
  • Good for: very short critical sections
  • Bad for: long operations (wastes CPU)

When to use:

  • In interrupt handler
  • Short critical sections (< 10 microseconds)
  • When you can't sleep
  • Protecting data shared between CPUs on SMP systems

Usage:

DEFINE_SPINLOCK(mylock);

// Acquire lock
spin_lock(&mylock);
// Critical section - must be very short!
spin_unlock(&mylock);

// Lock also taken by an interrupt handler: disable local interrupts too
unsigned long flags;
spin_lock_irqsave(&mylock, flags);
// Critical section
spin_unlock_irqrestore(&mylock, flags);

// Nested locks - always lock in same order
spin_lock(&lock1);
spin_lock(&lock2);
// ...
spin_unlock(&lock2);
spin_unlock(&lock1);

Types:

  • spin_lock() - Basic spinlock
  • spin_lock_irq() - Disables interrupts
  • spin_lock_irqsave() - Saves and restores interrupt state
  • spin_lock_bh() - Disables bottom halves

What are atomic operations and when are they used?

Atomic Operations: Operations that complete without interruption

  • Cannot be interrupted by context switch or interrupt
  • Implemented as atomic CPU instructions
  • No spinlock needed
  • Very fast
  • Limited to simple operations (increment, decrement, set, clear)

Common atomic operations:

atomic_t count;

atomic_set(&count, 5);          // Set to 5
atomic_read(&count);             // Read value
atomic_inc(&count);              // Increment
atomic_dec(&count);              // Decrement
atomic_add(3, &count);           // Add value
atomic_sub(2, &count);           // Subtract
atomic_inc_and_test(&count);     // Inc, test if zero
atomic_dec_and_test(&count);     // Dec, test if zero
atomic_xchg(&count, 10);         // Exchange
atomic_cmpxchg(&count, 5, 10);   // Compare and exchange

When to use:

  • Simple counters
  • Reference counting (new kernel code generally prefers refcount_t or kref, which add overflow protection)
  • Flags that need atomic updates
  • Performance-critical code with simple operations

Example - reference counting:

struct myobj {
    atomic_t refcount;
};

// Get reference
void get_obj(struct myobj *obj) {
    atomic_inc(&obj->refcount);
}

// Release reference
void put_obj(struct myobj *obj) {
    if (atomic_dec_and_test(&obj->refcount))
        kfree(obj);  // Free when count reaches 0
}
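
The same reference-counting pattern can be tried in user space with C11 atomics. This is a sketch under the assumption that atomic_fetch_sub() stands in for atomic_dec_and_test(); the demo_obj names are invented, and demo_obj_put() returns whether it freed the object so the behaviour can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_obj {
    atomic_int refcount;
};

/* Allocate an object holding one reference; returns NULL on failure. */
struct demo_obj *demo_obj_new(void)
{
    struct demo_obj *obj = malloc(sizeof(*obj));

    if (obj)
        atomic_init(&obj->refcount, 1);
    return obj;
}

void demo_obj_get(struct demo_obj *obj)
{
    assert(obj != NULL);
    atomic_fetch_add(&obj->refcount, 1);
}

/* Drop one reference; returns true if it was the last and obj was freed. */
bool demo_obj_put(struct demo_obj *obj)
{
    assert(obj != NULL);
    /* fetch_sub returns the old value: 1 means the count just reached 0. */
    if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
        free(obj);
        return true;
    }
    return false;
}
```

Only the caller that sees the count reach zero frees the object, so two concurrent demo_obj_put() calls cannot both free it.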

6. Device Driver Basics

What are the main components of a character device driver?

Character Device Driver Components:

  • Module Init Function: Register driver with kernel
  • Module Exit Function: Unregister driver
  • File Operations (fops): open, read, write, ioctl, release
  • cdev Structure: Kernel representation of char device
  • Device Number: Major and minor numbers
  • Class/Device Creation: For udev to create /dev nodes
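
Of the file operations, read is the easiest to get subtly wrong: a handler is expected to honour the requested length and the file offset, and to return 0 once all data has been consumed, otherwise tools such as cat never see end-of-file. The user-space model below sketches that contract; model_read() is a hypothetical function patterned on the kernel helper simple_read_from_buffer(), with memcpy() standing in for copy_to_user().

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Copies at most `count` bytes of `from`, starting at *pos, into `to` and
 * advances *pos. Returns the byte count, 0 at end of data, -1 on bad offset. */
ssize_t model_read(void *to, size_t count, off_t *pos,
                   const void *from, size_t available)
{
    size_t left;

    assert(to != NULL && pos != NULL && from != NULL);
    if (*pos < 0)
        return -1;
    if ((size_t)*pos >= available || count == 0)
        return 0;                      /* end of data: signals EOF */

    left = available - (size_t)*pos;
    if (count > left)
        count = left;                  /* short read near the end */
    memcpy(to, (const char *)from + *pos, count);
    *pos += (off_t)count;
    return (ssize_t)count;
}
```

Reading the 5-byte string "Hello" in 3-byte chunks therefore yields 3 bytes, then 2 bytes, then 0.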

Complete example:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/uaccess.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("Simple character device");

static dev_t devno;  // Device number
static struct cdev my_cdev;  // cdev structure
static struct class *my_class;  // Device class

// File operations
static int my_open(struct inode *i, struct file *f) {
    pr_info("Device opened\n");
    return 0;
}

static ssize_t my_read(struct file *f, char __user *buf, 
                       size_t len, loff_t *off) {
    static const char data[] = "Hello";

    // Honors len and *off, returns 0 at EOF and -EFAULT on a bad user pointer
    return simple_read_from_buffer(buf, len, off, data, sizeof(data));
}

static int my_release(struct inode *i, struct file *f) {
    pr_info("Device closed\n");
    return 0;
}

static const struct file_operations my_fops = {
    .owner = THIS_MODULE,
    .open = my_open,
    .read = my_read,
    .release = my_release,
};

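// Module initialization
static int __init my_init(void) {
    struct device *dev;
    int ret;

    // Allocate device number
    ret = alloc_chrdev_region(&devno, 0, 1, "mydev");
    if (ret)
        return ret;

    // Initialize and add cdev
    cdev_init(&my_cdev, &my_fops);
    ret = cdev_add(&my_cdev, devno, 1);
    if (ret)
        goto err_region;

    // Create device class (kernels >= 6.4 take only the name argument)
    my_class = class_create(THIS_MODULE, "mydev_class");
    if (IS_ERR(my_class)) {
        ret = PTR_ERR(my_class);
        goto err_cdev;
    }
    dev = device_create(my_class, NULL, devno, NULL, "mydev");
    if (IS_ERR(dev)) {
        ret = PTR_ERR(dev);
        goto err_class;
    }

    pr_info("Device created\n");
    return 0;

    // Undo completed steps in reverse order
err_class:
    class_destroy(my_class);
err_cdev:
    cdev_del(&my_cdev);
err_region:
    unregister_chrdev_region(devno, 1);
    return ret;
}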

// Module cleanup
static void __exit my_exit(void) {
    device_destroy(my_class, devno);
    class_destroy(my_class);
    cdev_del(&my_cdev);
    unregister_chrdev_region(devno, 1);
    
    pr_info("Device destroyed\n");
}

module_init(my_init);
module_exit(my_exit);

What are IOCTLs and how are they used in device drivers?

IOCTL (Input/Output Control): Device-specific commands beyond read/write

  • Used for configuration, status, control operations
  • Passed from user space via ioctl() syscall
  • Each device driver defines its own ioctl commands
  • Direction: none, read, write, or read/write

IOCTL command format:

_IOC(direction, type, nr, size)

Macros:
_IO(type, nr)               // No argument
_IOR(type, nr, datatype)    // Driver returns data to user space
_IOW(type, nr, datatype)    // User space passes data to the driver
_IOWR(type, nr, datatype)   // Both directions

Example:
#define IOCTL_SET_VALUE    _IOW('k', 1, int)
#define IOCTL_GET_VALUE    _IOR('k', 2, int)

User-space caller and driver implementation:

// User space
int val = 42;
ioctl(fd, IOCTL_SET_VALUE, &val);
ioctl(fd, IOCTL_GET_VALUE, &val);

// Kernel space
static long my_ioctl(struct file *f, unsigned int cmd, 
                     unsigned long arg) {
    int value;
    
    switch(cmd) {
        case IOCTL_SET_VALUE:
            if (copy_from_user(&value, (int __user *)arg, sizeof(value)))
                return -EFAULT;
            pr_info("Set value: %d\n", value);
            break;
            
        case IOCTL_GET_VALUE:
            value = 100;
            if (copy_to_user((int __user *)arg, &value, sizeof(value)))
                return -EFAULT;
            break;
            
        default:
            return -ENOTTY;  // Conventional error for an unknown ioctl command
    }
    
    return 0;
}

static const struct file_operations my_fops = {
    .unlocked_ioctl = my_ioctl,
};
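
An ioctl command number is a packed integer, and the fields encoded by _IOR/_IOW can be extracted again with the _IOC_DIR, _IOC_TYPE, _IOC_NR and _IOC_SIZE macros. The user-space sketch below assumes the Linux UAPI header <linux/ioctl.h> is installed; the struct ioctl_fields type and ioctl_decode() are hypothetical names, and the two command definitions are repeated from the example above.

```c
#include <assert.h>
#include <linux/ioctl.h>

#define IOCTL_SET_VALUE    _IOW('k', 1, int)
#define IOCTL_GET_VALUE    _IOR('k', 2, int)

/* The size field is derived from the argument type at compile time. */
static_assert(_IOC_SIZE(IOCTL_SET_VALUE) == sizeof(int),
              "size field must match the argument type");

struct ioctl_fields {
    unsigned int dir;    /* _IOC_NONE, _IOC_READ, _IOC_WRITE, or both */
    unsigned int type;   /* "magic" character identifying the driver */
    unsigned int nr;     /* command number within the driver */
    unsigned int size;   /* sizeof the argument type */
};

/* Split a command number into the four fields packed by _IOC(). */
struct ioctl_fields ioctl_decode(unsigned int cmd)
{
    struct ioctl_fields f = {
        .dir  = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr   = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return f;
}
```

Decoding IOCTL_SET_VALUE gives direction _IOC_WRITE, type 'k', number 1 and size sizeof(int). A driver can use the same macros to reject commands whose type does not match its own magic character.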