Linux Kernel & Driver Advanced Interview Questions

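Q: What is the CFS scheduler in Linux?

CFS (Completely Fair Scheduler) was the default scheduler for normal (SCHED_OTHER) tasks from Linux 2.6.23 until EEVDF replaced it in 6.6. It distributes CPU time fairly by tracking each task's virtual runtime and keeping runnable tasks in a red-black tree.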

Question

How to manipulate the current process states?

Answer 1

Process states can be manipulated using various kernel functions and system calls:

set_current_state(state) - Set the current task's state (includes a memory barrier)
__set_current_state(state) - Same, without the memory barrier
set_task_state(task, state) / __set_task_state(task, state) - Set another task's state; older kernels only (removed from mainline in the 4.x series)
TASK_RUNNING - Task is running or ready to run
TASK_INTERRUPTIBLE - Sleeping, can be awakened by a signal
TASK_UNINTERRUPTIBLE - Sleeping, cannot be awakened by a signal
TASK_STOPPED - Task stopped (job control, e.g. SIGSTOP)
TASK_TRACED - Task stopped by a debugger (ptrace)

Example:

set_current_state(TASK_INTERRUPTIBLE);
schedule();  // Go to sleep

Answer 2

Kernel Threads: Threads created and managed by the kernel

No user address space (task->mm is NULL)
Created with kthread_create(), or kthread_run() to create and wake in one step
Used for kernel background tasks (kswapd, kworker, etc.)
Cannot access user memory directly
Run in kernel mode only

User Threads: Threads created in user space

Have user space address space
Scheduled by the kernel (1:1 threads such as pthreads on Linux) or by a user-space runtime (green threads)
Can access user memory
Can run in both user and kernel mode

Creating kernel thread:

struct task_struct *t = kthread_create(threadfunc, 
                                       arg, "mythread");
if (!IS_ERR(t))
    wake_up_process(t);

Answer 3

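fork(): Creates a new process that duplicates the parent

Memory pages are shared copy-on-write; a page is copied only when either process writes to it
Parent continues after fork returns
Child gets new process ID
Costlier than vfork(): page tables and other per-process state are duplicated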
Return value: child PID in parent, 0 in child, -1 on error

vfork(): Creates a process that borrows the parent's address space

Child shares memory (including the stack) with the parent
Parent is suspended until the child calls exec() or _exit()
Very fast: no page tables are copied
Used when the child will immediately exec()
Dangerous if the child modifies shared memory or returns from the calling function

Code comparison:

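// fork() - parent continues
pid_t pid = fork();
if (pid < 0) {
    perror("fork");
} else if (pid == 0) {
    // Child code
} else {
    // Parent continues concurrently with the child
}

// vfork() - parent waits
pid_t vpid = vfork();
if (vpid < 0) {
    perror("vfork");
} else if (vpid == 0) {
    // Child MUST only call exec() or _exit()
    execv(...);
    _exit(127);  // Reached only if execv() failed
} else {
    // Parent resumes once the child has called exec() or _exit()
}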
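
To make the copy-on-write point concrete, here is a minimal user-space sketch. The function name fork_isolation_demo is hypothetical, not from any library. The child modifies a local variable and reports its view through the exit status; the parent's copy should stay unchanged because the write lands in the child's private copy of the page.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns parent_value * 10 + child_value,
 * or -1 if fork() or waitpid() fails.
 */
int fork_isolation_demo(void)
{
    int value = 1;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;      /* write goes to the child's private page copy */
        _exit(value);   /* report the child's view via the exit status */
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    /* The parent still sees 1; the child saw 2. */
    return value * 10 + WEXITSTATUS(status);
}
```

With vfork() the same write would be visible in the parent, which is why a vfork() child should do nothing but exec() or _exit().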

Answer 4

The task_struct is the fundamental process descriptor in the Linux kernel. It contains:

Process State: unsigned int __state (volatile long state before 5.14)
Process ID: pid_t pid, pid_t tgid (thread group ID)
Scheduling: int prio, int static_prio, struct sched_entity se (older kernels used struct list_head run_list)
Memory: struct mm_struct *mm, struct mm_struct *active_mm
Files: struct files_struct *files
Signal handling: struct signal_struct *signal
Kernel stack: void *stack
Virtual run time: u64 se.vruntime
Parent/children: struct task_struct *real_parent, struct list_head children
Flags: unsigned int flags (PF_EXITING, PF_KTHREAD, etc.)

Accessing current process:

struct task_struct *t = current;  // current is a macro, not a variable you declare
t->pid;                  // Get PID
READ_ONCE(t->__state);   // Get state (t->state on kernels before 5.14)
t->comm;                 // Get command name
current_thread_info();   // Get thread_info

Answer 5

Process Context: Execution state when kernel code runs on behalf of a process

When it occurs:

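During system call execution
While the kernel handles an exception or page fault raised by the process
In kernel threads (which are tasks in their own right)
NOT during interrupt/softirq handling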

Characteristics:

current macro points to current task
Can access process address space safely
Can sleep and block (process can be scheduled out)
Can use copy_from_user() and copy_to_user()
Not in interrupt context

Example:

// In syscall - process context
asmlinkage long sys_read(unsigned int fd, char __user *buf, 
                         size_t count) {
    // current points to the process making the call
    // can sleep: schedule();
    // can copy: copy_to_user(buf, kbuf, count);
}

Answer 6

Zombie Process: A child process that has exited but whose parent has not yet reaped it with wait()

Why it exists:

Child process exits
Parent hasn't called wait()/waitpid() to collect exit status
Child process descriptor remains in kernel
Takes up process table entry

How kernel handles orphaned children:

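If the parent dies first, the child is reparented to init (PID 1) or to the nearest subreaper
The new parent calls wait() to reap these children when they exit
The kernel performs the reparenting in exit_notify() and forget_original_parent()
forget_original_parent() walks the dying task's child list and assigns each child its new parent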

Code example:

// Parent process
pid_t pid = fork();
if (pid == 0) {
    // Child
    exit(5);
} else {
    // Parent - without wait, child becomes zombie!
    sleep(10);  // Child is zombie during this time
    wait(NULL);  // Reaps zombie, collects exit status
}
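
The zombie window can also be observed directly. The sketch below is a minimal illustration with hypothetical helper names (proc_state, zombie_state_demo); it assumes a Linux system with /proc mounted. It uses waitid() with WNOWAIT to wait until the child has exited without reaping it, then reads the state letter from /proc/<pid>/stat, which is expected to be 'Z' at that point.

```c
#define _POSIX_C_SOURCE 200809L  /* for waitid() under strict -std modes */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the state letter (third field) of /proc/<pid>/stat; '?' on error. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    char *p;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    fp = fopen(path, "r");
    if (!fp)
        return '?';
    if (!fgets(buf, sizeof(buf), fp)) {
        fclose(fp);
        return '?';
    }
    fclose(fp);

    /* The command name is parenthesised and may itself contain spaces. */
    p = strrchr(buf, ')');
    return (p && p[1] == ' ' && p[2]) ? p[2] : '?';
}

/* Hypothetical demo: returns the child's state while it is unreaped. */
char zombie_state_demo(void)
{
    siginfo_t info;
    char state;
    pid_t pid = fork();

    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(5);

    /* Block until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return '?';

    state = proc_state(pid);   /* child is a zombie here */
    waitpid(pid, NULL, 0);     /* reap it: the process table entry is freed */
    return state;
}
```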

Answer 7

CFS (Completely Fair Scheduler): Default scheduler for normal tasks from Linux 2.6.23 until 6.6, when EEVDF replaced it

Key concepts:

Fair time distribution: Each task gets CPU time proportional to its weight
Virtual runtime: vruntime is the task's CPU time scaled by its weight
Red-Black tree: Runnable tasks are stored in an RB-tree keyed by vruntime (O(log n) insertion/deletion)
Leftmost task: The task with the smallest vruntime (least weighted CPU time so far) runs next
No fixed time slices: A task's slice is computed dynamically from its weight and the number of runnable tasks

How it selects next process:

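Pick the leftmost task in the RB-tree (smallest vruntime)
Let it run until its slice (its weighted share of the scheduling latency period) is used up or a task with a smaller vruntime preempts it
Update its vruntime: vruntime += actual_runtime * NICE_0_LOAD / weight
Reinsert it into the RB-tree
Repeat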
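
The selection loop above can be sketched as a small user-space simulation. This is an illustration under stated assumptions, not kernel code: a linear scan stands in for the RB-tree's leftmost node, every task runs for the same fixed slice, and the names sim_task, pick_next and cfs_simulate are hypothetical. The weights 1024 and 335 used in the usage note correspond to nice 0 and nice 5 in the kernel's weight table.

```c
#include <stdint.h>

#define NICE_0_WEIGHT 1024u   /* weight the kernel assigns to nice 0 */

struct sim_task {
    unsigned weight;     /* load weight derived from the nice value */
    uint64_t vruntime;   /* weighted runtime in nanoseconds */
    int      runs;       /* number of slices this task received */
};

/* Smallest vruntime wins; a linear scan replaces the RB-tree lookup. */
static int pick_next(const struct sim_task *t, int n)
{
    int best = 0;

    for (int i = 1; i < n; i++)
        if (t[i].vruntime < t[best].vruntime)
            best = i;
    return best;
}

/* Run `slices` scheduling decisions, each granting slice_ns of CPU time. */
void cfs_simulate(struct sim_task *t, int n, int slices, uint64_t slice_ns)
{
    for (int s = 0; s < slices; s++) {
        int i = pick_next(t, n);

        /* Heavier tasks accumulate vruntime more slowly. */
        t[i].vruntime += slice_ns * NICE_0_WEIGHT / t[i].weight;
        t[i].runs++;
    }
}
```

Simulating two tasks with weights 1024 and 335 for 1000 slices should give the heavier task roughly three quarters of the slices, matching its share of the total weight (1024 / 1359).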

Advantages over previous O(1) scheduler:

Better fairness
Avoids starvation among normal-priority tasks
Better interactive response without the O(1) scheduler's interactivity heuristics
Scales well to many CPUs

Answer 8

Nice Value: Priority hint for process scheduling (range: -20 to +19)

Values:

-20 = highest priority (most CPU)
0 = default
+19 = lowest priority (least CPU)

How it affects CFS:

Maps to a task weight in the kernel (nice 0 = 1024; each nice step changes the weight by a factor of about 1.25)
Weight determines vruntime update rate
Lower nice = higher weight = less vruntime increase
Higher nice = lower weight = more vruntime increase

Linux commands:

nice -n 10 myapp       # Start with nice 10
renice -n 5 -p 1234   # Change PID 1234 to nice 5
ps -eo pid,ni,comm     # View NI (nice) column
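
The same adjustment can be made programmatically with getpriority() and setpriority(). The sketch below is a minimal example; the helper name raise_nice and the -100 error sentinel are hypothetical choices. It only raises the nice value, because an unprivileged process is generally not allowed to lower it.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the calling process's nice value by
 * delta (>= 0), capped at 19. Returns the new nice value, or -100
 * on error (-1 is a legitimate nice value, so it cannot signal failure).
 */
int raise_nice(int delta)
{
    int cur, target;

    errno = 0;
    cur = getpriority(PRIO_PROCESS, 0);   /* 0 = calling process */
    if (cur == -1 && errno != 0)
        return -100;

    target = cur + delta;
    if (target > 19)
        target = 19;

    if (setpriority(PRIO_PROCESS, 0, target) != 0)
        return -100;

    errno = 0;
    cur = getpriority(PRIO_PROCESS, 0);   /* read back what the kernel set */
    return (cur == -1 && errno != 0) ? -100 : cur;
}
```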

Answer 9

Context Switching: Saving one process state and loading another

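What gets saved or switched:

General-purpose CPU registers
Program counter (PC)
Stack pointer
Page-table base register (e.g. CR3 on x86), switched when the next task uses a different address space
TLB (Translation Lookaside Buffer) entries are not saved; they are flushed or kept apart with address-space tags (PCID/ASID)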

Process:

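Save the outgoing task's register state (on its kernel stack and in its thread_struct)
Call the context_switch() function from the scheduler
Switch address space: switch_mm()
Switch register state and kernel stack: switch_to()
Restore the incoming task's registers
Resume the incoming task at the point where it was last switched out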

Code flow:

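// In scheduler (simplified pseudocode; the real function also handles
// lazy TLB and runqueue locking, and details vary by kernel version)
context_switch(rq, prev, next) {
    struct mm_struct *mm, *oldmm;
    
    mm = next->mm;
    oldmm = prev->active_mm;
    
    if (!mm)  // Kernel thread: borrow the previous address space
        next->active_mm = oldmm;
    else      // User task: switch page tables
        switch_mm(oldmm, mm, next);
    
    switch_to(prev, next, prev);  // Switch registers and kernel stack
}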
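
As a user-space analogy only (this is not how the kernel implements it), the ucontext API performs the same basic job: swapcontext() saves the caller's registers, stack pointer and program counter, then restores another saved set. The sketch below assumes glibc's <ucontext.h>; the names task_body, context_switch_demo and trace are hypothetical. Each step appends a digit to trace so the execution order is visible.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static int trace;   /* execution order recorded as decimal digits */

static void task_body(void)
{
    trace = trace * 10 + 2;
    swapcontext(&task_ctx, &main_ctx);   /* save our state, resume main */
    trace = trace * 10 + 4;
    /* Returning resumes the context named in uc_link (main_ctx). */
}

/* Hypothetical demo: switches between two contexts and returns the trace. */
int context_switch_demo(void)
{
    static char stack[64 * 1024];        /* separate stack for the task */

    trace = 0;
    if (getcontext(&task_ctx) != 0)
        return -1;
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = sizeof(stack);
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    trace = trace * 10 + 1;
    swapcontext(&main_ctx, &task_ctx);   /* run task until it yields */
    trace = trace * 10 + 3;
    swapcontext(&main_ctx, &task_ctx);   /* resume task until it returns */
    trace = trace * 10 + 5;
    return trace;
}
```

The digits come out in the order 1, 2, 3, 4, 5, showing that each context resumes exactly where it was last saved, which is the property switch_to() provides for kernel tasks.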

Answer 10

Top Half (ISR - Interrupt Service Routine): Immediate interrupt handler

Runs immediately when the interrupt occurs
Runs with interrupts disabled on the local CPU
Must be very fast and short
Can't sleep or block
Typically acknowledges the hardware interrupt and grabs any time-critical data
Schedules the bottom half for deferred work

Bottom Half: Deferred processing

Runs later when interrupts are re-enabled
Can perform heavy work
Can use tasklets, softirqs, or workqueues
Can sleep with workqueues

Example:

// Top half - very fast
irqreturn_t interrupt_handler(int irq, void *dev_id) {
    // Acknowledge hardware
    hardware_ack();
    
    // Schedule bottom half
    tasklet_schedule(&my_tasklet);
    
    return IRQ_HANDLED;
}

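// Bottom half - can do more work, but still must not sleep
// (kernels 5.9+ use the signature void handler(struct tasklet_struct *t))
void tasklet_handler(unsigned long data) {
    // Process received data
    // Runs in softirq context with hardware interrupts enabled
}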
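
The division of labour between the two halves can be modelled in plain C. The sketch below is a single-threaded user-space model with hypothetical names (struct ring, top_half, bottom_half), not kernel API: the "top half" only stores a byte in a ring buffer and returns, while the "bottom half" later drains the buffer and does the slower processing. A real driver would additionally need memory barriers or a kernel facility such as kfifo.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u   /* power of two so the index mask works */

struct ring {
    uint8_t  data[RING_SIZE];
    unsigned head;   /* next slot to write (advanced by the top half) */
    unsigned tail;   /* next slot to read (advanced by the bottom half) */
};

/* "Top half": queue one byte and return at once; false if the ring is full. */
bool top_half(struct ring *r, uint8_t byte)
{
    if (r->head - r->tail == RING_SIZE)
        return false;                       /* drop rather than wait */
    r->data[r->head++ & (RING_SIZE - 1)] = byte;
    return true;
}

/* "Bottom half": drain everything queued so far. The byte sum stands in
 * for whatever heavier processing the driver would really do. */
unsigned bottom_half(struct ring *r)
{
    unsigned sum = 0;

    while (r->tail != r->head)
        sum += r->data[r->tail++ & (RING_SIZE - 1)];
    return sum;
}
```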

Answer 11

Tasklets: Simple deferred execution mechanism

Built on top of softirqs
Serialized: a given tasklet never runs on two CPUs at once (hardware interrupts can still preempt it)
Soft interrupt context (can't sleep)
Use: tasklet_schedule(&tasklet)
Runs on the CPU that scheduled it
Good for: moderate work, no sleeping needed

Softirqs: More lightweight than tasklets

The same softirq can run concurrently on different CPUs, so handlers need their own locking or per-CPU data
Very fast
Fixed set defined at compile time (10 vectors in current kernels)
Used by kernel for critical tasks
Examples: NET_RX_SOFTIRQ, TIMER_SOFTIRQ
Good for: high-priority deferred work

Workqueues: Heavy deferred work

Run in process context (not softirq)
Can sleep and block
Can use mutex, call blocking functions
Can be delayed or scheduled
Good for: I/O, long-running work

Comparison table:

Feature      Tasklet       Softirq    Workqueue
Context      Softirq       Softirq    Process
Can sleep    No            No         Yes
Can block    No            No         Yes
Speed        Fast          Fastest    Slower
Use case     Medium work   Critical   I/O, long work

Answer 12

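Memory Zones: Different regions of physical memory (the boundaries below are those of 32-bit x86)

ZONE_DMA: 0-16 MB (for legacy devices limited to 24-bit DMA addresses)
ZONE_DMA32: Below 4 GB (64-bit kernels only, for devices limited to 32-bit DMA addresses)
ZONE_NORMAL: 16 MB - 896 MB (directly mapped kernel memory); on 64-bit, all memory above the DMA zones
ZONE_HIGHMEM: Above 896 MB (not permanently mapped, 32-bit only)
ZONE_MOVABLE: Pages that can be migrated (helps memory hot-unplug and compaction)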

Memory allocation functions:

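alloc_pages(gfp_mask, order) - Allocate 2^order contiguous pages (returns struct page *)
__get_free_pages(gfp_mask, order) - Allocate 2^order contiguous pages (returns kernel virtual address)
kmalloc(size, gfp_mask) - Allocate physically contiguous memory by byte size; best suited to small objects
vmalloc(size) - Allocate virtually contiguous, physically non-contiguous memory
get_zeroed_page(gfp_mask) - Allocate one zero-filled page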

GFP (Get Free Pages) flags:

GFP_KERNEL - Normal allocation, can sleep (for process context)
GFP_ATOMIC - Atomic context, cannot sleep (for ISR/softirq)
GFP_DMA - Allocate from DMA zone
__GFP_HIGHMEM - Can use high memory (32-bit)

Example:

// Allocate 4 contiguous pages (order 2 = 2^2 pages; 16 KB with 4 KB pages)
struct page *pages = alloc_pages(GFP_KERNEL, 2);

// Allocate memory for structure
struct mydata *data = kmalloc(sizeof(*data), GFP_KERNEL);

// In interrupt - must not sleep
unsigned long *buf = kmalloc(1024, GFP_ATOMIC);

// Allocate virtual memory (non-contiguous)
void *vbuf = vmalloc(10000);
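
Because alloc_pages() and __get_free_pages() take an order rather than a byte count, a size has to be rounded up to the next power-of-two number of pages. The kernel provides get_order() for this; the user-space sketch below mirrors that calculation under a hypothetical name (order_for_size) with the page size passed in explicitly.

```c
#include <stddef.h>

/*
 * Hypothetical helper: smallest order such that
 * (1 << order) pages of page_size bytes can hold `size` bytes.
 */
unsigned order_for_size(size_t size, size_t page_size)
{
    size_t pages = (size + page_size - 1) / page_size;   /* round up */
    unsigned order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Note the rounding cost: a request for three pages needs order 2, so one of the four allocated pages goes unused.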

Answer 13

kmalloc: Physically and virtually contiguous memory

Allocates from slab allocator
Physically contiguous pages
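Virtually contiguous (taken from the kernel's direct mapping of physical memory)
Limited by contiguous free memory
Fast allocation
Good for: DMA buffers, small allocations
Max size: KMALLOC_MAX_SIZE, which depends on kernel version and configuration (128 KB on old kernels, several MB on many current ones)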

vmalloc: Virtually contiguous, physically scattered

Allocates from page allocator
Can be non-contiguous physically
Virtually contiguous (via page tables)
Can allocate larger areas
Slower than kmalloc (page table setup)
Good for: large buffers, stack allocation
Cannot use for DMA (physical addresses scattered)

Comparison:

Feature               kmalloc                              vmalloc
Physical contiguity   Yes                                  No
Speed                 Fast                                 Slower
DMA safe              Yes                                  No
Max size              KMALLOC_MAX_SIZE (config-dependent)  Larger
Use                   Small, DMA                           Large buffers

Answer 14

Spinlock: Busy-waiting lock for short critical sections

Task spins in a loop checking lock
Does NOT sleep/block
Disables preemption
Can be used in ISR context
Good for: very short critical sections
Bad for: long operations (wastes CPU)

When to use:

In interrupt handlers
Short critical sections (a few microseconds at most, as a rule of thumb)
When you can't sleep
Protecting data shared between CPUs on SMP systems

Usage:

DEFINE_SPINLOCK(mylock);

// Acquire lock
spin_lock(&mylock);
// Critical section - must be very short!
spin_unlock(&mylock);

// Lock also taken by an interrupt handler: disable local interrupts too
unsigned long flags;
spin_lock_irqsave(&mylock, flags);
// Critical section
spin_unlock_irqrestore(&mylock, flags);

// Nested locks - always lock in same order
spin_lock(&lock1);
spin_lock(&lock2);
// ...
spin_unlock(&lock2);
spin_unlock(&lock1);

Types:

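spin_lock() - Basic spinlock
spin_lock_irq() - Also disables local interrupts (the unlock re-enables them unconditionally)
spin_lock_irqsave() - Disables local interrupts and saves the previous interrupt state for the unlock to restore
spin_lock_bh() - Also disables softirqs (bottom halves) on the local CPU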
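
To show what "busy-waiting" means mechanically, here is a minimal user-space test-and-set spinlock built on C11 <stdatomic.h>. It is a sketch with hypothetical names (struct demo_spinlock and the demo_spin_* functions); the kernel's spinlocks are more elaborate (queued and fair, and they disable preemption), so this only illustrates the core idea.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_spinlock {
    atomic_flag locked;   /* set while some thread holds the lock */
};

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was acquired. */
bool demo_spin_trylock(struct demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin (burning CPU) until the flag is observed clear and set by us. */
void demo_spin_lock(struct demo_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;   /* busy-wait: never sleeps */
}

/* Release the lock so one spinning waiter can proceed. */
void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The empty loop body is the reason critical sections must stay short: every waiter occupies a CPU for as long as the holder keeps the lock.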

Answer 15

Atomic Operations: Operations that other CPUs and interrupt handlers observe as indivisible

Cannot be interrupted by context switch or interrupt
Implemented with atomic CPU instructions (e.g. LOCK-prefixed instructions on x86)
No spinlock needed
Very fast
Limited to simple operations (increment, decrement, set, clear)

Common atomic operations:

atomic_t count;

atomic_set(&count, 5);          // Set to 5
atomic_read(&count);             // Read value
atomic_inc(&count);              // Increment
atomic_dec(&count);              // Decrement
atomic_add(3, &count);           // Add value
atomic_sub(2, &count);           // Subtract
atomic_inc_and_test(&count);     // Inc, test if zero
atomic_dec_and_test(&count);     // Dec, test if zero
atomic_xchg(&count, 10);         // Exchange
atomic_cmpxchg(&count, 5, 10);   // Compare and exchange

When to use:

Simple counters
Reference counting
Flags that need atomic updates
Performance-critical code with simple operations

Example - reference counting (current kernels usually use refcount_t or struct kref for this, which add overflow checks):

struct myobj {
    atomic_t refcount;
};

// Get reference
void get_obj(struct myobj *obj) {
    atomic_inc(&obj->refcount);
}

// Release reference
void put_obj(struct myobj *obj) {
    if (atomic_dec_and_test(&obj->refcount))
        kfree(obj);  // Free when count reaches 0
}
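
The same get/put pattern can be tried in user space with C11 atomics. The sketch below is an analogue, not the kernel API: demo_obj and its functions are hypothetical names, malloc()/free() stand in for kmalloc()/kfree(), and atomic_fetch_sub() returning the previous value 1 plays the role of atomic_dec_and_test().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_obj {
    atomic_int refcount;
    int payload;
};

/* Create an object holding one reference (the caller's). */
struct demo_obj *demo_obj_new(int payload)
{
    struct demo_obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    atomic_init(&o->refcount, 1);
    o->payload = payload;
    return o;
}

/* Take an additional reference. */
void demo_obj_get(struct demo_obj *o)
{
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/* Drop a reference; frees the object and returns true on the last put. */
bool demo_obj_put(struct demo_obj *o)
{
    if (atomic_fetch_sub_explicit(&o->refcount, 1,
                                  memory_order_acq_rel) == 1) {
        free(o);
        return true;
    }
    return false;
}
```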

Answer 16

Character Device Driver Components:

Module Init Function: Register driver with kernel
Module Exit Function: Unregister driver
File Operations (fops): open, read, write, unlocked_ioctl, release
cdev Structure: Kernel representation of char device
Device Number: Major and minor numbers
Class/Device Creation: For udev to create /dev nodes

Complete example:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/uaccess.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("Simple character device");

static dev_t devno;  // Device number
static struct cdev my_cdev;  // cdev structure
static struct class *my_class;  // Device class

// File operations
static int my_open(struct inode *i, struct file *f) {
    pr_info("Device opened\n");
    return 0;
}

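static ssize_t my_read(struct file *f, char __user *buf, 
                       size_t len, loff_t *off) {
    static const char data[] = "Hello";

    // Honors len and *off, returns 0 at end of data (so cat terminates),
    // and returns -EFAULT if the user pointer is bad
    return simple_read_from_buffer(buf, len, off, data, sizeof(data));
}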

static int my_release(struct inode *i, struct file *f) {
    pr_info("Device closed\n");
    return 0;
}

static const struct file_operations my_fops = {
    .owner = THIS_MODULE,
    .open = my_open,
    .read = my_read,
    .release = my_release,
};

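// Module initialization
static int __init my_init(void) {
    struct device *dev;
    int ret;

    // Allocate device number
    ret = alloc_chrdev_region(&devno, 0, 1, "mydev");
    if (ret)
        return ret;

    // Initialize and add cdev
    cdev_init(&my_cdev, &my_fops);
    ret = cdev_add(&my_cdev, devno, 1);
    if (ret)
        goto err_region;

    // Create device class (kernels 6.4+ take only the name argument)
    my_class = class_create(THIS_MODULE, "mydev_class");
    if (IS_ERR(my_class)) {
        ret = PTR_ERR(my_class);
        goto err_cdev;
    }

    // Create the device so udev makes /dev/mydev
    dev = device_create(my_class, NULL, devno, NULL, "mydev");
    if (IS_ERR(dev)) {
        ret = PTR_ERR(dev);
        goto err_class;
    }

    pr_info("Device created\n");
    return 0;

    // Undo completed steps in reverse order
err_class:
    class_destroy(my_class);
err_cdev:
    cdev_del(&my_cdev);
err_region:
    unregister_chrdev_region(devno, 1);
    return ret;
}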

// Module cleanup
static void __exit my_exit(void) {
    device_destroy(my_class, devno);
    class_destroy(my_class);
    cdev_del(&my_cdev);
    unregister_chrdev_region(devno, 1);
    
    pr_info("Device destroyed\n");
}

module_init(my_init);
module_exit(my_exit);

Answer 17

IOCTL (Input/Output Control): Device-specific commands beyond read/write

Used for configuration, status, control operations
Passed from user space via ioctl() syscall
Each device driver defines its own ioctl commands
Direction: none, read, write, or read/write

IOCTL command format:

_IOC(direction, type, nr, size)

Macros:
_IO(type, nr)               // No argument
_IOR(type, nr, datatype)    // Driver returns data to user space
_IOW(type, nr, datatype)    // User space passes data to the driver
_IOWR(type, nr, datatype)   // Data flows both ways

Example:
#define IOCTL_SET_VALUE    _IOW('k', 1, int)
#define IOCTL_GET_VALUE    _IOR('k', 2, int)
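
An ioctl command number is just these four fields packed into one integer, and the same header provides macros to unpack them. The sketch below assumes a Linux system with the kernel's user-space headers installed (<linux/ioctl.h>); struct ioctl_fields and decode_ioctl are hypothetical names used only for illustration.

```c
#include <linux/ioctl.h>

#define IOCTL_SET_VALUE    _IOW('k', 1, int)
#define IOCTL_GET_VALUE    _IOR('k', 2, int)

struct ioctl_fields {
    unsigned dir;    /* _IOC_NONE, _IOC_READ, _IOC_WRITE, or both */
    unsigned type;   /* "magic" character identifying the driver */
    unsigned nr;     /* command number within that driver */
    unsigned size;   /* size of the argument type in bytes */
};

/* Hypothetical helper: split a command number into its four fields. */
struct ioctl_fields decode_ioctl(unsigned int cmd)
{
    struct ioctl_fields f = {
        .dir  = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr   = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return f;
}
```

Decoding IOCTL_SET_VALUE yields direction _IOC_WRITE, type 'k', number 1 and size sizeof(int). A driver can use the same macros to validate a command before acting on it.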

User-space calls and driver implementation:

// User space
int val = 42;
ioctl(fd, IOCTL_SET_VALUE, &val);
ioctl(fd, IOCTL_GET_VALUE, &val);

// Kernel space
static long my_ioctl(struct file *f, unsigned int cmd, 
                     unsigned long arg) {
    int value;
    
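    switch(cmd) {
        case IOCTL_SET_VALUE:
            // copy_from_user() returns the number of bytes NOT copied
            if (copy_from_user(&value, (int __user *)arg, sizeof(value)))
                return -EFAULT;
            pr_info("Set value: %d\n", value);
            break;
            
        case IOCTL_GET_VALUE:
            value = 100;
            if (copy_to_user((int __user *)arg, &value, sizeof(value)))
                return -EFAULT;
            break;
            
        default:
            return -ENOTTY;  // Conventional error for an unknown ioctl command
    }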
    
    return 0;
}

static const struct file_operations my_fops = {
    .unlocked_ioctl = my_ioctl,
};

1. Process Management

2. Process Scheduling

3. Interrupt Handling

4. Memory Management

5. Kernel Synchronization

6. Device Driver Basics