System design is the process of deciding how a system operates and functions. Within system design, job scheduling plays a vital role in managing and executing tasks efficiently.
This article explores the intricacies of system and job scheduling design, including high-level design elements like database design, API design, and system architecture.
What is system design?
System design is the process of defining the architecture, components, interfaces, and behavior of a system to meet specific requirements. This process involves making decisions about the organization and structure of the system, including hardware, software, networks, databases, and user interfaces.
The goal of system design is to create a blueprint to guide the development and implementation of the system and ensure it meets desired functionality and performance objectives. Scalability, reliability, security, performance, and maintainability are all important aspects that drive the design process. Designing a job scheduling tool is a common system design interview question.
Designing a Distributed Job Scheduler
When designing a distributed job scheduler, the system requirements are related to the creation, scheduling, and execution of jobs. As a baseline, a series of both functional and non-functional requirements should guide the design process.
Functional Requirements
- Jobs can be created to run at specific times or intervals.
- Jobs can be updated and deleted.
- The priority of a job can be specified to resolve conflicts for jobs with the same start time.
- Jobs can run asynchronously.
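One way to picture the priority rule in the list above is a min-heap keyed first on start time and then on priority. The sketch below is illustrative only: the tuple layout, the job names, and the convention that a lower number means higher priority are assumptions, not part of any particular scheduler.

```python
import heapq
from typing import List, Tuple

# Each job is (start_time, priority, name). Lower priority number runs first
# (an assumed convention for this sketch).
Job = Tuple[int, int, str]


def execution_order(jobs: List[Job]) -> List[str]:
    """Return job names ordered by start time, with priority breaking ties."""
    heap = list(jobs)
    heapq.heapify(heap)  # tuples compare element by element
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    jobs = [(900, 2, "report"), (900, 1, "billing"), (800, 5, "cleanup")]
    print(execution_order(jobs))
```

Here "billing" and "report" share a start time, so the priority value decides which one is dequeued first.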
Non-functional Requirements
- Durability: Jobs and related status information will not get lost due to system failures.
- Reliability: Jobs should either run to completion or have their status updated with details of what went wrong. Jobs should run at their specified times or intervals.
- Availability: It is always possible to create new jobs, update existing jobs, and query job status.
- Scalability: Jobs can be added without compromising the reliability or availability of the job scheduling system.
Capacity requirements related to traffic, CPU, memory, and throughput should also be defined. These set the number of jobs that can be created per day, the number of jobs that can run at the same time, the maximum length of a job, and storage size requirements. For example, if a job can run for up to five minutes, compute capacity (CPU) is likely to be the limiting resource for the distributed system.
High-Level Design of a Task Scheduler
In addition to defining the requirements of the system, the design of other foundational elements should be thought through. The high-level design of a job scheduling system involves database design, API design, task scheduler and task runner design, and system architecture.
Database Design
In the database design of a distributed task scheduler, the access pattern of the application determines the schema. The access pattern consists of the read and write requests that hit the database.
Examples of read requests include retrieving all jobs associated with a given user ID, retrieving the execution history for a job ID, and finding all jobs running at a given point in time. Examples of write requests include creating new jobs or deleting existing ones, adding job execution history to the database, and updating the execution time of a job after the job is complete.
As part of the database design, a schema can define a job table that keeps track of job metadata such as owners, execution time intervals, and retry policies. In this use case, the User ID can be used as the partition key and the Job ID as the sort key.
If complex queries are not needed, either a sharded SQL database or a NoSQL database such as Cassandra can handle the system requirements. When a distributed queue is combined with MySQL, each submitted job can be pushed to a Kafka topic that acts as the queue. MySQL is beneficial because it offers ACID transactions and row-level locking (with the InnoDB storage engine).
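The partition key and sort key layout described above can be pictured with a small in-memory stand-in for the job table. This is a minimal sketch, not a real database client: the `JobRecord` fields and the `JobTable` class are hypothetical names chosen to mirror the metadata mentioned in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    user_id: str           # partition key: groups all jobs owned by one user
    job_id: str            # sort key: orders jobs inside a partition
    interval_seconds: int  # execution time interval
    max_retries: int       # retry policy
    next_run_at: int       # next execution time, epoch seconds


class JobTable:
    """In-memory stand-in for a table partitioned by user ID."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, JobRecord]] = defaultdict(dict)

    def put(self, job: JobRecord) -> None:
        self._partitions[job.user_id][job.job_id] = job

    def delete(self, user_id: str, job_id: str) -> None:
        self._partitions.get(user_id, {}).pop(job_id, None)

    def jobs_for_user(self, user_id: str) -> List[JobRecord]:
        """Read a single partition, returned in sort-key order."""
        partition = self._partitions.get(user_id, {})
        return [partition[key] for key in sorted(partition)]
```

With this layout, "retrieve all jobs for a user" touches a single partition, which is the access pattern the schema is meant to serve.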
API Design
Running the API of a distributed system on a single machine is not advised because it creates a single point of failure. Running multiple replicas of the API server requires a load balancer in front of them. The load balancer can use a simple algorithm such as round-robin, which hands each new request to the next server in turn, to spread requests evenly across the replicas.
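Round-robin selection can be sketched in a few lines. The class and server names below are hypothetical, and a production load balancer would also track server health, which this sketch leaves out.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out API servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        # Copy the list so later changes by the caller do not affect rotation.
        self._servers: Iterator[str] = cycle(list(servers))

    def next_server(self) -> str:
        return next(self._servers)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])
    for _ in range(5):
        print(balancer.next_server())
```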
Task Scheduler and Task Runner Design
A well-designed job scheduler separates the task scheduler and the task runner into distinct components so that each can be scaled independently.
The task scheduler should periodically query the database for all jobs due at a specific timestamp. All due jobs should then be enqueued to a distributed message queue in first-in-first-out order.
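The scheduler step can be sketched as a single polling function. This is a simplified illustration under stated assumptions: a Python list stands in for the database query, a `deque` stands in for the distributed message queue, the field names are hypothetical, and jobs due at or before the timestamp are selected so that a late poll does not skip work.

```python
from collections import deque
from typing import Deque, Dict, List


def enqueue_due_jobs(jobs: List[Dict], now: int, queue: Deque[Dict]) -> int:
    """Append a message for every job due at or before `now`; return the count."""
    due = [job for job in jobs if job["next_run_at"] <= now]
    # Earliest due time first, so consumers see jobs in first-in-first-out order.
    due.sort(key=lambda job: job["next_run_at"])
    for job in due:
        queue.append({"job_id": job["job_id"], "status_url": job["status_url"]})
    return len(due)


if __name__ == "__main__":
    jobs = [
        {"job_id": "a", "next_run_at": 100, "status_url": "/jobs/a/status"},
        {"job_id": "b", "next_run_at": 300, "status_url": "/jobs/b/status"},
    ]
    queue: Deque[Dict] = deque()
    print(enqueue_due_jobs(jobs, now=120, queue=queue), list(queue))
```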
The task runner should fetch messages from the message queue. Each message contains the job ID and the URL used to update the job status in the database.
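A task runner loop along these lines might look like the sketch below. The `execute` callback, the status strings, and the `statuses` dictionary (which stands in for an HTTP call to the status URL) are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict


def run_pending(
    queue: Deque[Dict],
    execute: Callable[[str], None],
    statuses: Dict[str, str],
) -> None:
    """Drain the queue in FIFO order and record one status per message."""
    while queue:
        message = queue.popleft()
        try:
            execute(message["job_id"])
            statuses[message["status_url"]] = "SUCCEEDED"
        except Exception as exc:
            # Record the failure instead of losing it, then keep draining.
            statuses[message["status_url"]] = f"FAILED: {exc}"


if __name__ == "__main__":
    pending = deque([{"job_id": "a", "status_url": "/jobs/a/status"}])
    results: Dict[str, str] = {}
    run_pending(pending, lambda job_id: None, results)
    print(results)
```

Catching the exception per message keeps one failing job from blocking the jobs queued behind it.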
System Architecture
Once the database schema and API calls are in place, the remaining design work concerns the services themselves: the web service, the scheduling service, and the execution service.
As a further optimization, a caching layer can be added using an in-memory cache or Redis. A daemon that refreshes the cache at regular intervals reduces the load on the database.
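The caching idea can be illustrated with a small in-process cache that is refreshed by a periodic task. This is a sketch under assumptions: `JobCache` and `load_from_db` are hypothetical names, the refresh timer itself is omitted, and a Redis-backed version would replace the dictionary with calls to a Redis client.

```python
from typing import Callable, Dict, List, Optional


class JobCache:
    """Serves job lookups from memory; the database is read only on refresh."""

    def __init__(self, load_from_db: Callable[[], List[Dict]]) -> None:
        self._load_from_db = load_from_db
        self._jobs_by_id: Dict[str, Dict] = {}

    def refresh(self) -> None:
        """Reload the snapshot; a daemon would call this on a timer."""
        self._jobs_by_id = {job["job_id"]: job for job in self._load_from_db()}

    def get(self, job_id: str) -> Optional[Dict]:
        return self._jobs_by_id.get(job_id)


if __name__ == "__main__":
    cache = JobCache(lambda: [{"job_id": "a", "status": "RUNNING"}])
    cache.refresh()
    print(cache.get("a"))
```

Every `get` between refreshes is answered from memory, so the number of database reads depends on the refresh interval rather than on request volume.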
ActiveBatch Job Scheduling Tool
Instead of designing a distributed job scheduling tool from scratch, teams can use ActiveBatch’s cross-platform software to automate and orchestrate diverse systems and processes. Minimize manual effort with event triggers that handle daily tasks like file transfers, data modifications, and email notifications. ActiveBatch supports integrations with cloud vendors like Amazon (AWS) EC2, Microsoft Azure, and more. Workflow orchestration and optimization capabilities enable teams to automate, integrate, and monitor multiple technologies from a single point of control.
Frequently Asked Questions
In job scheduling, it’s common to use an algorithm to determine the order in which jobs are executed based on specified criteria. The three most common methods for job scheduling are:
1. First-Come, First-Served (FCFS): Jobs are scheduled and executed in the order they arrive. The job scheduler maintains a queue of incoming jobs, and the first job in the queue is executed first. FCFS is a non-preemptive scheduling algorithm, meaning once a job starts executing, it continues until completion, even if a higher priority job becomes available.
2. Shortest Job First (SJF): Jobs are scheduled based on estimated execution time. The scheduler prioritizes jobs with the shortest expected execution time and executes them first. SJF can be either non-preemptive or preemptive. Non-preemptive SJF scheduling completes a job once it starts, while preemptive SJF scheduling can interrupt an ongoing job if a shorter job becomes available.
3. Priority: In this job scheduling method, priorities are assigned to each job and then jobs are scheduled based on those priority levels. Priority scheduling can be either non-preemptive or preemptive.
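The three methods above differ only in the key used to order the queue, which a short sketch can make concrete. The example covers the non-preemptive variants and assumes every job is already waiting when the order is chosen. The `Job` fields, the sample values, and the convention that a lower number means higher priority are assumptions for illustration.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    arrival: int    # order of submission
    duration: int   # estimated execution time
    priority: int   # lower number = higher priority (assumed convention)


def fcfs(jobs: List[Job]) -> List[str]:
    """First-Come, First-Served: order by arrival."""
    return [j.name for j in sorted(jobs, key=lambda j: j.arrival)]


def sjf(jobs: List[Job]) -> List[str]:
    """Shortest Job First: order by estimated duration, then arrival."""
    return [j.name for j in sorted(jobs, key=lambda j: (j.duration, j.arrival))]


def by_priority(jobs: List[Job]) -> List[str]:
    """Priority scheduling: order by priority, then arrival."""
    return [j.name for j in sorted(jobs, key=lambda j: (j.priority, j.arrival))]


if __name__ == "__main__":
    jobs = [Job("report", 0, 30, 3), Job("backup", 1, 120, 1), Job("alert", 2, 5, 2)]
    print(fcfs(jobs), sjf(jobs), by_priority(jobs))
```

A preemptive variant would re-evaluate the ordering whenever a new job arrives, which this sketch does not model.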
Compare ActiveBatch with other job scheduling methods, like cron jobs, and see why teams prefer our modern task scheduling tools.
Batch processing focuses on executing a collection of jobs in a non-interactive manner, while interactive processing emphasizes real-time or near real-time user interaction with the job scheduling system to initiate and execute tasks on demand.
With batch processing, jobs are usually submitted in advance and processed in a sequential or parallel manner without user interaction. The input and output data is predefined and stored in files or databases. Batch processing can be particularly useful for large-scale data processing, running overnight backups, generating reports, and performing complex calculations.
With interactive processing, users provide input and receive immediate feedback from the job scheduling system, and jobs are initiated and executed on demand based on real-time user input or requests. The output data is displayed directly to the user during the session. Interactive processing can be a good method for interactive applications, user interfaces, and other online solutions that require immediate user interaction and feedback, like a chat bot or messaging platform.
See how teams manage critical business and IT jobs with ActiveBatch’s batch scheduler.