LeetCode 1661 - Average Time of Process per Machine
The problem provides a database table named Activity, which stores information about processes executed on different machines.
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
The problem provides a database table named Activity, which stores information about processes executed on different machines. Each process has exactly two records:
- A
"start"record, indicating when the process began - An
"end"record, indicating when the process finished
Each row contains four columns:
| Column | Meaning |
|---|---|
machine_id |
The machine executing the process |
process_id |
The specific process running on that machine |
activity_type |
Either "start" or "end" |
timestamp |
The time when the activity occurred |
The combination (machine_id, process_id, activity_type) is unique, which means a process on a machine can only have one start event and one end event.
The goal is to compute the average processing time for each machine. The processing time for a single process is:
end_timestamp - start_timestamp
For every machine, we sum the durations of all its processes and divide by the number of processes executed on that machine.
The final output should contain:
| Column | Meaning |
|---|---|
machine_id |
The machine identifier |
processing_time |
Average execution time rounded to 3 decimal places |
The result can be returned in any order.
A key observation is that every process is guaranteed to have exactly one "start" row and one "end" row. This guarantee eliminates many difficult edge cases such as missing timestamps or duplicate events.
The constraints also imply that all machines run the same number of processes, although the solution does not actually depend on this property. We only need to correctly pair each start and end event.
Important edge cases include:
- Machines with only one process
- Processes whose timestamps are very close together
- Floating-point precision when computing averages
- Multiple machines interleaved in arbitrary row order
A naive implementation could fail if it assumes rows are sorted or if it incorrectly matches start and end events between different processes.
Approaches
Brute Force Approach
The brute-force idea is to process every "start" row and search the entire table again to find its matching "end" row.
For each row:
- If the row is a
"start"event:
- Scan all rows again
- Find the row with the same
machine_idandprocess_id - Ensure the matching row has
activity_type = 'end'
- Compute the duration
- Accumulate totals per machine
- Compute averages at the end
This approach is correct because every process has exactly one matching end event. However, it is inefficient because each lookup requires scanning the entire table.
If there are n rows, and we scan all n rows for each process, the total complexity becomes quadratic.
Optimal Approach
The key insight is that we can pair "start" and "end" rows efficiently using a self join.
Since each process is uniquely identified by:
(machine_id, process_id)
we can directly join the table with itself:
- One alias represents the
"start"row - The other alias represents the
"end"row
Then we compute:
end.timestamp - start.timestamp
for every process.
After obtaining all process durations, we group by machine_id and compute the average.
This avoids repeated scanning and leverages relational database operations efficiently.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Repeatedly scans the table to match start and end rows |
| Optimal | O(n) | O(1) | Uses self join and aggregation efficiently |
Algorithm Walkthrough
- Create two aliases of the
Activitytable.
One alias represents start events and the other represents end events. This allows us to compare rows from the same table.
2. Join the two aliases on machine_id and process_id.
These two columns uniquely identify a process running on a machine. Matching on both guarantees that the correct start and end rows are paired together. 3. Filter the joined rows.
Ensure that:
- The first alias has
activity_type = 'start' - The second alias has
activity_type = 'end'
This guarantees that we compute a valid process duration. 4. Compute the processing time for each process.
For every matched pair:
processing_time = end.timestamp - start.timestamp
- Group results by
machine_id.
Since the problem asks for the average processing time per machine, we aggregate all process durations belonging to the same machine. 6. Compute the average duration.
Use SQL's AVG() aggregation function to calculate the average processing time.
7. Round the result to three decimal places.
The problem explicitly requires the output to be rounded.
Why it works
The algorithm works because every process is guaranteed to have exactly one start event and one end event. By joining rows using (machine_id, process_id), we correctly pair each process's timestamps. Computing the difference gives the exact execution duration, and averaging these durations per machine produces the required result.
Python Solution
# Write your MySQL query statement below
SELECT
start_activity.machine_id,
ROUND(
AVG(end_activity.timestamp - start_activity.timestamp),
3
) AS processing_time
FROM Activity AS start_activity
JOIN Activity AS end_activity
ON start_activity.machine_id = end_activity.machine_id
AND start_activity.process_id = end_activity.process_id
WHERE start_activity.activity_type = 'start'
AND end_activity.activity_type = 'end'
GROUP BY start_activity.machine_id;
The query begins by creating two aliases of the Activity table:
start_activityend_activity
These aliases allow the table to be joined with itself.
The JOIN condition matches rows with the same:
machine_idprocess_id
This ensures we are pairing events belonging to the same process execution.
The WHERE clause separates start rows from end rows. Once matched, the duration is computed using:
end_activity.timestamp - start_activity.timestamp
The AVG() function computes the mean duration for each machine, and ROUND(..., 3) formats the result exactly as required.
Finally, the query groups all process durations by machine.
Go Solution
// Write your MySQL query statement below
SELECT
start_activity.machine_id,
ROUND(
AVG(end_activity.timestamp - start_activity.timestamp),
3
) AS processing_time
FROM Activity AS start_activity
JOIN Activity AS end_activity
ON start_activity.machine_id = end_activity.machine_id
AND start_activity.process_id = end_activity.process_id
WHERE start_activity.activity_type = 'start'
AND end_activity.activity_type = 'end'
GROUP BY start_activity.machine_id;
Since this is a database problem, the submission is SQL rather than an actual Go implementation. The same query works regardless of the surrounding language environment on LeetCode.
Worked Examples
Example 1
Input table:
| machine_id | process_id | activity_type | timestamp |
|---|---|---|---|
| 0 | 0 | start | 0.712 |
| 0 | 0 | end | 1.520 |
| 0 | 1 | start | 3.140 |
| 0 | 1 | end | 4.120 |
| 1 | 0 | start | 0.550 |
| 1 | 0 | end | 1.550 |
| 1 | 1 | start | 0.430 |
| 1 | 1 | end | 1.420 |
| 2 | 0 | start | 4.100 |
| 2 | 0 | end | 4.512 |
| 2 | 1 | start | 2.500 |
| 2 | 1 | end | 5.000 |
Step 1: Match start and end rows
| machine_id | process_id | start | end | duration |
|---|---|---|---|---|
| 0 | 0 | 0.712 | 1.520 | 0.808 |
| 0 | 1 | 3.140 | 4.120 | 0.980 |
| 1 | 0 | 0.550 | 1.550 | 1.000 |
| 1 | 1 | 0.430 | 1.420 | 0.990 |
| 2 | 0 | 4.100 | 4.512 | 0.412 |
| 2 | 1 | 2.500 | 5.000 | 2.500 |
Step 2: Group by machine and compute averages
Machine 0
(0.808 + 0.980) / 2 = 0.894
Machine 1
(1.000 + 0.990) / 2 = 0.995
Machine 2
(0.412 + 2.500) / 2 = 1.456
Final Output
| machine_id | processing_time |
|---|---|
| 0 | 0.894 |
| 1 | 0.995 |
| 2 | 1.456 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each row participates in the join and aggregation once |
| Space | O(1) | No extra auxiliary data structures are required |
The self join operates efficiently because rows are matched directly using indexed keys. Each process contributes exactly one joined pair, so the overall work scales linearly with the number of rows.
Test Cases
# Example case from the problem statement
# Validates standard multi-machine averaging logic
"""
Input:
Machine 0:
(1.520 - 0.712) = 0.808
(4.120 - 3.140) = 0.980
Average = 0.894
Machine 1:
(1.550 - 0.550) = 1.000
(1.420 - 0.430) = 0.990
Average = 0.995
"""
# Single machine with one process
# Tests smallest valid input
"""
Input:
machine_id = 0
start = 1.000
end = 3.000
Expected average:
2.000
"""
# Multiple processes with decimal timestamps
# Tests floating-point arithmetic and rounding
"""
Input:
machine_id = 1
Process 0:
5.123 - 1.111 = 4.012
Process 1:
7.500 - 4.000 = 3.500
Average:
(4.012 + 3.500) / 2 = 3.756
"""
# Interleaved row order
# Tests that solution does not depend on sorting
"""
Input rows appear in random order:
end rows may appear before unrelated start rows
Matching must still work correctly
"""
# Multiple machines
# Tests independent aggregation per machine
"""
Machine 0 average:
1.000
Machine 1 average:
2.500
Machine 2 average:
0.750
"""
| Test | Why |
|---|---|
| Problem example | Validates the core logic |
| Single process | Tests minimum valid dataset |
| Decimal timestamps | Verifies floating-point precision |
| Interleaved rows | Ensures ordering assumptions are unnecessary |
| Multiple machines | Confirms independent grouping logic |
Edge Cases
Machine With Only One Process
A machine may execute exactly one process. In this case, the average processing time is simply the duration of that single process. Some incorrect solutions may accidentally divide by the wrong count or assume multiple rows per machine. This implementation handles the case naturally because AVG() works correctly even with a single value.
Rows Appearing in Arbitrary Order
The input table is not guaranteed to be sorted. A naive implementation might incorrectly assume that a start row is immediately followed by its matching end row. The self join approach avoids this issue entirely because rows are matched using identifiers rather than positional ordering.
Floating-Point Precision
Timestamps are stored as floating-point values, so the computed durations may contain decimal fractions. Without proper rounding, the output may fail expected formatting requirements. Using ROUND(..., 3) ensures the final result matches the specification exactly.
Multiple Machines Running Concurrently
Different machines may execute processes simultaneously, and process IDs may repeat across machines. Matching only on process_id would create incorrect pairings. The solution correctly joins on both machine_id and process_id, ensuring each process is paired with its own timestamps only.