LeetCode 1661 - Average Time of Process per Machine

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem provides a database table named Activity, which stores information about processes executed on different machines. Each process has exactly two records:

A "start" record, indicating when the process began
An "end" record, indicating when the process finished

Each row contains four columns:

Column	Meaning
`machine_id`	The machine executing the process
`process_id`	The specific process running on that machine
`activity_type`	Either `"start"` or `"end"`
`timestamp`	The time when the activity occurred

The combination (machine_id, process_id, activity_type) is unique, which means a process on a machine can only have one start event and one end event.

The goal is to compute the average processing time for each machine. The processing time for a single process is:

end_timestamp - start_timestamp

For every machine, we sum the durations of all its processes and divide by the number of processes executed on that machine.

The final output should contain:

Column	Meaning
`machine_id`	The machine identifier
`processing_time`	Average execution time rounded to 3 decimal places

The result can be returned in any order.

A key observation is that every process is guaranteed to have exactly one "start" row and one "end" row. This guarantee eliminates many difficult edge cases such as missing timestamps or duplicate events.

The constraints also imply that all machines run the same number of processes, although the solution does not actually depend on this property. We only need to correctly pair each start and end event.

Important edge cases include:

Machines with only one process
Processes whose timestamps are very close together
Floating-point precision when computing averages
Multiple machines interleaved in arbitrary row order

A naive implementation could fail if it assumes rows are sorted or if it incorrectly matches start and end events between different processes.

Approaches

Brute Force Approach

The brute-force idea is to process every "start" row and search the entire table again to find its matching "end" row.

For each row:

If the row is a "start" event:

Scan all rows again
Find the row with the same machine_id and process_id
Ensure the matching row has activity_type = 'end'

Compute the duration
Accumulate totals per machine
Compute averages at the end

This approach is correct because every process has exactly one matching end event. However, it is inefficient because each lookup requires scanning the entire table.

If there are n rows, and we scan all n rows for each process, the total complexity becomes quadratic.

Optimal Approach

The key insight is that we can pair "start" and "end" rows efficiently using a self join.

Since each process is uniquely identified by:

(machine_id, process_id)

we can directly join the table with itself:

One alias represents the "start" row
The other alias represents the "end" row

Then we compute:

end.timestamp - start.timestamp

for every process.

After obtaining all process durations, we group by machine_id and compute the average.

This avoids repeated scanning and leverages relational database operations efficiently.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(1)	Repeatedly scans the table to match start and end rows
Optimal	O(n)	O(1)	Uses self join and aggregation efficiently

Algorithm Walkthrough

Create two aliases of the Activity table.

One alias represents start events and the other represents end events. This allows us to compare rows from the same table. 2. Join the two aliases on machine_id and process_id.

These two columns uniquely identify a process running on a machine. Matching on both guarantees that the correct start and end rows are paired together. 3. Filter the joined rows.

Ensure that:

The first alias has activity_type = 'start'
The second alias has activity_type = 'end'

This guarantees that we compute a valid process duration. 4. Compute the processing time for each process.

For every matched pair:

processing_time = end.timestamp - start.timestamp

Group results by machine_id.

Since the problem asks for the average processing time per machine, we aggregate all process durations belonging to the same machine. 6. Compute the average duration.

Use SQL's AVG() aggregation function to calculate the average processing time. 7. Round the result to three decimal places.

The problem explicitly requires the output to be rounded.

Why it works

The algorithm works because every process is guaranteed to have exactly one start event and one end event. By joining rows using (machine_id, process_id), we correctly pair each process's timestamps. Computing the difference gives the exact execution duration, and averaging these durations per machine produces the required result.

Python Solution

# Write your MySQL query statement below

SELECT
    start_activity.machine_id,
    ROUND(
        AVG(end_activity.timestamp - start_activity.timestamp),
        3
    ) AS processing_time
FROM Activity AS start_activity
JOIN Activity AS end_activity
    ON start_activity.machine_id = end_activity.machine_id
    AND start_activity.process_id = end_activity.process_id
WHERE start_activity.activity_type = 'start'
    AND end_activity.activity_type = 'end'
GROUP BY start_activity.machine_id;

The query begins by creating two aliases of the Activity table:

start_activity
end_activity

These aliases allow the table to be joined with itself.

The JOIN condition matches rows with the same:

machine_id
process_id

This ensures we are pairing events belonging to the same process execution.

The WHERE clause separates start rows from end rows. Once matched, the duration is computed using:

end_activity.timestamp - start_activity.timestamp

The AVG() function computes the mean duration for each machine, and ROUND(..., 3) formats the result exactly as required.

Finally, the query groups all process durations by machine.

Go Solution

// Write your MySQL query statement below

SELECT
    start_activity.machine_id,
    ROUND(
        AVG(end_activity.timestamp - start_activity.timestamp),
        3
    ) AS processing_time
FROM Activity AS start_activity
JOIN Activity AS end_activity
    ON start_activity.machine_id = end_activity.machine_id
    AND start_activity.process_id = end_activity.process_id
WHERE start_activity.activity_type = 'start'
    AND end_activity.activity_type = 'end'
GROUP BY start_activity.machine_id;

Since this is a database problem, the submission is SQL rather than an actual Go implementation. The same query works regardless of the surrounding language environment on LeetCode.

Worked Examples

Example 1

Input table:

machine_id	process_id	activity_type	timestamp
0	0	start	0.712
0	0	end	1.520
0	1	start	3.140
0	1	end	4.120
1	0	start	0.550
1	0	end	1.550
1	1	start	0.430
1	1	end	1.420
2	0	start	4.100
2	0	end	4.512
2	1	start	2.500
2	1	end	5.000

Step 1: Match start and end rows

machine_id	process_id	start	end	duration
0	0	0.712	1.520	0.808
0	1	3.140	4.120	0.980
1	0	0.550	1.550	1.000
1	1	0.430	1.420	0.990
2	0	4.100	4.512	0.412
2	1	2.500	5.000	2.500

Step 2: Group by machine and compute averages

Machine 0

(0.808 + 0.980) / 2 = 0.894

Machine 1

(1.000 + 0.990) / 2 = 0.995

Machine 2

(0.412 + 2.500) / 2 = 1.456

Final Output

machine_id	processing_time
0	0.894
1	0.995
2	1.456

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each row participates in the join and aggregation once
Space	O(1)	No extra auxiliary data structures are required

The self join operates efficiently because rows are matched directly using indexed keys. Each process contributes exactly one joined pair, so the overall work scales linearly with the number of rows.

Test Cases

# Example case from the problem statement
# Validates standard multi-machine averaging logic

"""
Input:
Machine 0:
(1.520 - 0.712) = 0.808
(4.120 - 3.140) = 0.980
Average = 0.894

Machine 1:
(1.550 - 0.550) = 1.000
(1.420 - 0.430) = 0.990
Average = 0.995
"""

# Single machine with one process
# Tests smallest valid input

"""
Input:
machine_id = 0
start = 1.000
end = 3.000

Expected average:
2.000
"""

# Multiple processes with decimal timestamps
# Tests floating-point arithmetic and rounding

"""
Input:
machine_id = 1

Process 0:
5.123 - 1.111 = 4.012

Process 1:
7.500 - 4.000 = 3.500

Average:
(4.012 + 3.500) / 2 = 3.756
"""

# Interleaved row order
# Tests that solution does not depend on sorting

"""
Input rows appear in random order:
end rows may appear before unrelated start rows
Matching must still work correctly
"""

# Multiple machines
# Tests independent aggregation per machine

"""
Machine 0 average:
1.000

Machine 1 average:
2.500

Machine 2 average:
0.750
"""

Test	Why
Problem example	Validates the core logic
Single process	Tests minimum valid dataset
Decimal timestamps	Verifies floating-point precision
Interleaved rows	Ensures ordering assumptions are unnecessary
Multiple machines	Confirms independent grouping logic

Edge Cases

Machine With Only One Process

A machine may execute exactly one process. In this case, the average processing time is simply the duration of that single process. Some incorrect solutions may accidentally divide by the wrong count or assume multiple rows per machine. This implementation handles the case naturally because AVG() works correctly even with a single value.

Rows Appearing in Arbitrary Order

The input table is not guaranteed to be sorted. A naive implementation might incorrectly assume that a start row is immediately followed by its matching end row. The self join approach avoids this issue entirely because rows are matched using identifiers rather than positional ordering.

Floating-Point Precision

Timestamps are stored as floating-point values, so the computed durations may contain decimal fractions. Without proper rounding, the output may fail expected formatting requirements. Using ROUND(..., 3) ensures the final result matches the specification exactly.

Multiple Machines Running Concurrently

Different machines may execute processes simultaneously, and process IDs may repeat across machines. Matching only on process_id would create incorrect pairings. The solution correctly joins on both machine_id and process_id, ensuring each process is paired with its own timestamps only.