LeetCode 1336 - Number of Transactions per Visit

The problem asks us to analyze two database tables, Visits and Transactions, and compute how many visitors performed a given number of transactions per visit. Each row in the Visits table represents a unique visit by a user on a specific date.

LeetCode Problem 1336

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem asks us to analyze two database tables, Visits and Transactions, and compute how many visitors performed a given number of transactions per visit. Each row in the Visits table represents a unique visit by a user on a specific date. The Transactions table may contain multiple rows per user per date, indicating multiple transactions on the same visit.

The goal is to produce a summary table where each row corresponds to a specific number of transactions (transactions_count) and contains the number of visits that had exactly that many transactions (visits_count). We also need to include visits with zero transactions, and the transactions_count values must cover the entire range from 0 to the maximum observed number of transactions in a single visit. The result should be sorted by transactions_count.

Key details and constraints include:

  1. Each (user_id, visit_date) in Visits is unique.
  2. Every transaction in Transactions corresponds to a valid visit in Visits.
  3. There may be visits with zero transactions, which we must account for.
  4. Transaction counts per visit may skip numbers (for example, there may be 0, 1, 3 transactions but no visits with 2), yet the output must include all counts from 0 up to the maximum.

Edge cases include users with no transactions, visits with multiple transactions, and potential duplicate transaction rows for the same visit.

Approaches

A brute-force approach would involve iterating over every visit, counting the transactions for that visit by scanning the Transactions table, then building a frequency map for transaction counts. While correct, this is inefficient because it requires a nested scan over all transactions for each visit, leading to potentially O(V * T) time complexity, where V is the number of visits and T is the number of transactions.

The optimal approach leverages SQL aggregation and joins. By performing a LEFT JOIN of Visits with Transactions on (user_id, date) and then using COUNT grouped by (user_id, visit_date), we can calculate the number of transactions per visit efficiently. Finally, we aggregate these counts to determine how many visits had each transaction count, including zero.

Approach Time Complexity Space Complexity Notes
Brute Force O(V * T) O(V) Iterates over every visit and scans transactions, correct but inefficient.
Optimal O(V + T) O(V + T) Uses SQL aggregation with LEFT JOIN and GROUP BY to efficiently compute counts per visit and then frequency of counts.

Algorithm Walkthrough

  1. Perform a LEFT JOIN between Visits and Transactions on (user_id, visit_date) to associate transactions with each visit. Visits with no transactions will have NULL in the transaction column.
  2. Use COUNT(transaction_date) grouped by (user_id, visit_date) to compute transactions_count per visit. Using COUNT(column) ignores NULLs, so visits with no transactions will correctly count as 0.
  3. Aggregate the results from step 2 to compute how many visits have each transactions_count. Use GROUP BY transactions_count and COUNT(*) to get the visits_count.
  4. Generate a series of numbers from 0 to the maximum transactions_count observed to ensure all transaction counts are included. LEFT JOIN this series with the aggregated counts to fill in missing counts with 0.
  5. Order the final results by transactions_count.

Why it works: Each visit is uniquely identified, and by counting transactions per visit and then summarizing the counts, we capture the distribution of transactions per visit. Using a series ensures no transaction count is omitted, fulfilling the requirement to include zero and any skipped counts.

Python Solution

class Solution:
    def numberOfTransactionsPerVisit(self, visits: List[Tuple[int, str]], transactions: List[Tuple[int, str, int]]) -> List[Tuple[int, int]]:
        from collections import defaultdict

        # Count transactions per visit
        visit_tx_count = defaultdict(int)
        for user_id, visit_date, *_ in transactions:
            visit_tx_count[(user_id, visit_date)] += 1

        # Count visits per transaction count
        tx_count_freq = defaultdict(int)
        for user_id, visit_date in visits:
            count = visit_tx_count.get((user_id, visit_date), 0)
            tx_count_freq[count] += 1

        max_count = max(tx_count_freq.keys(), default=0)
        result = [(i, tx_count_freq.get(i, 0)) for i in range(max_count + 1)]
        return result

In this Python implementation, we first build a dictionary visit_tx_count mapping each (user_id, visit_date) to its number of transactions. We then count how many visits fall into each transaction count using tx_count_freq. Finally, we generate the full range from 0 to the maximum count to ensure every transaction count is represented.

Go Solution

package main

type Visit struct {
    UserID    int
    VisitDate string
}

type Transaction struct {
    UserID          int
    TransactionDate string
    Amount          int
}

func NumberOfTransactionsPerVisit(visits []Visit, transactions []Transaction) [][]int {
    visitTxCount := make(map[string]int)
    for _, tx := range transactions {
        key := fmt.Sprintf("%d-%s", tx.UserID, tx.TransactionDate)
        visitTxCount[key]++
    }

    txCountFreq := make(map[int]int)
    maxCount := 0
    for _, v := range visits {
        key := fmt.Sprintf("%d-%s", v.UserID, v.VisitDate)
        count := visitTxCount[key]
        txCountFreq[count]++
        if count > maxCount {
            maxCount = count
        }
    }

    result := make([][]int, maxCount+1)
    for i := 0; i <= maxCount; i++ {
        result[i] = []int{i, txCountFreq[i]}
    }

    return result
}

In Go, we use string keys in maps to represent (user_id, visit_date) pairs. This avoids creating a custom struct for map keys. The logic otherwise mirrors the Python version: first count transactions per visit, then count the number of visits for each transaction count, and finally generate a range from 0 to the maximum.

Worked Examples

Given the example from the problem statement:

Visits Table

user_id visit_date
1 2020-01-01
2 2020-01-02
12 2020-01-01
19 2020-01-03
1 2020-01-02
2 2020-01-03
1 2020-01-04
7 2020-01-11
9 2020-01-25
8 2020-01-28

Transactions Table

user_id transaction_date amount
1 2020-01-02 120
2 2020-01-03 22
7 2020-01-11 232
1 2020-01-04 7
9 2020-01-25 33
9 2020-01-25 66
8 2020-01-28 1
9 2020-01-25 99

Step 1: Count transactions per visit:

(user_id, visit_date) transactions_count
(1, 2020-01-01) 0
(2, 2020-01-02) 0
(12, 2020-01-01) 0
(19, 2020-01-03) 0
(1, 2020-01-02) 1
(2, 2020-01-03) 1
(1, 2020-01-04) 1
(7, 2020-01-11) 1
(9, 2020-01-25) 3
(8, 2020-01-28) 1

Step 2: Count visits per transaction count:

transactions_count visits_count
0 4
1 5
2 0
3 1

Complexity Analysis

Measure Complexity Explanation
Time O(V + T) Count transactions per visit in O(T), then iterate over visits O(V).
Space O(V + T) Dictionary/map stores counts per visit (up to V) and frequencies (up to V).

The dominant operations are linear scans over the input tables, and no nested iterations occur.

Test Cases

# test