The gopher image is created by gophers.
My Dcard interview notes can be found here:
This post focuses on analyzing, designing, and implementing the Dcard 2024 Backend Intern Assignment. The assignment can be broadly defined as: developing a read-heavy API (10,000 requests per second) for querying time-series data within specific intervals under multiple conditions.
The final implementation can be found here: https://github.com/jason810496/Dcard-Advertisement-API
Assignment Description#
The full assignment details are available at:
https://github.com/jason810496/Dcard-Advertisement-API/blob/main/2024%20Backend%20Intern%20Assignment.pdf
Below is a simplified version:
Scenario#
Using Golang, implement a simplified advertisement serving service that includes 2 RESTful APIs:
- Admin API: Create advertisements with conditions.
- Serving API: List all matching ads based on query conditions.
Detailed Explanation#
Admin API

Only `Create` needs to be implemented (no need for List/Update/Delete).

Field structure:
- `Title`: Advertisement title
- `StartAt`: Ad start time (active time)
- `EndAt`: Ad end time (active time)
- `Conditions`: Ad display conditions
  - All conditions are optional
  - Each condition can have multiple selections
  - Condition structure:
    - `Age`: int: 1~100
    - `Gender`: enum: `M`, `F`
    - `Country`: enum: `TW`, `JP`, ... (ISO 3166-1 compliant)
    - `Platform`: enum: `android`, `ios`, `web`
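As a concrete reference, here is a minimal Go sketch of this field structure. The struct layout and JSON tags are my own assumption, inferred from the API example below, not part of the assignment:

```go
package model

import "time"

// Ad sketches the advertisement payload described above.
// Pointer and slice fields model "all conditions are optional":
// a nil value means the condition is absent and matches everyone.
type Ad struct {
	Title      string     `json:"title"`
	StartAt    time.Time  `json:"startAt"` // active from
	EndAt      time.Time  `json:"endAt"`   // active until
	Conditions Conditions `json:"conditions"`
}

// Conditions holds the optional targeting rules.
type Conditions struct {
	AgeStart *int     `json:"ageStart,omitempty"` // 1~100
	AgeEnd   *int     `json:"ageEnd,omitempty"`   // 1~100
	Gender   []string `json:"gender,omitempty"`   // "M", "F"
	Country  []string `json:"country,omitempty"`  // ISO 3166-1, e.g. "TW", "JP"
	Platform []string `json:"platform,omitempty"` // "android", "ios", "web"
}
```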
Serving API

Only `List` needs to be implemented (no need for Create/Update/Delete).
- List active ads that match the conditions (`StartAt < Now < EndAt`).
- Support query parameters for filtering ads.
- Support pagination:
  - Use `limit` and `offset` to control results.
- Sort by `endAt` in ascending order (ASC).
Requirements, Assumptions, and Constraints#
Implementation Requirements#
- The serving API must handle over 10,000 RPS.
- No need to implement authentication.
- Validate parameters and handle errors appropriately.
- Write proper tests.
- Document the design choices and approach.
Assumptions and Constraints#
- No more than 3000 ads will be created daily.
- Any third-party libraries are allowed.
- Any external storage solutions are allowed.
API Example#
Admin API: The following example represents an ad targeting users aged 20–30 in Taiwan or Japan, using Android or iOS, without gender restriction.
```bash
curl -X POST -H "Content-Type: application/json" \
  "http://<host>/api/v1/ad" \
  --data '{
    "title": "AD 55",
    "startAt": "2023-12-10T03:00:00.000Z",
    "endAt": "2023-12-31T16:00:00.000Z",
    "conditions": {
      "ageStart": 20,
      "ageEnd": 30,
      "country": ["TW", "JP"],
      "platform": ["android", "ios"]
    }
  }'
```
Serving API: Request example
```bash
curl -X GET -H "Content-Type: application/json" \
  "http://<host>/api/v1/ad?offset=10&limit=3&age=24&gender=F&country=TW&platform=ios"
```
Serving API: Response example
```json
{
  "items": [
    {
      "title": "AD 1",
      "endAt": "2023-12-22T01:00:00.000Z"
    },
    {
      "title": "AD 31",
      "endAt": "2023-12-30T12:00:00.000Z"
    },
    {
      "title": "AD 10",
      "endAt": "2023-12-31T16:00:00.000Z"
    }
  ]
}
```
Analysis#
The task can be summarized as:
Designing a read-heavy API capable of querying all records that match specific conditions over a certain time range.
The basic approach starts with caching, primarily using Redis for server-side caching.
The challenge is designing an efficient and low-cost cache solution.
Solution 1#
When an ad is created, it is also stored in Redis:
a `Gender:Platform:Country:AgeRange:uuid` String key is created with a TTL based on the `EndAt` field.

```
F:android:TW:20-30:uuid1: "title1,end1"
M:ios:JP:30-40:uuid2:     "title2,end2"
```
Creating an Ad:
When an ad is created in the DB, a goroutine is also triggered to create a corresponding Redis key.
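A minimal sketch of this create path, assuming the go-redis client (`github.com/redis/go-redis/v9`) and the key/value encoding shown above; the function name and parameters are illustrative:

```go
package cache

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// storeAdKey writes one String key per (gender, platform, country, age range)
// combination; the TTL makes Redis expire the key together with the ad's EndAt.
func storeAdKey(ctx context.Context, rdb *redis.Client,
	gender, platform, country string, ageStart, ageEnd int,
	uuid, title string, endAt time.Time) {
	go func() { // fire-and-forget goroutine, as in the flow described above
		key := fmt.Sprintf("%s:%s:%s:%d-%d:%s",
			gender, platform, country, ageStart, ageEnd, uuid)
		value := fmt.Sprintf("%s,%s", title, endAt.Format(time.RFC3339))
		if err := rdb.Set(ctx, key, value, time.Until(endAt)).Err(); err != nil {
			log.Printf("cache ad %s: %v", uuid, err)
		}
	}()
}
```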
Querying Ads:
Use `SCAN` to search, and store the results in a `Gender:Platform:Country:Age` ZSet key as a search cache.
The TTL of this key is relatively short (e.g., 5–10 minutes).
This allows subsequent searches with the same conditions but different offset/limit values to be served efficiently.
It acts as a secondary cache for searches, preventing repeated `SCAN` operations. A sketch of this read path follows.
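This sketch extends the previous one (same package, plus `strings`); the `search:` cache-key prefix, the `title,endAt` value encoding, and the helper names are my own assumptions:

```go
// queryAds serves a page from the ZSet search cache, rebuilding it with
// SCAN on a miss. Scores are endAt timestamps, so ZRANGE yields ASC order.
func queryAds(ctx context.Context, rdb *redis.Client,
	gender, platform, country string, age int,
	offset, limit int64) ([]string, error) {
	cacheKey := fmt.Sprintf("search:%s:%s:%s:%d", gender, platform, country, age)

	if n, err := rdb.Exists(ctx, cacheKey).Result(); err == nil && n == 0 {
		// Cache miss: SCAN the String keys matching this condition prefix.
		pattern := fmt.Sprintf("%s:%s:%s:*", gender, platform, country)
		iter := rdb.Scan(ctx, 0, pattern, 100).Iterator()
		for iter.Next(ctx) {
			// (Filtering on the AgeRange segment of the key is omitted here.)
			val, err := rdb.Get(ctx, iter.Val()).Result()
			if err != nil {
				continue
			}
			rdb.ZAdd(ctx, cacheKey, redis.Z{Score: endAtScore(val), Member: val})
		}
		if err := iter.Err(); err != nil {
			return nil, err
		}
		rdb.Expire(ctx, cacheKey, 5*time.Minute) // short-lived secondary cache
	}
	return rdb.ZRange(ctx, cacheKey, offset, offset+limit-1).Result()
}

// endAtScore parses the assumed "title,endAt" encoding into a sortable score.
func endAtScore(value string) float64 {
	parts := strings.SplitN(value, ",", 2)
	if len(parts) < 2 {
		return 0
	}
	t, err := time.Parse(time.RFC3339, parts[1])
	if err != nil {
		return 0
	}
	return float64(t.Unix())
}
```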
Deleting Ads:
- For String keys: TTL is set during creation, so no further action is needed.
- For ZSet keys: TTL is short (5–10 minutes), so no additional handling is required.
Pros#
- Each ad is stored as an independent key, making modifications easy.
- Ads have corresponding TTLs, avoiding the need for a separate mechanism to expire ads in Redis.
Cons#
- Inefficient use of Redis:
  - Storing all DB data in Redis isn't using Redis purely as a cache.
- Excessive key usage, assuming 10 countries:
  - 3000 active ads as Strings.
  - 2 * 3 * 100 * 10 (Gender * Platform * Age * Country) = 6000 ZSets for ad combinations.
  - At least 9000 keys in total.
- Low search efficiency:
  - Without a search cache, each `SCAN` has `O(N)` time complexity.
  - Additional processing is needed for age-based queries.
Solution 2#
Partition by Age, using 101 Hashes:
- Key: `ad:Age` (with `Age = all` covering all ages).
- Mapping:
  - Hash field: `Gender:Platform:Country`.
  - Value: serialized ad list.

```
ad:0: {
  F:android:TW: [],
  M:ios:JP: [],
  ...
},
ad:1: {
  Gender:Platform:Country: [],
  ...
},
...
ad:99: {
  Gender:Platform:Country: [],
  ...
},
ad:all: {
  Gender:Platform:Country: [],
  ...
}
```
Creating an Ad:
The ad is stored in the corresponding partitions from `ageStart` to `ageEnd`.
This requires a Lua script so that the write across multiple partitions stays atomic, as sketched below.
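A sketch of this write path with go-redis: Redis executes a Lua script atomically, so all partitions are updated in one step. The `"|"` entry separator and the serialization format are assumptions:

```go
package cache

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// appendScript appends ARGV[2] (one serialized ad) to the ARGV[1] hash
// field (e.g. "F:android:TW") of every partition key in KEYS.
var appendScript = redis.NewScript(`
for _, key in ipairs(KEYS) do
  local old = redis.call('HGET', key, ARGV[1])
  if old then
    redis.call('HSET', key, ARGV[1], old .. '|' .. ARGV[2])
  else
    redis.call('HSET', key, ARGV[1], ARGV[2])
  end
end
return #KEYS
`)

// createAd writes the ad into every ad:<age> partition it covers.
func createAd(ctx context.Context, rdb *redis.Client,
	ageStart, ageEnd int, field, serializedAd string) error {
	keys := make([]string, 0, ageEnd-ageStart+1)
	for age := ageStart; age <= ageEnd; age++ {
		keys = append(keys, fmt.Sprintf("ad:%d", age))
	}
	return appendScript.Run(ctx, rdb, keys, field, serializedAd).Err()
}
```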
Querying Ads:
Use `HSCAN` with the `MATCH` option to filter ads in the corresponding partition.
`HSCAN` returns a cursor that can drive pagination, as in the sketch below.
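A sketch of this read path (same package as above); the names are illustrative:

```go
// listAds HSCANs one age partition with a MATCH pattern and returns a
// page of matching fields plus the cursor for the next page.
func listAds(ctx context.Context, rdb *redis.Client,
	age int, gender, platform, country string,
	cursor uint64, count int64) (map[string]string, uint64, error) {
	key := fmt.Sprintf("ad:%d", age)
	// A "*" segment stands for a condition the client did not specify.
	pattern := fmt.Sprintf("%s:%s:%s", gender, platform, country) // e.g. "F:ios:*"
	pairs, next, err := rdb.HScan(ctx, key, cursor, pattern, count).Result()
	if err != nil {
		return nil, 0, err
	}
	// HSCAN returns a flat [field, value, field, value, ...] slice.
	page := make(map[string]string, len(pairs)/2)
	for i := 0; i+1 < len(pairs); i += 2 {
		page[pairs[i]] = pairs[i+1]
	}
	return page, next, nil
}
```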
Deleting Ads:
A cron job periodically checks all partitions to clean up expired ads.
Depending on consistency requirements, you might use a Lua script for atomicity; otherwise, `HSCAN` + `HDEL` is a sufficient alternative (sketched below).
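A sketch of the `HSCAN` + `HDEL` variant of that cleanup, reusing the `"|"`-joined, `title,endAt` serialization assumed earlier (same package, plus `strings` and `time`):

```go
// cleanupExpired walks every partition and deletes fields whose ads have
// all ended; meant to be driven by a cron job or a time.Ticker.
func cleanupExpired(ctx context.Context, rdb *redis.Client, now time.Time) {
	partitions := []string{"ad:all"}
	for age := 0; age <= 99; age++ {
		partitions = append(partitions, fmt.Sprintf("ad:%d", age))
	}
	for _, key := range partitions {
		iter := rdb.HScan(ctx, key, 0, "*", 100).Iterator()
		for iter.Next(ctx) {
			field := iter.Val()
			if !iter.Next(ctx) { // the iterator alternates field, value
				break
			}
			if allExpired(iter.Val(), now) {
				rdb.HDel(ctx, key, field)
			}
		}
	}
}

// allExpired reports whether every "title,endAt" entry in the |-joined
// list has passed its end time (assumed encoding from the sketches above).
func allExpired(serialized string, now time.Time) bool {
	for _, entry := range strings.Split(serialized, "|") {
		parts := strings.SplitN(entry, ",", 2)
		if len(parts) < 2 {
			return false
		}
		t, err := time.Parse(time.RFC3339, parts[1])
		if err != nil || t.After(now) {
			return false
		}
	}
	return true
}
```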
System Architecture#
Initial Architecture#
Originally, I planned to use the Redis Sentinel architecture for the caching layer,
to achieve High Availability and Read Scalability.
Final Architecture#
Data Bottleneck: Postgres#
Even with Redis serving as the server-side cache,
the QPS still did not meet the initial target,
and a large amount of traffic was still hitting Postgres.
The CPU and memory metrics showed Postgres CPU utilization reaching 90%.
The solution at the time was to add an additional layer of Local Cache at the API level,
with a background goroutine periodically updating it.
After adding this Local Cache layer,
the system managed to reach the goal of 10,000 RPS locally. A sketch of this pattern is shown below.
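A minimal sketch of such a Local Cache, assuming the `Ad` struct from the earlier sketch; `loadActiveAds` stands in for the query against Redis/Postgres, and readers get a lock-free snapshot via `atomic.Value`:

```go
package cache

import (
	"context"
	"log"
	"sync/atomic"
	"time"
)

// LocalCache keeps an in-process snapshot of active ads; a background
// goroutine refreshes it so request handlers never touch Redis/Postgres.
type LocalCache struct {
	snapshot atomic.Value // holds []Ad
}

func NewLocalCache(ctx context.Context, interval time.Duration,
	loadActiveAds func(context.Context) ([]Ad, error)) *LocalCache {
	c := &LocalCache{}
	c.snapshot.Store([]Ad{})
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			if ads, err := loadActiveAds(ctx); err == nil {
				c.snapshot.Store(ads) // atomic swap; readers never block
			} else {
				log.Printf("refresh local cache: %v", err)
			}
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()
	return c
}

// Ads returns the current snapshot; filtering and pagination then happen
// in memory inside the request handler.
func (c *LocalCache) Ads() []Ad {
	return c.snapshot.Load().([]Ad)
}
```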
Areas for Further Optimization#
As I write this retrospective in September 2024,
I’ve noticed some details that were overlooked in the original implementation:
- Database Indexing
- System Stability
- Sharding
Database Indexing#
Back then, I used the GORM ORM to interact with Postgres.
By default, GORM only creates an index on the primary key and does not index any other fields.
It would be useful to benchmark different combinations of indexes
and determine which fields benefit from adding specific types of indexes; a possible starting point is sketched below.
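For example, GORM's `AutoMigrate` can create secondary indexes from struct tags. The tags below are a guess at useful candidates, not a measured recommendation:

```go
package model

import "time"

// Candidate indexes for the serving query
// (WHERE start_at < now AND end_at > now ... ORDER BY end_at ASC):
// a composite index on the active window, plus a single-column end_at
// index for the ORDER BY.
type Ad struct {
	ID      uint      `gorm:"primaryKey"`
	Title   string    // never filtered on; no index needed
	StartAt time.Time `gorm:"index:idx_active_window"`
	EndAt   time.Time `gorm:"index:idx_active_window;index"`
}
```

Running `db.AutoMigrate(&Ad{})` would then create the composite `idx_active_window` index alongside the single-column one; whether the composite actually wins would have to be measured.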
System Stability#
At the time, when I realized that Redis caching still let substantial traffic through to Postgres,
I didn't consider system stability.
Instead, I hastily added a layer of Local Cache to hit the QPS target.
I should have used `singleflight` to prevent high traffic from overwhelming Postgres
(at least deduplicating identical in-flight queries).
This would have kept Postgres QPS within a reasonable range and ensured overall system stability.
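A sketch of that pattern with `golang.org/x/sync/singleflight`: concurrent requests sharing a cache key collapse into a single Postgres query. `queryPostgres` and the `Ad` type are stand-ins:

```go
package serving

import (
	"context"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// getAds deduplicates identical in-flight queries: only the first caller
// for a given key reaches Postgres, and the others share its result.
func getAds(ctx context.Context, cacheKey string,
	queryPostgres func(context.Context) ([]Ad, error)) ([]Ad, error) {
	v, err, _ := group.Do(cacheKey, func() (interface{}, error) {
		return queryPostgres(ctx)
	})
	if err != nil {
		return nil, err
	}
	return v.([]Ad), nil
}
```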
Sharding#
During the interview follow-up on scalability, questions arose such as: "What if there are 30 countries and 5 platforms? How do you maintain the same QPS target?"
It's easy to mention sharding,
but in practice, implementing a stable, scalable system on top of sharding isn't that simple.
If I had the opportunity, I would like to experiment with implementing it; a minimal routing sketch is shown below.
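The routing itself is the easy part; the hard parts are resharding, hot keys, and replication. A minimal sketch, assuming partitions are routed to Redis shards by hashing the condition key:

```go
package shard

import "hash/fnv"

// shardFor maps a partition key such as "F:android:TW" onto one of n
// shards. Note that changing n remaps most keys, which is exactly the
// resharding problem a production design would need to solve.
func shardFor(conditionKey string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(conditionKey))
	return int(h.Sum32()) % n
}
```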