live comment

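While preparing for system design (SD) interviews, I came across a frequently asked question: live comment. Besides push vs. pull, what other aspects are worth discussing? Could I just reuse the Instagram design for it?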

It is fairly close to Instagram's fanout: decide between push and pull based on how many WebSocket connections are open on the post.
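A minimal sketch of that decision, assuming the usual fanout trade-off: push each comment to viewers when the post has few open connections, and let clients pull when it has very many. The cutoff value and the name `choose_delivery` are hypothetical, not taken from Instagram's actual design.

```python
PUSH_THRESHOLD = 10_000  # hypothetical cutoff; tune it from real traffic


def choose_delivery(open_connections: int, threshold: int = PUSH_THRESHOLD) -> str:
    """Return "push" for small audiences and "pull" for very large ones."""
    if open_connections < 0:
        raise ValueError("open_connections must be non-negative")
    # Assumption: pushing costs one message per open connection, so it is
    # only affordable while the audience stays under the threshold.
    return "push" if open_connections <= threshold else "pull"
```

For example, `choose_delivery(50)` picks push, while `choose_delivery(2_000_000)` falls back to pull so a single hot post cannot trigger millions of pushes per comment.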

Thank you. I read some material online, but I still don't quite understand “write locally, read globally.” Could you give me some pointers?

You were probably reading this article.

Let me highlight this passage.

Because of our unique situation, we settled on the completely opposite approach: “write locally, read globally.” This meant deploying distributed storage tiers that only handled writes locally, then less frequently collecting information from across all of our data centers to produce the final result. For example, when a user loads his News Feed through a request to our data center in Virginia, the system writes to a storage tier in the same data center, recording the fact that the user is now viewing certain pieces of content so that we can push them new comments. When someone enters a comment, we fetch the viewership information from all of our data centers across the country, combine the information, then push the updates out. In practice, this means we have to perform multiple cross-country reads for every comment produced. But it works because our commenting rate is significantly lower than our viewing rate. Reading globally saves us from having to replicate a high volume of writes across data centers, saving expensive, long-distance bandwidth.

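The main reason for “write locally, read globally” is that this use case is very unusual, which is why they adopted this counterintuitive approach.
The use case: when a user comments, we need to read who is currently viewing the post. This “who is viewing” information is written locally, that is, when someone views a post, we record “who is viewing” only in the local cluster. When we need to read “who is viewing,” we query every cluster for it.
Ultimately, it is because comment QPS is far lower than post read QPS, so we do more work when a comment happens in exchange for doing less work on each post read.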
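Here is a small in-memory sketch of that flow, just to make the two paths concrete. The `DataCenter` class, its methods, and the IDs are all hypothetical; a real system would use a distributed storage tier and network calls rather than Python sets.

```python
from collections import defaultdict


class DataCenter:
    """Hypothetical per-region store of which users are viewing which post."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._viewers = defaultdict(set)  # post_id -> {user_id, ...}

    def record_view(self, post_id: str, user_id: str) -> None:
        # Write locally: only this data center learns about the view.
        self._viewers[post_id].add(user_id)

    def viewers_of(self, post_id: str) -> set:
        # Return a copy; .get avoids creating an entry for unknown posts.
        return set(self._viewers.get(post_id, ()))


def viewers_to_notify(post_id: str, data_centers: list) -> set:
    # Read globally: on each comment, ask every data center and merge.
    merged = set()
    for dc in data_centers:
        merged |= dc.viewers_of(post_id)
    return merged


virginia = DataCenter("virginia")
oregon = DataCenter("oregon")
virginia.record_view("post-1", "alice")   # cheap, local write per view
oregon.record_view("post-1", "bob")
oregon.record_view("post-2", "carol")

# A comment on post-1 triggers one read per data center.
print(sorted(viewers_to_notify("post-1", [virginia, oregon])))  # ['alice', 'bob']
```

The frequent operation (`record_view`) never leaves its own data center; only the rare operation (a comment) pays for the cross-data-center reads.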

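I actually don't quite follow this. Since read QPS is high, why not optimize for reads so that each read does less work, i.e., write globally, read locally? Write QPS is low, so writing into every reader's feed should be relatively cheap. The article does the opposite, which I don't understand. Could you clear this up?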

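You have misunderstood what write and read mean here. The write is not the write that happens when someone comments; it is the write that records “who is reading” each time a person reads a post. Likewise, the read is the read of that “who is reading” information, not the read of a comment.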
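To make the accounting concrete, here is a rough count of cross-data-center operations under each strategy. The traffic numbers in the usage note are made up for illustration, the function name is hypothetical, and the model assumes every replicated viewership write or global viewership read touches each other data center exactly once.

```python
def cross_dc_ops_per_sec(view_qps: int, comment_qps: int, num_dcs: int) -> dict:
    """Estimate cross-data-center operations per second for both strategies."""
    if num_dcs < 1:
        raise ValueError("num_dcs must be at least 1")
    remote = num_dcs - 1  # every data center except the local one
    return {
        # Each view writes its "who is reading" record to every remote data center.
        "write_globally_read_locally": view_qps * remote,
        # Each comment reads the "who is reading" records from every remote data center.
        "write_locally_read_globally": comment_qps * remote,
    }
```

With an assumed 100,000 post views per second, 1,000 comments per second, and 4 data centers, `cross_dc_ops_per_sec(100_000, 1_000, 4)` gives 300,000 cross-data-center writes per second for write globally versus 3,000 cross-data-center reads per second for write locally. The write being counted is the viewership record, which scales with views, not with comments.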