[web crawler] URL Frontier & Fetcher communication

Does the Fetcher service poll from the Back Queue Selector?

From the final design diagram, the Fetcher service takes URLs from the URL Frontier (the circled part in the attached diagram).

I am curious about how to implement the Fetcher service in real life.

If the red part means that the Fetcher service polls the Back Queue Selector, does this mean the Fetcher service is a while loop that keeps trying to extract URLs from the URL Frontier?
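To make the pull model concrete, here is a minimal sketch of such a loop. It assumes the frontier can be modeled as an in-process `queue.Queue`; in a real crawler it would more likely be a network service or message broker. The names `fetcher_loop`, `frontier`, `fetch`, and `poll_timeout` are made up for illustration.

```python
import queue
import threading

def fetcher_loop(frontier, fetch, stop, poll_timeout=0.1):
    """Pull URLs from the frontier until `stop` is set (hypothetical helper)."""
    while not stop.is_set():
        try:
            # Blocking get with a timeout: the loop sleeps inside get()
            # instead of spinning, and wakes up regularly to check `stop`.
            url = frontier.get(timeout=poll_timeout)
        except queue.Empty:
            continue
        try:
            fetch(url)
        finally:
            # Mark the item as handled even if fetch() raised.
            frontier.task_done()

# Stand-in for the URL Frontier (assumption: an in-memory queue).
frontier = queue.Queue()
for u in ["https://example.com/a", "https://example.com/b"]:
    frontier.put(u)

fetched = []                      # fetch() is faked by recording the URL
stop = threading.Event()
worker = threading.Thread(target=fetcher_loop, args=(frontier, fetched.append, stop))
worker.start()

frontier.join()                   # wait until every queued URL was processed
stop.set()                        # ask the loop to exit
worker.join()
```

So yes, in this model the fetcher is essentially a loop, but with a blocking read it does not have to be a busy loop.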

Alternatively, if we adopt the push model, the Back Queue Selector hands tasks to the Fetcher. Will the Back Queue Selector then be responsible for tracking whether a fetcher dies while executing a task?

Most likely the push model. A number of fetcher workers wait in a worker pool. The Back Queue Selector hands a task to a fetcher, and the fetcher executes it. If a fetcher dies, the Back Queue Selector spins up a new one.
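A rough sketch of that push model, under several assumptions: the worker pool is a `ThreadPoolExecutor`, a raised exception stands in for a fetcher dying, and the selector retries the failed task on another worker. The retry is my addition for illustration, and `dispatch`, `max_attempts`, and `flaky_fetch` are made-up names. Note that a thread pool reuses its threads rather than literally spinning up a new one. With separate processes or machines, the selector would detect death through heartbeats or timeouts instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def dispatch(urls, fetch, max_workers=4, max_attempts=3):
    """Push each URL to a pooled worker and track it until it finishes.

    Hypothetical selector-side logic: a task whose worker failed is handed
    to a worker again, up to `max_attempts` times in total.
    """
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # future -> (url, attempt number): the selector's in-flight table
        in_flight = {pool.submit(fetch, u): (u, 1) for u in urls}
        while in_flight:
            finished, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for fut in finished:
                url, attempt = in_flight.pop(fut)
                if fut.exception() is None:
                    done.append(url)
                elif attempt < max_attempts:
                    in_flight[pool.submit(fetch, url)] = (url, attempt + 1)
                else:
                    failed.append(url)
    return done, failed

# Demo: the fetch of ".../b" "crashes" on its first attempt only.
attempts = {}
lock = threading.Lock()

def flaky_fetch(url):
    with lock:
        attempts[url] = attempts.get(url, 0) + 1
        n = attempts[url]
    if url.endswith("/b") and n == 1:
        raise RuntimeError("simulated fetcher crash")
    return url

done, failed = dispatch(["https://example.com/a", "https://example.com/b"], flaky_fetch)
```

The point is that whoever pushes the task has to keep some record of what is in flight, otherwise a dead fetcher silently loses the URL.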
