tribler.core.database.ranks

Search utilities.

Author(s): Jelle Roozenburg, Arno Bakker, Alexander Kozlovsky

Attributes

SECONDS_IN_DAY

word_re

POSITION_COEFF

MISSED_WORD_PENALTY

REMAINDER_COEFF

RANK_NORMALIZATION_COEFF

Functions

item_rank(→ float)

Calculates the torrent rank for an item received from a remote query. Returns the torrent rank value in the range [0, 1].

torrent_rank(→ float)

Calculates search rank for a torrent.

seeders_rank(→ float)

Calculates a rank based on the torrent's number of seeders and leechers.

freshness_rank(→ float)

Calculates a rank value based on the torrent freshness. The result is normalized to the range [0, 1].

title_rank(→ float)

Calculates the similarity of the title string to a query string as a float value in the range [0, 1].

calculate_rank(→ float)

Calculates the similarity of the title to the query as a float value in range [0, 1].

find_word_and_rotate_title(→ tuple[bool, int])

Finds the query word in the title. Returns whether it was found or not and the number of skipped words in the title.

Module Contents

tribler.core.database.ranks.SECONDS_IN_DAY = 86400
tribler.core.database.ranks.item_rank(query: str, item: dict) → float

Calculates the torrent rank for an item received from a remote query. Returns the torrent rank value in the range [0, 1].

Parameters:
  • query – a user-defined query string

  • item – a dict with torrent info. Should include the key name; can also include num_seeders, num_leechers, and created

Returns:

the torrent rank value in range [0, 1]
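
The item layout described above can be illustrated with a minimal sketch. The helper name rank_inputs_from_item, the default values, the explicit now argument, and the reading of created as a Unix timestamp in seconds are all assumptions made for illustration; none of them is taken from the module itself.

```python
import time
from typing import Optional, Tuple


def rank_inputs_from_item(item: dict, now: Optional[float] = None) -> Tuple[str, int, int, Optional[float]]:
    """Hypothetical helper: unpack a remote-query item into ranking inputs."""
    now = time.time() if now is None else now
    title = item["name"]                    # the one key the item is expected to have
    seeders = item.get("num_seeders", 0)    # optional; the default of 0 is an assumption
    leechers = item.get("num_leechers", 0)  # optional; the default of 0 is an assumption
    created = item.get("created", 0)        # optional; assumed to be a Unix timestamp
    # Convert the timestamp into an age in seconds; None stands for "unknown".
    freshness = now - created if created > 0 else None
    return title, seeders, leechers, freshness


item = {"name": "Ubuntu 22.04 ISO", "num_seeders": 120, "created": 1_700_000_000}
print(rank_inputs_from_item(item, now=1_700_086_400))
# ('Ubuntu 22.04 ISO', 120, 0, 86400)
```

Passing now explicitly keeps the sketch deterministic, which mirrors the reason torrent_rank accepts an age in seconds instead of a timestamp.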

tribler.core.database.ranks.torrent_rank(query: str, title: str, seeders: int = 0, leechers: int = 0, freshness: float | None = None) → float

Calculates search rank for a torrent.

Parameters:
  • query – a user-defined query string

  • title – a torrent name

  • seeders – the number of seeders

  • leechers – the number of leechers

  • freshness – the number of seconds since the torrent was created. None (the default), zero, or a negative value means the torrent creation date is unknown. Passing an age in seconds is more convenient than passing a creation timestamp, as it avoids a time() call inside the function and simplifies testing.

Returns:

the torrent rank value in range [0, 1]

tribler.core.database.ranks.seeders_rank(seeders: int, leechers: int = 0) → float

Calculates a rank based on the torrent’s number of seeders and leechers.

Parameters:
  • seeders – the number of seeders for the torrent.

  • leechers – the number of leechers for the torrent.

Returns:

the torrent rank based on seeders and leechers, normalized to the range [0, 1]

tribler.core.database.ranks.freshness_rank(freshness: float | None) → float

Calculates a rank value based on the torrent freshness. The result is normalized to the range [0, 1].

Parameters:

freshness – the number of seconds since the torrent was created. None means the actual torrent creation date is unknown. Negative values are treated as invalid and give the same result as None

Returns:

the torrent rank based on freshness. The result is normalized to the range [0, 1]
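
The handling of unknown and invalid ages can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the decay formula, the HALF_RANK_DAYS constant, the choice of 0.0 for an unknown date, and the treatment of zero as unknown (borrowed from the torrent_rank parameter description) are all hypothetical. Only SECONDS_IN_DAY comes from the attribute list.

```python
from typing import Optional

SECONDS_IN_DAY = 86400  # value listed in this module's attributes
HALF_RANK_DAYS = 30     # hypothetical: age in days at which the rank drops to 0.5


def freshness_rank_sketch(freshness: Optional[float]) -> float:
    """Illustrative only: map an age in seconds to a value in [0, 1]."""
    # None, zero, and negative ages all mean "creation date unknown"
    # and share one result (0.0 here is an assumption).
    if freshness is None or freshness <= 0:
        return 0.0
    days = freshness / SECONDS_IN_DAY
    # Hyperbolic decay: close to 1 for a new torrent, approaching 0 as it ages.
    return 1.0 / (1.0 + days / HALF_RANK_DAYS)


print(freshness_rank_sketch(None))                 # 0.0
print(freshness_rank_sketch(-5))                   # 0.0
print(freshness_rank_sketch(30 * SECONDS_IN_DAY))  # 0.5
```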

tribler.core.database.ranks.word_re
tribler.core.database.ranks.title_rank(query: str, title: str) → float

Calculates the similarity of the title string to a query string as a float value in the range [0, 1].

Parameters:
  • query – a user-defined query string

  • title – a torrent name

Returns:

the similarity of the title string to a query string as a float value in range [0, 1]
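
Because title_rank takes strings while calculate_rank takes lists of words, some splitting step has to sit between them. The sketch below shows one plausible way to do that. The pattern, the lowercasing, and the helper name split_words are assumptions; this reference does not show the actual value of word_re.

```python
import re

# Assumption: a word is a run of word characters. The real word_re pattern
# is not shown in this reference and may differ.
word_re_sketch = re.compile(r"\w+")


def split_words(text: str) -> list:
    """Hypothetical helper: lowercase the text and split it into words."""
    return word_re_sketch.findall(text.lower())


print(split_words("Big.Buck-Bunny (2008) 1080p"))
# ['big', 'buck', 'bunny', '2008', '1080p']
```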

tribler.core.database.ranks.POSITION_COEFF = 5
tribler.core.database.ranks.MISSED_WORD_PENALTY = 10
tribler.core.database.ranks.REMAINDER_COEFF = 10
tribler.core.database.ranks.RANK_NORMALIZATION_COEFF = 10
tribler.core.database.ranks.calculate_rank(query: list[str], title: list[str]) → float

Calculates the similarity of the title to the query as a float value in range [0, 1].

Parameters:
  • query – list of query words

  • title – list of title words

Returns:

the similarity of the title to the query as a float value in range [0, 1]
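
The four coefficients listed above suggest an error-based score: penalties accumulate for skipped, missing, and leftover words, and the total is then squashed into [0, 1]. The sketch below is one way such a score could be assembled. How the constants are actually combined is not stated in this reference, so every formula here, and the handling of empty inputs, is an assumption.

```python
from collections import deque
from typing import List

POSITION_COEFF = 5
MISSED_WORD_PENALTY = 10
REMAINDER_COEFF = 10
RANK_NORMALIZATION_COEFF = 10


def calculate_rank_sketch(query: List[str], title: List[str]) -> float:
    """Illustrative scoring; the way the constants are combined is assumed."""
    if not query:
        return 1.0  # assumption: an empty query matches everything
    if not title:
        return 0.0  # assumption: an empty title matches nothing

    remaining = deque(title)
    total_error = 0.0
    for i, word in enumerate(query):
        weight = POSITION_COEFF / (POSITION_COEFF + i)  # earlier query words count more
        if word in remaining:
            skipped = remaining.index(word)
            remaining.rotate(-skipped)  # move the skipped words to the end
            remaining.popleft()         # drop the matched word
            total_error += skipped * weight
        else:
            total_error += MISSED_WORD_PENALTY * weight

    # Small penalty for title words the query never mentioned.
    total_error += len(remaining) / (REMAINDER_COEFF + len(query))
    return RANK_NORMALIZATION_COEFF / (RANK_NORMALIZATION_COEFF + total_error)


exact = calculate_rank_sketch(["big", "buck"], ["big", "buck"])
skipped = calculate_rank_sketch(["buck"], ["big", "buck"])
missed = calculate_rank_sketch(["bunny"], ["big", "buck"])
print(exact, skipped, missed)
```

In this sketch an exact match scores 1.0, a match that skips a title word scores slightly lower, and a query word that is absent from the title scores lowest.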

tribler.core.database.ranks.find_word_and_rotate_title(word: str, title: collections.deque[str]) → tuple[bool, int]

Finds the query word in the title. Returns whether it was found or not and the number of skipped words in the title.

This is a helper function for efficiently answering the question of how close a query string and a title string are, taking into account the ordering of words in both strings.

For efficiency reasons, the function modifies the title deque in place by removing the first occurrence of the found word and rotating all leading non-matching words to the end of the deque. This makes it cheap to call find_word_and_rotate_title repeatedly for subsequent words from the same query string.

An example: find_word_and_rotate_title('A', deque(['X', 'Y', 'A', 'B', 'C'])) returns (True, 2), where True means that the word 'A' was found in the title deque and 2 is the number of skipped words ('X', 'Y'). The call also modifies the title deque in place, leaving deque(['B', 'C', 'X', 'Y']): the found word 'A' is removed, and the leading non-matching words ('X', 'Y') are moved to the end of the deque.

Parameters:
  • word – a word from the user-defined query string

  • title – a deque of words in the title

Returns:

a two-element tuple: whether the word was found in the title, and the number of skipped words
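
The find-remove-rotate behaviour described above can be reproduced with standard collections.deque operations. This is a minimal sketch written from the description, not a copy of the module's code; the function name carries a _sketch suffix to mark it as hypothetical, and the return value for a word that is not found is an assumption.

```python
from collections import deque
from typing import Deque, Tuple


def find_word_and_rotate_title_sketch(word: str, title: Deque[str]) -> Tuple[bool, int]:
    try:
        skipped = title.index(word)  # position of the first occurrence
    except ValueError:
        # Assumption: a miss reports zero skipped words and leaves the deque untouched.
        return False, 0
    title.rotate(-skipped)  # move the skipped leading words to the end
    title.popleft()         # remove the matched word
    return True, skipped


title = deque(["X", "Y", "A", "B", "C"])
print(find_word_and_rotate_title_sketch("A", title))  # (True, 2)
print(title)                                          # deque(['B', 'C', 'X', 'Y'])
```

Because the deque is left starting just after the match, a follow-up call for the next query word only counts words that appear after the previous match as skipped.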