tribler.core.database.ranks

Search utilities.

Author(s): Jelle Roozenburg, Arno Bakker, Alexander Kozlovsky

Attributes

`SECONDS_IN_DAY`
`word_re`
`POSITION_COEFF`
`MISSED_WORD_PENALTY`
`REMAINDER_COEFF`
`RANK_NORMALIZATION_COEFF`

Functions

`item_rank`(→ float)	Calculates the torrent rank for an item received from a remote query. Returns the torrent rank value in range [0, 1].
`torrent_rank`(→ float)	Calculates search rank for a torrent.
`seeders_rank`(→ float)	Calculates rank based on the number of torrent's seeders and leechers.
`freshness_rank`(→ float)	Calculates a rank value based on the torrent freshness. The result is normalized to the range [0, 1].
`title_rank`(→ float)	Calculates the similarity of the title string to a query string as a float value in range [0, 1].
`calculate_rank`(→ float)	Calculates the similarity of the title to the query as a float value in range [0, 1].
`find_word_and_rotate_title`(→ tuple[bool, int])	Finds the query word in the title. Returns whether it was found or not and the number of skipped words in the title.

Module Contents

tribler.core.database.ranks.SECONDS_IN_DAY = 86400

tribler.core.database.ranks.item_rank(query: str, item: dict) → float

Calculates the torrent rank for an item received from a remote query. Returns the torrent rank value in range [0, 1].

Parameters:

query – a user-defined query string
item – a dict with torrent info. Should include the key name; may also include the keys num_seeders, num_leechers, and created

Returns:

the torrent rank value in range [0, 1]
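
As a rough illustration of the item layout described above, the following sketch unpacks an item dict into the values that torrent_rank takes. The helper name unpack_item, the explicit now argument, the default of 0 for missing counters, and the choice to map a missing or non-positive created value to None are all assumptions made for this example; this is not the module's code.

```python
from typing import Optional, Tuple


def unpack_item(item: dict, now: float) -> Tuple[str, int, int, Optional[float]]:
    """Hypothetical helper: split an item dict into (title, seeders, leechers, freshness)."""
    title = item["name"]                    # required key; raises KeyError if absent
    seeders = item.get("num_seeders", 0)    # optional key, assumed default 0
    leechers = item.get("num_leechers", 0)  # optional key, assumed default 0
    created = item.get("created", 0)        # optional creation timestamp, in seconds
    # Assumption: a missing or non-positive timestamp means "creation date unknown".
    freshness = now - created if created > 0 else None
    return title, seeders, leechers, freshness


item = {"name": "Ubuntu 22.04 LTS", "num_seeders": 12, "created": 1_000_000}
print(unpack_item(item, now=1_086_400))
```

Passing now explicitly keeps the sketch deterministic, which is the same testing convenience the freshness parameter of torrent_rank is meant to provide.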

tribler.core.database.ranks.torrent_rank(query: str, title: str, seeders: int = 0, leechers: int = 0, freshness: float | None = None) → float

Calculates search rank for a torrent.

Parameters:

query – a user-defined query string
title – a torrent name
seeders – the number of seeders
leechers – the number of leechers
freshness – the number of seconds since the torrent creation. Zero or negative value means the torrent creation date is unknown. Passing the age instead of a creation timestamp is more convenient, as it avoids calling the time() function and simplifies testing.

Returns:

the torrent rank value in range [0, 1]

tribler.core.database.ranks.seeders_rank(seeders: int, leechers: int = 0) → float

Calculates rank based on the number of torrent’s seeders and leechers.

Parameters:

seeders – the number of seeders for the torrent.
leechers – the number of leechers for the torrent.

Returns:

the torrent rank based on seeders and leechers, normalized to the range [0, 1]

tribler.core.database.ranks.freshness_rank(freshness: float | None) → float

Calculates a rank value based on the torrent freshness. The result is normalized to the range [0, 1].

Parameters:

freshness – the number of seconds since the torrent creation. None means the torrent creation date is unknown. Negative values are treated as invalid and give the same result as None

Returns:

the torrent rank based on freshness, normalized to the range [0, 1]

tribler.core.database.ranks.word_re

tribler.core.database.ranks.title_rank(query: str, title: str) → float

Calculates the similarity of the title string to a query string as a float value in range [0, 1].

Parameters:

query – a user-defined query string
title – a torrent name

Returns:

the similarity of the title string to a query string as a float value in range [0, 1]
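
title_rank takes plain strings, while calculate_rank takes lists of words, so some word-splitting step has to sit between the two. A minimal sketch of such a step follows. The pattern \w+, the lowercasing, and the names WORD_PATTERN and split_words are assumptions for illustration only, since this page does not show the value of word_re.

```python
import re
from typing import List

# Assumed pattern: runs of word characters. The real word_re may differ.
WORD_PATTERN = re.compile(r"\w+")


def split_words(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase the text and return its words in order."""
    return WORD_PATTERN.findall(text.lower())


print(split_words("Ubuntu-22.04 LTS (amd64)"))
```

Under these assumptions, punctuation acts only as a separator, so a query such as "ubuntu 22.04" and a title such as "Ubuntu-22.04" yield the same leading words.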

tribler.core.database.ranks.POSITION_COEFF = 5

tribler.core.database.ranks.MISSED_WORD_PENALTY = 10

tribler.core.database.ranks.REMAINDER_COEFF = 10

tribler.core.database.ranks.RANK_NORMALIZATION_COEFF = 10

tribler.core.database.ranks.calculate_rank(query: list[str], title: list[str]) → float

Calculates the similarity of the title to the query as a float value in range [0, 1].

Parameters:

query – list of query words
title – list of title words

Returns:

the similarity of the title to the query as a float value in range [0, 1]

tribler.core.database.ranks.find_word_and_rotate_title(word: str, title: collections.deque[str]) → tuple[bool, int]

Finds the query word in the title. Returns whether it was found or not and the number of skipped words in the title.

This is a helper function to efficiently answer a question of how close a query string and a title string are, taking into account the ordering of words in both strings.

For efficiency, the function modifies the title deque in place: it removes the first occurrence of the found word and rotates all leading non-matching words to the end of the deque. This makes repeated calls to find_word_and_rotate_title for subsequent words of the same query string efficient.

For example, find_word_and_rotate_title('A', deque(['X', 'Y', 'A', 'B', 'C'])) returns (True, 2): True means that the word 'A' was found in the title deque, and 2 is the number of skipped words ('X', 'Y'). The call also modifies the title deque, which becomes deque(['B', 'C', 'X', 'Y']): the found word 'A' is removed, and the leading non-matching words ('X', 'Y') are moved to the end of the deque.

Parameters:

word – a word from the user-defined query string
title – a deque of words in the title

Returns:

a two-element tuple: whether the word was found in the title, and the number of skipped words
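
The behavior described above can be reproduced with a few deque operations. The sketch below is an illustrative re-implementation under a different name (find_and_rotate_sketch), not the module's source. In particular, returning 0 skipped words and leaving the deque untouched when the word is missing is an assumption.

```python
from collections import deque
from typing import Deque, List, Tuple


def find_and_rotate_sketch(word: str, title: Deque[str]) -> Tuple[bool, int]:
    """Illustrative re-implementation of the documented find-and-rotate behavior."""
    try:
        skipped = title.index(word)  # count of leading non-matching words
    except ValueError:
        return False, 0              # assumption: a miss leaves the deque unchanged
    title.rotate(-skipped)           # move the skipped words to the end
    title.popleft()                  # remove the first occurrence of the word
    return True, skipped


title = deque(["X", "Y", "A", "B", "C"])
print(find_and_rotate_sketch("A", title), title)  # (True, 2) deque(['B', 'C', 'X', 'Y'])

# Later query words reuse the already-rotated deque.
results: List[Tuple[bool, int]] = [find_and_rotate_sketch(w, title) for w in ["C", "Z"]]
print(results, title)  # [(True, 1), (False, 0)] deque(['X', 'Y', 'B'])
```

Because each call starts from the position just after the previous match, query words that appear in the title in the same order produce small skip counts, while out-of-order words produce larger ones.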