Metadata-Version: 2.1
Name: mdsearch
Version: 0.0.4
Summary: mdsearch is a python package for paper search
Home-page: https://github.com/lydia07/mdsearch
Author: Low-level Maseter Do Search
Author-email: 3120201103@bit.edu.cn
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: elasticsearch5
Requires-Dist: scipy (==1.5.4)
Requires-Dist: torch (==1.7.1)
Requires-Dist: transformers (==4.1.1)
Requires-Dist: flask (==1.1.2)

#### 接口文档

##### 0   整体流程为通过search_info封装用户的查询内容，然后调用相关API返回检索结果。

##### 1   初始化

```python
import Searcher
S = Searcher(index_name='paperdb', doc_type='papers')
```

##### 2   检索论文

```python
paper, paper_id, paper_num = S.search_paper_by_name(search_info)
```

```python 
返回结果说明：
paper_num   : A int number of paper.
paper_id    : A list of string, each string means paper id.
paper       : A list of dicts, each dict stores information of a paper.
```



##### 3   检索单个论文视频中的相关内容(可通过前一步检索论文返回的paper_id或者paper，注意是单个)

```python
video_pos = S.get_video_pos_by_paper_id(search_info, paper_id)
video_pos = S.get_video_pos_by_paper(search_info, paper)
```

##### 4   search_info 格式

```python
# 综合检索
    search_info = {
        'query_type': 'integrated_search',
        'query': string,                                # 用户查询的内容
        'match': {
            'title': bool,                              # True/False表示是否检索这个字段的内容
            'abstract': bool,
            'paperContent': bool,
            'videoContent': bool,
        },
        'filter': {
            'yearfrom': 1000,                           # paper的年份限制
            'yearbefore': 3000,
        },
        # 'sort': 'relevance',
        'sort': 'year',                                 # 排序方式：year/cited/relevance
        'is_filter': False,                             # 是否先过滤后排序，建议True，提升检索效率
        'is_rescore': False,                            # 是否采用重排序，依据relevance排序时建议True，增强排序效果
        'is_cited': False                               # 是否使用引用量参与排序，由于未爬取引用量字段，只能为False
    }
# 高级检索
    search_info_2 = {
        'query_type': 'advanced_search',
        'match': {
            'title': string,                            # 用户查询内容
            'abstract': string,                         # 若不含有某项，设置成 None/False
            'paperContent': string,
            'videoContent': string,
        },
        'filter': {
            'yearfrom': 1000,
            'yearbefore': 3000,
        },
        'sort': 'year',
        'is_filter': False,                             # 高级检索无论True还是False都不会进行过滤
        'is_rescore': False,
        'is_cited': False
    }
```



