@armand1m/papercut

Type aliases

ScrapeResultType

ScrapeResultType<T, B>: B extends true ? { [Prop in keyof T]: Awaited<ReturnType<T[Prop]>> } : { [Prop in keyof T]?: Awaited<ReturnType<T[Prop]>> }

Type parameters

T: SelectorMap
B: boolean
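To illustrate how B changes the result shape, here is a self-contained sketch. It re-declares the aliases locally and replaces SelectorUtilities with `unknown` (an assumption) so it runs without papercut installed. The `selectors` object is a hypothetical example.

```typescript
// Local stand-ins for the documented aliases. SelectorUtilities is replaced
// by `unknown` here (an assumption) so the sketch has no dependencies.
type SelectorFunction = (utils: unknown, self: SelectorMap) => any;
type SelectorMap = Record<string, SelectorFunction>;

type ScrapeResultType<T extends SelectorMap, B extends boolean> = B extends true
  ? { [Prop in keyof T]: Awaited<ReturnType<T[Prop]>> }
  : { [Prop in keyof T]?: Awaited<ReturnType<T[Prop]>> };

// Hypothetical selectors: one synchronous, one asynchronous.
const selectors = {
  title: (_utils: unknown) => "Hello",
  score: async (_utils: unknown) => 42,
};

// Strict mode: every key is required and promise results are unwrapped.
type StrictResult = ScrapeResultType<typeof selectors, true>; // { title: string; score: number }
// Non-strict mode: every key becomes optional.
type LooseResult = ScrapeResultType<typeof selectors, false>; // { title?: string; score?: number }

const strict: StrictResult = { title: "Hello", score: 42 };
const loose: LooseResult = { title: "Hello" }; // `score` may be omitted

console.log(strict.score, loose.score); // 42 undefined
```

Note that `score` is typed as `number`, not `Promise<number>`, because Awaited unwraps the selector's return type.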

Scraper

Scraper: ReturnType<typeof createScraper>

SelectorFunction

SelectorFunction: (utils: SelectorUtilities, self: SelectorMap) => any

Type declaration

- (utils: SelectorUtilities, self: SelectorMap): any
- Function to be used when scraping the target node for specific data.
  
  Parameters
  - utils: SelectorUtilities
  - self: SelectorMap
  Returns any

SelectorMap

SelectorMap: Record<string, SelectorFunction>

Map of selector functions.

This type is meant to be used as a constraint: users implement their own selector maps that extend it when building custom scrapers.
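A derived selector map might look like the sketch below. UtilsStub models only `text` and `attr`, `defineSelectors` is a hypothetical identity helper, and the CSS selectors describe an imaginary product page; none of these names come from papercut itself. Each selector would also receive `self` as a second argument, which this sketch does not use.

```typescript
// Hypothetical stand-in for SelectorUtilities: only `text` and `attr` are modelled.
interface UtilsStub {
  text: (selector: string) => string;
  attr: (selector: string, attribute: string) => string;
}

type SelectorFunction = (utils: UtilsStub, self: SelectorMap) => any;
type SelectorMap = Record<string, SelectorFunction>;

// Hypothetical identity helper: checks the map against SelectorMap while
// keeping the precise type of each selector for later inference.
function defineSelectors<T extends SelectorMap>(map: T): T {
  return map;
}

// A derived selector map for a hypothetical product listing page.
const productSelectors = defineSelectors({
  name: ({ text }: UtilsStub) => text(".product-name"),
  price: ({ text }: UtilsStub) => Number(text(".price").replace(/[^0-9.]/g, "")),
  image: ({ attr }: UtilsStub) => attr("img.thumbnail", "src"),
});

// Fake utilities returning canned values, standing in for a real DOM node.
const fakeUtils: UtilsStub = {
  text: (selector) => (selector === ".price" ? "$19.99" : "Desk lamp"),
  attr: (_selector, attribute) => (attribute === "src" ? "/lamp.png" : ""),
};

const productName = productSelectors.name(fakeUtils);
const price = productSelectors.price(fakeUtils);
const image = productSelectors.image(fakeUtils);
console.log(productName, price, image); // Desk lamp 19.99 /lamp.png
```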

SelectorUtilities

SelectorUtilities: ReturnType<typeof createSelectorUtilities>

Functions

Const createRunner

createRunner(props: CreateRunnerProps): <T, B>(props: RunProps<T, B>) => Promise<ScrapeResultType<T, B>[]>

- Defined in scraper/createRunner.ts:129
Creates a runner instance.

This function is called by createScraper, but it can also be used directly if you need to supply your own pino logger or want full control over the scraper options.

Parameters
- props: CreateRunnerProps
  
  The runner logger and options.
Returns <T, B>(props: RunProps<T, B>) => Promise<ScrapeResultType<T, B>[]>
- - <T, B>(props: RunProps<T, B>): Promise<ScrapeResultType<T, B>[]>
  - The scraper runner.
    
    When executed, it fetches the base URL and builds a JSDOM instance from the received HTML payload, making a virtual window and document available for scraping.
    
    Once these are ready, the scraper spawns promise pools to run the more intensive tasks, such as pagination, node scraping and selector scraping, in parallel.
    
    All of this behavior depends on the options given when the scraper struct was created.
    
    Type parameters
    - T: SelectorMap
      
      A mapped type based on the given selectors.
    - B: boolean
      
      The strict mode boolean type. Used to tweak the scrape result type strictness.
    Parameters
    - props: RunProps<T, B>
      
      The scraping runner properties and selectors.
    Returns Promise<ScrapeResultType<T, B>[]>
    Type-safe scraping results based on the given selectors and strict mode.
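The runner's use of promise pools can be pictured with a minimal, dependency-free pool. This is a sketch of the general technique, assuming work is expressed as async thunks; `runPool` is a hypothetical name and not papercut's internal implementation.

```typescript
// A minimal promise pool (hypothetical helper, not papercut's internals):
// runs async tasks with at most `limit` in flight and keeps result order.
async function runPool<R>(tasks: Array<() => Promise<R>>, limit: number): Promise<R[]> {
  const results: R[] = new Array<R>(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unclaimed task until none are left.
  // Claiming happens synchronously, so two workers never take the same index.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]!();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: five simulated page fetches, at most two at a time.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

let inFlight = 0;
let peak = 0;
const pageTasks = [1, 2, 3, 4, 5].map((page) => async () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await delay(10);
  inFlight--;
  return `page ${page}`;
});

const demo = runPool(pageTasks, 2).then((pages) => {
  console.log(pages, "peak concurrency:", peak);
  return pages;
});
```

A bounded pool rather than a single Promise.all keeps the number of simultaneous requests under control, which matters when pagination can produce many pages.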

Const createScraper

createScraper(props: ScraperProps): { run: <T, B>(props: RunProps<T, B>) => Promise<ScrapeResultType<T, B>[]> }

- Defined in scraper/createScraper.ts:81
Creates a new scraper runner.

This function is papercut's entrypoint. It creates a Scraper struct containing a runner that you can tweak as needed.

The runner abides by the settings given when this object is created.

This function also creates a pino logger and embeds it within the runner.

If you prefer to manage the logger yourself, use createRunner instead.

Parameters
- props: ScraperProps
Returns { run: <T, B>(props: RunProps<T, B>) => Promise<ScrapeResultType<T, B>[]> }
- run: <T, B>(props: RunProps<T, B>) => Promise<ScrapeResultType<T, B>[]>
  - - <T, B>(props: RunProps<T, B>): Promise<ScrapeResultType<T, B>[]>
    - The scraper runner.
      
      When executed, it fetches the base URL and builds a JSDOM instance from the received HTML payload, making a virtual window and document available for scraping.
      
      Once these are ready, the scraper spawns promise pools to run the more intensive tasks, such as pagination, node scraping and selector scraping, in parallel.
      
      All of this behavior depends on the options given when the scraper struct was created.
      
      Type parameters
      - T: SelectorMap
        
        A mapped type based on the given selectors.
      - B: boolean
        
        The strict mode boolean type. Used to tweak the scrape result type strictness.
      Parameters
      - props: RunProps<T, B>
        
        The scraping runner properties and selectors.
      Returns Promise<ScrapeResultType<T, B>[]>
      Type-safe scraping results based on the given selectors and strict mode.
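The split between createScraper, which creates its own logger, and createRunner, which takes a logger from the caller, can be sketched as plain dependency injection. The shapes below are hypothetical simplifications: papercut's real CreateRunnerProps and ScraperProps are not reproduced, and a console-based logger stands in for pino.

```typescript
// Hypothetical, simplified shapes: papercut's real CreateRunnerProps and
// ScraperProps are not reproduced here, and console stands in for pino.
interface Logger {
  info: (message: string) => void;
}

interface RunnerOptions {
  concurrency: number;
}

// Runner factory: the caller supplies the logger (the createRunner role).
const createRunnerSketch = (props: { logger: Logger; options: RunnerOptions }) =>
  async (baseUrl: string): Promise<string[]> => {
    props.logger.info(`scraping ${baseUrl} with concurrency ${props.options.concurrency}`);
    return []; // a real runner would return the scraped results here
  };

// Scraper factory: builds its own logger, then delegates (the createScraper role).
const createScraperSketch = (props: { name: string; options: RunnerOptions }) => {
  const logger: Logger = {
    info: (message) => console.log(`[${props.name}] ${message}`),
  };
  return { run: createRunnerSketch({ logger, options: props.options }) };
};

// Managed logger: output goes to the console with a name prefix.
const managed = createScraperSketch({ name: "demo", options: { concurrency: 4 } });

// Caller-owned logger: messages are captured in memory instead.
const messages: string[] = [];
const customRun = createRunnerSketch({
  logger: { info: (message) => { messages.push(message); } },
  options: { concurrency: 2 },
});

const demo = Promise.all([managed.run("https://example.com"), customRun("https://example.org")]);
```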

Const createSelectorUtilities

createSelectorUtilities(element: Element): { all: (selector: string) => { asArray: Element[]; nodes: NodeListOf<Element> }; attr: (selector: string, attribute: string) => string; className: (selector: string) => string; createWindow: (htmlContent: string) => { close: () => void; document: Document; window: DOMWindow }; element: Element; fetchPage: (url: string) => Promise<string>; geosearch: (q: string, limit?: number) => Promise<GeosearchResult>; href: (selector: string) => string; mapNodeListToArray: (nodeList: NodeList) => Element[]; src: (selector: string) => string; text: (selector: string) => string }

- Defined in selectors/createSelectorUtilities.ts:25
Creates the selector utilities that are passed to every selector function given to the scrape method.

These utilities are meant to make using papercut a bit more pleasant. They are currently not extendable, although in theory one could extend them by wrapping them in higher-order functions.

Almost all of these methods fall back to an empty string when they cannot find the element or the requested property.

You also have direct access to the element from selector functions if you need it for more complex tasks.

Parameters
- element: Element
Returns { all: (selector: string) => { asArray: Element[]; nodes: NodeListOf<Element> }; attr: (selector: string, attribute: string) => string; className: (selector: string) => string; createWindow: (htmlContent: string) => { close: () => void; document: Document; window: DOMWindow }; element: Element; fetchPage: (url: string) => Promise<string>; geosearch: (q: string, limit?: number) => Promise<GeosearchResult>; href: (selector: string) => string; mapNodeListToArray: (nodeList: NodeList) => Element[]; src: (selector: string) => string; text: (selector: string) => string }
- all: (selector: string) => { asArray: Element[]; nodes: NodeListOf<Element> }
  - - (selector: string): { asArray: Element[]; nodes: NodeListOf<Element> }
    - Parameters
      - selector: string
      Returns { asArray: Element[]; nodes: NodeListOf<Element> }
      - asArray: Element[]
      - nodes: NodeListOf<Element>
- attr: (selector: string, attribute: string) => string
  - - (selector: string, attribute: string): string
    - Parameters
      - selector: string
      - attribute: string
      Returns string
- className: (selector: string) => string
  - - (selector: string): string
    - Parameters
      - selector: string
      Returns string
- createWindow: (htmlContent: string) => { close: () => void; document: Document; window: DOMWindow }
  - - (htmlContent: string): { close: () => void; document: Document; window: DOMWindow }
    - Parameters
      - htmlContent: string
      Returns { close: () => void; document: Document; window: DOMWindow }
      - close: () => void
        - - (): void
          - Returns void
      - document: Document
      - window: DOMWindow
- element: Element
- fetchPage: (url: string) => Promise<string>
  - - (url: string): Promise<string>
    - Parameters
      - url: string
      Returns Promise<string>
- geosearch: (q: string, limit?: number) => Promise<GeosearchResult>
  - - (q: string, limit?: number): Promise<GeosearchResult>
    - Parameters
      - q: string
      - limit: number = 1
      Returns Promise<GeosearchResult>
- href: (selector: string) => string
  - - (selector: string): string
    - Parameters
      - selector: string
      Returns string
- mapNodeListToArray: (nodeList: NodeList) => Element[]
  - - (nodeList: NodeList): Element[]
    - Parameters
      - nodeList: NodeList
      Returns Element[]
- src: (selector: string) => string
  - - (selector: string): string
    - Parameters
      - selector: string
      Returns string
- text: (selector: string) => string
  - - (selector: string): string
    - Parameters
      - selector: string
      Returns string
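The empty-string fallback can be sketched with optional chaining and nullish coalescing. The structural NodeLike and QueryRoot types and `createUtilsSketch` are hypothetical stand-ins so the example runs without jsdom; papercut's real utilities operate on a DOM Element and may differ in detail.

```typescript
// Structural stand-ins for DOM types so the sketch runs without jsdom.
interface NodeLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

interface QueryRoot {
  querySelector(selector: string): NodeLike | null;
}

// Hypothetical helpers: a missing element or attribute yields "" instead of
// throwing, mirroring the fallback described above.
const createUtilsSketch = (element: QueryRoot) => ({
  element,
  text: (selector: string): string =>
    element.querySelector(selector)?.textContent ?? "",
  attr: (selector: string, attribute: string): string =>
    element.querySelector(selector)?.getAttribute(attribute) ?? "",
});

// A fake root element that only contains an <h1 id="main">Title</h1>.
const fakeRoot: QueryRoot = {
  querySelector: (selector) =>
    selector === "h1"
      ? { textContent: "Title", getAttribute: (name) => (name === "id" ? "main" : null) }
      : null,
};

const utils = createUtilsSketch(fakeRoot);
const values = [utils.text("h1"), utils.attr("h1", "id"), utils.text(".missing"), utils.attr("h1", "data-x")];
console.log(JSON.stringify(values)); // ["Title","main","",""]
```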

Const createWindow

createWindow(htmlContent: string): { close: () => void; document: Document; window: DOMWindow }

- Defined in utilities/createWindow.ts:3
Parameters
- htmlContent: string
Returns { close: () => void; document: Document; window: DOMWindow }
- close: () => void
  - - (): void
    - Returns void
- document: Document
- window: DOMWindow

Const fetchPage

fetchPage(url: string): Promise<string>

- Defined in http/fetchPage.ts:10
Parameters
- url: string
Returns Promise<string>

Const geosearch

geosearch(q: string, limit?: number): Promise<GeosearchResult>

- Defined in http/geosearch.ts:30
Parameters
- q: string
- limit: number = 1
Returns Promise<GeosearchResult>

scrape

scrape<T, B>(props: ScrapeProps<T, B>): Promise<ScrapeResultType<T, B>[]>

- Defined in scraper/scrape.ts:53
The scrape function.

This function selects all target nodes from the given document and spawns promise pools to trigger selector scraping.

It is used by the papercut runner with its managed JSDOM instances.

If you want more control over JSDOM but still want to leverage papercut, you can call this function directly instead of using createScraper or createRunner.

Type parameters
- T: SelectorMap
  
  A mapped type based on the given selectors.
- B: boolean
  
  The strict mode boolean type. Used to tweak the scrape result type strictness.
Parameters
- props: ScrapeProps<T, B>
  
  The scraping properties and selectors.
Returns Promise<ScrapeResultType<T, B>[]>
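The flow described for scrape (select the target nodes, then run every selector for each node concurrently) can be sketched without a DOM. Here each target node is already reduced to a hypothetical NodeUtils object, and `scrapeNodesSketch` uses plain Promise.all rather than a bounded pool; it illustrates the idea and is not papercut's scrape implementation.

```typescript
// Hypothetical per-node utilities: only `text` is modelled here.
type NodeUtils = { text: (selector: string) => string };
type Selector = (utils: NodeUtils) => unknown;

// One result row per node, keyed by selector name with promises unwrapped.
type ScrapedRow<T extends Record<string, Selector>> = {
  [K in keyof T]: Awaited<ReturnType<T[K]>>;
};

async function scrapeNodesSketch<T extends Record<string, Selector>>(
  nodes: NodeUtils[],
  selectors: T,
): Promise<Array<ScrapedRow<T>>> {
  const map: Record<string, Selector> = selectors;
  const keys = Object.keys(map);

  // Nodes are processed concurrently, and so are the selectors of each node.
  const rows = await Promise.all(
    nodes.map(async (utils) => {
      const values = await Promise.all(keys.map((key) => map[key](utils)));
      const row: Record<string, unknown> = {};
      keys.forEach((key, i) => {
        row[key] = values[i];
      });
      return row;
    }),
  );

  // Each row matches ScrapedRow<T> by construction, but TypeScript cannot
  // prove that for a generic T, hence the cast.
  return rows as unknown as Array<ScrapedRow<T>>;
}

// Usage: two fake target nodes whose ".title" text is "Alpha" and "Beta".
const nodes: NodeUtils[] = ["Alpha", "Beta"].map((title) => ({
  text: (selector: string) => (selector === ".title" ? title : ""),
}));

const demo = scrapeNodesSketch(nodes, {
  title: ({ text }: NodeUtils) => text(".title"),
  length: async ({ text }: NodeUtils) => text(".title").length,
});

demo.then((rows) => console.log(rows));
```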
