Community Help

ntyler1 · ‎04-21-2020

Hello,

I am trying to fetch people's page views asynchronously, using a queue-based approach that I found on Stack Overflow.

import asyncio
from aiohttp import ClientSession, TCPConnector
async def get(session, url):
    headers = {
        'Authorization': 'Bearer KEY',
    }
    async with session.get(url, headers=headers) as response:
        json = await response.json()
        return json, response
async def process(session, url, q):
    try:
        try:
            views, response = await get(session, url)
            scode = response.status
            if scode == 404:
                return
        except Exception as e:
            print(e)
            return
        # Queue the next page if the Link header advertises one.
        try:
            await q.put(str(response.links["next"]["url"]))
        except KeyError:
            pass
        <do something with views>
    except Exception as e:
        print(e)
async def fetch_worker(session, q):
    while True:
        url = await q.get()
        try:
            await process(session, url, q)
        except Exception as e:
            print(e)
        finally:
            q.task_done()
async def d():
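    <code to query and put data into stdrows>
    url_queue = asyncio.Queue()
    tasks = []
    connector = TCPConnector(limit=500)
    async with ClientSession(connector=connector) as session:
        url = '<some base url>'
        for i in range(500):
            tasks.append(asyncio.create_task(fetch_worker(session, url_queue)))
        for row in stdrows:
            await url_queue.put(url.format(row[1]))
        # The workers loop forever, so wait for the queue to drain first,
        # then cancel them; gathering them before join() would never return.
        await url_queue.join()
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
asyncio.run(d())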

This does not appear to reach 500 tasks/sec. Is it even possible to get to this rate without knowing all the URLs ahead of time? I am hoping to fetch the next URL from whatever initial URL (or from its paginated URL) while I work with `views`. The bucket instantly drops to 699.8... and stays around there for the rest of the run. This matches up with what I see when I print the URL in process: it prints the initial, say, 24, then it slows down. There are definitely more than 500 generated URLs. If I use 2000 connections/tasks, the run still takes the same time.
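For reference, here is a minimal sketch of the pattern described above that runs without any network access. It is only an illustration under stated assumptions: `FAKE_PAGES`, `fake_get`, `worker`, and `crawl` are hypothetical names, and `fake_get` stands in for the real `session.get` call by returning a payload plus the next page's URL (or `None` on the last page).

```python
import asyncio

# Hypothetical stand-in for the API: each "URL" maps to (views, next URL or None).
FAKE_PAGES = {
    "user/1?page=1": (["a", "b"], "user/1?page=2"),
    "user/1?page=2": (["c"], None),
    "user/2?page=1": (["d"], None),
}

async def fake_get(url):
    """Simulated request; a real version would await session.get(url)."""
    await asyncio.sleep(0.01)  # pretend network latency
    return FAKE_PAGES[url]

async def worker(q, results):
    while True:
        url = await q.get()
        try:
            views, next_url = await fake_get(url)
            if next_url is not None:
                # Queue the next page before processing this one,
                # so an idle worker can start fetching it right away.
                q.put_nowait(next_url)
            results.extend(views)  # placeholder for "do something with views"
        except Exception as e:
            print(e)
        finally:
            q.task_done()

async def crawl(start_urls, n_workers=4):
    q = asyncio.Queue()
    results = []
    for url in start_urls:
        q.put_nowait(url)
    workers = [asyncio.create_task(worker(q, results)) for _ in range(n_workers)]
    # join() returns once every queued URL, including pages queued later, is done.
    await q.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

results = asyncio.run(crawl(["user/1?page=1", "user/2?page=1"]))
```

One thing the sketch makes visible: page 2 of a chain cannot be requested until page 1 has responded, so the number of requests in flight is bounded by the number of independent pagination chains, not by the number of workers. That may be part of why adding more tasks does not change the run time, although server-side throttling could also play a role.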

What is the best way of handling paginated pages with asyncio?
