What is the best way of handling paginated pages with asyncio?

ntyler1
Community Explorer

Hello,

I am trying to fetch people's page views asynchronously, using a queue-based approach I found on Stack Overflow.

import asyncio
from aiohttp import ClientSession, TCPConnector

async def get(session, url):
    headers = {
        'Authorization': 'Bearer KEY',
    }
    async with session.get(url, headers=headers) as response:
        data = await response.json()
        return data, response

async def process(session, url, q):
    try:
        try:
            views, response = await get(session, url)
            scode = response.status
            if scode == 404:
                return
        except Exception as e:
            print(e)
            return
        try:
            # Follow pagination: queue the "next" page from the Link header, if any.
            await q.put(str(response.links["next"]["url"]))
        except KeyError:
            # Last page: no "next" link.
            pass

        # <do something with views>
    except Exception as e:
        print(e)

async def fetch_worker(session, q):
    # Each worker keeps pulling URLs off the queue; process() may put
    # follow-up (paginated) URLs back onto it.
    while True:
        url = await q.get()
        try:
            await process(session, url, q)
        except Exception as e:
            print(e)
        finally:
            q.task_done()

async def d():
    # <code to query and put data into stdrows>
    url_queue = asyncio.Queue()
    tasks = []
    connector = TCPConnector(limit=500)
    async with ClientSession(connector=connector) as session:
        url = '<some base url>'

        # Start 500 worker tasks that consume from the queue.
        for i in range(500):
            tasks.append(asyncio.create_task(fetch_worker(session, url_queue)))

        # Seed the queue with the initial (non-paginated) URLs.
        for row in stdrows:
            await url_queue.put(url.format(row[1]))

        await asyncio.gather(*tasks)
        await url_queue.join()

asyncio.run(d())

This doesn't appear to be running at anywhere near 500 tasks/sec. Is it even possible to get to that rate without knowing all of the URLs ahead of time? What I'm hoping for is to fetch the next URL from each initial URL (or from its paginated URL) while I work with `views`.

The bucket instantly drops to 699.8... and stays around there for the rest of the run. This matches up with what I see when I print the URL in process: it prints the first 24 or so, then it slows down. There are definitely more than 500 generated URLs, and if I bump it up to 2000 connections/tasks, the total run time is the same.
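For context, here is a stripped-down, self-contained sketch of the pattern I'm aiming for. The base URL, the Bearer KEY header, the worker count, and the user IDs are all made-up placeholders, and in this version I join the queue and then cancel the workers so the script can exit cleanly.

import asyncio
from aiohttp import ClientSession, TCPConnector

BASE_URL = "https://example.com/api/v1/users/{}/page_views"  # made-up endpoint
WORKERS = 500  # placeholder worker/connection count

async def worker(session, q):
    # Pull a URL, fetch it, queue the "next" page from the Link header,
    # then do something with the body.
    while True:
        url = await q.get()
        try:
            async with session.get(url, headers={"Authorization": "Bearer KEY"}) as response:
                if response.status == 200:
                    views = await response.json()
                    next_link = response.links.get("next")
                    if next_link is not None:
                        await q.put(str(next_link["url"]))
                    # ... work with `views` here ...
        except Exception as e:
            print(e)
        finally:
            q.task_done()

async def main(user_ids):
    q = asyncio.Queue()
    async with ClientSession(connector=TCPConnector(limit=WORKERS)) as session:
        workers = [asyncio.create_task(worker(session, q)) for _ in range(WORKERS)]
        for user_id in user_ids:
            await q.put(BASE_URL.format(user_id))
        await q.join()        # wait until every queued URL (including "next" pages) is done
        for w in workers:     # then shut the workers down
            w.cancel()
        await asyncio.gather(*workers, return_exceptions=True)

asyncio.run(main(range(24)))  # placeholder IDs, just to show the shape

The idea is that a worker re-queues the next page as soon as the Link header comes back, so another worker can pick it up while `views` is still being processed.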
