This table contains 42,500 pages, one page per row.
The file is stored as JSON Lines (JSONL, one JSON object per line) and compressed with Zstandard (zstd). Its columns are shown below with a sample of rows:
┌─host────────────────────┬─path────────────────────────────────────┬─query─┬─body─┬─latency_ms─┬─err─┐
│ trends.builtwith.com    │ /shop/Shopify-Email-Marketing/Lithuania │       │ ...  │        233 │     │
│ trends.builtwith.com    │ /cms/Subsplash/BRIC                     │       │ ...  │        133 │     │
│ spiderbites.nytimes.com │ /iht/1994/02/1856/2020/1969/1931/       │       │ ...  │        160 │     │
│ www.guoxuedashi.com     │ /hydcd/631579l.html                     │       │ ...  │        350 │     │
│ ja.m.wikipedia.org      │ /wiki/%E5%8F%A4%E8%B0%B7%E5%BE%B9       │       │ ...  │         28 │     │
│ www.jdsupra.com         │ /authors/joseph-blinick/cryptoassets/   │       │ ...  │         64 │     │
│ www.nieuwsblad.be       │ /cnt/dmf20131011_00786965               │       │ ...  │        614 │     │
│ en.m.wikipedia.org      │ /wiki/Rama_Prabha                       │       │ ...  │          9 │     │
│ spiderbites.nytimes.com │ /iht/1994/02/1994/1910/1902/1862/       │       │ ...  │         97 │     │
│ i.redd.it               │ /tnac3562s8cc1.gif                      │       │ ...  │        243 │     │
└─────────────────────────┴─────────────────────────────────────────┴───────┴──────┴────────────┴─────┘
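As a rough illustration of how these columns can be consumed once the file is decompressed, the sketch below parses JSONL lines with Python's standard `json` module and computes the median `latency_ms` for each `host`. It rests on assumptions the table does not confirm: that `latency_ms` is numeric and that an empty (or missing) `err` means the page was fetched successfully. `latency_by_host` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from statistics import median
from typing import Iterable


def latency_by_host(lines: Iterable[str]) -> dict[str, float]:
    """Median latency_ms per host, over rows whose err field is empty.

    Hypothetical helper. Assumes each line is one JSON object with the
    host, latency_ms and err keys shown in the table above.
    """
    samples: dict[str, list[float]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        row = json.loads(line)
        if row.get("err"):
            continue  # assumption: a non-empty err marks a failed fetch
        samples[row["host"]].append(row["latency_ms"])
    # Hosts whose rows all failed never get an entry in samples
    return {host: median(values) for host, values in samples.items()}
```

Because the function accepts any iterable of strings, it can also be given a decompressed file opened in text mode, which yields one line per iteration.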
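To load the data into a pandas DataFrame:
```python
# Jupyter magic; from a terminal, run `pip install pandas zstandard` instead
%pip install pandas zstandard
import pandas as pd
# lines=True reads one JSON object per line; compression="zstd" needs zstandard
df = pd.read_json("~/data.jsonl.zstd", lines=True, compression="zstd")
```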
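If the whole table is too large to hold in memory comfortably, `pd.read_json` also accepts a `chunksize` argument together with `lines=True`, in which case it returns a reader that yields DataFrames of at most `chunksize` rows. The sketch below uses that to count pages per `host` incrementally. `count_rows_by_host` and its default `chunksize=10_000` are illustrative choices, not part of the dataset's documentation.

```python
from collections import Counter
from typing import Optional

import pandas as pd


def count_rows_by_host(
    path: str, chunksize: int = 10_000, compression: Optional[str] = "zstd"
) -> Counter:
    """Count pages per host without loading the whole file at once.

    Hypothetical helper; chunksize=10_000 is an arbitrary default.
    """
    counts: Counter = Counter()
    # With chunksize set, read_json returns a JsonReader that yields DataFrames
    with pd.read_json(
        path, lines=True, compression=compression, chunksize=chunksize
    ) as reader:
        for chunk in reader:
            # Add this chunk's per-host row counts to the running totals
            counts.update(chunk["host"].value_counts().to_dict())
    return counts
```

For example, `count_rows_by_host("~/data.jsonl.zstd").most_common(3)` would list the three hosts with the most pages.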