Hunting for duplicates with Screaming Frog's Custom Extraction

Posted on 2024-3-7 12:08:48
Screaming Frog is known and loved for its versatility. One of its most useful functions is Custom Extraction, which extracts specific information from the pages of a site using a set of rules: Regex, CSS selectors, or XPath selectors.

On an e-commerce site we expect to find duplicated content, for example in the description on the product sheet. The typical case is filters and sorting options that are not canonicalized, so the same description is served on several URLs.

As an example, let's take a product sheet from a historic leather goods brand. Once we have identified the section containing the description, we can look at the underlying code with Chrome's Inspect tool.



In this way we discover that the Special Data description is inserted inside a p tag identified by the class named description. We expect the other pages to be built with the same logic, so all product descriptions should be identified by the same class.

We can build a CSS selector to delimit the portion of the page that we are going to extract. In our specific case the selector is simply .description.

New to CSS selectors? There are two ways forward: review a guide to CSS selectors, or let Google Chrome help you. Once inside "Inspect", click on the portion of code you are interested in and then right-click. Chrome lets you copy a ready-to-use CSS selector or XPath.

Now that we have our selector, let's set up the Custom Extraction (path: Configuration > Custom > Extraction) and then launch the site crawl. For an analysis of this type it is best to configure the crawler to respect noindex and canonical.

This is the setting used to extract all the product sheet descriptions for the e-commerce site in our example. We select the Extract Text option so that the data is easier to read.
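To see what the .description selector combined with Extract Text is expected to capture, here is a minimal Python sketch that uses only the standard library. It is not how Screaming Frog works internally. The sample HTML, the class name DescriptionExtractor, and the function extract_descriptions are hypothetical, and for simplicity the sketch only looks at p tags, while the real selector would match any element with that class.

```python
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Collect the visible text of every <p> whose class list contains 'description'."""

    def __init__(self):
        super().__init__()
        self.inside = False      # True while we are inside a matching <p>
        self.parts = []          # text fragments of the current description
        self.descriptions = []   # one cleaned string per matching <p>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "description" in classes:
            self.inside = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            # Collapse runs of whitespace, similar to a plain-text extraction.
            text = " ".join("".join(self.parts).split())
            self.descriptions.append(text)
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def extract_descriptions(html):
    """Return the text of all <p class="description"> elements in the HTML string."""
    parser = DescriptionExtractor()
    parser.feed(html)
    parser.close()
    return parser.descriptions


# Hypothetical product sheet markup, not taken from the site in the article.
SAMPLE_HTML = """
<div class="product">
  <h1>Leather bag</h1>
  <p class="description">Hand-stitched <strong>leather</strong> bag
     with brass buckles.</p>
  <p class="price">120.00</p>
</div>
"""

print(extract_descriptions(SAMPLE_HTML))
```

Only the paragraph carrying the description class is returned, with inline markup stripped; the price paragraph is ignored because its class does not match.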






Once the crawl is finished, we go to the Custom Extraction tab. Clicking the column header that carries the selector label (product description in our case) sorts the extracted text alphabetically. This way, blocks of similar or identical content stand out at a glance.

Once you have the data set, export it and load it into a spreadsheet such as Excel or Google Sheets. We recommend Excel because it lets you quickly highlight duplicates using conditional formatting rules. If you want to know more, please leave us a comment.

This function is also suitable for other strategic uses. For example, it can give you useful input and ideas for analyzing competitors: how do they write their product descriptions? Do they follow rigid patterns, or are they written naturally? We can also set up a custom extraction to collect product prices and analyze pricing.
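As an alternative to conditional formatting in a spreadsheet, the duplicate check can also be scripted. The sketch below is one possible approach, not part of Screaming Frog. It assumes the export is a CSV with a URL column named Address and a text column named product description 1; check the header row of your own export, since these names are assumptions. The sample rows, the function names, and the 0.9 similarity threshold are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Stand-in for the exported file; the column names are assumptions.
SAMPLE_EXPORT = """Address,product description 1
https://example.com/bag,Hand-stitched leather bag with brass buckles.
https://example.com/bag?sort=price,Hand-stitched leather bag with brass buckles.
https://example.com/wallet,Slim leather wallet with six card slots.
https://example.com/wallet-xl,Slim leather wallet with eight card slots.
"""


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())


def load_rows(handle, url_col="Address", text_col="product description 1"):
    """Read (url, normalized description) pairs, skipping rows with no description."""
    return [
        (row[url_col], normalize(row[text_col]))
        for row in csv.DictReader(handle)
        if row.get(text_col)
    ]


def exact_duplicates(rows):
    """Map each description used on more than one URL to the list of those URLs."""
    groups = defaultdict(list)
    for url, text in rows:
        groups[text].append(url)
    return {text: urls for text, urls in groups.items() if len(urls) > 1}


def near_duplicates(rows, threshold=0.9):
    """Return pairs of distinct descriptions whose similarity ratio reaches the threshold."""
    unique = sorted({text for _, text in rows})
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]


rows = load_rows(io.StringIO(SAMPLE_EXPORT))
for text, urls in exact_duplicates(rows).items():
    print("identical:", urls)
for a, b in near_duplicates(rows):
    print("similar:", a, "|", b)
```

To run it on a real export, replace io.StringIO(SAMPLE_EXPORT) with an opened file, for example open("custom_extraction.csv", newline="", encoding="utf-8"), where the file name is again hypothetical. The pairwise comparison grows quadratically with the number of distinct descriptions, so it suits small and medium catalogs.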
