This update contains an important bugfix to handle malformed UTF-8 in user agent strings.
User agents containing malformed UTF-8 are now ignored, which avoids errors when sending updates via the API. These user agents are invalid, so there is no point analysing them further and they are silently discarded.
This new version also contains an additional CLI tool for importing user agents from a text file, for testing purposes.
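As a rough illustration of the malformed UTF-8 handling described above, the sketch below filters a list of raw user agent byte strings and keeps only those that decode cleanly. It is written in Python for brevity and is not the addon's PHP code; the function names `is_valid_utf8` and `filter_user_agents` and the sample agents are made up for this example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict decoding raises on malformed sequences
    except UnicodeDecodeError:
        return False
    return True


def filter_user_agents(raw_agents):
    """Keep only user agents that are valid UTF-8; silently drop the rest."""
    return [ua.decode("utf-8") for ua in raw_agents if is_valid_utf8(ua)]


# Hypothetical sample data, not taken from the addon.
agents = [
    b"Mozilla/5.0 (compatible; ExampleBot/1.0)",
    b"Broken\xff\xfeAgent",            # 0xFF never appears in valid UTF-8
    "Søkemotor/2.0".encode("utf-8"),   # valid multi-byte sequence
]
print(filter_user_agents(agents))
```

Dropping the invalid entries before any further processing means nothing downstream, such as building an API request body, ever sees bytes it cannot encode.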
Minor bug fix - no need to update unless you are experiencing problems sending bot updates.
- when sending bots via email, include the bot list as an attachment rather than in the body
- new CLI tool to send bots via email directly, used for debugging bot sending issues
- new CLI tool known-bots:parse to parse web server log files and display detected bots
- new CLI tool known-bots:send to send newly detected user agents to the KnownBots API for analysis
- new CLI tool known-bots:check-token to validate that the API token successfully authenticates - and optionally have the system regenerate a new API token if it has expired
- knownbots@hampel.io email address is deprecated and will be removed soon - emails should no longer be sent to this address
- new configuration option "Send user agents via API", which requires a XenForo license validation token to be entered. New agents are now sent directly via the API rather than by email
- the "Email user agents" option remains - but is used only for forum administrators to send themselves emails if they choose. Upgrading to v6 of the addon removes any reference to knownbots@hampel.io from this configuration option.
- addon now uses v3 of the bot fetch API, which includes new functionality
- v2 of the bot fetch API remains operational for sites still using addon v5.x
- v1 of the bot fetch API is now deprecated and will soon stop functioning - sites still using addon v4.x should upgrade as soon as possible
- new functionality for the addon - a list of regex-based ignore strings to remove malformed or obfuscated user agents from analysis. This also allows us to ignore user agents containing SQL injection and other forms of attack, which typically flood a system with a large number of unique user agents in a short period of time.
- performance enhancement - we no longer do browser or ignored checks for user agents of users who are logged in. We assume that anyone logged in with a valid XenForo user id is using a valid browser. Note that bot detection is still run, just in case. This significantly reduces the amount of processing performed by the addon for valid users.
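The two processing changes above (regex-based ignore strings, and skipping browser and ignored checks for logged-in users) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the addon's PHP implementation: the `classify` function, its return labels, the example patterns and the order of the ignore and browser checks are all hypothetical.

```python
import re

# Hypothetical ignore patterns; the real list is maintained by the addon.
IGNORE_PATTERNS = [
    re.compile(r"union\s+select", re.IGNORECASE),  # SQL injection probe
    re.compile(r"[<>{}]"),                         # markup or template injection
]


def classify(user_agent, user_id, bot_strings, browser_strings):
    """Classify a user agent as 'bot', 'valid', 'ignored', 'browser' or 'unknown'."""
    ua = user_agent.lower()

    # Bot detection always runs, even for logged-in users.
    if any(needle in ua for needle in bot_strings):
        return "bot"

    # Logged-in users are assumed to be on a valid browser, so the
    # more expensive ignore and browser checks are skipped for them.
    if user_id > 0:
        return "valid"

    # Discard malformed, obfuscated or attack-style user agents.
    if any(pattern.search(user_agent) for pattern in IGNORE_PATTERNS):
        return "ignored"

    if any(needle in ua for needle in browser_strings):
        return "browser"

    return "unknown"  # candidate for further analysis


bots = ["examplebot"]
browsers = ["firefox/"]
print(classify("x' UNION SELECT 1--", 0, bots, browsers))
print(classify("x' UNION SELECT 1--", 42, bots, browsers))
```

The early return for logged-in users is where the saving comes from: most traffic from members never reaches the regex checks.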
v5.0.0 is a major rewrite of the core functionality of this addon aimed at improving processing speed, bot detection sophistication and greatly enhancing our ability to identify new bots.
Note that the options have changed - so please check the options after upgrading. More information about each option is provided on the main addon page.
- major rewrite - no longer use "bot|spider|crawl" search strings and false-positive lists to identify possible bots; instead, rely on search strings supplied by the API to identify valid browsers, and store them directly in the database rather than the SimpleCache, ready for emailing
- more complete agent reprocessing - check for valid browsers and ignored agents
- change the core userAgentMatchesRobot function to use strpos instead of preg_match; it is much faster and won't fall over with extremely high numbers of bot match strings
- allow BotFetcher to be manually configured to bypass untrusted http agent - used for testing when API source is on a .local domain. Default action remains to use the untrusted http agent to allow for proxying outbound API calls.
- change email cron to daily send
- using new v2 API from KnownBots
- replace generic bots with complex (regex) based searches
- add "Fetch new bots" button to Known Bots List in admin UI
- automatically reprocess user agents after loading new bot data
- new CLI command for reprocessing user agents, including the option to force all user agents to be reprocessed
- improvements to user agent test in admin UI to be more descriptive
- BCC additional email addresses to keep them private
- bugfix: don't linkify known bot list when no links supplied
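The switch from preg_match to strpos in userAgentMatchesRobot, mentioned in the list above, amounts to replacing one large regular expression with a loop of plain substring tests. The sketch below shows the idea in Python, where the `in` operator plays the role of PHP's strpos. The function name, its signature and the case-folding step are assumptions for illustration, not the addon's actual code.

```python
def user_agent_matches_robot(user_agent, bot_strings):
    """Return the first bot match string found in the user agent, or None.

    Each test is a plain substring search, so the cost grows linearly
    with the number of match strings and no pattern has to be compiled.
    """
    ua = user_agent.lower()  # assumption: match strings are stored in lower case
    for needle in bot_strings:
        if needle in ua:
            return needle
    return None


# Match strings taken from the bot lists in this changelog.
known = ["facebot", "siteauditbot", "seekbot"]
print(user_agent_matches_robot("Mozilla/5.0 (compatible; SiteAuditBot/0.97)", known))
print(user_agent_matches_robot("Mozilla/5.0 Gecko/20100101 Firefox/115.0", known))
```

A loop like this also sidesteps the need to escape match strings for use inside a pattern, since every string is treated literally.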
This release includes additional sanity checks to prevent bad data returned from the API from breaking the forums.
If any of the data returned by the API is not in the exact format we expect, the entire download is discarded and no changes applied to the forum. An error message will be logged prompting further investigation.
After upgrading to 4.0.1, you should manually force a fetch of new API data by executing the following command from your forum root:
php cmd.php known-bots:fetch -f
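The all-or-nothing sanity check described for this release can be sketched as below: if any part of the downloaded data fails validation, the whole download is rejected, an error is logged, and the existing bot data is left unchanged. This is a Python illustration only. The payload shape (a JSON object mapping match strings to bot titles), the function names and the logger name are assumptions, since the real API format is not described here.

```python
import json
import logging

logger = logging.getLogger("knownbots_sketch")  # hypothetical logger name


def parse_bot_payload(raw_json: str):
    """Return a validated {match_string: title} dict, or None if anything is off."""
    try:
        data = json.loads(raw_json)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        logger.error("Bot data is not valid JSON: %s", exc)
        return None

    if not isinstance(data, dict) or not data:
        logger.error("Bot data must be a non-empty JSON object")
        return None

    for match_string, title in data.items():
        if not match_string.strip() or not isinstance(title, str) or not title.strip():
            logger.error("Unexpected entry for %r; discarding entire download", match_string)
            return None

    return data


def apply_update(current_bots, raw_json):
    """Replace the bot data only when the whole download validates."""
    new_bots = parse_bot_payload(raw_json)
    return current_bots if new_bots is None else new_bots


current = {"btbot": "BT Bot"}
print(apply_update(current, '{"catchbot": "Catchbot"}'))  # accepted
print(apply_update(current, '{"catchbot": 42}'))          # rejected, current data kept
```

Rejecting the whole download, rather than skipping individual bad entries, keeps the stored bot data in a state that is known to have validated in full.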
KnownBots v4 is a completely new build - bots are no longer hard coded, but are updated via API calls, and the XF code cache is used to store bot data
- raw bot data downloaded from API is stored in internal_data/knownbots.json
- new CLI tool for manually fetching bots from API (Cron task is also provided)
- new CLI tool for manually loading bots from knownbots.json
- new CLI tool for testing user agent matches
new bots discovered in June 2021
- btbot: BT Bot
- catchbot: Catchbot
- comodospider: Comodo SSL Spider
- deepnoc: deepnoc bot (network optimized crawling)
- dispenserbot: Dispenser Dab Solver Checklinks Bot
- epicbot: Epictions EpicBot
- esperanzabot: EsperanzaBot
- fast enterprise crawler: Fast enterprise crawler 6 used by Schibsted
- fleabot: Mercadopar Fleabot
- fuseonbot: Fuseon Link Affinity Bot
- google/bot: google/bot
- greenflare seo crawler: Greenflare SEO Crawler
- gsitecrawler: GSiteCrawler
- holmes: Morfeo Holmes Bot
- infotigerbot: InfoTiger Search Engine Bot
- internet security survey bot: Internet Security Survey Bot
- jetpack-bot: JetPack Bot
- mojoo robot: Mojoo Bot
- nicecrawler: NiceCrawler
- prem.moe crawler: Prem.moe Crawler
- sayindexbot: SayIndex Bot
- sbl-bot: SoftByte Labs Bot
- siteliner: Siteliner Bot
- swjschecketbot: Swjschecketbot
- trade desk ads.txt & sellers.json crawler: Trade Desk ads.txt & sellers.json crawler
- vortex: Marty Anstey Vortex Bot
- www.hlabs.co.ke: hLabs Bot
- zspider: Red Kolibri ZSpider
new false positives:
new bots:
- cubot; j5
- baiduboxapp
- 200pleasebot
- a8bot
- abilogicbot
- acoonbot
- adform robot
- arhpostbot
- atomseobot
- awariorendererbot
- badoobot
- bl.uk_ldfc_bot
- brobot
- charityengine bot
- charlotte
- cosmos
- coveobot
- crawlbot/1.0.0
- cxensebot
- facebot
- fandomopengraphbot
- freshpingbot
- fuelbot
- geedobot
- getlocalbot
- google-safety
- gpcsupbot
- grub-client
- gynxbot
- healrworld crawler
- hgfalphaxcrawl
- hoodle crawler
- idmarch automatic
- imrbot
- jambot
- justlocal.nl
- kantarsifomediaauditbot
- keobsbot
- keybasebot
- koepabot
- lanaibot
- landsbokasafn
- lapozzbot
- linkpulse metacrawler
- linksmanager.com_bot
- lxrbot
- mbot v
- moreoverbot
- netpeakspiderbot
- www.niraiya.com
- node/simplecrawler
- nu.marginalia.wmsa.edge-crawler
- nutchcvs
- oer commons bot
- omniexplorer_bot
- onefuncbot
- oozbot
- pickybot
- piepmatz bot
- plukkie
- pu_in crawler
- punkspider
- pwa-crawler
- reasonalbot
- revuebot
- runet-research-crawler
- screenerbot crawler
- searchenginecrawler
- sebot-wa
- seekbot
- shopwiki
- showyoubot
- siteauditbot
- sitescorebot
- spinn3r
- squirrobot
- ssl-crawler
- thinkbot
- tsmbot
- tweetedtimes bot
- ucrawl
- umichbot
- urlappendbot
- verticalleap-sitestatusbot
- webgraph
- weblinkchecker
- websquash.com
- wellknownbot
- wizenozebot