Regular expressions: reference https://deerchao.cn/tutorials/regex/regex.htm

Regular expression reference: https://deerchao.cn/tutorials/regex/regex.htm

Common operation: replacement with a function, using re.sub

```python
import re

inputStr = 'hello 234 world 567 额外rwe2121'

def _add111(matched):
    intStr = matched.group("number")
    intValue = int(intStr)
    addedValue = intValue + 111
    addedValueStr = str(addedValue)
    return addedValueStr

replacedStr = re.sub(r"(?P<number>\d+)", _add111, inputStr)
print(replacedStr)
```

```python
def replace(matched):
    group_str = matched.group("page"…
```
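The second snippet above is cut off, but the pattern it uses is the same callback style as `_add111`. A hypothetical, self-contained example of the technique (the `page=` parameter and `next_page` name are illustrative, not from the original post):

```python
import re

# Bump the number in a "page=N" query parameter with a replacement
# function: re.sub calls next_page once per match and substitutes
# whatever string it returns.
def next_page(matched):
    page = int(matched.group("page"))
    return "page=" + str(page + 1)

url = "https://example.com/list?page=3"
print(re.sub(r"page=(?P<page>\d+)", next_page, url))
# → https://example.com/list?page=4
```

The callback form is handy whenever the replacement has to be computed from the match, which a plain replacement string cannot do.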
Scrapy middleware source code: a walkthrough

Source walkthrough: MiddlewareManager and its subclasses (paths from a local Windows install):

- class MiddlewareManager: E:\python3.7.6\Lib\site-packages\scrapy\middleware.py
- class SpiderMiddlewareManager(MiddlewareManager): E:\python3.7.6\Lib\site-packages\scrapy\core\spidermw.py
- class DownloaderMiddlewareManager(MiddlewareManager): E:\python3.7.6\Lib\site-packages\scrapy\core\downloader\middleware.py
- class ExtensionManager(MiddlewareManager): E:\python3.7.6\Lib\site-packages\scrapy\extension.py
- class ItemPipelineManager(MiddlewareManager): E:\python3.7.6\Lib\site-packages\scrapy\pipelines\__init__.py

On how the middlewares get called: …
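The excerpt on how middlewares get called is truncated, but the core idea behind `MiddlewareManager` can be sketched in a few lines. This is a simplified, stdlib-only illustration of the call-chain pattern, not Scrapy's actual code: request-direction methods run in registration order, response-direction methods run in reverse.

```python
# Simplified sketch of the middleware call chain (hypothetical classes,
# not Scrapy's implementation).
class TinyMiddlewareManager:
    def __init__(self, *middlewares):
        self.middlewares = list(middlewares)

    def process_request(self, request):
        for mw in self.middlewares:            # forward order
            request = mw.process_request(request)
        return request

    def process_response(self, response):
        for mw in reversed(self.middlewares):  # reverse order
            response = mw.process_response(response)
        return response

class Tag:
    """Toy middleware that records its name as it is traversed."""
    def __init__(self, name):
        self.name = name
    def process_request(self, request):
        return request + [self.name]
    def process_response(self, response):
        return response + [self.name]

mgr = TinyMiddlewareManager(Tag("A"), Tag("B"))
print(mgr.process_request([]))   # ['A', 'B']
print(mgr.process_response([]))  # ['B', 'A']
```

This forward/reverse symmetry is why a middleware's priority number positions it relative to both the outgoing request and the incoming response.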
Scrapy debugging tips: scrapy fetch

scrapy fetch

```shell
scrapy fetch https://segmentfault.com/a/1190000017087999
scrapy fetch https://segmentfault.com/a/1190000017087999 --nolog --headers
```

scrapy shell with request headers:

```shell
scrapy shell -s USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0" https://www.zhihu.com/question/285908404
```

```shell
$ scrapy shell
>>> from scrapy import Request
>>> req = Request('yoururl.com', headers={"header1":"value1"})
>…
```
Scrapy extensions: EXTENSIONS

EXTENSIONS / EXTENSIONS_BASE

Note: on the execution order of Scrapy extensions.

View the default extensions:

```shell
scrapy settings --get EXTENSIONS_BASE
```

```json
{
    "scrapy.extensions.corestats.CoreStats": 0,
    "scrapy.extensions.telnet.TelnetConsole": 0,
    "scrapy.extensions.memusage.MemoryUsage": 0,
    "scrapy.extensions.memdebug.MemoryDebugger": 0,
    "scrapy.extensions.closespider.CloseSpider": 0,
    "scrapy.extensions.feedexport.FeedExporter": 0,
    "scrapy.extensions.logstats.Log…
```
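The numbers in EXTENSIONS_BASE are order values: Scrapy sorts enabled components ascending by them before building the manager. A stdlib-only sketch of that ordering, using two real BASE entries plus a hypothetical custom extension name:

```python
# "myproject.extensions.SpiderMonitor" is a made-up example entry; the
# other two come from EXTENSIONS_BASE above. Components with equal order
# values keep their insertion order (sorted() is stable).
EXTENSIONS = {
    "scrapy.extensions.corestats.CoreStats": 0,
    "scrapy.extensions.telnet.TelnetConsole": 0,
    "myproject.extensions.SpiderMonitor": 500,
}
ordered = sorted(EXTENSIONS, key=EXTENSIONS.get)
print(ordered)
# order-0 extensions come first, the order-500 one last
```

For extensions the order rarely matters (most defaults are 0); it becomes significant once an extension reacts to signals that another extension also emits or consumes.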
Scrapy downloader middleware

Downloader middleware: DOWNLOADER_MIDDLEWARES

Note: on the execution order of Scrapy downloader middleware.

Scrapy ships with a set of default middlewares, DOWNLOADER_MIDDLEWARES_BASE, viewable with:

```shell
scrapy settings --get DOWNLOADER_MIDDLEWARES_BASE
```

```json
{
    "scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware": 100,
    "scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware": 300,
    "scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware": 350,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddlewar…
```
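A project's DOWNLOADER_MIDDLEWARES setting is merged with the BASE dict: setting a middleware's value to None disables it, and the remaining entries run sorted by priority. A simplified stdlib sketch of that merge (the real logic lives in Scrapy's component-list building; "myproject.middlewares.ProxyMiddleware" is a hypothetical addition):

```python
# Two entries from DOWNLOADER_MIDDLEWARES_BASE above.
BASE = {
    "scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware": 100,
    "scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware": 300,
}
# What a project might put in DOWNLOADER_MIDDLEWARES.
USER = {
    "scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware": None,  # disable
    "myproject.middlewares.ProxyMiddleware": 350,                        # hypothetical
}

merged = {**BASE, **USER}                 # user settings override BASE
enabled = sorted(
    (name for name, prio in merged.items() if prio is not None),
    key=lambda name: merged[name],        # ascending priority
)
print(enabled)
# ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
#  'myproject.middlewares.ProxyMiddleware']
```

Lower numbers sit closer to the engine, so they see the request earlier and the response later.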
Scrapy spider middleware

Spider middleware: SPIDER_MIDDLEWARES

Note: on the execution order of Scrapy spider middleware.

View the default spider middlewares:

```shell
scrapy settings --get SPIDER_MIDDLEWARES_BASE
```

```json
{
    "scrapy.spidermiddlewares.httperror.HttpErrorMiddleware": 50,
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": 500,
    "scrapy.spidermiddlewares.referer.RefererMiddleware": 700,
    "scrapy.spidermiddlewares.urllength.UrlLengthMiddleware": 800,
    "scrapy.spidermiddlewares.depth.DepthMiddleware": 900
}
```

The SPIDER_MIDDLEWARES setting …
mitmproxy / mitmdump
mitmdump

Docs: https://docs.mitmproxy.org/stable/

Install: pip install mitmproxy

```shell
mitmdump -q -s inect_js.py -p 9999
```

- -q: suppress mitmdump's default console logging, showing only output from your own script
- -s: entry script file
- -p: change the listening port (default 8080)

Edits to the script file take effect without restarting mitmdump.

Events in the HTTP lifecycle:

- request: def request(self, flow: mitmproxy.http.HTTPFlow):
- response: def response(self, flow: mitmproxy.http.HTTPFlow):
- others:
  - def http_connect(self, flow: mitmproxy.http.HTTPFlow):
  - def requestheaders(self, flow: mitmproxy.http.HTTPFlow):
  - def responseheaders(self, flow: mitmproxy.http.HTTPFlow):
  - def err…
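The hooks above are usually packaged as methods on an addon class listed in `addons`. A minimal response-hook sketch in the shape mitmdump's `-s` scripts take; the flow object is duck-typed here so the snippet runs without mitmproxy installed (under mitmdump it would be a `mitmproxy.http.HTTPFlow`):

```python
# Hypothetical JS-injection addon sketch, similar in spirit to the
# inject-script use case the mitmdump command above suggests.
class InjectJS:
    SCRIPT = '<script>console.log("injected")</script>'

    def response(self, flow):
        # Only touch HTML responses; append the script before </body>.
        ctype = flow.response.headers.get("content-type", "")
        if "text/html" in ctype:
            flow.response.text = flow.response.text.replace(
                "</body>", self.SCRIPT + "</body>"
            )

addons = [InjectJS()]
```

Because mitmdump reloads the script on change, editing `SCRIPT` takes effect on the next intercepted response without restarting the proxy.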
Git notes: commit format

Common git commands:

- git mv file_from file_to (rename a file: removes the old path and stages the new one)
- git switch: dedicated to switching branches; replaces part of what checkout does
- git update-index --skip-worktree
- git branch -u origin/branch: set up tracking between the current branch and a remote branch
- git update-index --assume-unchanged
- git log --all --since "2021-03-01" --oneline --author="Zhang-Jane"
- git remote update origin --prune: refresh the local list of remote branches
- git branch -vv: show branch tracking relationships
- git push origin --delete: delete a remote branch
- git rev-list --all | xargs git grep -F <keyword>
- git ls-files: list the files under version control
- git clone -b <remote-branch> <repo-url>
- git blame: find who modified a file…
Frida hook scripts: hooking RegisterNatives

Hooking RegisterNatives

Command:

```shell
frida -U --no-pause -f package_name -l xx.js
```

```javascript
var ishook_libart = false;

function hook_libart() {
    if (ishook_libart === true) {
        return;
    }
    var symbols = Module.enumerateSymbolsSync("libart.so");
    var addrGetStringUTF…
```
Launching multiple mobile devices with Appium: element condition checks

Element condition checks

```python
from selenium.webdriver.support import expected_conditions as EC
```

expected_conditions provides 16 ways to check page elements:

1. title_is: whether the current page title exactly equals the expected string; returns a boolean
2. title_contains: whether the current page title contains the expected string; returns a boolean
3. presence_of_element_located: whether an element has been added to the DOM tree; does not imply the element is visible
4. visibility_of_element_located: whether an element is visible; visible means the element is not hidden and has non-zero width and height
5. visibility_of: same as the above, except it takes an already-located element instead of a locator
6. presence_of_all_elements_located: whether at least one matching element exists in the DOM; for example, if n elements on the page all have class 'coumn-md-3', this returns True as long as one of them exists
7. …
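The mechanism behind all of these checks is simple: WebDriverWait repeatedly calls a condition until it returns a truthy value or the timeout expires. A stdlib-only sketch of that polling loop (using a no-argument callable here; selenium's conditions take the driver as their argument):

```python
import time

def wait_until(condition, timeout=5.0, poll=0.1):
    """Minimal sketch of the WebDriverWait polling loop (illustrative,
    not selenium's implementation)."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result          # truthy result ends the wait
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll)

print(wait_until(lambda: "found"))  # → found
```

This is why each expected_conditions entry returns either a boolean or the located element: any truthy return value stops the wait and is handed back to the caller.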