C#: Grab free proxies in batches and verify that they work

A while ago I noticed that the view count of articles on a certain company's official website went up by one every time the page was refreshed, which gave me a bad feeling: an official company site with such an obvious loophole. When I sent a batch of requests, every page started opening with errors. This was the official website of a company with more than 100 employees, and a few refreshes were enough to break it. The company had come to our school to recruit, and while searching the garden (cnblogs) for job postings I found it had been hiring for Xamarin, so I got curious and kept an eye on it. Anyway, enough digression. Back to the topic: the view count of a CSDN article can be inflated by sending requests through proxy IPs. So the first thing to do, and the topic of this article, is "Use C# to verify the validity of proxy IPs".

Of course, the proxy IPs here come from free sources, so their quality is mediocre. Proxy IPs scraped from free proxy listing pages are not all usable, so we need to verify them. A proxy IP also stays valid only for a limited time, from a few seconds to an hour, and usually for a very short time. So if, for example, we need 100 proxy IPs per minute, we can fetch 100 once a minute (assuming, ideally, that every scraped proxy IP is valid). In principle, a proxy should be used immediately after it is scraped.

This article is fairly basic. I have always found web crawlers interesting, but I am still a beginner at them, so this is just a simple record. If there are any mistakes, I hope you will point them out. Answering the following questions covers what we need in order to verify whether a proxy IP is valid.

1. From which web pages can I grab free proxy IP addresses?

http://www.xicidaili.com

http://www.ip3366.net

http://www.66ip.cn

Searching Baidu for "free proxy ip" turns up many more.

2. Are the proxy IPs stable? What are they used for?

Free proxy IPs are neither long-lived nor reliable. Proxies from the three free proxy sites above stay valid for anywhere from about ten seconds to one hour. In general they should be verified before use to improve the hit rate. A proxy can be used to hide your real IP when accessing websites (some sites, such as Douban, do not allow proxy IPs, which is a bit awkward; is the content really that valuable?). Typical uses are posting "space" messages, boosting website traffic, paid online tasks, batch account registration, and so on. As long as there are no other restrictions, a proxy can be used whenever the IP needs to change frequently.


3. Is a proxy valid if its IP can be pinged? How do you verify that a proxy is valid?

Pinging is not much help here; testing the port is the most effective check. A successful ping does not mean the proxy works, and a failed ping does not necessarily mean the proxy is unavailable. You can use either HttpWebRequest or a Socket. Of course, HttpWebRequest is slower than connecting to the proxy IP and port with a socket.
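The socket approach can be sketched roughly as follows. This is a minimal example, not the original code: the class name `PortProbe`, the method name `IsPortOpenAsync`, and the 3-second default timeout are all assumptions made for illustration. It only confirms that something is listening on the port; it does not prove that the listener is a working proxy.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper (name and timeout are assumptions, not from the original post).
public static class PortProbe
{
    // Returns true if a TCP connection to host:port is established within timeoutMs.
    public static async Task<bool> IsPortOpenAsync(string host, int port, int timeoutMs = 3000)
    {
        using (var client = new TcpClient())
        {
            try
            {
                Task connect = client.ConnectAsync(host, port);
                Task finished = await Task.WhenAny(connect, Task.Delay(timeoutMs));
                if (finished != connect)
                {
                    return false; // no answer within the timeout
                }
                await connect; // rethrows if the connection attempt failed
                return client.Connected;
            }
            catch (SocketException)
            {
                return false; // refused, unreachable, or host name not resolved
            }
        }
    }
}
```

A call such as `await PortProbe.IsPortOpenAsync("127.0.0.1", 8080)` would then be a cheap first filter before the slower HTTP check.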



4. How many proxies should be extracted at a time?

Free proxy IPs are neither long-lived nor reliable, so they can only be fetched from proxy listing sites in batches and at regular intervals. Some proxies are usable for less than a minute, so there are many restrictions.



5. What is the difference between an HTTP proxy and an HTTPS proxy?

Websites served over HTTPS, such as Baidu, need an HTTPS proxy; for websites served over HTTP, an HTTP proxy is enough. This does not hold 100% of the time.

The steps to check whether a proxy IP is valid are as follows:

1. Use HttpWebRequest and HttpWebResponse to request a proxy listing page and get the page content that contains the proxies

2. Use HtmlAgilityPack or a regular expression to extract the proxies from the fetched content and save them to a proxy collection

3. Take the proxy collection and, from multiple threads, send HTTP requests through each proxy to a site such as Baidu to check whether the visit succeeds. If it succeeds, save the proxy in Redis.
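Step 2 with a regular expression might look like the sketch below. It is only an illustration: the class name `ProxyParser` and the pattern are assumptions, and the pattern assumes the page lists each proxy either as `ip:port` or as an IP cell immediately followed by a port cell in an HTML table. Real listing pages differ, so the pattern would need adjusting per site.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (name and pattern are assumptions, not from the original post).
public static class ProxyParser
{
    // An IPv4-looking address, then either ":" or a "</td><td>" boundary, then a port.
    // Octet ranges (0-255) are not validated here.
    private static readonly Regex IpPortPattern = new Regex(
        @"(?<ip>\b(?:\d{1,3}\.){3}\d{1,3}\b)(?:\s*:\s*|\s*</td>\s*<td[^>]*>\s*)(?<port>\d{2,5})\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns the distinct "ip:port" strings found in the HTML, in order of appearance.
    public static List<string> ExtractProxies(string html)
    {
        var result = new List<string>();
        var seen = new HashSet<string>();
        foreach (Match m in IpPortPattern.Matches(html))
        {
            string proxy = m.Groups["ip"].Value + ":" + m.Groups["port"].Value;
            if (seen.Add(proxy))
            {
                result.Add(proxy);
            }
        }
        return result;
    }
}
```

For pages with more irregular markup, HtmlAgilityPack, as mentioned in step 2, is likely the more robust choice.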

[Screenshot: result of the run]

Initiate a request using HttpWebRequest

Request.cs is as follows. It has two main methods. One verifies whether a proxy IP is valid: it sets the Proxy property of HttpWebRequest and requests Baidu. Most articles read the response content and treat the proxy as valid if the content matches the requested URL. In fact, you can simply judge validity by whether the HttpStatusCode is 200.
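The original listing is not reproduced here, so the following is only a sketch of the status-code check described above, not the author's Request.cs. The method name `CheckProxyAsync`, the default test URL, and the 5-second timeout are assumptions. The synchronous `GetResponse` is wrapped in `Task.Run` so that the `Timeout` property applies, and any failure is treated as "proxy not usable". On recent .NET versions `WebRequest.Create` is marked obsolete in favour of `HttpClient`, but it is the API the article names.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical sketch (names and defaults are assumptions, not the original Request.cs).
public static class Request
{
    // Returns true if testUrl answers with HTTP 200 when requested through ip:port.
    public static Task<bool> CheckProxyAsync(
        string ip, int port, string testUrl = "https://www.baidu.com", int timeoutMs = 5000)
    {
        return Task.Run(() =>
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(testUrl);
                request.Method = "GET";
                request.Proxy = new WebProxy(ip, port); // route the request through the proxy
                request.Timeout = timeoutMs;            // applies to the synchronous GetResponse
                request.ReadWriteTimeout = timeoutMs;
                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    return response.StatusCode == HttpStatusCode.OK;
                }
            }
            catch (Exception)
            {
                // Refused connection, timeout, non-success status, bad address:
                // for this check, every failure simply means "not usable".
                return false;
            }
        });
    }
}
```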

[Note] This is a console application that uses asynchronous code (an async Main requires C# 7.1), so create it as a .NET Core project and set the C# language version to 7.1.


Grab free proxies and check whether they are valid

ProxyIpHelper.cs has four main methods: CheckProxyIpAsync checks whether a proxy IP is usable, GetXicidailiProxy grabs proxies from xicidaili.com, GetIp3366Proxy grabs proxies from ip3366.net, and a fourth method (also listed as GetIp3366Proxy) grabs proxies from 66ip.cn. If you want to scrape more websites, you can write more methods in the same way.
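Checking a whole batch concurrently could be sketched as below. This is not the original ProxyIpHelper.cs: the class `ProxyBatchChecker`, the method `FilterValidAsync`, and the concurrency limit of 20 are assumptions. It uses tasks bounded by a `SemaphoreSlim` rather than raw threads, takes the single-proxy check as a delegate, and returns the valid proxies to the caller instead of writing them to Redis.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch (names and defaults are assumptions, not the original ProxyIpHelper.cs).
public static class ProxyBatchChecker
{
    // Runs checkAsync on every proxy, at most maxConcurrency at a time,
    // and returns the proxies for which it returned true (order not guaranteed).
    public static async Task<List<string>> FilterValidAsync(
        IEnumerable<string> proxies,
        Func<string, Task<bool>> checkAsync,
        int maxConcurrency = 20)
    {
        var valid = new ConcurrentBag<string>();
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = proxies.Select(async proxy =>
            {
                await gate.WaitAsync();
                try
                {
                    if (await checkAsync(proxy))
                    {
                        valid.Add(proxy);
                    }
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();
            await Task.WhenAll(tasks);
        }
        return valid.ToList();
    }
}
```

The caller can then store the returned list wherever it likes, for example in Redis as described in the steps earlier.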
