Talking to websites

Sat, 09 Jan 2010, 16:15
mindstorm8191	Hey guys, I am interested in making a sort of bot for a certain website game. The problem is, though, that it uses cookies, and requires interaction by html forms. Does anyone know how to have Blitz / Blitz3D send form data, and store & send cookie data? I could do some testing on the cookies thing, but I'd first have to figure out the forms part. -=-=- Vesuvius web game
Sat, 09 Jan 2010, 16:38
HoboBen	Cookies are set and returned in the HTTP headers when you request and receive a web page; see Wikipedia on HTTP cookies. Forms usually use a POST HTTP request instead of a GET request; for an example, see The POST Method. With the above links, you should be able to modify the BlitzGet code to do what you want. View the HTML source of any form-using web page to find the names of the variables you want to supply (look for input tags and use their name attribute), send the HTTP request to the form's target (the "action" attribute of the <form> tag), and send any needed cookie data along in the HTTP headers. Hope I was clear enough there; feel free to ask me to elaborate on specific parts. -=-=- blog \| work \| code \| more code
Sat, 09 Jan 2010, 18:06
mindstorm8191	Thanks for the reply. I'm still not sure how to actually send and receive cookie data and the like with regular statements (such as connecting by blitz's TCP commands). The Wikipedia page was showing some stuff that would make sense, like "GET /spec.html HTTP/1.1", so would sending this kind of data work? -=-=- Vesuvius web game
Sat, 09 Jan 2010, 18:22
HoboBen	I'll code you up an example tomorrow (bed time!), but yeah, if you replace the GET in a BlitzGet request with POST you'd be on the right track - set up a PHP form handler on your own web server (use the PHP $_POST['variable_name'] superglobal) to debug properly. -=-=- blog \| work \| code \| more code
Sun, 10 Jan 2010, 03:06
shroom_monk	A few useful things I used when doing communication with websites a while back, in case they're helpful:

This has all the various commands and stuff you can send to a server: www.networksorcery.com/enp/protocol/http.htm

This code will download data from a website (based on some stuff from Blitz help):

    ; connect to google
    site = OpenTCPStream("www.google.com",80)

    ; check if a connection could be made
    If Not site Then
        Print "Connection could not be established."
        WaitKey()
        End
    End If

    ; ask for contents of page
    WriteLine site,"GET https://www.google.com HTTP/1.0"
    WriteLine site,Chr$(10)

    ; write data to screen
    While Not Eof(site)
        Print ReadLine$(site)
    Wend

    ; close stream and end program
    CloseTCPStream site
    WaitKey()
    End

I'm not too sure about cookies though, but I seem to recall a few people having some old IK bots lying around somewhere... -=-=- A mushroom a day keeps the doctor away... Keep It Simple, Shroom!
Sun, 10 Jan 2010, 05:53
flying_cucco	Did someone say Inselkampf?!

{1/3} Working with web servers

HTTP is all just messages made of lines of human-readable text. You (the client) make a request of the server, and the server replies with the information you want. It is pretty easy to write and parse these messages with Blitz, but before we get onto that, a brief description of how it works.

These messages are made of two parts: the header and the message body, or payload. The client's headers typically specify a resource (i.e. a file) to request and the capabilities of the client. The server's headers contain metadata about the capabilities of the server, the outcome of your request and the format of the message (i.e. the file).

GET

The basic method of requesting files is to use GET and HTTP version 1.0. This is the old version of HTTP, which needs fewer parameters specified in the header. Say you want the page at https://google.co.uk (first load it up in your browser and view the page source; that's how it will appear to the browser). How did the browser get that page? Like this (but probably more complicated, using HTTP/1.1):

* the client (browser) opens a TCP/IP connection to www.google.co.uk on port 80
* the client requests the page using "GET", then specifying the resource (page), then giving the version of the protocol we are using (1.0)

    GET https://www.google.com HTTP/1.0

* the client sends a blank line to signal the end of the request (all HTTP lines are terminated by <CR><LF>)

Codes

The server will receive this and respond with a code telling you the outcome of the request

    HTTP/1.0 200 OK

then a bunch of headers, then the actual resource. If the page wasn't found, it would say 404 instead, or 302 if the page had moved to another server, and so on. 200 is the good one.

Headers

Because we used an HTTP/1.0 request, we didn't have to include any headers, but the server still responded with some, most of which we will ignore. Headers are always the name of the header field, then a colon and a space, then the data.

    ...
    Set-Cookie: NID=really-long-tracking-cookie-XXX; expires=Mon, 12-Jul-2010 12:36:49 GMT; path=/; domain=.google.co.uk; HttpOnly
    ...

The Content-Type: header is useful because it tells you what format the file is, be it an image or a web page.

Payload

Depending on the code (always for 200), after the headers (and a blank line as before) will be the file or page or whatever.
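
The status line / headers / blank line / payload layout described above can be taken apart in a few lines. This is only a rough sketch in Ruby (the language of the bot posted later in the thread), not Blitz code; `parse_response` is a made-up helper name and the sample reply is invented for illustration.

```ruby
# Split a raw HTTP response into status code, headers and body.
# parse_response is a hypothetical helper, not a library call.
def parse_response(raw)
  # Headers end at the first blank line; everything after it is the payload.
  head, body = raw.split("\r\n\r\n", 2)
  lines = head.split("\r\n")

  # The status line looks like "HTTP/1.0 200 OK".
  version, code, reason = lines.shift.split(" ", 3)

  # Each header is "Name: value". Names are case-insensitive, so downcase them.
  # Note: a repeated header (e.g. two Set-Cookie lines) overwrites the earlier one here.
  headers = {}
  lines.each do |line|
    name, value = line.split(": ", 2)
    headers[name.downcase] = value
  end

  { version: version, code: code.to_i, reason: reason, headers: headers, body: body.to_s }
end

# Invented sample reply, so the sketch runs without a network connection.
raw = "HTTP/1.0 200 OK\r\n" \
      "Content-Type: text/html\r\n" \
      "Set-Cookie: NID=abc123; path=/\r\n" \
      "\r\n" \
      "<html>hello</html>"

response = parse_response(raw)
puts response[:code]
puts response[:headers]["content-type"]
puts response[:body]
```

Keeping the code as an integer makes it easy to branch on 200 versus anything else before looking at the payload.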
Sun, 10 Jan 2010, 06:13
flying_cucco	{2/3} Doing this from Blitz

Go back to the GET section where we outlined how a browser gets a page; we are going to do exactly the same! The command to connect to a server using TCP is OpenTCPStream("server", port)

    tcp_stream = OpenTCPStream("www.google.co.uk",80)

This creates a 'stream', similar to how Blitz handles reading from/writing to files, except over the internet. In fact we can use the same commands to get data as we would from a local file.

Next, to make the request we use WriteLine stream, data. Don't forget to send a blank line to tell the server you are done.

    WriteLine tcp_stream,"GET https://www.google.co.uk HTTP/1.0"
    WriteLine tcp_stream,""

So now the server will reply. We need to set up a loop and read all the data. ReadLine$(stream) will get the next line of data and Eof(stream) (End-Of-File) tells us when we are done so we can break the loop.

    While Not Eof(tcp_stream)
        DebugLog ReadLine$(tcp_stream)
    Wend

ReadLine is suitable for text data, like the headers and web pages, but you might want to use ReadByte for binary data, like images.

Now we are done, close the connection. If we had many files to get, we could actually make another request without closing, but for that we need HTTP/1.1.

    CloseTCPStream tcp_stream
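
For comparison, here is roughly the same write-then-read-until-EOF pattern as a Ruby sketch. The helper names and the example.com URL are assumptions for illustration, and a StringIO stands in for the socket so the sketch runs offline.

```ruby
require "stringio"

# Build what the two WriteLine calls send: the request line, then a blank line.
# Every HTTP line ends in CR LF.
def build_get_request(url)
  "GET #{url} HTTP/1.0\r\n\r\n"
end

# Read every line until end-of-stream, like the While Not Eof loop.
# io can be a TCPSocket or, as here, any object that responds to eof? and gets.
def read_all_lines(io)
  lines = []
  lines << io.gets.chomp until io.eof?
  lines
end

request = build_get_request("http://www.example.com/")

# Stand-in for the server's reply (invented data).
fake_stream = StringIO.new("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>\r\n")
read_all_lines(fake_stream).each { |line| puts line }

# Against a real server the same helpers would be used roughly like this:
#   require "socket"
#   sock = TCPSocket.new("www.example.com", 80)
#   sock.write(request)
#   puts read_all_lines(sock)
#   sock.close
```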
Sun, 10 Jan 2010, 06:40
flying_cucco	{3/3} Cookies

Cookies are small pieces of information that the server can store with the client. These are often used to track sessions and check that a user is logged in. The server sets a cookie by using Set-Cookie: name=value in the headers, and then every time the client visits a page it will include the same cookie with a header as well: Cookie: name=value

Example of how to read a session cookie:

    While Not Eof (tcp)
        temp$ = ReadLine (tcp)
        If Left$ (temp$, 24) = "Set-Cookie: _inselkampf=" Then
            cookie$=Mid$(temp$, 25, 32)
        EndIf
    Wend

Why Mid$(temp$, 25, 32)? We only want the string of 32 characters, starting at character 25. A better/more general version would handle different length/named cookies; I'd forgotten how bad my IK code was!

    Set-Cookie: _inselkampf=4b79714be38f567a48a25cda49748b69; path=/
    ------------------cut-->8------------------------------8<--cut--

Then in subsequent requests we can use that same cookie and the server will know we are logged on:

    WriteLine tcp, "GET /blah.htm HTTP/1.0"
    WriteLine tcp, "Cookie: _inselkampf=" + cookie$
    WriteLine tcp, ""

Stuff after the semicolon controls how long the client should keep the cookie, what scope it should be included on and so on. You probably won't need that.

Forms

Forms are defined in the HTML page. Here is a simple example that includes two input fields.

    <form name="form" action="page.php" method="get">
    <input type="text" name="query">
    <input type="text" name="query2">
    <input type="submit" value="Click here to search!">
    </form>

When the form is submitted, a GET request is made for the resource (page) specified by action. The data in the fields (inputs) is appended to the URL. Example:

    GET /page.php?query=Foobar&query2=Sheep HTTP/1.0

* the form data starts with a ?
* each field is written as name=value
* & separates fields

In Blitz, you could substitute variables for the values.

    q1$ = "Foobar"
    q2$ = "Sheep"
    WriteLine tcp_stream, "GET /page.php?query=" + q1$ + "&query2=" + q2$ + " HTTP/1.0"
    WriteLine tcp_stream, ""

The POST method is similar, except the form data is sent in the body of the message.

    POST /page.php HTTP/1.0

    query=Foobar&query2=Sheep

Wireshark

Many servers can be quite picky about how they respond. The easiest way to get it working is to ape a real browser as much as possible. You can follow the exchange between client and server using a network protocol analyzer such as Wireshark. It is FTW.
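
The "better/more general version" of the cookie reader mentioned above might look something like this. It is a sketch in Ruby rather than Blitz, and `parse_set_cookie` is a hypothetical helper. It takes any cookie name and any value length, and drops the attributes after the first semicolon.

```ruby
# Pull "name=value" out of one Set-Cookie header line, ignoring the
# attributes after the first semicolon (expires, path, domain, ...).
# Returns [name, value], or nil if the line is not a Set-Cookie header.
def parse_set_cookie(line)
  match = line.match(/\ASet-Cookie:\s*([^=;\s]+)=([^;]*)/i)
  match && [match[1], match[2].strip]
end

line = "Set-Cookie: _inselkampf=4b79714be38f567a48a25cda49748b69; path=/"
name, value = parse_set_cookie(line)

# The header to send back on later requests.
puts "Cookie: #{name}=#{value}"
```

Because the value runs up to the first semicolon instead of a fixed 32 characters, the same code keeps working if the site changes the length of its session ID.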
Sun, 10 Jan 2010, 20:10
mindstorm8191	Hey, thanks for the detailed tutorial, Cucco. But I still have questions on what data is sent and received. Let's see if I can explain what I understand:

client -> server

    POST page.php http/1.0
    user=danny1
    pass=blahblah

server -> client

    HTTP/1.0 200 OK
    set-cookie:userid=57
    pagestyle=green

client -> server

    POST userhome.php HTTP/1.0
    makefile=brad
    cookie:userid=57
    cookie:pagestyle=green

Am I on the right track here? -=-=- Vesuvius web game
Mon, 11 Jan 2010, 13:24
flying_cucco	Almost! With POST there is a special format to encode the data, which I only touched on above. Basically you must put it in the same format that is used for URLs (web addresses).

* It is all on one line with no 'white space'
* Fields are stored as name/value pairs, with the name first, then a =, then the value
* Between each field is an &
* Spaces are replaced by +
* Reserved characters can only be used if they are escaped with a % followed by the ASCII code for that character (in hex)

    user=danny1
    pass=blahblah

becomes

    user=danny1&pass=blahblah

Finally, don't forget that HTTP needs a blank line between the headers and the body of a message.

client->server (1)

    POST /page.php HTTP/1.0

    user=danny1&pass=blahblah
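
The encoding rules in the list above are what Ruby's standard `URI.encode_www_form` does, so a sketch of building the whole POST request could look like this. The host, path and `build_post_request` name are assumptions for illustration. The Content-Type and Content-Length headers are not in the minimal example above, but HTTP/1.0 requires a valid Content-Length on POST requests, and servers generally need it to know where the body ends.

```ruby
require "uri"

# Encode form fields as name=value pairs joined by &, with spaces as + and
# reserved characters as %XX, then wrap them in a POST request.
# build_post_request is a hypothetical helper; host and path are made up.
def build_post_request(host, path, fields)
  body = URI.encode_www_form(fields)
  [
    "POST #{path} HTTP/1.0",
    "Host: #{host}",
    "Content-Type: application/x-www-form-urlencoded",
    "Content-Length: #{body.bytesize}",
    "", # blank line between the headers and the body
    body
  ].join("\r\n")
end

# A value with a space and a reserved character, to show the escaping.
puts URI.encode_www_form({ "user" => "danny1", "pass" => "blah blah&more" })

puts build_post_request("www.example.com", "/page.php", { "user" => "danny1", "pass" => "blahblah" })
```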
Mon, 11 Jan 2010, 13:39
flying_cucco	This is the minimum a server might send (see below). This is just the headers; the output of page.php would follow.

server->client (2)

    HTTP/1.0 200 OK
    Set-Cookie: userid=57
    Set-Cookie: pagestyle=green

Cookies are part of the headers, so they come before the body:

client->server (3)

    POST /userhome.php HTTP/1.0
    Cookie: userid=57
    Cookie: pagestyle=green

    makefile=brad

In the real world

You are much more likely to see this sort of thing as a response from a server:

    HTTP/1.1 200 OK
    Date: Mon, 11 Jan 2010 20:45:03 GMT
    Server: Apache/1.3.34 (Unix) mod_fastcgi/2.4.2 PHP/5.1.1
    Set-Cookie: userid=57; path=/
    Set-Cookie: pagestyle=green; path=/
    Cache-Control: no-cache
    Content-Length: 2751
    Connection: close
    Content-Type: text/html; charset=utf-8

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "https://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    ...
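
Going from a response like the one above to the cookies for the next request could be sketched as follows (Ruby, with a made-up helper name). One difference from exchange (3): browsers normally send all cookies in a single Cookie header separated by "; ", which is what this sketch produces. Dropping the path=/ attributes is a simplification; a real client would also honour path, domain and expiry.

```ruby
# Collect every Set-Cookie header from a list of response header lines and
# build the single Cookie header to send back on the next request.
# cookie_header_from is a hypothetical helper name.
def cookie_header_from(header_lines)
  pairs = []
  header_lines.each do |line|
    name, value = line.split(":", 2)
    next unless value && name.strip.downcase == "set-cookie"
    pair = value.split(";", 2).first.to_s.strip # keep only "name=value"
    pairs << pair unless pair.empty?
  end
  pairs.empty? ? nil : "Cookie: " + pairs.join("; ")
end

headers = [
  "HTTP/1.1 200 OK",
  "Server: Apache/1.3.34 (Unix) mod_fastcgi/2.4.2 PHP/5.1.1",
  "Set-Cookie: userid=57; path=/",
  "Set-Cookie: pagestyle=green; path=/",
  "Connection: close"
]
puts cookie_header_from(headers)
```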
Tue, 12 Jan 2010, 02:12
Afr0	HTTP Made Really Easy

Edit: If you're thinking about using the HTTP protocol for anything other than writing a custom web browser, I'd drop it. The HTTP protocol is old, text-based (thus has a lot of overhead) and generally messy. Use a custom binary protocol instead. -=-=- Afr0 Games Project Dollhouse on Github - Please fork!
Tue, 12 Jan 2010, 04:31
Jayenkai	Some people prefer to learn things.
Tue, 12 Jan 2010, 04:44
Afr0	What are you on about?! He wanted to learn about the HTTP protocol, I gave him a link for it. I just added a warning simply because some things are not worth learning unless you're aiming to do something very specific. -=-=- Afr0 Games Project Dollhouse on Github - Please fork!
Tue, 12 Jan 2010, 11:54
Mog	Not to slam on you, Afr0, but why do you always suggest rolling your own crazy fandangled new method/file format/algorithm/protocol? The link is good, but the blurb afterwards is not at all useful. Even though HTTP is "old, text-based, and messy", everything uses it. Notice the HTTP at the start of nearly every website address you stumble onto? He's trying to communicate data as if he were a browser, so it's only logical to talk in the same way a browser would with a server. Mindstorm - why don't you look into cURL? I'm very certain BlitzMax has a lib, but I'm not so sure about Blitz3D. It's great for spoofing HTTP information and taking control of webpages. -=-=- I am Busy Mongoose - My Website Dev PC: AMD 8150-FX, 16gb Ram, GeForce GTX 680 2gb Current Project: Pyroxene
Tue, 12 Jan 2010, 12:00
flying_cucco	HTTP is:

* very widely used
* supported by lots of software
* easy to learn
* simple to implement
* flexible
* backwards compatible

Using free/cheap web hosting could work out much better than expensive dedicated servers for any 'custom binary protocol'. The scoreboards here are an example.
Tue, 12 Jan 2010, 12:05
Afr0	Notice I said for anything other than writing a custom web browser. I suppose I should have been more specific. Specifically, the HTTP protocol should not be, and wasn't designed to be, used for (real-time) games, video streaming, IM (although MSN Messenger uses a similarly bloated protocol) and/or large file transfers (it scales hideously badly on the server side). It also should not be used for transferring any personal information such as passwords unless the information has been encrypted beforehand and the server knows the key. That isn't to say you can't do it, or that many people aren't doing it, but you shouldn't do it. -=-=- Afr0 Games Project Dollhouse on Github - Please fork!
Tue, 12 Jan 2010, 13:14
JL235	Afro: "Specifically, the HTTP protocol should not be and wasn't designed to be used for (real-time) game(s), videostreaming, IM (although MSN Messenger uses a similarily bloated protocol) and/or large filetransfers (scales hideously bad on the serverside)."

Then it's a good thing that he requires it for...

mindstorm: "making a sort of bot for a certain website game."

Having said that, I do now hate to do an Afro. As someone who has written a simple web-game bot, I would recommend switching to a language that includes a regular expression library. Why, you ask? Your problem is a perfect example of where you might use regular expressions. You need to strip out specific bits of text from the (X)HTML you get back from the server. Regexes will make this job WAAAAAAAAAAAAAY simpler and shorter to code. A one-line regular expression can easily be the equivalent of pages and pages of string-twiddling code.

Python, Ruby, PHP and Perl spring to mind, but regular expressions are pretty common. They are supported by plenty of other languages. There is even a regex module for BMax.

Finally, I can also try and dig out my Inselkampf bot code next week when I'm back at uni.
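
As a small illustration of the regex point, here is a Ruby sketch that pulls the form target and field names out of a page with one regex each. The helper names and the sample form are invented, and patterns like these assume double-quoted attributes, so treat them as a starting point, not a robust HTML parser.

```ruby
# Where the form should be submitted (the action attribute of the form tag).
def form_action(html)
  html[/<form[^>]*\baction="([^"]*)"/i, 1]
end

# Names of all input fields that have a name attribute, in document order.
def input_names(html)
  html.scan(/<input[^>]*\bname="([^"]*)"/i).flatten
end

# Invented sample page.
html = <<~HTML
  <form name="form" action="page.php" method="get">
  <input type="text" name="query">
  <input type="text" name="query2">
  <input type="submit" name="go">
  </form>
HTML

puts form_action(html)
puts input_names(html).inspect
```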
Tue, 12 Jan 2010, 13:19
Afr0	"Having said that I do now hate to do an Afro."

Thanks for that... [/sarcasm] But yeah, I agree with JL235. -=-=- Afr0 Games Project Dollhouse on Github - Please fork!
Sat, 16 Jan 2010, 17:23
JL235	As promised, here is my Inselkampf bot. I have no idea if this code actually runs (i.e. if I was refactoring it last time I looked at it) or if it even still works with Inselkampf. It was only really a proof of concept; you can do far better just playing the game yourself.

It consists of a Ruby script (the code):

    require 'socket'
    require 'open-uri'

    # One kind of building the bot can order.
    class Building
      attr :name
      attr :building_expression
      attr :building_link

      def initialize(name, building_expression, building_link)
        @name = name
        @building_expression = building_expression
        @building_link = building_link
      end

      def to_s
        @name
      end
    end

    class InselkampfBot
      HALF_HOUR = 60*30
      FILE_PATH = 'Inselkampf.dat'

      def initialize(username, password)
        @username = username
        @password = password
        @session_id = get_session_id
        unless @session_id
          raise 'Log-in failed: no session ID'
        end

        # Each expression matches when the building's link is shown in green.
        @gold_mine = Building.new(
          'gold mine',
          /(Gold Mine).*(<\/a>).*(color:green).*(<\/a>)/,
          'https://213.203.194.123/us/1/index.php?s=' + @session_id + '&p=b1&a=order&id=b2'
        )
        @stone_quarry = Building.new(
          'stone quarry',
          /(Stone Quarry).*(<\/a>).*(color:green).*(<\/a>)/,
          'https://213.203.194.123/us/1/index.php?s=' + @session_id + '&p=b1&a=order&id=b3'
        )
        @lumber_mill = Building.new(
          'lumber mill',
          /(Lumber Mill).*(<\/a>).*(color:green).*(<\/a>)/,
          'https://213.203.194.123/us/1/index.php?s=' + @session_id + '&p=b1&a=order&id=b4'
        )

        select_building
        run
      end

      # Log in and pull the session ID out of the first link that carries it.
      def get_session_id
        tcp = TCPSocket.open('www.inselkampf.com',80)
        tcp.send("POST https://www.inselkampf.com/index.php?controller=sessions&action=create&player=#{@username}&password=#{@password}&world=1 HTTP/1.0 \n\n", 0)
        tcp.each do |line|
          line_match = (/index\.php\?s=/).match(line)
          return line_match.post_match.chomp if line_match
        end
        tcp.close
      end

      def get_page
        return URI.open("https://213.203.194.123/us/1/index.php?s=#{@session_id}&p=b1").read
      end

      # Log in again, try to build, then wait half an hour, forever.
      def run
        while true
          @session_id = get_session_id
          unless @session_id
            raise 'Log-in failed: no session ID'
          end

          build
          sleep(HALF_HOUR)
        end
      end

      # Read the next building to order from the .dat file, defaulting to the gold mine.
      def select_building
        begin
          file = File.open(FILE_PATH, 'r')
        rescue
          @building = @gold_mine
        else
          @building = case file.gets.chomp
            when /gold/
              @gold_mine
            when /stone/
              @stone_quarry
            when /lumber/
              @lumber_mill
            else
              @gold_mine
          end
          file.close
        end
      end

      def build
        puts "at: #{Time.new}"
        @page = get_page
        if @page =~ @building.building_expression
          URI.open(@building.building_link)
          puts "built: #{@building.name}"
          incriment_building
        else
          puts "did not build: #{@building.name}"
        end
      end

      # Move on to the next building in the cycle and remember it in the .dat file.
      def incriment_building
        @building = case @building
          when @gold_mine
            @stone_quarry
          when @stone_quarry
            @lumber_mill
          when @lumber_mill
            @gold_mine
        end

        file = File.new(FILE_PATH, 'w')
        file.print(@building.name)
        file.close
      end
    end

    InselkampfBot.new('username', 'password')

and a '.dat' file (I think this was where I listed the order of stuff I wanted built).

    stone quarry

That's it!

|edit| Actually the original code and whole topic is still online here. |edit|
Sat, 16 Jan 2010, 17:39
flying_cucco	blitz web crawler/web server/database for the same game, lots of examples on interacting with the web.
Thu, 21 Jan 2010, 16:20
mindstorm8191	...okay, apparently I'm having some issues here. This query gives me a funny error:

    SeedRnd MilliSecs()
    Graphics 640,480,0,2
    SetBuffer BackBuffer()
    stream = OpenTCPStream("https://www.travian.com", 80)
    WriteLine stream, "GET index.php HTTP/1.0"+Chr$(10)
    WriteLine stream, Chr$(10)
    While Not(Eof(stream))
        Print ReadLine$(stream)
    Wend
    CloseTCPStream stream
    WaitKey:End

I get results when connecting to Travian.com, but it only says Bad Request. If I connect to Google.com though, the stream doesn't get opened. This is essentially following the example provided in the Blitz help files. Does anyone know what I'm doing wrong here? -=-=- Vesuvius web game
Thu, 21 Jan 2010, 17:18
flying_cucco	Your example is correct, but the server did not accept it. Possibly because it couldn't work out the host name?

    stream = OpenTCPStream("https://www.travian.com", 80)
    WriteLine stream, "GET https://www.travian.com/index.php HTTP/1.0"
    WriteLine stream, ""

This works.
Fri, 22 Jan 2010, 02:51
Sticky	I've sometimes connected to websites with PuTTY so I can see what's being sent, and you can do either of the following (connecting to Google as an example):

    GET / HTTP/1.1
    Host: www.google.com

or

    GET https://www.google.com/ HTTP/1.1

I tend to use the "Host:" method. -=-=- last.fm
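
A sketch of the "Host:" method in Ruby: split the URL into host and path, put the path (starting with /) on the request line and the bare host name in the Host header. `build_request_with_host` is a hypothetical helper. The Connection: close line is an addition, not part of the example above: HTTP/1.1 servers may otherwise keep the connection open, and a read-until-EOF loop would then wait instead of finishing.

```ruby
require "uri"

# Turn a URL into the "GET path + Host header" form of request.
# build_request_with_host is a made-up helper name.
def build_request_with_host(url)
  uri = URI.parse(url)
  path = uri.path.empty? ? "/" : uri.path # the path must start with /
  path += "?#{uri.query}" if uri.query
  "GET #{path} HTTP/1.1\r\n" \
    "Host: #{uri.host}\r\n" \
    "Connection: close\r\n" \
    "\r\n"
end

puts build_request_with_host("http://www.google.com/")
```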
Sun, 24 Jan 2010, 17:48
mindstorm8191	Ah - that works. Thanks guys!