Socoder -> Blitz -> Talking to websites

Sat, 09 Jan 2010, 16:15
mindstorm8191
Hey guys, I am interested in making a sort of bot for a certain website game. The problem is, though, that it uses cookies, and requires interaction by html forms. Does anyone know how to have Blitz / Blitz3D send form data, and store & send cookie data? I could do some testing on the cookies thing, but I'd first have to figure out the forms part.

-=-=-
Vesuvius web game
Sat, 09 Jan 2010, 16:38
HoboBen
Cookies are set and returned in the HTTP headers when you request or receive a web page. See Wikipedia on HTTP cookies.

Forms usually use a POST HTTP request instead of a GET request. For an example, see The POST Method.

With the above links, you should be able to modify the BlitzGet code to do what you want.

View the HTML source of any page that uses a form to find the names of the variables you want to supply: look for the input tags and use their name attribute. Send the HTTP request to the form's target (the "action" attribute of the <form> tag), and send any needed cookie data along in the HTTP headers.

Hope I was clear enough there, feel free to ask me to elaborate on specific parts.

-=-=-
blog | work | code | more code
Sat, 09 Jan 2010, 18:06
mindstorm8191
Thanks for the reply. I'm still not sure how to actually send and receive cookie data and the like with regular statements (such as connecting by blitz's TCP commands). The Wikipedia page was showing some stuff that would make sense, like "GET /spec.html HTTP/1.1", so would sending this kind of data work?

Sat, 09 Jan 2010, 18:22
HoboBen
I'll code you up an example tomorrow (bed time!), but yeah, if you replace the GET in a BlitzGet request with POST you'd be on the right track - set up a PHP form handler on your own web server (use the PHP $_POST['variable_name'] superglobal) to debug properly.

Sun, 10 Jan 2010, 03:06
shroom_monk
A few useful things I used when doing communication with websites a while back, in case they're helpful:

This has all the various commands and stuff you can send to a server:
www.networksorcery.com/enp/protocol/http.htm

This code will download data from a website (based on some stuff from Blitz help):


I'm not too sure about cookies though, but I seem to recall a few people having some old IK bots lying around somewhere...

-=-=-
A mushroom a day keeps the doctor away...

Keep It Simple, Shroom!
Sun, 10 Jan 2010, 05:53
flying_cucco
Did someone say Inselkampf?!

{1/3} working with web servers

HTTP is all just messages made of lines of human-readable text. You (the client) make a request of the server, and the server replies with the information you want.

It is pretty easy to write and parse these messages with blitz, but before we get onto that, a brief description of how it works.

These messages are made of two parts, the header and the message body, or payload. The client's headers typically specify a resource (ie file) to request and the capabilities of the client. The server's headers contain meta-data about the capabilities of the server, the outcome of your request and the format of the message (ie file).

GET

The basic method of requesting files is to use GET and HTTP version 1.0. This is the old version of HTTP, which needs fewer parameters specified in the header. Say you want the page at http://www.google.co.uk (first load it up in your browser and view the page source; that's how it will appear to the browser).

How did the browser get that page? Like this (but probably more complicated using HTTP/1.1):

client browser opens a TCP/IP connection to www.google.co.uk on port 80
client requests the page using "GET", then specifying the resource (page), then giving the version of the protocol we are using (1.0)

client sends a blank line to signal the end of the request (all HTTP lines are terminated by <CR><LF> )
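
As a rough illustration of the steps above, here is a minimal sketch of the bytes such a request is made of. It is written in Python rather than Blitz purely so it can be run and checked anywhere, and the function name build_get_request is made up for this example.

```python
def build_get_request(path="/"):
    """Return the bytes of a bare HTTP/1.0 GET request for `path`."""
    request_line = "GET " + path + " HTTP/1.0"
    # Every line ends with <CR><LF>; the extra empty line ends the request.
    return (request_line + "\r\n" + "\r\n").encode("ascii")


print(build_get_request("/"))
```

Printing the bytes makes the two line endings visible, which is handy when a server refuses a request for no obvious reason.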

Codes

server will receive this and respond with a code, telling you the outcome of the request

then a bunch of headers then the actual resource.

If the page wasn't found, it might say 404 instead, or 302 if the page had moved to another location, and so on. 200 is the good one.

Headers

Because we used an HTTP/1.0 request, we didn't have to include any headers, but the server still responded with some, most of which we will ignore.
Headers are always the name of the header field, then a colon and a space, then the data.


The Content-Type: header is useful because it tells you what format the file is, be it an image or a web page.

Payload

Depending on the code (always for 200), after the headers (and a blank line as before) will be the file or page or whatever.
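
To make the code / headers / payload layout concrete, here is a small Python sketch (again Python rather than Blitz, and only an illustration) that splits a raw response into those three parts. The function name parse_response and the sample response are invented for the example.

```python
def parse_response(raw):
    """Split a raw HTTP response (bytes) into (status_code, headers, body)."""
    # The first blank line separates the headers from the payload.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.0 200 OK
    status_code = int(lines[0].split(" ")[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# A made-up response, just to exercise the parser.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")
code, headers, body = parse_response(sample)
print(code, headers["content-type"], body)
```

Note that a plain dictionary keeps only the last value when a header is repeated, so a real bot would want to collect repeated headers (such as several Set-Cookie lines) in a list instead.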
Sun, 10 Jan 2010, 06:13
flying_cucco
{2/3} Doing this from blitz

Go back to the GET section where we outlined how a browser gets a page, well we are going to do exactly the same!

The command to connect to a server using TCP is OpenTCPStream("url", port)


This creates a 'stream', similar to how blitz handles reading from/writing to files, except over the internet. In fact we can use the same commands to get data as we would from a local file.

Next, to make the request, we use WriteLine stream, data. Don't forget to send a blank line to tell the server you are done.


So now the server will reply. We need to set up a loop and read all the data. ReadLine$(stream) will get the next line of data and Eof(stream) (End-Of-File) tells us when we are done to break the loop.

ReadLine is suitable for text data, like the headers and web pages, but you might want to use ReadByte for binary data, like images.

Now we are done, close the connection. If we had many files to get, we could actually make another request without closing, but for that we need HTTP/1.1
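
For comparison, the same connect / write / read-until-end / close sequence can be sketched in Python with the standard socket module. This is not the Blitz code from this post, just a runnable illustration of the same flow; http_get is a made-up name, and the function expects a socket that is already connected.

```python
import socket


def http_get(sock, path="/"):
    """Send an HTTP/1.0 GET over a connected socket and return the whole reply."""
    # Same idea as two WriteLine calls: the request line, then a blank line.
    sock.sendall(("GET " + path + " HTTP/1.0\r\n\r\n").encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: the server closed the connection (like Eof)
            break
        chunks.append(data)
    return b"".join(chunks)


# Usage against a real server (needs network access, so left commented out):
# sock = socket.create_connection(("www.google.co.uk", 80))
# print(http_get(sock, "/").decode("iso-8859-1"))
# sock.close()
```

Reading raw chunks rather than lines sidesteps the text/binary question mentioned above: the reply comes back as bytes and can be split into headers and payload afterwards.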

Sun, 10 Jan 2010, 06:40
flying_cucco


{3/3} Cookies

Cookies are small pieces of information that the server can store with the client. These are often used to track sessions and check that a user is logged in.

The server sets a cookie by using Set-Cookie: name=value in the headers, and then every time the client visits a page it includes the same cookie in a header of its own: Cookie: name=value

Example of how to read a session cookie


Why Mid$(temp$, 25, 32)? We only want the string of 32 characters, starting at character 25. A better/more general version would handle different length/named cookies; I'd forgotten how bad my IK code was!


Then in subsequent requests we can use that same cookie and the server will know we are logged on


Stuff after the semicolon controls how long the client should keep the cookie, what scope it should be included on and so on. You probably won't need that.
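
A more general version, one that does not depend on a fixed position or length, might look something like this Python sketch. It is only an illustration under assumptions: the function names, the cookie name PHPSESSID and the sample headers are all made up, and it ignores the attributes after the semicolon entirely.

```python
def extract_cookies(header_lines):
    """Collect name=value pairs from Set-Cookie header lines."""
    cookies = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() != "set-cookie":
            continue
        # Keep only the part before the first semicolon (drop expiry, path, ...).
        pair = value.split(";", 1)[0].strip()
        key, _, val = pair.partition("=")
        cookies[key] = val
    return cookies


def cookie_header(cookies):
    """Build the Cookie: line to send back on later requests."""
    return "Cookie: " + "; ".join(k + "=" + v for k, v in cookies.items())


# Made-up server headers for illustration.
lines = ["HTTP/1.0 200 OK",
         "Set-Cookie: PHPSESSID=0123456789abcdef0123456789abcdef; path=/",
         "Content-Type: text/html"]
jar = extract_cookies(lines)
print(cookie_header(jar))
```

The resulting Cookie: line goes in the headers of each later request, before the blank line.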

Forms

Forms are defined in the html page. Here is a simple example that includes two input fields.



When the form is submitted, a GET request is made of the resource (page) specified by action. The data in the fields (inputs) is appended to the url.

Example:


* The form data starts with a ?
* each field is written as name=value
* & separates fields

In blitz, you could substitute variables for the values.
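
As a sketch of substituting variables into the URL, here is the same idea in Python using the standard urllib.parse.urlencode helper. The action "/login.php" and the field names are placeholders, not taken from any real site.

```python
from urllib.parse import urlencode


def form_get_path(action, fields):
    """Append form fields to the form's action as a query string."""
    # urlencode joins name=value pairs with & and escapes awkward characters.
    return action + "?" + urlencode(fields)


# "/login.php", "user" and "pass" are placeholder names for this example.
print(form_get_path("/login.php", {"user": "danny1", "pass": "blahblah"}))
```

The returned string is what would follow GET in the request line.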


The POST method is similar, except the form is sent in the body of the message.



Wireshark

Many servers can be quite picky about how they respond. The easiest way to get it working is to ape a real browser as closely as possible. You can follow the exchange between client and server using a network protocol analyzer.

Wireshark

It is FTW.
Sun, 10 Jan 2010, 20:10
mindstorm8191
Hey, thanks for the detailed tutorial Cucco. But I still have questions on what data is sent and received. Let's see if I can explain what I understand:

client -> server


server -> client


client -> server


Am I on the right track here?


Mon, 11 Jan 2010, 13:24
flying_cucco
Almost!

With POST there is a special format to encode the data, which I only touched on above. Basically you must put it in the format that could be used for URLs (web addresses).

* It is all on one line with no 'white space'
* Fields are stored as key/value pairs, with the name first, then a =, then the value
* Between each field is an &
* Spaces are replaced by +
* Reserved characters can only be used if they are escaped with a % followed by the ASCII code for that character (in hex)

user=danny1
pass=blahblah


becomes

user=danny1&pass=blahblah
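
The rules in the list above can be written out by hand, which is roughly what a Blitz version would have to do. Here is a small Python sketch of them; the function names are invented, and the set of characters treated as safe is an assumption based on what browsers commonly leave unescaped.

```python
# Characters sent as-is; everything else is escaped (assumed safe set).
SAFE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"


def encode_value(text):
    """Encode one name or value for use in a URL or a POST body."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("+")              # spaces become +
        elif ch in SAFE:
            out.append(ch)               # safe characters pass through
        else:
            out.append("%%%02X" % byte)  # everything else: % plus hex code
    return "".join(out)


def encode_form(fields):
    """Join (name, value) pairs as name=value separated by &."""
    return "&".join(encode_value(n) + "=" + encode_value(v) for n, v in fields)


print(encode_form([("user", "danny1"), ("pass", "blah blah&more")]))
```

The second field shows why escaping matters: an unescaped & inside a value would otherwise look like the start of a new field.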

Finally don't forget that HTTP needs a blank line between the header and the body of a message.
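
Putting that together, a complete POST might be assembled like this Python sketch. The Content-Type and Content-Length headers are not mentioned above but are the standard ones for form posts; the host name, "/page.php" target and function name are placeholders for illustration.

```python
def build_post_request(host, path, body):
    """Return the bytes of a form POST; `body` is already url-encoded text."""
    payload = body.encode("ascii")
    lines = [
        "POST " + path + " HTTP/1.0",
        "Host: " + host,
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length is the number of bytes in the body.
        "Content-Length: " + str(len(payload)),
        "",  # ends the last header line; the blank line itself is added below
    ]
    return "\r\n".join(lines).encode("ascii") + b"\r\n" + payload


# Placeholder host and page, for illustration only.
request = build_post_request("www.example.com", "/page.php",
                             "user=danny1&pass=blahblah")
print(request.decode("ascii"))
```

If Content-Length does not match the body exactly, many servers will either wait for more data or cut the form short, so it is worth computing rather than hard-coding.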

client->server (1)


Mon, 11 Jan 2010, 13:39
flying_cucco
This is the minimum a server might send (see below). It is just the headers; the output of page.php would follow.
server->client (2)


Cookies are part of the header, so come before the body:
client->server (3)


In the real world

You are much more likely to see this sort of thing as a response from a server:



Tue, 12 Jan 2010, 02:12
Afr0
Http Made Really Easy

Edit: If you're thinking about using the HTTP protocol for anything else than writing a custom web-browser, I'd drop it. The HTTP protocol is old, text-based (thus has a lot of overhead) and generally messy.
Use a custom binary protocol instead.

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Tue, 12 Jan 2010, 04:31
Jayenkai
Some people prefer to learn things.
Tue, 12 Jan 2010, 04:44
Afr0
What are you on about?!
He wanted to learn about the HTTP protocol, I gave him a link for it.
I just added a warning simply because some things are not worth learning unless you're aiming to do something very specific.

Tue, 12 Jan 2010, 11:54
Mog
Not to slam on you, Afr0, but why do you always suggest rolling your own crazy fandangled new method/file format/algorithm/protocol? The link is good, but the blurb afterwards is not at all useful. Even though HTTP is "old, text-based, and messy", everything uses it. Notice the HTTP for nearly all websites you stumble onto? He's trying to communicate data as if he were a browser, so it's only logical to talk in the same way a browser would with a server.

Mindstorm - Why don't you look into cURL? I'm very certain BlitzMax has a lib, but I'm not so sure about Blitz3D. It's great for spoofing HTTP information and taking control of webpages.

-=-=-
I am Busy Mongoose - My Website

Dev PC: AMD 8150-FX, 16gb Ram, GeForce GTX 680 2gb

Current Project: Pyroxene
Tue, 12 Jan 2010, 12:00
flying_cucco
HTTP is:
  • very widely used
  • supported by lots of software
  • easy to learn
  • simple to implement
  • flexible
  • backwards compatible

Using free/cheap web hosting could work out much better than expensive dedicated servers for any 'custom binary protocol'. The scoreboards here are an example.
Tue, 12 Jan 2010, 12:05
Afr0
Notice I said for anything else than writing a custom web-browser.
I suppose I should have been more specific.

Specifically, the HTTP protocol should not be, and wasn't designed to be, used for (real-time) games, video streaming, IM (although MSN Messenger uses a similarly bloated protocol) and/or large file transfers (it scales hideously badly on the server side).
It also should not be used for transferring any personal information such as passwords unless the information has been encrypted beforehand and the server knows the key.
That isn't to say you can't do it, and that many people aren't doing it, but you shouldn't do it.


Tue, 12 Jan 2010, 13:14
JL235
Afro Specifically, the HTTP protocol should not be, and wasn't designed to be, used for (real-time) games, video streaming, IM (although MSN Messenger uses a similarly bloated protocol) and/or large file transfers (it scales hideously badly on the server side).

then it's a good thing that he requires it for...
mindstorm making a sort of bot for a certain website game.


Having said that I do now hate to do an Afro. As someone who has written a simple web-game bot I would recommend switching to a language that includes a regular expression library.

Why you ask? Your problem is a perfect example of where you might use regular expressions. You need to strip out specific bits of text from the (X)HTML you get back from the server. Regexes will make this job WAAAAAAAAAAAAAY simpler and shorter to code. A one line regular expression can easily be the equivalent of pages and pages of string-twiddling code.

Python, Ruby, PHP and Perl spring to mind but regular expressions are pretty common. They are supported by plenty of other languages. There is even a regex module for BMax.
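
To give a feel for it, here is a small Python sketch that pulls the field names out of a form with one regular expression. The pattern and the sample page are invented for illustration, and a regex like this is only a rough tool: it assumes quoted attribute values and reasonably tidy HTML.

```python
import re

# Capture the value of the name attribute inside each <input ...> tag.
INPUT_NAME = re.compile(
    r'<input\b[^>]*\bname\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def input_names(html):
    """Return the name of every named <input> field found in `html`."""
    return INPUT_NAME.findall(html)


# A made-up login form; the unnamed submit button is skipped.
page = '''<form action="login.php" method="post">
<input type="text" name="user">
<input type="password" name="pass">
<input type="submit" value="Log in">
</form>'''
print(input_names(page))
```

Doing the same with Instr and Mid$ calls is possible, but every extra attribute or change in spacing needs another branch, which is where the pages of string-twiddling come from.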

Finally I can also try and dig-out my Inselkampf bot code next week when I'm back at uni.
Tue, 12 Jan 2010, 13:19
Afr0
Having said that I do now hate to do an Afro.


Thanks for that... [/sarcasm]

But yeah, I agree with JL235.

Sat, 16 Jan 2010, 17:23
JL235
As promised, here is my Inselkampf bot. I have no idea if this code actually runs (i.e. if I was refactoring it last time I looked at it) or if it even still works with Inselkampf.

It was only really a proof of concept; you can do far better just playing the game yourself.

It consists of a Ruby script (the code):

and a '.dat' file (I think this was where I listed the order of stuff I wanted built).


That's it!

|edit| Actually the original code and whole topic is still online here. |edit|
Sat, 16 Jan 2010, 17:39
flying_cucco
blitz web crawler/web server/database for the same game, lots of examples on interacting with the web.
Thu, 21 Jan 2010, 16:20
mindstorm8191
...okay, apparently I'm having some issues here. This query gives me a funny error:


I get results when connecting to Travian.com, but it only says Bad Request. If I connect to Google.com though, the stream doesn't get opened.

This is essentially following the example provided in the Blitz help files. Does anyone know what I'm doing wrong here?

Thu, 21 Jan 2010, 17:18
flying_cucco
Your example is correct, but the server did not accept it. Possibly because it couldn't work out the host name?



This works
Fri, 22 Jan 2010, 02:51
Sticky
I've connected to websites with PuTTY sometimes so I can see what's being sent and you can do either of the following (when connecting to Google as an example)

or


I tend to use the "Host:" method.
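
The original snippets are not shown here, so the following Python sketch is only a guess at the two forms being described: naming the host in a Host: header, or putting the full URL in the request line. Both are valid HTTP; the function names are made up for the example.

```python
def get_with_host_header(host, path="/"):
    """Request line carries only the path; the host goes in a Host: header."""
    return ("GET " + path + " HTTP/1.0\r\n" +
            "Host: " + host + "\r\n" +
            "\r\n").encode("ascii")


def get_with_full_url(host, path="/"):
    """Request line carries the whole URL, so no Host: header is needed."""
    return ("GET http://" + host + path + " HTTP/1.0\r\n" +
            "\r\n").encode("ascii")


print(get_with_host_header("www.google.com"))
print(get_with_full_url("www.google.com"))
```

The Host: form is the one browsers send, and it is required in HTTP/1.1, so it is the safer habit when one IP address serves several sites.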


-=-=-
last.fm
Sun, 24 Jan 2010, 17:48
mindstorm8191
Ah - that works. Thanks guys!