Talking to websites

Posted : Saturday, 09 January 2010, 16:15
mindstorm8191


Hey guys, I am interested in making a sort of bot for a certain website game. The problem is, though, that it uses cookies, and requires interaction by html forms. Does anyone know how to have Blitz / Blitz3D send form data, and store & send cookie data? I could do some testing on the cookies thing, but I'd first have to figure out the forms part.

-----
Vesuvius web game
Posted : Saturday, 09 January 2010, 16:38
HoboBen


WW Entries : 9
Cookies are set/returned in the HTTP headers when you receive/request a web page. See Wikipedia on HTTP cookies.

Forms usually use a POST HTTP request instead of a GET request. For an example, see The POST Method.

With the above links, you should be able to modify the BlitzGet code to do what you want.

View the HTML source of any form-using webpage for the names of the variables you want to supply: look for the input tags and use their name attribute. Send the HTTP request to the <form> target (the "action" attribute), and send any needed cookie data along in the HTTP headers.
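Finding the form target and field names can also be automated. This is a minimal sketch in Python rather than Blitz, using only the standard library; the login page in it is made up for illustration and is not taken from any real site.

```python
from html.parser import HTMLParser


class FormFieldFinder(HTMLParser):
    """Collects each form's action, method and the names of its inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> tag encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Only inputs with a name attribute are sent to the server.
            self.forms[-1]["fields"].append(attrs["name"])


# Hypothetical page source, invented for this example.
page = """
<form action="/login.php" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <input type="submit" value="Log in">
</form>
"""

finder = FormFieldFinder()
finder.feed(page)
print(finder.forms)
```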

Hope I was clear enough there, feel free to ask me to elaborate on specific parts.

-----
github
Posted : Saturday, 09 January 2010, 18:06
mindstorm8191


Thanks for the reply. I'm still not sure how to actually send and receive cookie data and the like with regular statements (such as connecting by blitz's TCP commands). The Wikipedia page was showing some stuff that would make sense, like "GET /spec.html HTTP/1.1", so would sending this kind of data work?

-----
Vesuvius web game
Posted : Saturday, 09 January 2010, 18:22
HoboBen


WW Entries : 9
I'll code you up an example tomorrow (bed time!), but yeah, if you replace the GET in a BlitzGet request with POST you'd be on the right track - set up a PHP form handler on your own web server (use the PHP $_POST['variable_name'] superglobal) to debug properly.

-----
github
Posted : Sunday, 10 January 2010, 03:06
shroom_monk


WW Entries : 8
A few useful things I used when doing communication with websites a while back, in case they're helpful:

This has all the various commands and stuff you can send to a server:
www.networksorcery.com/enp/protocol/http.htm

This code will download data from a website (based on some stuff from Blitz help):
-->

I'm not too sure about cookies though, but I seem to recall a few people having some old IK bots lying around somewhere...

-----
A mushroom a day keeps the doctor away...

Keep It Simple, Shroom!
Posted : Sunday, 10 January 2010, 05:53
flying_cucco


WW Entries : 4
Did someone say Inselkampf?!

{1/3} working with web servers

HTTP is all just messages made of lines of human-readable text. You (the client) make a request of the server, and the server replies with the information you want.

It is pretty easy to write and parse these messages with blitz, but before we get onto that, a brief description of how it works.

These messages are made of two parts, the header and the message body, or payload. The client's headers typically specify a resource (ie file) to request and the capabilities of the client. The server's headers contain meta-data about the capabilities of the server, the outcome of your request and the format of the message (ie file).

GET

The basic method of requesting files is to use GET, and HTTP version 1.0. This is the old version of HTTP, which needs fewer parameters specified in the header. Say you want the page at http://google.co.uk (first load it up in your browser and view the page source; that's how it will appear to the browser).

How did the browser get that page? Like this (but probably more complicated using HTTP/1.1):

client browser opens a TCP/IP connection to www.google.co.uk on port 80
client requests the page using "GET", then specifying the resource (page), then giving the version of the protocol we are using (1.0)
-->
client sends a blank line to signal the end of the request (all HTTP lines are terminated by <CR><LF> )
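The request described in these steps is only a few bytes long. A minimal sketch in Python (not Blitz) of what goes over the wire, assuming the resource wanted is the root page "/":

```python
CRLF = "\r\n"  # every HTTP line ends with carriage return + line feed


def build_get_request(path, version="1.0"):
    """Return the bytes of a bare GET request with no headers."""
    request_line = f"GET {path} HTTP/{version}"
    # The second CRLF is the blank line that ends the request.
    return (request_line + CRLF + CRLF).encode("ascii")


request = build_get_request("/")
print(repr(request))
```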

Codes

server will receive this and respond with a code, telling you the outcome of the request
-->
then a bunch of headers then the actual resource.

If the page wasn't found, it might say 404 instead, or 302 if the page had moved somewhere else, and so on. 200 is the good one.
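Picking the code out of the first line of the reply is simple string splitting. A rough Python sketch; the grouping by first digit is a simplification, and the sample lines are typical rather than captured from a real server:

```python
def parse_status_line(line):
    """Split a line like 'HTTP/1.0 200 OK' into (version, code, reason)."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


def describe(code):
    """Very rough classification by the first digit of the status code."""
    kinds = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    return kinds.get(code // 100, "other")


for line in ["HTTP/1.0 200 OK", "HTTP/1.0 302 Found", "HTTP/1.0 404 Not Found"]:
    version, code, reason = parse_status_line(line)
    print(code, reason, "->", describe(code))
```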

Headers

Because we used a HTTP/1.0 request, we didn't have to include any headers, but the server still responded with some, most of which we will ignore.
Headers are always the name of the header field, then a colon and a space, then the data.
-->

The Content-Type: header is useful because it tells you what format the file is, be it an image or a web page.
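Since every header has the same "name: data" shape, they are easy to turn into a lookup table. A minimal Python sketch; the sample header lines are invented, and a header that can repeat (such as Set-Cookie) would need a list rather than this simple dictionary:

```python
def parse_headers(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-case name."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical header lines for illustration.
sample = [
    "Content-Type: text/html; charset=UTF-8",
    "Content-Length: 1234",
    "Server: ExampleServer",
]
headers = parse_headers(sample)
print(headers["content-type"])
```

Header names are case-insensitive, which is why the sketch lower-cases them before storing.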

Payload

Depending on the code (always for 200), after the headers (and a blank line as before) will be the file or page or whatever.
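The blank line is what lets a program tell where the headers stop and the payload starts. A short Python sketch, assuming the whole reply has already been read into memory; the reply bytes are made up:

```python
def split_response(raw):
    """Split raw reply bytes into (status line, header lines, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # first blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body


# Hypothetical reply for illustration.
raw = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
status, header_lines, body = split_response(raw)
print(status)
print(header_lines)
print(body)
```

The body is kept as bytes because it may be an image or other binary data.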
Posted : Sunday, 10 January 2010, 06:13
flying_cucco


WW Entries : 4
{2/3} Doing this from blitz

Go back to the GET section where we outlined how a browser gets a page. Well, we are going to do exactly the same!

The command to connect to a server using TCP is OpenTCPStream(server$, port), where server$ is the host name (for example "www.google.co.uk") rather than a full URL.
-->

This creates a 'stream', similar to how blitz handles reading from/writing to files, except over the internet. In fact we can use the same commands to get data as we would from a local file.

Next, to make the request we use WriteLine stream, data$. Don't forget to send a blank line to tell the server you are done.
-->

So now the server will reply. We need to set up a loop and read all the data. ReadLine$(stream) will get the next line of data and Eof(stream) (End-Of-File) tells us when we are done to break the loop.
-->
ReadLine is suitable for text data, like the headers and web pages, but you might want to use ReadByte for binary data, like images.

Now we are done, close the connection. If we had many files to get, we could actually make another request without closing, but for that we need HTTP/1.1
-->
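For comparison, the same open / write / read-until-end / close sequence looks like this in Python. This is a sketch, not a translation of the Blitz code: the function names are invented for the example, and fetch() is defined but not called so that nothing here needs a network connection.

```python
import io
import socket


def send_request(stream, path):
    """Write a minimal HTTP/1.0 GET request, ending with the blank line."""
    stream.write(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
    stream.flush()


def read_lines(stream):
    """Read lines until end-of-stream, like looping on ReadLine$ until Eof."""
    lines = []
    while True:
        line = stream.readline()
        if not line:  # an empty read means the server closed the connection
            break
        lines.append(line.rstrip(b"\r\n").decode("iso-8859-1"))
    return lines


def fetch(host, path="/", port=80):
    """Open a TCP connection, send the request, read the reply, then close."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rwb") as stream:
            send_request(stream, path)
            return read_lines(stream)


# Offline demonstration using an in-memory stream instead of a real server.
fake_reply = io.BytesIO(b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>\r\n")
print(read_lines(fake_reply))
```

Line-by-line reading suits headers and web pages; for images the body would be read as raw bytes instead.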
Posted : Sunday, 10 January 2010, 06:40
flying_cucco


WW Entries : 4


{3/3} Cookies

Cookies are small pieces of information that the server can store with the client. These are often used to track sessions and check that a user is logged in.

The server sets a cookie by using Set-Cookie: name=value in the headers, and then every time the client visits a page it includes the same cookie in a header of its own: Cookie: name=value

Example of how to read a session cookie
-->

Why Mid$(temp$, 25, 32)? We only want the string of 32 characters, starting at character 25. A better/more general version would handle different length/named cookies; I'd forgotten how bad my IK code was!
-->

Then in subsequent requests we can use that same cookie and the server will know we are logged on
-->

Stuff after the semicolon controls how long the client should keep the cookie, what scope it should be included on and so on. You probably won't need that.
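A more general way to pull out the cookie than a fixed Mid$ offset is to cut at the first semicolon and then at the first equals sign. A minimal Python sketch; the cookie name PHPSESSID and its value are made up for illustration, and real sites may set several cookies at once.

```python
def parse_set_cookie(header_value):
    """Return (name, value) from the data part of a Set-Cookie header."""
    pair = header_value.split(";", 1)[0]  # drop expiry, path and so on
    name, _, value = pair.partition("=")
    return name.strip(), value.strip()


def cookie_header(cookies):
    """Build the Cookie: line to send back on later requests."""
    return "Cookie: " + "; ".join(f"{n}={v}" for n, v in cookies.items())


cookies = {}
# Hypothetical header data; the name and the 32-character value are invented.
name, value = parse_set_cookie("PHPSESSID=0123456789abcdef0123456789abcdef; path=/")
cookies[name] = value
print(cookie_header(cookies))
```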

Forms

Forms are defined in the html page. Here is a simple example that includes two input fields.

-->

When the form is submitted, a GET request is made of the resource (page) specified by action. The data in the fields (inputs) is appended to the url.

Example:
-->

* The form data starts with a ?
* each field is written as name=value
* & separates fields

In blitz, you could substitute variables for the values.
-->
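In Python the same substitution of variables into the URL could be sketched like this. The page name search.php and the field names are invented for the example; urlencode is the standard-library helper that joins the fields with & and escapes the values.

```python
from urllib.parse import urlencode


def form_get_url(action, fields):
    """Append the form fields to the action URL as a query string."""
    return action + "?" + urlencode(fields)


# Hypothetical form target and field values.
user = "danny1"
page = 2
url = form_get_url("/search.php", {"user": user, "page": page})
print(url)
```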

The POST method is similar, except the form is sent in the body of the message.

-->
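A POST request carrying form data might be assembled as below. This is a sketch under assumptions: the host, path and field names are invented, and the Content-Type and Content-Length headers are the ones a browser normally sends with a simple form, which most servers expect.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Return the bytes of an HTTP/1.0 POST with a form-encoded body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separating the headers from the body
    ).encode("ascii")
    return head + body


# Hypothetical login form.
request = build_post_request("www.example.com", "/login.php",
                             {"user": "danny1", "pass": "blahblah"})
print(request.decode("ascii"))
```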

Wireshark

Many servers can be quite picky about how they respond. The easiest way to get it working is to ape a real browser as closely as possible. You can follow the exchange between client and server using a network protocol analyzer.

Wireshark

It is FTW.
Posted : Sunday, 10 January 2010, 20:10
mindstorm8191


Hey, thanks for the detailed tutorial Cucco. But I still have questions on what data is sent and received. Let's see if I can explain what I understand:

client -> server
-->

server -> client
-->

client -> server
-->

Am I on the right track here?


-----
Vesuvius web game
Posted : Monday, 11 January 2010, 13:24
flying_cucco


WW Entries : 4
Almost!

With POST there is a special format to encode the data, which I only touched on above. Basically you must put it in the format that could be used for URLs (web addresses).

* It is all on one line with no 'white space'
* Fields are stored as key/value pairs, with the name first, then a =, then the value
* Between each field is an &
* Spaces are replaced by +
* reserved characters can only be used if they are escaped with a % then the ascii code for that character (in hex)

user=danny1
pass=blahblah


becomes

user=danny1&pass=blahblah
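The escaping rules only show up once a value contains a space or a reserved character. A small Python sketch using the standard library; the values are made up to trigger each rule:

```python
from urllib.parse import quote_plus, urlencode

# A space becomes +, and the reserved & becomes % followed by its hex code (26).
escaped = quote_plus("fish & chips")
print(escaped)

# urlencode applies the same escaping to every value and joins fields with &.
encoded = urlencode({"user": "danny1", "pass": "blah blah&more"})
print(encoded)
```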

Finally don't forget that HTTP needs a blank line between the header and the body of a message.

client->server (1)
-->
Posted : Monday, 11 January 2010, 13:39
flying_cucco


WW Entries : 4
This is the minimum a server might send (see below). It is just the headers; the output of page.php would follow.
server->client (2)
-->

Cookies are part of the header, so come before the body:
client->server (3)
-->

In the real world

You are much more likely to see this sort of thing as a response from a server:

-->
Posted : Tuesday, 12 January 2010, 02:12
Afr0


WW Entries : 3
Http Made Really Easy

Edit: If you're thinking about using the HTTP protocol for anything else than writing a custom web-browser, I'd drop it. The HTTP protocol is old, text-based (thus has a lot of overhead) and generally messy.
Use a custom binary protocol instead.

-----
Afr0 Games

Project Dollhouse on Github - Please fork!
Posted : Tuesday, 12 January 2010, 04:31
Jayenkai


Some people prefer to learn things.
Posted : Tuesday, 12 January 2010, 04:44
Afr0


WW Entries : 3
What are you on about?!
He wanted to learn about the HTTP protocol, I gave him a link for it.
I just added a warning simply because some things are not worth learning unless you're aiming to do something very specific.

-----
Afr0 Games

Project Dollhouse on Github - Please fork!
Posted : Tuesday, 12 January 2010, 11:54
Mog


Not to slam on you, Afr0, but why do you always suggest rolling your own crazy fandangled new method/file format/ algorithm/protocol? The link is good, but the blurb afterwards is not at all useful. Even though http is "old, text-based, and messy", everything uses it- Notice the HTTP for nearly all websites you stumble onto? He's trying to communicate data as if he were a browser, so it's only logical to talk in the same way a browser would with a server.

Mindstorm - Why don't you look into cURL? I'm very certain BlitzMax has a lib, but I'm not so sure about Blitz3D. It's great for spoofing HTTP information and taking control of webpages.

-----
I am Busy Mongoose - My Website

Dev PC: AMD 8150-FX, 16gb Ram, GeForce GTX 680 2gb

Current Project: Pyroxene
Posted : Tuesday, 12 January 2010, 12:00
flying_cucco


WW Entries : 4
HTTP is:
  • very widely used
  • supported by lots of software
  • easy to learn
  • simple to implement
  • flexible
  • backwards compatible

Using free/cheap web hosting could work out much better than expensive dedicated servers for any 'custom binary protocol'. The scoreboards here are an example.
Posted : Tuesday, 12 January 2010, 12:05
Afr0


WW Entries : 3
Notice I said for anything else than writing a custom web-browser.
I suppose I should have been more specific.

Specifically, the HTTP protocol should not be and wasn't designed to be used for (real-time) game(s), videostreaming, IM (although MSN Messenger uses a similarly bloated protocol) and/or large filetransfers (scales hideously bad on the serverside).
It also should not be used for transferring any personal information such as passwords unless the information has been encrypted beforehand and the server knows the key.
That isn't to say you can't do it, and that many people aren't doing it, but you shouldn't do it.


-----
Afr0 Games

Project Dollhouse on Github - Please fork!
Posted : Tuesday, 12 January 2010, 13:14
JL235


WW Entries : 7
Afr0 wrote: "Specifically, the HTTP protocol should not be and wasn't designed to be used for (real-time) game(s), videostreaming, IM (although MSN Messenger uses a similarly bloated protocol) and/or large filetransfers (scales hideously bad on the serverside)."

then it's a good thing that he requires it for...
mindstorm wrote: "making a sort of bot for a certain website game."


Having said that I do now hate to do an Afro. As someone who has written a simple web-game bot I would recommend switching to a language that includes a regular expression library.

Why you ask? Your problem is a perfect example of where you might use regular expressions. You need to strip out specific bits of text from the (X)HTML you get back from the server. Regexes will make this job WAAAAAAAAAAAAAY simpler and shorter to code. A one line regular expression can easily be the equivalent of pages and pages of string-twiddling code.

Python, Ruby, PHP and Perl spring to mind but regular expressions are pretty common. They are supported by plenty of other languages. There is even a regex module for BMax.
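To give a feel for how little code a regular expression needs, here is a minimal Python sketch. The HTML fragment and the resource names are invented for the example and do not come from any real game page.

```python
import re

# Hypothetical fragment of a game page.
html = ('<td class="res">Gold: <b>1200</b></td>'
        '<td class="res">Stone: <b>850</b></td>')

# One pattern captures every "Name: <b>number</b>" pair on the page.
pattern = re.compile(r"(\w+): <b>(\d+)</b>")
resources = {name: int(amount) for name, amount in pattern.findall(html)}
print(resources)
```

Doing the same with Instr and Mid$ style string functions would need a loop and several index calculations per field.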

Finally I can also try and dig-out my Inselkampf bot code next week when I'm back at uni.

-----
PlayMyCode.com - build and play in your browser, Blog, Twitter.
Posted : Tuesday, 12 January 2010, 13:19
Afr0


WW Entries : 3
JL235 wrote: "Having said that I do now hate to do an Afro."


Thanks for that... [/sarcasm]

But yeah, I agree with JL235.

-----
Afr0 Games

Project Dollhouse on Github - Please fork!
Posted : Saturday, 16 January 2010, 17:23
JL235


WW Entries : 7
As promised, here is my Inselkampf bot. I have no idea if this code actually runs (i.e. if I was refactoring it last time I looked at it) or if it even still works with Inselkampf.

It was only really a proof of concept; you can do far better just playing the game yourself.

It consists of a Ruby script (the code):
-->
and a '.dat' file (I think this was where I listed the order of stuff I wanted built).
-->

That's it!

|edit| Actually the original code and whole topic is still online here. |edit|

-----
PlayMyCode.com - build and play in your browser, Blog, Twitter.
Posted : Saturday, 16 January 2010, 17:39
flying_cucco


WW Entries : 4
blitz web crawler/web server/database for the same game, lots of examples on interacting with the web.
Posted : Thursday, 21 January 2010, 16:20
mindstorm8191


...okay, apparently I'm having some issues here. This query gives me a funny error:
-->

I get results when connecting to Travian.com, but it only says Bad Request. If I connect to Google.com though, the stream doesn't get opened.

This is essentially following the example provided in the Blitz help files. Does anyone know what I'm doing wrong here?

-----
Vesuvius web game
Posted : Thursday, 21 January 2010, 17:18
flying_cucco


WW Entries : 4
Your example is correct, but the server did not accept it. Possibly because it couldn't work out the host name?

-->

This works
Posted : Friday, 22 January 2010, 02:51
Sticky


I've connected to websites with PuTTY sometimes so I can see what's being sent and you can do either of the following (when connecting to Google as an example)
-->
or
-->

I tend to use the "Host:" method.
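Two common ways of telling the server which site is wanted are sketched below in Python. Whether these are exactly the two forms meant above is an assumption; the function names are invented for the example. Many servers host several sites on one address, so a request that names no host can come back as Bad Request.

```python
def get_with_host_header(host, path="/"):
    """HTTP/1.1 request: the host name goes in a Host: header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to close so an Eof loop ends
        "\r\n"
    ).encode("ascii")


def get_with_absolute_url(host, path="/"):
    """HTTP/1.0 request: the full URL goes on the request line."""
    return f"GET http://{host}{path} HTTP/1.0\r\n\r\n".encode("ascii")


print(get_with_host_header("www.google.com").decode("ascii"))
print(get_with_absolute_url("www.google.com").decode("ascii"))
```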


-----
last.fm
Posted : Sunday, 24 January 2010, 17:48
mindstorm8191


Ah - that works. Thanks guys!